Import data from CSV file(s)

read_csv_polars() imports the data as a Polars DataFrame.

scan_csv_polars() imports the data as a Polars LazyFrame.
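
As a minimal sketch of the difference between the two functions, the example below writes a small CSV to a temporary file, reads it eagerly, then scans it lazily. The file contents and variable names are made up for illustration. It assumes the $collect() method that polars provides on LazyFrames, and as.data.frame() to convert the result to an R data frame.

```r
library(tidypolars)

# Illustrative data written to a temporary file (hypothetical contents).
path <- tempfile(fileext = ".csv")
writeLines(c("id,name", "1,a", "2,b", "3,c"), path)

# Eager: the file is read immediately into a Polars DataFrame.
df <- read_csv_polars(path)

# Lazy: the file is only read when the query is collected.
lf <- scan_csv_polars(path)
df_from_lazy <- lf$collect()

# Convert to base R data frames for inspection.
out_eager <- as.data.frame(df)
out_lazy <- as.data.frame(df_from_lazy)
```

The lazy variant is mostly useful when further operations (filters, column selections) are added before collecting, since they can then be applied while reading.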

Usage

read_csv_polars(
  source,
  ...,
  has_header = TRUE,
  separator = ",",
  comment_prefix = NULL,
  quote_char = "\"",
  skip_rows = 0,
  dtypes = NULL,
  null_values = NULL,
  ignore_errors = FALSE,
  cache = FALSE,
  infer_schema_length = 100,
  n_rows = NULL,
  encoding = "utf8",
  low_memory = FALSE,
  rechunk = TRUE,
  skip_rows_after_header = 0,
  row_index_name = NULL,
  row_index_offset = 0,
  try_parse_dates = FALSE,
  eol_char = "\n",
  raise_if_empty = TRUE,
  truncate_ragged_lines = FALSE,
  reuse_downloaded = TRUE,
  include_file_paths = NULL
)

scan_csv_polars(
  source,
  ...,
  has_header = TRUE,
  separator = ",",
  comment_prefix = NULL,
  quote_char = "\"",
  skip_rows = 0,
  dtypes = NULL,
  null_values = NULL,
  ignore_errors = FALSE,
  cache = FALSE,
  infer_schema_length = 100,
  n_rows = NULL,
  encoding = "utf8",
  low_memory = FALSE,
  rechunk = TRUE,
  skip_rows_after_header = 0,
  row_index_name = NULL,
  row_index_offset = 0,
  try_parse_dates = FALSE,
  eol_char = "\n",
  raise_if_empty = TRUE,
  truncate_ragged_lines = FALSE,
  reuse_downloaded = TRUE,
  include_file_paths = NULL
)

Arguments

source

Path to a file or URL. Multiple paths can be given as long as all CSV files have the same schema. Multiple URLs are not supported.

...

Ignored.

has_header

Indicate whether the first row of the dataset is a header. If FALSE, column names are autogenerated in the format "column_x", where x is the position of the column in the dataset, starting at 1.
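
A short sketch of the autogenerated names, assuming a header-less temporary file whose contents are invented for this example:

```r
library(tidypolars)

# Hypothetical file without a header row.
path <- tempfile(fileext = ".csv")
writeLines(c("1,a", "2,b"), path)

df <- read_csv_polars(path, has_header = FALSE)

# Column names follow the "column_x" pattern described above.
col_names <- names(as.data.frame(df))
```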

separator

Single byte character to use as separator in the file.

comment_prefix

A string, which can be up to 5 symbols in length, used to indicate the start of a comment line. For instance, it can be set to # or //.
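
For illustration, the sketch below reads a made-up file in which one line starts with #. With comment_prefix = "#", that line should be skipped rather than parsed as data.

```r
library(tidypolars)

# Hypothetical file with a comment line between data rows.
path <- tempfile(fileext = ".csv")
writeLines(c("id,name", "# a note between rows", "1,a", "2,b"), path)

df <- read_csv_polars(path, comment_prefix = "#")
out <- as.data.frame(df)
```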

quote_char

Single byte character used for quoting. Set to NULL to turn off special handling and escaping of quotes.

skip_rows

Start reading after a particular number of rows. The header will be parsed at this offset.
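
A minimal sketch, assuming a file that starts with two lines of free text before the actual header (the contents are invented for this example):

```r
library(tidypolars)

# Hypothetical file: two preamble lines, then the header, then the data.
path <- tempfile(fileext = ".csv")
writeLines(
  c("exported by a hypothetical tool", "version 1", "id,name", "1,a", "2,b"),
  path
)

# Skip the two preamble lines; the header is parsed from the third line.
df <- read_csv_polars(path, skip_rows = 2)
out <- as.data.frame(df)
```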

dtypes

Named list of the form column name = dtype, or dtype = column names. This list is used while reading to override the inferred dtypes. Supported types so far are:

"Boolean" or "logical" for DataType::Boolean,
"Categorical" or "factor" for DataType::Categorical,
"Float32" or "double" for DataType::Float32,
"Float64" or "float64" for DataType::Float64,
"Int32" or "integer" for DataType::Int32,
"Int64" or "integer64" for DataType::Int64,
"String" or "character" for DataType::String,

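As a sketch of a dtype override, the example below forces a numeric-looking column to be read as a string, using one of the type names listed above. The file contents and the column name id are assumptions made for illustration.

```r
library(tidypolars)

# Hypothetical file where "id" looks numeric but should stay textual.
path <- tempfile(fileext = ".csv")
writeLines(c("id,name", "001,a", "002,b"), path)

# Override the inferred type of "id" only; "name" is still inferred.
df <- read_csv_polars(path, dtypes = list(id = "String"))
out <- as.data.frame(df)
```

Reading identifiers as strings in this way keeps leading zeros that an integer type would drop.
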
null_values

Values to interpret as NA values. Can be:

a character vector: all values that match one of the values in this vector will be NA;
a named list with column names and null values.
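
The character vector form can be sketched as follows, assuming a made-up file in which both "-" and "NA" mark missing values:

```r
library(tidypolars)

# Hypothetical file using two different markers for missing values.
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,10", "2,-", "3,NA"), path)

# Both markers are read as missing values in every column.
df <- read_csv_polars(path, null_values = c("-", "NA"))
out <- as.data.frame(df)
n_missing <- sum(is.na(out$score))
```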

ignore_errors

Keep reading the file even if some lines yield errors. You can also use infer_schema_length = 0 to read all columns as strings and check which values might cause an issue.

cache

Cache the result after reading.

infer_schema_length

Maximum number of rows to read to infer the column types. If set to 0, all columns will be read as strings (pl$String). If NULL, a full table scan will be done (slow).
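
A small sketch of disabling type inference, using an invented file with numeric-looking columns:

```r
library(tidypolars)

# Hypothetical file whose columns would normally be inferred as integers.
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,10", "2,20"), path)

# With infer_schema_length = 0, no inference is done and every column is a string.
df <- read_csv_polars(path, infer_schema_length = 0)
out <- as.data.frame(df)
```

This can help locate problematic values before deciding on explicit dtypes.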

n_rows

Maximum number of rows to read.

encoding

Either "utf8" or "utf8-lossy". Lossy means that invalid UTF-8 values are replaced with "?" characters.

low_memory

Reduce memory usage at the expense of performance.

rechunk

Reallocate to contiguous memory when all chunks / files are parsed.

skip_rows_after_header

Parse the first row as headers, and then skip this number of rows.
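
For illustration, the sketch below assumes a file whose second line holds units rather than data (the contents are invented for this example):

```r
library(tidypolars)

# Hypothetical file: header, then a units row, then the data.
path <- tempfile(fileext = ".csv")
writeLines(c("length,width", "cm,cm", "1.5,2.0", "3.0,4.5"), path)

# Keep the header, drop the units row that follows it.
df <- read_csv_polars(path, skip_rows_after_header = 1)
out <- as.data.frame(df)
```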

row_index_name

If not NULL, this will insert a row index column with the given name into the DataFrame.

row_index_offset

Offset to start the row index column (only used if the name is set).
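
A minimal sketch combining row_index_name and row_index_offset. The column name "row" and the file contents are assumptions made for illustration.

```r
library(tidypolars)

# Hypothetical three-row file.
path <- tempfile(fileext = ".csv")
writeLines(c("name", "a", "b", "c"), path)

# Add an index column called "row" that starts at 1 instead of 0.
df <- read_csv_polars(path, row_index_name = "row", row_index_offset = 1)
out <- as.data.frame(df)
row_values <- as.numeric(out$row)
```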

try_parse_dates

Try to automatically parse dates. Most ISO8601-like formats can be inferred, as well as a handful of others. If parsing does not succeed, the column keeps the data type pl$String.
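
As a sketch, the example below reads an invented file with ISO 8601 dates and checks the result after conversion to an R data frame, where a Polars date column is expected to become a Date vector:

```r
library(tidypolars)

# Hypothetical file with an ISO 8601 date column.
path <- tempfile(fileext = ".csv")
writeLines(c("day,value", "2024-01-15,1", "2024-02-01,2"), path)

df <- read_csv_polars(path, try_parse_dates = TRUE)
out <- as.data.frame(df)
is_date <- inherits(out$day, "Date")
```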

eol_char

Single byte end of line character (default: \n). Files with Windows line endings (\r\n) can be read with the default \n; the extra \r is removed during processing.

raise_if_empty

If FALSE, parsing an empty file returns an empty DataFrame or LazyFrame instead of raising an error.

truncate_ragged_lines

Truncate lines that are longer than the schema.
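
A short sketch with an invented file in which one line has more fields than the header:

```r
library(tidypolars)

# Hypothetical file: the last line has three fields, the header only two.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2", "3,4,5"), path)

# The extra field on the last line is dropped to fit the two-column schema.
df <- read_csv_polars(path, truncate_ragged_lines = TRUE)
out <- as.data.frame(df)
```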

reuse_downloaded

If TRUE (default) and a URL was provided, cache the downloaded files for the session so they can be reused.

include_file_paths

Include the path of the source file(s) as a column with this name.
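
A minimal sketch of reading several files with the same schema while recording where each row came from. The file names and the column name "file" are assumptions made for illustration.

```r
library(tidypolars)

# Two hypothetical files with the same schema in a temporary directory.
dir <- tempfile("csv_parts_")
dir.create(dir)
p1 <- file.path(dir, "part1.csv")
p2 <- file.path(dir, "part2.csv")
writeLines(c("id,name", "1,a", "2,b"), p1)
writeLines(c("id,name", "3,c", "4,d"), p2)

# Read both files at once and add a "file" column holding the source path.
df <- read_csv_polars(c(p1, p2), include_file_paths = "file")
out <- as.data.frame(df)
```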