read_csv_polars() imports the data as a Polars DataFrame.
scan_csv_polars() imports the data as a Polars LazyFrame.
Usage
read_csv_polars(
  source,
  ...,
  has_header = TRUE,
  separator = ",",
  comment_prefix = NULL,
  quote_char = "\"",
  skip_rows = 0,
  dtypes = NULL,
  null_values = NULL,
  ignore_errors = FALSE,
  cache = FALSE,
  infer_schema_length = 100,
  n_rows = NULL,
  encoding = "utf8",
  low_memory = FALSE,
  rechunk = TRUE,
  skip_rows_after_header = 0,
  row_index_name = NULL,
  row_index_offset = 0,
  try_parse_dates = FALSE,
  eol_char = "\n",
  raise_if_empty = TRUE,
  truncate_ragged_lines = FALSE,
  reuse_downloaded = TRUE
)
scan_csv_polars(
  source,
  ...,
  has_header = TRUE,
  separator = ",",
  comment_prefix = NULL,
  quote_char = "\"",
  skip_rows = 0,
  dtypes = NULL,
  null_values = NULL,
  ignore_errors = FALSE,
  cache = FALSE,
  infer_schema_length = 100,
  n_rows = NULL,
  encoding = "utf8",
  low_memory = FALSE,
  rechunk = TRUE,
  skip_rows_after_header = 0,
  row_index_name = NULL,
  row_index_offset = 0,
  try_parse_dates = FALSE,
  eol_char = "\n",
  raise_if_empty = TRUE,
  truncate_ragged_lines = FALSE,
  reuse_downloaded = TRUE
)
Arguments
- source
Path to a file or URL. Multiple file paths can be given, provided that all CSV files have the same schema. Multiple URLs are not supported.
- ...
Ignored.
- has_header
Indicate whether the first row of the dataset is a header. If FALSE, column names are autogenerated in the format "column_x", where x is an enumeration over every column in the dataset, starting at 1.
- separator
Single byte character to use as separator in the file.
- comment_prefix
A string, up to 5 symbols long, that marks the start of a comment line. For instance, it can be set to # or //.
- quote_char
Single byte character used for quoting. Set to NULL to turn off special handling and escaping of quotes.
- skip_rows
Start reading after a particular number of rows. The header will be parsed at this offset.
- dtypes
Named list mapping column names to dtypes, or dtypes to column names. This list is used while reading to override the inferred dtypes. Supported types so far are:
"Boolean" or "logical" for DataType::Boolean,
"Categorical" or "factor" for DataType::Categorical,
"Float32" or "double" for DataType::Float32,
"Float64" or "float64" for DataType::Float64,
"Int32" or "integer" for DataType::Int32,
"Int64" or "integer64" for DataType::Int64,
"String" or "character" for DataType::String.
- null_values
Values to interpret as NA. Can be either a character vector, in which case every value that matches one of its elements becomes NA, or a named list with column names and null values.
- ignore_errors
Keep reading the file even if some lines yield errors. You can also use infer_schema_length = 0 to read all columns as UTF-8 and check which values might cause an issue.
- cache
Cache the result after reading.
- infer_schema_length
Maximum number of rows to read to infer the column types. If set to 0, all columns will be read as UTF-8. If NULL, a full table scan will be done (slow).
- n_rows
Maximum number of rows to read.
- encoding
Either "utf8" or "utf8-lossy". Lossy means that invalid UTF-8 values are replaced with "?" characters.
- low_memory
Reduce memory usage at the cost of performance.
- rechunk
Reallocate to contiguous memory when all chunks / files are parsed.
- skip_rows_after_header
Parse the first row as headers, and then skip this number of rows.
- row_index_name
If not NULL, insert a row index column with the given name into the DataFrame.
- row_index_offset
Offset to start the row index column (only used if the name is set).
- try_parse_dates
Try to automatically parse dates. Most ISO8601-like formats can be inferred, as well as a handful of others. If this does not succeed, the column remains of data type pl$String.
- eol_char
Single byte end of line character (default: \n). When encountering a file with Windows line endings (\r\n), one can go with the default \n. The extra \r will be removed when processed.
- raise_if_empty
If FALSE, parsing an empty file returns an empty DataFrame or LazyFrame instead of raising an error.
- truncate_ragged_lines
Truncate lines that are longer than the schema.
- reuse_downloaded
If TRUE (default) and a URL was provided, cache the downloaded files for the duration of the session so they can be reused.
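Examples
The arguments above can be combined as in the following minimal sketch. It assumes the tidypolars package is installed. The temporary file and its contents are made up for illustration. The LazyFrame is materialised with the Polars method $collect(), which is not described in this section and is assumed to be available on the returned object.

```r
library(tidypolars)

# Illustrative data written to a temporary file (not taken from the documentation).
path <- tempfile(fileext = ".csv")
writeLines(
  c("id;value;label", "1;3.5;a", "2;NA;b", "3;1.25;c"),
  path
)

# Eager: the file is parsed immediately into a Polars DataFrame.
df <- read_csv_polars(
  path,
  separator = ";",        # the file uses ";" instead of the default ","
  null_values = "NA",     # treat the literal string "NA" as missing
  row_index_name = "row"  # add a row index column named "row"
)

# Lazy: only a query plan is built here; the file is read on collection.
lf <- scan_csv_polars(path, separator = ";", n_rows = 2)

# Assumption: $collect() is the Polars LazyFrame method that runs the plan.
first_two <- lf$collect()
```

Using scan_csv_polars() is mainly of interest when later steps filter or select columns, since the reading can then be limited to what the query needs.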