read_ndjson_polars() imports the data as a Polars DataFrame.

scan_ndjson_polars() imports the data as a Polars LazyFrame.

Usage

read_ndjson_polars(
  source,
  ...,
  infer_schema_length = 100,
  batch_size = NULL,
  n_rows = NULL,
  low_memory = FALSE,
  rechunk = FALSE,
  row_index_name = NULL,
  row_index_offset = 0,
  reuse_downloaded = TRUE,
  ignore_errors = FALSE
)

scan_ndjson_polars(
  source,
  ...,
  infer_schema_length = 100,
  batch_size = NULL,
  n_rows = NULL,
  low_memory = FALSE,
  rechunk = FALSE,
  row_index_name = NULL,
  row_index_offset = 0,
  reuse_downloaded = TRUE,
  ignore_errors = FALSE
)

Arguments

source

Path to a file or URL. Multiple file paths can be supplied as long as all NDJSON files share the same schema. Multiple URLs are not supported.

...

Ignored.

infer_schema_length

Maximum number of rows to read to infer the column types. If set to 0, all columns will be read as UTF-8. If NULL, a full table scan will be done (slow).

batch_size

Number of rows that will be processed per thread.

n_rows

Maximum number of rows to read.

low_memory

Reduce memory usage at the expense of performance.

rechunk

Reallocate to contiguous memory when all chunks / files are parsed.

row_index_name

If not NULL, this will insert a row index column with the given name into the DataFrame.

row_index_offset

Value at which the row index column starts (only used if row_index_name is set).

reuse_downloaded

If TRUE (default) and a URL was provided, cache the downloaded files for the duration of the session so that they can be reused.

ignore_errors

Keep reading the file even if some lines yield errors. You can also use infer_schema_length = 0 to read all columns as UTF-8 and check which values might cause an issue.
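
The following is a minimal sketch of how the two functions might be used together. It assumes the tidypolars and dplyr packages are installed. The file contents, the column names (id, species, sepal_length) and the row index name "row" are made up for illustration. It also assumes that compute() on a LazyFrame returns a Polars DataFrame, which may differ between package versions.

```r
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

# Write a small NDJSON file (one JSON object per line) to a temporary path.
# The data are hypothetical and only serve the example.
path <- tempfile(fileext = ".ndjson")
writeLines(
  c(
    '{"id": 1, "species": "setosa", "sepal_length": 5.1}',
    '{"id": 2, "species": "virginica", "sepal_length": 6.3}',
    '{"id": 3, "species": "setosa", "sepal_length": 4.9}'
  ),
  path
)

# Eager import: the file is parsed right away into a Polars DataFrame.
# Only the first two rows are read, and a row index column is added.
df <- read_ndjson_polars(path, n_rows = 2, row_index_name = "row")

# Lazy import: this only builds a query plan on a Polars LazyFrame.
lf <- scan_ndjson_polars(path)

# The file is read when the query is executed with compute().
result <- lf |>
  filter(species == "setosa") |>
  select(id, sepal_length) |>
  compute() |>
  as.data.frame()
```

The lazy variant is usually the better choice for large files, because the filter and column selection can be applied while the file is being scanned rather than after everything has been loaded.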