Import data from NDJSON file(s)

read_ndjson_polars() imports the data as a Polars DataFrame.

scan_ndjson_polars() imports the data as a Polars LazyFrame.
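As a rough illustration of the difference between the two functions, the sketch below writes a small temporary NDJSON file (the file contents and the column names id and name are invented for the example) and imports it both ways. It assumes that tidypolars and polars are installed and that as.data.frame() can be used to materialize both results as R data frames; the preferred way to collect a LazyFrame may differ between package versions.

```r
library(tidypolars)

# Hypothetical sample data: one JSON object per line.
path <- tempfile(fileext = ".ndjson")
writeLines(
  c(
    '{"id": 1, "name": "a"}',
    '{"id": 2, "name": "b"}',
    '{"id": 3, "name": "c"}'
  ),
  path
)

# Eager: the file is parsed immediately into a Polars DataFrame.
df <- read_ndjson_polars(path)

# Lazy: only a query plan is built; the file is parsed when the
# result is requested.
lf <- scan_ndjson_polars(path)

# Convert both to R data frames (assumed to trigger collection for
# the LazyFrame).
eager_res <- as.data.frame(df)
lazy_res <- as.data.frame(lf)
```

The lazy variant is generally the better fit when later steps filter or select a subset of the data, since the work can be deferred until the result is actually needed.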

Usage

read_ndjson_polars(
  source,
  ...,
  infer_schema_length = 100,
  batch_size = NULL,
  n_rows = NULL,
  low_memory = FALSE,
  rechunk = FALSE,
  row_index_name = NULL,
  row_index_offset = 0,
  ignore_errors = FALSE,
  reuse_downloaded
)

scan_ndjson_polars(
  source,
  ...,
  infer_schema_length = 100,
  batch_size = NULL,
  n_rows = NULL,
  low_memory = FALSE,
  rechunk = FALSE,
  row_index_name = NULL,
  row_index_offset = 0,
  ignore_errors = FALSE,
  reuse_downloaded
)

Arguments

source: Path(s) to a file or directory.
...: These dots are for future extensions and must be empty.
infer_schema_length: The maximum number of rows to scan for schema inference. If NULL, the full data may be scanned (this is slow). Set infer_schema = FALSE to read all columns as pl$String.
batch_size: Number of rows to read in each batch.
n_rows: Stop reading from the source after n_rows rows. If NULL (the default), all rows are read.
low_memory: Reduce memory pressure at the expense of performance.
rechunk: Reallocate to contiguous memory when all chunks/files are parsed.
row_index_name: If not NULL, this will insert a row index column with the given name.
row_index_offset: Value at which the row index column starts (only used if row_index_name is set).
ignore_errors: Keep reading the file even if some lines yield errors. You can also use infer_schema = FALSE to read all columns as pl$String, which helps identify the values that cause an issue.
reuse_downloaded: Deprecated with no replacement.
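
The arguments above can be combined as in the following sketch, which reads only the first two rows of a small temporary file and adds a row index starting at 1. The file contents and the column name row_id are made up for the example, and as.data.frame() is assumed to be available for converting the Polars DataFrame to an R data frame.

```r
library(tidypolars)

# Hypothetical sample data: one JSON object per line.
path <- tempfile(fileext = ".ndjson")
writeLines(
  c(
    '{"id": 10, "name": "a"}',
    '{"id": 20, "name": "b"}',
    '{"id": 30, "name": "c"}'
  ),
  path
)

df <- read_ndjson_polars(
  path,
  n_rows = 2,                 # stop after the first two rows
  row_index_name = "row_id",  # add a row index column with this name
  row_index_offset = 1        # start the index at 1 instead of 0
)

# Convert to an R data frame for inspection.
res <- as.data.frame(df)
```

Leaving row_index_name as NULL skips the index column entirely, in which case row_index_offset has no effect.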