read_ndjson_polars() imports the data as a Polars DataFrame.

scan_ndjson_polars() imports the data as a Polars LazyFrame.

Usage

read_ndjson_polars(
  source,
  ...,
  infer_schema_length = 100,
  batch_size = NULL,
  n_rows = NULL,
  low_memory = FALSE,
  rechunk = FALSE,
  row_index_name = NULL,
  row_index_offset = 0,
  ignore_errors = FALSE,
  reuse_downloaded
)

scan_ndjson_polars(
  source,
  ...,
  infer_schema_length = 100,
  batch_size = NULL,
  n_rows = NULL,
  low_memory = FALSE,
  rechunk = FALSE,
  row_index_name = NULL,
  row_index_offset = 0,
  ignore_errors = FALSE,
  reuse_downloaded
)

Arguments

source

Path(s) to a file or directory. To authenticate when scanning cloud locations, see the storage_options parameter.

...

These dots are for future extensions and must be empty.

infer_schema_length

The maximum number of rows to scan for schema inference. If NULL, the full data may be scanned (this is slow). Set infer_schema = FALSE to read all columns as pl$String.

batch_size

Number of rows to read in each batch.

n_rows

Stop reading from the source after n_rows rows have been read.

low_memory

Reduce memory pressure at the expense of performance.

rechunk

Reallocate to contiguous memory when all chunks/files are parsed.

row_index_name

If not NULL, this will insert a row index column with the given name.

row_index_offset

Value at which the row index column starts (only used if row_index_name is set).

ignore_errors

Keep reading the file even if some lines yield errors. You can also use infer_schema = FALSE to read all columns as pl$String and check which values might cause an issue.

reuse_downloaded

[Deprecated] This argument is deprecated and has no replacement.
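
Examples

The following is a minimal sketch of how the two functions might be used side by side, not an official example from the package. It assumes the tidypolars and polars packages are installed. The file contents, the temporary path, and the names ndjson_path, row_nr, and first_two are made up for illustration. The lazy query is collected with the `$collect()` method of the Polars LazyFrame.

```r
library(tidypolars)

# Write a small NDJSON file (one JSON object per line) to a temporary path.
# The records are illustrative only.
ndjson_path <- tempfile(fileext = ".ndjson")
writeLines(
  c(
    '{"id": 1, "name": "a"}',
    '{"id": 2, "name": "b"}',
    '{"id": 3, "name": "c"}'
  ),
  ndjson_path
)

# Eager: the file is parsed into a Polars DataFrame right away.
df <- read_ndjson_polars(ndjson_path)

# Lazy: this builds a LazyFrame; parsing the rows is deferred until collection.
# Only the first two rows are requested, with a row index column starting at 1.
lf <- scan_ndjson_polars(
  ndjson_path,
  n_rows = 2,
  row_index_name = "row_nr",
  row_index_offset = 1
)

# Collect the lazy query into a Polars DataFrame.
first_two <- lf$collect()
```

The lazy variant is usually the better fit when further filtering or column selection follows, since those steps can be combined with the scan before any data is materialized.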