
This function streams a LazyFrame that is larger than RAM directly to a .csv file without collecting it in the R session, which prevents crashes caused by insufficient memory.


  include_bom = FALSE,
  include_header = TRUE,
  separator = ",",
  line_terminator = "\n",
  quote = "\"",
  batch_size = 1024,
  datetime_format = NULL,
  date_format = NULL,
  time_format = NULL,
  float_precision = NULL,
  null_values = "",
  quote_style = "necessary",
  maintain_order = TRUE,
  type_coercion = TRUE,
  predicate_pushdown = TRUE,
  projection_pushdown = TRUE,
  simplify_expression = TRUE,
  slice_pushdown = TRUE,
  no_optimization = FALSE,
  inherit_optimization = FALSE



A Polars LazyFrame.


path

Output file (must be a .csv file).


include_bom

Whether to include a UTF-8 BOM (byte order mark) in the CSV output.


include_header

Whether to include a header row in the CSV output.


separator

Separate CSV fields with this symbol.


line_terminator

String used to end each row.


quote

Byte to use as the quoting character.


batch_size

Number of rows that will be processed per thread.

datetime_format, date_format, time_format

A format string used to format date and time values. See ?strptime() for accepted values. If no format is specified, the default fractional-second precision is inferred from the maximum time unit found in the Datetime columns (if any).
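To make the format-string syntax concrete, the sketch below formats a fixed timestamp with base R's format(), which accepts the % specifiers documented in ?strptime. It only illustrates the specifier syntax: sink_csv() formats values inside the Polars engine, so assuming every specifier behaves identically there is an assumption, not something this page confirms. The value x and the three format variables are made up for the example.

```r
# A fixed timestamp so the results are reproducible (hypothetical value).
x <- as.POSIXct("2024-03-05 14:07:09", tz = "UTC")

# Candidate strings for date_format, time_format and datetime_format.
date_fmt <- "%Y-%m-%d"
time_fmt <- "%H:%M:%S"
datetime_fmt <- "%d/%m/%Y %H:%M"

format(x, date_fmt)      # "2024-03-05"
format(x, time_fmt)      # "14:07:09"
format(x, datetime_fmt)  # "05/03/2024 14:07"
```

Strings like these could then be passed as date_format, time_format or datetime_format.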


float_precision

Number of decimal places to write, applied to both Float32 and Float64 datatypes.


null_values

A string representing null values (defaulting to the empty string).


quote_style

Determines the quoting strategy used:

  • "necessary" (default): This puts quotes around fields only when necessary. They are necessary when fields contain a quote, delimiter or record terminator. Quotes are also necessary when writing an empty record (which is indistinguishable from a record with one empty field).

  • "always": This puts quotes around every field.

  • "non_numeric": This puts quotes around all fields that are non-numeric. Namely, when writing a field that does not parse as a valid float or integer, then quotes will be used even if they aren't strictly necessary.
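The three strategies can be sketched for a single field in base R. The helper quote_field() below is hypothetical: it is not part of polars or tidypolars and only mimics the rules as described above. Two details are assumptions: an embedded quote is escaped by doubling it (the usual CSV convention), and "record terminator" is taken to mean a carriage return or a newline.

```r
# Hypothetical helper, NOT the Polars implementation: applies the quoting
# rules described above to one field given as a character string.
quote_field <- function(field, style = "necessary",
                        separator = ",", quote = "\"") {
  # Assumption: embedded quote characters are escaped by doubling them.
  escaped <- gsub(quote, paste0(quote, quote), field, fixed = TRUE)
  quoted <- paste0(quote, escaped, quote)

  # Quotes are required when the field contains a quote, the delimiter,
  # or a record terminator (assumed here to be \r or \n).
  needs_quotes <- grepl(separator, field, fixed = TRUE) ||
    grepl(quote, field, fixed = TRUE) ||
    grepl("[\r\n]", field)

  # A field counts as numeric if R can parse it as a number.
  is_numeric <- !is.na(suppressWarnings(as.numeric(field)))

  switch(style,
    necessary = if (needs_quotes) quoted else field,
    always = quoted,
    non_numeric = if (is_numeric) field else quoted,
    stop("unknown quote style: ", style)
  )
}

cat(quote_field("abc", "necessary"), "\n")     # abc
cat(quote_field("a,b", "necessary"), "\n")     # "a,b"
cat(quote_field("12.5", "always"), "\n")       # "12.5"
cat(quote_field("12.5", "non_numeric"), "\n")  # 12.5
cat(quote_field("abc", "non_numeric"), "\n")   # "abc"
```

The empty-record case mentioned under "necessary" concerns whole rows rather than single fields, so it is left out of this sketch.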


maintain_order

Whether to maintain the order in which the data was processed (default is TRUE). Setting this to FALSE will be slightly faster.


type_coercion

Coerce types such that operations succeed and run on minimal required memory (default is TRUE).


predicate_pushdown

Applies filters as early as possible at scan level (default is TRUE).


projection_pushdown

Select only the columns that are needed at the scan level (default is TRUE).


simplify_expression

Various optimizations, such as constant folding and replacing expensive operations with faster alternatives (default is TRUE).


slice_pushdown

Only load the required slice from the scan level. Don't materialize sliced outputs (default is TRUE).


no_optimization

Sets the following optimizations to FALSE: predicate_pushdown, projection_pushdown, slice_pushdown, simplify_expression. Default is FALSE.


inherit_optimization

Use existing optimization settings regardless of the settings specified in this function call. Default is FALSE.


Writes a .csv file with the content of the LazyFrame.


if (FALSE) {
# This is an example workflow where sink_csv() is not very useful because
# the data would fit in memory. It simply is an example of using it at the
# end of a piped workflow.

# Create files for the CSV input and output:
file_csv <- tempfile(fileext = ".csv")
file_csv2 <- tempfile(fileext = ".csv")

# Write some data in a CSV file
fake_data <- do.call("rbind", rep(list(mtcars), 1000))
write.csv(fake_data, file = file_csv)

# In a new R session, we could read this file as a LazyFrame, do some operations,
# and write it to another CSV file without ever collecting it in the R session:
polars::pl$scan_csv(file_csv) |>
  filter(cyl %in% c(4, 6), mpg > 22) |>
  mutate(
    hp_gear_ratio = hp / gear
  ) |>
  sink_csv(path = file_csv2)
}