Collect a LazyFrame

compute() checks the query, optimizes it in the background, and runs it. The output is a Polars DataFrame. collect() is similar to compute() but converts the output to an R data.frame, which consumes more memory.

Until tidypolars 0.7.0, there was only collect() and it was used to collect a LazyFrame into a Polars DataFrame. This usage is still valid for now but will change in 0.8.0 to automatically convert a DataFrame to a data.frame. Use compute() to have a Polars DataFrame as output.

Usage

# S3 method for class 'RPolarsLazyFrame'
compute(
  x,
  ...,
  type_coercion = TRUE,
  predicate_pushdown = TRUE,
  projection_pushdown = TRUE,
  simplify_expression = TRUE,
  slice_pushdown = TRUE,
  comm_subplan_elim = TRUE,
  comm_subexpr_elim = TRUE,
  cluster_with_columns = TRUE,
  no_optimization = FALSE,
  streaming = FALSE,
  collect_in_background = FALSE
)

# S3 method for class 'RPolarsLazyFrame'
collect(
  x,
  ...,
  type_coercion = TRUE,
  predicate_pushdown = TRUE,
  projection_pushdown = TRUE,
  simplify_expression = TRUE,
  slice_pushdown = TRUE,
  comm_subplan_elim = TRUE,
  comm_subexpr_elim = TRUE,
  cluster_with_columns = TRUE,
  no_optimization = FALSE,
  streaming = FALSE,
  collect_in_background = FALSE
)

Arguments

x: A Polars LazyFrame
...: Dots which should be empty.
type_coercion: Coerce types such that operations succeed and run on minimal required memory (default is TRUE).
predicate_pushdown: Applies filters as early as possible at scan level (default is TRUE).
projection_pushdown: Select only the columns that are needed at the scan level (default is TRUE).
simplify_expression: Various optimizations, such as constant folding and replacing expensive operations with faster alternatives (default is TRUE).
slice_pushdown: Only load the required slice from the scan. Don't materialize sliced outputs level. Don't materialize sliced outputs (default is TRUE).
comm_subplan_elim: Cache branching subplans that occur on self-joins or unions (default is TRUE).
comm_subexpr_elim: Cache common subexpressions (default is TRUE).
cluster_with_columns: Combine sequential independent calls to $with_columns().
no_optimization: Sets the following optimizations to FALSE: predicate_pushdown, projection_pushdown, slice_pushdown, simplify_expression. Default is FALSE.
streaming: Run parts of the query in a streaming fashion (this is in an alpha state). Default is FALSE.
collect_in_background: Detach this query from the R session. Computation will start in background. Get a handle which later can be converted into the resulting DataFrame. Useful in interactive mode to not lock R session (default is FALSE).

Examples

dat_lazy <- polars::as_polars_df(iris)$lazy()

compute(dat_lazy)
#> shape: (150, 5)
#> ┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┐
#> │ Sepal.Length ┆ Sepal.Width ┆ Petal.Length ┆ Petal.Width ┆ Species   │
#> │ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---       │
#> │ f64          ┆ f64         ┆ f64          ┆ f64         ┆ cat       │
#> ╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╡
#> │ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa    │
#> │ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ setosa    │
#> │ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ setosa    │
#> │ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ setosa    │
#> │ 5.0          ┆ 3.6         ┆ 1.4          ┆ 0.2         ┆ setosa    │
#> │ …            ┆ …           ┆ …            ┆ …           ┆ …         │
#> │ 6.7          ┆ 3.0         ┆ 5.2          ┆ 2.3         ┆ virginica │
#> │ 6.3          ┆ 2.5         ┆ 5.0          ┆ 1.9         ┆ virginica │
#> │ 6.5          ┆ 3.0         ┆ 5.2          ┆ 2.0         ┆ virginica │
#> │ 6.2          ┆ 3.4         ┆ 5.4          ┆ 2.3         ┆ virginica │
#> │ 5.9          ┆ 3.0         ┆ 5.1          ┆ 1.8         ┆ virginica │
#> └──────────────┴─────────────┴──────────────┴─────────────┴───────────┘

# you can build a query and add compute() as the last piece
dat_lazy |>
  select(starts_with("Sepal")) |>
  filter(between(Sepal.Length, 5, 6)) |>
  compute()
#> shape: (67, 2)
#> ┌──────────────┬─────────────┐
#> │ Sepal.Length ┆ Sepal.Width │
#> │ ---          ┆ ---         │
#> │ f64          ┆ f64         │
#> ╞══════════════╪═════════════╡
#> │ 5.1          ┆ 3.5         │
#> │ 5.0          ┆ 3.6         │
#> │ 5.4          ┆ 3.9         │
#> │ 5.0          ┆ 3.4         │
#> │ 5.4          ┆ 3.7         │
#> │ …            ┆ …           │
#> │ 6.0          ┆ 2.2         │
#> │ 5.6          ┆ 2.8         │
#> │ 6.0          ┆ 3.0         │
#> │ 5.8          ┆ 2.7         │
#> │ 5.9          ┆ 3.0         │
#> └──────────────┴─────────────┘

# call collect() instead to return a data.frame (note that this is more
# expensive than compute())
dat_lazy |>
  select(starts_with("Sepal")) |>
  filter(between(Sepal.Length, 5, 6)) |>
  collect()
#>    Sepal.Length Sepal.Width
#> 1           5.1         3.5
#> 2           5.0         3.6
#> 3           5.4         3.9
#> 4           5.0         3.4
#> 5           5.4         3.7
#> 6           5.8         4.0
#> 7           5.7         4.4
#> 8           5.4         3.9
#> 9           5.1         3.5
#> 10          5.7         3.8
#> 11          5.1         3.8
#> 12          5.4         3.4
#> 13          5.1         3.7
#> 14          5.1         3.3
#> 15          5.0         3.0
#> 16          5.0         3.4
#> 17          5.2         3.5
#> 18          5.2         3.4
#> 19          5.4         3.4
#> 20          5.2         4.1
#> 21          5.5         4.2
#> 22          5.0         3.2
#> 23          5.5         3.5
#> 24          5.1         3.4
#> 25          5.0         3.5
#> 26          5.0         3.5
#> 27          5.1         3.8
#> 28          5.1         3.8
#> 29          5.3         3.7
#> 30          5.0         3.3
#> 31          5.5         2.3
#> 32          5.7         2.8
#> 33          5.2         2.7
#> 34          5.0         2.0
#> 35          5.9         3.0
#> 36          6.0         2.2
#> 37          5.6         2.9
#> 38          5.6         3.0
#> 39          5.8         2.7
#> 40          5.6         2.5
#> 41          5.9         3.2
#> 42          6.0         2.9
#> 43          5.7         2.6
#> 44          5.5         2.4
#> 45          5.5         2.4
#> 46          5.8         2.7
#> 47          6.0         2.7
#> 48          5.4         3.0
#> 49          6.0         3.4
#> 50          5.6         3.0
#> 51          5.5         2.5
#> 52          5.5         2.6
#> 53          5.8         2.6
#> 54          5.0         2.3
#> 55          5.6         2.7
#> 56          5.7         3.0
#> 57          5.7         2.9
#> 58          5.1         2.5
#> 59          5.7         2.8
#> 60          5.8         2.7
#> 61          5.7         2.5
#> 62          5.8         2.8
#> 63          6.0         2.2
#> 64          5.6         2.8
#> 65          6.0         3.0
#> 66          5.8         2.7
#> 67          5.9         3.0

Usage

Arguments

See also

Examples