Fetch n rows of a LazyFrame — fetch • tidypolars

fetch() collects only the first n rows of a LazyFrame. It is mainly used to check that a query runs as expected on a subset of the data before calling collect() on the full query. Note that fetching n rows does not guarantee that the output contains exactly n rows; see the 'Details' section for more information.

Usage

fetch(
  .data,
  n_rows = 500,
  type_coercion = TRUE,
  predicate_pushdown = TRUE,
  projection_pushdown = TRUE,
  simplify_expression = TRUE,
  slice_pushdown = TRUE,
  comm_subplan_elim = TRUE,
  comm_subexpr_elim = TRUE,
  cluster_with_columns = TRUE,
  no_optimization = FALSE,
  streaming = FALSE
)

Arguments

.data: A Polars LazyFrame
n_rows: Number of rows to fetch.
type_coercion: Coerce types such that operations succeed and run on minimal required memory (default is TRUE).
predicate_pushdown: Applies filters as early as possible at scan level (default is TRUE).
projection_pushdown: Select only the columns that are needed at the scan level (default is TRUE).
simplify_expression: Various optimizations, such as constant folding and replacing expensive operations with faster alternatives (default is TRUE).
slice_pushdown: Only load the required slice from the scan level. Don't materialize sliced outputs (default is TRUE).
comm_subplan_elim: Cache branching subplans that occur on self-joins or unions (default is TRUE).
comm_subexpr_elim: Cache common subexpressions (default is TRUE).
cluster_with_columns: Combine sequential independent calls to $with_columns() (default is TRUE).
no_optimization: Sets the following optimizations to FALSE: predicate_pushdown, projection_pushdown, slice_pushdown, simplify_expression. Default is FALSE.
streaming: Run parts of the query in a streaming fashion (this is in an alpha state). Default is FALSE.
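
As a rough illustration of how the optimization arguments are passed, the sketch below runs the same query twice, once with the defaults and once with no_optimization = TRUE. It assumes a tidypolars version that still exports fetch() with the signature shown under Usage, and that dplyr is attached to provide the filter() generic. The object names (query, optimized, unoptimized) and the threshold are illustrative only.

```r
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

dat_lazy <- polars::as_polars_df(iris)$lazy()

# A lazy query; nothing is computed until fetch() or collect() is called.
query <- dat_lazy |>
  filter(Sepal.Length > 5)

# Defaults: every optimization argument keeps the value shown under Usage.
optimized <- fetch(query, n_rows = 10)

# Switch off the optimizations covered by no_optimization, for example to
# check whether an unexpected result is related to the optimizer.
unoptimized <- fetch(query, n_rows = 10, no_optimization = TRUE)
```

Comparing the two results is one way to rule the optimizer in or out when a query behaves unexpectedly.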

Details

The parameter n_rows indicates how many rows from the LazyFrame should be used at the beginning of the query, but it doesn't guarantee that n_rows rows will be returned. For example, if the query contains filter or join operations with other datasets, then the final number of rows can be lower than n_rows. On the other hand, appending rows during the query can lead to an output that has more rows than n_rows.
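
The intended workflow (preview with fetch(), then run the full query with collect()) could look like the minimal sketch below. It assumes a tidypolars version that still exports fetch(), and that dplyr is attached so that the filter() and collect() generics are available. The names query, preview and full are hypothetical.

```r
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

dat_lazy <- polars::as_polars_df(iris)$lazy()

# Define the query once so the preview and the full run use the same steps.
query <- dat_lazy |>
  filter(Sepal.Length > 5)

# Preview on a subset: because of the filter, the result may contain
# fewer rows than n_rows.
preview <- fetch(query, n_rows = 20)

# Once the preview looks right, run the query on the full data.
full <- collect(query)
```

Keeping the query in its own object avoids small differences creeping in between the test run and the full run.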

Examples

dat_lazy <- polars::as_polars_df(iris)$lazy()

# this will return 30 rows
fetch(dat_lazy, 30)
#> shape: (30, 5)
#> ┌──────────────┬─────────────┬──────────────┬─────────────┬─────────┐
#> │ Sepal.Length ┆ Sepal.Width ┆ Petal.Length ┆ Petal.Width ┆ Species │
#> │ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---     │
#> │ f64          ┆ f64         ┆ f64          ┆ f64         ┆ cat     │
#> ╞══════════════╪═════════════╪══════════════╪═════════════╪═════════╡
#> │ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa  │
#> │ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ setosa  │
#> │ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ setosa  │
#> │ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ setosa  │
#> │ 5.0          ┆ 3.6         ┆ 1.4          ┆ 0.2         ┆ setosa  │
#> │ …            ┆ …           ┆ …            ┆ …           ┆ …       │
#> │ 5.0          ┆ 3.0         ┆ 1.6          ┆ 0.2         ┆ setosa  │
#> │ 5.0          ┆ 3.4         ┆ 1.6          ┆ 0.4         ┆ setosa  │
#> │ 5.2          ┆ 3.5         ┆ 1.5          ┆ 0.2         ┆ setosa  │
#> │ 5.2          ┆ 3.4         ┆ 1.4          ┆ 0.2         ┆ setosa  │
#> │ 4.7          ┆ 3.2         ┆ 1.6          ┆ 0.2         ┆ setosa  │
#> └──────────────┴─────────────┴──────────────┴─────────────┴─────────┘

# this will return fewer than 30 rows because there are fewer than 30 matches
# for this filter in the whole dataset
dat_lazy |>
  filter(Sepal.Length > 7.0) |>
  fetch(30)
#> shape: (12, 5)
#> ┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┐
#> │ Sepal.Length ┆ Sepal.Width ┆ Petal.Length ┆ Petal.Width ┆ Species   │
#> │ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---       │
#> │ f64          ┆ f64         ┆ f64          ┆ f64         ┆ cat       │
#> ╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╡
#> │ 7.1          ┆ 3.0         ┆ 5.9          ┆ 2.1         ┆ virginica │
#> │ 7.6          ┆ 3.0         ┆ 6.6          ┆ 2.1         ┆ virginica │
#> │ 7.3          ┆ 2.9         ┆ 6.3          ┆ 1.8         ┆ virginica │
#> │ 7.2          ┆ 3.6         ┆ 6.1          ┆ 2.5         ┆ virginica │
#> │ 7.7          ┆ 3.8         ┆ 6.7          ┆ 2.2         ┆ virginica │
#> │ …            ┆ …           ┆ …            ┆ …           ┆ …         │
#> │ 7.2          ┆ 3.2         ┆ 6.0          ┆ 1.8         ┆ virginica │
#> │ 7.2          ┆ 3.0         ┆ 5.8          ┆ 1.6         ┆ virginica │
#> │ 7.4          ┆ 2.8         ┆ 6.1          ┆ 1.9         ┆ virginica │
#> │ 7.9          ┆ 3.8         ┆ 6.4          ┆ 2.0         ┆ virginica │
#> │ 7.7          ┆ 3.0         ┆ 6.1          ┆ 2.3         ┆ virginica │
#> └──────────────┴─────────────┴──────────────┴─────────────┴───────────┘
