summarize() returns one row for each combination of grouping variables (one difference with dplyr::summarize() is that summarize() only accepts grouped data). It will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.
Usage
# S3 method for class 'RPolarsDataFrame'
summarize(.data, ..., .by = NULL, .groups = "drop_last")
# S3 method for class 'RPolarsDataFrame'
summarise(.data, ..., .by = NULL, .groups = "drop_last")
# S3 method for class 'RPolarsLazyFrame'
summarize(.data, ..., .by = NULL, .groups = "drop_last")
# S3 method for class 'RPolarsLazyFrame'
summarise(.data, ..., .by = NULL, .groups = "drop_last")
Arguments
- .data
A Polars Data/LazyFrame
- ...
Name-value pairs. The name gives the name of the column in the output. The value can be:
A vector the same length as the current group (or the whole data frame if ungrouped).
NULL, to remove the column.
across() is mostly supported, except in a few cases. In particular, if the .cols argument is where(...), it will not select variables that were created before across(). Other select helpers are supported. See the examples.
- .by
Optionally, a selection of columns to group by for just this operation, functioning as an alternative to group_by(). The group order is not maintained; use group_by() if you want more control over it.
- .groups
Grouping structure of the result. Must be one of:
"drop_last" (default): drop the last level of grouping;
"drop": all levels of grouping are dropped;
"keep": keep the same grouping structure as .data.
For now, "rowwise" is not supported. Note that dplyr uses .groups = NULL by default, whose behavior depends on the number of rows per group in the output. However, returning several rows per group in summarize() is deprecated (one should use reframe() instead), which is why .groups = NULL is not supported by tidypolars.
Examples
mtcars |>
  as_polars_df() |>
  group_by(cyl) |>
  summarize(m_gear = mean(gear), sd_gear = sd(gear))
#> shape: (3, 3)
#> ┌─────┬──────────┬──────────┐
#> │ cyl ┆ m_gear   ┆ sd_gear  │
#> │ --- ┆ ---      ┆ ---      │
#> │ f64 ┆ f64      ┆ f64      │
#> ╞═════╪══════════╪══════════╡
#> │ 6.0 ┆ 3.857143 ┆ 0.690066 │
#> │ 8.0 ┆ 3.285714 ┆ 0.726273 │
#> │ 4.0 ┆ 4.090909 ┆ 0.53936  │
#> └─────┴──────────┴──────────┘
# an alternative syntax is to use `.by`
mtcars |>
  as_polars_df() |>
  summarize(m_gear = mean(gear), sd_gear = sd(gear), .by = cyl)
#> shape: (3, 3)
#> ┌─────┬──────────┬──────────┐
#> │ cyl ┆ m_gear   ┆ sd_gear  │
#> │ --- ┆ ---      ┆ ---      │
#> │ f64 ┆ f64      ┆ f64      │
#> ╞═════╪══════════╪══════════╡
#> │ 4.0 ┆ 4.090909 ┆ 0.53936  │
#> │ 8.0 ┆ 3.285714 ┆ 0.726273 │
#> │ 6.0 ┆ 3.857143 ┆ 0.690066 │
#> └─────┴──────────┴──────────┘
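
The three documented values of `.groups` can be compared on data grouped by two variables. The following is a minimal sketch, assuming the polars and tidypolars packages are attached; the column name `m_mpg` and the object names are illustrative only, and printed output is omitted because row order may vary.

```r
library(polars)
library(tidypolars)

# Group by two variables so that the choice of .groups makes a difference
grouped <- mtcars |>
  as_polars_df() |>
  group_by(cyl, am)

# "drop_last" (the default): the last grouping level (am) is dropped
by_default <- grouped |>
  summarize(m_mpg = mean(mpg))

# "drop": all levels of grouping are dropped
dropped <- grouped |>
  summarize(m_mpg = mean(mpg), .groups = "drop")

# "keep": the result keeps the same grouping structure as the input
kept <- grouped |>
  summarize(m_mpg = mean(mpg), .groups = "keep")
```

In all three cases the summarized values are the same; only the grouping structure attached to the result differs, which matters for any grouped operation applied afterwards.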