summarize() returns one row for each combination of grouping variables (one difference from dplyr::summarize() is that this summarize() only accepts grouped data). The output contains one column for each grouping variable and one column for each summary statistic that you have specified.
Arguments
- .data
A Polars Data/LazyFrame
- ...
Name-value pairs. The name gives the name of the column in the output. The value can be:
  - A vector the same length as the current group (or the whole data frame if ungrouped).
  - NULL, to remove the column.
across() is mostly supported, except in a few cases. In particular, if the .cols argument is where(...), it will not select variables that were created before across(). Other select helpers are supported. See the examples.
- .by
Optionally, a selection of columns to group by for just this operation, functioning as an alternative to group_by(). The group order is not maintained; use group_by() if you want more control over it.
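Since .by does not maintain the group order, the sketch below shows two ways to get a predictable row order. It assumes that group_by() accepts a maintain_order argument (suggested by the "Maintain order: FALSE" line printed in the examples) and that the polars package is loaded to provide as_polars_df(); treat both as assumptions rather than a definitive recipe.

```r
library(polars)      # assumed to provide as_polars_df()
library(tidypolars)

# Option 1: ask group_by() to keep groups in order of first appearance
# (maintain_order is an assumption based on the printed group metadata)
by_appearance <- mtcars |>
  as_polars_df() |>
  group_by(cyl, maintain_order = TRUE) |>
  summarize(m_gear = mean(gear))

# Option 2: group with `.by`, then sort the result explicitly
sorted <- mtcars |>
  as_polars_df() |>
  summarize(m_gear = mean(gear), .by = cyl) |>
  arrange(cyl)

by_appearance
sorted
```

Sorting afterwards with arrange() is the more explicit choice when the final order matters more than the order in which groups were encountered.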
Examples
mtcars |>
as_polars_df() |>
group_by(cyl) |>
summarize(m_gear = mean(gear), sd_gear = sd(gear))
#> shape: (3, 3)
#> ┌─────┬──────────┬──────────┐
#> │ cyl ┆ m_gear ┆ sd_gear │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ f64 ┆ f64 │
#> ╞═════╪══════════╪══════════╡
#> │ 4.0 ┆ 4.090909 ┆ 0.53936 │
#> │ 8.0 ┆ 3.285714 ┆ 0.726273 │
#> │ 6.0 ┆ 3.857143 ┆ 0.690066 │
#> └─────┴──────────┴──────────┘
#> Groups [3]: cyl
#> Maintain order: FALSE
# an alternative syntax is to use `.by`
mtcars |>
as_polars_df() |>
summarize(m_gear = mean(gear), sd_gear = sd(gear), .by = cyl)
#> shape: (3, 3)
#> ┌─────┬──────────┬──────────┐
#> │ cyl ┆ m_gear ┆ sd_gear │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ f64 ┆ f64 │
#> ╞═════╪══════════╪══════════╡
#> │ 8.0 ┆ 3.285714 ┆ 0.726273 │
#> │ 6.0 ┆ 3.857143 ┆ 0.690066 │
#> │ 4.0 ┆ 4.090909 ┆ 0.53936 │
#> └─────┴──────────┴──────────┘
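The Arguments section mentions across(). As a minimal sketch, assuming across() accepts a bare function name for .fns as it does in dplyr, the same statistic can be computed for several columns at once. The column choices here (mpg, hp, and the columns starting with "d") are illustrative only.

```r
library(polars)      # assumed to provide as_polars_df()
library(tidypolars)

# apply mean() to explicitly named columns, grouping with `.by`
res <- mtcars |>
  as_polars_df() |>
  summarize(across(c(mpg, hp), mean), .by = cyl)

# select helpers other than where() are described as supported
res_helper <- mtcars |>
  as_polars_df() |>
  summarize(across(starts_with("d"), mean), .by = cyl)

res
res_helper
```

Naming columns explicitly, or using a helper such as starts_with(), avoids the where(...) limitation described above for columns created earlier in the same call.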