
summarize() returns one row for each combination of grouping variables. The result contains one column for each grouping variable and one column for each summary statistic you have specified. One difference from dplyr::summarize() is that summarize() only accepts grouped data, grouped either with group_by() or for the duration of the call with .by.

Usage

# S3 method for class 'RPolarsDataFrame'
summarize(.data, ..., .by = NULL, .groups = "drop_last")

# S3 method for class 'RPolarsDataFrame'
summarise(.data, ..., .by = NULL, .groups = "drop_last")

# S3 method for class 'RPolarsLazyFrame'
summarize(.data, ..., .by = NULL, .groups = "drop_last")

# S3 method for class 'RPolarsLazyFrame'
summarise(.data, ..., .by = NULL, .groups = "drop_last")
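The LazyFrame methods take the same arguments as the DataFrame methods. The following is a minimal sketch. It assumes that the dplyr, polars and tidypolars packages are installed, and that tidypolars provides a collect() method for LazyFrames. On a LazyFrame, summarize() only builds the query, and collect() executes it and returns a DataFrame.

```r
library(dplyr, warn.conflicts = FALSE)
library(polars)
library(tidypolars)

# On a LazyFrame, summarize() only adds a step to the query plan;
# nothing is computed yet
lazy_query <- mtcars |>
  as_polars_lf() |>
  summarize(m_gear = mean(gear), .by = cyl)

# collect() (assumed to be provided by tidypolars for LazyFrames) runs
# the query and returns a Polars DataFrame with one row per cyl value
res <- collect(lazy_query)
res
```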

Arguments

.data

A Polars DataFrame or LazyFrame.

...

Name-value pairs. The name gives the name of the column in the output. The value can be:

  • A vector the same length as the current group (or the whole data frame if ungrouped).

  • NULL, to remove the column.

across() is mostly supported, with a few exceptions. In particular, if the .cols argument is where(...), it will not select variables created earlier in the same call, before across(). Other selection helpers are supported. See the examples.
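The following sketch shows the supported case. It applies mean() to two columns chosen with an ordinary column selection instead of where(). It assumes the same packages as the examples. It also assumes that the output columns keep their original names, as they do in dplyr.

```r
library(dplyr, warn.conflicts = FALSE)
library(polars)
library(tidypolars)

# Mean of mpg and disp for each value of cyl in a single call;
# c(mpg, disp) is a plain selection, not where(...)
res <- mtcars |>
  as_polars_df() |>
  group_by(cyl) |>
  summarize(across(c(mpg, disp), mean))
res
```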

.by

Optionally, a selection of columns to group by for just this operation, functioning as an alternative to group_by(). The order of the groups in the output is not maintained; use group_by() if you want more control over it.
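.by does not maintain the group order, so one option is to sort the result afterwards. The following is a sketch. It assumes that .by accepts several columns through c(...), as in dplyr, and that tidypolars provides arrange() for Polars DataFrames.

```r
library(dplyr, warn.conflicts = FALSE)
library(polars)
library(tidypolars)

# Group by two columns for this call only, then sort the result so
# that the row order is deterministic
res <- mtcars |>
  as_polars_df() |>
  summarize(m_gear = mean(gear), .by = c(cyl, am)) |>
  arrange(cyl, am)
res
```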

.groups

Grouping structure of the result. Must be one of:

  • "drop_last" (default): drop the last level of grouping;

  • "drop": all levels of grouping are dropped;

  • "keep": keep the same grouping structure as .data.

For now, "rowwise" is not supported. Note that dplyr uses .groups = NULL by default, whose behavior depends on the number of rows per group in the output. However, returning several rows per group in summarize() is deprecated in dplyr (reframe() should be used instead), which is why tidypolars does not support .groups = NULL.
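Chaining two summaries shows the effect of the default "drop_last". This sketch assumes the same packages as the examples. The first call is grouped by cyl and am. Dropping the last level of grouping leaves the result grouped by cyl only, so the second summarize() returns one row per value of cyl.

```r
library(dplyr, warn.conflicts = FALSE)
library(polars)
library(tidypolars)

res <- mtcars |>
  as_polars_df() |>
  group_by(cyl, am) |>
  # one row per (cyl, am) combination; "am" is then dropped from the grouping
  summarize(m_gear = mean(gear), .groups = "drop_last") |>
  # still grouped by cyl, so this returns one row per cyl value
  summarize(avg_m_gear = mean(m_gear))
res
```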

Examples

mtcars |>
  as_polars_df() |>
  group_by(cyl) |>
  summarize(m_gear = mean(gear), sd_gear = sd(gear))
#> shape: (3, 3)
#> ┌─────┬──────────┬──────────┐
#> │ cyl ┆ m_gear   ┆ sd_gear  │
#> │ --- ┆ ---      ┆ ---      │
#> │ f64 ┆ f64      ┆ f64      │
#> ╞═════╪══════════╪══════════╡
#> │ 6.0 ┆ 3.857143 ┆ 0.690066 │
#> │ 8.0 ┆ 3.285714 ┆ 0.726273 │
#> │ 4.0 ┆ 4.090909 ┆ 0.53936  │
#> └─────┴──────────┴──────────┘

# an alternative syntax is to use `.by`
mtcars |>
  as_polars_df() |>
  summarize(m_gear = mean(gear), sd_gear = sd(gear), .by = cyl)
#> shape: (3, 3)
#> ┌─────┬──────────┬──────────┐
#> │ cyl ┆ m_gear   ┆ sd_gear  │
#> │ --- ┆ ---      ┆ ---      │
#> │ f64 ┆ f64      ┆ f64      │
#> ╞═════╪══════════╪══════════╡
#> │ 4.0 ┆ 4.090909 ┆ 0.53936  │
#> │ 8.0 ┆ 3.285714 ┆ 0.726273 │
#> │ 6.0 ┆ 3.857143 ┆ 0.690066 │
#> └─────┴──────────┴──────────┘