unnest_longer_polars() turns each element of a list-column into a row.
This is the equivalent of tidyr::unnest_longer().
Usage
unnest_longer_polars(
data,
col,
...,
values_to = NULL,
indices_to = NULL,
keep_empty = FALSE
)Arguments
- data
A Polars DataFrame or LazyFrame.
- col
<
tidy-select> Column(s) to unnest. Can be bare column names, character strings, or tidyselect expressions. When selecting multiple columns, the list elements in each row must have the same length across all selected columns.- ...
These dots are for future extensions and must be empty.
- values_to
A string giving the column name to store the unnested values in. If
NULL(the default), the original column name is used. When multiple columns are selected, this can be a glue string containing"{col}"to provide a template for the column names (e.g.,values_to = "{col}_val").- indices_to
A string giving the column name to store the index of the values. If
NULL(the default), no index column is created. When multiple columns are selected, this can be a glue string containing"{col}"to create separate index columns for each unnested column (e.g.,indices_to = "{col}_idx").- keep_empty
If
TRUE, empty values (NULL or empty lists) are kept asNAin the output. IfFALSE(the default), empty values are dropped.
Details
When multiple columns are selected, the corresponding list elements from each row are expanded together. This requires that all selected columns have lists of the same length in each row.
The indices_to parameter creates an integer column with the position
(1-indexed) of each element within the original list. Named elements in the
list are not currently supported for index names (they will use integer
positions).
When using "{col}" templates with multiple columns, the template is applied
to each column name to generate the output column names.
See also
tidyr::unnest_longer() for the tidyr equivalent.
Examples
library(polars)
# Basic example with a list column
df <- pl$DataFrame(
id = 1:3,
values = list(c(1, 2), c(3, 4, 5), 6)
)
df
#> shape: (3, 2)
#> ┌─────┬─────────────────┐
#> │ id ┆ values │
#> │ --- ┆ --- │
#> │ i32 ┆ list[f64] │
#> ╞═════╪═════════════════╡
#> │ 1 ┆ [1.0, 2.0] │
#> │ 2 ┆ [3.0, 4.0, 5.0] │
#> │ 3 ┆ [6.0] │
#> └─────┴─────────────────┘
unnest_longer_polars(df, values)
#> shape: (6, 2)
#> ┌─────┬────────┐
#> │ id ┆ values │
#> │ --- ┆ --- │
#> │ i32 ┆ f64 │
#> ╞═════╪════════╡
#> │ 1 ┆ 1.0 │
#> │ 1 ┆ 2.0 │
#> │ 2 ┆ 3.0 │
#> │ 2 ┆ 4.0 │
#> │ 2 ┆ 5.0 │
#> │ 3 ┆ 6.0 │
#> └─────┴────────┘
# With indices
unnest_longer_polars(df, values, indices_to = "idx")
#> shape: (6, 3)
#> ┌─────┬─────┬────────┐
#> │ id ┆ idx ┆ values │
#> │ --- ┆ --- ┆ --- │
#> │ i32 ┆ u32 ┆ f64 │
#> ╞═════╪═════╪════════╡
#> │ 1 ┆ 1 ┆ 1.0 │
#> │ 1 ┆ 2 ┆ 2.0 │
#> │ 2 ┆ 1 ┆ 3.0 │
#> │ 2 ┆ 2 ┆ 4.0 │
#> │ 2 ┆ 3 ┆ 5.0 │
#> │ 3 ┆ 1 ┆ 6.0 │
#> └─────┴─────┴────────┘
# Rename the output column
unnest_longer_polars(df, values, values_to = "val")
#> shape: (6, 2)
#> ┌─────┬─────┐
#> │ id ┆ val │
#> │ --- ┆ --- │
#> │ i32 ┆ f64 │
#> ╞═════╪═════╡
#> │ 1 ┆ 1.0 │
#> │ 1 ┆ 2.0 │
#> │ 2 ┆ 3.0 │
#> │ 2 ┆ 4.0 │
#> │ 2 ┆ 5.0 │
#> │ 3 ┆ 6.0 │
#> └─────┴─────┘
# Multiple columns - list elements must have same length per row
df2 <- pl$DataFrame(
id = 1:2,
a = list(c(1, 2), c(3, 4)),
b = list(c("x", "y"), c("z", "w"))
)
unnest_longer_polars(df2, c(a, b))
#> shape: (4, 3)
#> ┌─────┬─────┬─────┐
#> │ id ┆ a ┆ b │
#> │ --- ┆ --- ┆ --- │
#> │ i32 ┆ f64 ┆ str │
#> ╞═════╪═════╪═════╡
#> │ 1 ┆ 1.0 ┆ x │
#> │ 1 ┆ 2.0 ┆ y │
#> │ 2 ┆ 3.0 ┆ z │
#> │ 2 ┆ 4.0 ┆ w │
#> └─────┴─────┴─────┘
# Multiple columns with values_to template
unnest_longer_polars(df2, c(a, b), values_to = "{col}_val")
#> shape: (4, 3)
#> ┌─────┬───────┬───────┐
#> │ id ┆ a_val ┆ b_val │
#> │ --- ┆ --- ┆ --- │
#> │ i32 ┆ f64 ┆ str │
#> ╞═════╪═══════╪═══════╡
#> │ 1 ┆ 1.0 ┆ x │
#> │ 1 ┆ 2.0 ┆ y │
#> │ 2 ┆ 3.0 ┆ z │
#> │ 2 ┆ 4.0 ┆ w │
#> └─────┴───────┴───────┘
# Multiple columns with indices_to template
unnest_longer_polars(df2, c(a, b), indices_to = "{col}_idx")
#> shape: (4, 5)
#> ┌─────┬───────┬─────┬───────┬─────┐
#> │ id ┆ a_idx ┆ a ┆ b_idx ┆ b │
#> │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
#> │ i32 ┆ u32 ┆ f64 ┆ u32 ┆ str │
#> ╞═════╪═══════╪═════╪═══════╪═════╡
#> │ 1 ┆ 1 ┆ 1.0 ┆ 1 ┆ x │
#> │ 1 ┆ 2 ┆ 2.0 ┆ 2 ┆ y │
#> │ 2 ┆ 1 ┆ 3.0 ┆ 1 ┆ z │
#> │ 2 ┆ 2 ┆ 4.0 ┆ 2 ┆ w │
#> └─────┴───────┴─────┴───────┴─────┘
# keep_empty example
df4 <- pl$DataFrame(
id = 1:3,
values = list(c(1, 2), NULL, 3)
)
# By default, NULL/empty values are dropped
unnest_longer_polars(df4, values)
#> shape: (3, 2)
#> ┌─────┬────────┐
#> │ id ┆ values │
#> │ --- ┆ --- │
#> │ i32 ┆ f64 │
#> ╞═════╪════════╡
#> │ 1 ┆ 1.0 │
#> │ 1 ┆ 2.0 │
#> │ 3 ┆ 3.0 │
#> └─────┴────────┘
# Use keep_empty = TRUE to keep them as NA
unnest_longer_polars(df4, values, keep_empty = TRUE)
#> shape: (4, 2)
#> ┌─────┬────────┐
#> │ id ┆ values │
#> │ --- ┆ --- │
#> │ i32 ┆ f64 │
#> ╞═════╪════════╡
#> │ 1 ┆ 1.0 │
#> │ 1 ┆ 2.0 │
#> │ 2 ┆ null │
#> │ 3 ┆ 3.0 │
#> └─────┴────────┘
