The first thing to do before using tidypolars is to get some data as a Polars DataFrame or LazyFrame. You can read files with polars::pl$read_*() functions (to import them as DataFrames) or with polars::pl$scan_*() functions (to import them as LazyFrames). polars can read various file formats, such as CSV, Parquet, or JSON.
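As a minimal sketch of the two reading modes, the snippet below writes a built-in dataset to a temporary CSV file and reads it back both eagerly and lazily. The file path and the object names (csv_path, cars_df, cars_lf) are placeholders chosen for illustration, not part of tidypolars.

```r
library(polars)

# Hypothetical input: write a built-in dataset to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Eager read: the file is loaded into a DataFrame right away
cars_df <- pl$read_csv(csv_path)

# Lazy scan: this builds a LazyFrame; the data is not loaded yet
cars_lf <- pl$scan_csv(csv_path)

# The data is materialized when the LazyFrame is collected
cars_collected <- cars_lf$collect()
```

The lazy variant is mostly useful when later steps keep only some rows or columns, since Polars can then avoid loading the rest.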

You could also read data with other packages and then convert it with as_polars_df() (or as_polars_lf() if you want to make it a LazyFrame).
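For illustration, here is a small sketch of that conversion. The built-in airquality dataset stands in for a data.frame obtained from another package, and the object names are arbitrary.

```r
library(polars)

# Stand-in for data read with another package: a built-in data.frame
air_df <- datasets::airquality

# Convert to a Polars DataFrame (eager)
air_pl <- as_polars_df(air_df)

# Convert to a Polars LazyFrame (lazy)
air_lf <- as_polars_lf(air_df)
```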

Here, we’re going to use the who dataset that is available in the tidyr package. I import it both as a classic R data.frame and as a Polars DataFrame so that we can easily compare dplyr and tidypolars functions.

library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)
library(tidyr, warn.conflicts = FALSE)

who_df <- tidyr::who
who_pl <- as_polars_df(tidyr::who)

tidypolars provides methods for dplyr and tidyr S3 generics. Put simply, you can use the same functions on a Polars DataFrame or LazyFrame as in a classic tidyverse workflow, and it should just work (if it doesn’t, please open an issue). Note that you still need to load dplyr and tidyr in your code.

Here’s an example of some dplyr and tidyr code on the classic R data.frame:

who_df |> 
  filter(year > 1990) |> 
  drop_na(newrel_f3544) |> 
  select(iso3, year, matches("^newrel(.*)_f")) |> 
  arrange(iso3, year) |> 
  rename_with(.fn = toupper) |> 
  head()
#> # A tibble: 6 × 9
#>   ISO3   YEAR NEWREL_F014 NEWREL_F1524 NEWREL_F2534 NEWREL_F3544 NEWREL_F4554
#>   <chr> <dbl>       <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
#> 1 AGO    2013         626         2644         2480         1671          991
#> 2 AIA    2013           0            0            0            0            0
#> 3 ALB    2013           5           28           34           13           18
#> 4 AND    2013           0            0            0            1            0
#> 5 ARE    2013           5            4            9            3            3
#> 6 ARG    2013         431          927          808          537          395
#> # ℹ 2 more variables: NEWREL_F5564 <dbl>, NEWREL_F65 <dbl>

We can simply use our Polars dataset instead:

who_pl |> 
  filter(year > 1990) |> 
  drop_na(newrel_f3544) |> 
  select(iso3, year, matches("^newrel(.*)_f")) |> 
  arrange(iso3, year) |> 
  rename_with(.fn = toupper) |> 
  head()
#> shape: (6, 9)
#> ┌──────┬────────┬─────────────┬────────────┬───┬────────────┬────────────┬────────────┬────────────┐
#> │ ISO3 ┆ YEAR   ┆ NEWREL_F014 ┆ NEWREL_F15 ┆ … ┆ NEWREL_F35 ┆ NEWREL_F45 ┆ NEWREL_F55 ┆ NEWREL_F65 │
#> │ ---  ┆ ---    ┆ ---         ┆ 24         ┆   ┆ 44         ┆ 54         ┆ 64         ┆ ---        │
#> │ str  ┆ f64    ┆ f64         ┆ ---        ┆   ┆ ---        ┆ ---        ┆ ---        ┆ f64        │
#> │      ┆        ┆             ┆ f64        ┆   ┆ f64        ┆ f64        ┆ f64        ┆            │
#> ╞══════╪════════╪═════════════╪════════════╪═══╪════════════╪════════════╪════════════╪════════════╡
#> │ AGO  ┆ 2013.0 ┆ 626.0       ┆ 2644.0     ┆ … ┆ 1671.0     ┆ 991.0      ┆ 481.0      ┆ 314.0      │
#> │ AIA  ┆ 2013.0 ┆ 0.0         ┆ 0.0        ┆ … ┆ 0.0        ┆ 0.0        ┆ 0.0        ┆ 0.0        │
#> │ ALB  ┆ 2013.0 ┆ 5.0         ┆ 28.0       ┆ … ┆ 13.0       ┆ 18.0       ┆ 14.0       ┆ 34.0       │
#> │ AND  ┆ 2013.0 ┆ 0.0         ┆ 0.0        ┆ … ┆ 1.0        ┆ 0.0        ┆ 0.0        ┆ 0.0        │
#> │ ARE  ┆ 2013.0 ┆ 5.0         ┆ 4.0        ┆ … ┆ 3.0        ┆ 3.0        ┆ 1.0        ┆ 6.0        │
#> │ ARG  ┆ 2013.0 ┆ 431.0       ┆ 927.0      ┆ … ┆ 537.0      ┆ 395.0      ┆ 307.0      ┆ 374.0      │
#> └──────┴────────┴─────────────┴────────────┴───┴────────────┴────────────┴────────────┴────────────┘

If you use the Polars lazy API, you need to call compute() at the end of the chained expression to evaluate the query:

who_pl_lazy <- as_polars_lf(tidyr::who)

who_pl_lazy |> 
  filter(year > 1990) |> 
  drop_na(newrel_f3544) |> 
  select(iso3, year, matches("^newrel(.*)_f")) |> 
  arrange(iso3, year) |> 
  rename_with(.fn = toupper) |> 
  compute() |> 
  head()
#> shape: (6, 9)
#> ┌──────┬────────┬─────────────┬────────────┬───┬────────────┬────────────┬────────────┬────────────┐
#> │ ISO3 ┆ YEAR   ┆ NEWREL_F014 ┆ NEWREL_F15 ┆ … ┆ NEWREL_F35 ┆ NEWREL_F45 ┆ NEWREL_F55 ┆ NEWREL_F65 │
#> │ ---  ┆ ---    ┆ ---         ┆ 24         ┆   ┆ 44         ┆ 54         ┆ 64         ┆ ---        │
#> │ str  ┆ f64    ┆ f64         ┆ ---        ┆   ┆ ---        ┆ ---        ┆ ---        ┆ f64        │
#> │      ┆        ┆             ┆ f64        ┆   ┆ f64        ┆ f64        ┆ f64        ┆            │
#> ╞══════╪════════╪═════════════╪════════════╪═══╪════════════╪════════════╪════════════╪════════════╡
#> │ AGO  ┆ 2013.0 ┆ 626.0       ┆ 2644.0     ┆ … ┆ 1671.0     ┆ 991.0      ┆ 481.0      ┆ 314.0      │
#> │ AIA  ┆ 2013.0 ┆ 0.0         ┆ 0.0        ┆ … ┆ 0.0        ┆ 0.0        ┆ 0.0        ┆ 0.0        │
#> │ ALB  ┆ 2013.0 ┆ 5.0         ┆ 28.0       ┆ … ┆ 13.0       ┆ 18.0       ┆ 14.0       ┆ 34.0       │
#> │ AND  ┆ 2013.0 ┆ 0.0         ┆ 0.0        ┆ … ┆ 1.0        ┆ 0.0        ┆ 0.0        ┆ 0.0        │
#> │ ARE  ┆ 2013.0 ┆ 5.0         ┆ 4.0        ┆ … ┆ 3.0        ┆ 3.0        ┆ 1.0        ┆ 6.0        │
#> │ ARG  ┆ 2013.0 ┆ 431.0       ┆ 927.0      ┆ … ┆ 537.0      ┆ 395.0      ┆ 307.0      ┆ 374.0      │
#> └──────┴────────┴─────────────┴────────────┴───┴────────────┴────────────┴────────────┴────────────┘

tidypolars also supports many functions from base, lubridate, or stringr. When these are used inside filter(), mutate(), or summarize(), tidypolars automatically translates them so that they run on the Polars engine under the hood. Take a look at the vignette “R and Polars expressions” for more information.
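The sketch below shows what this can look like in practice, assuming that str_starts(), str_to_lower(), and toupper() are among the functions tidypolars can translate. The new column names (iso3_lower, country_upper) are made up for the example.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)
library(stringr)

who_pl <- as_polars_df(tidyr::who)

res <- who_pl |>
  # stringr function inside filter(): keep countries starting with "A"
  filter(str_starts(country, "A")) |>
  # stringr and base functions inside mutate()
  mutate(
    iso3_lower = str_to_lower(iso3),
    country_upper = toupper(country)
  ) |>
  select(country, country_upper, iso3, iso3_lower)

head(res)
```

If a function is not supported, tidypolars should report it with an error rather than silently falling back to R, so unsupported calls are easy to spot.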