The first thing to do before using tidypolars is to get some data as a Polars DataFrame or LazyFrame.
You can read files with the polars::pl$read_*() functions (to import them as DataFrames) or with the polars::pl$scan_*() functions (to import them as LazyFrames). polars can read various file formats, such as CSV, Parquet, or JSON.
You could also read data with other packages and then convert it with as_polars_df() (or as_polars_lf() if you want to make it a LazyFrame).
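As a minimal sketch of these import routes, the example below writes a small CSV to a temporary file and then reads it eagerly, scans it lazily, and converts an in-memory data.frame. The mtcars dataset, the temporary file, and the object names (cars_df, cars_lf, and so on) are assumptions chosen for illustration; the exact arguments accepted by the read and scan functions may differ between polars versions.

```r
library(polars)
library(tidypolars)

# Write a small CSV to a temporary file so the example is self-contained.
# (mtcars and the temp file are illustrative assumptions.)
path <- tempfile(fileext = ".csv")
write.csv(mtcars, path, row.names = FALSE)

# Eager import: the file is read immediately into a DataFrame.
cars_df <- pl$read_csv(path)

# Lazy import: only a query plan is built; the data is read when the
# query is evaluated.
cars_lf <- pl$scan_csv(path)

# Convert data that is already in R instead of reading a file.
cars_df2 <- as_polars_df(mtcars)
cars_lf2 <- as_polars_lf(mtcars)
```

The lazy variants are mostly useful when the file is large and only part of it is needed, since Polars can then optimize the query before touching the data.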
Here, we’re going to use the who dataset that is available in the tidyr package. I import it both as a classic R data.frame and as a Polars DataFrame so that we can easily compare dplyr and tidypolars functions.
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)
library(tidyr, warn.conflicts = FALSE)
who_df <- tidyr::who
who_pl <- as_polars_df(tidyr::who)
tidypolars provides methods for dplyr and tidyr S3 generics. In simpler terms, you can use the same functions on a Polars DataFrame or LazyFrame as in a classic tidyverse workflow, and it should just work (if it doesn’t, please open an issue). Note that you still need to load dplyr and tidyr in your code.
Here’s an example of some dplyr and tidyr code on the classic R data.frame:
who_df |>
  filter(year > 1990) |>
  drop_na(newrel_f3544) |>
  select(iso3, year, matches("^newrel(.*)_f")) |>
  arrange(iso3, year) |>
  rename_with(.fn = toupper) |>
  head()
#> # A tibble: 6 × 9
#> ISO3 YEAR NEWREL_F014 NEWREL_F1524 NEWREL_F2534 NEWREL_F3544 NEWREL_F4554
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AGO 2013 626 2644 2480 1671 991
#> 2 AIA 2013 0 0 0 0 0
#> 3 ALB 2013 5 28 34 13 18
#> 4 AND 2013 0 0 0 1 0
#> 5 ARE 2013 5 4 9 3 3
#> 6 ARG 2013 431 927 808 537 395
#> # ℹ 2 more variables: NEWREL_F5564 <dbl>, NEWREL_F65 <dbl>
We can simply use our Polars dataset instead:
who_pl |>
  filter(year > 1990) |>
  drop_na(newrel_f3544) |>
  select(iso3, year, matches("^newrel(.*)_f")) |>
  arrange(iso3, year) |>
  rename_with(.fn = toupper) |>
  head()
#> shape: (6, 9)
#> ┌──────┬────────┬─────────────┬────────────┬───┬────────────┬────────────┬────────────┬────────────┐
#> │ ISO3 ┆ YEAR ┆ NEWREL_F014 ┆ NEWREL_F15 ┆ … ┆ NEWREL_F35 ┆ NEWREL_F45 ┆ NEWREL_F55 ┆ NEWREL_F65 │
#> │ --- ┆ --- ┆ --- ┆ 24 ┆ ┆ 44 ┆ 54 ┆ 64 ┆ --- │
#> │ str ┆ f64 ┆ f64 ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ f64 │
#> │ ┆ ┆ ┆ f64 ┆ ┆ f64 ┆ f64 ┆ f64 ┆ │
#> ╞══════╪════════╪═════════════╪════════════╪═══╪════════════╪════════════╪════════════╪════════════╡
#> │ AGO ┆ 2013.0 ┆ 626.0 ┆ 2644.0 ┆ … ┆ 1671.0 ┆ 991.0 ┆ 481.0 ┆ 314.0 │
#> │ AIA ┆ 2013.0 ┆ 0.0 ┆ 0.0 ┆ … ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 │
#> │ ALB ┆ 2013.0 ┆ 5.0 ┆ 28.0 ┆ … ┆ 13.0 ┆ 18.0 ┆ 14.0 ┆ 34.0 │
#> │ AND ┆ 2013.0 ┆ 0.0 ┆ 0.0 ┆ … ┆ 1.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 │
#> │ ARE ┆ 2013.0 ┆ 5.0 ┆ 4.0 ┆ … ┆ 3.0 ┆ 3.0 ┆ 1.0 ┆ 6.0 │
#> │ ARG ┆ 2013.0 ┆ 431.0 ┆ 927.0 ┆ … ┆ 537.0 ┆ 395.0 ┆ 307.0 ┆ 374.0 │
#> └──────┴────────┴─────────────┴────────────┴───┴────────────┴────────────┴────────────┴────────────┘
If you use the Polars lazy API, you need to call compute() at the end of the chained expression to evaluate the query:
who_pl_lazy <- as_polars_lf(tidyr::who)
who_pl_lazy |>
  filter(year > 1990) |>
  drop_na(newrel_f3544) |>
  select(iso3, year, matches("^newrel(.*)_f")) |>
  arrange(iso3, year) |>
  rename_with(.fn = toupper) |>
  compute() |>
  head()
#> shape: (6, 9)
#> ┌──────┬────────┬─────────────┬────────────┬───┬────────────┬────────────┬────────────┬────────────┐
#> │ ISO3 ┆ YEAR ┆ NEWREL_F014 ┆ NEWREL_F15 ┆ … ┆ NEWREL_F35 ┆ NEWREL_F45 ┆ NEWREL_F55 ┆ NEWREL_F65 │
#> │ --- ┆ --- ┆ --- ┆ 24 ┆ ┆ 44 ┆ 54 ┆ 64 ┆ --- │
#> │ str ┆ f64 ┆ f64 ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ f64 │
#> │ ┆ ┆ ┆ f64 ┆ ┆ f64 ┆ f64 ┆ f64 ┆ │
#> ╞══════╪════════╪═════════════╪════════════╪═══╪════════════╪════════════╪════════════╪════════════╡
#> │ AGO ┆ 2013.0 ┆ 626.0 ┆ 2644.0 ┆ … ┆ 1671.0 ┆ 991.0 ┆ 481.0 ┆ 314.0 │
#> │ AIA ┆ 2013.0 ┆ 0.0 ┆ 0.0 ┆ … ┆ 0.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 │
#> │ ALB ┆ 2013.0 ┆ 5.0 ┆ 28.0 ┆ … ┆ 13.0 ┆ 18.0 ┆ 14.0 ┆ 34.0 │
#> │ AND ┆ 2013.0 ┆ 0.0 ┆ 0.0 ┆ … ┆ 1.0 ┆ 0.0 ┆ 0.0 ┆ 0.0 │
#> │ ARE ┆ 2013.0 ┆ 5.0 ┆ 4.0 ┆ … ┆ 3.0 ┆ 3.0 ┆ 1.0 ┆ 6.0 │
#> │ ARG ┆ 2013.0 ┆ 431.0 ┆ 927.0 ┆ … ┆ 537.0 ┆ 395.0 ┆ 307.0 ┆ 374.0 │
#> └──────┴────────┴─────────────┴────────────┴───┴────────────┴────────────┴────────────┴────────────┘
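To make the difference between building and evaluating a lazy query more concrete, here is a small sketch on the built-in mtcars dataset (an assumption chosen for brevity; the names lazy_query and result are hypothetical). It stores the unevaluated query, evaluates it with compute(), and then converts the result to a regular R data.frame with as.data.frame().

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

# Build the query on a LazyFrame. Nothing is computed at this point.
lazy_query <- as_polars_lf(mtcars) |>
  filter(cyl == 4) |>
  select(mpg, cyl)

# Evaluate the query: the result is a Polars DataFrame.
result <- compute(lazy_query)

# Bring the result back into R as a classic data.frame if needed.
result_r <- as.data.frame(result)
nrow(result_r)
```

Keeping the query in a variable before calling compute() makes it easy to add further steps later without triggering any computation in between.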
tidypolars also supports many functions from base, lubridate or stringr. When these are used inside filter(), mutate() or summarize(), tidypolars will automatically convert them to use the Polars engine under the hood. Take a look at the vignette “R and Polars expressions” for more information.
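As a rough illustration of this translation, the sketch below uses two base functions and one stringr function inside mutate() on a small, made-up Polars DataFrame. The people data and its column names are hypothetical, and which functions are translated depends on the installed tidypolars version, so treat the specific functions here as an assumption to check against the vignette mentioned above.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)
library(stringr)

# A tiny, made-up dataset used only for this illustration.
people <- as_polars_df(data.frame(
  name = c("ada", "grace", "linus"),
  born = c(1815, 1906, 1969)
))

res <- people |>
  mutate(
    name_upper = toupper(name),      # base function
    n_chars = nchar(name),           # base function
    has_a = str_detect(name, "a")    # stringr function
  ) |>
  filter(born > 1900)

res
```

The code reads like ordinary dplyr, but the expressions inside mutate() and filter() are evaluated by Polars rather than by R.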