Filtering joins filter rows from x based on the presence or absence of matches in y:
- semi_join() returns all rows from x with a match in y.
- anti_join() returns all rows from x without a match in y.
Usage
# S3 method for class 'RPolarsDataFrame'
semi_join(x, y, by = NULL, ..., na_matches = "na")
# S3 method for class 'RPolarsDataFrame'
anti_join(x, y, by = NULL, ..., na_matches = "na")
# S3 method for class 'RPolarsLazyFrame'
semi_join(x, y, by = NULL, ..., na_matches = "na")
# S3 method for class 'RPolarsLazyFrame'
anti_join(x, y, by = NULL, ..., na_matches = "na")
Arguments
- x, y
Two Polars Data/LazyFrames
- by
Variables to join by. If NULL (default), *_join() will perform a natural join, using all variables in common across x and y. A message lists the variables so that you can check they're correct; suppress the message by supplying by explicitly.
by can take a character vector, like c("x", "y") if x and y are in both datasets. To join on variables that don't have the same name, use equalities in the character vector, like c("x1" = "x2", "y"). If you use a character vector, the join can only be done using strict equality.
by can also be a specification created by dplyr::join_by(). Unlike the character vector input shown above, join_by() uses unquoted column names, e.g. join_by(x1 == x2, y).
Finally, inner_join() also supports inequality joins, e.g. join_by(x1 >= x2), and the helpers between(), overlaps(), and within(). See the documentation of dplyr::join_by() for more information. Other join types will likely support inequality joins in the future.
- ...
Dots which should be empty.
- na_matches
Should two NA values match? "na", the default, treats two NA values as equal. "never" treats two NA values as different and will never match them together or to any other values.
Note that when joining Polars Data/LazyFrames, NaN values are always considered equal, regardless of the value of na_matches. This differs from the original dplyr implementation.
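The by and na_matches arguments described above can be illustrated with a small sketch. The frames orders and shipped and their columns are hypothetical and exist only for this illustration; dplyr is loaded explicitly on the assumption that it supplies the semi_join() and anti_join() generics.

```r
library(polars)
library(dplyr, warn.conflicts = FALSE)
library(tidypolars)

# Hypothetical data: the key is called `id` in one frame and `key` in the other
orders <- pl$DataFrame(
  id = c(1, 2, NA),
  amount = c(10, 20, 30)
)
shipped <- pl$DataFrame(
  key = c(1, NA),
  carrier = c("a", "b")
)

# Named character vector: match `id` in `orders` to `key` in `shipped`
semi_res <- semi_join(orders, shipped, by = c("id" = "key"))
semi_res

# The same join written with dplyr::join_by() and unquoted column names
semi_res_jb <- semi_join(orders, shipped, by = join_by(id == key))
semi_res_jb

# With na_matches = "never", the NA key in `orders` is no longer treated
# as matching the NA key in `shipped`
anti_res <- anti_join(orders, shipped, by = c("id" = "key"), na_matches = "never")
anti_res
```

Comparing anti_res with the result of the same call using the default na_matches = "na" is a quick way to check how missing keys are handled in your own data.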
Unknown arguments
Arguments that are supported by the original implementation in the tidyverse
but are not listed above will throw a warning by default if they are
specified. To change this behavior to error instead, use
options(tidypolars_unknown_args = "error").
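A minimal sketch of switching this option on and restoring it afterwards. It assumes that copy, an argument of dplyr::semi_join() that is not listed in the Usage section, is treated as an unknown argument; the exact condition message is not shown because it may differ between versions.

```r
library(polars)
library(dplyr, warn.conflicts = FALSE)
library(tidypolars)

test <- pl$DataFrame(x = c(1, 2, 3))
test2 <- pl$DataFrame(x = c(1, 2, 4))

# Turn unknown arguments into errors, keeping the previous setting
old <- options(tidypolars_unknown_args = "error")
current_setting <- getOption("tidypolars_unknown_args")

# `copy` is assumed here to count as an unknown argument; tryCatch() captures
# the condition message so the script keeps running
res <- tryCatch(
  semi_join(test, test2, by = "x", copy = TRUE),
  error = function(e) conditionMessage(e)
)
res

# Restore the previous setting
options(old)
```

Setting the option through old <- options(...) and restoring it with options(old) keeps the change local to the code that needs strict checking.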
Examples
test <- polars::pl$DataFrame(
  x = c(1, 2, 3),
  y = c(1, 2, 3),
  z = c(1, 2, 3)
)
test2 <- polars::pl$DataFrame(
  x = c(1, 2, 4),
  y = c(1, 2, 4),
  z2 = c(1, 2, 4)
)
test
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ x   ┆ y   ┆ z   │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ f64 ┆ f64 │
#> ╞═════╪═════╪═════╡
#> │ 1.0 ┆ 1.0 ┆ 1.0 │
#> │ 2.0 ┆ 2.0 ┆ 2.0 │
#> │ 3.0 ┆ 3.0 ┆ 3.0 │
#> └─────┴─────┴─────┘
test2
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ x   ┆ y   ┆ z2  │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ f64 ┆ f64 │
#> ╞═════╪═════╪═════╡
#> │ 1.0 ┆ 1.0 ┆ 1.0 │
#> │ 2.0 ┆ 2.0 ┆ 2.0 │
#> │ 4.0 ┆ 4.0 ┆ 4.0 │
#> └─────┴─────┴─────┘
# only keep the rows of `test` that have matching keys in `test2`
semi_join(test, test2, by = c("x", "y"))
#> shape: (2, 3)
#> ┌─────┬─────┬─────┐
#> │ x   ┆ y   ┆ z   │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ f64 ┆ f64 │
#> ╞═════╪═════╪═════╡
#> │ 1.0 ┆ 1.0 ┆ 1.0 │
#> │ 2.0 ┆ 2.0 ┆ 2.0 │
#> └─────┴─────┴─────┘
# only keep the rows of `test` that don't have matching keys in `test2`
anti_join(test, test2, by = c("x", "y"))
#> shape: (1, 3)
#> ┌─────┬─────┬─────┐
#> │ x   ┆ y   ┆ z   │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ f64 ┆ f64 │
#> ╞═════╪═════╪═════╡
#> │ 3.0 ┆ 3.0 ┆ 3.0 │
#> └─────┴─────┴─────┘
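The Usage section also lists methods for LazyFrames. The following sketch reuses the same data with lazy inputs, assuming the polars methods $lazy() and $collect() are available in the installed version; the name query is only illustrative.

```r
library(polars)
library(dplyr, warn.conflicts = FALSE)
library(tidypolars)

test <- pl$DataFrame(x = c(1, 2, 3), y = c(1, 2, 3), z = c(1, 2, 3))
test2 <- pl$DataFrame(x = c(1, 2, 4), y = c(1, 2, 4), z2 = c(1, 2, 4))

# Build the join on LazyFrames; the query is only described at this point
query <- semi_join(test$lazy(), test2$lazy(), by = c("x", "y"))

# Execute the query and get a DataFrame back
lazy_res <- query$collect()
lazy_res
```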