Filtering joins filter rows from x based on the presence or absence of
matches in y:
- semi_join() returns all rows from x with a match in y.
- anti_join() returns all rows from x without a match in y.
Usage
# S3 method for class 'polars_data_frame'
semi_join(x, y, by = NULL, ..., na_matches = "na")
# S3 method for class 'polars_data_frame'
anti_join(x, y, by = NULL, ..., na_matches = "na")
# S3 method for class 'polars_lazy_frame'
semi_join(x, y, by = NULL, ..., na_matches = "na")
# S3 method for class 'polars_lazy_frame'
anti_join(x, y, by = NULL, ..., na_matches = "na")
Arguments
- x, y
Two Polars Data/LazyFrames
- by
Variables to join by. If NULL (default), *_join() will perform a natural join, using all variables in common across x and y. A message lists the variables so that you can check they're correct; suppress the message by supplying by explicitly.
by can take a character vector, like c("x", "y") if x and y are in both datasets. To join on variables that don't have the same name, use equalities in the character vector, like c("x1" = "x2", "y"). If you use a character vector, the join can only be done using strict equality.
by can also be a specification created by dplyr::join_by(). Contrary to the input as character vector shown above, join_by() uses unquoted column names, e.g. join_by(x1 == x2, y).
Finally, inner_join() also supports inequality joins, e.g. join_by(x1 >= x2), and the helpers between(), overlaps(), and within(). See the documentation of dplyr::join_by() for more information. Other join types will likely support inequality joins in the future.
- ...
Dots which should be empty.
- na_matches
Should two NA values match? "na", the default, treats two NA values as equal. "never" treats two NA values as different and will never match them together or to any other values.
Note that when joining Polars Data/LazyFrames, NaN values are always considered equal, no matter the value of na_matches. This differs from the original dplyr implementation.
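As a rough illustration of na_matches, the sketch below joins two small frames that both contain a missing key. The frame names (left, right) and column names (key, val) are made up for this example, and the comments restate what the argument description above implies rather than captured output.

```r
library(polars)
library(tidypolars)

# Hypothetical data: both frames contain a missing key.
left <- pl$DataFrame(key = c(1, NA, 3), val = c("a", "b", "c"))
right <- pl$DataFrame(key = c(1, NA))

# Default (na_matches = "na"): the NA key in `left` is treated as equal to
# the NA key in `right`, so that row should be kept along with key 1.
semi_join(left, right, by = "key")

# na_matches = "never": NA keys never match, so only key 1 should be kept.
semi_join(left, right, by = "key", na_matches = "never")
```

The same argument applies to anti_join(), where "never" would instead leave the NA row among the unmatched rows.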
Unknown arguments
Arguments that are supported by the original implementation in the tidyverse
but are not listed above will throw a warning by default if they are
specified. To change this behavior to error instead, use
options(tidypolars_unknown_args = "error").
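The sketch below shows one way to toggle this option around a single call. It assumes that copy, an argument of dplyr::semi_join() that is not listed above, is handled as an unknown argument; the frames a and b are made up for this example.

```r
library(polars)
library(tidypolars)

# Hypothetical data for illustration.
a <- pl$DataFrame(x = c(1, 2, 3))
b <- pl$DataFrame(x = c(1, 2))

# Switch unknown arguments from a warning to an error, keeping the old value.
old <- options(tidypolars_unknown_args = "error")

# `copy` is assumed here to count as an unknown argument, so this call is
# expected to fail; try() keeps the script running.
res <- try(semi_join(a, b, by = "x", copy = TRUE), silent = TRUE)
inherits(res, "try-error")

# Restore the previous setting.
options(old)
```

Restoring the previous value with options(old) keeps the stricter behavior from leaking into the rest of the session.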
Examples
test <- polars::pl$DataFrame(
x = c(1, 2, 3),
y = c(1, 2, 3),
z = c(1, 2, 3)
)
test2 <- polars::pl$DataFrame(
x = c(1, 2, 4),
y = c(1, 2, 4),
z2 = c(1, 2, 4)
)
test
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ x   ┆ y   ┆ z   │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ f64 ┆ f64 │
#> ╞═════╪═════╪═════╡
#> │ 1.0 ┆ 1.0 ┆ 1.0 │
#> │ 2.0 ┆ 2.0 ┆ 2.0 │
#> │ 3.0 ┆ 3.0 ┆ 3.0 │
#> └─────┴─────┴─────┘
test2
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ x   ┆ y   ┆ z2  │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ f64 ┆ f64 │
#> ╞═════╪═════╪═════╡
#> │ 1.0 ┆ 1.0 ┆ 1.0 │
#> │ 2.0 ┆ 2.0 ┆ 2.0 │
#> │ 4.0 ┆ 4.0 ┆ 4.0 │
#> └─────┴─────┴─────┘
# only keep the rows of `test` that have matching keys in `test2`
semi_join(test, test2, by = c("x", "y"))
#> shape: (2, 3)
#> ┌─────┬─────┬─────┐
#> │ x   ┆ y   ┆ z   │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ f64 ┆ f64 │
#> ╞═════╪═════╪═════╡
#> │ 1.0 ┆ 1.0 ┆ 1.0 │
#> │ 2.0 ┆ 2.0 ┆ 2.0 │
#> └─────┴─────┴─────┘
# only keep the rows of `test` that don't have matching keys in `test2`
anti_join(test, test2, by = c("x", "y"))
#> shape: (1, 3)
#> ┌─────┬─────┬─────┐
#> │ x   ┆ y   ┆ z   │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ f64 ┆ f64 │
#> ╞═════╪═════╪═════╡
#> │ 3.0 ┆ 3.0 ┆ 3.0 │
#> └─────┴─────┴─────┘
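When the key columns have different names in the two frames, by can be given either as a named character vector or as a dplyr::join_by() specification, as described in the Arguments section. The following is a minimal sketch; the frames orders and customers and their columns are invented for illustration.

```r
library(polars)
library(tidypolars)

# Hypothetical data: the key is `customer_id` in one frame and `id` in the other.
orders <- pl$DataFrame(customer_id = c(1, 2, 3), amount = c(10, 20, 30))
customers <- pl$DataFrame(id = c(1, 3), name = c("Ann", "Bob"))

# Named character vector: left-hand name comes from `x`, right-hand from `y`.
semi_join(orders, customers, by = c("customer_id" = "id"))

# Equivalent specification with unquoted column names.
semi_join(orders, customers, by = dplyr::join_by(customer_id == id))

# Orders whose customer does not appear in `customers`.
anti_join(orders, customers, by = c("customer_id" = "id"))
```

In both forms only the columns of orders are returned, since filtering joins never add columns from y.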
