Contributing to tidypolars
Source: .github/CONTRIBUTING.md
This outlines how to propose a change to tidypolars.
Fixing typos
Small typos or grammatical errors in vignettes can be fixed directly in the GitHub interface, since vignettes are automatically rendered by pkgdown.
Fixing typos in the documentation of functions (those in the “Reference” page) requires editing the source in the corresponding .R file and then running devtools::document(). Do not edit an .Rd file in man/.
Filing an issue
The easiest way to propose a change or new feature is to file an issue. If you’ve found a bug, you may also create an associated issue. If possible, try to illustrate your proposal or the bug with a minimal reproducible example.
Pull requests
General information
- Please create a Git branch for each pull request (PR).
- tidypolars uses roxygen2, with Markdown syntax, for documentation.
- If your PR is a user-visible change, you may add a bullet point in NEWS.md describing the changes made. You may optionally add your GitHub username, and links to relevant issue(s)/PR(s).
How to add support for an R function in tidypolars?
If you use a function that is not supported, tidypolars will throw an error. This doesn’t mean that tidypolars cannot support it, simply that it is not implemented yet. tidypolars can technically support hundreds of functions that manipulate numbers, strings, or dates, and those are not limited to the tidyverse.
The first step when you come across an unsupported function is to find the corresponding syntax in polars. For the sake of example, suppose that stringr::str_ends() is not yet supported (it actually is). We first define a simple test data.frame and its DataFrame counterpart in polars:
library(dplyr, warn.conflicts = FALSE)
library(stringr)
library(polars)
test <- tibble(x = c("abc", "a1", "dac"))
test_pl <- pl$DataFrame(x = c("abc", "a1", "dac"))
Then, find the polars syntax that gives the same output as stringr::str_ends(). In this case, polars has a string method $ends_with():
test_pl$with_columns(
  ends_with_c = pl$col("x")$str$ends_with("c")
)
## shape: (3, 2)
## ┌─────┬─────────────┐
## │ x   ┆ ends_with_c │
## │ --- ┆ ---         │
## │ str ┆ bool        │
## ╞═════╪═════════════╡
## │ abc ┆ true        │
## │ a1  ┆ false       │
## │ dac ┆ true        │
## └─────┴─────────────┘
What interests us is the expression inside the $with_columns() call. Now that we know the polars translation, let’s put this in a function that tidypolars will know. Internally, tidypolars will prefix the function name (str_ends) with pl_ and look for it in the translated functions.
Depending on the function we translate, it can be stored in different places. Since this function only works on strings, we can store it in R/funs-string.R. Once we add support for the negate argument, this is what it will look like:
pl_str_ends_stringr <- function(string, pattern, negate = FALSE, ...) {
  check_empty_dots(...)
  out <- string$str$ends_with(pattern)
  if (isTRUE(negate)) {
    out$not()
  } else {
    out
  }
}
Note that the new function must be named pl_<function_name>_<package_name>. This is because multiple packages can export functions with the same name (for example data.table::between() and dplyr::between()). Therefore, we want to ensure that our translation is called only if the function comes from a specific package.
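To make the naming rule concrete, here is a minimal base R sketch of how a package-aware lookup could work. It is only an illustration: `find_translation()` and the two toy `pl_between_*` functions are hypothetical names invented for this example, and this is not how tidypolars actually implements the lookup internally.

```r
# Hypothetical sketch of a package-aware lookup; NOT tidypolars' internals.
# Two toy "translations" for functions that share the name `between`.
pl_between_dplyr <- function(x, left, right) {
  "translation for dplyr::between()"
}
pl_between_data.table <- function(x, lower, upper) {
  "translation for data.table::between()"
}

# Build the name pl_<function_name>_<package_name> and return the matching
# function, or NULL when no translation with that name exists.
find_translation <- function(fun_name, pkg_name) {
  candidate <- paste0("pl_", fun_name, "_", pkg_name)
  here <- environment()
  # inherits = TRUE walks up to the environment where the translations live
  get0(candidate, envir = here, mode = "function", inherits = TRUE)
}

find_translation("between", "dplyr")(1, 0, 2)
find_translation("between", "data.table")(1, 0, 2)
is.null(find_translation("between", "base"))
```

Because the package name is part of the lookup key, the two `between()` translations never shadow each other, and a function from a package without a translation simply yields no match.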
The check_empty_dots(...) call is here to grab all additional arguments that stringr::str_ends() might have and ignore them because they have no counterparts in polars (it also adds a message warning the user that these arguments are not used).
Once the package is reloaded, we can use this function in a tidypolars workflow:
library(tidypolars, warn.conflicts = FALSE)
test_pl |>
  mutate(ends_with_c = str_ends(x, "c"))
## shape: (3, 2)
## ┌─────┬─────────────┐
## │ x   ┆ ends_with_c │
## │ --- ┆ ---         │
## │ str ┆ bool        │
## ╞═════╪═════════════╡
## │ abc ┆ true        │
## │ a1  ┆ false       │
## │ dac ┆ true        │
## └─────┴─────────────┘
Finally, the only thing left to do is to add some tests in the tests/testthat folder.
Testing changes
tidypolars uses testthat to check that functions have the expected output. If changes affect the behavior of some functions (for example when adding an extra argument), add some tests in the related file in tests/testthat.
tidyverse functions that apply to an entire dataframe (e.g., drop_na()) have their own test file (test-drop_na.R in this case).
Functions that apply to one or multiple vectors only (i.e., functions that are used in mutate(), summarize(), etc.) are tested in specific files. If you added a function in R/funs-string.R, then you should add tests in tests/testthat/test-funs_string.R (same for datetime functions, etc.).
No matter the type of function you added or modified, you should only write tests on Polars DataFrames. When the tests run, they are automatically adapted to run on LazyFrames as well (for instance, the file test-drop_na-lazy.R will be automatically generated).
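As an illustration, a test for the str_ends() translation from the example above might look like the sketch below. It assumes that the negate argument is supported and that pull() returns a plain R vector from a Polars DataFrame. The test description and the column names are made up for this example, and the existing test files may follow different conventions or use their own helpers.

```r
library(testthat)
library(dplyr, warn.conflicts = FALSE)
library(stringr)
library(polars)
library(tidypolars, warn.conflicts = FALSE)

test_that("str_ends() works, including the negate argument", {
  # Only a Polars DataFrame is defined here: no LazyFrame version is written.
  test_pl <- pl$DataFrame(x = c("abc", "a1", "dac"))

  out <- test_pl |>
    mutate(
      ends_with_c = str_ends(x, "c"),
      not_ends_with_c = str_ends(x, "c", negate = TRUE)
    )

  # Assumption: pull() extracts the column as a regular R vector.
  expect_equal(pull(out, ends_with_c), c(TRUE, FALSE, TRUE))
  expect_equal(pull(out, not_ends_with_c), c(FALSE, TRUE, FALSE))
})
```

Following the file layout described above, such a test would go in tests/testthat/test-funs_string.R.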