Read a data file into a tibble and log data access on GitHub
A wrapper around a user-specified function for reading data files. When the file is read, its MD5 hash is compared against the history of previously accessed data files to determine whether this is the first time the data have been accessed. If so, the event is logged on GitHub automatically (after prompting the user). This is useful if you want your log to show that you accessed parts of your data in a particular order (e.g., you first accessed your independent variables to establish an analysis plan and only then accessed your dependent variables).
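The first-access check described above can be pictured with a minimal base R sketch. The helper `is_first_access()` and the `history` vector are hypothetical names introduced only for illustration; they are not part of the documented API, and the actual function may store and compare hashes differently. Only `tools::md5sum()` from base R is used.

```r
# Hypothetical helper (not part of the documented API): returns TRUE when the
# file's MD5 hash does not appear among previously recorded hashes.
is_first_access <- function(file, known_hashes) {
  # tools::md5sum() returns a named character vector of 32-character MD5 hashes
  hash <- unname(tools::md5sum(file))
  !(hash %in% known_hashes)
}

# Write a small example file to a temporary location
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(10, 20, 30)), path, row.names = FALSE)

# Assumed stand-in for the access history: a plain character vector of hashes
history <- character(0)
first <- is_first_access(path, history)

# Record the hash, as a log entry might, then check the same file again
history <- c(history, unname(tools::md5sum(path)))
second <- is_first_access(path, history)
```

Here `first` is `TRUE` because the history is empty, and `second` is `FALSE` because the unchanged file produces the same hash. Any change to the file's contents would yield a different hash and count as new access in this sketch.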
Usage
read_data(
  file,
  read_fun,
  col_select = NULL,
  row_filter = NULL,
  row_shuffle = NULL,
  long_format = FALSE,
  seed = 3985843,
  ...
)
Arguments
- file
Either a path to a file, a connection, or literal data (either a single string or a raw vector).
- read_fun
The name of a function to read data. For 'readr' functions, you only have to specify the function name (e.g., `read_csv()`). If you use a function from another package, name the package explicitly (e.g., `haven::read_spss()`).
- col_select
Columns to include in the results. You can use the same mini-language as `dplyr::select()` to refer to the columns by name. Use `c()` to combine more than one selection expression. Although this usage is less common, `col_select` also accepts a numeric column index. See `?tidyselect::language` for full details on the selection language.
- row_filter
Optional condition specifying which rows to include in the results. Uses `dplyr::filter()`.
- row_shuffle
Optional variables to randomly shuffle.
- long_format
Logical indicating whether the data are in long format (only relevant when shuffling variables using `row_shuffle`).
- seed
Integer used as the random seed so that shuffling is reproducible.
- ...
Additional arguments for the read function.
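
To illustrate why `seed` matters when variables are shuffled via `row_shuffle`, here is a minimal base R sketch of seeded shuffling. The helper `shuffle_columns()` is hypothetical and is not the package's implementation; it ignores `long_format` entirely and only shows that fixing the seed makes the permutation repeatable.

```r
# Hypothetical helper (not the package's implementation): permute the values
# of the named columns, using a fixed seed so the result is reproducible.
shuffle_columns <- function(data, cols, seed = 3985843) {
  set.seed(seed)  # note: this resets the global random number generator state
  for (col in cols) {
    # sample() on a vector of length > 1 returns a random permutation of it
    data[[col]] <- sample(data[[col]])
  }
  data
}

df <- data.frame(
  id = 1:6,
  iv = c(2.1, 3.4, 1.8, 4.0, 2.9, 3.3),
  dv = c(10, 12, 9, 15, 11, 13)
)

# Shuffle only the dependent variable, twice, with the same default seed
shuffled_a <- shuffle_columns(df, "dv")
shuffled_b <- shuffle_columns(df, "dv")
```

Both calls return the same data frame because the seed is identical, and the unshuffled columns (`id`, `iv`) keep their original order. In this sketch, shuffling a column breaks its link to the other variables while preserving its distribution.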