---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```
# tidbits
The goal of tidbits is to package up some utility functions that have proven
useful in multiple data analysis projects and in teaching, so they can be
properly documented and more easily deployed. It includes an `autoread()`
function, which wraps readers for a wide variety of data formats so the same
script can run on different files without editing the file-loading code.
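
To illustrate the general idea of a format-agnostic reader, here is a minimal base R sketch that chooses a reader from the file extension. The helper name `read_any()` and the set of supported extensions are assumptions made for this illustration; this is not how `autoread()` is implemented, and `autoread()` may detect formats differently.

```r
# Hypothetical helper for illustration only; this is NOT tidbits::autoread().
# It picks a base R reader based on the file extension.
read_any <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = utils::read.csv(path, stringsAsFactors = FALSE),
    tsv = ,  # falls through to the tab-delimited reader below
    txt = utils::read.delim(path, stringsAsFactors = FALSE),
    rds = readRDS(path),
    stop("No reader registered for extension: ", ext)
  )
}

# Write the same small table in two formats, then load both with one call.
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
utils::write.csv(head(mtcars), csv_path, row.names = FALSE)
saveRDS(head(mtcars), rds_path)

from_csv <- read_any(csv_path)
from_rds <- read_any(rds_path)
```

The calling script only ever refers to `read_any(path)`, so pointing it at a file in a different format does not require changing the loading code.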
## Installation
You can install tidbits from GitHub with:
```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("bokov/tidbits")
```
## Example
This basic example reads a remote SAS data file with `autoread()` and builds a data dictionary for it with `tblinfo()`:
```{r example}
library(tidbits)
# Read data from the NAACCR website
dat00 <- autoread('https://www.naaccr.org/wp-content/uploads/2017/02/naaccr_cina_2009_2013_stage.sas7bdat')
# Build an automatic data dictionary
dct0 <- tblinfo(dat00)
```
Now that a data.frame-compatible object named `dct0` exists in your environment, you can use `v()` to pull various
groups of column names out of it for the table on which it was based (`dat00`).
```{r varnames}
# To see which column groupings exist, call v() without any arguments
v()
# To get the names of just the numeric columns
v(c_numeric)
# To get the names of uninformative columns (i.e. their value never changes)
v(c_uninformative)
# Complex columns aren't literally complex numbers, but rather factors that have a huge number of levels
v(c_complex)
# Ordinal columns are numeric but have few distinct values, so it might make sense to discretize them
v(c_ordinal)
# 'c_factor' columns are non-numeric ones that might be good choices for converting to factors
v(c_factor)
# The 'c_tm' group contains columns that have only one distinct non-missing value,
# the 'c_tf' group contains columns that have only two distinct non-missing values,
# and the 'c_empty' group contains columns that are missing all values.
# None of these groups are represented in the NAACCR dataset.
```
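
The grouping criteria described in the comments above can be sketched in base R on a toy table. This is only an illustration of the kind of rule involved: the data frame `toy`, the cutoff of 5 distinct values, and the group names in `groups` are all assumptions made here, not the criteria or defaults that `tblinfo()` and `v()` actually use.

```r
# Toy data; every name and threshold here is an illustrative assumption,
# not a tidbits default.
toy <- data.frame(
  id    = 1:20,
  score = seq(0.5, 10, by = 0.5),
  site  = "A",                      # never changes
  grade = rep(1:4, 5),              # numeric, few distinct values
  sex   = rep(c("F", "M"), 10),     # non-numeric, few distinct values
  stringsAsFactors = FALSE
)

# Per-column summaries that the groupings are based on
n_values <- vapply(toy, function(x) length(unique(x)), integer(1))
is_num   <- vapply(toy, is.numeric, logical(1))
max_levels <- 5L  # assumed cutoff for "few distinct values"

groups <- list(
  numeric           = names(toy)[is_num],
  uninformative     = names(toy)[n_values == 1L],
  ordinal           = names(toy)[is_num & n_values > 1L & n_values <= max_levels],
  factor_candidates = names(toy)[!is_num & n_values > 1L & n_values <= max_levels]
)

groups
```

A column can belong to more than one group: here `grade` is both numeric and ordinal.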
[Travis CI build status](https://travis-ci.org/bokov/tidbits)
[Codecov coverage status](https://codecov.io/github/bokov/tidbits?branch=integration)
[AppVeyor build status](https://ci.appveyor.com/project/bokov/tidbits)