README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

# tidbits

The goal of tidbits is to package up some utility functions that have proven 
useful in multiple data analysis projects and teaching, so they can be 
properly documented and more easily deployed. Including `autoread()` function
which wraps readers for a wide variety of data formats so the same script can
run on different files without editing the file-loading code.

## Installation

You can install tidbits from github with:

```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("bokov/tidbits")
```

## Example

This is a basic example which shows you how to solve a common problem:

```{r example}
library(tidbits);
# Read data from the NAACCR website
dat00 <- autoread('https://www.naaccr.org/wp-content/uploads/2017/02/naaccr_cina_2009_2013_stage.sas7bdat');
# Build an automatic data dictionary
dct0 <- tblinfo(dat00)

```
Now that there exists a data.frame compatible object named `dct0` in your environment, you can pull various 
collections of column names out of it for the table on which it was based (`dat00`).
```{r varnames}
# To see which column groupings exist, call it without any arguments
v()
# To get the names of just the numeric columns 
v(c_numeric)
# To get the names of uninformative columns (i.e. their value never changes)
v(c_uninformative)
# Complex columns aren't literally complex numbers, but rather factors that have a huge number of levels
v(c_complex)
# Ordinal columns are ones that are numeric, yet have few distinct values and it might make sense to discretize them
v(c_ordinal)
# 'c_factor' columns are non-numeric ones that might be good choices for converting to factors
v(c_factor)
# the 'c_tm' group are columns which have only one distinct non-missing value, 'c_tf' ones have only two distinct non-missing values, and 'c_empty' ones are missing all values. None of those are represented in the NAACCR dataset.

```

[![Travis-CI Build Status](https://travis-ci.org/bokov/tidbits.svg?branch=integration)](https://travis-ci.org/bokov/tidbits)
[![Coverage Status](https://img.shields.io/codecov/c/github/bokov/tidbits/integration.svg)](https://codecov.io/github/bokov/tidbits?branch=integration)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/bokov/tidbits?branch=integration&svg=true)](https://ci.appveyor.com/project/bokov/tidbits)