#' ---
#' title: "Overview of Ripcord"
#' author: "Alex F. Bokov"
#' css: "production.css"
#' output:
#'   html_document:
#'     toc: true
#'     toc_float: true
#' ---
#'
#+ setup, echo=FALSE, message=FALSE, warning=FALSE,results='hide'
.projpackages <- c('');
.deps <- c( 'prep_deps.R' );
# do not edit the next two lines
.junk<-capture.output(source('./scripts/global.R',chdir=TRUE,echo=FALSE));
.currentscript <- current_scriptname('overview.R');
#+ echo=FALSE, results='hide'
#' ## Introduction
#'
#' Welcome to Ripcord, a framework for rapidly setting up data science projects
#' in an automated and reproducible manner. The main data analysis example file
#' is [`example_analysis.html`](example_analysis.html), which was generated by
#' [`example_analysis.R`](example_analysis.R). This and the other scripts here
#' are ordinary R scripts: anything you can do to an R script you can do to
#' these. But our R scripts have the following additional features:
#'
#' 1. They delegate certain tasks to scripts in the [`scripts`](scripts) directory,
#' which include loading/interpreting data files and keeping track of
#' which libraries are needed. Read more about them below in
#' 'The `scripts` directory' section.
#' 2. These scripts have a standardized way of specifying what other scripts
#' they depend on, and a standardized way of saving their data so that other
#' scripts can depend on them in turn. Read more about them below.
#' 3. These scripts use two shared configuration files: [`config.R`](config.R) for
#' public project-level settings and [`local.config.R`](local.config.R) for
#' local project-level settings. These are the only places where you need to
#' specify the location of your data files-- all the other scripts here will
#' automatically find them.
#'
#' ## The project directory
#'
#' In addition to [`example_analysis.R`](example_analysis.R), which attempts to
#' (randomly) select valid variables from your data and run an example analysis
#' on them, there is also [`minimum_scriport.R`](minimum_scriport.R), which is
#' blank except for a few lines of code at the beginning and end that give it
#' access to Ripcord's features, plus a few explanatory comments. You can modify
#' or copy either of these files to use as a starting point for whatever
#' analysis you wish to do.
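#'
#' As a rough sketch, the lines that give a scriport access to Ripcord look
#' something like the following. This is modeled on the header and footer of
#' `overview.R` itself; the file name `my_analysis.R` is hypothetical, the
#' comments reflect my reading of the variable names, and the actual contents
#' of `minimum_scriport.R` may differ.
#'
#' ```r
#' # --- beginning of the scriport ---
#' .projpackages <- c('');       # presumably: packages this scriport needs
#' .deps <- c('prep_deps.R');    # presumably: scriports this one depends on
#' .junk <- capture.output(source('./scripts/global.R', chdir=TRUE, echo=FALSE));
#' .currentscript <- current_scriptname('my_analysis.R');  # hypothetical name
#'
#' # ... your own analysis goes here ...
#'
#' # --- end of the scriport ---
#' save(file=paste0(.currentscript,'.rdata'), list=setdiff(ls(), .origfiles));
#' ```
#'
#' This fragment only works inside a Ripcord project, because
#' `current_scriptname()` and `.origfiles` are provided by
#' `scripts/global.R`.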
#'
#' This report was generated by [`overview.R`](overview.R).
#'
#' There is also [`prep_deps.R`](prep_deps.R) whose sole purpose is to trigger
#' all the other scriports to run, generate formatted HTML reports, and save out
#' their results for use by subsequent scriports.
#'
#' [`local.config.R`](local.config.R) has the paths to the file or files your
#' project uses. It was generated when you ran the Ripcord script, which asked
#' you for the locations of your data files and for short names to use for them
#' within your scriports. Those files are only read, never altered, by our
#' scriports, and I recommend that you stick with this policy of clean
#' separation between data
#' and code. If you need to add, remove, or alter any of the data files that
#' will be used in your project, you can edit [`local.config.R`](local.config.R)
#' manually (for new users of R, remember to follow the example of the files
#' already listed there-- put the paths in quotes, give each a distinct name,
#' and separate them from each other using commas). Do not check
#' [`local.config.R`](local.config.R) into git or otherwise include it when
#' disseminating copies of your project-- it will only work on your local
#' computer. However, there is also a [`config.R`](config.R) file, which uses
#' relative paths and can work on other computers. It points to simulated
#' versions of your files located in the [`data`](data) directory. **After you
#' review those files and confirm for yourself that there really is no PHI or
#' other confidential information there** you can choose to share those files
#' together with your scripts so people can run them and verify that they work
#' (though the statistical analysis will be based on random values and
#' completely different from your real results). **_Never check your actual
#' data into git_ unless you have written, unmistakeable permission to do so.**
#' Even if the data is fully de-identified, non-proprietary, and non-sensitive,
#' git is still not the ideal channel to share large datasets. Consider using
#' [Zenodo](https://about.zenodo.org/) for that instead.
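#'
#' To illustrate the quoting, naming, and comma rules for manual edits to
#' `local.config.R`, here is a minimal sketch. The variable name `inputdata`,
#' the short names, and the paths are all hypothetical; mirror whatever is
#' already in your generated file rather than copying this verbatim.
#'
#' ```r
#' # hypothetical example only; follow the entries already in your local.config.R
#' inputdata <- c(
#'   dat01 = '/home/me/projects/mystudy/patients.csv',  # distinct short name, quoted path
#'   dat02 = '/home/me/projects/mystudy/visits.csv'     # entries separated by commas
#' );
#' ```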
#'
#' _more coming soon_
#'
#' ## The `scripts` directory
#'
#' The first of the scriports in this directory is
#' [`scripts/data.R`](scripts/data.R), which reads your data files into R and, if
#' there isn't already a file named [`varmap.csv`](varmap.csv) at the top level
#' of your project directory, creates one. The purpose of this file is to have
#' one place where you control the renaming of all the columns in all your
#' tables (because in the raw data the names are sometimes too long or
#' incompatible with R). In [`varmap.csv`](varmap.csv) `origname` is the
#' original name of each column, `varname` is the new name by which your
#' R scripts can refer to it (if it's all lower-case, not too long, and has no
#' invalid characters it remains the same as `origname`), and `dispname` is for
#' a nicely formatted version with spaces, capitalization, and most punctuation
#' permitted. The `dispname` version of the variable name is for use in
#' plots, tables, and other final output. You can edit
#' [`varmap.csv`](varmap.csv) whenever you need to.
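#'
#' The three columns described above can be used along these lines. This is a
#' base-R sketch with made-up column names and toy data, not a copy of how
#' Ripcord applies the mapping internally, and a real `varmap.csv` may contain
#' more columns than the three shown.
#'
#' ```r
#' # hypothetical varmap with the three columns described above
#' varmap <- data.frame(
#'   origname = c('Patient ID', 'Systolic BP (mmHg)'),
#'   varname  = c('patient_id', 'systolic_bp'),
#'   dispname = c('Patient ID', 'Systolic Blood Pressure'),
#'   stringsAsFactors = FALSE
#' );
#'
#' # toy data carrying the raw column names
#' dat <- data.frame(c(1, 2), c(120, 135));
#' names(dat) <- varmap$origname;
#'
#' # rename each column from its origname to its varname for use in code
#' names(dat) <- varmap$varname[match(names(dat), varmap$origname)];
#'
#' # look up the display name when labeling a plot or table
#' xlabel <- varmap$dispname[match('systolic_bp', varmap$varname)];
#' ```
#'
#' Keeping the lookup in one table means a column can be renamed everywhere by
#' editing a single row.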
#'
#' Next in line is [`scripts/simdata.R`](scripts/simdata.R), which takes the
#' data obtained by [`scripts/data.R`](scripts/data.R) and, in the
#' [`data`](data) directory, creates a simulated version of each of the original
#' files under the same name (unless a file with that name already exists there).
#'
#' Finally, there is [`scripts/dictionary.R`](scripts/dictionary.R) which
#' generates a data dictionary and embeds it in each of the data.frames that
#' [`scripts/data.R`](scripts/data.R) created.
#'
#' It is not recommended that you edit anything inside [`scripts`](scripts)
#' because this may cause problems if you later choose to pull in new updates to
#' those scripts from my git repository. If you need to customize a scriport
#' in the [`scripts`](scripts) directory, copy it one level up, into your
#' project directory. That copy will run instead of the one in
#' [`scripts`](scripts), and it will not interfere with updates.
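#'
#' For example, customizing a scriport from [`scripts`](scripts) might start
#' like this, run from the top level of the project directory. Here
#' `data.R` is just an illustrative choice of file.
#'
#' ```r
#' # copy the scriport one level up; overwrite=FALSE protects an existing copy
#' file.copy('scripts/data.R', 'data.R', overwrite = FALSE);
#' # then edit ./data.R and leave scripts/data.R untouched
#' ```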
#'
#' The remaining files in [`scripts`](scripts) are used internally by Ripcord.
#'
#' _more coming soon_
#'
#+ echo=FALSE, results='hide'
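# Save the objects in the current environment, except those named in
# .origfiles, to '<.currentscript>.rdata' so that other scriports can load them
save(file=paste0(.currentscript,'.rdata'),list=setdiff(ls(),.origfiles));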
c()