-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.Rmd
110 lines (82 loc) · 5.05 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
---
output:
md_document:
variant: gfm
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# dccvalidator
![R-CMD-check](https://github.com/Sage-Bionetworks/dccvalidator/workflows/R-CMD-check/badge.svg?branch=master)
[![CRAN status](https://www.r-pkg.org/badges/version/dccvalidator)](https://CRAN.R-project.org/package=dccvalidator)
[![lifecycle](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://www.tidyverse.org/lifecycle/#maturing)
dccvalidator is a package and Shiny app to perform data validation and QA/QC.
It's used in the [AMP-AD](https://ampadportal.org/) and
[PsychENCODE](http://www.psychencode.org) consortia to validate data prior to
data releases.
## Installation
You can install dccvalidator from CRAN:
```{r installation-cran, eval = FALSE}
install.packages("dccvalidator")
```
To install the development version from GitHub, run:
```{r installation-dev, eval = FALSE}
devtools::install_github("Sage-Bionetworks/dccvalidator")
```
Many functions in dccvalidator use reticulate and the
[Synapse Python client](https://pypi.org/project/synapseclient/). See the
[reticulate documentation](https://rstudio.github.io/reticulate/#python-version)
for information on how to set R to use a specific version of Python if you don't
want to use the default Python installation on your machine. Whichever Python
installation you choose should have synapseclient installed.
Because dccvalidator uses reticulate, it is not compatible with the
[synapser](https://r-docs.synapse.org/) package.
## Dockerize the App
### Authentication
The dccmonitor can be authorized to log in to Synapse using Synapse Authentication (OAuth) client. Please view instructions [here](https://help.synapse.org/docs/Using-Synapse-as-an-OAuth-Server.2048327904.html#UsingSynapseasanOAuthServer-RegisteringandLinkinganOAuth2.0Client) to learn how to request a client. Our OAuth clients were created using Synapse service accounts in order to enable multiple Sage employees to maintain the applications. In the Shared-SysBio LastPass folder, credentials for each client are recorded. In the notes section of the credentials (click on the entry > Edit to see notes), the service account used to create the client is noted.
### Build a docker image using Dockerfile
```
docker build -t dccvalidator_pec -f Dockerfile .
```
### Pull docker image from GitHub Container Registry
```
docker pull ghcr.io/sage-bionetworks/dccvalidator_pec:v1.1.0
```
### Create a container from the docker image
```
docker run --rm -it -p 8100:3838 -e APP_REDIRECT_URL=<APP_REDIRECT_URL> -e R_CONFIG_ACTIVE=pec -e client_id=<Oauth client id> -e client_name=<Oauth client name> -e client_secret=<Oauth client secret> --name <container name> <dccvalidator_pec or ghcr.io/sage-bionetworks/dccvalidator_pec:v1.1.0>
```
Once the container is created, you can head to the APP_REDIRECT_URL you specified to enter the app.
## Check data
dccvalidator provides functions for checking the following common data quality
issues:
- Annotation keys and values conform to a controlled vocabulary
- Column names match those of an associated metadata template
- Certain columns are not empty
- Certain columns are complete
- Identifiers match between two metadata files (e.g. all individuals in one file
are also present in another)
- Check that identifiers are unique within a file
## Data submission validation
This package contains a Shiny app to validate manifests and metadata for AMP-AD
studies. It uses the dccvalidator package to check for common data quality
issues and gives realtime feedback to the data contributor on errors that need
to be fixed. The reporting UI is heavily inspired by the
[MetaDIG project's metadata quality reports](https://knb.ecoinformatics.org/quality/s=knb.suite.1/doi%3A10.5063%2FF12V2D1V).
The application also allows users to submit documentation of their study, a
description of the methods used, etc.
See the [customizing dccvalidator](https://sage-bionetworks.github.io/dccvalidator/articles/customizing-dccvalidator.html)
vignette for information on how to spin up a customized version of the
application
## Requesting New Features or Bug Fixes
For new feature requests or bug fixes, please create an issue to let the maintainers know. If you’ve found a bug, create an associated issue and illustrate the bug with a minimal
[reprex](https://www.tidyverse.org/help/#reprex). If there is agreement that the problem exists or that a new feature is desired, then the issue will be triaged for future development based on priority.
Please note that the dccvalidator project is released with a [Contributor Code of Conduct](https://sage-bionetworks.github.io/dccvalidator/CODE_OF_CONDUCT).
By contributing to this project via issue creation, comment, or pull request, you agree to abide by its terms.
## Contributing
See the [Contributing Guide](https://github.com/Sage-Bionetworks/dccvalidator/blob/master/.github/CONTRIBUTING.md) to see how you can get involved in the development of this application.