diff --git a/data.Rmd b/data.Rmd
index 8eb6671..44dc412 100644
--- a/data.Rmd
+++ b/data.Rmd
@@ -1,43 +1,43 @@
 # Data Management
-*Can the data be shared and published, and easily re-used in other analyses*?
-
-- Create and maintain a [data management plan](https://dmptool.org/plans)
-- Store data in simple, cross-compatible formats such as CSV files.
-- Microsoft Excel can be a useful tool for data entry and organization, but
-  limit its use to that, and organize your data in a way that can be easily
-  exported.
-- Metadata! Metadata! Document your data.
-- For relational datasets you can create linked data on [Airtable](https://airtable.com/). For more information see \@ref(airtable)
-- For data sets that cross multiple projects, create data-only project folders
-  for the master version. When these data sets are finalized, they can be
-  deposited in public or private data repositories such as
-  [figshare](https://figshare.com/) and [zenodo](https://zenodo.org/). In some
-  cases it makes sense for us to create data-only R packages for easily
-  distributing data internally and externally.
+EcoHealth Alliance is committed to producing and promoting reliable and
+reproducible research. To achieve this, we have to provide data
+(and other research outputs) that non-team members can interpret and use, as well
+as promote best practices for data management among collaborators. Ideally, the
+framework for managing data laid out in this chapter will facilitate the creation
+of high-quality, shareable research outputs. By focusing on [Data Management
+Plans](https://datamanagement.hms.harvard.edu/plan-design/data-management-plans) and the [DMPTool](https://dmptool.org/plans), we can build on well-established
+workflows for producing high-quality research outputs.
-We aim to generally work in a **tidy data** framework. This approach to
-structuring data makes interoperability between tools easier.
 ## Data Management Plan
-*Data Management Plans* , also called *Outputs Management Plans* or *Data Management and Sharing Plans*, are living documents that help structure the creation and management of data throughout the lifecycle of a project. DMPs are flexible and do not force researchers to choose a particular technology set but rather ask probing questions about the mechanics and ethics of data use in research projects. Organizing data management in this way provides a common framework to think about data without requiring specific technologies be used in the research workflow. Furthermore, DMPs use reliable identifiers (URIs) to connect components of the research workflow, making long term data access more reliable.
+*Data Management Plans*, also called *Outputs Management Plans* or *Data Management and Sharing Plans*, are living documents that help structure the creation and management of data throughout the lifecycle of a project. DMPs are flexible and do not force researchers to choose a particular technology set but rather ask probing questions about the mechanics and ethics of data use in research projects. Organizing data management in this way provides a common framework to think about data without requiring specific technologies be used in the research workflow. Furthermore, DMPs use stable identifiers (URIs) to connect components of the research workflow, making long-term data access more reliable.
+
+The majority of funders require a DMP; however, each funder has specific expectations
+about what, when, and how research outputs should be shared. It is important that you
+and your collaborators understand those expectations before submitting a DMP. It's
+equally important that all collaborators understand and agree to the obligations
+created when submitting a DMP. Early communication between collaborators
+is key to navigating differing expectations about data sharing from researchers
+in different contexts.
 ![](assets/data_mgmg_plan.png)
 *Data management plan as hub in knowledge management system*
+**Important note on budgeting**:
 Data management activities, but not necessarily infrastructure, are an allowable cost for most funding agencies (NIH, NSF, NASA). Gray areas include paying for hosting services and other infrastructure-like components of the DMP.
 **Benefits of using a DMP**:
-1. They are a funder requirement and you want funding
-   - NIH, NSF, NASA, Wellcome Trust, etc. require a DMP be submitted with a proposal.
-2. They provide a scaffold for you to conceptualize data management for your project
+1. They provide a scaffold for you to conceptualize data management for your project
    - What data do you need to answer your research question, where will it come from, what resources are needed throughout the project lifecycle, what are the mechanics of managing the data?
-3. They make it easier collaborate
+2. They make it easier to collaborate
    - Defining responsibilities, Committing to using data standards, Documenting how the project works
-4. They make it easier for your data to be reused
+3. They make it easier for your data to be reused
    - You get more citations, your effort contributes to knowledge creation in unexpected ways, your results become more reproducible
+4. They are a funder requirement and you want funding
+   - NIH, NSF, NASA, Wellcome Trust, etc. require a DMP be submitted with a proposal.
 **Components of a DMP**:
@@ -53,28 +53,24 @@ Data management activities, but not necessarily infrastructure, are an allowable
 1. Its never too late to write a DMP
 2. Data Management Plans are living documents that change with a project
 3. DMPs are created collaboratively and stored in DMPTool.org
-4. We ensure our DMPs meet EHA best practices for FAIR data and Reproducible Science
+4. We ensure our DMPs meet EHA best practices for [FAIR data](https://www.go-fair.org/fair-principles/) and Reproducible Science
+5. Collaborators, especially those from outside institutions, are full participants in the DMP process
-### DMP Process Overview
-
-0. [Create an account](https://dmptool.org/quick_start_guide) on DMPTool.org associated with EcoHealth Alliance
-1. Identify Funder DMP requirements and `r params$data_librarian_appt` with the `r params$data_librarian`
-2. Create a DMP using appropriate template given your funder. If no template is available or the funder has no requirements, use the EHA Minimal Data Management Plan. Add collaborators and complete as much of the plan as you can
-3. Request feedback from the `r params$data_librarian`
-4. Work with the `r params$data_librarian` to incorporate feedback
-5. Export DMP for inclusion in grant
 ### Expectations by project phase
 **Proposal/Pre-Award Phase**
-- Look for and use Funder Requirements for DMPs. If no template exists, use this one or create one based on funder requirements.
+- Look for funder requirements and use funder-specific templates for DMPs. If no template exists, use the EHA Minimal Data Management Plan or create one based on funder requirements.
 - Think about how you might make data Findable, Accessible, Interoperable and Reproducible (FAIR)
+- Establish expectations for data sharing and outputs with collaborators and PIs. These discussions should begin early, alongside discussions of project responsibilities and budget.
 - Consider what tools you will use throughout the lifecycle of your data
 - Consider how data collection, analysis and management tasks will be divided among collaborators
 - Outline the ethical considerations for properly managing data in your project
+- Ensure collaborators and PIs understand the commitments they are making via the DMP. Request and incorporate feedback from collaborators.
 - `r params$data_librarian_appt` with the Data Librarian, create a timeline for proposal submission, and have a notion of tools and standards to use
+
 **Post-Award/Early Phase**
 - Review and update proposal DMP
@@ -112,9 +108,46 @@ Data management activities, but not necessarily infrastructure, are an allowable
 - Use EHA institutional tags where possible e.g. [Zenodo Community](https://zenodo.org/communities/ecohealthalliance/?page=1&size=20)
 - `r params$data_librarian_appt` with the `r params$data_librarian`
+### Using DMPTool to prepare your proposal Data Management Plan
+
+0. [Create an account](https://dmptool.org/quick_start_guide) on DMPTool.org associated with EcoHealth Alliance
+1. Identify Funder DMP requirements and `r params$data_librarian_appt` with the `r params$data_librarian`
+2. Create a DMP using appropriate template given your funder. If no template is available or the funder has no requirements, use the EHA Minimal Data Management Plan. Add collaborators and complete as much of the plan as you can
+3. Principal Investigators and Project Partners explicitly agree to abide by the DMP. All collaborators should fully understand and agree with the data sharing components of the plan before approving it.
+4. Request feedback from the `r params$data_librarian`
+5. Work with the `r params$data_librarian` to incorporate feedback
+6. Export DMP for inclusion in grant
+
+## Notes on data management
+*Can the data be shared and published, and easily re-used in other analyses*?
+
+- Create and maintain a [data management plan](https://dmptool.org/plans)
+- Store data in simple, interoperable formats such as CSV files.
+- Microsoft Excel can be a useful tool for data entry and organization, but
+  limit its use to that, and organize your data in a way that can be easily
+  exported.
+- Metadata! Metadata! Document your data.
+- For relational datasets you can create linked data on [Airtable](https://airtable.com/). For more information see \@ref(airtable)
+- For data sets that cross multiple projects, create data-only project folders
+  for the master version. When these data sets are finalized, they can be
+  deposited in public or private data repositories such as
+  [figshare](https://figshare.com/) and [zenodo](https://zenodo.org/). In some
+  cases it makes sense for us to create data-only R packages for easily
+  distributing data internally and externally.
+
+We aim to generally work in a **tidy data** framework. This approach to
+structuring data makes interoperability between tools easier.
+
+
+
 ## Learn
 - Watch M3 on [Data Management Plans](https://airtable.com/appwlxIzmQx5njRtQ/tbledVCO9MRKkK9MW/viwfFq11zdwCbBT83/recNVSuG2ApgfYkbl?blocks=hide)
 - Read California Digital Library guidance on [Data Management Plans](https://dmptool.org/general_guidance)
+- [Data Management Plan Skill Building](https://dataoneorg.github.io/Education/bp_step/plan/) from DataONE
+- [NIH Data Sharing Guidance](https://sharing.nih.gov/data-management-and-sharing-policy)
+  - [NIH Data Sharing Learning Resources](https://sharing.nih.gov/about/learning)
+  - [Condensed NIH DMSP Guidance Resources](https://osf.io/uadxr/)
+- [NSF Bio DMP Guidance](https://www.nsf.gov/bio/biodmp.jsp)
 - Read Hadley Wickham's [tidy data paper](http://vita.had.co.nz/papers/tidy-data.pdf) for the general concept. Note the *packages* in this paper are out of date, but the structures and