Skip to content

COMHIS/article_2020_disappearing-discourses

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

COMHIS Project Template

This is a generic template for a COMHIS project. Clone and rewrite to start a project with the template.

The aim here is to create a model that enables somewhat painless internal reproduction of a project and to ease communication about a project's structure.

Most of the features here are recommendations, and can obviously be varied on as needed.

Repository name

A proposal for unified naming scheme for publication related repositories is as follows: document type_date_project name. An example would be: article_2019_hume_history_text_reuse. Date should follow the format YYYYMMDD, with month and day optional, and would probably refer to the projected or actual end date of the project. The whole date -element can also be optional, to be included only if relevant, eg. article_hume_history_text_reuse would be equally valid, as would other variations, such as including the name of the publication: article_2019_JEPS_Finnish_national_public_sphere.

Practices

  • Project overview documentation:
    • Should reside in the project root in a README.md (this file).
    • Should list people involved and their roles in the project.
  • Naming files and folders:
    • Use all lowercase (except for established standards such as README.md and the .R filename extension).
    • Separate words in file and directory names by underscore: _. eg. south_sea_bubble.R instead of south-sea-bubble.R or SouthSeaBubble.R.
  • Structure:
    • Follow the directory structure laid out below.
    • Include README.md in each directory documenting the contents of that directory.
      • This is especially important in data and final code directories.
    • If feasible, to avoid confusion only use single .gitattributes and single .gitignore file residing in the project root.

Directory structure

The project repository structured is variation of formats laid out in a few data science project organization articles (see the end of this README). code and output -directories include work/ and final/ -subdirectories. The work/ -subdirectory is optional, but helps to keep development material separate from the polished and clean end products that should reside in the final/ directory If the project only has "final" products, that directory level can of course be omitted too.

project_name/
├── README.md              # project overview
├── documentation/         # project documentation
├── data/
│   ├── raw/               # immutable raw input data
│   ├── work/              # intermediate data
│   └── final/             # processed data for final analysis tasks
├── code/
│   ├── templates/         # code templates shared across projects. Visualization styles etc.
│   ├── work/
│   │   ├── person1/       # a directory for each person or task
│   │   ├── person2/        
│   │   └── task1/         
│   └── final/
│       ├── task1/         # A directory for each analysis task
│       └── another_task/  # if needed.
└── output/
    ├── figures/
    │   ├── work/
    │   └── final/
    └── publications/
        ├── work/
        └── final/

Logic

  • [documentation/]: Project meta documentation. Links to all relevant planning papers, interim notes, google drive folders, etc.
  • [input/]: Input data. Either a whole dataset or if that is impractical, a link pointing to the data source (likely another repository). [data_raw/] subdirectory should have immutable original input data and/or references to the repositories where it can be retrieved from. [data_processed/] holds data that has been processed to analysis ready format and should include README.md pointing to the code that is used to produce the data. [data_work/] is a development directory for work-in-progress datasets, exchanging data between coding tasks, etc. Ideally, all datasets should be producible by scripts from the raw data.
  • [code/]: Data processing code. Finished code used for publication should be moved to [final/] subdirectory. Organization of the development directory [work/] can vary and the breakdown by person or task is just a suggestion. All directories, but especially [final/] should include a README.md clearly documenting what each script does.
    • [code/templates/] has templates for code shared across projects, such as visualization styles.
  • [output/]: Both figures and publication texts/files. Divided to work and final subdirectories.

Articles on data science project git repo organization

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published