A collection of code, methodologies, and case studies for analyzing the pathways from Open Science practices to scientific and societal impact.
The PathOS (Pathways to Open Science) project investigates how Open Science behaviors translate into measurable downstream impacts across different research domains. This repository contains the computational tools, analysis scripts, and case study implementations developed as part of the PathOS research initiative.
pathos-toolkit/
├── README.md # This file
├── case-studies/ # Individual case study implementations
│ ├── covid-19/ # Impact of Artefact Reuse in COVID-19 Publications
│ ├── french-case-study/ # Open Science Access Analysis via Connection Logs
│ ├── persistent-topics/ # Impact of Open Access Routes on Topic Persistence
│ ├── repository-effect/ # Effects of Data Repositories on Data Usage
│ └── [additional-studies]/ # Future case studies
├── case-study-template/ # Template files for new case studies
│ ├── README.md # Case study documentation template
│ ├── DATA_ACCESS.md # Data access template
│ └── TEMPLATE_GUIDE.md # Instructions for using templates
└── [shared-tools]/ # Common utilities and methodologies (future)
- Location: case-studies/covid-19/
- Focus: Relationship between research artifact reuse and clinical impact in COVID-19 research
- Sample: 115,467 COVID-19 papers that created research artifacts
- Key Finding: Papers with evidence of artifact reuse achieve significantly greater downstream clinical impact
- Location: case-studies/french-case-study/
- Focus: Analysis of access patterns to open and closed scientific publications using HAL and OpenEdition connection logs
- Sample: One year of connection logs from HAL (Sept 2023-Aug 2024) and OpenEdition journals (Jan-Dec 2023)
- Key Innovation: Log Explorer web application for investigating Open Access Advantage across academic disciplines, socio-economic sectors, and countries
- Methodology: Enriched connection logs with DOI-based resource matching (OpenAlex) and IP-based user classification (IPinfo + Llama 3.3 NACE classification)
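The DOI-based matching step can be pictured with a small sketch. It is not the case study's actual pipeline: the record layout, the `doi` and `is_oa` field names, and the helpers `normalize_doi` and `match_logs_to_metadata` are assumptions made for illustration, and the metadata dictionary stands in for records obtained from OpenAlex.

```python
import re
from typing import Optional

# Common prefixes under which a DOI may appear in a log line or metadata record.
_DOI_PREFIX = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(raw: str) -> Optional[str]:
    """Return a lowercase bare DOI such as '10.1234/abc', or None if not DOI-like."""
    candidate = _DOI_PREFIX.sub("", raw.strip()).lower()
    # A DOI starts with '10.', followed by a registrant code and a slash.
    return candidate if re.match(r"^10\.\d{4,9}/\S+$", candidate) else None


def match_logs_to_metadata(log_records, metadata_by_doi):
    """Attach metadata to each log record whose DOI is found in metadata_by_doi.

    Records without a usable or known DOI are kept with metadata set to None,
    so unmatched traffic stays visible in later analysis.
    """
    enriched = []
    for record in log_records:
        doi = normalize_doi(record.get("doi", ""))
        enriched.append({**record, "doi": doi, "metadata": metadata_by_doi.get(doi)})
    return enriched
```

Normalizing both sides to the same bare, lowercase form before the lookup avoids missed matches caused by URL prefixes or letter case. Unmatched records are kept rather than dropped so that the match rate itself can be reported.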
- Location: case-studies/persistent-topics/
- Focus: Effects of Open Access routes on topic persistence in AI-for-Climate research
- Sample: 132,134 papers from emerging research topics (2000-2021)
- Key Innovation: Novel "topic persistence" metric for measuring long-term scientific relevance
- Location: case-studies/repository-effect/
- Focus: Impact of data repositories on subsequent data reuse and scientific impact in the Social Sciences and Humanities (SSH)
- Sample: Mixed-method study combining (1) quantitative data citation corpus analysis (278,922 datasets), (2) algorithmic and manual data mention extraction (162 SSH publications), and (3) qualitative interviews with SSH researchers
- Key Finding: Data reuse likelihood varies across repositories: specialised and curated repositories show higher reuse, while algorithmic detection of reuse in SSH is often unreliable due to low precision
This repository follows a clear separation of concerns:
- GitHub Repository: Contains code, analysis scripts, documentation, and results
- Zenodo Deposits: Contain processed datasets with permanent DOIs for citation and reuse
- External Sources: Large-scale databases (Semantic Scholar, OpenAIRE, etc.) accessed separately
Each case study includes a DATA_ACCESS.md file with specific information about data availability and access requirements.
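A quick way to check this convention is a short script that lists case study folders without a DATA_ACCESS.md file. This is a minimal sketch, not a tool shipped with the repository: the function name `find_missing_data_access` is hypothetical, and it assumes every direct subfolder of case-studies/ is a case study.

```python
from pathlib import Path


def find_missing_data_access(case_studies_dir):
    """Return the names of case study folders that lack a DATA_ACCESS.md file."""
    root = Path(case_studies_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and not (entry / "DATA_ACCESS.md").is_file()
    )
```

Run from the repository root, `find_missing_data_access("case-studies")` returns an empty list when every case study documents its data access.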
- Copy the templates from case-study-template/ to create your case study folder
- Implement your analysis following the established patterns from existing case studies
- Navigate to the specific case study of interest
- Check the DATA_ACCESS.md file for data availability and requirements
- Follow the usage instructions in the case study's README.md
- Examine the analysis approaches used across case studies
- Identify common patterns and methodological innovations
- Consider contributing shared tools and utilities
The PathOS toolkit develops and implements several methodological innovations:
- Causal inference approaches for studying Open Science impacts
- Novel impact metrics beyond traditional bibliometric measures
- Clean treatment definitions for different Open Science practices
- Reproducible analysis pipelines for large-scale bibliometric studies
- Connection log analysis for measuring real-time access patterns to scientific publications
- Open Access Advantage metrics for quantifying differential access to open vs. closed publications
- AI-powered classification systems for automated socio-economic sector identification
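This README does not define the Open Access Advantage metric. One simple way to express such an advantage, used here only as an illustration, is the ratio of mean accesses per open publication to mean accesses per closed publication. The function name `open_access_advantage` and the input layout are hypothetical, and the case study's own definition may differ.

```python
from statistics import mean


def open_access_advantage(access_counts):
    """Ratio of mean accesses per open publication to mean accesses per closed one.

    access_counts: iterable of (is_open, n_accesses) pairs, one per publication.
    Returns None when either group is empty or closed publications have no accesses.
    """
    pairs = list(access_counts)  # allow generators, since the data is scanned twice
    open_counts = [n for is_open, n in pairs if is_open]
    closed_counts = [n for is_open, n in pairs if not is_open]
    if not open_counts or not closed_counts:
        return None
    closed_mean = mean(closed_counts)
    return mean(open_counts) / closed_mean if closed_mean else None
```

A value above 1 means open publications were accessed more often on average. A raw ratio like this does not adjust for confounders such as discipline or publication year, so it is a descriptive starting point rather than a causal estimate.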
Most PathOS case studies utilize these external data sources:
- Semantic Scholar Academic Graph: Paper metadata and citation networks
- OpenAIRE Graph: European research infrastructure data
- PATSTAT: Patent citation data (commercial license)
- ROR: Research organization classifications
- SciNoBo Toolkit: Specialized bibliometric indicators
- HAL & OpenEdition: Connection log data (requires authorization)
- OpenAlex: Publication metadata and DOI matching
- IPinfo Academic Research Program: IP geolocation and organization data
- Eurostat NACE Rev. 2.1: Economic activity classification system
See individual case study documentation for specific requirements.
- Create a new folder in case-studies/
- Use the templates from case-study-template/
case-study-template/ - Follow the established documentation and data management patterns
- Ensure your analysis scripts are well-documented and reproducible
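The first two steps can be scripted. The sketch below is one possible approach, not a utility provided by the repository: the function name `create_case_study` is hypothetical, and it assumes the folder layout shown in the repository tree.

```python
import shutil
from pathlib import Path


def create_case_study(repo_root, name):
    """Copy case-study-template/ into case-studies/<name>/ and return the new path."""
    root = Path(repo_root)
    target = root / "case-studies" / name
    # copytree creates the target folder and raises FileExistsError if it already
    # exists, so an existing case study is never overwritten.
    shutil.copytree(root / "case-study-template", target)
    return target
```

For example, `create_case_study(".", "my-study")` run from the repository root would create case-studies/my-study/ containing copies of the template files, ready to be filled in.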
- Fork the repository
- Make improvements to analysis, documentation, or reproducibility
- Submit a pull request with a clear description of changes
- Consider extracting common functionality into shared utilities
- Ensure tools are well-documented and tested
- Submit contributions that benefit multiple case studies
For questions about the PathOS project or this toolkit:
- Open an issue in this GitHub repository
- Contact the PathOS project team [add specific contact information]