

Software Metadata Extraction and Curation Software (SMECS)

A web application to extract and curate research software metadata following the CodeMeta (version 3.0) software metadata standard.

SMECS facilitates the extraction of research software metadata from GitHub and GitLab repositories. It provides a user-friendly graphical interface for visualizing the retrieved metadata, enabling researchers and research software engineers to create high-quality metadata without re-entering information already available elsewhere. The curated metadata is exported as CodeMeta-compliant JSON, enabling integration with other tools and enhancing the discoverability, reuse, and impact of research software.

📄 For more details, see our Preprint.

Authors: Stephan Ferenz, Aida Jafarbigloo

Phases in SMECS

The workflow of SMECS consists of four sequential phases: Start, Extraction, Curation, and Export.

SMECS Workflow


1. Start Phase

In the Start phase, users provide two key inputs:
  • A repository link (GitHub or GitLab)
  • A personal access token for the corresponding platform
SMECS can operate without a user-provided token for some repositories by falling back on internal default tokens. However:
  • For GitLab instances not covered by these default tokens (e.g., self-hosted instances), a user-provided token is always required.
  • Providing a token can enable SMECS to extract more detailed metadata from certain repositories.
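The token rule above can be illustrated with a small helper that inspects the repository link. This is a minimal sketch, not SMECS code: the function `classify_repository` and the set `HOSTS_WITH_DEFAULT_TOKEN` are hypothetical, and the sketch assumes (this is not stated above) that github.com and gitlab.com are the hosts covered by the internal default tokens, while any other host is treated as a separate GitLab instance.

```python
from urllib.parse import urlparse

# Hypothetical: hosts ASSUMED to be covered by SMECS's internal default tokens.
HOSTS_WITH_DEFAULT_TOKEN = {"github.com", "gitlab.com"}


def classify_repository(url: str) -> dict:
    """Identify the platform of a repository link and whether a user token is needed.

    Assumption: github.com is GitHub; every other host is treated as a GitLab instance.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme not in ("http", "https") or not host:
        raise ValueError(f"not a repository URL: {url!r}")
    platform = "github" if host == "github.com" else "gitlab"
    # Project path such as "owner/repo" (GitLab may add nested groups), without ".git".
    project = parsed.path.strip("/")
    if project.endswith(".git"):
        project = project[: -len(".git")]
    return {
        "platform": platform,
        "host": host,
        "project": project,
        # Other (e.g. self-hosted) GitLab instances always need a user-provided token.
        "needs_user_token": host not in HOSTS_WITH_DEFAULT_TOKEN,
    }


if __name__ == "__main__":
    print(classify_repository("https://github.com/NFDI4Energy/SMECS"))
    print(classify_repository("https://gitlab.example.org/group/subgroup/project.git"))
```

Even when `needs_user_token` is False, supplying a token may still let SMECS extract more detailed metadata, as noted above.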

2. Extraction Phase

The Extraction phase uses HERMES harvesting steps to retrieve metadata from multiple sources. For details on the metadata fields, see Metadata Terms in SMECS. Once the inputs from the Start phase are submitted, SMECS initiates metadata retrieval using four HERMES harvesters:

GitHub and GitLab metadata are harvested via the HERMES GitHub/GitLab plugin.

All harvested metadata are mapped to CodeMeta using existing crosswalks from CodeMeta and HERMES, plus a custom crosswalk we created for GitLab. The metadata are then processed and merged via the HERMES processing step, producing a unified set of metadata. These results are displayed in the Curation phase. The HERMES-based approach ensures an interoperable, modular architecture that makes it easy to integrate additional harvesting sources in the future.
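To make the mapping and merging steps more concrete, here is a simplified stand-in for them. It does not use the HERMES API: the crosswalk dictionaries, the `apply_crosswalk` and `merge` functions, and the "earlier source wins" conflict rule are all assumptions for illustration (HERMES has its own merge logic). The source field names (`html_url`, `web_url`, `topics`) come from the public GitHub and GitLab REST APIs, and the targets are CodeMeta properties.

```python
# Hypothetical crosswalks: field name in the platform API response -> CodeMeta property.
GITHUB_CROSSWALK = {
    "name": "name",
    "description": "description",
    "html_url": "codeRepository",
    "topics": "keywords",
}
GITLAB_CROSSWALK = {
    "name": "name",
    "description": "description",
    "web_url": "codeRepository",
    "topics": "keywords",
}


def apply_crosswalk(raw: dict, crosswalk: dict) -> dict:
    """Rename harvested fields to CodeMeta terms; unmapped or empty values are dropped."""
    return {
        crosswalk[key]: value
        for key, value in raw.items()
        if key in crosswalk and value not in (None, "", [])
    }


def merge(*sources: dict) -> dict:
    """Combine mapped metadata; on conflicts the earlier source wins (assumed precedence)."""
    merged: dict = {}
    for source in sources:
        for key, value in source.items():
            merged.setdefault(key, value)
    return merged


if __name__ == "__main__":
    from_file = {"name": "example-tool", "license": "https://spdx.org/licenses/MIT"}
    from_gitlab = {"name": "example", "web_url": "https://gitlab.com/group/example",
                   "topics": ["energy"], "star_count": 4}
    print(merge(from_file, apply_crosswalk(from_gitlab, GITLAB_CROSSWALK)))
```

Keeping each crosswalk as plain data mirrors the modularity described above: supporting another source would mean adding one more mapping rather than changing the merge step.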


3. Curation Phase

The Curation phase allows users to edit and refine the extracted metadata. The metadata are displayed in a form-based interface organized into four main tabs:
  1. General Information
  2. Provenance
  3. Related Persons
  4. Technical Aspects
Key visualization and curation features include:
  • Metadata Visualization & User-Friendly Interface: Metadata are displayed in a structured, easy-to-read form organized by the four tabs, and the responsive interface lets users move smoothly between metadata fields.
  • Missing Metadata Identification: SMECS flags fields where metadata is absent.
  • Required Metadata Properties: Certain fields are marked as mandatory to ensure completeness of the final output.
  • Editable Fields: Users can directly edit or correct metadata within the interface.
  • Tagging Feature: Some fields allow multiple values for better metadata organization.
  • Suggestion Lists: For selected fields, SMECS provides suggestions to reduce manual input and ensure consistency.
  • Form-to-JSON Synchronization: Changes made in the form are mirrored in the JSON view (one-directional, from form to JSON), so users can track their changes instantly.
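The missing-metadata and required-property features can be sketched as a simple check over the curated fields. This is an illustration under stated assumptions, not the SMECS implementation: the names `is_missing` and `flag_fields` are hypothetical, and `REQUIRED_FIELDS` is an assumed example set, not the list of properties SMECS actually marks as mandatory.

```python
from collections.abc import Container, Iterable

# Hypothetical required set; the fields SMECS actually marks as mandatory may differ.
REQUIRED_FIELDS = frozenset({"name", "description", "license"})


def is_missing(value: object) -> bool:
    """Treat None, blank strings, and empty lists or dicts as missing metadata."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, dict)):
        return len(value) == 0
    return False


def flag_fields(metadata: dict, fields: Iterable[str],
                required: Container[str] = REQUIRED_FIELDS) -> dict:
    """Label each displayed field as 'ok', 'missing', or 'missing-required'."""
    status = {}
    for field in fields:
        if not is_missing(metadata.get(field)):
            status[field] = "ok"
        elif field in required:
            status[field] = "missing-required"  # mandatory field that still needs a value
        else:
            status[field] = "missing"
    return status


if __name__ == "__main__":
    form = {"name": "example-tool", "description": "  ", "keywords": []}
    print(flag_fields(form, ["name", "description", "keywords", "license"]))
```

Blank strings count as missing here so that whitespace typed into a form field does not silently satisfy a required property; that choice is part of the sketch, not documented SMECS behavior.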

4. Export Phase

In the Export phase, the curated metadata can be downloaded as a CodeMeta 3.0–compliant JSON file. Users can:
  • Include this file in their repository to make their research software more FAIR
  • Use it for other purposes, such as uploading metadata to a software registry
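A CodeMeta 3.0 file is a JSON-LD document whose `@context` points to the CodeMeta 3.0 context (`https://w3id.org/codemeta/3.0`) and whose `@type` is `SoftwareSourceCode`. The sketch below writes such a file from a dictionary of curated properties. It is an illustration, not the SMECS export code; the function names `build_codemeta` and `export_codemeta` are hypothetical.

```python
import json
from pathlib import Path

# JSON-LD context URL published for CodeMeta 3.0.
CODEMETA_3_CONTEXT = "https://w3id.org/codemeta/3.0"


def build_codemeta(metadata: dict) -> dict:
    """Wrap curated properties in a CodeMeta 3.0 JSON-LD document."""
    document = {"@context": CODEMETA_3_CONTEXT, "@type": "SoftwareSourceCode"}
    # Keep only plain properties; the JSON-LD keywords are set above.
    document.update({k: v for k, v in metadata.items() if not k.startswith("@")})
    return document


def export_codemeta(metadata: dict, path: str = "codemeta.json") -> Path:
    """Write the document as UTF-8 JSON, e.g. to the repository root."""
    target = Path(path)
    text = json.dumps(build_codemeta(metadata), indent=2, ensure_ascii=False)
    target.write_text(text + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    curated = {"name": "example-tool",
               "license": "https://spdx.org/licenses/AGPL-3.0-or-later"}
    print(json.dumps(build_codemeta(curated), indent=2))
```

The conventional place for the resulting `codemeta.json` is the repository root, where tools that read CodeMeta typically look for it.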


Installation and Usage

Install from GitHub

  • Clone the repository and change into the project directory
git clone https://github.com/NFDI4Energy/SMECS.git
cd SMECS
  • Create a virtual environment
    • Ensure that Python 3.10 or higher is installed on your system.
      • Windows: Check the version with py --version.
      • Unix/MacOS: Check the version with python3 --version.
    • Create the virtual environment.
      • Windows:
      py -m venv my-env
      • Unix/MacOS:
      python3 -m venv my-env
      For more details, see Creation of virtual environments.
    • Activate the virtual environment.
      • Windows:
      my-env\Scripts\activate
      • Unix/MacOS:
      source my-env/bin/activate

      (Note that activating the virtual environment changes the shell's prompt to show which virtual environment is being used.)

  • Managing Packages with pip
    • Ensure you can run pip from the command line.
      • Windows:
      py -m pip --version
      • Unix/MacOS:
      python3 -m pip --version
    • Install the packages listed in requirements.txt.
      • Windows:
      py -m pip install -r requirements.txt
      • Unix/MacOS:
      python3 -m pip install -r requirements.txt
    For more details, see Installing Packages.


  • Running the project
    • Open the project in an editor (e.g., VS Code) and make sure the virtual environment is active.

    • Run the project.
      • Windows:
      py manage.py runserver
      • Unix/MacOS:
      python3 manage.py runserver
  • To view the application, open the link shown in the terminal in your browser (e.g., http://127.0.0.1:8000/).
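Before installing the requirements, it can help to confirm that the interpreter meets the Python 3.10 minimum and that the virtual environment is active. The following is a hypothetical helper (for example saved as `check_env.py`, a file name that is not part of SMECS) using only the standard library. It relies on the documented behavior that, inside a venv, `sys.prefix` differs from `sys.base_prefix`.

```python
from __future__ import annotations  # keeps the annotations below valid on older interpreters

import sys

MIN_PYTHON = (3, 10)


def in_virtualenv() -> bool:
    """True when running inside a venv: venv changes sys.prefix but not sys.base_prefix."""
    return sys.prefix != sys.base_prefix


def check_environment() -> list[str]:
    """Return human-readable problems with the current interpreter setup."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {found}")
    if not in_virtualenv():
        problems.append("no virtual environment is active (activate my-env first)")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment looks OK for installing the requirements.")
```

Run it with `py check_env.py` (Windows) or `python3 check_env.py` (Unix/MacOS) from the activated environment.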



Install through Docker

To get started with SMECS using Docker, follow the steps below:

  • Prerequisites
    • Make sure Docker is installed on your local machine.
  • Cloning the Repository
git clone https://github.com/NFDI4Energy/SMECS.git
  • Navigate to the Project Directory
cd SMECS
  • Building the Docker Images
docker-compose build
  • Starting the Services
docker-compose up
  • Accessing the Application
    • Navigate to http://localhost:8000 in your web browser.
  • Stopping the Services
docker-compose down

Setting Up a GitLab/GitHub Personal Access Token
To extend what SMECS can extract and to authenticate securely with the GitLab/GitHub API, users can provide a personal access token for the corresponding platform (see the Start Phase for when a token is required). Follow these steps to integrate your token:
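For context on what the token is used for: the GitHub and GitLab REST APIs identify the caller through an HTTP header. SMECS handles this internally once the token is entered in the Start phase; the sketch below only shows the header conventions documented by each platform, and the `auth_headers` helper is hypothetical, not SMECS code.

```python
# Hypothetical helper; not part of SMECS. Shows the request headers each API expects.
def auth_headers(platform: str, token: str) -> dict[str, str]:
    """Build HTTP headers that authenticate a REST API request with a personal access token."""
    if not token:
        raise ValueError("a personal access token is required")
    if platform == "github":
        # GitHub REST API: bearer token plus the recommended media type.
        return {"Authorization": f"Bearer {token}",
                "Accept": "application/vnd.github+json"}
    if platform == "gitlab":
        # GitLab REST API (v4) accepts personal access tokens in this header.
        return {"PRIVATE-TOKEN": token}
    raise ValueError(f"unsupported platform: {platform!r}")


if __name__ == "__main__":
    from urllib.request import Request

    # The request is only built, not sent; urlopen(request) would perform the call.
    request = Request("https://gitlab.com/api/v4/projects",
                      headers=auth_headers("gitlab", "<your-token>"))
    print(request.full_url, request.header_items())
```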

Tip for developers
If the page does not refresh correctly, clear the browser cache. You can force Chrome to pull in new data and ignore the saved ("cached") data by using the keyboard shortcut Cmd+Shift+R on Mac, and Ctrl+F5 or Ctrl+Shift+R on Windows.

Collaboration

We believe in the power of collaboration and welcome contributions from the community to enhance the SMECS workflow. Whether you have found a bug, have a feature idea, or want to share feedback, your contribution matters. Feel free to submit a pull request, open up an issue, or reach out with any questions or concerns.

To see upcoming features in SMECS, please refer to our open issues.
To follow upcoming changes to the HERMES GitHub and GitLab plugin, visit that project's issues page. If you have questions, suggestions, or feedback about the plugin, or need to report a bug, please open a new issue there.

License and Citation

The code is licensed under the GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later).
See LICENSE.txt for further information.

Acknowledgements

We would like to thank meta_tool for providing the foundational framework upon which this project is built.
