Commit 576ab44

Merge remote-tracking branch 'origin/master' into 231--Description-value-is-not-updated-in-JSON
2 parents ef98ee6 + e177a5e commit 576ab44

4 files changed

+126
-55
lines changed

README.rst

Lines changed: 126 additions & 55 deletions
Original file line number | Diff line number | Diff line change
@@ -1,81 +1,152 @@
11

22
Software Metadata Extraction and Curation Software (SMECS)
33
__________________________________________________________
4-
| A web application to extract and curate research software metadata following the `codemeta <https://codemeta.github.io/>`_ software metadata standard.
4+
| A web application to extract and curate research software metadata following the `CodeMeta <https://codemeta.github.io/>`_ (`version 3.0 <https://raw.githubusercontent.com/codemeta/codemeta/3.0/codemeta.jsonld>`_) software metadata standard.
55
|
6-
| SMECS facilitates the extraction of research software metadata from repositories on GitHub/GitLab. It offers a user-friendly graphical user interface for visualizing the retrieved metadata. This empowers researchers to create good metadata for their research software without reentering data which is already available elsewhere. Ultimately, SMECS delivers the curated metadata in JSON format, enhancing usability and accessibility.
6+
| SMECS facilitates the extraction of research software metadata from GitHub and GitLab repositories. It provides a user-friendly graphical interface for visualizing the retrieved metadata, enabling researchers and research software engineers to create high-quality metadata without reentering information already available elsewhere. The curated metadata is exported as CodeMeta-compliant JSON, ensuring integration with other tools and enhancing the discoverability, reuse, and impact of research software.
77
|
8+
| 📄 For more details, see our `Preprint <http://doi.org/10.48550/arXiv.2507.18159>`_.
89
|
910
| **Authors:** Stephan Ferenz, Aida Jafarbigloo
1011
|
11-
Key Stages in SMECS
12+
Phases in SMECS
1213
__________________________________________________________
13-
| The figure below illustrates the sequential processes and data flows within SMECS. First, users input data, triggering the tool to extract metadata associated with specific URLs. This metadata is then visualized, allowing users to review and interact with it. Users can curate, modify, and finalize the metadata according to their needs. Once satisfied, they can download the curated metadata in JSON format, providing an interoperable output for further use.
14+
| The workflow of SMECS consists of four sequential phases: **Start**, **Extraction**, **Curation**, and **Export**.
1415
|
15-
|
16-
.. image:: https://github.com/NFDI4Energy/SMECS/blob/master/docs/diagram.png
17-
:alt: SMECS Workflow Visualization
16+
.. image:: https://github.com/NFDI4Energy/SMECS/blob/master/docs/Extraction_via_hermes-1.png
17+
:alt: SMECS Workflow
1818
:width: 1000px
1919
|
20-
#. **Metadata Extraction Stage**
21-
* **Metadata Extraction**
22-
* SMECS extracts metadata from GitHub and GitLab repositories. For details on the specific metadata that SMECS can extract, please refer to `Metadata Terms in SMECS <https://github.com/NFDI4Energy/SMECS/blob/master/docs/metadata-terms.md>`_
23-
* **API Interactions:** Use GitHub and GitLab APIs to fetch relevant metadata.
24-
* **Data Parsing:** Analyze the retrieved metadata and translate it into CodeMeta metadata for further processing.
25-
* **Cross-Walk and Metadata Mapping**
26-
* **Standardization:** Align metadata fields from GitHub and GitLab to a common dictionary.
27-
* **Field Matching:** Map equivalent fields between GitHub and GitLab. For example, mapping GitHub "topics" to GitLab "keywords".
28-
#. **Visualization and Curation Stage**
29-
* **Visualization:** Extracted metadata is displayed in a structured form.
30-
* **User Interface:** Interactive and simple UI for exploring the extracted and curated metadata.
31-
* **Metadata Curation:** Refine the extracted metadata based on user preferences.
32-
* **Missing Metadata Identification:** Identify and highlight fields where metadata is absent.
33-
* **User Input for Missing Metadata:** Enable users to add missing metadata directly via the user interface.
34-
* **Real-Time Metadata Curation:** Enable the possibility of representing the JSON format of the metadata based on the CodeMeta standard in real time, allowing one-direction changes from form format to JSON to show real-time metadata curation.
35-
#. **Export Stage**
36-
* **Export Formats:** Save extracted and curated metadata in JSON format.
20+
21+
1. **Start Phase**
22+
__________________________________________________________
23+
In the Start phase, users provide two key inputs:
24+
- A repository link (GitHub or GitLab)
25+
- A personal access token for the corresponding platform
26+
SMECS can operate without user-provided tokens for some repositories by using internal default tokens. However:
27+
- For other GitLab instances, a user-provided token is always required.
28+
- Providing a token can enable SMECS to extract more detailed metadata from certain repositories.
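
The distinction between platforms in the Start phase can be illustrated with a small sketch. It is not SMECS code: the function name ``classify_repository`` and the ``"other"`` label are assumptions made for illustration, and only the Python standard library is used.

```python
from urllib.parse import urlparse


def classify_repository(url: str) -> dict:
    """Split a repository URL into platform guess, host, and project path.

    Hypothetical helper for illustration; not part of SMECS or HERMES.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"Not a valid repository URL: {url!r}")

    host = parsed.hostname  # already lower-cased, without port
    path = parsed.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    if "/" not in path:
        raise ValueError(f"URL does not point to a repository: {url!r}")

    if host == "github.com":
        platform = "github"
    elif host == "gitlab.com":
        platform = "gitlab"
    else:
        # Assumption: any other host is only labelled "other" here,
        # for example a self-hosted GitLab instance.
        platform = "other"

    return {"platform": platform, "host": host, "path": path}


info = classify_repository("https://github.com/NFDI4Energy/SMECS.git")
print(info)
```

A caller could use the ``platform`` value to decide whether to ask the user for a personal access token before starting the extraction.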
29+
|
30+
2. **Extraction Phase**
31+
__________________________________________________________
32+
The Extraction phase uses `HERMES <https://github.com/softwarepub/hermes>`_ harvesting steps to retrieve metadata from multiple sources. For details on the metadata fields, see: `Metadata Terms in SMECS <https://github.com/NFDI4Energy/SMECS/blob/master/static/schema/codemeta_schema.json>`_. Once the inputs from the Start phase are submitted, SMECS initiates metadata retrieval using four HERMES harvesters:
33+
- GitHub
34+
- GitLab
35+
- CFF (`Citation File Format <https://citation-file-format.github.io/>`_)
36+
- CodeMeta
37+
GitHub and GitLab metadata are harvested via the `HERMES GitHub/GitLab plugin <https://github.com/softwarepub/hermes-plugin-github-gitlab>`_.
38+
39+
All harvested metadata are mapped to CodeMeta using existing crosswalks from CodeMeta and HERMES, plus a custom crosswalk we created for GitLab.
40+
The metadata are then processed and merged via the HERMES processing step, producing a unified set of metadata.
41+
These results are displayed in the Curation phase. The HERMES-based approach ensures an interoperable, modular architecture that makes it easy to integrate additional harvesting sources in the future.
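
The idea of a crosswalk followed by a merge can be sketched as follows. This is a simplified illustration under stated assumptions, not the HERMES processing step or the crosswalk shipped with SMECS: the mapping table, the function names, and the "earlier source wins" rule are all hypothetical.

```python
# Hypothetical crosswalk: source field name -> CodeMeta property.
# The real SMECS/HERMES crosswalks may differ.
GITLAB_CROSSWALK = {
    "name": "name",
    "description": "description",
    "topics": "keywords",
    "web_url": "codeRepository",
    "created_at": "dateCreated",
}


def apply_crosswalk(record: dict, crosswalk: dict) -> dict:
    """Rename source fields to CodeMeta properties; drop unmapped or empty ones."""
    return {
        crosswalk[key]: value
        for key, value in record.items()
        if key in crosswalk and value not in (None, "", [])
    }


def merge_sources(*sources: dict) -> dict:
    """Merge CodeMeta dicts: earlier sources win, list values are combined."""
    merged: dict = {}
    for source in sources:
        for key, value in source.items():
            if key not in merged:
                merged[key] = list(value) if isinstance(value, list) else value
            elif isinstance(merged[key], list) and isinstance(value, list):
                merged[key] += [item for item in value if item not in merged[key]]
    return merged


# Made-up example records for two harvesting sources.
gitlab_record = {
    "name": "demo-tool",
    "description": "",
    "topics": ["energy", "metadata"],
    "web_url": "https://gitlab.example.org/group/demo-tool",
    "star_count": 3,
}
codemeta_file = {"name": "Demo Tool", "keywords": ["metadata", "FAIR"]}

unified = merge_sources(codemeta_file, apply_crosswalk(gitlab_record, GITLAB_CROSSWALK))
print(unified)
```

Keeping the crosswalk as plain data means that a new harvesting source only needs its own mapping table, which mirrors the modular design described above.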
42+
43+
|
44+
3. **Curation Phase**
45+
__________________________________________________________
46+
The Curation phase allows users to edit and refine the extracted metadata. The metadata are displayed in a form-based interface organized into four main tabs:
47+
#. General Information
48+
#. Provenance
49+
#. Related Persons
50+
#. Technical Aspects
51+
52+
Key visualization and curation features include:
53+
- **Metadata Visualization & User-Friendly Interface:** Metadata is displayed in a structured, easy-to-read format. The interface is intuitive, responsive, and allows smooth navigation through metadata fields.
54+
- **Missing Metadata Identification:** SMECS flags fields where metadata is absent.
55+
- **Required Metadata Properties:** Certain fields are marked as mandatory to ensure completeness of the final output.
56+
- **Editable Fields:** Users can directly edit or correct metadata within the interface.
57+
- **Tagging Feature:** Some fields allow multiple values for better metadata organization.
58+
- **Suggestion Lists:** For selected fields, SMECS provides suggestions to reduce manual input and ensure consistency.
59+
- **Form-to-JSON Synchronization:** Updates in the form are mirrored in the JSON view (one-directional) so users can track changes instantly.
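
Two of the features above, missing-metadata identification and the one-directional form-to-JSON update, can be sketched in a few lines. The lists of required and tag-style fields are assumptions for illustration; the actual SMECS form may treat different properties this way.

```python
import json

# Assumption: illustrative choice of required properties.
REQUIRED_FIELDS = ("name", "description", "author")
# Assumption: fields entered as comma-separated tags in the form.
TAG_FIELDS = ("keywords", "programmingLanguage")


def form_to_codemeta(form: dict) -> dict:
    """Convert raw form strings into a CodeMeta-style dict (form -> JSON only)."""
    metadata = {}
    for field, raw in form.items():
        text = raw.strip()
        if not text:
            continue  # empty inputs are left out of the JSON view
        if field in TAG_FIELDS:
            metadata[field] = [tag.strip() for tag in text.split(",") if tag.strip()]
        else:
            metadata[field] = text
    return metadata


def missing_required(metadata: dict) -> list:
    """Return the required properties that are absent from the metadata."""
    return [field for field in REQUIRED_FIELDS if field not in metadata]


form = {"name": "SMECS", "description": "  ", "keywords": "metadata, CodeMeta, "}
metadata = form_to_codemeta(form)
print(json.dumps(metadata, indent=2))
print("Missing:", missing_required(metadata))
```

Re-running ``form_to_codemeta`` after every form change is one simple way to keep a JSON preview in step with the form without ever writing JSON edits back into it.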
60+
61+
62+
4. **Export Phase**
63+
_________________________________________________________
64+
In the Export phase, the curated metadata can be downloaded as a CodeMeta 3.0–compliant JSON file. Users can:
65+
- Include this file in their repository to make their research software more FAIR
66+
- Use it for other purposes, such as uploading metadata to a software registry
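
As a rough illustration of what the exported file looks like, the following sketch writes a minimal ``codemeta.json``. The helper ``export_codemeta`` is hypothetical and not the SMECS export code; the ``@context`` value is the published CodeMeta 3.0 context, and the metadata values are examples only.

```python
import json
import tempfile
from pathlib import Path


def export_codemeta(metadata: dict, target: Path) -> Path:
    """Write metadata as a JSON-LD document using the CodeMeta 3.0 context.

    Hypothetical helper for illustration only.
    """
    document = {
        "@context": "https://w3id.org/codemeta/3.0",
        "@type": "SoftwareSourceCode",
        **metadata,
    }
    text = json.dumps(document, indent=2, ensure_ascii=False) + "\n"
    target.write_text(text, encoding="utf-8")
    return target


# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = export_codemeta(
        {"name": "SMECS", "codeRepository": "https://github.com/NFDI4Energy/SMECS"},
        Path(tmp) / "codemeta.json",
    )
    loaded = json.loads(path.read_text(encoding="utf-8"))

print(loaded["@type"], loaded["name"])
```

Placing the resulting ``codemeta.json`` in the root of a repository is the usual convention for making the metadata discoverable by other tools.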
67+
3768
|
3869
|
3970
Installation and Usage
4071
__________________________________________________________
4172
Install from GitHub
4273
-------------------
4374

44-
#. Cloning the repository
45-
* Copy URL of the project from Clone with HTTPS.
46-
* Change the current working directory to the desired location.
47-
* Run ``git clone <URL>`` in command prompt. (GitBash can be used as well)
48-
#. Creating virtual environment
49-
* Make sure `Python <https://www.python.org/>`_ is installed.
50-
* Ensure you can run Python from command prompt.
51-
* On Windows: Run ``py --version``.
52-
* On Unix/MacOS: Run ``python3 --version``.
53-
* Create the virtual environment by running this code in the command prompt.
54-
* On Windows: Run ``py -m venv <name-of-virtual-environment>``.
55-
* On Unix/MacOS: Run ``python3 -m venv <name-of-virtual-environment>``.
75+
* Cloning the repository
76+
.. code-block:: shell
77+
78+
git clone https://github.com/NFDI4Energy/SMECS.git
79+
80+
* Creating virtual environment
81+
* Ensure that `Python 3.10 or higher <https://www.python.org/>`_ is installed on your system.
82+
- **Windows:** Check the version with ``py --version``.
83+
- **Unix/MacOS:** Check the version with ``python3 --version``.
84+
* Create the virtual environment.
85+
* **Windows:**
86+
.. code-block:: shell
87+
88+
py -m venv my-env
89+
90+
* **Unix/MacOS:**
91+
.. code-block:: shell
92+
93+
python3 -m venv my-env
94+
5695
| For more details, see `Creation of virtual environments <https://docs.python.org/3/library/venv.html>`_.
96+
5797
* Activate virtual environment.
58-
* On Windows: Run ``env\Scripts\activate``.
59-
* On Unix/MacOS: Run ``source env/bin/activate``.
60-
env is the selected name for the virtual environment.
61-
Note that activating the virtual environment change the shell's prompt and show what virtual
62-
environment is being used.
63-
#. Managing Packages with pip
64-
* Ensure you can run pip from command prompt.
65-
* On Windows: Run ``py -m pip --version``.
66-
* On Unix/MacOS: Run ``python3 -m pip --version``.
67-
* Install a list of requirements specified in a *Requirements.txt*.
68-
* On Windows: Run ``py -m pip install -r requirements.txt``.
69-
* On Unix/MacOS: Run ``python3 -m pip install -r requirements.txt``.
98+
* **Windows:**
99+
.. code-block:: shell
100+
101+
my-env\Scripts\activate
102+
103+
* **Unix/MacOS:**
104+
.. code-block:: shell
105+
106+
source my-env/bin/activate
107+
108+
109+
(Note that activating the virtual environment changes the shell's prompt to show which virtual environment is in use.)
110+
111+
* Managing Packages with pip
112+
* Ensure you can run pip from the command prompt.
113+
* **Windows:**
114+
.. code-block:: shell
115+
116+
py -m pip --version
117+
118+
* **Unix/MacOS:**
119+
.. code-block:: shell
120+
121+
python3 -m pip --version
122+
123+
* Install the requirements listed in *requirements.txt*.
124+
* **Windows:**
125+
.. code-block:: shell
126+
127+
py -m pip install -r requirements.txt
128+
129+
* **Unix/MacOS:**
130+
.. code-block:: shell
131+
132+
python3 -m pip install -r requirements.txt
133+
70134
| For more details, see `Installing Packages <https://packaging.python.org/en/latest/tutorials/installing-packages/>`_.
71135
|
72136
|
73-
**Running the project**
137+
* **Running the project**
138+
* Open the project in an editor (e.g. VS Code).
139+
* Run the project.
140+
* **Windows:**
141+
.. code-block:: shell
142+
143+
py manage.py runserver
144+
145+
* **Unix/MacOS:**
146+
.. code-block:: shell
147+
148+
python3 manage.py runserver
74149
75-
* Open and run the project in an editor (e.g. VS code).
76-
* Run the project.
77-
* On Windows: Run ``py manage.py runserver``.
78-
* On Unix/MacOS: Run ``python3 manage.py runserver``.
79150
* To view the application in a browser, follow the link shown in the terminal (e.g. http://127.0.0.1:8000/).
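
The prerequisites named in the installation steps (Python 3.10 or higher and an active virtual environment) can be checked with a short script. This is an optional sketch and not part of SMECS; the function name ``check_environment`` is an assumption.

```python
import sys


def check_environment(min_version=(3, 10)) -> list:
    """Return a list of human-readable problems; an empty list means all is fine."""
    problems = []
    if sys.version_info < min_version:
        wanted = ".".join(str(part) for part in min_version)
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {wanted} or higher is required, found {found}.")
    # Inside a venv, sys.prefix points at the environment, not the base install.
    if sys.prefix == sys.base_prefix:
        problems.append("No virtual environment is active.")
    return problems


for problem in check_environment():
    print("Warning:", problem)
```

Running it with the same interpreter that will later run ``manage.py`` shows whether the environment matches the steps above.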
80151
|
81152
|
@@ -129,8 +200,8 @@ Collaboration
129200
__________________________________________________________
130201
| We believe in the power of collaboration and welcome contributions from the community to enhance the SMECS workflow. Whether you have found a bug, have a feature idea, or want to share feedback, your contribution matters. Feel free to submit a pull request, open up an issue, or reach out with any questions or concerns.
131202
|
132-
To see upcoming features, please refer to our `open issues <https://github.com/NFDI4Energy/SMECS/issues?q=is%3Aopen+is%3Aissue>`_.
133-
203+
| To see upcoming features in SMECS, please refer to our `open issues <https://github.com/NFDI4Energy/SMECS/issues?q=is%3Aopen+is%3Aissue>`_.
204+
| To stay updated on upcoming changes to the `HERMES GitHub and GitLab Plugin <https://github.com/softwarepub/hermes-plugin-github-gitlab>`_, visit the `project’s issues page <https://github.com/softwarepub/hermes-plugin-github-gitlab/issues>`_. If you have questions, suggestions, or feedback, or want to report a bug in the plugin, please open a new issue `there <https://github.com/softwarepub/hermes-plugin-github-gitlab/issues>`_.
134205
|
135206
License and Citation
136207
__________________________________________________________

docs/Extraction_via_hermes-1.png

97.7 KB
File renamed without changes.
File renamed without changes.
