-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
appssubmitter finial version #128
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @SeverusYixin ,
awesome, thanks for sending this! Next time, it would be great if you could write a sentence, what this PR is about. Furthermore, I modified the sentence "closes #94" to your message. This will later close the corresponding issue, once this is merged.
May I also ask you for adding some documentation to our jupyter book? It would be great to have short user guide for your app with some screenshots. See this page as example.
Thank you!
Best,
Robert
Ofc no problem, but which kinds of the documentation did you want me to add to our jupyter book? And I have a screeunshot of the APP, where should I upload it? |
Hey @SeverusYixin , ok, this is super super cool!! May I ask for some minor modifications:
Big thanks! |
Hi @haesleinhuepf, if you can show me the current exact license list, tags list and type list it would be so helpful. Or can I just use the things that show up on the left side of https://nfdi4bioimage.github.io/training/readme.html directly? |
For the license list, let's use the SPDX list you were using in #127 (so we don't reinvent the wheel here) : training/scripts/data_normalizer.py Line 7 in f0aa351
The current lists of tags and content types can be retrieve as shown in #130 : Let me know if you need anything else! |
hmmm, that means I need retrieve the license by myself, am I right? And for https://github.com/NFDI4BIOIMAGE/training/blob/7045d4888217d6b73c53080ba42d020475dd41fd/scripts/tag_statistics.ipynb, it looks like only have the tags list, am I right? |
Yes
Yes! |
…me Users should select the license from a list (e.g. the SPDX list) Users should select the tags from a list (they can select multiple), and there should be an option to add new tags Users should select the type from a list Do not create a github issue, the pull-request is enough."
Hi @haesleinhuepf, I have a question about URL,Description,Name,Authors. if the users add nothing to these, should they still be allowed to upload, or if they didn't enter the authors, should the submit be denied? For License, Tags and Type, the user chooses at least one, and I add an unknown choice for them. The yaml file also needs to be selected as it can be submitted. |
…on. User should select license from a list (e.g. SPDX list) User should select tags from the list (multiple selections are possible) and should be given the option to add new tags The user should select the type from the list Don't create github issues, pull requests are sufficient. Change the target repository to NFDI4BioImage/training Testing” and also add documentation to our jupyter book
Hi @haesleinhuepf, me again :) Could you review this when you have a moment? And enjoy your holiday :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @SeverusYixin ,
ok cool, the app looks great! I just tried it and have some minor requests:
-
It would be great if we could add tags which are not in the list of available tags:
-
make sure the changes are made on the most recent files. I just submitted a PR using the app (it worked 🥳 ) but it suggests many changes. I could imagine that this happens if my local repository is not up-to-date. Details. But we can pull the most recent file from github, before modifying the file e.g. using this code.
-
And below some suggestions in the code.
Otherwise, it's looking great!
Best,
Robert
scripts/appsubmitter.py
Outdated
Steps: | ||
1. Select the YAML file for the type of training material. | ||
2. Provide the necessary information. | ||
3. Submit the form to create a pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Steps: | |
1. Select the YAML file for the type of training material. | |
2. Provide the necessary information. | |
3. Submit the form to create a pull request. | |
Steps: | |
1. Select the YAML file for the type of training material. | |
2. Provide the necessary information. | |
3. Submit the form to create a pull request. |
I suggest to remove this. It feels like a duplication of the sentence above and form entries below.
scripts/appsubmitter.py
Outdated
"AGPL-3.0", "All-rights-reserved", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "BSD-License", "CC0", | ||
"CC0-1.0", "CC0-Licence (with some variations noted in dataset attributes)", "CC-BY-4.0", "CC-BY-NC-4.0", | ||
"CC-BY-NC-ND-3.0-DEED", "CC-BY-ND-SA-4.0", "CC-BY-SA-4.0", "GPL-2.0", "GPL-3.0", "MIT", "ODC-By-1.0", | ||
"Public-domain" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use the fetch_spdx_function from here?
training/scripts/data_normalizer.py
Line 9 in d9146a5
def fetch_spdx_licenses(): |
scripts/appsubmitter.py
Outdated
"Arc", "Artificial Intelligence", "Big Data", "Bio-Image Analysis", "Bioimage Analysis", "Bioimage Data", | ||
"Bioinformatics", "Cellpose", "Cellprofiler", "Citing", "Clesperanto", "Conda", "Dask", "Data Analysis", | ||
"Data Protection", "Data Science", "Data Stewardship", "Dataplant", "Deep Learning", "Deployment", | ||
"Diffusion Models", "Elastix", "Fair", "Fair-Principles", "Fiji", "Flim", "Git", "Github", "Gpu", | ||
"Hackathon", "I3Dbio", "Image Data Management", "Image Registration", "Image Segmentation", "Imagej", | ||
"Imagej Macro", "Itk", "Large Language Models", "Licensing", "Machine Learning", "Mamba", "Meta Data", | ||
"Metadata", "Microscopy Image Analysis", "Napari", "Neubias", "Nfdi4Bioimage", "Notebook", "Notebooks", | ||
"Omero", "Open Access", "Open Science", "Python", "Pytorch", "Quareo-Limi", "R", "Research Data Management", | ||
"Science Communication", "Segmentation", "Sharing", "Teaching", "Tu Dresden", "Workflow", "Workflow Engine", | ||
"Zarr", "Zenodo" | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we determine this list dynamically from the yml files? E.g. as shown in the notebook here:
scripts/appsubmitter.py
Outdated
"Big Data", "Bioimage Analysis", "Blog", "Book", "Code", "Collection", "Conference Abstract", "Data", | ||
"Dataset", "Document", "Documentation", "Event", "Forum Post", "Github Repository", "Jupyter Book", | ||
"Notebook", "Online Course", "Online Tutorial", "Poster", "Practicals", "Preprint", "Presentation", | ||
"Publication", "Publiction", "Python", "Slide", "Slides", "Tutorial", "Video", "Videos", "Workshop" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. This list should be determined dynamically from the yml files.
scripts/appsubmitter.py
Outdated
|
||
# List of YAML files | ||
yaml_list = sorted([ | ||
'blog_posts.yml', 'events.yml', 'materials.yml', 'nfdi4bioimage.yml', 'papers.yml', 'workflow-tools.yml', 'youtube_channels.yml' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we determine this list using os.listdir?
Hi @haesleinhuepf , me again :). If you have a moment, you can run this code for me to see if it works. Especially if the yml files are merged, does it still work on your side, so far no problem on mine. Also: One question: |
It seems I don't have permission to change the data source? Would you mind changing it for me? |
I'm not sure what you mean with "data source". Please let me know what exactly should be changed. Consider using http-links to point at code pecisely... |
I mean for this: It seems I'm not allowed to update it. |
ok, done |
Thank you :) |
…MAGE/training into app_for_adding_entries
Hi @haesleinhuepf, I have already updated it with the docstring. Take your time and check it out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @SeverusYixin ,
thanks for working on this! It would be great if you could reduce code-duplication, before we merge this. Instead of repeating similar functionality, consider re-using functions. This will result in much shorter code, which is easier to maintain obviously.
Thanks!
Best,
Robert
docs/contributing/index_2.md
Outdated
# How to contribute | ||
|
||
This repository contains lists of training materials. It is extensible using GitHub pull requests. You can find a how-to guide at the bottom of this page. The format for entries in the repository is documented on the next page. | ||
|
||
## Quick contributing shortcut: | ||
|
||
If you're too busy to enter everything in detail yourself, please create a [GitHub issue](https://github.com/NFDI4BIOIMAGE/training/issues) with a link to the materials you want to include in our list. We can take care of all the details. | ||
|
||
## What to contribute | ||
|
||
Consider adding your favorite training materials and resources. If you know a collection of resources, add it, but do not add all individual entries of the collection. We are working on collecting them automatically ([more information](https://github.com/NFDI4BIOIMAGE/training/issues/2)). However, if there are specific entries in such a collection that you think are particularly valuable, feel free to add them now. | ||
|
||
## Inclusion criteria | ||
|
||
We will consider merging links to all educative materials related to research data management, especially in the bio-imaging context, and bio-image analysis. | ||
|
||
We would like to collect links to resources in various formats/content types, including slides, posters, publications, blog posts, example data, collections (of links to other materials), and more. | ||
|
||
## Exclusion criteria | ||
|
||
We will only merge links to materials behind a paywall in exceptional cases. We will also not merge links to materials that primarily advertise commercial products. However, if there are openly accessible training resources for commercial software, we welcome links to these resources. | ||
|
||
## Maintenance of contributions | ||
|
||
We reserve the right to remove and modify entries in this collection at any time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This duplicated text from the page before. Please only document your AppSubmitter in this page.
scripts/appsubmitter.py
Outdated
def load_yaml_data(file_path): | ||
""" | ||
Load YAML data from a file. | ||
|
||
Args: | ||
file_path (str): Path to the YAML file. | ||
|
||
Returns: | ||
dict: Parsed YAML data. | ||
""" | ||
with open(file_path, 'r') as file: | ||
return yaml.safe_load(file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you not using this function?
training/scripts/generate_link_lists.py
Line 117 in e19da86
def read_yaml_file(filename): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
scripts/appsubmitter.py
Outdated
return sorted(unique_tags), sorted(unique_types), sorted(unique_licenses) | ||
|
||
|
||
def get_yaml_files(resources_dir): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you not using this function?
training/scripts/generate_link_lists.py
Line 97 in e19da86
def load_dataframe(directory_path): |
It gives you a DataFrame with all contents, which you can filter for unique entries in specific columns: df["license"].unique()
scripts/appsubmitter.py
Outdated
def authenticate_with_github(): | ||
""" | ||
Authenticate with GitHub using the API key from environment variables. | ||
|
||
Returns: | ||
tuple: Authenticated GitHub instance and repository object. | ||
|
||
Raises: | ||
Exception: If authentication fails. | ||
""" | ||
GITHUB_API_KEY = os.getenv('GITHUB_API_KEY') | ||
if not GITHUB_API_KEY: | ||
st.error("GitHub API Key is not set in the environment variables.") | ||
st.stop() | ||
|
||
try: | ||
g = Github(GITHUB_API_KEY) | ||
repo = g.get_repo("NFDI4BIOIMAGE/training") | ||
except Exception as e: | ||
st.error(f"Failed to authenticate with GitHub: {e}") | ||
st.stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using this function instead:
training/scripts/_github_utilities.py
Line 8 in e19da86
def get_github_repository(repository): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see your changes here until you commit and push them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll push it after I'm done: "It gives you a DataFrame with all content, which you can filter for unique entries in specific columns: df["license"].unique()"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see your changes here until you commit and push them
I marked it to told myself this is finished XD
import yaml | ||
import os | ||
|
||
def load_yaml(file_path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, consider using pre-existing functions instead of duplicating code.
def extract_tags(yaml_content): | ||
tags = set() | ||
for item in yaml_content.get('resources', []): | ||
if 'tags' in item: | ||
tags.update(item['tags']) | ||
return tags | ||
|
||
def extract_licenses(yaml_content): | ||
licenses = set() | ||
for item in yaml_content.get('resources', []): | ||
if 'license' in item: | ||
license_field = item['license'] | ||
if isinstance(license_field, list): | ||
licenses.update(license_field) | ||
else: | ||
licenses.add(license_field) | ||
return licenses | ||
|
||
def extract_types(yaml_content): | ||
types = set() | ||
for item in yaml_content.get('resources', []): | ||
if 'type' in item: | ||
types.update(item['type']) | ||
return types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, consider re-using pre-existing functions, e.g. for retrieving a dataframe and then determine unique items in columns.
def collect_tags_licenses_and_types_from_files(directory): | ||
all_tags = set() | ||
all_licenses = set() | ||
all_types = set() | ||
if os.path.exists(directory): | ||
for file_name in os.listdir(directory): | ||
if file_name.endswith('.yaml') or file_name.endswith('.yml'): | ||
file_path = os.path.join(directory, file_name) | ||
yaml_content = load_yaml(file_path) | ||
all_tags.update(extract_tags(yaml_content)) | ||
all_licenses.update(extract_licenses(yaml_content)) | ||
all_types.update(extract_types(yaml_content)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, reuse code instead of implementing the same thing multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahhh, you can delete this script, or I delete it. We don't need it anymore, I mean the scripts tags_type_licence_retrieve.py, I have already integrated it into the appsubmitter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please delete it. It's your branch.
tags_output_file = os.path.abspath(os.path.join(os.path.dirname(__file__), 'tags.txt')) | ||
licenses_output_file = os.path.abspath(os.path.join(os.path.dirname(__file__), 'licenses.txt')) | ||
types_output_file = os.path.abspath(os.path.join(os.path.dirname(__file__), 'types.txt')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These three text files, what do we need them for?
… "def load_dataframe(directory_path):" and "def read_yaml_file(filename): "
I have updated them all, so you can check them out now if you have time. :) |
This PR also close #193 |
Hi@haesleinhuepf, you can check it here, I have finished it |
… change everything Result example here: https://github.com/NFDI4BIOIMAGE/training/pull/198/files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @SeverusYixin ,
thanks for working on this! I left some comments below. Note: I will do these final modifications before merging this.
Thanks again!
docs/_toc.yml
Outdated
@@ -7,6 +7,7 @@ parts: | |||
- caption: Contributing | |||
chapters: | |||
- file: contributing/index | |||
- file: contributing/index_2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to rename this file. "index_2" is not meaningful.
@@ -89,11 +89,11 @@ def all_content(directory_path): | |||
for filename in os.listdir(directory_path): | |||
if filename.endswith('.yml'): | |||
print("Adding", filename) | |||
new_content = read_yaml_file(directory_path + filename) | |||
new_content = read_yaml_file(os.path.join(directory_path, filename)) # Corrected line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there a comment "# Corrected line" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That time, I tried this code to solve the problem, and write a comment to tell you. With this update, it will work again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not spread such comments in the code, they will accumulate over time and it will be hard to read. Consider using permalinks instead, like this one:
https://github.com/NFDI4BIOIMAGE/training/blob/04cd328841d9b251e7eaf79f3b3b78ff94329687/scripts/generate_link_lists.py#L92
Views like this:
training/scripts/generate_link_lists.py
Line 92 in 04cd328
new_content = read_yaml_file(os.path.join(directory_path, filename)) # Corrected line |
And is made by clicking here:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will create a branch to delete this comment
scripts/appsubmitter.py
Outdated
def all_content(directory_path): | ||
""" | ||
Load all YAML data from a directory into a list of dictionaries. | ||
|
||
Args: | ||
directory_path (str): Path to the directory containing YAML files. | ||
|
||
Returns: | ||
dict: Dictionary with a key 'resources' containing a list of all resources from YAML files. | ||
""" | ||
resources = [] | ||
for yaml_file in os.listdir(directory_path): | ||
if yaml_file.endswith('.yml'): | ||
yaml_data = read_yaml_file(os.path.join(directory_path, yaml_file)) | ||
if 'resources' in yaml_data: | ||
resources.extend(yaml_data['resources']) | ||
return {'resources': resources} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is not used. I will remove it.
scripts/appsubmitter.py
Outdated
from _github_utilities import get_github_repository | ||
|
||
# Correct the directory path for resources | ||
directory_path = os.path.join('..', 'resources') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will only work if you call the script from within the path "scripts/". You can determine the path, where appsubmitter.py is located using __file__
. I will implement this for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got it, thank you:)
This webapp program contains a form (urls, author, description, license etc) and once the user clicks "submit", all things can be easily uploaded to github.
To protect the security, the Github API key must be updated every 30 days.
This PR closes #94