Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

appssubmitter finial version #128

Merged
merged 24 commits into from
Sep 9, 2024
Merged

appssubmitter finial version #128

merged 24 commits into from
Sep 9, 2024

Conversation

SeverusYixin
Copy link
Collaborator

@SeverusYixin SeverusYixin commented Jul 19, 2024

This webapp program contains a form (urls, author, description, license etc) and once the user clicks "submit", all things can be easily uploaded to github.

To protect the security, the Github API key must be updated every 30 days.

This PR closes #94

Copy link
Member

@haesleinhuepf haesleinhuepf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @SeverusYixin ,

awesome, thanks for sending this! Next time, it would be great if you could write a sentence, what this PR is about. Furthermore, I modified the sentence "closes #94" to your message. This will later close the corresponding issue, once this is merged.

May I also ask you for adding some documentation to our jupyter book? It would be great to have short user guide for your app with some screenshots. See this page as example.

Thank you!

Best,
Robert

@SeverusYixin
Copy link
Collaborator Author

Hi @SeverusYixin ,

awesome, thanks for sending this! Next time, it would be great if you could write a sentence, what this PR is about. Furthermore, I modified the sentence "closes #94" to your message. This will later close the corresponding issue, once this is merged.

May I also ask you for adding some documentation to our jupyter book? It would be great to have short user guide for your app with some screenshots. See this page as example.

Thank you!

Best, Robert

Ofc no problem, but which kinds of the documentation did you want me to add to our jupyter book? And I have a screeunshot of the APP, where should I upload it?
appsubmitter

@haesleinhuepf
Copy link
Member

where should I upload it?

As a new page next to this one, certainly in the same folder.

@haesleinhuepf
Copy link
Member

haesleinhuepf commented Jul 23, 2024

Hey @SeverusYixin ,

ok, this is super super cool!!

May I ask for some minor modifications:

  • Remove the submitter name
  • Users should select the license from a list (e.g. the SPDX list)
  • Users should select the tags from a list (they can select multiple), and there should be an option to add new tags
  • Users should select the type from a list
  • Do not create a github issue, the pull-request is enough.
  • Change the target repository to NFDI4BioImage/training
  • Test

Big thanks!

@SeverusYixin
Copy link
Collaborator Author

Hi @haesleinhuepf, if you can show me the current exact license list, tags list and type list it would be so helpful. Or can I just use the things that show up on the left side of https://nfdi4bioimage.github.io/training/readme.html directly?

@haesleinhuepf
Copy link
Member

@SeverusYixin

For the license list, let's use the SPDX list you were using in #127 (so we don't reinvent the wheel here) :

SPDX_LICENSE_LIST_URL = "https://spdx.org/licenses/licenses.json"

The current lists of tags and content types can be retrieve as shown in #130 :
https://github.com/NFDI4BIOIMAGE/training/blob/7045d4888217d6b73c53080ba42d020475dd41fd/scripts/tag_statistics.ipynb

Let me know if you need anything else!

@SeverusYixin
Copy link
Collaborator Author

@SeverusYixin

For the license list, let's use the SPDX list you were using in #127 (so we don't reinvent the wheel here) :

SPDX_LICENSE_LIST_URL = "https://spdx.org/licenses/licenses.json"

The current lists of tags and content types can be retrieve as shown in #130 : https://github.com/NFDI4BIOIMAGE/training/blob/7045d4888217d6b73c53080ba42d020475dd41fd/scripts/tag_statistics.ipynb

Let me know if you need anything else!

hmmm, that means I need retrieve the license by myself, am I right?

And for https://github.com/NFDI4BIOIMAGE/training/blob/7045d4888217d6b73c53080ba42d020475dd41fd/scripts/tag_statistics.ipynb, it looks like only have the tags list, am I right?

@haesleinhuepf
Copy link
Member

hmmm, that means I need retrieve the license by myself, am I right?

Yes

it looks like only have the tags list, am I right?

Yes!

…me Users should select the license from a list (e.g. the SPDX list) Users should select the tags from a list (they can select multiple), and there should be an option to add new tags Users should select the type from a list Do not create a github issue, the pull-request is enough."
@SeverusYixin
Copy link
Collaborator Author

Hi @haesleinhuepf, I have a question about URL,Description,Name,Authors. if the users add nothing to these, should they still be allowed to upload, or if they didn't enter the authors, should the submit be denied? For License, Tags and Type, the user chooses at least one, and I add an unknown choice for them. The yaml file also needs to be selected as it can be submitted.

…on. User should select license from a list (e.g. SPDX list) User should select tags from the list (multiple selections are possible) and should be given the option to add new tags The user should select the type from the list Don't create github issues, pull requests are sufficient. Change the target repository to NFDI4BioImage/training Testing” and also add documentation to our jupyter book
@SeverusYixin
Copy link
Collaborator Author

Hi @haesleinhuepf, me again :) Could you review this when you have a moment? And enjoy your holiday :)

Copy link
Member

@haesleinhuepf haesleinhuepf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @SeverusYixin ,

ok cool, the app looks great! I just tried it and have some minor requests:

  • It would be great if we could add tags which are not in the list of available tags:
    image

  • make sure the changes are made on the most recent files. I just submitted a PR using the app (it worked 🥳 ) but it suggests many changes. I could imagine that this happens if my local repository is not up-to-date. Details. But we can pull the most recent file from github, before modifying the file e.g. using this code.

  • And below some suggestions in the code.

Otherwise, it's looking great!

Best,
Robert

Comment on lines 106 to 109
Steps:
1. Select the YAML file for the type of training material.
2. Provide the necessary information.
3. Submit the form to create a pull request.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Steps:
1. Select the YAML file for the type of training material.
2. Provide the necessary information.
3. Submit the form to create a pull request.
Steps:
1. Select the YAML file for the type of training material.
2. Provide the necessary information.
3. Submit the form to create a pull request.

I suggest to remove this. It feels like a duplication of the sentence above and form entries below.

Comment on lines 116 to 119
"AGPL-3.0", "All-rights-reserved", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "BSD-License", "CC0",
"CC0-1.0", "CC0-Licence (with some variations noted in dataset attributes)", "CC-BY-4.0", "CC-BY-NC-4.0",
"CC-BY-NC-ND-3.0-DEED", "CC-BY-ND-SA-4.0", "CC-BY-SA-4.0", "GPL-2.0", "GPL-3.0", "MIT", "ODC-By-1.0",
"Public-domain"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use the fetch_spdx_function from here?

def fetch_spdx_licenses():

Comment on lines 125 to 135
"Arc", "Artificial Intelligence", "Big Data", "Bio-Image Analysis", "Bioimage Analysis", "Bioimage Data",
"Bioinformatics", "Cellpose", "Cellprofiler", "Citing", "Clesperanto", "Conda", "Dask", "Data Analysis",
"Data Protection", "Data Science", "Data Stewardship", "Dataplant", "Deep Learning", "Deployment",
"Diffusion Models", "Elastix", "Fair", "Fair-Principles", "Fiji", "Flim", "Git", "Github", "Gpu",
"Hackathon", "I3Dbio", "Image Data Management", "Image Registration", "Image Segmentation", "Imagej",
"Imagej Macro", "Itk", "Large Language Models", "Licensing", "Machine Learning", "Mamba", "Meta Data",
"Metadata", "Microscopy Image Analysis", "Napari", "Neubias", "Nfdi4Bioimage", "Notebook", "Notebooks",
"Omero", "Open Access", "Open Science", "Python", "Pytorch", "Quareo-Limi", "R", "Research Data Management",
"Science Communication", "Segmentation", "Sharing", "Teaching", "Tu Dresden", "Workflow", "Workflow Engine",
"Zarr", "Zenodo"
])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we determine this list dynamically from the yml files? E.g. as shown in the notebook here:

https://github.com/NFDI4BIOIMAGE/training/blob/7045d4888217d6b73c53080ba42d020475dd41fd/scripts/tag_statistics.ipynb

Comment on lines 140 to 143
"Big Data", "Bioimage Analysis", "Blog", "Book", "Code", "Collection", "Conference Abstract", "Data",
"Dataset", "Document", "Documentation", "Event", "Forum Post", "Github Repository", "Jupyter Book",
"Notebook", "Online Course", "Online Tutorial", "Poster", "Practicals", "Preprint", "Presentation",
"Publication", "Publiction", "Python", "Slide", "Slides", "Tutorial", "Video", "Videos", "Workshop"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. This list should be determined dynamically from the yml files.


# List of YAML files
yaml_list = sorted([
'blog_posts.yml', 'events.yml', 'materials.yml', 'nfdi4bioimage.yml', 'papers.yml', 'workflow-tools.yml', 'youtube_channels.yml'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we determine this list using os.listdir?

@SeverusYixin
Copy link
Collaborator Author

Hi @haesleinhuepf , me again :). If you have a moment, you can run this code for me to see if it works. Especially if the yml files are merged, does it still work on your side, so far no problem on mine.

Also:
"normalisation_mapping = {}" function, will be removed after I update data_normalizer.py.

One question:
Since we have merged all the yml files, do we still need the 'YAML file selecter'? I think we can export it directly to the specified folder or file.

@SeverusYixin
Copy link
Collaborator Author

It seems I don't have permission to change the data source? Would you mind changing it for me?

@haesleinhuepf
Copy link
Member

I'm not sure what you mean with "data source". Please let me know what exactly should be changed. Consider using http-links to point at code pecisely...

@SeverusYixin
Copy link
Collaborator Author

I'm not sure what you mean with "data source". Please let me know what exactly should be changed. Consider using http-links to point at code pecisely...

I mean for this:
"This branch has conflicts that must be resolved
Use the web editor or the to resolve conflicts.
Conflicting files
resources/nfdi4bioimage.yml".

It seems I'm not allowed to update it.

@haesleinhuepf
Copy link
Member

ok, done

@SeverusYixin
Copy link
Collaborator Author

ok, done

Thank you :)

@SeverusYixin
Copy link
Collaborator Author

Hi @haesleinhuepf, I have already updated it with the docstring. Take your time and check it out.

Copy link
Member

@haesleinhuepf haesleinhuepf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @SeverusYixin ,

thanks for working on this! It would be great if you could reduce code-duplication, before we merge this. Instead of repeating similar functionality, consider re-using functions. This will result in much shorter code, which is easier to maintain obviously.

Thanks!

Best,
Robert

Comment on lines 2 to 26
# How to contribute

This repository contains lists of training materials. It is extensible using GitHub pull requests. You can find a how-to guide at the bottom of this page. The format for entries in the repository is documented on the next page.

## Quick contributing shortcut:

If you're too busy to enter everything in detail yourself, please create a [GitHub issue](https://github.com/NFDI4BIOIMAGE/training/issues) with a link to the materials you want to include in our list. We can take care of all the details.

## What to contribute

Consider adding your favorite training materials and resources. If you know a collection of resources, add it, but do not add all individual entries of the collection. We are working on collecting them automatically ([more information](https://github.com/NFDI4BIOIMAGE/training/issues/2)). However, if there are specific entries in such a collection that you think are particularly valuable, feel free to add them now.

## Inclusion criteria

We will consider merging links to all educative materials related to research data management, especially in the bio-imaging context, and bio-image analysis.

We would like to collect links to resources in various formats/content types, including slides, posters, publications, blog posts, example data, collections (of links to other materials), and more.

## Exclusion criteria

We will only merge links to materials behind a paywall in exceptional cases. We will also not merge links to materials that primarily advertise commercial products. However, if there are openly accessible training resources for commercial software, we welcome links to these resources.

## Maintenance of contributions

We reserve the right to remove and modify entries in this collection at any time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicated text from the page before. Please only document your AppSubmitter in this page.

Comment on lines 8 to 19
def load_yaml_data(file_path):
"""
Load YAML data from a file.

Args:
file_path (str): Path to the YAML file.

Returns:
dict: Parsed YAML data.
"""
with open(file_path, 'r') as file:
return yaml.safe_load(file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you not using this function?

def read_yaml_file(filename):

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return sorted(unique_tags), sorted(unique_types), sorted(unique_licenses)


def get_yaml_files(resources_dir):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you not using this function?

def load_dataframe(directory_path):

It gives you a DataFrame with all contents, which you can filter for unique entries in specific columns: df["license"].unique()

Comment on lines 77 to 97
def authenticate_with_github():
"""
Authenticate with GitHub using the API key from environment variables.

Returns:
tuple: Authenticated GitHub instance and repository object.

Raises:
Exception: If authentication fails.
"""
GITHUB_API_KEY = os.getenv('GITHUB_API_KEY')
if not GITHUB_API_KEY:
st.error("GitHub API Key is not set in the environment variables.")
st.stop()

try:
g = Github(GITHUB_API_KEY)
repo = g.get_repo("NFDI4BIOIMAGE/training")
except Exception as e:
st.error(f"Failed to authenticate with GitHub: {e}")
st.stop()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using this function instead:

def get_github_repository(repository):

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see your changes here until you commit and push them

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll push it after I'm done: "It gives you a DataFrame with all content, which you can filter for unique entries in specific columns: df["license"].unique()"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see your changes here until you commit and push them

I marked it to told myself this is finished XD

import yaml
import os

def load_yaml(file_path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, consider using pre-existing functions instead of duplicating code.

Comment on lines 8 to 31
def extract_tags(yaml_content):
tags = set()
for item in yaml_content.get('resources', []):
if 'tags' in item:
tags.update(item['tags'])
return tags

def extract_licenses(yaml_content):
licenses = set()
for item in yaml_content.get('resources', []):
if 'license' in item:
license_field = item['license']
if isinstance(license_field, list):
licenses.update(license_field)
else:
licenses.add(license_field)
return licenses

def extract_types(yaml_content):
types = set()
for item in yaml_content.get('resources', []):
if 'type' in item:
types.update(item['type'])
return types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, consider re-using pre-existing functions, e.g. for retrieving a dataframe and then determine unique items in columns.

Comment on lines 33 to 44
def collect_tags_licenses_and_types_from_files(directory):
all_tags = set()
all_licenses = set()
all_types = set()
if os.path.exists(directory):
for file_name in os.listdir(directory):
if file_name.endswith('.yaml') or file_name.endswith('.yml'):
file_path = os.path.join(directory, file_name)
yaml_content = load_yaml(file_path)
all_tags.update(extract_tags(yaml_content))
all_licenses.update(extract_licenses(yaml_content))
all_types.update(extract_types(yaml_content))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned above, reuse code instead of implementing the same thing multiple times.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh, you can delete this script, or I delete it. We don't need it anymore, I mean the scripts tags_type_licence_retrieve.py, I have already integrated it into the appsubmitter

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please delete it. It's your branch.

Comment on lines 57 to 59
tags_output_file = os.path.abspath(os.path.join(os.path.dirname(__file__), 'tags.txt'))
licenses_output_file = os.path.abspath(os.path.join(os.path.dirname(__file__), 'licenses.txt'))
types_output_file = os.path.abspath(os.path.join(os.path.dirname(__file__), 'types.txt'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three text files, what do we need them for?

@SeverusYixin
Copy link
Collaborator Author

I have updated them all, so you can check them out now if you have time. :)

@SeverusYixin
Copy link
Collaborator Author

This PR also close #193

@SeverusYixin
Copy link
Collaborator Author

Hi@haesleinhuepf, you can check it here, I have finished it

Copy link
Member

@haesleinhuepf haesleinhuepf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @SeverusYixin ,

thanks for working on this! I left some comments below. Note: I will do these final modifications before merging this.

Thanks again!

docs/_toc.yml Outdated
@@ -7,6 +7,7 @@ parts:
- caption: Contributing
chapters:
- file: contributing/index
- file: contributing/index_2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to rename this file. "index_2" is not meaningful.

@@ -89,11 +89,11 @@ def all_content(directory_path):
for filename in os.listdir(directory_path):
if filename.endswith('.yml'):
print("Adding", filename)
new_content = read_yaml_file(directory_path + filename)
new_content = read_yaml_file(os.path.join(directory_path, filename)) # Corrected line
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a comment "# Corrected line" ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That time, I tried this code to solve the problem, and write a comment to tell you. With this update, it will work again.

Copy link
Member

@haesleinhuepf haesleinhuepf Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not spread such comments in the code, they will accumulate over time and it will be hard to read. Consider using permalinks instead, like this one:

https://github.com/NFDI4BIOIMAGE/training/blob/04cd328841d9b251e7eaf79f3b3b78ff94329687/scripts/generate_link_lists.py#L92

Views like this:

new_content = read_yaml_file(os.path.join(directory_path, filename)) # Corrected line

And is made by clicking here:

image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will create a branch to delete this comment

Comment on lines 15 to 31
def all_content(directory_path):
"""
Load all YAML data from a directory into a list of dictionaries.

Args:
directory_path (str): Path to the directory containing YAML files.

Returns:
dict: Dictionary with a key 'resources' containing a list of all resources from YAML files.
"""
resources = []
for yaml_file in os.listdir(directory_path):
if yaml_file.endswith('.yml'):
yaml_data = read_yaml_file(os.path.join(directory_path, yaml_file))
if 'resources' in yaml_data:
resources.extend(yaml_data['resources'])
return {'resources': resources}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not used. I will remove it.

from _github_utilities import get_github_repository

# Correct the directory path for resources
directory_path = os.path.join('..', 'resources')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will only work if you call the script from within the path "scripts/". You can determine the path, where appsubmitter.py is located using __file__. I will implement this for you.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it, thank you:)

@haesleinhuepf haesleinhuepf merged commit 074effd into main Sep 9, 2024
1 check passed
@haesleinhuepf haesleinhuepf deleted the app_for_adding_entries branch September 9, 2024 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

App for adding entries
2 participants