Publish Download and Citations Data #933

Open
Ly0n opened this issue Nov 5, 2024 · 3 comments

@Ly0n
Member

Ly0n commented Nov 5, 2024

We've got some great data that we should definitely share: download numbers for almost all of the larger package managers (except Julia) listed on OpenSustain.tech. Here is an early prototype playground for exploring the dataset. Just open the notebook in Colab and you will see first ideas for interactive plots that we could use for the blog post.
https://github.com/protontypes/osta/blob/main/packages_insights.ipynb

As discussed in the core meeting, it makes sense to start with a blog post and then work on a paper that we then publish together (ideally with peer review).

We also have citations for many projects, but the data here is often not valid. Maybe we can also get these numbers right. At the moment I see 2 problems with the citation numbers:

  1. We do not get citation numbers via OpenAlex for DOIs that are created via Zenodo. (A lot of open source projects have a DOI from Zenodo.)
  2. Just because a DOI is linked in the README does not mean that the people behind the open source project are affiliated with the paper that is being referenced. A possible solution for this problem is mentioned here: Comparison of repo authors and DOI authors #819
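
As a rough illustration of the author comparison idea in problem 2, here is a minimal Python sketch. The function names, the name normalization rule, and the 0.25 threshold are all assumptions made for illustration; they are not taken from #819 or from the notebook, and real author matching would likely need ORCID IDs or fuzzier matching.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace so names compare loosely."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def author_overlap(repo_authors, doi_authors) -> float:
    """Return the fraction of DOI authors who also appear among the repo authors."""
    repo = {normalize_name(n) for n in repo_authors}
    doi = {normalize_name(n) for n in doi_authors}
    if not doi:
        return 0.0
    return len(repo & doi) / len(doi)


def is_probably_affiliated(repo_authors, doi_authors, threshold=0.25) -> bool:
    """Hypothetical rule: treat the DOI as the project's own if enough authors match."""
    return author_overlap(repo_authors, doi_authors) >= threshold
```

With this rule, a DOI whose author list shares no names with the repository contributors would be flagged as a likely third-party reference, so its citation count would not be attributed to the project.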
@RichardLitt
Contributor

Thanks. Setting expectations: I won't get to this before early next week.

@RichardLitt
Contributor

I started an Overleaf, and then spent ten minutes fiddling with the template before I remembered it really doesn't matter. What matters is the ability, at this point, to get stuff down and to move things around and to just write. We can edit and format later. So, I set up a Google Doc.

https://docs.google.com/document/d/12jn8Xfhkf6IMUa_dhkIO9Qw1VPU0F_pkUgKBQuJ2WkE/edit?usp=sharing

More coming. Feel free to jump in at any point.

@Ly0n
Member Author

Ly0n commented Nov 12, 2024

There is another indicator that we should investigate to measure the distribution/usage of an open source project: the number of "external" issues opened per time interval by people who are not maintainers or main contributors of the project. It should be easy to derive such an indicator from the given data.
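
A minimal sketch of how such an indicator could be computed, assuming the issue data is available as a list of records with an author login and an ISO 8601 creation timestamp. The field names `author` and `created_at`, the monthly interval, and the function name are assumptions for illustration, not the actual schema of the dataset.

```python
from collections import Counter
from datetime import datetime


def external_issues_per_month(issues, insiders):
    """Count issues per calendar month opened by people outside `insiders`.

    `issues` is an iterable of dicts with hypothetical keys "author" and
    "created_at" (ISO 8601 without a trailing "Z"); `insiders` is the set of
    maintainer and main contributor logins.
    """
    insiders_lower = {login.lower() for login in insiders}
    counts = Counter()
    for issue in issues:
        # Skip issues opened by maintainers or main contributors.
        if issue["author"].lower() in insiders_lower:
            continue
        created = datetime.fromisoformat(issue["created_at"])
        counts[created.strftime("%Y-%m")] += 1
    # Sort by month so the result reads as a time series.
    return dict(sorted(counts.items()))
```

How "main contributor" is defined (for example by commit share) is left open here; the sketch only needs the resulting set of logins.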
