Limit on repos for an org? Or bug? #9

Open
data-henrik opened this issue Feb 19, 2018 · 4 comments
Comments

@data-henrik

I tried reading stats for an org with many repos. The output stops after a chunk of repos. I haven't looked deeper into it. Is this a known limit, or a bug in the traffic API or in the Python code?

@nchah
Owner

nchah commented Feb 21, 2018

Hi Henrik,
I see two possible causes. One is that the API is changing as the REST API v3 is migrated to GraphQL v4: "Warning: The API may change without advance notice during the preview period." [1]

The other, which is at least as likely, is that the current implementation doesn't handle pagination [2]. To be honest, I didn't think that this script would be used for organizations with hundreds of repos! In fact, getting an organization's repos was added by another user in a relatively recent PR [3].

[1] https://developer.github.com/v3/repos/
[2] https://developer.github.com/v3/#pagination
[3] #8

@data-henrik
Author

Thank you, it seems that missing pagination support is causing it.

@mcauser

mcauser commented Mar 29, 2018

I hit the pagination issue too. I have around 180 repos and I'm only seeing stats for the first 30,
e.g. with gts 'mcauser' 'ALL' 'save_csv'

When requesting /user/repos there is a Link response header which shows the next page:

curl -i -H 'Authorization: token mytoken' https://api.github.com/user/repos

HTTP/1.1 200 OK
Link: <https://api.github.com/user/repos?page=2>; rel="next", <https://api.github.com/user/repos?page=12>; rel="last"

And after requesting the 2nd page:

curl -i -H 'Authorization: token mytoken' 'https://api.github.com/user/repos?page=2'

HTTP/1.1 200 OK
Link: <https://api.github.com/user/repos?page=1>; rel="prev", <https://api.github.com/user/repos?page=3>; rel="next", <https://api.github.com/user/repos?page=12>; rel="last", <https://api.github.com/user/repos?page=1>; rel="first"

And after requesting the last page:

curl -i -H 'Authorization: token mytoken' 'https://api.github.com/user/repos?page=12'

HTTP/1.1 200 OK
Link: <https://api.github.com/user/repos?page=11>; rel="prev", <https://api.github.com/user/repos?page=1>; rel="first"

It seems the fix for the repo == 'ALL' case is to extract the rel="next" URL from the Link header and call send_request() repeatedly until no next link remains, collecting all repo names, then loop over each.
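
For illustration, here is a minimal Python sketch of that loop. It is not the actual gts code: fetch_page is a hypothetical stand-in for whatever performs the HTTP request (for example a wrapper around send_request()), assumed to return the decoded list of repos together with the raw Link header value. The sketch only shows how the rel="next" URL can be pulled out of the header and followed until it disappears.

```python
import re

# Matches one entry of a Link header, e.g. <https://...?page=2>; rel="next"
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(value):
    """Return a dict mapping each rel name ("next", "last", ...) to its URL."""
    return {rel: url for url, rel in _LINK_RE.findall(value or "")}


def collect_all_repos(first_url, fetch_page):
    """Follow rel="next" links until none is left and return every repo name.

    fetch_page is a hypothetical callable (an assumption, not part of gts):
    it takes a URL and returns (list_of_repo_dicts, link_header_or_None).
    """
    names = []
    url = first_url
    while url:
        repos, link_header = fetch_page(url)
        names.extend(repo["name"] for repo in repos)
        # The last page has no rel="next" entry, so .get() returns None
        # and the loop ends.
        url = parse_link_header(link_header).get("next")
    return names
```

Passing the fetch function in as an argument keeps the pagination logic separate from authentication and headers, so it can be tried against canned responses before touching the real API.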

Edit: You can raise the page size from the default 30 to 100 by adding per_page=100 to the query string, e.g. ?page=2&per_page=100
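
As a small sketch of how the per_page parameter could be added without worrying about whether the URL already has a query string, something like the following would work. The helper name with_per_page is made up for this example and is not part of gts.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_per_page(url, per_page=100):
    """Return url with its per_page query parameter set (added or replaced)."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query["per_page"] = str(per_page)
    return urlunparse(parts._replace(query=urlencode(query)))
```

Existing parameters such as page are kept, so the helper can be applied to the first URL as well as to any later page URL.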

Also, it seems the spiderman-preview header isn't required anymore.
https://github.com/nchah/github-traffic-stats/blob/master/gts/main.py#L248
https://developer.github.com/changes/2016-08-15-traffic-api-preview/

@nchah
Owner

nchah commented Apr 7, 2018

Thanks all for documenting this issue. I've pushed some changes that fetch all of the repos, even the hundreds owned by organizations like 'IBM', 'Google', etc. To get the actual traffic stats for those repos, the user running gts needs push access to them, so I can't retrieve that data myself.


3 participants