Scraping to find new events #2

Open
bahorn opened this issue Jan 25, 2021 · 4 comments

Comments

Member

bahorn commented Jan 25, 2021

As mentioned in the readme, I'm interested in maintaining more of an effort to scrape sites and such to discover new events.

One approach that reuses data we already collect is the tech society GitHub list.

Since GitHub provides Atom feeds, we can track changes across all of a society's repositories, and so see when their sites change, just by appending ".atom" to the URL.

It's also a bit sneaky: we can catch things before they're publicly announced :)
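
As a rough illustration of the ".atom" idea, here is a minimal sketch that builds a feed URL from a GitHub page URL. The function name `atom_feed_url` and the `example-society` organization are hypothetical, chosen only for this example; whether the resulting feed is useful for change tracking is a separate question.

```python
def atom_feed_url(page_url: str) -> str:
    """Build an Atom feed URL by appending '.atom' to a GitHub page URL.

    A trailing slash is stripped first so the suffix attaches to the name.
    """
    return page_url.rstrip("/") + ".atom"


# "example-society" is a placeholder, not an organization from the list.
print(atom_feed_url("https://github.com/example-society"))
```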

bahorn changed the title from "Discovering new events." to "Scrapping to find new events" on Jan 25, 2021
jmsv changed the title from "Scrapping to find new events" to "Scraping to find new events" on Jan 28, 2021
Member Author

bahorn commented Feb 4, 2021

So, I set up a channel called events in the Gitter community, which is where notifications from the bot should be posted.

Member Author

bahorn commented Feb 5, 2021

OK, so I tried the approach of appending .atom to organization URLs to get updates.

That doesn't work, because apparently the feed is dated from when it was last generated? You can see my attempt in the current revision, 8f6f0a4.

The best approach now would be to use an API token to list all of an organization's repos and check whether anything new has turned up (like a 2021 event repo) or whether known website repositories have changed.
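
A minimal sketch of that idea, not the code in the repository: it lists an organization's repositories through the GitHub REST API and compares them against a stored snapshot. The function names, the snapshot format (a dict of repo name to last `pushed_at` value), and the choice to fetch only the first page of results are all assumptions made for illustration.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def list_org_repos(org, token):
    """Fetch the first page (up to 100) of an organization's repositories."""
    request = urllib.request.Request(
        f"{API_ROOT}/orgs/{org}/repos?per_page=100",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def diff_repos(known, listed):
    """Compare a stored snapshot against a fresh repository listing.

    known:  dict mapping repo name -> pushed_at seen on the previous run
            (a hypothetical storage format).
    listed: list of repo objects as returned by list_org_repos.
    Returns (new_names, changed_names).
    """
    new, changed = [], []
    for repo in listed:
        name = repo["name"]
        if name not in known:
            new.append(name)
        elif known[name] != repo["pushed_at"]:
            changed.append(name)
    return new, changed
```

Calling `diff_repos(snapshot, list_org_repos(org, token))` on each run would then surface both a newly created event repo and a push to a known website repo.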

Member Author

bahorn commented Feb 5, 2021

https://api.github.com/users/{org}/events

This endpoint lists all recent events for that organization, and I just pushed code that uses it. It just needs scheduling now. I'll probably aim to run it on Wednesdays at 4am, which should capture most new stuff.
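
For illustration, a small sketch of polling that endpoint and keeping only events newer than the previous run. This is not the pushed code; the function names and the `last_seen` timestamp argument are assumptions, and the request is unauthenticated for brevity.

```python
import json
import urllib.request


def events_url(org):
    """Return the events endpoint URL for an organization."""
    return f"https://api.github.com/users/{org}/events"


def fetch_events(org):
    """Fetch the organization's recent public events as a list of dicts."""
    with urllib.request.urlopen(events_url(org)) as response:
        return json.load(response)


def summarize_new_events(events, last_seen):
    """Return "<type> <repo>" summaries for events created after last_seen.

    last_seen is an ISO 8601 UTC timestamp such as "2021-02-04T00:00:00Z";
    timestamps in this format compare correctly as plain strings.
    """
    return [
        f"{event['type']} {event['repo']['name']}"
        for event in events
        if event["created_at"] > last_seen
    ]
```

If the job is scheduled with cron, an entry of `0 4 * * 3` would run it on Wednesdays at 04:00; the thread does not say which scheduler is actually used.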

Member Author

bahorn commented Feb 5, 2021

Now scheduled, and it seems to work!

I'll keep this issue open for progress on the other methods of scraping.
