Contextually show where links can be found in the Wikipedia pages themselves #39
Comments
Thanks for the suggestion! I agree it would be a cool feature, but given the data source I'm using, it is not really easy to do. I don't ever actually see the full text of the Wikipedia page itself, just the Wikipedia database containing all the links. So I can't easily show you the context around where the link shows up in the actual page. Also, since the database is only updated monthly, it is possible the link is actually no longer on the page itself, as it may have been edited since the latest database dump. Maybe I'll figure out a way to do this in the future, but for now, this is not feasible with my current architecture.
Couldn't you fetch the HTML of the page?
I definitely could try something like that and I honestly think that is the way this would need to be implemented. But it wouldn't be very efficient and the system currently doesn't ever look at the raw HTML.
Also, it would be better than needing to dump the database many times; it would be automatic.
There is no way to do the actual search algorithm using live pages as it would take way too long. Thousands to tens of thousands of pages need to be touched. What I was referring to was just pulling the context for a single page when you, for example, click on it in the graph view.
Yep
Maybe you could look through the HTML after the search has completed. Then do some web scraping to look for the link on the page and return the title of the section or subsection it was found in.
This is what I was suggesting: a way SDOW could go to the live Wikipedia page and search for each link, then return the parent header of that link.
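For anyone who wants to experiment with the idea in the last two comments, here is a rough sketch of the "find the link, return its parent header" step. It is not how SDOW works today. It assumes the page HTML has already been fetched somehow, that internal links look like `/wiki/Page_title`, and that sections are marked by `h1` to `h6` tags. The names `LinkSectionFinder` and `find_link_sections` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class LinkSectionFinder(HTMLParser):
    """Remember the most recent heading and record it when the target link appears."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self, target_title):
        super().__init__()
        # Assumption: page titles appear in hrefs with underscores for spaces.
        self.target = target_title.replace(" ", "_")
        self.current_heading = None  # None means "before the first heading"
        self.sections = []           # headings under which the link was found
        self._in_heading = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._in_heading = True
            self._buffer = []
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("/wiki/"):
                # Drop any #fragment and decode percent-escapes before comparing.
                page = unquote(href[len("/wiki/"):].split("#", 1)[0])
                if page == self.target and self.current_heading not in self.sections:
                    self.sections.append(self.current_heading)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._in_heading:
            self._in_heading = False
            self.current_heading = "".join(self._buffer).strip()

    def handle_data(self, data):
        if self._in_heading:
            self._buffer.append(data)


def find_link_sections(html, target_title):
    """Return the headings (in page order) under which a link to target_title occurs."""
    parser = LinkSectionFinder(target_title)
    parser.feed(html)
    parser.close()
    return parser.sections


if __name__ == "__main__":
    sample = """
    <p>Intro with <a href="/wiki/Kevin_Bacon">Kevin Bacon</a>.</p>
    <h2>Career</h2>
    <p>No link here.</p>
    <h2>Legacy</h2>
    <p>See <a href="/wiki/Kevin_Bacon#Early_life">him</a> again.</p>
    """
    print(find_link_sections(sample, "Kevin Bacon"))
```

A `None` entry in the result means the link sits in the lead section, before any heading. An empty list means the link was not found, which could happen if the page was edited after the latest database dump.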
That would be very nice, since I cannot find any of the links shown in the results on either of the pages requested.
It would help to show where the links were found, because sometimes I can't find where a link is on the page.