Print summary until end of paragraph #51

Open
mrclean789 opened this issue Sep 1, 2022 · 4 comments

Comments

@mrclean789

Is there a better way to get the page summary so that it doesn't get cut off? For example, from the start to the end of the first paragraph, or the first two paragraphs.

@L1mak

L1mak commented Nov 15, 2022

You will have to use the nltk package (try its tokenization functions) or another natural language processing library for this.

@martin-majlis
Owner

@mrclean789: Do you have an example of when this happens?

Based on the API documentation, there could be some limitation - https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bextracts

@martin-majlis
Owner

When I try the underlying API call - https://en.wikipedia.org/w/api.php?action=query&explaintext=1&exsectionformat=wiki&prop=extracts&titles=Planet& - it looks to me like there is no restriction on the length of the response.
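
For anyone who wants to reproduce that call, here is a minimal sketch that builds the same query URL with the standard library. The helper name `build_extract_url` and the `intro_only` switch are made up for illustration. `format=json` and `exintro` are parameters of the MediaWiki extracts module that are not in the URL above, and whether `exintro` gives the paragraph boundaries asked for here is an assumption to check against the API help page.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title, intro_only=False):
    """Build an extracts query URL (hypothetical helper, not part of the library)."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,
        "exsectionformat": "wiki",
        "titles": title,
        "format": "json",  # assumption: ask for JSON instead of the HTML help view
    }
    if intro_only:
        # assumption: exintro limits the extract to the text before the first section
        params["exintro"] = 1
    return API_ENDPOINT + "?" + urlencode(params)


print(build_extract_url("Planet"))
print(build_extract_url("Planet", intro_only=True))
```

Opening the printed URLs in a browser makes it easy to compare the full extract with the intro-only one.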

@psmatter

psmatter commented Dec 1, 2023

With the HTML format, the summary correctly encloses the paragraphs in <p></p>.
With the WIKI format, the next paragraph is concatenated without a space or newline, like:
sentence1p1. sentence2p1.sentence1p2
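
Since the HTML format keeps the <p></p> boundaries, one possible workaround is to take the HTML summary and keep only the first few paragraphs. The sketch below uses only the standard library and assumes you already have the summary as an HTML string. `ParagraphCollector`, `first_paragraphs`, and the sample string are made up for illustration and are not part of the library.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text content of each <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = "".join(self._buffer).strip()
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested inline tags such as <b> is kept, the tags are dropped.
        if self._in_p:
            self._buffer.append(data)


def first_paragraphs(html_summary, count=1):
    """Return the first `count` non-empty paragraphs, separated by blank lines."""
    collector = ParagraphCollector()
    collector.feed(html_summary)
    collector.close()
    return "\n\n".join(collector.paragraphs[:count])


# Made-up sample mirroring the example above.
sample = "<p>sentence1p1. <b>sentence2p1.</b></p><p>sentence1p2</p>"
print(first_paragraphs(sample, count=1))
```

Passing `count=2` would return the first two paragraphs joined by a blank line, which covers the "first two paragraphs" case from the original question.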
