Many generated sentences contain unbalanced punctuation/markdown #1

Open
Deimos opened this issue Aug 19, 2015 · 0 comments
Deimos commented Aug 19, 2015

By default, markovify throws out any sentence that includes quotes, parentheses, or square brackets, because they tend to end up unbalanced in the generated sentences. I overrode that behavior because it was removing a huge number of sentences from the training data, like almost every single title in /r/relationships and most comments from /r/scenesfromahat. But by doing that I've ended up with the result it was trying to avoid: a lot of unmatched punctuation in the output.

Main things to try to fix with this:

  • Quotes - both double-quotes and single-quotes (need to distinguish from apostrophes)
  • Parentheses
  • Square brackets (especially as markdown link text)
  • Asterisks being used for bold and italic markdown
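
One possible direction, sketched here only as an illustration and not as the fix this project uses: check each generated sentence for balance and discard the ones that fail. The helper name `is_balanced` and its rules are assumptions. It treats a single quote with a letter or digit on both sides as an apostrophe, requires brackets to nest properly, and requires each kind of asterisk run (`*`, `**`) to appear an even number of times. These are rough heuristics; trailing possessives such as "dogs'" would be counted as quotes.

```python
import re
from collections import Counter

# Hypothetical helper, not part of markovify or this project.
CLOSERS = {")": "(", "]": "["}
OPENERS = set(CLOSERS.values())

# Assumption: a single quote between two word characters is an apostrophe.
APOSTROPHE = re.compile(r"(?<=\w)'(?=\w)")


def is_balanced(sentence: str) -> bool:
    """Return True if quotes, brackets, and asterisk runs look balanced."""
    text = APOSTROPHE.sub("", sentence)

    # Parentheses and square brackets must nest correctly.
    stack = []
    for ch in text:
        if ch in OPENERS:
            stack.append(ch)
        elif ch in CLOSERS:
            if not stack or stack.pop() != CLOSERS[ch]:
                return False
    if stack:
        return False

    # Double and (remaining) single quotes must come in pairs.
    if text.count('"') % 2 or text.count("'") % 2:
        return False

    # Markdown emphasis: each run length (* or **) must occur an even number of times.
    runs = Counter(len(run) for run in re.findall(r"\*+", text))
    return all(count % 2 == 0 for count in runs.values())
```

A generator loop could then keep only the sentences for which `is_balanced(sentence)` is true, at the cost of occasionally retrying when a candidate is rejected.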
Deimos added the bug label Aug 19, 2015