- Made lxml an optional dependency
- Fixed HTML-comments handling
- Fixed a bug: a quote after an opening brace was considered a closing brace
- Fixed lang attribute handling
- Handling of non-typographed tags such as <code>, <pre>, etc.
- A complete list of block tags
- Made the lists of block and ignored tags customisable
- Fixed a bug with month name recognition for Russian.
- Added support for multiline text (separated by line breaks).
- Fixed the case when a hyphen is the last character.
- Fixed the case with a quote at the beginning of a string.
- Recognize the № sign as a particle in Russian.
Minor features:
- Dropped six dependency.