Expand on what this library can be used for

TeamHG-Memex · May 26, 2017 · c7ebb57
1 parent c5cc532
Showing 1 changed file with 8 additions and 0 deletions.
diff --git a/README.rst b/README.rst
@@ -26,6 +26,14 @@ or ``.get_text()`` from Beautiful Soup?
 Text extracted with ``html_text`` does not contain inline styles,
 javascript, comments and other text that is not normally visible to the users.
 
+Apart from extracting text from a page (e.g. for display or search),
+one intended use of this library is feature extraction for machine learning.
+If you want to use the text of an HTML page as a feature (e.g. for classification),
+this library gives you plain text that you can then feed into a standard text
+classification pipeline.
+If you also need the HTML structure, check out the
+`webstruct <http://webstruct.readthedocs.io/en/latest/>`_ library.
+
 
 Install
 -------