 
 <br/>
 
-Find **original and updated publication dates** of any web page. **On
-the command-line or with Python**, all the steps needed from web page
-download to HTML parsing, scraping, and text analysis are included. The
-package is used in production on millions of documents and integrated by
-[multiple
-libraries](https://github.com/adbar/htmldate/network/dependents).
+Find **original and updated publication dates** of any web page.
+It is often not possible to do it using just the URL or the server response.
+
+**On the command-line or with Python**, all the steps needed from web page
+download to HTML parsing, scraping, and text analysis are included.
+
+The package is used in production on millions of documents and integrated into
+[thousands of projects](https://github.com/adbar/htmldate/network/dependents).
 
 
 ## In a nutshell
@@ -114,17 +116,20 @@ license](https://www.apache.org/licenses/LICENSE-2.0.html).
 
 Versions prior to v1.8.0 are under GPLv3+ license.
 
-## Author
+## Context
 
-This project is part of methods to derive information from web documents
-in order to build [text databases for
-research](https://www.dwds.de/d/k-web) (chiefly linguistic analysis and
-natural language processing).
+Initially launched to create text databases for research purposes
+at the Berlin-Brandenburg Academy of Sciences (DWDS and ZDL units),
+this project continues to be maintained but its future development
+depends on community support.
 
-Extracting and pre-processing web texts to meet the exacting standards
-is a significant challenge. It is often not possible to reliably
-determine the date of publication or modification using either the URL
-or the server response. For more information:
+**If you value this software or depend on it for your product, consider
+sponsoring it and contributing to its codebase**. Your support will
+help maintain and enhance this popular package, ensuring its growth,
+robustness, and accessibility for developers and users around the world.
+
+Reach out via the software repository or the [contact page](https://adrien.barbaresi.eu/)
+for inquiries, collaborations, or feedback.
 
 [](https://doi.org/10.21105/joss.02439)
 [](https://doi.org/10.5281/zenodo.3459599)
@@ -156,9 +161,6 @@ or the server response. For more information:
 Proceedings of the [10th Web as Corpus Workshop
 (WAC-X)](https://www.sigwac.org.uk/wiki/WAC-X), 2016.
 
-You can contact me via my [contact page](https://adrien.barbaresi.eu/)
-or [GitHub](https://github.com/adbar).
-
 ## Contributing
 
 [Contributions](https://github.com/adbar/htmldate/blob/master/CONTRIBUTING.md)