
Commit 314272e

maintenance and docs: remove dependabot and update funding (#178)
* maintenance: remove dependabot and update funding
* update readme
* add context
* remove duplicate text
* fix typos
1 parent 9c5f619 commit 314272e


4 files changed (+44 -52 lines)


.github/FUNDING.yml (+1 -1)

@@ -1,6 +1,6 @@
 # These are supported funding model platforms
 
-github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
+github: [adbar]
 patreon: # Replace with a single Patreon username
 open_collective: # Replace with a single Open Collective username
 ko_fi: adbarbaresi

.github/dependabot.yml (-27)

This file was deleted.

README.md (+20 -18)

@@ -13,12 +13,14 @@
 
 <br/>
 
-Find **original and updated publication dates** of any web page. **On
-the command-line or with Python**, all the steps needed from web page
-download to HTML parsing, scraping, and text analysis are included. The
-package is used in production on millions of documents and integrated by
-[multiple
-libraries](https://github.com/adbar/htmldate/network/dependents).
+Find **original and updated publication dates** of any web page.
+It is often not possible to do it using just the URL or the server response.
+
+**On the command-line or with Python**, all the steps needed from web page
+download to HTML parsing, scraping, and text analysis are included.
+
+The package is used in production on millions of documents and integrated into
+[thousands of projects](https://github.com/adbar/htmldate/network/dependents).
 
 
 ## In a nutshell
@@ -114,17 +116,20 @@ license](https://www.apache.org/licenses/LICENSE-2.0.html).
 
 Versions prior to v1.8.0 are under GPLv3+ license.
 
-## Author
+## Context
 
-This project is part of methods to derive information from web documents
-in order to build [text databases for
-research](https://www.dwds.de/d/k-web) (chiefly linguistic analysis and
-natural language processing).
+Initially launched to create text databases for research purposes
+at the Berlin-Brandenburg Academy of Sciences (DWDS and ZDL units),
+this project continues to be maintained but its future development
+depends on community support.
 
-Extracting and pre-processing web texts to meet the exacting standards
-is a significant challenge. It is often not possible to reliably
-determine the date of publication or modification using either the URL
-or the server response. For more information:
+**If you value this software or depend on it for your product, consider
+sponsoring it and contributing to its codebase**. Your support will
+help maintain and enhance this popular package, ensuring its growth,
+robustness, and accessibility for developers and users around the world.
+
+Reach out via the software repository or the [contact page](https://adrien.barbaresi.eu/)
+for inquiries, collaborations, or feedback.
 
 [![JOSS article reference DOI: 10.21105/joss.02439](https://img.shields.io/badge/JOSS-10.21105%2Fjoss.02439-brightgreen)](https://doi.org/10.21105/joss.02439)
 [![Zenodo archive DOI: 10.5281/zenodo.3459599](https://img.shields.io/badge/DOI-10.5281%2Fzenodo.3459599-blue)](https://doi.org/10.5281/zenodo.3459599)
@@ -156,9 +161,6 @@ or the server response. For more information:
 Proceedings of the [10th Web as Corpus Workshop
 (WAC-X)](https://www.sigwac.org.uk/wiki/WAC-X), 2016.
 
-You can contact me via my [contact page](https://adrien.barbaresi.eu/)
-or [GitHub](https://github.com/adbar).
-
 ## Contributing
 
 [Contributions](https://github.com/adbar/htmldate/blob/master/CONTRIBUTING.md)

docs/index.rst (+23 -6)

@@ -34,7 +34,14 @@ htmldate: find the publication date of web pages
 
 |
 
-Find original and updated publication dates of any web page. From the command-line or within Python, all the steps needed from web page download to HTML parsing, scraping, and text analysis are included.
+Find **original and updated publication dates** of any web page.
+It is often not possible to do it using just the URL or the server response.
+
+**On the command-line or with Python**, all the steps needed from web page
+download to HTML parsing, scraping, and text analysis are included.
+
+The package is used in production on millions of documents and integrated into
+`thousands of projects <https://github.com/adbar/htmldate/network/dependents>`_.
 
 
 In a nutshell
@@ -246,10 +253,22 @@ This package is distributed under the `Apache 2.0 license <https://www.apache.or
 Versions prior to v1.8.0 are under GPLv3+ license.
 
 
-Author
-------
+Context
+-------
+
+Initially launched to create text databases for research purposes
+at the Berlin-Brandenburg Academy of Sciences (DWDS and ZDL units),
+this project continues to be maintained but its future development
+depends on community support.
+
+**If you value this software or depend on it for your product, consider
+sponsoring it and contributing to its codebase**. Your support will
+help maintain and enhance this popular package, ensuring its growth,
+robustness, and accessibility for developers and users around the world.
+
+Reach out via the software repository or the `contact page
+<https://adrien.barbaresi.eu/>`_ for inquiries, collaborations, or feedback.
 
-This effort is part of methods to derive information from web documents in order to build `text databases for research <https://www.dwds.de/d/k-web>`_ (chiefly linguistic analysis and natural language processing). Extracting and pre-processing web texts to the exacting standards of scientific research presents a substantial challenge for those who conduct such research. There are web pages for which neither the URL nor the server response provide a reliable way to find out when a document was published or modified. For more information:
 
 .. image:: https://img.shields.io/badge/JOSS-10.21105%2Fjoss.02439-brightgreen
     :target: https://doi.org/10.21105/joss.02439
@@ -278,8 +297,6 @@ This effort is part of methods to derive information from web documents in order
 - Barbaresi, A. "`Generic Web Content Extraction with Open-Source Software <https://hal.archives-ouvertes.fr/hal-02447264/document>`_", Proceedings of KONVENS 2019, Kaleidoscope Abstracts, 2019.
 - Barbaresi, A. "`Efficient construction of metadata-enhanced web corpora <https://hal.archives-ouvertes.fr/hal-01371704v2/document>`_", Proceedings of the `10th Web as Corpus Workshop (WAC-X) <https://www.sigwac.org.uk/wiki/WAC-X>`_, 2016.
 
-You can contact me via my `contact page <https://adrien.barbaresi.eu/>`_ or `GitHub <https://github.com/adbar>`_.
-
 
 Contributing
 ------------
