Commit 69d7723

Version 3.10.2rc3
1 parent 95cef06 commit 69d7723

14 files changed, +100 -24 lines changed

.bumpversion.cfg

+1-1
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,5 @@
11
[bumpversion]
2-
current_version = 3.10.2rc2
2+
current_version = 3.10.2rc3
33
message = Release {new_version}
44
parse = ^
55
(?P<major>\d+)

.deploy.bat

+3
@@ -36,6 +36,9 @@ if %r% EQU n exit /b
3636
set /p r=Did you run local tests (tox) and ensure CI passed? [N/y] || set r=n
3737
if %r% EQU N set r=n
3838
if %r% EQU n exit /b
39+
set /p r=Did you check https://readthedocs.org/projects/webchanges/builds/ and ensure docs built there? [N/y] || set r=n
40+
if %r% EQU N set r=n
41+
if %r% EQU n exit /b
3942
set /p r=Are you in the unreleased branch? [N/y] || set r=n
4043
if %r% EQU N set r=n
4144
if %r% EQU n exit /b

.github/ISSUE_TEMPLATE/bug_report.md

+2-1
@@ -18,7 +18,8 @@ bug.
1818
A clear and concise description of what you expected to happen.
1919

2020
**Screen scrape/screenshots**
21-
If applicable, add screen scrape or screenshots to help explain your problem.
21+
If applicable, add screen scrape or screenshots to help explain the bug. Use ``-v`` or ``-vv`` to capture
22+
logging.
2223

2324
**Version info**
2425
Please run ``webchanges -v`` and paste the version information as follows (first 3 lines):

.github/ISSUE_TEMPLATE/feature_request.md

+1-1
@@ -8,7 +8,7 @@ assignees: ''
88
---
99

1010
**Is your feature request related to a problem? Please describe.**
11-
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
11+
A clear and concise description of the problem to solve. Ex. I'm always frustrated when [...]
1212

1313
**Describe the solution you'd like**
1414
A clear and concise description of what you want to happen.

CHANGELOG.rst

+6-4
@@ -31,7 +31,7 @@ can check out the `wish list <https://github.com/mborsetti/webchanges/blob/main/
3131
Internals, for changes that don't affect users. [triggers a minor patch]
3232
3333
34-
Version 3.10.2rc2
34+
Version 3.10.2rc3
3535
===================
3636
Unreleased
3737

@@ -43,12 +43,14 @@ Unreleased
4343

4444
Added
4545
-----
46-
* You can now run the command line argument ``--test`` without specifying a JOB; this will run a check of the config and
47-
job files for syntax errors.
46+
* You can now run the command line argument ``--test`` without specifying a JOB; this will check the config
47+
(default: ``config.yaml``) and jobs (default: ``jobs.yaml``) files for syntax errors.
48+
* New job directive ``compared_versions`` allows change detection to be made against multiple saved snapshots;
49+
useful for monitoring websites that change between a set of states (e.g. they are running A/B testing).
50+
* New command line argument ``--check-new`` to check if a new version of **webchanges** is available.
4851
* Error messages for url jobs failing with HTTP reason codes of 400 and higher now include any text returned by the
4952
website (e.g. "Rate exceeded.", "upstream request timeout", etc.). Not implemented in jobs with ``use_browser: true``
5053
due to limitations in Playwright.
51-
* New command line argument ``--check-new`` to check if a new version of **webchanges** is available.
5254

5355
Changed
5456
-------

RELEASE.rst

+5-3
@@ -6,12 +6,14 @@
66

77
Added
88
-----
9-
* You can now run the command line argument ``--test`` without specifying a JOB; this will run a check of the config and
10-
job files for syntax errors.
9+
* You can now run the command line argument ``--test`` without specifying a JOB; this will check the config
10+
(default: ``config.yaml``) and jobs (default: ``jobs.yaml``) files for syntax errors.
11+
* New job directive ``compared_versions`` allows change detection to be made against multiple saved snapshots;
12+
useful for monitoring websites that change between a set of states (e.g. they are running A/B testing).
13+
* New command line argument ``--check-new`` to check if a new version of **webchanges** is available.
1114
* Error messages for url jobs failing with HTTP reason codes of 400 and higher now include any text returned by the
1215
website (e.g. "Rate exceeded.", "upstream request timeout", etc.). Not implemented in jobs with ``use_browser: true``
1316
due to limitations in Playwright.
14-
* New command line argument ``--check-new`` to check if a new version of **webchanges** is available.
1517

1618
Changed
1719
-------

docs/examples.rst

+14
@@ -44,6 +44,20 @@ crontab)::
4444

4545
.. _always_report:
4646

47+
48+
Comparing with several latest snapshots
49+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
50+
If a webpage frequently changes between several known stable states (e.g. A/B layout testing), it may be desirable to
51+
have changes reported only if the webpage changes into a new unknown state. You can use ``compared_versions`` to do this.
52+
53+
.. code-block:: yaml
54+
55+
url: https://example.com/
56+
compared_versions: 3
57+
58+
In this example, changes are only reported if the webpage becomes different from the latest three distinct states.
59+
The differences are shown relative to the closest match.
60+
4761
Receiving a report for every run
4862
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
4963
If you are watching pages that seldom change, but you still want to be notified every time :program:`webchanges`

docs/jobs.rst

+42-3
@@ -168,6 +168,20 @@ Whether to use a Chrome web browser (true/false). Defaults to false.
168168
If true, it renders the URL via a JavaScript-enabled web browser and extracts the HTML after rendering (see
169169
:ref:`above <use_browser>` for important information).
170170

171+
compared_versions
172+
^^^^^^^^^^^^^^^^^
173+
Number of saved snapshots to compare against (int). Defaults to 1.
174+
175+
If set to a number greater than 1, instead of comparing the current data to only the very last snapshot captured, it
176+
is matched against any of *n* snapshots. This is very useful when a webpage frequently changes between several known
177+
stable states (e.g. they're doing A/B testing), as changes will be reported only when the content changes to a new
178+
unknown state, in which case the differences are shown relative to the closest match.
179+
180+
Refer to the command line argument ``--max-snapshots`` to ensure that you are saving the number of snapshots you need
181+
for this directive to run successfully (default is 4) (see :ref:`here<max-snapshots>`).
182+
183+
.. versionadded:: 3.10.2
184+
171185
cookies
172186
^^^^^^^
173187
Cookies to send with the request (a dict).
@@ -194,7 +208,7 @@ http_proxy
194208
Proxy server to use for HTTP requests (a string). If unspecified or null/false, the system environment variable
195209
``HTTP_PROXY``, if defined, will be used.
196210

197-
E.g. ``\http://username:[email protected]:8080``.
211+
E.g. ``http://username:[email protected]:8080``.
198212

199213
.. versionchanged:: 3.0
200214
Works for all ``url`` jobs, including those with ``use_browser: true``.
@@ -204,7 +218,7 @@ https_proxy
204218
Proxy server to use for HTTPS (i.e. secure) requests (a string). If unspecified or null/false, the system environment
205219
variable ``HTTPS_PROXY``, if defined, will be used.
206220

207-
E.g. ``\https://username:[email protected]:8080``.
221+
E.g. ``https://username:[email protected]:8080``.
208222

209223
.. versionchanged:: 3.0
210224
Works for all ``url`` jobs, including those with ``use_browser: true``.
@@ -332,7 +346,32 @@ The following directives are available only for ``url`` jobs without ``use_brows
332346

333347
no_redirects
334348
^^^^^^^^^^^^
335-
Disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection (true/false). Defaults to false.
349+
Disables GET, OPTIONS, POST, PUT, PATCH, DELETE, HEAD redirection (true/false). Defaults to false (i.e. redirection
350+
is enabled) for all methods except HEAD. See more `here
351+
<https://requests.readthedocs.io/en/latest/user/quickstart/#redirection-and-history>`__. Redirection takes place
352+
whenever an HTTP status code of 301, 302, 303, 307 or 308 is returned.
353+
354+
Example:
355+
356+
.. code-block:: yaml
357+
358+
url: "https://donneespubliques.meteofrance.fr/donnees_libres/bulletins/BCM/203001.pdf"
359+
no_redirects: true
360+
filter:
361+
- html2text:
362+
363+
Returns:
364+
365+
.. code-block::
366+
367+
302 Found
368+
---------
369+
370+
# Found
371+
The document has moved [here](https://donneespubliques.meteofrance.fr/?fond=donnee_indisponible).
372+
* * *
373+
Apache/2.2.15 (CentOS) Server at donneespubliques.meteofrance.fr Port 80
374+
336375
337376
.. versionadded:: 3.2.7
338377

webchanges/__init__.py

+1-1
@@ -18,7 +18,7 @@
1818
# * MINOR version when you add functionality in a backwards compatible manner, and
1919
# * MICRO or PATCH version when you make backwards compatible bug fixes. We no longer use '0'
2020
# If unsure on increments, use pkg_resources.parse_version to parse
21-
__version__ = '3.10.2rc2'
21+
__version__ = '3.10.2rc3'
2222
__description__ = (
2323
'Check web (or commands) for changes since last run and notify.\n\nAnonymously alerts you of webpage changes.'
2424
)

webchanges/handler.py

+2
@@ -56,6 +56,7 @@ class JobState(ContextManager):
5656
_generated_diff_html: Optional[str] = None
5757
error_ignored: Union[bool, str]
5858
exception: Optional[Exception] = None
59+
history_data: Dict[str, float] = {}
5960
new_data: str
6061
new_etag: str
6162
new_timestamp: float
@@ -121,6 +122,7 @@ def load(self) -> None:
121122
"""Loads from the database the last snapshot for the job."""
122123
guid = self.job.get_guid()
123124
self.old_data, self.old_timestamp, self.tries, self.old_etag = self.cache_storage.load(guid)
125+
self.history_data = self.cache_storage.get_history_data(guid, self.job.compared_versions)
124126

125127
def save(self, use_old_data: bool = False) -> None:
126128
"""Saves new data retrieved by the job into the snapshot database.

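Review note: the diff calls ``self.cache_storage.get_history_data(guid, self.job.compared_versions)`` but does not show its implementation. The following is only a minimal sketch of what such a lookup might do, assuming a hypothetical ``snapshots`` table with ``guid``, ``timestamp`` and ``data`` columns (the real schema and method in ``webchanges/storage.py`` may differ). It returns the latest *n* distinct snapshot texts mapped to their most recent timestamps, which matches the ``Dict[str, float]`` annotation added to ``JobState.history_data``.

```python
import sqlite3
from typing import Dict


def get_history_data(db: sqlite3.Connection, guid: str, count: int) -> Dict[str, float]:
    """Map each of the latest `count` distinct snapshots of `guid` to its newest timestamp.

    Hypothetical helper: the table and column names are assumptions made for this sketch.
    """
    if count < 1:
        return {}
    rows = db.execute(
        'SELECT data, timestamp FROM snapshots WHERE guid = ? ORDER BY timestamp DESC',
        (guid,),
    )
    history: Dict[str, float] = {}
    for data, timestamp in rows:
        # Rows arrive newest first, so the first time we see a text is its latest timestamp.
        if data not in history:
            history[data] = timestamp
            if len(history) >= count:
                break
    return history
```

Keying the dictionary by snapshot text is what lets the worker test for an exact match with a single lookup and later recover the timestamp of the closest match.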
webchanges/jobs.py

+9-5
@@ -100,6 +100,7 @@ class JobBase(object, metaclass=TrackSubClasses):
100100
additions_only: Optional[bool] = None
101101
block_elements: List[str] = []
102102
chromium_revision: Optional[Union[Dict[str, int], Dict[str, str], str, int]] = None # deprecated
103+
compared_versions: int = 1
103104
contextlines: Optional[int] = None
104105
cookies: Optional[Dict[str, str]] = None
105106
data: Union[str, Dict[str, str]] = None # type: ignore[assignment]
@@ -426,6 +427,7 @@ class Job(JobBase):
426427
'name',
427428
'note',
428429
'additions_only',
430+
'compared_versions',
429431
'contextlines',
430432
'deletions_only',
431433
'diff_filter',
@@ -840,6 +842,7 @@ def _playwright_retrieve(self, job_state: JobState, headless: bool = True) -> Tu
840842
:raises BrowserResponseError: If a browser error or an HTTP response code between 400 and 599 is received.
841843
"""
842844
try:
845+
from playwright._repo_version import version as playwright_version
843846
from playwright.sync_api import Error as PlaywrightError
844847
from playwright.sync_api import ProxySettings, Route, sync_playwright
845848
except ImportError:
@@ -983,8 +986,9 @@ def _playwright_retrieve(self, job_state: JobState, headless: bool = True) -> Tu
983986
ignore_https_errors=self.ignore_https_errors, # type: ignore[arg-type]
984987
extra_http_headers=dict(headers),
985988
)
986-
logger.debug(
987-
f'Job {self.index_number}: Pyppeteer launched {browser_name} browser ' f'{browser.version}'
989+
logger.info(
990+
f'Job {self.index_number}: Playwright {playwright_version} launched {browser_name} browser'
991+
f' {browser.version}'
988992
)
989993
else:
990994
context = p.chromium.launch_persistent_context(
@@ -999,9 +1003,9 @@ def _playwright_retrieve(self, job_state: JobState, headless: bool = True) -> Tu
9991003
ignore_https_errors=self.ignore_https_errors, # type: ignore[arg-type]
10001004
extra_http_headers=dict(headers),
10011005
)
1002-
logger.debug(
1003-
f'Job {self.index_number}: Pyppeteer launched {browser_name} browser version'
1004-
f' {context.browser.version} with user data directory ' # type: ignore[union-attr]
1006+
logger.info(
1007+
f'Job {self.index_number}: Playwright {playwright_version} launched {browser_name} browser '
1008+
f'{context.browser.version} from user data directory ' # type: ignore[union-attr]
10051009
f'{self.user_data_dir}'
10061010
)
10071011

webchanges/main.py

+1-2
@@ -93,7 +93,7 @@ def load_hooks(self) -> None:
9393
)
9494
else:
9595
import_module_from_source('hooks', self.urlwatch_config.hooks)
96-
logger.info(f'Loaded hooks from {self.urlwatch_config.hooks}')
96+
logger.info(f'Imported hooks module from {self.urlwatch_config.hooks}')
9797

9898
def load_jobs(self) -> None:
9999
"""Load jobs from the file into self.jobs.
@@ -102,7 +102,6 @@ def load_jobs(self) -> None:
102102
"""
103103
if self.urlwatch_config.jobs.is_file():
104104
jobs = self.jobs_storage.load_secure()
105-
logger.info(f'Loaded {len(jobs)} jobs from {self.urlwatch_config.jobs}')
106105
else:
107106
print(f'Jobs file not found: {self.urlwatch_config.jobs}')
108107
raise SystemExit(1)

webchanges/storage.py

+2-1
@@ -564,6 +564,7 @@ def is_shell_job(job: JobBase) -> bool:
564564
)
565565
jobs = [job for job in jobs if job not in removed_jobs]
566566

567+
logger.info(f'Loaded {len(jobs)} jobs from {self.filename}')
567568
return jobs
568569

569570

@@ -1060,7 +1061,7 @@ def __init__(self, filename: Union[str, Path], max_snapshots: int = 4) -> None:
10601061
self.lock = threading.RLock()
10611062

10621063
self.db = sqlite3.connect(filename, check_same_thread=False)
1063-
logger.info(f'Using sqlite3 database at {filename}')
1064+
logger.info(f'Using sqlite3 {sqlite3.sqlite_version} database at {filename}')
10641065
self.cur = self.db.cursor()
10651066
self.cur.execute('PRAGMA temp_store = MEMORY;')
10661067
tables = self._execute("SELECT name FROM sqlite_master WHERE type='table';").fetchone()

webchanges/worker.py

+11-2
@@ -4,6 +4,7 @@
44

55
from __future__ import annotations
66

7+
import difflib
78
import logging
89
import os
910
import random
@@ -113,14 +114,22 @@ def job_runner(
113114
urlwatcher.report.error(job_state)
114115
else:
115116
logger.info(f'Job {job_state.job.index_number}: Job finished with no exceptions')
116-
elif job_state.old_data != '' or job_state.old_timestamp != 0:
117+
elif len(job_state.old_data) or job_state.old_timestamp != 0:
117118
# This is not the first time running this job (we have snapshots)
118-
if job_state.new_data == job_state.old_data:
119+
if job_state.history_data.get(job_state.new_data):
120+
# exactly matches one of the previous snapshots
119121
if job_state.tries > 0:
120122
job_state.tries = 0
121123
job_state.save()
122124
urlwatcher.report.unchanged(job_state)
123125
else:
126+
# no match
127+
if len(job_state.history_data) > 1:
128+
# replace old with best "good enough" previous snapshot
129+
close_matches = difflib.get_close_matches(job_state.new_data, job_state.history_data, n=1)
130+
if close_matches:
131+
job_state.old_data = close_matches[0]
132+
job_state.old_timestamp = job_state.history_data[close_matches[0]]
124133
job_state.tries = 0
125134
job_state.save()
126135
urlwatcher.report.changed(job_state)
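
Review note: the matching logic in the ``worker.py`` hunk above can be exercised in isolation. The sketch below is a simplified stand-in, not the project's code: ``match_against_history`` is a hypothetical name, and the function only mirrors the two steps in the diff (exact lookup in the history dictionary, then ``difflib.get_close_matches`` with ``n=1`` to pick the baseline for the diff). It uses ``in`` rather than ``.get()`` for the exact-match test, so a snapshot stored with a timestamp of ``0`` would still count as a match.

```python
import difflib
from typing import Dict, Optional, Tuple


def match_against_history(new_data: str, history: Dict[str, float]) -> Tuple[bool, Optional[str]]:
    """Return (unchanged, baseline) for new_data given {snapshot text: timestamp}.

    unchanged is True when new_data equals one of the saved snapshots. Otherwise
    baseline is the closest saved snapshot, or None if nothing is similar enough
    (difflib's default cutoff of 0.6 applies).
    """
    if new_data in history:
        return True, new_data
    # Iterating a dict yields its keys, so the snapshot texts are the candidates.
    close_matches = difflib.get_close_matches(new_data, history, n=1)
    return False, (close_matches[0] if close_matches else None)


history = {'layout A: price 10': 100.0, 'layout B: price 10': 200.0}
print(match_against_history('layout B: price 10', history))  # known state, nothing to report
print(match_against_history('layout A: price 12', history))  # new state, diffed against layout A
```

When no candidate clears the similarity cutoff, the commit leaves ``old_data`` untouched, so the diff falls back to the most recent snapshot; the ``None`` baseline in the sketch corresponds to that case.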
