Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2062 +/- ##
==========================================
+ Coverage 74.38% 74.44% +0.06%
==========================================
Files 41 41
Lines 3146 3146
Branches 689 689
==========================================
+ Hits 2340 2342 +2
+ Misses 686 684 -2
Partials 120 120 ☔ View full report in Codecov by Sentry. |
|
@audiodude Thank you very much for this simple but very important fix. One thing, why bot just taking the Mediwiki URL given as argument to MWoffliner in place of this fake "http://localhost/"? To me this would more correct and probably more robust. |
Because the WMF software looks for specific values of the |
All WMF domains are part of the Regex look like, so why a hack like "localhost" could be better? |
It just seems better to put a "dummy" value that is obviously a hack, than a WMF domain which is potentially misleading. |
Not really convinced, but I guess you can argue that way. I believe we should anyway have a small test validating this(proper download of map) to secure next time we detect and can fix such kind of issue early. |
|
@audiodude Maybe we could just extend |
Added test in |
| const mwResp = await axios(url, this.arrayBufferRequestOptions) | ||
| // The 'Referer' header is set to get around WMF domain origin restrictions. | ||
| // See: https://github.com/openzim/mwoffliner/issues/2061 | ||
| const mwResp = await axios(url, { ...this.arrayBufferRequestOptions, headers: { Referer: 'https://localhost/' } }) |
There was a problem hiding this comment.
Why not use the actual site/page you are scraping as the referrer instead of this fiction? That would make your scraper function in the same way as any normal user-agent consuming the wiki content and allow you to avoid being seen as deliberately violating hot linking protections used on the Wikimedia content farm.
There was a problem hiding this comment.
@bd808 Was pretty much my proposal, see comments above and responses.
Fixes #2061
Tested with Italian wikipedia download with article listed in bug.