Commit
Merge pull request #4 from TeamHG-Memex/domain-limits-fix
don't go out of domain for pagination URLs
lopuhin committed Mar 16, 2016
2 parents c45f407 + eda5d14 commit 63272d1
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion undercrawler/spiders/base_spider.py
@@ -75,7 +75,7 @@ def _pagination_urls(self, response):
         return [
             url for url in
             unique(canonicalize_url(url) for url in autopager.urls(response))
-            if url.startswith('http')
+            if self.link_extractor.matches(url)
         ]
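
To illustrate why the filter changed, here is a minimal, self-contained sketch of the difference between the two conditions. It does not use the spider's real link extractor: `in_allowed_domain` is a hypothetical stand-in for `self.link_extractor.matches(url)`, and it assumes the extractor is configured to allow a single domain (`example.com` here, chosen only for illustration). The candidate URLs are made up as well.

```python
from urllib.parse import urlparse


def in_allowed_domain(url, allowed_domain):
    """Hypothetical stand-in for a domain-restricted link extractor check.

    Returns True when the URL's host is the allowed domain or one of its
    subdomains. The real extractor may apply further rules.
    """
    host = urlparse(url).hostname or ''
    return host == allowed_domain or host.endswith('.' + allowed_domain)


# Made-up pagination candidates, as a pagination detector might return them.
candidates = [
    'http://example.com/list?page=2',
    'http://blog.example.com/list?page=3',
    'http://other.org/list?page=2',
    'mailto:someone@example.com',
]

# Old condition: any http(s) URL passes, including off-domain ones.
old_filter = [u for u in candidates if u.startswith('http')]

# New condition (sketched): only URLs inside the allowed domain pass.
new_filter = [u for u in candidates if in_allowed_domain(u, 'example.com')]
```

In this sketch the old condition keeps `http://other.org/list?page=2`, while the domain check drops it, which is the behaviour the commit message describes.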

