Commit d8d98c5

Merge pull request #41 from thewebscraping/feat/rotator
Feature: Smart Rotators & Robust Library Management
2 parents 7964828 + 7857e57 commit d8d98c5

30 files changed: +1901 -470 lines changed

.gitignore

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -160,4 +160,4 @@ cython_debug/
160160
# and can be added to the global gitignore or merged into this file. For a more nuclear
161161
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
162162
.idea/
163-
tls_requests/bin/*xgo*
163+
tls_requests/bin/*

.pre-commit-config.yaml

Lines changed: 10 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,9 @@
11
exclude: '^docs.sh/|scripts/'
22
default_stages: [pre-commit]
33

4-
default_language_version:
5-
  python: python3.10
6-
74
repos:
85
  - repo: https://github.com/pre-commit/pre-commit-hooks
9-
    rev: v4.5.0
6+
    rev: v6.0.0
107
    hooks:
118
      - id: trailing-whitespace
129
      - id: end-of-file-fixer
@@ -20,14 +17,21 @@ repos:
2017
      - id: check-docstring-first
2118
      - id: detect-private-key
2219

20+
  # run the autoflake.
21+
  - repo: https://github.com/PyCQA/autoflake
22+
    rev: v2.3.1
23+
    hooks:
24+
      - id: autoflake
25+
        args: [--remove-all-unused-imports, --in-place, --ignore-init-module-imports]
26+
2327
  # run the isort.
2428
  - repo: https://github.com/PyCQA/isort
25-
    rev: 5.13.2
29+
    rev: 6.1.0
2630
    hooks:
2731
      - id: isort
2832

2933
  # run the flake8.
3034
  - repo: https://github.com/PyCQA/flake8
31-
    rev: 7.0.0
35+
    rev: 7.3.0
3236
    hooks:
3337
      - id: flake8

Makefile

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,10 +2,12 @@
22
init-actions:
33
	python -m pip install --upgrade pip
44
	python -m pip install -r requirements-dev.txt
5+
	python -m autoflake --in-place --remove-all-unused-imports --ignore-init-module-imports .
56
	python -m black tls_requests
67
	python -m isort tls_requests
78
	python -m flake8 tls_requests
89

10+
911
test:
1012
	tox -p
1113
	rm -rf *.egg-info

README.md

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -44,6 +44,31 @@ Start using TLS Requests with just a few lines of code:
4444
200
4545
```
4646

47+
Basic automatic rotation:
48+
49+
```pycon
50+
>>> import tls_requests
51+
>>> proxy_list = [
52+
    "http://user1:pass1@proxy.example.com:8080",
53+
    "http://user2:pass2@proxy.example.com:8081",
54+
    "socks5://proxy.example.com:8082",
55+
    "proxy.example.com:8083",  # (defaults to http)
56+
    "http://user:pass@proxy.example.com:8084|1.0|US",  # http://user:pass@host:port|weight|region
57+
]
58+
>>> r = tls_requests.get(
59+
    "https://httpbin.org/get",
60+
    proxy=proxy_list,
61+
    headers=tls_requests.HeaderRotator(),
62+
    tls_identifier=tls_requests.TLSIdentifierRotator()
63+
)
64+
>>> r
65+
<Response [200 OK]>
66+
>>> r.status_code
67+
200
68+
>>> tls_requests.HeaderRotator(strategy="round_robin")  # strategy: Literal["round_robin", "random", "weighted"]
69+
>>> tls_requests.Proxy("http://user1:pass1@proxy.example.com:8080", weight=0.1)  # default weight: 1.0
70+
```
71+
4772
**Introduction**
4873
----------------
4974

docs/advanced/rotators.md

Lines changed: 145 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,145 @@
1+
# Using Rotators
2+
3+
The `tls_requests` library is designed to be smart out of the box. By default, it automatically rotates through realistic headers and client identifiers to make your requests appear authentic and avoid detection.
4+
5+
This guide explains how these default rotators work and how you can customize or disable them.
6+
7+
* * *
8+
9+
### Header Rotator
10+
11+
**Default Behavior: Automatic Rotation**
12+
13+
When you initialize a `Client` without specifying the `headers` parameter, it will **automatically rotate** through a built-in collection of header templates that mimic popular browsers like Chrome, Firefox, and Safari across different operating systems.
14+
15+
```python
16+
import tls_requests
17+
18+
# HeaderRotator() is passed explicitly here; per the text above, omitting `headers` should behave the same.
19+
# This client uses a different, realistic header set for each request.
20+
with tls_requests.Client(headers=tls_requests.HeaderRotator()) as client:
21+
    # Request 1 might have Chrome headers
22+
    res1 = client.get("https://httpbin.org/headers")
23+
    print(f"Request 1 UA: {res1.json()['headers']['User-Agent']}")
24+
25+
    # Request 2 might have Firefox headers
26+
    res2 = client.get("https://httpbin.org/headers")
27+
    print(f"Request 2 UA: {res2.json()['headers']['User-Agent']}")
28+
```
29+
30+
**How to Override the Default Behavior:**
31+
32+
- **To rotate through your own list of headers**, pass a `list` of `dict`s:
33+
```python
34+
my_headers = [{"User-Agent": "MyBot/1.0"}, {"User-Agent": "MyBot/2.0"}]
35+
client = tls_requests.Client(headers=my_headers)
36+
```
37+
38+
- **To use a single, static set of headers (no rotation)**, pass a single `dict`:
39+
```python
40+
static_headers = {"User-Agent": "Always-The-Same-Bot/1.0"}
41+
client = tls_requests.Client(headers=static_headers)
42+
```
43+
44+
- **To completely disable default headers**, pass `None`:
45+
```python
46+
# This client will not add any default headers (like User-Agent).
47+
client = tls_requests.Client(headers=None)
48+
```
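
To make the idea of rotating through a list of header dicts concrete, here is a minimal sketch in plain Python. `SimpleRotator` is a hypothetical class written only for this illustration; it is not the library's `HeaderRotator`, and only the strategy names `round_robin` and `random` are borrowed from the README.

```python
import itertools
import random
from typing import Dict, List, Optional


class SimpleRotator:
    """Hypothetical stand-in for a rotator; not part of tls_requests."""

    def __init__(
        self,
        items: List[Dict[str, str]],
        strategy: str = "round_robin",
        seed: Optional[int] = None,
    ) -> None:
        if not items:
            raise ValueError("items must not be empty")
        if strategy not in ("round_robin", "random"):
            raise ValueError(f"unsupported strategy: {strategy}")
        self._items = list(items)
        self._strategy = strategy
        self._cycle = itertools.cycle(self._items)  # endless in-order iterator
        self._rng = random.Random(seed)

    def next(self) -> Dict[str, str]:
        """Return a copy of the next header set according to the strategy."""
        if self._strategy == "round_robin":
            return dict(next(self._cycle))
        return dict(self._rng.choice(self._items))


rotator = SimpleRotator([{"User-Agent": "MyBot/1.0"}, {"User-Agent": "MyBot/2.0"}])
picked = [rotator.next()["User-Agent"] for _ in range(3)]
print(picked)  # ['MyBot/1.0', 'MyBot/2.0', 'MyBot/1.0']
```

Returning a copy of each dict keeps callers from mutating the templates held by the rotator.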
49+
50+
* * *
51+
52+
### TLS Client Identifier Rotator
53+
54+
**Default Behavior: Automatic Rotation**
55+
56+
Similar to headers, the `Client` **defaults to rotating** through all supported client identifier profiles (e.g., `chrome_120`, `firefox_120`, `safari_16_0`, etc.). This changes your TLS fingerprint with every request, an advanced technique to evade sophisticated anti-bot systems.
57+
58+
```python
59+
import tls_requests
60+
61+
# This client automatically changes its TLS fingerprint for each request.
62+
with tls_requests.Client(client_identifier=tls_requests.TLSIdentifierRotator()) as client:
63+
    # These two requests will have different TLS profiles.
64+
    res1 = client.get("https://tls.browserleaks.com/json")
65+
    res2 = client.get("https://tls.browserleaks.com/json")
66+
```
67+
68+
**How to Override the Default Behavior:**
69+
70+
- **To rotate through a specific list of identifiers**, pass a `list` of strings:
71+
```python
72+
my_identifiers = ["chrome_120", "safari_16_0"]
73+
client = tls_requests.Client(client_identifier=my_identifiers)
74+
```
75+
76+
- **To use a single, static identifier**, pass a string:
77+
```python
78+
client = tls_requests.Client(client_identifier="chrome_120")
79+
```
80+
- **To disable rotation and use the library's single default identifier**, pass `None`:
81+
```python
82+
client = tls_requests.Client(client_identifier=None)
83+
```
84+
85+
* * *
86+
87+
### Proxy Rotator
88+
89+
Unlike headers and client identifiers, proxy rotation is **not enabled by default**, as the library cannot provide a list of free proxies. You must provide your own list to enable this feature.
90+
91+
To enable proxy rotation, pass a list of proxy strings to the `proxy` parameter. The library will automatically use a `weighted` strategy, prioritizing proxies that perform well.
92+
93+
```python
94+
import tls_requests
95+
96+
proxy_list = [
97+
    "http://user1:pass1@proxy.example.com:8080",
98+
    "http://user2:pass2@proxy.example.com:8081",
99+
    "socks5://proxy.example.com:8082",
100+
    "proxy.example.com:8083",  # (defaults to http)
101+
    "http://user:pass@proxy.example.com:8084|1.0|US",  # http://user:pass@host:port|weight|region
102+
]
103+
104+
# Provide a list to enable proxy rotation.
105+
with tls_requests.Client(proxy=proxy_list) as client:
106+
    response = client.get("https://httpbin.org/get")
107+
```
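
The `url|weight|region` string format noted in the comment above could be parsed along these lines. This is only a sketch under stated assumptions: `ParsedProxy` and `parse_proxy_line` are hypothetical names, the library's own parser may behave differently, and this version assumes every entry carries an explicit port and that passwords never contain `|`.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class ParsedProxy:
    """Hypothetical container for one parsed proxy entry."""

    url: str
    weight: float = 1.0
    region: Optional[str] = None


def parse_proxy_line(line: str) -> ParsedProxy:
    """Split 'url|weight|region' and default the scheme to http."""
    fields = [field.strip() for field in line.split("|")]
    url = fields[0]
    if "://" not in url:
        url = "http://" + url  # bare host:port is treated as an http proxy
    parts = urlsplit(url)
    if not parts.hostname or parts.port is None:
        raise ValueError(f"expected host:port in {line!r}")
    weight = float(fields[1]) if len(fields) > 1 and fields[1] else 1.0
    region = fields[2] if len(fields) > 2 and fields[2] else None
    return ParsedProxy(url=url, weight=weight, region=region)


bare = parse_proxy_line("proxy.example.com:8083")
full = parse_proxy_line("http://user:pass@proxy.example.com:8084|1.0|US")
print(bare)
print(full)
```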
108+
109+
For more control, you can create a `ProxyRotator` instance with a specific strategy:
110+
111+
```python
112+
from tls_requests.models.rotators import ProxyRotator
113+
114+
rotator = ProxyRotator.from_file(proxy_list, strategy="round_robin")
115+
116+
with tls_requests.Client(proxy=rotator) as client:
117+
    response = client.get("https://httpbin.org/get")
118+
```
119+
120+
> **Note:** The `Client` automatically provides performance feedback (success/failure, latency) to the `ProxyRotator`, making the `weighted` strategy highly effective.
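
One way such a feedback loop could work is sketched below. `WeightedPool`, its `report` method, and the specific reward and penalty factors are all assumptions made up for this example; the source does not describe how `ProxyRotator` actually adjusts weights.

```python
import random
from typing import Dict, List, Optional


class WeightedPool:
    """Hypothetical feedback-driven pool; not the library's ProxyRotator."""

    def __init__(self, proxies: List[str], seed: Optional[int] = None) -> None:
        self._weights: Dict[str, float] = {proxy: 1.0 for proxy in proxies}
        self._rng = random.Random(seed)

    def pick(self) -> str:
        """Choose a proxy with probability proportional to its weight."""
        names = list(self._weights)
        weights = [self._weights[name] for name in names]
        return self._rng.choices(names, weights=weights, k=1)[0]

    def report(self, proxy: str, success: bool, latency: float = 0.0) -> None:
        """Raise the weight after a success (less so when slow), halve it after a failure."""
        weight = self._weights[proxy]
        if success:
            weight *= 1.0 + 0.1 / (1.0 + latency)
        else:
            weight *= 0.5
        # Clamp so a proxy is never starved completely and never dominates.
        self._weights[proxy] = min(max(weight, 0.05), 10.0)

    def weight(self, proxy: str) -> float:
        return self._weights[proxy]


pool = WeightedPool(["a", "b"], seed=0)
for _ in range(3):
    pool.report("b", success=False)
pool.report("a", success=True, latency=0.2)
print(pool.weight("a"), pool.weight("b"))
print(pool.pick())
```

After three failures, proxy `b` keeps only one eighth of its starting weight, so `pick()` favors `a` while still occasionally retrying `b`.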
121+
122+
* * *
123+
124+
### Asynchronous Support
125+
126+
All rotator features, including the smart defaults, work identically with `AsyncClient`.
127+
128+
```python
129+
import tls_requests
130+
import asyncio
131+
132+
async def main():
133+
    # This async client automatically uses default header and identifier rotation.
134+
    async with tls_requests.AsyncClient(
135+
        headers=tls_requests.HeaderRotator(),
136+
        client_identifier=tls_requests.TLSIdentifierRotator()
137+
    ) as client:
138+
        tasks = [client.get("https://httpbin.org/get") for _ in range(2)]
139+
        responses = await asyncio.gather(*tasks)
140+
141+
    for i, r in enumerate(responses):
142+
        print(f"Async Request {i+1} status: {r.status_code}")
143+
144+
asyncio.run(main())
145+
```

mkdocs.yml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@ nav:
3030
- Authentication: 'advanced/authentication.md'
3131
- Hooks: 'advanced/hooks.md'
3232
- Proxies: 'advanced/proxies.md'
33+
- Rotators: 'advanced/rotators.md'
3334
- TLS Client:
3435
- Install: 'tls/install.md'
3536
- Wrapper TLS Client: 'tls/index.md'

pyproject.toml

Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,3 +4,47 @@ build-backend = 'setuptools.build_meta'
44

55
[tool.pytest.ini_options]
66
asyncio_mode = "auto"
7+
8+
[tool.black]
9+
line-length = 120
10+
target-version = ['py38', 'py39', 'py310', 'py311', 'py312']
11+
unstable = true
12+
exclude = '''
13+
/(
14+
\.git
15+
| \.hg
16+
| \.mypy_cache
17+
| \.tox
18+
| \.venv
19+
| _build
20+
| buck-out
21+
| build
22+
| dist
23+
)/
24+
'''
25+
26+
27+
[tool.flake8]
28+
max-line-length = 120
29+
max-complexity = 10
30+
extend-ignore = [
31+
"E203", # Whitespace before ':', which black handles differently than flake8.
32+
"W503", # Line break before binary operator, black's preferred style.
33+
]
34+
35+
# Comma-separated list of directories to exclude from linting.
36+
exclude = [
37+
".git",
38+
"__pycache__",
39+
"docs/source/conf.py",
40+
"old",
41+
"build",
42+
"dist",
43+
".venv",
44+
]
45+
46+
# Per-file ignores are very useful for specific cases.
47+
# For example, __init__.py files often have unused imports on purpose.
48+
per-file-ignores = [
49+
"__init__.py:F401", # Ignore "unused import" in __init__.py files
50+
]

requirements-dev.txt

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@ black==24.3.0
1717
coverage[toml]==7.6.1
1818
isort==5.13.2
1919
flake8==7.1.1
20+
autoflake==2.3.1
2021
mypy==1.11.2
2122
pytest==8.3.3
2223
pytest-asyncio==0.24.0

requirements.txt

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,3 @@
11
# Base
22
chardet~=5.2.0
3-
requests~=2.32.3
4-
tqdm~=4.67.1
53
idna~=3.10

setup.cfg

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6,8 +6,6 @@ license_file = LICENSE
66
python_requires = >=3.8
77
install_requires =
88
    chardet ~= 5.2.0
9-
    requests ~= 2.32.3
10-
    tqdm ~= 4.67.1
119
    idna ~= 3.10
1210
classifiers =
1311
    Development Status :: 5 - Production/Stable
