Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,18 @@ html_result = scrape_tool.invoke({
})
print(html_result)

# --- Test Case 1b: Auto-Mode (let ScrapingBee pick the cheapest config that works) ---
# mode='auto' tries configs cheapest-first and charges only for the winning one
# (0 credits if all fail). max_cost caps the spend. The credits charged come back
# in the "Spb-auto-cost" response header.
# Note: mode='auto' must NOT be combined with render_js/premium_proxy/stealth_proxy.
print("\n--- 1b. Testing ScrapeUrlTool (Auto-Mode) ---")
auto_result = scrape_tool.invoke({
'url': 'http://httpbin.org/html',
'params': {'mode': 'auto', 'max_cost': 25}
})
print(auto_result)

# --- Test Case 2: Scrape a PDF file ---
print("\n--- 2. Testing ScrapeUrlTool (PDF) ---")
pdf_result = scrape_tool.invoke({
Expand Down
6 changes: 5 additions & 1 deletion langchain_scrapingbee/tools.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
"For text content, returns the scraped HTML/text directly. "
"Use params for screenshots: {'screenshot_full_page': 'true'} or data extraction: {'extract_rules': '{...}'}. "
"Supports JavaScript rendering, mobile simulation, proxy geolocation, and AI-powered extraction."
"When unsure which scraping config a site needs, prefer params {'mode': 'auto'} to let ScrapingBee pick the cheapest config that succeeds (charged only for the winning config); optionally cap spend with 'max_cost'. Do not combine mode='auto' with render_js/premium_proxy/stealth_proxy."
"params should be a valid dictionary"
"For non-text files, use 'render_js=false'"
"Before running ai_query and ai_extract_rules, scrapingbee converts the HTML content to markdown. So the ai model only have access to markdown not the html"
Expand Down Expand Up @@ -79,6 +80,8 @@
- Stealth proxy limitations: No infinite_scroll, evaluate, custom headers/cookies
]
- "json_response": true - Wrap response in JSON format with metadata. This can also be used to find internal xhr requests
- "max_cost": int >= 1 - Only valid with mode='auto'. Caps the credits a request may spend: escalation climbs only to the costliest tier whose cost is <= max_cost (e.g. max_cost=25 stops at premium+JS, max_cost=10 stops at premium). Amount charged never exceeds it. Omit for uncapped (stealth reachable).
- "mode": "auto" - Cheapest-first auto-escalation: ScrapingBee tries scraping configs from cheapest to most expensive (1cr basic -> 5cr JS -> 10cr premium -> 25cr premium+JS -> 75cr stealth), stops at the first that succeeds, and charges ONLY for the winning config (0 credits if all fail). Use this when you don't know which config a site needs and want the cheapest one that works. GET-only. The credits charged are returned in the "Spb-auto-cost" response header. Must NOT be combined with render_js, premium_proxy, stealth_proxy, or transparent_status_code (the API returns 400). Optionally pair with max_cost to cap spend.
- "own_proxy": "protocol://user:pass@host:port" - Use your own proxy
- "premium_proxy": true - Use premium proxy pool (10-25 credits)
- "render_js": true/false - Enable JS rendering (default: true)
Expand Down Expand Up @@ -230,7 +233,8 @@ class ScrapeUrlInput(BaseModel):
Examples:
{"screenshot_full_page": true, "wait": 2000}
{"extract_rules": '{"title": "h1", "price": ".price"}'}
{"country_code": "gb", "device": "mobile"}"""
{"country_code": "gb", "device": "mobile"}
{"mode": "auto", "max_cost": 25}"""
)
headers: Optional[Dict[str, str]] = Field( # ADD THIS
default_factory=dict,
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "langchain-scrapingbee"
version = "0.2.0"
version = "0.2.1"
description = "An integration package connecting Scrapingbee and LangChain"
authors = []
readme = "README.md"
Expand Down
1 change: 1 addition & 0 deletions release_notes.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,3 @@
* 0.2.1 - Documented Auto-Mode (`mode=auto` / `max_cost`) in ScrapeUrlTool so agents can pick the cheapest scraping config that succeeds
* 0.2.0 - Added YouTube APIs — Search, Metadata, Trainability, Transcript
* 0.1.0 — Initial version