feat(scraping): Add max_scrolls parameter to get_person_profile #361
Pull request overview
Adds an optional max_scrolls argument to the person-profile scraping toolchain to allow callers to increase the scroll budget for heavy sections (e.g., many certifications), while preserving existing defaults when unset.
Changes:
- Exposes `max_scrolls` on `get_person_profile` with Pydantic validation (`ge=1`, `le=50`) and forwards it into scraping.
- Threads `max_scrolls` through `scrape_person` → `extract_page` → `_extract_page_once` to override `scroll_to_bottom`'s default scroll budgets (5 for detail/main pages, 10 for activity).
- Adds tests covering forwarding, default behavior, override behavior, and invalid-value rejection.
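The forwarding chain described in the changes list can be sketched in plain Python. This is a simplified model, not the PR's code: the constant names are hypothetical, the functions only report the budget they would pass to `scroll_to_bottom`, and recognizing activity pages by `/recent-activity/` in the URL is an assumption.

```python
from typing import Optional

# Defaults described in the PR: 5 scroll iterations for detail/main pages,
# 10 for the activity feed. The constant names are hypothetical.
DEFAULT_SCROLLS = 5
DEFAULT_ACTIVITY_SCROLLS = 10


def _extract_page_once(url: str, max_scrolls: Optional[int] = None) -> dict:
    """Stand-in that only reports the budget it would pass to scroll_to_bottom."""
    # Assumption: activity pages are recognized by "/recent-activity/" in the URL.
    is_activity = "/recent-activity/" in url
    default = DEFAULT_ACTIVITY_SCROLLS if is_activity else DEFAULT_SCROLLS
    # None keeps the per-page default; any explicit value overrides it.
    budget = max_scrolls if max_scrolls is not None else default
    return {"url": url, "scroll_budget": budget}


def extract_page(url: str, max_scrolls: Optional[int] = None) -> dict:
    # Anything else the real function does is omitted; it just forwards.
    return _extract_page_once(url, max_scrolls=max_scrolls)


def scrape_person(urls: list[str], max_scrolls: Optional[int] = None) -> list[dict]:
    # The same override is applied to every requested section.
    return [extract_page(url, max_scrolls=max_scrolls) for url in urls]
```

With `max_scrolls=None`, a profile page and its activity feed keep budgets of 5 and 10; with `max_scrolls=20` every section gets 20, which is why the docstring suggestion recommends requesting heavy sections in a separate call.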
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `linkedin_mcp_server/tools/person.py` | Adds validated `max_scrolls` parameter to the MCP tool and forwards it to `scrape_person`. |
| `linkedin_mcp_server/scraping/extractor.py` | Adds `max_scrolls` plumbing and uses it to override scroll budgets in `_extract_page_once`. |
| `tests/test_tools.py` | Tests tool-level forwarding and validation rejection for `max_scrolls`. |
| `tests/test_scraping.py` | Tests extractor-level forwarding and default/override behavior for scroll budgets. |
tests/test_tools.py (outdated)

    with pytest.raises(ValidationError, match="greater_than_equal"):
        await mcp.call_tool(
            "get_person_profile",
            {"linkedin_username": "test-user", "max_scrolls": 0},
        )
The validation assertion is brittle: it matches the internal Pydantic error code ("greater_than_equal"), which can change across Pydantic/FastMCP versions and cause spurious test failures. Consider asserting only that a ValidationError is raised, or matching a more stable part of the message, such as the field name (`max_scrolls`) or the `>= 1` constraint.
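The reviewer's suggestion can be illustrated without FastMCP. In this sketch, `validate_max_scrolls` and `ScrollValidationError` are hypothetical stand-ins that mimic the `Field(ge=1, le=50)` bounds and embed an error code in the message, and the tests match on the field name rather than the code. In the real pytest test the equivalent would presumably be `pytest.raises(ValidationError, match="max_scrolls")`.

```python
import unittest
from typing import Optional


class ScrollValidationError(ValueError):
    """Hypothetical stand-in for the ValidationError raised by FastMCP/Pydantic."""


def validate_max_scrolls(value: Optional[int]) -> Optional[int]:
    # Mirrors the Field(ge=1, le=50) bounds described in the PR; None is allowed.
    if value is None:
        return None
    if not 1 <= value <= 50:
        # The error code imitates Pydantic's wording; tests should not rely on it.
        code = "greater_than_equal" if value < 1 else "less_than_equal"
        raise ScrollValidationError(f"max_scrolls: {code} (got {value})")
    return value


class MaxScrollsValidationTest(unittest.TestCase):
    def test_rejects_zero_matching_field_name(self):
        # The field name is stable across library versions; the code is not.
        with self.assertRaisesRegex(ScrollValidationError, "max_scrolls"):
            validate_max_scrolls(0)

    def test_rejects_above_upper_bound(self):
        with self.assertRaisesRegex(ScrollValidationError, "max_scrolls"):
            validate_max_scrolls(51)

    def test_accepts_bounds_and_none(self):
        self.assertEqual(validate_max_scrolls(1), 1)
        self.assertEqual(validate_max_scrolls(50), 50)
        self.assertIsNone(validate_max_scrolls(None))
```

Both `python -m unittest` and pytest will collect the test class.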
Greptile Summary

Adds an optional …

Confidence Score: 5/5
Sequence Diagram

    sequenceDiagram
        participant LLM as LLM / MCP Client
        participant Tool as get_person_profile
        participant Scraper as scrape_person
        participant ExtPage as extract_page
        participant Once as _extract_page_once
        LLM->>Tool: "max_scrolls=20, sections=certifications"
        Tool->>Scraper: "max_scrolls=20"
        loop for each section (main_profile, certifications)
            Scraper->>ExtPage: "url, section_name, max_scrolls=20"
            ExtPage->>Once: "url, section_name, max_scrolls=20"
            alt is_details (/details/ URL)
                Note over Once: Show more loop (max 20 clicks)<br/>exits early when button gone
            end
            Note over Once: scroll_to_bottom (max 20 iters)<br/>exits early when height stable
            Once-->>ExtPage: ExtractedSection
            ExtPage-->>Scraper: ExtractedSection
        end
        Scraper-->>Tool: "{url, sections, references}"
        Tool-->>LLM: result dict
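The two bounded loops in the diagram (a "Show more" loop on `/details/` pages, then `scroll_to_bottom` on every page) can be modeled in plain Python. `FakePage`, its batch size, its height model, and `extract_section` are hypothetical; the real extractor drives a live browser page inside `_extract_page_once`.

```python
class FakePage:
    """Hypothetical in-memory page: `batch` items appear per click or scroll."""

    def __init__(self, total_items: int, batch: int = 10, has_show_more: bool = True):
        self.total_items = total_items
        self.batch = batch
        self.loaded = min(batch, total_items)
        self.has_show_more = has_show_more  # True: /details/ page, False: activity feed

    def _load_batch(self) -> None:
        self.loaded = min(self.loaded + self.batch, self.total_items)

    def show_more_visible(self) -> bool:
        return self.has_show_more and self.loaded < self.total_items

    def click_show_more(self) -> None:
        self._load_batch()

    def scroll(self) -> None:
        # In this model only the activity feed lazy-loads on scroll.
        if not self.has_show_more:
            self._load_batch()

    def height(self) -> int:
        return self.loaded * 100  # assumption: height grows with loaded items


def expand_show_more(page: FakePage, max_clicks: int) -> int:
    """Click "Show more" until the button disappears or the budget is spent."""
    clicks = 0
    while clicks < max_clicks and page.show_more_visible():
        page.click_show_more()
        clicks += 1
    return clicks


def scroll_to_bottom(page: FakePage, max_scrolls: int) -> int:
    """Scroll until the page height is stable or the budget is spent."""
    iterations = 0
    last_height = page.height()
    while iterations < max_scrolls:
        page.scroll()
        iterations += 1
        current = page.height()
        if current == last_height:
            break  # nothing new loaded: stop early
        last_height = current
    return iterations


def extract_section(page: FakePage, max_scrolls: int) -> tuple[int, int, int]:
    """Detail pages run the Show more loop first; every page then scrolls."""
    clicks = expand_show_more(page, max_scrolls) if page.has_show_more else 0
    scrolls = scroll_to_bottom(page, max_scrolls)
    return clicks, scrolls, page.loaded
```

With 80 items and a budget of 5, five clicks load only 60 items. With a budget of 20, the loop stops after 7 clicks because the button disappears, so a larger budget costs little once the content is exhausted, as the Greptile comment notes.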
Prompt To Fix All With AI

This is a comment left during a code review.
Path: linkedin_mcp_server/tools/person.py
Line: 52-58
Comment:
**Docstring underspecifies `max_scrolls` for detail pages**
The docstring says `max_scrolls` is "the max number of 'Show more' button clicks" on detail sections, but the implementation also passes it as `max_scrolls` to `scroll_to_bottom` in `_extract_page_once` (the `else` branch covers all non-activity pages, including `/details/`). In practice `scroll_to_bottom` exits early once the page height stabilizes, so the added iterations are cheap, but the docstring creates an incomplete mental model.
```suggestion
max_scrolls: Maximum pagination attempts per section to load more content.
On detail sections (experience, certifications, skills, etc.) this
controls both the max "Show more" button clicks and the max
scroll-to-bottom iterations (scroll exits early once the page is
fully loaded). On activity/posts it is only the max
scroll-to-bottom iterations. Applies to all sections
in this call. Default (None) uses 5 for detail sections and 10 for
posts. Increase when a profile has many items in a section
(e.g., 30+ certifications, max_scrolls=20). To avoid slowing down
other sections, request heavy sections in a separate call.
```
How can I resolve this? If you propose a fix, please make it concise.

Reviews (4): Last reviewed commit: "fix: Address PR review feedback"
- Match on field name instead of Pydantic error code for stability
Detail pages (experience, certifications, skills, etc.) paginate with a "Show more" button inside <main>, not scroll-to-bottom. Click it in a loop (bounded by max_scrolls) until the button disappears. Resolves: #360
- Configure mock_locator.filter in fixture so Show more loop exits cleanly without hitting the exception handler in detail-page tests
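The fixture change in the last commit can be sketched with `unittest.mock`, assuming the Show more loop locates the button through `locator.filter(...)` and awaits `count()` before clicking. `make_mock_locator` and `click_show_more_until_gone` are hypothetical names; `has_text` mirrors the keyword of Playwright's `Locator.filter`, but the extractor's actual selector is not shown in this PR. A bare `MagicMock` is not awaitable, so without configuring `filter` the loop would raise `TypeError` and land in the exception handler; returning a count of 0 makes it exit on the first check.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


def make_mock_locator() -> MagicMock:
    """Hypothetical fixture body: the filtered "Show more" locator matches nothing."""
    mock_locator = MagicMock()
    show_more = MagicMock()
    show_more.count = AsyncMock(return_value=0)  # no button on the page
    show_more.click = AsyncMock()
    mock_locator.filter.return_value = show_more
    return mock_locator


async def click_show_more_until_gone(main_locator, max_scrolls: int) -> int:
    """Hypothetical Show more loop bounded by max_scrolls."""
    clicks = 0
    for _ in range(max_scrolls):
        button = main_locator.filter(has_text="Show more")
        if await button.count() == 0:
            break  # button gone: everything is loaded
        await button.click()
        clicks += 1
    return clicks


if __name__ == "__main__":
    print(asyncio.run(click_show_more_until_gone(make_mock_locator(), max_scrolls=5)))
```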

Summary
- Add a `max_scrolls` parameter to `get_person_profile` so the LLM can increase scrolling for profiles with many items in a section (e.g., 30+ certifications)
- Validate it with `Field(ge=1, le=50)`; it defaults to `None` (5 for detail sections, 10 for posts)

Resolves #360
Generated with Claude Opus 4.6