
feat(scraping): Add max_scrolls parameter to get_person_profile#361

Merged
stickerdaniel merged 4 commits into main from
04-13-feat_scraping_add_max_scrolls_parameter_to_get_person_profile
Apr 14, 2026

Conversation

stickerdaniel (Owner) commented Apr 13, 2026

Summary

  • Add max_scrolls parameter to get_person_profile so the LLM can increase scrolling for profiles with many items in a section (e.g., 30+ certifications)
  • Parameter is validated with Field(ge=1, le=50), defaults to None (5 for detail sections, 10 for posts)
  • Global override — applies to all sections in the call; heavy sections should be requested separately

Resolves #360

Synthetic prompt

Add an optional max_scrolls parameter to get_person_profile that flows through scrape_person → extract_page → _extract_page_once → scroll_to_bottom. Use Annotated[int, Field(ge=1, le=50)] for validation (matching the search_jobs pattern). Default None preserves existing behavior (5 for detail pages, 10 for activity). Use "if max_scrolls is not None", not truthiness. Add tests for forwarding, defaults, override, and validation rejection.
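The validation pattern described in the prompt above can be sketched standalone with Pydantic (a minimal sketch; MaxScrolls and the TypeAdapter usage are illustrative, not names from the codebase):

```python
from typing import Annotated, Optional

from pydantic import Field, TypeAdapter, ValidationError

# Optional int bounded to 1..50; None means "use the per-section defaults".
MaxScrolls = Annotated[Optional[int], Field(ge=1, le=50)]
adapter = TypeAdapter(MaxScrolls)

assert adapter.validate_python(None) is None  # default: keep 5/10 budgets
assert adapter.validate_python(20) == 20      # explicit override

rejected = False
try:
    adapter.validate_python(0)  # below ge=1, so Pydantic rejects it
except ValidationError:
    rejected = True
assert rejected
```

In the real tool the Annotated type sits directly on the function parameter, so FastMCP applies the same bounds when the LLM calls the tool.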

Generated with Claude Opus 4.6


This stack of pull requests is managed by Graphite. Learn more about stacking.

@stickerdaniel stickerdaniel marked this pull request as ready for review April 13, 2026 21:51
Copilot AI review requested due to automatic review settings April 13, 2026 21:51

Copilot AI left a comment


Pull request overview

Adds an optional max_scrolls argument to the person-profile scraping toolchain to allow callers to increase the scroll budget for heavy sections (e.g., many certifications), while preserving existing defaults when unset.

Changes:

  • Exposes max_scrolls on get_person_profile with Pydantic validation (ge=1, le=50) and forwards it into scraping.
  • Threads max_scrolls through scrape_person → extract_page → _extract_page_once to override scroll_to_bottom’s default scroll budgets (5 detail/main, 10 activity).
  • Adds tests covering forwarding, default behavior, override behavior, and invalid-value rejection.
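The default-override logic described in these bullets can be sketched as follows (assumed constant and function names; the real decision lives inside the extractor):

```python
from typing import Optional

DETAIL_DEFAULT = 5     # default scroll budget for detail/main pages
ACTIVITY_DEFAULT = 10  # default scroll budget for activity/posts pages

def effective_budget(is_activity: bool, max_scrolls: Optional[int]) -> int:
    # Check "is not None" rather than truthiness, so only an unset value
    # falls back to the defaults (the Field(ge=1) bound excludes 0 anyway).
    if max_scrolls is not None:
        return max_scrolls
    return ACTIVITY_DEFAULT if is_activity else DETAIL_DEFAULT

assert effective_budget(False, None) == 5    # detail page, unset -> 5
assert effective_budget(True, None) == 10    # activity page, unset -> 10
assert effective_budget(True, 20) == 20      # explicit override wins
```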

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
linkedin_mcp_server/tools/person.py Adds validated max_scrolls parameter to the MCP tool and forwards it to scrape_person.
linkedin_mcp_server/scraping/extractor.py Adds max_scrolls plumbing and uses it to override scroll budgets in _extract_page_once.
tests/test_tools.py Tests tool-level forwarding and validation rejection for max_scrolls.
tests/test_scraping.py Tests extractor-level forwarding and default/override behavior for scroll budgets.


Comment on lines +149 to +153
```python
with pytest.raises(ValidationError, match="greater_than_equal"):
    await mcp.call_tool(
        "get_person_profile",
        {"linkedin_username": "test-user", "max_scrolls": 0},
    )
```

Copilot AI Apr 13, 2026


The validation assertion is a bit brittle: matching the internal Pydantic error code string ("greater_than_equal") can change across Pydantic/FastMCP versions and may cause unnecessary test failures. Consider asserting only that a ValidationError is raised, or matching a more stable part of the message (e.g., that max_scrolls must be >= 1 / the field name).
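One way to act on this suggestion, sketched directly against Pydantic (TypeAdapter stands in for the FastMCP call path, which is an assumption of this sketch):

```python
from typing import Annotated

from pydantic import Field, TypeAdapter, ValidationError

adapter = TypeAdapter(Annotated[int, Field(ge=1, le=50)])

raised = False
try:
    adapter.validate_python(0)  # below the ge=1 bound
except ValidationError as exc:
    raised = True
    # Assert on stable facts (that validation failed for this input),
    # not on the internal error-code string "greater_than_equal".
    assert exc.errors()[0]["input"] == 0
assert raised
```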


greptile-apps bot commented Apr 13, 2026

Greptile Summary

Adds an optional max_scrolls parameter to get_person_profile that flows through scrape_person → extract_page → _extract_page_once, controlling both the new "Show more" button-click loop on /details/ pages and the existing scroll_to_bottom iterations. The fixture-level fix (mock_locator.filter = MagicMock(return_value=mock_locator)) also resolves a pre-existing async-safety gap in several detail-page tests.

Confidence Score: 5/5

  • Safe to merge — backward-compatible, well-validated, and thoroughly tested.
  • All findings are P2 documentation suggestions. The implementation is correct: defaults are preserved with None checks (not truthiness), validation follows the established search_jobs pattern, the Show more loop has proper early-exit guards, and scroll_to_bottom already exits early when page height stabilizes, so the dual use of max_scrolls on detail pages is cheap in practice.
  • No files require special attention.

Important Files Changed

Filename Overview
linkedin_mcp_server/scraping/extractor.py Adds max_scrolls to extract_page, _extract_page_once, and scrape_person; introduces a "Show more" button-click loop for /details/ pages with early-exit guards on timeout, invisibility, and button disappearance. Logic is sound; scroll_to_bottom also receives max_scrolls on detail pages (exits early when height stabilizes).
linkedin_mcp_server/tools/person.py Adds the validated `max_scrolls: Annotated[int, Field(ge=1, le=50)]` parameter to the get_person_profile tool and forwards it to scrape_person.
tests/test_scraping.py Adds mock_locator.filter = MagicMock(return_value=mock_locator) to the shared fixture (fixing the pre-existing async-safety gap) and four new Show-more tests; asyncio.sleep is patched where the button is present to avoid real delays.
tests/test_tools.py Adds forwarding test (max_scrolls=15 reaches scrape_person) and Pydantic validation rejection test (max_scrolls=0 raises ValidationError). Clean coverage of the happy and invalid paths.

Sequence Diagram

```mermaid
sequenceDiagram
    participant LLM as LLM / MCP Client
    participant Tool as get_person_profile
    participant Scraper as scrape_person
    participant ExtPage as extract_page
    participant Once as _extract_page_once

    LLM->>Tool: "max_scrolls=20, sections="certifications""
    Tool->>Scraper: "max_scrolls=20"
    loop for each section (main_profile, certifications)
        Scraper->>ExtPage: "url, section_name, max_scrolls=20"
        ExtPage->>Once: "url, section_name, max_scrolls=20"
        alt is_details (/details/ URL)
            Note over Once: Show more loop (max 20 clicks)<br/>exits early when button gone
        end
        Note over Once: scroll_to_bottom (max 20 iters)<br/>exits early when height stable
        Once-->>ExtPage: ExtractedSection
        ExtPage-->>Scraper: ExtractedSection
    end
    Scraper-->>Tool: "{url, sections, references}"
    Tool-->>LLM: result dict
```
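The "exits early when height stable" note in the diagram is why a generous max_scrolls is cheap. A minimal sketch of that loop shape (the real code drives a browser page; here successive observed heights stand in):

```python
def scroll_until_stable(heights: list[int], max_scrolls: int) -> int:
    """heights: successive page heights observed after each scroll."""
    last = -1
    scrolls = 0
    for h in heights[:max_scrolls]:
        if h == last:
            break  # height unchanged -> page fully loaded, stop early
        last = h
        scrolls += 1
    return scrolls

# Page finishes growing after 3 scrolls; a budget of 20 still stops at 3.
assert scroll_until_stable([1000, 1800, 2400, 2400, 2400], max_scrolls=20) == 3
```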
Review comment (linkedin_mcp_server/tools/person.py, lines 52-58):
**Docstring underspecifies `max_scrolls` for detail pages**

The docstring says `max_scrolls` is "the max number of 'Show more' button clicks" on detail sections, but the implementation also passes it as `max_scrolls` to `scroll_to_bottom` in `_extract_page_once` (the `else` branch covers all non-activity pages, including `/details/`). In practice `scroll_to_bottom` exits early once the page height stabilizes, so the added iterations are cheap, but the docstring creates an incomplete mental model.

```suggestion
            max_scrolls: Maximum pagination attempts per section to load more content.
                On detail sections (experience, certifications, skills, etc.) this
                controls both the max "Show more" button clicks and the max
                scroll-to-bottom iterations (scroll exits early once the page is
                fully loaded). On activity/posts it is only the max
                scroll-to-bottom iterations. Applies to all sections
                in this call. Default (None) uses 5 for detail sections and 10 for
                posts. Increase when a profile has many items in a section
                (e.g., 30+ certifications, max_scrolls=20). To avoid slowing down
                other sections, request heavy sections in a separate call.
```


Reviews (4): last reviewed commit "fix: Address PR review feedback"

- Match on field name instead of Pydantic error code for stability
- Detail pages (experience, certifications, skills, etc.) paginate with a "Show more" button inside <main>, not scroll-to-bottom. Click it in a loop (bounded by max_scrolls) until the button disappears. Resolves: #360
- Configure mock_locator.filter in fixture so Show more loop exits cleanly without hitting the exception handler in detail-page tests
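The bounded "Show more" click loop described in the commit message above can be sketched with a stub in place of the browser locator (the class and function names here are illustrative, not the project's):

```python
class FakeShowMoreButton:
    """Stand-in for a page locator: visible while more pages remain."""

    def __init__(self, pages_remaining: int):
        self.pages_remaining = pages_remaining
        self.clicks = 0

    def is_visible(self) -> bool:
        return self.pages_remaining > 0

    def click(self) -> None:
        self.pages_remaining -= 1
        self.clicks += 1

def expand_section(button: FakeShowMoreButton, max_scrolls: int) -> int:
    # Click until the button disappears or the budget is spent.
    for _ in range(max_scrolls):
        if not button.is_visible():
            break  # early exit: all items already loaded
        button.click()
    return button.clicks

# 3 hidden pages, budget 20 -> stops after 3 clicks (early exit).
assert expand_section(FakeShowMoreButton(pages_remaining=3), max_scrolls=20) == 3
# 30 hidden pages, budget 5 -> capped at 5 clicks (bounded loop).
assert expand_section(FakeShowMoreButton(pages_remaining=30), max_scrolls=5) == 5
```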
@stickerdaniel stickerdaniel merged commit 23d8e1c into main Apr 14, 2026
5 checks passed
@stickerdaniel stickerdaniel deleted the 04-13-feat_scraping_add_max_scrolls_parameter_to_get_person_profile branch April 14, 2026 08:05


Development

Successfully merging this pull request may close these issues.

[BUG] Detail sections truncate results on profiles with 30+ items

2 participants