Feat/saved jobs fix and progress v2 #338

Open
IfThingsThenStuff wants to merge 511 commits into stickerdaniel:main from IfThingsThenStuff:feat/saved-jobs-fix-and-progress-v2

Conversation

@IfThingsThenStuff

replaces #167

I wanted the ability to read out my saved jobs, so I added it. It handles multiple pages.

Let me know if this is aligned with what you would like to include, and let me know of any changes you think are needed.

Summary

  • Add scrape_saved_jobs to LinkedInExtractor — scrapes the LinkedIn jobs tracker page, extracts job IDs from link hrefs, and paginates through results using numbered page buttons
  • Add get_saved_jobs MCP tool with progress reporting via on_progress callback
  • Cap total_pages with max_pages for accurate progress percentages
  • Use Set for O(1) job ID deduplication in the DOM polling function
  • Add navigation delay between page clicks consistent with other scraping methods
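The pagination and deduplication flow from the bullets above can be sketched in plain Python (a hedged sketch: `collect_saved_job_ids` and `fetch_page_ids` are hypothetical stand-ins, not the PR's actual API):

```python
# Sketch of the multi-page collection loop, assuming a fetch_page_ids(page)
# callable that returns the job IDs visible on a given page.
def collect_saved_job_ids(fetch_page_ids, max_pages=10):
    """Collect job IDs across pages, deduplicating with a set for O(1) lookups."""
    seen = set()       # O(1) membership checks, mirroring the PR's Set-based dedup
    ordered_ids = []   # preserve first-seen order for the result payload
    for page in range(1, max_pages + 1):
        page_ids = fetch_page_ids(page)
        if not page_ids:      # empty page / no further numbered button: stop
            break
        new_ids = [jid for jid in page_ids if jid not in seen]
        if not new_ids:       # page content did not change: likely the last page
            break
        seen.update(new_ids)
        ordered_ids.extend(new_ids)
    return ordered_ids
```

Adjacent pages on LinkedIn's tracker can repeat entries, which is why membership is checked against everything seen so far rather than only the previous page.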

stickerdaniel and others added 30 commits December 25, 2025 00:27
…sh-setup-bun-2.x

chore(deps): update oven-sh/setup-bun action to v2
…ns-checkout-6.x

chore(deps): update actions/checkout action to v6
…n-3.x

chore(deps): update python docker tag to v3.14
Python 3.14 is too new and key dependencies lack support:
- pydantic-core: PyO3 doesn't support Python 3.14 yet
- lxml: No pre-built wheels for Python 3.14

Python 3.13 is still modern and has full ecosystem support.
Add ToolAnnotations to all 6 tools with appropriate hints:
- get_person_profile: readOnly, openWorld (LinkedIn API)
- get_company_profile: readOnly, openWorld (LinkedIn API)
- get_job_details: readOnly, openWorld (LinkedIn API)
- search_jobs: readOnly, openWorld (LinkedIn API)
- get_recommended_jobs: readOnly, openWorld (LinkedIn API)
- close_session: not readOnly, not openWorld (local session mgmt)

Tool annotations help LLM clients understand tool behavior and make
better decisions about tool selection and user confirmations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
…aniel#65)

## Summary

Add `ToolAnnotations` to all 6 tools to help LLM clients understand tool behavior and make better decisions about tool selection and user confirmations.

### Changes

- Added annotations to all 6 tools across 4 files:
  - `linkedin_mcp_server/tools/person.py`
  - `linkedin_mcp_server/tools/company.py`
  - `linkedin_mcp_server/tools/job.py`
  - `linkedin_mcp_server/server.py`

### Tool Annotations Added

| Tool | title | readOnlyHint | destructiveHint | openWorldHint |
|------|-------|--------------|-----------------|---------------|
| get_person_profile | Get Person Profile | ✅ | ❌ | ✅ |
| get_company_profile | Get Company Profile | ✅ | ❌ | ✅ |
| get_job_details | Get Job Details | ✅ | ❌ | ✅ |
| search_jobs | Search Jobs | ✅ | ❌ | ✅ |
| get_recommended_jobs | Get Recommended Jobs | ✅ | ❌ | ✅ |
| close_session | Close Session | ❌ | ❌ | ❌ |

### Annotation Rationale

- **readOnlyHint=true**: 5 tools are read-only data retrieval from LinkedIn
- **openWorldHint=true**: 5 tools access external LinkedIn API
- **close_session**: Local session management (not read-only, not external)
- **destructiveHint=false**: No tools delete or destroy any resources
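As a plain-data illustration (hypothetical; the real change attaches annotation objects to each tool in the `tools/*.py` modules), the table above boils down to:

```python
# Illustrative only: the annotation table expressed as plain dicts.
READ_ONLY_EXTERNAL = {"readOnlyHint": True, "destructiveHint": False, "openWorldHint": True}

TOOL_HINTS = {
    "get_person_profile": READ_ONLY_EXTERNAL,
    "get_company_profile": READ_ONLY_EXTERNAL,
    "get_job_details": READ_ONLY_EXTERNAL,
    "search_jobs": READ_ONLY_EXTERNAL,
    "get_recommended_jobs": READ_ONLY_EXTERNAL,
    # close_session mutates local state and touches no external service
    "close_session": {"readOnlyHint": False, "destructiveHint": False, "openWorldHint": False},
}
```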

### Why This Matters

Tool annotations are part of the MCP specification that help AI clients:
- Display appropriate confirmation dialogs for destructive operations
- Make better decisions about autonomous tool execution
- Show users accurate information about what tools do

### Testing

- ✅ Python import test passes
- ✅ All 6 tools verified

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Replace non-existent main.py with module execution
(-m linkedin_mcp_server) in VS Code task configurations

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Align VS Code tasks with module-based entry point.
> 
> - Replace `uv run main.py` with `uv run -m linkedin_mcp_server` across debug, standard run, and HTTP MCP server tasks
> - Update task `label` and `detail` to reflect server execution; preserve flags like `--debug`, `--no-headless`, `--no-lazy-init`, and `--transport streamable-http`
> - Config-only change in `.vscode/tasks.json`
> 
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit e0460c8. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
The CLI uses --log-level {DEBUG,INFO,WARNING,ERROR}, not --debug
Upgrade fastmcp from >=2.10.1 to >=2.14.0 to fix the 307 Temporary
Redirect issue when using streamable-http transport.

The fix was merged in FastMCP PR #896 and #998, which changed default
paths to include trailing slashes and removed automatic path
manipulation that caused redirect loops with Starlette's Mount routing.

This also upgrades mcp from 1.10.1 to 1.25.0 which includes related
fixes confirmed by users in modelcontextprotocol/python-sdk#1168.

Resolves: stickerdaniel#54
Add fakeredis and docket loggers to noise reduction to prevent
DEBUG log pollution from FastMCP's internal task queue.
Updated formatting for notes and tips in README.md.
…s-formatting

fix(docs): Refactor notes and tips formatting in README
- Upgrade to config:best-practices preset
- Add group:allNonMajor to reduce PR noise
- Enable vulnerability alerts with security label
- Group MCP ecosystem packages (fastmcp, mcp) together
- Automerge dev tool updates (pytest, ruff, pre-commit, ty)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> Upgrades Renovate configuration and streamlines dependency management.
> 
> - Switches to `extends: ["config:best-practices", "group:allNonMajor"]` to reduce PR noise and apply recommended settings
> - Enables `vulnerabilityAlerts` with `labels: ["security"]` and schedule `at any time`
> - Adds `packageRules` to group `fastmcp` and `mcp` minor/patch updates under "MCP ecosystem"
> - Automerges minor/patch updates for dev tools (`pytest**`, `ruff`, `pre-commit`, `ty`)
> 
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 67b1c16. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
… playwright as a dependency

- Upgraded linkedin-scraper from version 2.11.5 to 3.0.1 with updated source.
- Added playwright as a new dependency with a minimum version of 1.40.0.
- Removed keyring from dependencies.
- Updated pyproject.toml and uv.lock files accordingly.
…nhance session management

- Updated Dockerfile to use Playwright for browser automation instead of Chrome WebDriver.
- Simplified authentication flow by implementing session file management.
- Removed keyring dependency and adjusted configuration for environment variable usage.
- Enhanced error handling and logging for better debugging.
- Updated README to reflect changes in authentication and usage instructions.
- Refactored tools for company, job, and person scraping to utilize new Playwright-based architecture.
…rting

- Remove unused retry utility (linkedin-scraper v3 has built-in retry)
- Add MCPContextProgressCallback to report progress to MCP clients
- Wire FastMCP Context into all scraping tools
- Remove claude-code-review workflow
…ce browser reset functionality

- Updated Dockerfile to use cache for faster dependency synchronization.
- Added reset_browser_for_testing function to improve test isolation in browser management.
- Adjusted job tool error handling to streamline exception management.
…anagement and documentation

- Removed version specification from docker-compose.yml for flexibility.
- Updated README to clarify session timeout details and added a warning about sensitive data in session files.
- Improved comments in setup.py regarding manual login timeout.
- Refactored job.py to return a dictionary with job URLs and count instead of a list for better data structure.
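The dictionary-shaped return mentioned in the last bullet might look like this (key names are assumptions based on the commit message, not verified against `job.py`):

```python
# Hypothetical sketch: a dict carrying job URLs plus a count is easier for
# LLM clients to interpret than a bare list.
def jobs_payload(job_urls):
    urls = list(job_urls)
    return {"job_urls": urls, "count": len(urls)}
```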
stickerdaniel and others added 15 commits February 20, 2026 19:06
Bump version to 4.1.2 to trigger release workflow test.
…orting

- Fix wait_for_function positional arg bug (arg= keyword required)
- Switch pagination from broken "Next" button to numbered page buttons
  (button[aria-label="Page N"]) which reliably triggers content updates
- Replace arbitrary asyncio.sleep() calls with DOM-based waiting via
  wait_for_function to detect new job links
- Embed job IDs summary in section text so LLMs always surface them
- Add on_progress callback for per-page progress reporting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
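The commit's DOM-based waiting can be illustrated in pure Python (a Playwright-free sketch; `get_ids` stands in for evaluating the job-link extraction script, while the real code uses `page.wait_for_function` with its `arg=` keyword):

```python
import time

def wait_for_new_id(get_ids, prev_ids, timeout=15.0, poll_interval=0.05):
    """Poll get_ids() until it yields an ID not in prev_ids, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        fresh = set(get_ids()) - prev_ids
        if fresh:
            return True   # new content arrived: safe to extract this page
        time.sleep(poll_interval)
    return False          # graceful stop: treat a timeout as "no more pages"
```

Compared with a fixed `asyncio.sleep()`, this returns as soon as the page actually updates and degrades gracefully when it never does.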
Detect total pages from pagination buttons on the page instead of using
max_pages (10), so progress reports reflect reality (1/2, 2/2 instead
of 1/10, 2/10).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…kups, and add tests

Address review findings: cap total_pages with max_pages to fix misleading
progress percentages, add _NAV_DELAY between page clicks for rate-limit
safety, convert JS prevIds.includes() to Set.has() for O(1) lookups, guard
division by zero in _report, fix docstring inaccuracies, and add 5 targeted
tests covering progress callbacks, timeout graceful stop, max_pages cap,
and session expired error handling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
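The division-by-zero guard and progress cap described above could be sketched like so (hedged: `report_progress` is an illustrative name; the PR's helper is `_report` in `tools/job.py` and may differ in detail):

```python
def report_progress(page, total_pages):
    """Return a progress percentage, guarding division by zero and capping at 99%."""
    if total_pages <= 0:     # guard: total unknown or zero
        return 0
    pct = int(page / total_pages * 100)
    return min(pct, 99)      # 100% is reserved for final completion
```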
Address Greptile review: use Set for O(1) dedup in _EXTRACT_JOB_IDS_JS,
expose max_pages parameter on get_saved_jobs MCP tool, and document the
new tool in AGENTS.md, README.md, and docs/docker-hub.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Conflicts:
#	.dockerignore
#	.github/workflows/ci.yml
#	.github/workflows/release.yml
#	.gitignore
#	.opencode/agents/code-reviewer.md
#	.opencode/agents/code-simplifier.md
#	.opencode/agents/comment-analyzer.md
#	.opencode/agents/pr-test-analyzer.md
#	.opencode/agents/silent-failure-hunter.md
#	.opencode/agents/type-design-analyzer.md
#	AGENTS.md
#	Dockerfile
#	README.md
#	RELEASE_NOTES_TEMPLATE.md
#	btca.config.jsonc
#	docker-compose.yml
#	docs/docker-hub.md
#	linkedin_mcp_server/__init__.py
#	linkedin_mcp_server/authentication.py
#	linkedin_mcp_server/cli_main.py
#	linkedin_mcp_server/config/loaders.py
#	linkedin_mcp_server/config/schema.py
#	linkedin_mcp_server/core/__init__.py
#	linkedin_mcp_server/core/auth.py
#	linkedin_mcp_server/core/browser.py
#	linkedin_mcp_server/core/utils.py
#	linkedin_mcp_server/drivers/browser.py
#	linkedin_mcp_server/error_handler.py
#	linkedin_mcp_server/exceptions.py
#	linkedin_mcp_server/logging_config.py
#	linkedin_mcp_server/scraping/__init__.py
#	linkedin_mcp_server/scraping/extractor.py
#	linkedin_mcp_server/scraping/fields.py
#	linkedin_mcp_server/server.py
#	linkedin_mcp_server/setup.py
#	linkedin_mcp_server/tools/__init__.py
#	linkedin_mcp_server/tools/company.py
#	linkedin_mcp_server/tools/job.py
#	linkedin_mcp_server/tools/person.py
#	manifest.json
#	pyproject.toml
#	renovate.json
#	scripts/dump_snapshots.py
#	tests/conftest.py
#	tests/test_authentication.py
#	tests/test_browser_driver.py
#	tests/test_cli_main.py
#	tests/test_config.py
#	tests/test_core_utils.py
#	tests/test_error_handler.py
#	tests/test_fields.py
#	tests/test_scraping.py
#	tests/test_tools.py
#	uv.lock
  `get_saved_jobs` was failing in the real stdio MCP flow even though
  `get_inbox` worked in the same session. The root cause was a stale
  duplicate `scrape_saved_jobs` implementation in
  `linkedin_mcp_server/scraping/extractor.py`.

  The second duplicate definition was overriding the current
  implementation and still called `extract_page(url)` without the required
  `section_name` argument. In the live path this raised:

  `TypeError: LinkedInExtractor.extract_page() missing 1 required positional argument: 'section_name'`

  This change removes the stale duplicate and keeps the current
  implementation that correctly calls
  `extract_page(url, "saved_jobs")`.
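The override described above is ordinary Python behavior, easy to reproduce in miniature (`Extractor` here is a toy class, not the real `LinkedInExtractor`):

```python
# When a class body defines the same method name twice, the second
# definition silently replaces the first; this is how a stale duplicate
# can shadow the current implementation without any warning.
class Extractor:
    def extract(self, url, section_name):  # "current" implementation
        return (url, section_name)

    def extract(self, url):  # stale duplicate: this one wins
        return (url,)
```

Calling the surviving method with the old two-argument signature raises a `TypeError`, which is the same failure mode reported in the live stdio session.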

  The saved-jobs tests also missed this because they mocked `extract_page`
  too loosely and returned raw strings instead of the current
  `ExtractedSection` shape. This commit updates those tests and adds a
  regression check that `scrape_saved_jobs` passes the `"saved_jobs"`
  section name explicitly.

  Verified with:
  - `pytest tests/test_scraping.py -k saved_jobs`
  - `pytest tests/test_tools.py -k get_saved_jobs`
  - real stdio MCP session using the `linkedin-playground` launch command,
    where both `get_inbox` and `get_saved_jobs` now succeed
@greptile-apps
Contributor

greptile-apps bot commented Apr 8, 2026

Greptile Summary

This PR introduces scrape_saved_jobs on LinkedInExtractor and the corresponding get_saved_jobs MCP tool, scraping LinkedIn's /jobs-tracker/ page with full multi-page support, sliding-window pagination refresh, O(1) ID deduplication, and progress reporting. All three issues from the previous review round (pages_visited/sections_requested keys, stale total_pages, missing bug-report checkbox) are resolved in the 1ccb9f9 fixup commit.

Confidence Score: 5/5

  • Safe to merge; the one remaining finding is a manifest entry omission that does not affect runtime behavior.
  • All prior P0/P1 concerns are fixed, the implementation is well-tested across single-page, multi-page, timeout, and empty-result paths, and the only outstanding item is the missing get_saved_jobs entry in manifest.json — a P2 documentation gap that doesn't break the tool.
  • manifest.json — add the get_saved_jobs entry before shipping as a DXT bundle.

Important Files Changed

Filename Overview
linkedin_mcp_server/scraping/extractor.py Adds scrape_saved_jobs with correct sliding-window pagination, O(1) ID dedup via prev_ids set, timeout-safe wait_for_function loop, and refreshing total_pages each iteration; reuses shared _EXTRACT_JOB_IDS_JS / _EXTRACT_MAX_PAGE_JS helpers.
linkedin_mcp_server/tools/job.py Adds get_saved_jobs MCP tool with max_pages (1-10) parameter, progress via _report callback that caps at 99%, and full auth-error handling consistent with other job tools.
tests/test_scraping.py Adds 5 tests covering single-page, multi-page, timeout, max_pages cap, and empty-result scenarios; verifies progress callbacks, correct job_ids dedup, and documented return keys only.
manifest.json All existing tools are listed but get_saved_jobs is absent, making the DXT/MCPB bundle manifest incomplete for tool discovery.
.github/ISSUE_TEMPLATE/bug_report.md Adds get_saved_jobs checkbox to the tool list, addressing the previous review comment.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[get_saved_jobs tool called] --> B[report_progress 0%]
    B --> C[scrape_saved_jobs\nextract_page /jobs-tracker/]
    C --> D[evaluate _EXTRACT_JOB_IDS_JS\npage 1 IDs]
    D --> E[evaluate _EXTRACT_MAX_PAGE_JS\ninitial total_pages = min_max_pages]
    E --> F[on_progress page=1 total=total_pages]
    F --> G{page_num in range 2..max_pages}
    G -->|next page| H{button\nPage N exists?}
    H -->|no| Z[build id_summary\nreturn result]
    H -->|yes| I[prev_ids = set all_job_ids\nclick button\nsleep _NAV_DELAY]
    I --> J{wait_for_function\nnew ID in DOM?\ntimeout=15s}
    J -->|timeout| Z
    J -->|new ID found| K[scroll_to_bottom\nevaluate EXTRACT_MAIN_TEXT_JS\nevaluate _EXTRACT_JOB_IDS_JS]
    K --> L{new_ids empty?}
    L -->|yes| Z
    L -->|no| M[extend all_job_ids\nrefresh total_pages]
    M --> N[on_progress page=N total=total_pages]
    N --> G
    Z --> O[report_progress 100%\nreturn url + sections + job_ids]
```
Prompt To Fix All With AI
This is a comment left during a code review.
Path: manifest.json
Line: 47-99

Comment:
**`get_saved_jobs` missing from `manifest.json` tools list**

Every other tool (`get_job_details`, `search_jobs`, `get_sidebar_profiles`, etc.) is listed in the `tools` array. Omitting `get_saved_jobs` means DXT/MCPB bundle clients that enumerate tools from the manifest won't surface it. The tool still works at runtime via the MCP protocol, but the manifest entry is the authoritative list for bundle discovery.

```suggestion
    {
      "name": "get_job_details",
      "description": "Retrieve specific job posting details using LinkedIn job IDs"
    },
    {
      "name": "get_saved_jobs",
      "description": "Get saved/bookmarked jobs from the LinkedIn job tracker"
    },
    {
      "name": "search_jobs",
      "description": "Search for jobs with filters like keywords and location"
    },
```

How can I resolve this? If you propose a fix, please make it concise.

Reviews (8): Last reviewed commit: "Merge branch 'main' into feat/saved-jobs..."

@IfThingsThenStuff
Author

@stickerdaniel - I've re-applied my old PR to the latest code. It would be great to get this merged in before it drifts again. Thanks

@IfThingsThenStuff
Author

@stickerdaniel - anything stopping this getting merged?
