
feat: bulk connections export tools #170

Open
Desperado wants to merge 13 commits into stickerdaniel:main from Desperado:feature/bulk-connections-export

Conversation

Desperado commented Feb 28, 2026

Summary

Adds two new MCP tools for bulk LinkedIn connections export:

get_my_connections

Collects connection usernames via infinite scroll on the connections page. Configurable limit and max_scrolls. Returns {username, name, headline} for each connection.
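For illustration, a successful call might return a shape like the following sketch. The values and the exact envelope are invented here; the summary above only guarantees {username, name, headline} per connection:

```python
# Hypothetical return shape of get_my_connections(limit=2); all values are invented.
result = {
    "connections": [
        {"username": "jdoe", "name": "Jane Doe", "headline": "Data Engineer at Example"},
        {"username": "asmith", "name": "Alan Smith", "headline": "Product Manager"},
    ],
    "total": 2,
}

# Each connection record carries the three documented fields.
for c in result["connections"]:
    print(c["username"])
```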

extract_contact_details

Enriches profiles with structured contact data by scraping the main profile page and contact info overlay. Returns parsed fields instead of raw text:

Field sources:

  • first_name, last_name: profile page (first line)
  • headline: profile page (after the connection degree marker)
  • location: profile page (after the headline)
  • company: profile page (after the "Contact info" label)
  • email, phone, website, birthday: contact info overlay (labeled sections)
  • profile_raw, contact_info_raw: original innerText, kept as fallback
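Illustratively, one enriched record might look like this sketch (field names follow the table above; the values are invented, and fields a profile doesn't expose come back as None):

```python
# Hypothetical example of a single record returned by extract_contact_details.
record = {
    "first_name": "Ada",
    "last_name": "Lovelace",
    "headline": "Software Engineer at Example Corp",
    "location": "London, United Kingdom",
    "company": "Example Corp",
    "email": "ada@example.com",
    "phone": None,       # not every profile exposes every field
    "website": None,
    "birthday": None,
    "profile_raw": "Ada Lovelace\n...",            # original innerText kept as fallback
    "contact_info_raw": "Email\nada@example.com",  # overlay text kept as fallback
}
print(record["first_name"], record["email"])
```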

Rate-limit handling: Processes profiles in chunked batches with configurable chunk_size (default 5) and chunk_delay (default 30s). Stops early on hard rate limit, returns partial results with rate_limited flag. Individual page loads retry once after 5s backoff on soft rate limits (empty-content responses).
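The chunking scheme described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: fetch_profile, RateLimitError, and the return keys are placeholders standing in for the real extractor API.

```python
import asyncio

class RateLimitError(Exception):
    """Placeholder for a hard rate-limit signal."""

async def fetch_profile(username: str) -> str:
    """Stand-in for a real page load; returns profile text (empty on a soft limit)."""
    return f"profile text for {username}"

async def process_in_chunks(usernames, chunk_size=5, chunk_delay=30.0):
    """Chunked batch processing: soft limits retry once, hard limits stop early."""
    contacts, failed, rate_limited = [], [], False
    for start in range(0, len(usernames), chunk_size):
        for username in usernames[start:start + chunk_size]:
            try:
                text = await fetch_profile(username)
                if not text:  # soft rate limit: one retry after a short backoff
                    await asyncio.sleep(5)
                    text = await fetch_profile(username)
                if not text:
                    failed.append(username)
                    continue
                contacts.append({"username": username, "text": text})
            except RateLimitError:  # hard limit: record the username and stop
                failed.append(username)
                rate_limited = True
                break
        if rate_limited:
            break
        if start + chunk_size < len(usernames):
            await asyncio.sleep(chunk_delay)  # pause between chunks
    return {"contacts": contacts, "failed": failed, "rate_limited": rate_limited}

out = asyncio.run(process_in_chunks(["u1", "u2", "u3"], chunk_size=2, chunk_delay=0))
print(len(out["contacts"]), out["rate_limited"])  # → 3 False
```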

Files changed

  • linkedin_mcp_server/tools/connections.py — New tool module (follows tools/person.py pattern)
  • linkedin_mcp_server/scraping/extractor.py — Added scrape_connections_list(), scrape_contact_batch(), and _parse_contact_record() parser
  • linkedin_mcp_server/server.py — Registered register_connections_tools(mcp)

Test plan

  • All 105 existing tests pass
  • ruff format, ruff check, ty check all clean
  • get_my_connections with limit=10 returns 10 valid usernames
  • extract_contact_details with 3 usernames returns structured fields (emails found for 2/3)
  • No rate limiting triggered during testing
  • Test with larger batch (50+) to verify chunking and inter-chunk delays

🤖 Generated with Claude Code

Greptile Summary

This PR adds two new MCP tools for bulk LinkedIn connections export: get_my_connections (infinite scroll collection of connection usernames) and extract_contact_details (batch enrichment with structured contact data). The implementation follows existing code patterns and includes thoughtful rate-limit handling with configurable chunking and delays.

Key changes:

  • New connections.py module with two tools following the existing tool registration pattern
  • Bulk export methods in extractor.py with chunked batch processing and soft/hard rate-limit handling
  • Structured contact parser _parse_contact_record that extracts fields like email, phone, location from raw profile text
  • ERR_ABORTED navigation error handling for timing edge cases
  • Network degree filter (1st/2nd/3rd+) added to search_people
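The structured parser is only described at a high level above. A simplified line-oriented sketch of the idea (not the PR's actual _parse_contact_record) might look like:

```python
def parse_contact_record(profile_text: str, contact_text: str) -> dict:
    """Toy line-based parser: first profile line -> name; overlay label lines
    ("Email" / "Phone") are followed by their value on the next line."""
    lines = [l.strip() for l in profile_text.splitlines() if l.strip()]
    first = lines[0].split() if lines else []
    record = {
        "first_name": first[0] if first else None,
        "last_name": " ".join(first[1:]) if len(first) > 1 else None,
        "email": None,
        "phone": None,
        "profile_raw": profile_text,  # raw text kept as fallback
    }
    overlay = [l.strip() for l in contact_text.splitlines() if l.strip()]
    for label, value in zip(overlay, overlay[1:]):
        key = label.lower()
        if key in ("email", "phone") and record[key] is None:
            record[key] = value
    return record

rec = parse_contact_record("Ada Lovelace\nEngineer", "Email\nada@example.com\nPhone\n+44 123")
print(rec["first_name"], rec["email"], rec["phone"])  # → Ada ada@example.com +44 123
```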

Code quality: The implementation is well-structured with proper error handling, progress reporting, deduplication, and defensive programming. Most issues from previous review rounds have been addressed.

Minor issue: Inconsistent ERR_ABORTED handling between initial navigation and re-navigation (line 559) could cause crashes in rare edge cases where the page navigates away during scroll and re-navigation encounters the same timing issue.

Confidence Score: 4/5

  • This PR is safe to merge with low risk - well-tested implementation following established patterns
  • Score reflects thorough testing (105 tests passing), clean code following existing patterns, and comprehensive rate-limit handling. Deducting 1 point for the ERR_ABORTED inconsistency that could cause failures in rare edge cases where page navigation timing issues occur during re-navigation.
  • Pay attention to linkedin_mcp_server/scraping/extractor.py - verify the ERR_ABORTED handling inconsistency at line 559 won't impact production usage

Important Files Changed

  • linkedin_mcp_server/tools/connections.py: New file adding two MCP tools for bulk connection export: get_my_connections for scraping the connection list and extract_contact_details for enriching profiles with contact data. Follows existing tool patterns and includes proper error handling and progress reporting.
  • linkedin_mcp_server/scraping/extractor.py: Adds bulk export methods scrape_connections_list and scrape_contact_batch with chunked rate-limit handling, plus the _parse_contact_record parser for structured field extraction. Includes ERR_ABORTED navigation handling and soft rate-limit sentinels. Small network filter added to search_people.
  • linkedin_mcp_server/server.py: Simple registration of the new connections tools module; follows the existing pattern for tool registration with no changes to other logic.
  • linkedin_mcp_server/tools/person.py: Adds an optional network parameter to search_people for filtering by connection degree (1st, 2nd, 3rd+); minimal, focused change with proper parameter passthrough.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Tool as connections.py<br/>(extract_contact_details)
    participant Extractor as extractor.py<br/>(scrape_contact_batch)
    participant Browser as LinkedIn Pages
    
    Client->>Tool: extract_contact_details(usernames, chunk_size, chunk_delay)
    Tool->>Tool: Parse & deduplicate usernames
    Tool->>Extractor: scrape_contact_batch(usernames, chunk_size, chunk_delay)
    
    loop For each chunk
        loop For each username in chunk
            Extractor->>Browser: Navigate to profile page
            Browser-->>Extractor: profile_text
            
            alt Soft rate limit (empty content)
                Extractor->>Extractor: Check for _RATE_LIMITED_MSG sentinel
                Extractor->>Extractor: Skip username, add to failed[]
            else Success
                Extractor->>Browser: Navigate to contact info overlay
                Browser-->>Extractor: contact_text
                Extractor->>Extractor: _parse_contact_record(profile, contact)
                Extractor->>Extractor: Add to contacts[]
            end
            
            alt Hard rate limit (RateLimitError)
                Extractor->>Extractor: Add to failed[], set rate_limited=true
                Extractor->>Extractor: Break loop
            end
        end
        
        Extractor->>Tool: Report progress
        
        alt Not last chunk
            Extractor->>Extractor: Sleep(chunk_delay)
        end
    end
    
    Extractor-->>Tool: {contacts[], total, failed[], rate_limited, pages_visited[]}
    Tool-->>Client: Return enriched data

Last reviewed commit: b9add13

greptile-apps bot (Contributor) left a comment

4 files reviewed, 20 comments


Comment on lines +560 to +570
// Headline: try known selectors, then parse card text
let headline = '';
if (card) {
    const headlineEl = card.querySelector(
        '.mn-connection-card__occupation, .entity-result__primary-subtitle, span.t-normal'
    );
    if (headlineEl) headline = headlineEl.innerText.trim();
}
if (!headline && card) {
    // Fallback: split card text by newlines, second non-empty line is usually headline
    const lines = card.innerText.split('\\n').map(l => l.trim()).filter(Boolean);
Contributor:

Soft rate-limit sentinel silently corrupts contact records

extract_page returns the module-level _RATE_LIMITED_MSG string sentinel ("[Rate limited] LinkedIn blocked this section…") when a soft rate limit persists after one retry, instead of raising RateLimitError. scrape_contact_batch never checks for this sentinel before calling _parse_contact_record, so the sentinel is treated as valid profile text.

The result is a silently corrupted record:

  • first_name → "[Rate"
  • last_name → "limited] LinkedIn blocked this section. Try again later or request fewer sections."
  • headline, location, company → None
  • The contact overlay is still scraped unnecessarily
  • The record is added to contacts with no error indication (only profile_raw would reveal the problem)

The same risk applies if _extract_overlay returns the sentinel for contact_text.

A guard should be added before calling _parse_contact_record:

# Scrape main profile page
profile_text = await self.extract_page(profile_url)
pages_visited.append(profile_url)

if profile_text == _RATE_LIMITED_MSG:
    logger.warning("Soft rate limit on profile %s, skipping", username)
    failed.append(username)
    await asyncio.sleep(_NAV_DELAY)
    continue

# Scrape contact info overlay
contact_text = await self._extract_overlay(contact_url)
pages_visited.append(contact_url)

if contact_text == _RATE_LIMITED_MSG:
    contact_text = ""  # fall back to empty; parsed fields will be None

Author:

Fixed in beebf5e — added guards for both _RATE_LIMITED_MSG sentinels. Profile sentinel skips the username (added to failed), contact overlay sentinel falls back to empty string.

Comment on lines +514 to +519
url = "https://www.linkedin.com/mynetwork/invite-connect/connections/"

await self._page.goto(url, wait_until="domcontentloaded", timeout=30000)
await detect_rate_limit(self._page)

try:
Contributor:

chunk_size=0 causes an unhandled ValueError

range(0, total, chunk_size) raises ValueError: range() arg 3 must not be zero when chunk_size is 0. This exception propagates to the tool handler in connections.py and is returned via handle_tool_error, but the error message ("range() arg 3 must not be zero") is opaque to the caller.

A simple guard at the top of the method would produce a much clearer error:

async def scrape_contact_batch(
    self,
    usernames: list[str],
    chunk_size: int = 5,
    chunk_delay: float = 30.0,
    progress_cb: Callable[[int, int], Awaitable[None]] | None = None,
) -> dict[str, Any]:
    if chunk_size <= 0:
        raise ValueError(f"chunk_size must be a positive integer, got {chunk_size}")

Author:

Fixed in beebf5e — added if chunk_size <= 0: raise ValueError(...) guard at the top of scrape_contact_batch.

Comment on lines +596 to +601
progress_cb: Callable[[int, int], Awaitable[None]] | None = None,
) -> dict[str, Any]:
"""Enrich a list of profiles with contact details in chunked batches.

For each username: scrapes main profile + contact_info overlay.

Contributor:

Rate-limited username is not added to failed

When RateLimitError is caught, the current username is not appended to failed before breaking out of the loop. The return value only signals rate_limited: True but doesn't record which username triggered the stop, making it difficult for callers to resume from where processing halted.

except RateLimitError:
    logger.warning("Rate limited during contact batch at %s", username)
    failed.append(username)  # record the username that triggered the stop
    rate_limited = True
    break

Author:

Fixed in beebf5e — added failed.append(username) before the break.

Comment on lines +527 to +582
await scroll_to_bottom(self._page, pause_time=1.0, max_scrolls=max_scrolls)

# Extract connection data from profile link elements
raw_connections: list[dict[str, str]] = await self._page.evaluate(
    """() => {
    const results = [];
    const seen = new Set();
    const links = document.querySelectorAll('main a[href*="/in/"]');
    for (const a of links) {
        const href = a.getAttribute('href') || '';
        const match = href.match(/\\/in\\/([^/?#]+)/);
        if (!match) continue;
        const username = match[1];
        if (seen.has(username)) continue;
        seen.add(username);

        // Walk up to the connection card container
        const card = a.closest('li') || a.parentElement;

        // Name: try known selectors, then the link's own visible text
        let name = '';
        if (card) {
            const nameEl = card.querySelector(
                '.mn-connection-card__name, .entity-result__title-text, span[dir="ltr"], span.t-bold'
            );
            if (nameEl) name = nameEl.innerText.trim();
        }
        if (!name) {
            // The profile link itself often contains the person's name
            const linkText = a.innerText.trim();
            if (linkText && linkText.length < 80) name = linkText;
        }

        // Headline: try known selectors, then parse card text
        let headline = '';
        if (card) {
            const headlineEl = card.querySelector(
                '.mn-connection-card__occupation, .entity-result__primary-subtitle, span.t-normal'
            );
            if (headlineEl) headline = headlineEl.innerText.trim();
        }
        if (!headline && card) {
            // Fallback: split card text by newlines, second non-empty line is usually headline
            const lines = card.innerText.split('\\n').map(l => l.trim()).filter(Boolean);
            if (lines.length >= 2) headline = lines[1];
        }

        results.push({ username, name, headline });
    }
    return results;
}"""
)

# Apply limit
if limit > 0:
    raw_connections = raw_connections[:limit]
Contributor:

Inefficient when limit is small - scrolls through all connections before truncating.

If limit=10 but user has 500 connections, this scrolls through all 500 (~8 minutes with 1s pauses), then discards 490. Consider checking len(results) >= limit inside the JavaScript loop and breaking early.


Author:

Won't fix — the suggestion to break early in the JS loop wouldn't help because the expensive part is scroll_to_bottom(), which runs before the JS extraction. By the time the DOM query executes, all scrolling is already done. Users already control scroll depth via the max_scrolls parameter (e.g. max_scrolls=3 for quick results). A proper fix would require refactoring the generic scroll_to_bottom utility to accept an early-exit predicate, which is out of scope for this PR.
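For reference, the out-of-scope refactor mentioned above (an early-exit predicate on the scroll utility) could take roughly this shape. This is a sketch under assumptions: FakePage and enough are illustrative stand-ins, not the project's actual code, and the real utility would drive a Playwright page.

```python
import asyncio
from typing import Awaitable, Callable, Optional

async def scroll_to_bottom(page, pause_time: float = 1.0, max_scrolls: int = 50,
                           stop_when: Optional[Callable[[], Awaitable[bool]]] = None) -> int:
    """Scroll until the page stops growing, max_scrolls is hit, or the
    optional early-exit predicate returns True. Returns scrolls performed."""
    last_height = await page.evaluate("document.body.scrollHeight")
    for i in range(max_scrolls):
        if stop_when is not None and await stop_when():
            return i  # caller decided we have enough (e.g. limit reached)
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        await asyncio.sleep(pause_time)
        height = await page.evaluate("document.body.scrollHeight")
        if height == last_height:
            return i + 1  # no new content loaded; done
        last_height = height
    return max_scrolls

class FakePage:
    """Minimal stand-in for a Playwright page, for demonstration only."""
    def __init__(self):
        self.height = 0
    async def evaluate(self, script: str):
        if script.startswith("window.scrollTo"):
            self.height += 100  # each scroll loads more content
            return None
        return self.height  # scrollHeight query

collected = {"n": 0}
async def enough():
    collected["n"] += 10  # pretend each pass collects 10 more connections
    return collected["n"] >= 30

scrolls = asyncio.run(scroll_to_bottom(FakePage(), pause_time=0, stop_when=enough))
print(scrolls)  # → 2
```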

Comment on lines +37 to +38
limit: int = 0,
max_scrolls: int = 50,
Contributor:

No validation for negative values - max_scrolls=-10 would bypass scrolling entirely (range produces empty sequence).

Consider adding validation: if max_scrolls < 0: raise ValueError("max_scrolls must be non-negative")


Author:

Won't fix — max_scrolls=-10 simply makes range(max_scrolls) produce an empty sequence, meaning no scrolling occurs. This doesn't crash or corrupt data; it just returns whatever connections are visible without scrolling. Since this is an MCP tool parameter with a sensible default of 50, a negative value is an obvious caller mistake that produces a self-explanatory empty result. Adding validation here would be pure noise.

)
async def get_my_connections(
ctx: Context,
limit: int = 0,
Contributor:

Missing validation for negative limit values. If limit=-10, it bypasses the if limit > 0 check on extractor.py:581 and behaves as unlimited, which is counterintuitive.


Author:

Won't fix — limit=-10 behaving as unlimited is fine. The parameter semantics are "0 = unlimited" and any non-positive value logically means "no limit". This is consistent and not a bug.

usernames: str,
ctx: Context,
chunk_size: int = 5,
chunk_delay: float = 30.0,
Contributor:

Missing validation for negative chunk_delay. A negative value would cause asyncio.sleep() to return immediately, bypassing the delay entirely.


Author:

Won't fix — asyncio.sleep() with a negative value returns immediately (same as 0), which just means "no delay". This is an MCP tool parameter with a sensible default of 30s; passing a negative value is a caller error with a harmless outcome.

async def extract_contact_details(
usernames: str,
ctx: Context,
chunk_size: int = 5,
Contributor:

Missing validation for chunk_size < 1. While extractor.py validates <= 0, the error won't be clear to callers.


Author:

Won't fix — the extractor already validates chunk_size <= 0 with a clear ValueError. Duplicating validation at the tool layer adds no value; the error message from the extractor ("chunk_size must be a positive integer, got 0") is already user-friendly.

Desperado and others added 13 commits March 22, 2026 15:46
…contact_details)

Two new MCP tools for collecting LinkedIn connections and enriching
them with contact details (email, phone, etc.) in rate-limit-aware
chunked batches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- chunk_delay: int → float to match scrape_contact_batch signature
- Report actual completed count instead of total on early rate-limit stop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of returning raw innerText blobs, parse profile and contact
overlay text into structured fields (first_name, last_name, email,
phone, headline, location, company, website, birthday). Raw text
kept as _raw suffix fields for fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use %.0fs for chunk_delay in log message (float, not int)
- Update scrape_contact_batch docstring to list actual structured fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When rate limiting stops processing early, the progress message now
shows "Stopped early due to rate limit (N/M processed)" instead of
the misleading "Complete".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
The GitHub suggestion merge created duplicate lines (completed/msg
assigned twice, report_progress called twice). Cleaned up to single
correct version.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… regex, failed tracking

- Guard against _RATE_LIMITED_MSG sentinel corrupting parsed records
  (skip profile on soft rate limit, fall back to empty contact text)
- Validate chunk_size > 0 with clear error message
- Extend degree regex to match ordinals like "3rd+" and "4th"
- Add rate-limited username to failed list for caller resumability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevents scraping the same profile twice when duplicate usernames
are passed (e.g. "user1,user1,user2").

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Support filtering LinkedIn people search by connection degree (1st/2nd/3rd+)
via the `network` parameter passed through to LinkedIn's search URL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add stabilization delay after scroll_to_bottom and re-navigate if
LinkedIn redirected away from the connections page during infinite
scroll. Prevents "Execution context was destroyed" errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Catch ERR_ABORTED on initial goto (happens when page is already
  loaded or LinkedIn redirects during navigation), retry after delay
- Add stabilization delay after scroll_to_bottom
- Re-navigate if LinkedIn redirected away during infinite scroll

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Desperado force-pushed the feature/bulk-connections-export branch from b9add13 to 5c7d9f4 on March 22, 2026 at 14:54
greptile-apps bot (Contributor) commented Mar 22, 2026

Greptile Summary

This PR adds two new MCP tools — get_my_connections (bulk connection list via infinite scroll) and extract_contact_details (batch profile enrichment with contact data) — plus a network filter on the existing search_people tool. The tool-layer wiring, progress reporting, deduplication, chunked rate-limit handling, and the _parse_contact_record parser are all well-structured.

However, extract_contact_details is entirely broken in its current state due to a single root-cause error in scrape_contact_batch:

  • extract_page and _extract_overlay both require a positional section_name: str argument (see extractor.py lines 440–443, 553–556). Both calls in scrape_contact_batch omit this argument, causing TypeError on every iteration.
  • The TypeError is caught by the bare except Exception block and the username is silently added to failed — so the method always returns an empty contacts list with all usernames in failed.
  • Even after providing section_name, both methods return ExtractedSection objects, not strings. The sentinel comparisons (profile_text == _RATE_LIMITED_MSG) will always be False, and passing ExtractedSection objects to _parse_contact_record (which expects str) would raise AttributeError. The fix is to extract .text from the returned objects, consistent with every other caller in the codebase.

The fix is a targeted, mechanical change in scrape_contact_batch (see inline comment). Everything else in the PR is on the happy path.

Confidence Score: 3/5

  • Not safe to merge — extract_contact_details silently returns empty results for every input due to a TypeError from missing section_name arguments.
  • The primary new feature (extract_contact_details) is completely non-functional: every profile enrichment attempt throws TypeError (missing required section_name arg), is caught silently, and lands in failed. This breaks the main user path. The fix is mechanical and confined to ~10 lines of scrape_contact_batch, so it's one targeted change away from being mergeable. All prior review concerns are resolved, and get_my_connections and the network filter are clean.
  • linkedin_mcp_server/scraping/extractor.py — specifically scrape_contact_batch (lines 1411–1431)

Important Files Changed

  • linkedin_mcp_server/scraping/extractor.py: Adds _parse_contact_record, scrape_connections_list, and scrape_contact_batch. Critical bug: scrape_contact_batch calls extract_page and _extract_overlay without the required section_name argument and treats the ExtractedSection return value as a raw string, causing a TypeError (silently swallowed) for every profile; extract_contact_details will never enrich any profile.
  • linkedin_mcp_server/tools/connections.py: New tool module for get_my_connections and extract_contact_details. The tool layer looks correct: progress reporting, deduplication, rate-limit messaging, and error handling are all properly implemented. Quality depends on the underlying extractor being fixed.
  • linkedin_mcp_server/server.py: One-line registration of the new connections tool module; straightforward and correct.
  • linkedin_mcp_server/tools/person.py: Adds an optional network filter parameter to search_people; clean, minimal change, correctly threaded through to the extractor.

Sequence Diagram

sequenceDiagram
    participant Client as MCP Client
    participant CT as connections.py
    participant EX as LinkedInExtractor
    participant LI as LinkedIn

    Client->>CT: extract_contact_details(usernames, chunk_size, chunk_delay)
    CT->>EX: scrape_contact_batch(usernames, chunk_size, chunk_delay, progress_cb)

    loop Each chunk
        loop Each username in chunk
            EX->>LI: extract_page(profile_url, section_name) → ExtractedSection
            LI-->>EX: profile innerText
            EX->>LI: _extract_overlay(contact_url, section_name) → ExtractedSection
            LI-->>EX: contact overlay innerText
            EX->>EX: _parse_contact_record(profile_text, contact_text)
            EX-->>CT: progress_cb(completed, total)
        end
        EX->>EX: asyncio.sleep(chunk_delay)
    end

    EX-->>CT: {contacts, total, failed, rate_limited, pages_visited}
    CT-->>Client: result dict

    Client->>CT: get_my_connections(limit, max_scrolls)
    CT->>EX: scrape_connections_list(limit, max_scrolls)
    EX->>LI: goto /mynetwork/invite-connect/connections/
    LI-->>EX: connections page
    EX->>EX: scroll_to_bottom(max_scrolls)
    EX->>LI: page.evaluate() — extract username/name/headline
    LI-->>EX: raw_connections[]
    EX-->>CT: {connections, total, url, pages_visited}
    CT-->>Client: result dict
Prompt To Fix All With AI
This is a comment left during a code review.
Path: linkedin_mcp_server/scraping/extractor.py
Line: 1413-1433

Comment:
**`extract_page` / `_extract_overlay` called with wrong signature — every profile fails**

Both `extract_page` and `_extract_overlay` have a required `section_name: str` second parameter (see lines 440–443 and 553–556 respectively). Calling them without it raises `TypeError: extract_page() missing 1 required positional argument: 'section_name'` / `_extract_overlay() missing 1 required positional argument: 'section_name'` for every iteration.

That `TypeError` is silently swallowed by the `except Exception` block, so every username ends up in `failed` and `contacts` is always empty — making `extract_contact_details` functionally broken.

Even after adding the missing argument, `extract_page` and `_extract_overlay` return `ExtractedSection` objects, not raw strings. The current comparisons to `_RATE_LIMITED_MSG` (e.g. `if profile_text == _RATE_LIMITED_MSG`) will always be `False` (comparing dataclass to `str`), and passing the objects directly to `_parse_contact_record(profile_text, contact_text)` would raise `AttributeError: 'ExtractedSection' object has no attribute 'split'`. The rest of the codebase consistently accesses `.text` (e.g. line 1126: `if extracted.text and extracted.text != _RATE_LIMITED_MSG`).

The fix requires both changes together:

```python
                    # Scrape main profile page
                    extracted_profile = await self.extract_page(profile_url, section_name="profile")
                    pages_visited.append(profile_url)
                    profile_text = extracted_profile.text

                    if profile_text == _RATE_LIMITED_MSG:
                        logger.warning(
                            "Soft rate limit on profile %s, skipping", username
                        )
                        failed.append(username)
                        await asyncio.sleep(_NAV_DELAY)
                        continue

                    # Scrape contact info overlay
                    extracted_contact = await self._extract_overlay(contact_url, section_name="contact_info")
                    pages_visited.append(contact_url)
                    contact_text = extracted_contact.text

                    if contact_text == _RATE_LIMITED_MSG:
                        contact_text = ""  # fall back to empty; parsed fields will be None
```


Reviews (9). Last reviewed commit: "fix: Handle ERR_ABORTED and context dest..."

Comment on lines +1413 to +1433:

```python
                    profile_text = await self.extract_page(profile_url)
                    pages_visited.append(profile_url)

                    if profile_text == _RATE_LIMITED_MSG:
                        logger.warning(
                            "Soft rate limit on profile %s, skipping", username
                        )
                        failed.append(username)
                        await asyncio.sleep(_NAV_DELAY)
                        continue

                    # Scrape contact info overlay
                    contact_text = await self._extract_overlay(contact_url)
                    pages_visited.append(contact_url)

                    if contact_text == _RATE_LIMITED_MSG:
                        contact_text = (
                            ""  # fall back to empty; parsed fields will be None
                        )

                    parsed = _parse_contact_record(profile_text, contact_text)
```