feat: bulk connections export tools #170
Desperado wants to merge 13 commits into stickerdaniel:main from
Conversation
```python
// Headline: try known selectors, then parse card text
let headline = '';
if (card) {
  const headlineEl = card.querySelector(
    '.mn-connection-card__occupation, .entity-result__primary-subtitle, span.t-normal'
  );
  if (headlineEl) headline = headlineEl.innerText.trim();
}
if (!headline && card) {
  // Fallback: split card text by newlines, second non-empty line is usually headline
  const lines = card.innerText.split('\\n').map(l => l.trim()).filter(Boolean);
```
**Soft rate-limit sentinel silently corrupts contact records** (linkedin_mcp_server/scraping/extractor.py, lines 560-570)
`extract_page` returns the module-level `_RATE_LIMITED_MSG` string sentinel (`"[Rate limited] LinkedIn blocked this section…"`) when a soft rate limit persists after one retry, instead of raising `RateLimitError`. `scrape_contact_batch` never checks for this sentinel before calling `_parse_contact_record`, so the sentinel is treated as valid profile text.
The result is a silently corrupted record:
- `first_name` → `"[Rate"`
- `last_name` → `"limited] LinkedIn blocked this section. Try again later or request fewer sections."`
- `headline`, `location`, `company` → `None`
- The contact overlay is still scraped unnecessarily
- The record is added to `contacts` with no error indication (only `profile_raw` would reveal the problem)
The same risk applies if `_extract_overlay` returns the sentinel for `contact_text`.
A guard should be added before calling `_parse_contact_record`:
```python
# Scrape main profile page
profile_text = await self.extract_page(profile_url)
pages_visited.append(profile_url)
if profile_text == _RATE_LIMITED_MSG:
    logger.warning("Soft rate limit on profile %s, skipping", username)
    failed.append(username)
    await asyncio.sleep(_NAV_DELAY)
    continue

# Scrape contact info overlay
contact_text = await self._extract_overlay(contact_url)
pages_visited.append(contact_url)
if contact_text == _RATE_LIMITED_MSG:
    contact_text = ""  # fall back to empty; parsed fields will be None
```
Fixed in beebf5e — added guards for both _RATE_LIMITED_MSG sentinels. Profile sentinel skips the username (added to failed), contact overlay sentinel falls back to empty string.
```python
url = "https://www.linkedin.com/mynetwork/invite-connect/connections/"

await self._page.goto(url, wait_until="domcontentloaded", timeout=30000)
await detect_rate_limit(self._page)

try:
```
**`chunk_size=0` causes an unhandled `ValueError`** (linkedin_mcp_server/scraping/extractor.py, lines 514-519)
`range(0, total, chunk_size)` raises `ValueError: range() arg 3 must not be zero` when `chunk_size` is `0`. This exception propagates to the tool handler in `connections.py` and is returned via `handle_tool_error`, but the error message ("range() arg 3 must not be zero") is opaque to the caller.
A simple guard at the top of the method would produce a much clearer error:
```python
async def scrape_contact_batch(
    self,
    usernames: list[str],
    chunk_size: int = 5,
    chunk_delay: float = 30.0,
    progress_cb: Callable[[int, int], Awaitable[None]] | None = None,
) -> dict[str, Any]:
    if chunk_size <= 0:
        raise ValueError(f"chunk_size must be a positive integer, got {chunk_size}")
```
Fixed in beebf5e — added if chunk_size <= 0: raise ValueError(...) guard at the top of scrape_contact_batch.
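For illustration, the iteration pattern the guard protects can be reproduced standalone. `chunked` below is a hypothetical helper mirroring the `range(0, total, chunk_size)` pattern, not code from the PR:

```python
def chunked(items: list[str], chunk_size: int) -> list[list[str]]:
    # Mirrors the range(0, total, chunk_size) iteration used in
    # scrape_contact_batch; a non-positive step would make range() raise
    # an opaque ValueError, hence the explicit guard with a clear message.
    if chunk_size <= 0:
        raise ValueError(f"chunk_size must be a positive integer, got {chunk_size}")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
```

For example, `chunked(["a", "b", "c"], 2)` yields `[["a", "b"], ["c"]]`, while `chunked(["a"], 0)` raises the clear `ValueError` instead of the bare `range()` error.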
```python
progress_cb: Callable[[int, int], Awaitable[None]] | None = None,
) -> dict[str, Any]:
    """Enrich a list of profiles with contact details in chunked batches.

    For each username: scrapes main profile + contact_info overlay.
```
**Rate-limited username is not added to `failed`** (linkedin_mcp_server/scraping/extractor.py, lines 596-601)
When `RateLimitError` is caught, the current username is not appended to `failed` before breaking out of the loop. The return value only signals `rate_limited: True` but doesn't record which username triggered the stop, making it difficult for callers to resume from where processing halted.
```python
except RateLimitError:
    logger.warning("Rate limited during contact batch at %s", username)
    failed.append(username)  # record the username that triggered the stop
    rate_limited = True
    break
```
Fixed in beebf5e — added failed.append(username) before the break.
```python
await scroll_to_bottom(self._page, pause_time=1.0, max_scrolls=max_scrolls)

# Extract connection data from profile link elements
raw_connections: list[dict[str, str]] = await self._page.evaluate(
    """() => {
        const results = [];
        const seen = new Set();
        const links = document.querySelectorAll('main a[href*="/in/"]');
        for (const a of links) {
            const href = a.getAttribute('href') || '';
            const match = href.match(/\\/in\\/([^/?#]+)/);
            if (!match) continue;
            const username = match[1];
            if (seen.has(username)) continue;
            seen.add(username);

            // Walk up to the connection card container
            const card = a.closest('li') || a.parentElement;

            // Name: try known selectors, then the link's own visible text
            let name = '';
            if (card) {
                const nameEl = card.querySelector(
                    '.mn-connection-card__name, .entity-result__title-text, span[dir="ltr"], span.t-bold'
                );
                if (nameEl) name = nameEl.innerText.trim();
            }
            if (!name) {
                // The profile link itself often contains the person's name
                const linkText = a.innerText.trim();
                if (linkText && linkText.length < 80) name = linkText;
            }

            // Headline: try known selectors, then parse card text
            let headline = '';
            if (card) {
                const headlineEl = card.querySelector(
                    '.mn-connection-card__occupation, .entity-result__primary-subtitle, span.t-normal'
                );
                if (headlineEl) headline = headlineEl.innerText.trim();
            }
            if (!headline && card) {
                // Fallback: split card text by newlines, second non-empty line is usually headline
                const lines = card.innerText.split('\\n').map(l => l.trim()).filter(Boolean);
                if (lines.length >= 2) headline = lines[1];
            }

            results.push({ username, name, headline });
        }
        return results;
    }"""
)

# Apply limit
if limit > 0:
    raw_connections = raw_connections[:limit]
```
Inefficient when `limit` is small - scrolls through all connections before truncating. (linkedin_mcp_server/scraping/extractor.py, lines 527-582)
If `limit=10` but user has 500 connections, this scrolls through all 500 (~8 minutes with 1s pauses), then discards 490. Consider checking `len(results) >= limit` inside the JavaScript loop and breaking early.
Won't fix — the suggestion to break early in the JS loop wouldn't help because the expensive part is scroll_to_bottom(), which runs before the JS extraction. By the time the DOM query executes, all scrolling is already done. Users already control scroll depth via the max_scrolls parameter (e.g. max_scrolls=3 for quick results). A proper fix would require refactoring the generic scroll_to_bottom utility to accept an early-exit predicate, which is out of scope for this PR.
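For context, the out-of-scope refactor mentioned above could look roughly like this: a generic scroll helper that takes an optional early-exit predicate. This is a sketch under assumptions, not the PR's code; the names `scroll_until` and `stop_when` are hypothetical, and the `page` object only needs an async `evaluate()`. The demo drives it with a fake page so the control flow runs without a browser:

```python
import asyncio

async def scroll_until(page, pause_time=1.0, max_scrolls=50, stop_when=None):
    """Scroll to the bottom, stopping early when stop_when() returns True.

    Hypothetical refactor of scroll_to_bottom with an early-exit predicate.
    """
    last_height = -1
    for _ in range(max_scrolls):
        height = await page.evaluate("document.body.scrollHeight")
        if height == last_height:
            break  # no new content loaded; reached the bottom
        last_height = height
        await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
        if stop_when is not None and await stop_when():
            break  # caller already has enough results; skip remaining scrolls
        await asyncio.sleep(pause_time)

# Fake page for the demo: each scroll loads 10 more items, up to 500.
class FakePage:
    def __init__(self):
        self.items = 10
    async def evaluate(self, script):
        if "scrollTo" in script:
            self.items = min(self.items + 10, 500)
            return None
        return self.items  # stands in for scrollHeight

async def _enough(page):
    return page.items >= 30

async def demo():
    page = FakePage()
    # Stop once 30 "connections" are visible instead of scrolling all 500.
    await scroll_until(page, pause_time=0, stop_when=lambda: _enough(page))
    return page.items

result = asyncio.run(demo())  # → 30
```

The key design point is that the predicate lets the caller encode "I have `limit` results already" without the scroll utility knowing anything about connections.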
```python
limit: int = 0,
max_scrolls: int = 50,
```
No validation for negative values - `max_scrolls=-10` would bypass scrolling entirely (range produces empty sequence). (linkedin_mcp_server/tools/connections.py, lines 37-38)
Consider adding validation: `if max_scrolls < 0: raise ValueError("max_scrolls must be non-negative")`
Won't fix — max_scrolls=-10 simply makes range(max_scrolls) produce an empty sequence, meaning no scrolling occurs. This doesn't crash or corrupt data; it just returns whatever connections are visible without scrolling. Since this is an MCP tool parameter with a sensible default of 50, a negative value is an obvious caller mistake that produces a self-explanatory empty result. Adding validation here would be pure noise.
```python
)
async def get_my_connections(
    ctx: Context,
    limit: int = 0,
```
Missing validation for negative `limit` values. If `limit=-10`, it bypasses the `if limit > 0` check on extractor.py:581 and behaves as unlimited, which is counterintuitive. (linkedin_mcp_server/tools/connections.py, line 37)
Won't fix — limit=-10 behaving as unlimited is fine. The parameter semantics are "0 = unlimited" and any non-positive value logically means "no limit". This is consistent and not a bug.
```python
usernames: str,
ctx: Context,
chunk_size: int = 5,
chunk_delay: float = 30.0,
```
Missing validation for negative `chunk_delay`. A negative value would cause `asyncio.sleep()` to return immediately, bypassing the delay entirely. (linkedin_mcp_server/tools/connections.py, line 92)
Won't fix — asyncio.sleep() with a negative value returns immediately (same as 0), which just means "no delay". This is an MCP tool parameter with a sensible default of 30s; passing a negative value is a caller error with a harmless outcome.
```python
async def extract_contact_details(
    usernames: str,
    ctx: Context,
    chunk_size: int = 5,
```
Missing validation for `chunk_size < 1`. While extractor.py validates `<= 0`, the error won't be clear to callers. (linkedin_mcp_server/tools/connections.py, line 91)
Won't fix — the extractor already validates chunk_size <= 0 with a clear ValueError. Duplicating validation at the tool layer adds no value; the error message from the extractor ("chunk_size must be a positive integer, got 0") is already user-friendly.
…contact_details)
Two new MCP tools for collecting LinkedIn connections and enriching them with contact details (email, phone, etc.) in rate-limit-aware chunked batches.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- chunk_delay: int → float to match scrape_contact_batch signature
- Report actual completed count instead of total on early rate-limit stop

Instead of returning raw innerText blobs, parse profile and contact overlay text into structured fields (first_name, last_name, email, phone, headline, location, company, website, birthday). Raw text kept as _raw suffix fields for fallback.
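The structured-field parsing that commit describes can be sketched as a minimal standalone parser. This is an illustrative approximation only, not the PR's `_parse_contact_record`; the regexes and the assumption that the first profile line is the full name are guesses based on the commit message:

```python
import re

def parse_contact_record(profile_text: str, contact_text: str) -> dict:
    # Illustrative sketch: turn raw innerText blobs into structured fields.
    # The real _parse_contact_record in the PR may extract more fields
    # (headline, location, company, website, birthday) and differ in detail.
    record = {"first_name": None, "last_name": None, "email": None, "phone": None}
    lines = [ln.strip() for ln in profile_text.splitlines() if ln.strip()]
    if lines:
        # Assumption: the first non-empty line of the profile is the full name.
        parts = lines[0].split(maxsplit=1)
        record["first_name"] = parts[0]
        record["last_name"] = parts[1] if len(parts) > 1 else None
    email = re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", contact_text)
    if email:
        record["email"] = email.group(0)
    phone = re.search(r"\+?\d[\d\s().-]{7,}\d", contact_text)
    if phone:
        record["phone"] = phone.group(0).strip()
    # Keep raw text for fallback, mirroring the *_raw fields the commit mentions.
    record["profile_raw"] = profile_text
    record["contact_info_raw"] = contact_text
    return record
```

A call like `parse_contact_record("Jane Doe\nEngineer at Acme", "Email\njane@example.com\nPhone\n+1 555 123 4567")` would yield `first_name="Jane"`, `last_name="Doe"`, the email, and the phone string, with the raw blobs preserved.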
- Use %.0fs for chunk_delay in log message (float, not int)
- Update scrape_contact_batch docstring to list actual structured fields

When rate limiting stops processing early, the progress message now shows "Stopped early due to rate limit (N/M processed)" instead of the misleading "Complete".

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

The GitHub suggestion merge created duplicate lines (completed/msg assigned twice, report_progress called twice). Cleaned up to single correct version.

…regex, failed tracking
- Guard against _RATE_LIMITED_MSG sentinel corrupting parsed records (skip profile on soft rate limit, fall back to empty contact text)
- Validate chunk_size > 0 with clear error message
- Extend degree regex to match ordinals like "3rd+" and "4th"
- Add rate-limited username to failed list for caller resumability

Prevents scraping the same profile twice when duplicate usernames are passed (e.g. "user1,user1,user2").
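Order-preserving deduplication like that commit describes is a one-liner with `dict.fromkeys`; a minimal sketch (the helper name is hypothetical, and the PR's actual implementation is not shown here):

```python
def dedupe_usernames(raw: str) -> list[str]:
    # Split a comma-separated username string, trim whitespace, drop empties,
    # and remove duplicates while preserving first-seen order.
    names = [u.strip() for u in raw.split(",") if u.strip()]
    return list(dict.fromkeys(names))
```

For the example in the commit message, `dedupe_usernames("user1,user1,user2")` returns `["user1", "user2"]`, so each profile is scraped once.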
Support filtering LinkedIn people search by connection degree (1st/2nd/3rd+) via the `network` parameter passed through to LinkedIn's search URL.
Add stabilization delay after scroll_to_bottom and re-navigate if LinkedIn redirected away from the connections page during infinite scroll. Prevents "Execution context was destroyed" errors.

- Catch ERR_ABORTED on initial goto (happens when page is already loaded or LinkedIn redirects during navigation), retry after delay
- Add stabilization delay after scroll_to_bottom
- Re-navigate if LinkedIn redirected away during infinite scroll
b9add13 to 5c7d9f4
Greptile Summary
This PR adds two new MCP tools — … However, …
The fix is a targeted, mechanical change in …
Confidence Score: 3/5
Important Files Changed
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client as MCP Client
    participant CT as connections.py
    participant EX as LinkedInExtractor
    participant LI as LinkedIn
    Client->>CT: extract_contact_details(usernames, chunk_size, chunk_delay)
    CT->>EX: scrape_contact_batch(usernames, chunk_size, chunk_delay, progress_cb)
    loop Each chunk
        loop Each username in chunk
            EX->>LI: extract_page(profile_url, section_name) → ExtractedSection
            LI-->>EX: profile innerText
            EX->>LI: _extract_overlay(contact_url, section_name) → ExtractedSection
            LI-->>EX: contact overlay innerText
            EX->>EX: _parse_contact_record(profile_text, contact_text)
            EX-->>CT: progress_cb(completed, total)
        end
        EX->>EX: asyncio.sleep(chunk_delay)
    end
    EX-->>CT: {contacts, total, failed, rate_limited, pages_visited}
    CT-->>Client: result dict
    Client->>CT: get_my_connections(limit, max_scrolls)
    CT->>EX: scrape_connections_list(limit, max_scrolls)
    EX->>LI: goto /mynetwork/invite-connect/connections/
    LI-->>EX: connections page
    EX->>EX: scroll_to_bottom(max_scrolls)
    EX->>LI: page.evaluate() — extract username/name/headline
    LI-->>EX: raw_connections[]
    EX-->>CT: {connections, total, url, pages_visited}
    CT-->>Client: result dict
```
Reviews (9): Last reviewed commit: "fix: Handle ERR_ABORTED and context dest..."
```python
profile_text = await self.extract_page(profile_url)
pages_visited.append(profile_url)

if profile_text == _RATE_LIMITED_MSG:
    logger.warning(
        "Soft rate limit on profile %s, skipping", username
    )
    failed.append(username)
    await asyncio.sleep(_NAV_DELAY)
    continue

# Scrape contact info overlay
contact_text = await self._extract_overlay(contact_url)
pages_visited.append(contact_url)

if contact_text == _RATE_LIMITED_MSG:
    contact_text = (
        ""  # fall back to empty; parsed fields will be None
    )

parsed = _parse_contact_record(profile_text, contact_text)
```
**`extract_page` / `_extract_overlay` called with wrong signature — every profile fails** (linkedin_mcp_server/scraping/extractor.py, lines 1413-1433)
Both `extract_page` and `_extract_overlay` have a required `section_name: str` second parameter (see lines 440–443 and 553–556 respectively). Calling them without it raises `TypeError: extract_page() missing 1 required positional argument: 'section_name'` / `_extract_overlay() missing 1 required positional argument: 'section_name'` for every iteration.
That `TypeError` is silently swallowed by the `except Exception` block, so every username ends up in `failed` and `contacts` is always empty — making `extract_contact_details` functionally broken.
Even after adding the missing argument, `extract_page` and `_extract_overlay` return `ExtractedSection` objects, not raw strings. The current comparisons to `_RATE_LIMITED_MSG` (e.g. `if profile_text == _RATE_LIMITED_MSG`) will always be `False` (comparing dataclass to `str`), and passing the objects directly to `_parse_contact_record(profile_text, contact_text)` would raise `AttributeError: 'ExtractedSection' object has no attribute 'split'`. The rest of the codebase consistently accesses `.text` (e.g. line 1126: `if extracted.text and extracted.text != _RATE_LIMITED_MSG`).
The fix requires both changes together:
The fix requires both changes together:
# Scrape main profile page
extracted_profile = await self.extract_page(profile_url, section_name="profile")
pages_visited.append(profile_url)
profile_text = extracted_profile.text
if profile_text == _RATE_LIMITED_MSG:
logger.warning(
"Soft rate limit on profile %s, skipping", username
)
failed.append(username)
await asyncio.sleep(_NAV_DELAY)
continue
# Scrape contact info overlay
extracted_contact = await self._extract_overlay(contact_url, section_name="contact_info")
pages_visited.append(contact_url)
contact_text = extracted_contact.text
if contact_text == _RATE_LIMITED_MSG:
contact_text = "" # fall back to empty; parsed fields will be NonePrompt To Fix With AI
Summary
Adds two new MCP tools for bulk LinkedIn connections export:

**get_my_connections**
Collects connection usernames via infinite scroll on the connections page. Configurable `limit` and `max_scrolls`. Returns `{username, name, headline}` for each connection.

**extract_contact_details**
Enriches profiles with structured contact data by scraping the main profile page and contact info overlay. Returns parsed fields instead of raw text: `first_name`, `last_name`, `headline`, `location`, `company`, `email`, `phone`, `website`, `birthday`, plus `profile_raw` and `contact_info_raw` for fallback.

Rate-limit handling: Processes profiles in chunked batches with configurable `chunk_size` (default 5) and `chunk_delay` (default 30s). Stops early on hard rate limit, returns partial results with `rate_limited` flag. Individual page loads retry once after 5s backoff on soft rate limits (empty-content responses).

Files changed
- `linkedin_mcp_server/tools/connections.py` — New tool module (follows `tools/person.py` pattern)
- `linkedin_mcp_server/scraping/extractor.py` — Added `scrape_connections_list()`, `scrape_contact_batch()`, and `_parse_contact_record()` parser
- `linkedin_mcp_server/server.py` — Registered `register_connections_tools(mcp)`

Test plan
- `ruff format`, `ruff check`, `ty check` all clean
- `get_my_connections` with `limit=10` returns 10 valid usernames
- `extract_contact_details` with 3 usernames returns structured fields (emails found for 2/3)

🤖 Generated with Claude Code
Greptile Summary
This PR adds two new MCP tools for bulk LinkedIn connections export: `get_my_connections` (infinite scroll collection of connection usernames) and `extract_contact_details` (batch enrichment with structured contact data). The implementation follows existing code patterns and includes thoughtful rate-limit handling with configurable chunking and delays.

Key changes:
- New parser `_parse_contact_record` that extracts fields like email, phone, location from raw profile text
- Small `network` degree filter added to `search_people`

Code quality: The implementation is well-structured with proper error handling, progress reporting, deduplication, and defensive programming. Most issues from previous review rounds have been addressed.

Minor issue: Inconsistent ERR_ABORTED handling between initial navigation and re-navigation (line 559) could cause crashes in rare edge cases where the page navigates away during scroll and re-navigation encounters the same timing issue.

Confidence Score: 4/5
- `linkedin_mcp_server/scraping/extractor.py` — verify the ERR_ABORTED handling inconsistency at line 559 won't impact production usage

Important Files Changed
- `get_my_connections` for scraping connection list and `extract_contact_details` for enriching profiles with contact data. Follows existing tool patterns, includes proper error handling and progress reporting.
- `scrape_connections_list` and `scrape_contact_batch` with chunked rate-limit handling, plus `_parse_contact_record` parser for structured field extraction. Includes ERR_ABORTED navigation handling and soft rate-limit sentinels. Small `network` filter added to `search_people`.
- `network` parameter to `search_people` for filtering by connection degree (1st, 2nd, 3rd+) - minimal, focused change with proper parameter passthrough.
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant Tool as connections.py<br/>(extract_contact_details)
    participant Extractor as extractor.py<br/>(scrape_contact_batch)
    participant Browser as LinkedIn Pages
    Client->>Tool: extract_contact_details(usernames, chunk_size, chunk_delay)
    Tool->>Tool: Parse & deduplicate usernames
    Tool->>Extractor: scrape_contact_batch(usernames, chunk_size, chunk_delay)
    loop For each chunk
        loop For each username in chunk
            Extractor->>Browser: Navigate to profile page
            Browser-->>Extractor: profile_text
            alt Soft rate limit (empty content)
                Extractor->>Extractor: Check for _RATE_LIMITED_MSG sentinel
                Extractor->>Extractor: Skip username, add to failed[]
            else Success
                Extractor->>Browser: Navigate to contact info overlay
                Browser-->>Extractor: contact_text
                Extractor->>Extractor: _parse_contact_record(profile, contact)
                Extractor->>Extractor: Add to contacts[]
            end
            alt Hard rate limit (RateLimitError)
                Extractor->>Extractor: Add to failed[], set rate_limited=true
                Extractor->>Extractor: Break loop
            end
        end
        Extractor->>Tool: Report progress
        alt Not last chunk
            Extractor->>Extractor: Sleep(chunk_delay)
        end
    end
    Extractor-->>Tool: {contacts[], total, failed[], rate_limited, pages_visited[]}
    Tool-->>Client: Return enriched data
```
Last reviewed commit: b9add13