Feature/acm agent feat: Add ACM Scholar Agent for paper searchbasic #354

hongping-zh · 2025-12-25T11:00:11Z

Summary

Implements the ACM Scholar Agent feature requested in #319.

Features

🔍 Search ACM papers via OpenAlex API (free, no API key required)
📥 One-click ingestion - download and add papers to notebooks
🎨 Frontend UI - Research Papers dialog in Sources dropdown
✅ Open Access filtering - only shows freely accessible papers

Demo

[Video coming soon / Screenshots]

How It Works

User clicks "+ Add" → "Research Papers" in a notebook
Search for any topic (e.g., "Large Language Models")
Click "Add" on any paper
Paper is downloaded and processed automatically

Technical Details

Uses OpenAlex API with ACM Publisher + CS Concept filters
Integrates with existing source processing pipeline
No additional dependencies required
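As a rough illustration of the filtering described above, the query against the OpenAlex works endpoint might be assembled like this. This is a sketch under assumptions, not the PR's code: the function names are hypothetical, the filter field names follow my reading of the OpenAlex documentation and should be verified, and the publisher and concept IDs are deliberately left as placeholders rather than guessed.

```python
from urllib.parse import urlencode

OPENALEX_WORKS_URL = "https://api.openalex.org/works"

# Placeholders only: look up the real ACM publisher ID and the
# Computer Science concept ID in OpenAlex before using this.
ACM_PUBLISHER_ID = "P_PLACEHOLDER"
CS_CONCEPT_ID = "C_PLACEHOLDER"


def build_search_params(
    query: str,
    limit: int = 5,
    publisher_id: str = ACM_PUBLISHER_ID,
    concept_id: str = CS_CONCEPT_ID,
) -> dict:
    """Combine publisher, concept, and Open Access filters into one query."""
    filters = [
        f"primary_location.source.publisher_lineage:{publisher_id}",
        f"concepts.id:{concept_id}",
        "open_access.is_oa:true",  # only freely accessible papers
    ]
    return {"search": query, "filter": ",".join(filters), "per-page": limit}


def build_search_url(query: str, limit: int = 5) -> str:
    """Return the full request URL for a search (no network access here)."""
    return f"{OPENALEX_WORKS_URL}?{urlencode(build_search_params(query, limit))}"
```

Keeping the parameter construction separate from the HTTP call makes the filter logic testable without touching the network.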

Closes #319

- Add ACM Agent service module with OpenAlex API integration
- Search ACM Digital Library papers with Open Access filtering
- Auto-download PDFs from trusted sources (arXiv, PubMed, etc.)
- Add Research Papers dialog in frontend UI
- Integrate with Open Notebook's source processing pipeline

Phase 1 MVP - Full open source implementation

- Search ACM Digital Library papers via OpenAlex API
- Filter by ACM publisher, Computer Science, Open Access
- Download and ingest papers into notebooks
- Frontend UI for searching and adding papers

cubic-dev-ai

7 issues found across 11 files

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="api/routers/agent.py">

<violation number="1" location="api/routers/agent.py:102">
P2: Exposing raw exception messages in API responses can leak internal implementation details. Consider using a generic error message for 500 errors while logging the full exception internally.</violation>

<violation number="2" location="api/routers/agent.py:126">
P0: **SSRF Vulnerability**: The endpoint accepts arbitrary URLs without validation, allowing attackers to make requests to internal services, cloud metadata endpoints, or scan internal networks. Add URL validation to ensure only allowed external domains/schemes are accessed.</violation>

<violation number="3" location="api/routers/agent.py:127">
P2: Missing content-type validation. The response is saved as a PDF without verifying the `Content-Type` header or checking for PDF magic bytes (`%PDF-`). This could allow arbitrary content to be stored and processed.</violation>
</file>

<file name="docs/ACM_AGENT_TESTING_GUIDE.md">

<violation number="1" location="docs/ACM_AGENT_TESTING_GUIDE.md:26">
P2: Python version requirement is incorrect. The project requires Python 3.11+ (per pyproject.toml), but this documentation states 3.10+. Users following these instructions with Python 3.10 will encounter compatibility issues.</violation>
</file>

<file name="open_notebook/acm_agent_service/core.py">

<violation number="1" location="open_notebook/acm_agent_service/core.py:14">
P2: URL parsing using `split('/')[-1]` is fragile and doesn't handle query strings, trailing slashes, or URL fragments. Use `urllib.parse.urlparse()` to properly extract the path component.</violation>
</file>

<file name="open_notebook/acm_agent_service/tools.py">

<violation number="1" location="open_notebook/acm_agent_service/tools.py:1">
P2: Using `requests` library which is not declared in project dependencies. The project uses `httpx` as the HTTP client (declared in pyproject.toml). Consider using `httpx` instead for consistency:

```python
import httpx

# In search method:
response = httpx.get(cls.BASE_URL, params=params, timeout=10)
```

Alternatively, add requests to the project's runtime dependencies.</violation>
</file>

<file name="frontend/src/lib/api/agent.ts">

<violation number="1" location="frontend/src/lib/api/agent.ts:33">
P1: This file uses raw `fetch()` instead of the established `apiClient` from `./client`. This bypasses authentication (Bearer token), the 10-minute timeout for slow operations, and the 401 redirect handling. Use `apiClient.get()` and `apiClient.post()` for consistency with the rest of the codebase.</violation>
</file>

Reply to cubic to teach it or ask questions. Re-run a review with `@cubic-dev-ai review this PR`.

cubic-dev-ai · 2025-12-25T11:03:31Z

api/routers/agent.py

+        )
+    except Exception as e:
+        logger.error(f"Error searching ACM papers: {e}")
+        raise HTTPException(status_code=500, detail=str(e))


P2: Exposing raw exception messages in API responses can leak internal implementation details. Consider using a generic error message for 500 errors while logging the full exception internally.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At api/routers/agent.py, line 102: <comment>Exposing raw exception messages in API responses can leak internal implementation details. Consider using a generic error message for 500 errors while logging the full exception internally.</comment> <file context> @@ -0,0 +1,195 @@ + ) + except Exception as e: + logger.error(f"Error searching ACM papers: {e}") + raise HTTPException(status_code=500, detail=str(e)) + [email protected]("/agent/acm/ingest", response_model=IngestPaperResponse) </file context>
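One way to act on this comment, sketched with the standard library only. The helper name `to_public_error` and the reference-id scheme are assumptions for illustration, not code from the PR; the returned keys are chosen to match the `status_code` and `detail` arguments of FastAPI's `HTTPException`.

```python
import logging
import uuid

logger = logging.getLogger("agent")


def to_public_error(exc: Exception, action: str) -> dict:
    """Log the full exception and return a generic payload safe for clients."""
    error_id = uuid.uuid4().hex[:8]  # short reference to correlate with logs
    # The traceback and message go to the log, never into the response body.
    logger.error("Error %s [ref=%s]", action, error_id, exc_info=exc)
    return {
        "status_code": 500,
        "detail": f"Internal error while {action} (ref {error_id}).",
    }


# Hypothetical use inside the route's except block:
#     raise HTTPException(**to_public_error(e, "searching ACM papers"))
```

The reference id lets a user report a failure that maintainers can find in the logs, without the response exposing the exception text.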

cubic-dev-ai · 2025-12-25T11:03:31Z

api/routers/agent.py

+
+        # Use httpx for async download
+        async with httpx.AsyncClient(follow_redirects=True, timeout=30.0) as client:
+            response = await client.get(request.pdf_url)


P2: Missing content-type validation. The response is saved as a PDF without verifying the Content-Type header or checking for PDF magic bytes (%PDF-). This could allow arbitrary content to be stored and processed.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At api/routers/agent.py, line 127: <comment>Missing content-type validation. The response is saved as a PDF without verifying the `Content-Type` header or checking for PDF magic bytes (`%PDF-`). This could allow arbitrary content to be stored and processed.</comment> <file context> @@ -0,0 +1,195 @@ + + # Use httpx for async download + async with httpx.AsyncClient(follow_redirects=True, timeout=30.0) as client: + response = await client.get(request.pdf_url) + response.raise_for_status() + </file context>
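A minimal sketch of the check this comment asks for, assuming the download body is available as bytes. The function name `looks_like_pdf` and the decision to tolerate `application/octet-stream` are assumptions for illustration, not part of the PR.

```python
from typing import Optional

PDF_MAGIC = b"%PDF-"
# Assumption: some hosts serve PDFs as generic binary, so both are tolerated.
ACCEPTED_TYPES = ("application/pdf", "application/octet-stream")


def looks_like_pdf(body: bytes, content_type: Optional[str]) -> bool:
    """Accept only if the declared type (when present) is allowed
    and the body starts with the PDF magic bytes."""
    if content_type:
        # Drop parameters such as "; charset=binary" before comparing.
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type not in ACCEPTED_TYPES:
            return False
    return body.startswith(PDF_MAGIC)
```

With httpx, the inputs would come from `response.content` and `response.headers.get("content-type")`; a failed check should reject the download before anything is written to disk.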

cubic-dev-ai · 2025-12-25T11:03:31Z

api/routers/agent.py

+            filename += ".pdf"
+
+        # Use httpx for async download
+        async with httpx.AsyncClient(follow_redirects=True, timeout=30.0) as client:


P0: SSRF Vulnerability: The endpoint accepts arbitrary URLs without validation, allowing attackers to make requests to internal services, cloud metadata endpoints, or scan internal networks. Add URL validation to ensure only allowed external domains/schemes are accessed.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At api/routers/agent.py, line 126: <comment>**SSRF Vulnerability**: The endpoint accepts arbitrary URLs without validation, allowing attackers to make requests to internal services, cloud metadata endpoints, or scan internal networks. Add URL validation to ensure only allowed external domains/schemes are accessed.</comment> <file context> @@ -0,0 +1,195 @@ + filename += ".pdf" + + # Use httpx for async download + async with httpx.AsyncClient(follow_redirects=True, timeout=30.0) as client: + response = await client.get(request.pdf_url) + response.raise_for_status() </file context>
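The validation requested here could start from an allow-list check like the one below. It is a sketch under assumptions: the host list is hypothetical (the PR only says "trusted sources (arXiv, PubMed, etc.)"), the function name is invented for illustration, and it does not resolve DNS, so it is a first gate rather than a complete SSRF defence.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical allow-list; the real set of trusted hosts is a project decision.
ALLOWED_HOSTS = frozenset({"arxiv.org", "dl.acm.org"})


def is_allowed_pdf_url(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Allow only https URLs whose host is an allow-listed domain or subdomain."""
    try:
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
    except ValueError:  # e.g. malformed IPv6 literal
        return False
    if parsed.scheme != "https" or not host:
        return False
    try:
        ipaddress.ip_address(host)
        return False  # literal IPs (loopback, metadata endpoints) are rejected
    except ValueError:
        pass  # not an IP literal, continue with the domain check
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)
```

Because the download client in the diff uses `follow_redirects=True`, each redirect target would need the same check (or redirects would need to be disabled and followed manually); otherwise an allowed host could bounce the request to an internal address.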

cubic-dev-ai · 2025-12-25T11:03:31Z

docs/ACM_AGENT_TESTING_GUIDE.md

+
+### Prerequisites
+
+- Python 3.10+


P2: Python version requirement is incorrect. The project requires Python 3.11+ (per pyproject.toml), but this documentation states 3.10+. Users following these instructions with Python 3.10 will encounter compatibility issues.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At docs/ACM_AGENT_TESTING_GUIDE.md, line 26: <comment>Python version requirement is incorrect. The project requires Python 3.11+ (per pyproject.toml), but this documentation states 3.10+. Users following these instructions with Python 3.10 will encounter compatibility issues.</comment> <file context> @@ -0,0 +1,205 @@ + +### Prerequisites + +- Python 3.10+ +- Node.js 18+ +- SurrealDB </file context>

cubic-dev-ai · 2025-12-25T11:03:31Z

open_notebook/acm_agent_service/core.py

+        return OpenAlexACMTool.search(query, limit)
+
+    def ingest_paper(self, paper_url: str) -> Dict[str, Any]:
+        filename = paper_url.split('/')[-1]


P2: URL parsing using split('/')[-1] is fragile and doesn't handle query strings, trailing slashes, or URL fragments. Use urllib.parse.urlparse() to properly extract the path component.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At open_notebook/acm_agent_service/core.py, line 14: <comment>URL parsing using `split('/')[-1]` is fragile and doesn't handle query strings, trailing slashes, or URL fragments. Use `urllib.parse.urlparse()` to properly extract the path component.</comment> <file context> @@ -0,0 +1,24 @@ + return OpenAlexACMTool.search(query, limit) + + def ingest_paper(self, paper_url: str) -> Dict[str, Any]: + filename = paper_url.split('/')[-1] + if not filename.endswith('.pdf'): + filename += ".pdf" </file context>
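A sketch of the suggested fix using `urllib.parse.urlparse()`. The function name `filename_from_url` and the `paper.pdf` fallback are assumptions for illustration; the `.pdf` suffix rule mirrors the behaviour visible in the diff.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def filename_from_url(paper_url: str, default: str = "paper.pdf") -> str:
    """Derive a PDF filename from the URL path, ignoring query and fragment."""
    # urlparse separates the path from "?query" and "#fragment".
    path = unquote(urlparse(paper_url).path)
    # PurePosixPath.name returns the last component and ignores a trailing slash.
    name = PurePosixPath(path).name
    if name in ("", ".", ".."):
        return default
    if not name.lower().endswith(".pdf"):
        name += ".pdf"
    return name
```

Unlike `split('/')[-1]`, this yields a usable name for URLs that end in a slash or carry download parameters.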

cubic-dev-ai · 2025-12-25T11:03:31Z

open_notebook/acm_agent_service/tools.py

@@ -0,0 +1,69 @@
+import requests


P2: Using requests library which is not declared in project dependencies. The project uses httpx as the HTTP client (declared in pyproject.toml). Consider using httpx instead for consistency:

    import httpx

    # In search method:
    response = httpx.get(cls.BASE_URL, params=params, timeout=10)

Alternatively, add requests to the project's runtime dependencies.

Prompt for AI agents

Check if this issue is valid; if so, understand the root cause and fix it. At open_notebook/acm_agent_service/tools.py, line 1: <comment>Using `requests` library which is not declared in project dependencies. The project uses `httpx` as the HTTP client (declared in pyproject.toml). Consider using `httpx` instead for consistency:

```python
import httpx

# In search method:
response = httpx.get(cls.BASE_URL, params=params, timeout=10)
```

Alternatively, add requests to the project's runtime dependencies.</comment> <file context> @@ -0,0 +1,69 @@ +import requests +from typing import List, Dict, Any +from loguru import logger </file context>

cubic-dev-ai · 2025-12-25T11:03:31Z

frontend/src/lib/api/agent.ts

+
+export async function searchAcmPapers(query: string, limit: number = 5): Promise<SearchPapersResponse> {
+  const apiUrl = await getApiUrl()
+  const response = await fetch(`${apiUrl}/api/agent/acm/search?query=${encodeURIComponent(query)}&limit=${limit}`, {


P1: This file uses raw fetch() instead of the established apiClient from ./client. This bypasses authentication (Bearer token), the 10-minute timeout for slow operations, and the 401 redirect handling. Use apiClient.get() and apiClient.post() for consistency with the rest of the codebase.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At frontend/src/lib/api/agent.ts, line 33: <comment>This file uses raw `fetch()` instead of the established `apiClient` from `./client`. This bypasses authentication (Bearer token), the 10-minute timeout for slow operations, and the 401 redirect handling. Use `apiClient.get()` and `apiClient.post()` for consistency with the rest of the codebase.</comment> <file context> @@ -0,0 +1,62 @@ + +export async function searchAcmPapers(query: string, limit: number = 5): Promise<SearchPapersResponse> { + const apiUrl = await getApiUrl() + const response = await fetch(`${apiUrl}/api/agent/acm/search?query=${encodeURIComponent(query)}&limit=${limit}`, { + method: 'GET', + headers: { </file context>

hongping-zh · 2025-12-25T11:47:18Z

Here's a demo video showing the ACM Scholar Agent in action:

🎬 Demo Video: https://drive.google.com/file/d/1xPVI2EUEtvbNSVgpg4eQ4xNrG4IyGrD2/view?usp=drive_link

The video demonstrates:

Searching for ACM papers via OpenAlex API
One-click paper ingestion into notebook
Viewing extracted PDF content
Chat with the ingested paper

Let me know if you have any questions!

hongping added 3 commits December 25, 2025 18:07

docs: Add ACM Agent user testing guide

9bb01ea

feat: Add ACM Scholar Agent for paper search

e912b57

- Search ACM Digital Library papers via OpenAlex API
- Filter by ACM publisher, Computer Science, Open Access
- Download and ingest papers into notebooks
- Frontend UI for searching and adding papers

cubic-dev-ai bot reviewed Dec 25, 2025
