Feature/literature mcp #192
Reviewer's Guide
Introduces a multi-source literature search subsystem (OpenAlex client, DOI utilities, work distributor, MCP server) with robust boolean/year parsing, logging, and comprehensive tests. Together these provide normalized data models, DOI-based deduplication, and consistent search behavior across tools.
Sequence diagram for the new literature search MCP flow
sequenceDiagram
actor User
participant MCP as MCP_literature_server
participant Tool as search_literature_tool
participant Distributor as WorkDistributor
participant OAClient as OpenAlexClient
participant OpenAlexAPI as OpenAlex_API
User->>MCP: request search_literature(query, filters)
MCP->>Tool: call search_literature(...)
Tool->>Tool: parse years, booleans, max_results
Tool->>Tool: create SearchRequest
Tool->>Distributor: new WorkDistributor(openalex_email)
Tool->>Distributor: search(request)
activate Distributor
Distributor->>Distributor: clamp request.max_results
Distributor->>Distributor: determine data_sources
loop for each source
Distributor->>OAClient: search(request)
activate OAClient
OAClient->>OAClient: resolve author/institution/source ids
OAClient->>OpenAlexAPI: GET /authors, /institutions, /sources
OpenAlexAPI-->>OAClient: JSON ids
OAClient->>OpenAlexAPI: GET /works?page=1...N with filters
OpenAlexAPI-->>OAClient: paginated results
OAClient->>OAClient: transform to LiteratureWork list
OAClient-->>Distributor: works, warnings
deactivate OAClient
end
Distributor->>Distributor: deduplicate_by_doi(all_works)
Distributor->>Distributor: sort and slice by max_results
Distributor-->>Tool: aggregated_result
deactivate Distributor
Tool->>Tool: merge year and boolean warnings
Tool->>Tool: format markdown + JSON via _format_search_result
Tool-->>MCP: markdown_report
MCP-->>User: markdown_report
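The distributor steps in the sequence diagram (clamp `max_results`, query each data source, deduplicate by DOI, sort, slice) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: the simplified `Work` dataclass, the `MAX_RESULTS_CAP` value, modeling clients as async callables, and the "keep the most-cited duplicate" rule are all hypothetical.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

# Hypothetical cap; the PR does not state the real clamp value.
MAX_RESULTS_CAP = 200


@dataclass
class Work:
    """Simplified stand-in for LiteratureWork (only the fields used here)."""
    id: str
    doi: Optional[str]
    cited_by_count: int = 0


# A "client" is modeled as an async callable returning (works, warnings).
Client = Callable[[str, int], Awaitable[tuple[list[Work], list[str]]]]


def doi_key(doi: Optional[str]) -> Optional[str]:
    """Minimal DOI normalization: trim, lowercase, strip common resolver prefixes."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi or None


async def distribute_search(query: str, max_results: int,
                            clients: dict[str, Client], sources: list[str]) -> dict:
    # 1. Clamp the requested size into [1, MAX_RESULTS_CAP].
    max_results = max(1, min(max_results, MAX_RESULTS_CAP))
    all_works: list[Work] = []
    warnings: list[str] = []
    # 2. Query each requested source in turn; unknown sources become warnings.
    for name in sources:
        client = clients.get(name)
        if client is None:
            warnings.append(f"Unknown data source: {name}")
            continue
        works, client_warnings = await client(query, max_results)
        all_works.extend(works)
        warnings.extend(client_warnings)
    # 3. Deduplicate by normalized DOI, keeping the most-cited copy;
    #    works without a DOI are all kept.
    by_doi: dict[str, Work] = {}
    no_doi: list[Work] = []
    for work in all_works:
        key = doi_key(work.doi)
        if key is None:
            no_doi.append(work)
        elif key not in by_doi or work.cited_by_count > by_doi[key].cited_by_count:
            by_doi[key] = work
    unique = list(by_doi.values()) + no_doi
    # 4. Sort by citation count (descending), then slice to the clamped size.
    unique.sort(key=lambda w: w.cited_by_count, reverse=True)
    return {"works": unique[:max_results], "total_count": len(unique), "warnings": warnings}
```

Sorting only by citations and querying sources sequentially keep the sketch short; a fuller distributor would presumably honor `sort_by` and could query sources concurrently, for example with `asyncio.gather`.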
Class diagram for the new literature search subsystem
classDiagram
direction LR
class SearchRequest {
+str query
+str author
+str institution
+str source
+int year_from
+int year_to
+bool is_oa
+str work_type
+str language
+bool is_retracted
+bool has_abstract
+bool has_fulltext
+str sort_by
+int max_results
+list~str~ data_sources
}
class LiteratureWork {
+str id
+str doi
+list~dict~ authors
+int publication_year
+int cited_by_count
+str abstract
+str journal
+bool is_oa
+str oa_url
+str source
+dict raw_data
}
class BaseLiteratureClient {
<<abstract>>
+search(request SearchRequest) async tuple~list~LiteratureWork~~ list~str~~
}
class OpenAlexClient {
+str email
+int rate_limit
+_RateLimiter rate_limiter
+httpx_AsyncClient client
+search(request SearchRequest) async tuple~list~LiteratureWork~~ list~str~~
+close() async None
+pool_type str
-_build_query_params(request SearchRequest, author_id str, institution_id str, source_id str) dict~str,str~
-_resolve_author_id(author_name str) async tuple~str,bool,str~
-_resolve_institution_id(institution_name str) async tuple~str,bool,str~
-_resolve_source_id(source_name str) async tuple~str,bool,str~
-_fetch_all_pages(params dict~str,str~, max_results int) async list~dict~
-_request_with_retry(url str, params dict~str,str~) async dict~str,Any~
-_transform_work(work dict~str,Any~) LiteratureWork
-_reconstruct_abstract(inverted_index dict~str,list~int~~) str
}
class _RateLimiter {
-float _min_interval
-asyncio_Lock _lock
-float _last_request
-asyncio_Semaphore _semaphore
+__aenter__() async None
+__aexit__(exc_type type, exc BaseException, tb Any) async None
-_throttle() async None
}
class WorkDistributor {
+dict~str,Any~ clients
+str openalex_email
+search(request SearchRequest) async dict~str,Any~
+close() async None
-_register_clients() None
-_sort_works(works list~LiteratureWork~, sort_by str) list~LiteratureWork~
}
class WorkWithDOI {
<<protocol>>
+str doi
+int cited_by_count
+int publication_year
}
class doi_cleaner_module {
+normalize_doi(doi str) str
+deduplicate_by_doi(works list~WorkWithDOI~) list~WorkWithDOI~
}
class mcp_literature_module {
+search_literature(query str, mailto str, author str, institution str, source str, year_from str, year_to str, is_oa str, work_type str, language str, is_retracted str, has_abstract str, has_fulltext str, sort_by str, max_results int, data_sources list~str~, include_abstract bool) async str
-_format_search_result(request SearchRequest, result dict~str,Any~, include_abstract bool) str
}
BaseLiteratureClient <|-- OpenAlexClient
OpenAlexClient o-- _RateLimiter
WorkDistributor o-- BaseLiteratureClient
WorkDistributor --> SearchRequest
WorkDistributor --> LiteratureWork
doi_cleaner_module ..> WorkWithDOI
WorkDistributor ..> doi_cleaner_module
mcp_literature_module ..> WorkDistributor
mcp_literature_module ..> SearchRequest
OpenAlexClient ..> SearchRequest
OpenAlexClient ..> LiteratureWork
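The `_RateLimiter` members in the class diagram (`_min_interval`, `_lock`, `_last_request`, `_semaphore`, `_throttle`, `__aenter__`/`__aexit__`) suggest an async context manager that both caps concurrency and enforces a minimum gap between request starts. A minimal sketch under that reading follows; the class name `RateLimiter`, its constructor parameters (`requests_per_second`, `max_concurrency`), and their defaults are assumptions rather than the PR's actual signature.

```python
import asyncio
import time
from types import TracebackType
from typing import Optional


class RateLimiter:
    """Async context manager: bounded concurrency plus a minimum gap between request starts."""

    def __init__(self, requests_per_second: float = 10.0, max_concurrency: int = 5) -> None:
        self._min_interval = 1.0 / requests_per_second
        self._lock = asyncio.Lock()
        self._last_request = 0.0
        self._semaphore = asyncio.Semaphore(max_concurrency)

    async def _throttle(self) -> None:
        # Serialize the timing check so two tasks cannot claim the same time slot.
        async with self._lock:
            wait = self._last_request + self._min_interval - time.monotonic()
            if wait > 0:
                await asyncio.sleep(wait)
            self._last_request = time.monotonic()

    async def __aenter__(self) -> None:
        await self._semaphore.acquire()
        try:
            await self._throttle()
        except BaseException:
            # Do not leak a semaphore slot if throttling is cancelled.
            self._semaphore.release()
            raise

    async def __aexit__(
        self,
        exc_type: Optional[type[BaseException]],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> None:
        self._semaphore.release()
```

In this sketch each OpenAlex call would be wrapped in `async with limiter:`; the semaphore bounds in-flight requests while the lock-protected timestamp spaces out their start times.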
Sorry @SoberPizza, you have reached your weekly rate limit of 500000 diff characters.
Please try again later or upgrade to continue using Sourcery
…idation and detailed results formatting
* Feature/literature mcp (#192)
* feat: complete literature-MCP functionality
* refactor: improve boolean parsing and logging in literature search functions
* feat: enhance literature search functionality with improved query validation and detailed results formatting
* refactor: rename oa_url to access_url in LiteratureWork model and related tests
* feat: remove test-build workflow and update README for development setup
* feat: tool cost system and PPTX image handling fixes (#193)
* fix: prompt, factory
* feat: enhanced ppt generation with image slides mode
  - Add image_slides mode for PPTX with full-bleed AI-generated images
  - Add ImageBlock.image_id field for referencing generated images
  - Add ImageSlideSpec for image-only slides
  - Add ImageFetcher service for fetching images from various sources
  - Reorganize knowledge module from single file to module structure
  - Move document utilities from app/mcp/ to app/tools/utils/documents/
  - Resolve image_ids to storage URLs in async layer (operations.py)
  - Fix type errors and move tests to proper location
  Co-Authored-By: Claude <[email protected]>
* feat: implement the tool cost
---------
Co-authored-by: Claude <[email protected]>
* fix: fix the first time calling knowledge tool error (#194)
* fix: fix the wrong cache for second call of agent tools (#195)
* feat: several improvements (#196)
* fix: jump to latest topic when click agent
* feat: allow more than one image for generate image
* feat: allow user directly edit mcp in the chat-toolbar
* feat: improve the frontend perf
* feat: multiple UI improvements and fixes (#198)
* fix: jump to latest topic when click agent
* feat: allow more than one image for generate image
* feat: allow user directly edit mcp in the chat-toolbar
* feat: improve the frontend perf
* fix: restore previous active topic when clicking agent
  Instead of always jumping to the latest topic, now tracks and restores the previously active topic for each agent when switching between them.
  Co-Authored-By: Claude <[email protected]>
* feat: add context menu to FocusedView agents and download button to lightbox
  - Add right-click context menu (edit/delete) to compact AgentListItem variant
  - Render context menu via portal to escape overflow:hidden containers
  - Add edit/delete handlers to FocusedView with AgentSettingsModal and ConfirmationModal
  - Add download button to image lightbox with smart filename detection
  Co-Authored-By: Claude <[email protected]>
* feat: add web_fetch tool bundled with web_search
  - Add web_fetch tool using Trafilatura for content extraction
  - Bundle web_fetch with web_search in frontend toolConfig
  - Group WEB_SEARCH_TOOLS for unified toggle behavior
  - Only load web_fetch when web_search is available (SearXNG enabled)
  - Update tool capabilities mapping for web_fetch
  Co-Authored-By: Claude <[email protected]>
---------
Co-authored-by: Claude <[email protected]>
---------
Co-authored-by: Meng Junxing <[email protected]>
Co-authored-by: Harvey <[email protected]>
Co-authored-by: Claude <[email protected]>
* Feature/literature mcp (#192)
* feat: complete literature-MCP functionality
* refactor: improve boolean parsing and logging in literature search functions
* feat: enhance literature search functionality with improved query validation and detailed results formatting
* refactor: rename oa_url to access_url in LiteratureWork model and related tests
* feat: remove test-build workflow and update README for development setup
* feat: tool cost system and PPTX image handling fixes (#193)
* fix: prompt, factory
* feat: enhanced ppt generation with image slides mode
  - Add image_slides mode for PPTX with full-bleed AI-generated images
  - Add ImageBlock.image_id field for referencing generated images
  - Add ImageSlideSpec for image-only slides
  - Add ImageFetcher service for fetching images from various sources
  - Reorganize knowledge module from single file to module structure
  - Move document utilities from app/mcp/ to app/tools/utils/documents/
  - Resolve image_ids to storage URLs in async layer (operations.py)
  - Fix type errors and move tests to proper location
  Co-Authored-By: Claude <[email protected]>
* feat: implement the tool cost
---------
Co-authored-by: Claude <[email protected]>
* fix: fix the first time calling knowledge tool error (#194)
* fix: fix the wrong cache for second call of agent tools (#195)
* feat: several improvements (#196)
* fix: jump to latest topic when click agent
* feat: allow more than one image for generate image
* feat: allow user directly edit mcp in the chat-toolbar
* feat: improve the frontend perf
* feat: multiple UI improvements and fixes (#198)
* fix: jump to latest topic when click agent
* feat: allow more than one image for generate image
* feat: allow user directly edit mcp in the chat-toolbar
* feat: improve the frontend perf
* fix: restore previous active topic when clicking agent
  Instead of always jumping to the latest topic, now tracks and restores the previously active topic for each agent when switching between them.
  Co-Authored-By: Claude <[email protected]>
* feat: add context menu to FocusedView agents and download button to lightbox
  - Add right-click context menu (edit/delete) to compact AgentListItem variant
  - Render context menu via portal to escape overflow:hidden containers
  - Add edit/delete handlers to FocusedView with AgentSettingsModal and ConfirmationModal
  - Add download button to image lightbox with smart filename detection
  Co-Authored-By: Claude <[email protected]>
* feat: add web_fetch tool bundled with web_search
  - Add web_fetch tool using Trafilatura for content extraction
  - Bundle web_fetch with web_search in frontend toolConfig
  - Group WEB_SEARCH_TOOLS for unified toggle behavior
  - Only load web_fetch when web_search is available (SearXNG enabled)
  - Update tool capabilities mapping for web_fetch
  Co-Authored-By: Claude <[email protected]>
---------
Co-authored-by: Claude <[email protected]>
* feat: fix the fork issue and implement the locked fork
---------
Co-authored-by: Meng Junxing <[email protected]>
Co-authored-by: Harvey <[email protected]>
Co-authored-by: Claude <[email protected]>
* Feature/better agent community (#200)
* Feature/literature mcp (#192)
* feat: complete literature-MCP functionality
* refactor: improve boolean parsing and logging in literature search functions
* feat: enhance literature search functionality with improved query validation and detailed results formatting
* refactor: rename oa_url to access_url in LiteratureWork model and related tests
* feat: remove test-build workflow and update README for development setup
* feat: tool cost system and PPTX image handling fixes (#193)
* fix: prompt, factory
* feat: enhanced ppt generation with image slides mode
  - Add image_slides mode for PPTX with full-bleed AI-generated images
  - Add ImageBlock.image_id field for referencing generated images
  - Add ImageSlideSpec for image-only slides
  - Add ImageFetcher service for fetching images from various sources
  - Reorganize knowledge module from single file to module structure
  - Move document utilities from app/mcp/ to app/tools/utils/documents/
  - Resolve image_ids to storage URLs in async layer (operations.py)
  - Fix type errors and move tests to proper location
  Co-Authored-By: Claude <[email protected]>
* feat: implement the tool cost
---------
Co-authored-by: Claude <[email protected]>
* fix: fix the first time calling knowledge tool error (#194)
* fix: fix the wrong cache for second call of agent tools (#195)
* feat: several improvements (#196)
* fix: jump to latest topic when click agent
* feat: allow more than one image for generate image
* feat: allow user directly edit mcp in the chat-toolbar
* feat: improve the frontend perf
* feat: multiple UI improvements and fixes (#198)
* fix: jump to latest topic when click agent
* feat: allow more than one image for generate image
* feat: allow user directly edit mcp in the chat-toolbar
* feat: improve the frontend perf
* fix: restore previous active topic when clicking agent
  Instead of always jumping to the latest topic, now tracks and restores the previously active topic for each agent when switching between them.
  Co-Authored-By: Claude <[email protected]>
* feat: add context menu to FocusedView agents and download button to lightbox
  - Add right-click context menu (edit/delete) to compact AgentListItem variant
  - Render context menu via portal to escape overflow:hidden containers
  - Add edit/delete handlers to FocusedView with AgentSettingsModal and ConfirmationModal
  - Add download button to image lightbox with smart filename detection
  Co-Authored-By: Claude <[email protected]>
* feat: add web_fetch tool bundled with web_search
  - Add web_fetch tool using Trafilatura for content extraction
  - Bundle web_fetch with web_search in frontend toolConfig
  - Group WEB_SEARCH_TOOLS for unified toggle behavior
  - Only load web_fetch when web_search is available (SearXNG enabled)
  - Update tool capabilities mapping for web_fetch
  Co-Authored-By: Claude <[email protected]>
---------
Co-authored-by: Claude <[email protected]>
* feat: fix the fork issue and implement the locked fork
---------
Co-authored-by: Meng Junxing <[email protected]>
Co-authored-by: Harvey <[email protected]>
Co-authored-by: Claude <[email protected]>
* fix: prevent forked agents from being republished to marketplace (#201)
* fix: prevent forked agents from being republished to marketplace
  Forked agents were able to be republished, which could expose the original agent's configuration. This fix adds validation at both API and UI levels:
  - Backend: Add validation in publish endpoint to reject agents with original_source_id set (HTTP 400)
  - Frontend: Hide publish button for forked agents in AgentSettingsModal and WorkflowEditor
  - Types: Add original_source_id and source_version fields to Agent interface
  Co-Authored-By: Claude <[email protected]>
* refactor: address code review feedback for fork detection
  - Extract `isForked` helper variable to avoid duplication
  - Use explicit nullish check (`!= null`) to match backend `is not None` semantic
  - Replace implicit empty div spacer with dynamic justify-* class in WorkflowEditor
  Co-Authored-By: Claude <[email protected]>
* feat: add justfile for better command
* feat: improve AGENTS.md and fix backend fix
---------
Co-authored-by: Claude <[email protected]>
---------
Co-authored-by: xinquiry(SII) <[email protected]>
Co-authored-by: Meng Junxing <[email protected]>
Co-authored-by: Claude <[email protected]>
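The backend rule described in #201 (reject publishing when `original_source_id` is set, using an `is not None` check, and answer with HTTP 400) can be sketched framework-agnostically as below. The `Agent` dataclass, the `PublishError` type, and the `ensure_publishable` name are hypothetical; the real endpoint presumably raises its web framework's own HTTP exception instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    # Hypothetical subset of the agent model; only the fork marker matters here.
    id: str
    original_source_id: Optional[str] = None


class PublishError(Exception):
    """Hypothetical error carrying an HTTP status code for the API layer."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def ensure_publishable(agent: Agent) -> None:
    # `is not None` rather than truthiness, so even an empty-string source id
    # marks the agent as a fork (the same semantics as the frontend's `!= null`).
    if agent.original_source_id is not None:
        raise PublishError(400, "Forked agents cannot be published to the marketplace.")
```

Checking `is not None` instead of truthiness keeps the backend and frontend in agreement about what counts as a fork.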
improve boolean parsing and logging in literature search functions
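Per the class diagram, `search_literature` receives `year_from`, `year_to`, `is_oa`, and the other boolean filters as strings, so it has to parse them before it can build a `SearchRequest`. The following is a minimal sketch of lenient parsing that logs problems and collects them as warnings instead of failing. The accepted spellings, the year bounds (1700 to 2100), and the helper names `parse_bool`/`parse_year` are assumptions.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Hypothetical accepted spellings; the PR's exact vocabulary is not shown.
_TRUE = {"true", "1", "yes", "y", "t"}
_FALSE = {"false", "0", "no", "n", "f"}


def parse_bool(name: str, value: Optional[str], warnings: list[str]) -> Optional[bool]:
    """Parse a lenient boolean string; unknown values yield None plus a warning."""
    if value is None or value.strip() == "":
        return None
    text = value.strip().lower()
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    msg = f"Ignored {name}={value!r}: expected true/false"
    logger.warning(msg)
    warnings.append(msg)
    return None


def parse_year(name: str, value: Optional[str], warnings: list[str],
               lo: int = 1700, hi: int = 2100) -> Optional[int]:
    """Parse a year string and clamp it into [lo, hi], recording any adjustment."""
    if value is None or value.strip() == "":
        return None
    try:
        year = int(value.strip())
    except ValueError:
        msg = f"Ignored {name}={value!r}: not an integer year"
        logger.warning(msg)
        warnings.append(msg)
        return None
    clamped = min(max(year, lo), hi)
    if clamped != year:
        msg = f"{name}={year} out of range; clamped to {clamped}"
        logger.warning(msg)
        warnings.append(msg)
    return clamped
```

The collected warnings can then be merged with the distributor's own warnings, which matches the "merge year and boolean warnings" step in the sequence diagram.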
Summary by Sourcery
Add a multi-source literature search MCP server and supporting OpenAlex client and utilities.
New Features:
search_literature MCP tool that wraps the distributor and returns a structured markdown report with JSON results.
Tests:
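A report that is "structured markdown with JSON results" can be produced by rendering a readable list and then appending a fenced JSON block that a caller can parse. The sketch below illustrates that shape only. The `format_search_result` name, the plain-dict input, the heading text, and the chosen fields (taken from the `LiteratureWork` attributes in the class diagram) are assumptions, not the PR's `_format_search_result`.

```python
import json
from typing import Any

# Built at runtime so the snippet itself contains no literal Markdown fence.
FENCE = "`" * 3


def format_search_result(query: str, result: dict[str, Any], include_abstract: bool = False) -> str:
    """Render works as a Markdown list followed by a fenced JSON block."""
    works = result.get("works", [])
    lines = [f"# Literature search: {query}", "", f"Found {len(works)} work(s)."]
    for warning in result.get("warnings", []):
        lines.append(f"> Warning: {warning}")
    lines.append("")
    payload = []
    for i, work in enumerate(works, start=1):
        year = work.get("publication_year") or "n.d."
        journal = work.get("journal") or "unknown venue"
        lines.append(f"{i}. {work.get('id')} ({year}, {journal}), cited by {work.get('cited_by_count', 0)}")
        if work.get("doi"):
            lines.append(f"   DOI: {work['doi']}")
        # Machine-readable copy of the same fields for the JSON block.
        item = {k: work.get(k) for k in ("id", "doi", "publication_year", "cited_by_count", "journal")}
        if include_abstract:
            item["abstract"] = work.get("abstract")
        payload.append(item)
    lines += ["", "## JSON results", FENCE + "json", json.dumps(payload, ensure_ascii=False, indent=2), FENCE]
    return "\n".join(lines)
```

Putting the JSON at the end lets a human skim the list while an agent extracts the fenced block for further processing.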