-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Release v0.2.0 #21
Release v0.2.0 #21
Conversation
* feat(chat): chat on top of all documents * feat(chat): remove extra print statements * feat[Chat]: chat with references for the answers * feat[chat]: display multiple references of chat * feat(chat): push chat reference data * fix(chat): clean reference ui
Fix/chat with ref
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
- backend/app/api/v1/chat.py (5 hunks)
- backend/app/api/v1/extract.py (4 hunks)
- backend/app/api/v1/projects.py (9 hunks)
- backend/app/repositories/process_repository.py (2 hunks)
- backend/app/repositories/project_repository.py (1 hunks)
- backend/app/repositories/user_repository.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- backend/app/api/v1/chat.py
- backend/app/api/v1/extract.py
- backend/app/repositories/process_repository.py
- backend/app/repositories/project_repository.py
- backend/app/repositories/user_repository.py
🧰 Additional context used
🔇 Additional comments (6)
backend/app/api/v1/projects.py (6)
68-69
: Improved error logging inget_projects
The change from
print(traceback.print_exc())
tologger.error(traceback.format_exc())
is a good improvement. It ensures that the full traceback is properly logged using the logger, which is more suitable for production environments and easier to manage.
93-94
: Consistent error logging improvement inget_project
The error logging improvement is consistently applied here, replacing
print(traceback.print_exc())
withlogger.error(traceback.format_exc())
. This change maintains consistency across the codebase and improves error tracking.
128-129
: Consistent error logging improvement inget_assets
The error logging improvement is consistently applied in the
get_assets
function as well. This change maintains the pattern of usinglogger.error(traceback.format_exc())
for better error tracking and management.
191-192
: Consistent error logging improvement inupload_files
The error logging improvement is consistently applied in the
upload_files
function. This change maintains the pattern of usinglogger.error(traceback.format_exc())
for better error tracking and management across the codebase.
245-246
: Consistent error logging improvement inadd_url_asset
The error logging improvement is consistently applied in the
add_url_asset
function. This change maintains the pattern of usinglogger.error(traceback.format_exc())
for better error tracking and management across the codebase.
310-311
: Consistent error logging improvement inget_processes
The error logging improvement is consistently applied in the
get_processes
function. This change maintains the pattern of usinglogger.error(traceback.format_exc())
for better error tracking and management across the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (6)
backend/.gitignore (1)
25-29
: LGTM! Good addition of coverage files to .gitignoreThe addition of
.coverage
andcoverage.xml
to the.gitignore
file is a good practice. These files are typically generated during code coverage analysis and should not be version controlled.For consistency with the rest of the file, consider adding a blank line after the new "coverage" section. This would improve readability and maintain the structure of the file. Here's a suggested change:
# coverage .coverage coverage.xml +
backend/tests/projects/test_get_projects.py (1)
149-149
: Approve the user-friendly error message and suggest improvementsThe updated error message is more user-friendly and follows best practices for error handling in production environments. This change is approved.
However, to maintain thorough testing and aid in debugging, consider the following suggestions:
Add a development-mode specific test that checks for more detailed error messages. This can help with debugging while maintaining user-friendly messages in production.
The AI summary mentions that lines asserting parameters passed to the mocked
get_projects
function have been removed, but these are not visible in the provided code. Could you clarify if these assertions were removed in a previous commit? If so, consider re-introducing them to maintain test coverage.Would you like assistance in implementing these suggestions?
backend/tests/projects/test_delete_asset.py (3)
70-70
: Improved error message clarity.The updated error message for the project not found scenario is more descriptive and user-friendly. This change enhances the API's error reporting clarity.
For consistency with other error messages in the file, consider using sentence case:
assert response.json() == {"detail": "The specified project could not be found."}
103-103
: Improved test accuracy and error handling.The changes in the
test_delete_asset_db_error
function enhance both the test's accuracy and the API's error reporting:
- Adding
project_id
to the mock asset (line 103) ensures consistency with the actual data model.- The updated error message (line 111) provides more clarity on the nature of the error, which will aid in troubleshooting.
The additional blank lines (116-117) improve code readability.
For consistency with other error messages, consider rephrasing the error message slightly:
assert response.json() == {"detail": "An error occurred while deleting the asset. Please try again later."}Also applies to: 111-111, 116-117
118-135
: Excellent addition of edge case testing!The new
test_delete_asset_wrong_project
function significantly enhances the test suite by covering an important edge case: attempting to delete an asset that doesn't belong to the specified project. This test ensures that the API correctly handles such scenarios, preventing unauthorized asset deletions across projects.The test structure, mocking, and assertions are well-implemented and consistent with other test cases in the file. This addition will help maintain the integrity and security of the asset deletion functionality.
For consistency with other test functions, consider adding a blank line after the function definition (line 120) and before the closing line (line 135).
backend/tests/projects/test_get_assets.py (1)
162-162
: Approve the improved error message with suggestions for test robustness.The updated error message is more informative and user-friendly, which is a good improvement. It provides better guidance to the end-user without exposing internal details.
To make the test more robust and less brittle, consider the following suggestions:
Instead of asserting the exact error message, check for key phrases or the structure of the response. This allows for minor wording changes without breaking the test.
Verify the structure of the error response, not just the content of the "detail" field.
Here's an example of how you could refactor the assertion:
response_json = response.json() assert response.status_code == 500 assert "detail" in response_json assert "internal server error" in response_json["detail"].lower() assert "retrieving assets" in response_json["detail"].lower() assert "try again later" in response_json["detail"].lower()This approach maintains the intent of the test while allowing for more flexibility in the exact wording of the error message.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (15)
- .gitignore (1 hunks)
- .pre-commit-config.yaml (1 hunks)
- backend/.gitignore (1 hunks)
- backend/app/api/v1/projects.py (15 hunks)
- backend/tests/extract/test_extract_fields.py (3 hunks)
- backend/tests/extract/test_extract_pdf.py (3 hunks)
- backend/tests/projects/test_add_assets.py (4 hunks)
- backend/tests/projects/test_create_project.py (3 hunks)
- backend/tests/projects/test_delete_asset.py (5 hunks)
- backend/tests/projects/test_delete_project.py (3 hunks)
- backend/tests/projects/test_get_assets.py (1 hunks)
- backend/tests/projects/test_get_file.py (3 hunks)
- backend/tests/projects/test_get_project.py (2 hunks)
- backend/tests/projects/test_get_projects.py (1 hunks)
- backend/tests/projects/test_update_project.py (4 hunks)
✅ Files skipped from review due to trivial changes (1)
- .gitignore
🚧 Files skipped from review as they are similar to previous changes (8)
- .pre-commit-config.yaml
- backend/tests/extract/test_extract_fields.py
- backend/tests/extract/test_extract_pdf.py
- backend/tests/projects/test_add_assets.py
- backend/tests/projects/test_create_project.py
- backend/tests/projects/test_delete_project.py
- backend/tests/projects/test_get_file.py
- backend/tests/projects/test_get_project.py
🧰 Additional context used
🔇 Additional comments (20)
backend/tests/projects/test_update_project.py (5)
60-60
: Improved error message for project not found scenario.The updated error message "The specified project could not be found." is more descriptive and user-friendly compared to the previous message. This change enhances the clarity of the error response, which is beneficial for both developers and potential API consumers.
77-77
: Enhanced error message for database error scenario.The updated error message "An error occurred while updating the project. Please try again later." is a significant improvement. It provides a clear explanation to the user without exposing internal system details, which is a good security practice. The message also suggests a course of action (trying again later), which is helpful for users encountering this error.
119-119
: Consistent error message update in API test for project not found scenario.This change aligns the API test with the updated error message in the unit test. The new expected JSON response {"detail": "The specified project could not be found."} ensures consistency across different layers of the application. This update maintains the integrity of the error handling system and improves the overall quality of the tests.
133-133
: Consistent error message update in API test for database error scenario.This change aligns the API test with the updated error message in the unit test for database errors. The new expected JSON response {"detail": "An error occurred while updating the project. Please try again later."} ensures consistency between the application logic and the API response. This update reinforces the robustness of the error handling system across different layers of the application.
Line range hint
60-133
: Overall improvement in error message handling and testing.The changes in this file consistently update error messages across both unit tests and API tests for project update functionality. These updates enhance the clarity and user-friendliness of error responses without altering the core test logic. The new messages provide more descriptive information to users while maintaining good security practices by not exposing internal details.
These changes likely reflect corresponding updates in the application code, indicating an overall improvement in error handling throughout the system. This attention to detail in error messaging contributes to a better user experience and easier debugging for developers.
backend/tests/projects/test_delete_asset.py (2)
41-41
: Excellent improvements to test coverage and accuracy!The changes in the
test_delete_asset_success
function significantly enhance the test's effectiveness:
- Adding
project_id
to the mock asset (line 41) ensures that the test accurately represents the relationship between assets and projects.- The addition of ChromaDB mocking (lines 44-46) and corresponding assertions (lines 58-59) verifies that the vector database is correctly updated during asset deletion.
These modifications align with best practices for comprehensive unit testing and will help maintain the integrity of the asset deletion functionality.
Also applies to: 44-46, 58-59
88-88
: Enhanced error message for asset not found scenario.The updated error message provides more clarity when an asset is not found in the database. This improvement in error reporting will help developers and users better understand and troubleshoot issues related to asset deletion.
backend/app/api/v1/projects.py (13)
29-29
: Improved error message for empty project nameThe updated error message is more descriptive and clearly states that the project name cannot be empty. This change enhances the user experience by providing more precise feedback.
68-70
: Improved error handling and loggingThe exception handling has been updated to use
traceback.format_exc()
for logging, which is the correct approach. The error message provided to the user is now more informative while keeping internal details hidden. These changes improve both debugging capabilities and user experience.
78-78
: Enhanced error handling for get_project endpointThe changes improve error handling in the
get_project
function:
- A more specific error message is provided when a project is not found.
- Exception handling has been updated to use
traceback.format_exc()
for logging.- The error message for internal server errors is more informative for users.
These improvements enhance both the debugging process and user experience.
Also applies to: 93-95
128-130
: Consistent error handling improvement in get_assets functionThe exception handling in the
get_assets
function has been updated to usetraceback.format_exc()
for logging and provides a more informative error message to the user. This change maintains consistency with the improvements made throughout the file and enhances both debugging capabilities and user experience.
140-140
: Improved error messages in upload_files functionThe error messages in the
upload_files
function have been enhanced:
- A more specific message is provided when the project is not found.
- Clear feedback is given for invalid file types (non-PDF files).
- A detailed message is shown when the file size exceeds the limit.
These improvements provide more precise information to the user, which will help in troubleshooting and improve the overall user experience.
Also applies to: 150-150, 157-157
191-193
: Consistent error handling improvement in upload_files functionThe exception handling in the
upload_files
function has been updated to usetraceback.format_exc()
for logging and provides a more informative error message to the user. This change maintains consistency with the improvements made throughout the file and enhances both debugging capabilities and user experience.
202-202
: Enhanced error messages in add_url_asset functionThe error messages in the
add_url_asset
function have been improved:
- A more specific message is provided when the project is not found.
- Clear feedback is given when no URLs are provided.
- A detailed message is shown for invalid URL formats.
These improvements provide more precise information to the user, which will help in troubleshooting and improve the overall user experience.
Also applies to: 205-205, 209-209
245-247
: Consistent error handling improvement in add_url_asset functionThe exception handling in the
add_url_asset
function has been updated to usetraceback.format_exc()
for logging and provides a more informative error message to the user. This change maintains consistency with the improvements made throughout the file and enhances both debugging capabilities and user experience.
257-257
: Improved error messages in get_file functionThe error messages in the
get_file
function have been enhanced to provide more specific information:
- A clear message is given when the file is not found in the database.
- A distinct message is provided when the file is not found on the server.
These improvements help users better understand the nature of any issues that occur.
Also applies to: 264-264
310-311
: Consistent error handling improvement in get_processes functionThe exception handling in the
get_processes
function has been updated to usetraceback.format_exc()
for logging. This change maintains consistency with the improvements made throughout the file and enhances debugging capabilities.
320-320
: Enhanced error handling in update_project functionThe
update_project
function has been improved:
- A more specific error message is provided when the project is not found.
- Exception handling has been updated to use
traceback.format_exc()
for logging.- A more informative error message is provided to the user in case of internal server errors.
These changes enhance both the debugging process and user experience, maintaining consistency with improvements throughout the file.
Also applies to: 331-333
341-341
: Improved error handling in delete_project functionThe
delete_project
function has been enhanced:
- A more specific error message is provided when the project is not found.
- Exception handling has been updated to use
traceback.format_exc()
for logging.- A more informative error message is provided to the user in case of internal server errors.
These improvements provide better feedback to the user and maintain consistency with the error handling enhancements throughout the file.
Also applies to: 360-362
Line range hint
1-407
: Overall improvements to error handling and loggingThis pull request significantly enhances the
projects.py
file, particularly in terms of error handling and logging:
- Error messages across all functions have been made more specific and informative, improving the user experience.
- Exception handling has been standardized to use
traceback.format_exc()
for logging, which will aid in debugging.- The
delete_asset
function has been substantially improved with better error checking and handling of vectorstore deletion.These changes contribute to a more robust and maintainable codebase. A few minor issues were identified and addressed in the review comments, particularly regarding consistent use of logging methods.
Great job on these improvements! Once the minor logging issues are addressed, this PR will significantly enhance the quality and reliability of the project management API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
backend/tests/api/v1/test_chat.py (1)
34-80
: LGTM: Comprehensive test for successful chat execution.The test effectively mocks dependencies and verifies the structure of the response. However, consider enhancing the test by also asserting the actual content of the response, not just its structure.
You could add assertions to verify the actual content of the response, for example:
assert response.json()["data"]["response"] == "Test response" assert len(response.json()["data"]["response_references"]) > 0
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
- .pre-commit-config.yaml (1 hunks)
- backend/.coveragerc (1 hunks)
- backend/app/api/v1/chat.py (4 hunks)
- backend/tests/api/v1/test_chat.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- backend/.coveragerc
🚧 Files skipped from review as they are similar to previous changes (2)
- .pre-commit-config.yaml
- backend/app/api/v1/chat.py
🧰 Additional context used
🔇 Additional comments (7)
backend/tests/api/v1/test_chat.py (7)
1-31
: LGTM: Imports and test setup are well-structured.The imports, TestClient setup, and pytest fixtures are appropriately defined. The use of mock objects for database, vectorstore, and chat_query is a good practice for isolating the tests from external dependencies.
83-112
: LGTM: Chat status tests cover both success and pending scenarios.The tests for chat status API endpoint are well-structured and cover both scenarios (no pending assets and assets pending content). The use of mocks and assertions is appropriate and thorough.
115-180
: LGTM: Comprehensive test for group_by_start_end function.This test effectively verifies the complex logic of the group_by_start_end function. It covers multiple scenarios including overlapping ranges and combining sources for the same asset_id. The assertions are thorough and check both the structure and content of the result.
182-216
: LGTM: Edge cases for group_by_start_end function are well-covered.The tests for empty input and single reference scenarios are well-structured and cover important edge cases for the group_by_start_end function. The assertions are appropriate and thorough for each case.
218-283
: LGTM: Specific scenarios for group_by_start_end function are well-tested.The tests for same asset on different pages and overlapping ranges scenarios are well-structured and cover important real-world use cases for the group_by_start_end function. The assertions effectively verify the expected behavior in each case.
285-356
: LGTM: Additional scenarios for group_by_start_end function are well-covered.The tests for non-contiguous ranges and large values scenarios are well-structured and cover important edge cases for the group_by_start_end function. The assertions effectively verify the expected behavior in each case, ensuring the function works correctly with various input types.
1-356
: Overall: Excellent test coverage and structure.This test file provides comprehensive coverage of the chat functionality and the group_by_start_end function. The tests are well-structured, cover various scenarios and edge cases, and make effective use of mocks and fixtures. These tests should provide high confidence in the correctness of the implemented functionality.
Minor suggestion for improvement:
- In the test_chat_success function, consider adding assertions to verify the actual content of the response, not just its structure.
Great job on writing thorough and effective tests!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (6)
frontend/src/components/ui/ChatBubble.tsx (3)
40-88
: LGTM: New state variables and useEffect hookThe new state variables and useEffect hook are well-implemented for managing references and UI state. The processing of references is efficient and avoids unnecessary re-renders.
Consider memoizing the
references
dependency of the useEffect hook to further optimize performance:const memoizedReferences = useMemo(() => references, [references]); useEffect(() => { // ... existing code ... }, [memoizedReferences]);
125-202
: LGTM: Updated ChatBubble render logicThe updated render logic effectively handles the new references feature. The UI for expanding/collapsing references is well-implemented, providing a smooth user experience with Framer Motion animations.
Consider adding an
aria-expanded
attribute to the expand/collapse button to improve accessibility:<motion.button aria-expanded={isReferencesExpanded} // ... existing props ... > {/* ... existing content ... */} </motion.button>This change will help screen readers understand the current state of the expandable section.
Line range hint
1-203
: Overall implementation looks greatThe changes to the ChatBubble component to support the new references feature are well-implemented. The code is readable, maintainable, and makes good use of React hooks and Framer Motion for animations.
To further optimize performance, consider memoizing the
renderReferenceItem
function:const renderReferenceItem = useCallback( (item: ChatReference, index: number) => ( // ... existing implementation ... ), [indexMap, handleReferenceClick] );This change will prevent unnecessary re-renders of reference items when the component re-renders for reasons unrelated to the references.
backend/app/vectorstore/chroma.py (1)
140-191
: Optimize retrieval of surrounding sentences and improve robustnessThe implementation of
get_relevant_segments
looks good overall, but there are a few areas for improvement:
The loops for fetching previous and next sentences (lines 165-173 and 177-183) make individual calls to
get_relevant_docs_by_id
for each sentence. This can be inefficient, especially for larger values ofnum_surrounding_sentences
.The use of
-1
as a sentinel value for IDs (lines 166 and 178) may cause confusion, especially if IDs are expected to be strings.There's no error handling when fetching documents by ID, which could lead to issues if a document is not found.
Consider the following improvements:
- Collect all required sentence IDs first, then fetch them in a single query:
sentence_ids = [] current_id = metadata.get("previous_sentence_id") for _ in range(num_surrounding_sentences): if current_id is not None: sentence_ids.insert(0, current_id) current_id = self.get_relevant_docs_by_id(ids=[current_id])["metadatas"][0].get("previous_sentence_id") else: break current_id = metadata.get("next_sentence_id") for _ in range(num_surrounding_sentences): if current_id is not None: sentence_ids.append(current_id) current_id = self.get_relevant_docs_by_id(ids=[current_id])["metadatas"][0].get("next_sentence_id") else: break sentences_data = self.get_relevant_docs_by_id(ids=sentence_ids)
- Use
None
instead of-1
for sentinel values:if current_id is not None: # ... existing code ...
- Add error handling when fetching documents:
try: sentences_data = self.get_relevant_docs_by_id(ids=sentence_ids) except Exception as e: logger.error(f"Error fetching documents: {e}") # Handle the error appropriatelyThe return type annotation is correct and matches the actual return value, which addresses a previous review comment.
frontend/src/ee/components/HighlightPdfViewer.tsx (2)
Line range hint
173-174
: Removeconsole.log
statements from production codeThe
console.log("No view width or height found");
statement is useful during development but should be removed or replaced with a proper logging mechanism to avoid cluttering the console in production.
301-312
: Ensure proper cleanup ofIntersectionObserver
While you are disconnecting the observer on cleanup, make sure that all observed elements are properly unobserved to prevent potential memory leaks.
Optionally, you can modify the cleanup function to unobserve each element:
return () => { + document.querySelectorAll('[id^="page_"]').forEach((el) => { + observer.unobserve(el); + }); observer.disconnect(); };
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (2)
frontend/package-lock.json
is excluded by!**/package-lock.json
frontend/yarn.lock
is excluded by!**/yarn.lock
,!**/*.lock
📒 Files selected for processing (9)
- .pre-commit-config.yaml (1 hunks)
- backend/app/vectorstore/chroma.py (2 hunks)
- frontend/package.json (2 hunks)
- frontend/src/components/ChatBox.tsx (4 hunks)
- frontend/src/components/ChatReferenceDrawer.tsx (1 hunks)
- frontend/src/components/ExtractReferenceDrawer.tsx (2 hunks)
- frontend/src/components/ui/ChatBubble.tsx (2 hunks)
- frontend/src/components/ui/MessageWithReferences.tsx (1 hunks)
- frontend/src/ee/components/HighlightPdfViewer.tsx (14 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- .pre-commit-config.yaml
- frontend/package.json
- frontend/src/components/ChatBox.tsx
- frontend/src/components/ChatReferenceDrawer.tsx
- frontend/src/components/ExtractReferenceDrawer.tsx
- frontend/src/components/ui/MessageWithReferences.tsx
🧰 Additional context used
🔇 Additional comments (3)
frontend/src/components/ui/ChatBubble.tsx (1)
1-16
: LGTM: New imports and interface updatesThe new imports and the update to the
ChatBubbleProps
interface are appropriate for the added functionality of handling references in theChatBubble
component.frontend/src/ee/components/HighlightPdfViewer.tsx (2)
66-67
: Previous comments aboutconsole.log
statements are still applicableThe
console.log
statements at lines 67 and 161 are still present. As previously noted, consider removing them or replacing them with an appropriate logging system before deployment.Also applies to: 161-162
278-286
:⚠️ Potential issueAdd missing dependencies to the
useEffect
dependency arrayThe
useEffect
hook depends onhighlightSources
, but it's not included in the dependency array. Omitting dependencies can lead to stale closures and unexpected behavior. IncludehighlightSources
in the dependency array to ensure the effect runs correctly whenhighlightSources
changes.Apply this diff to fix the issue:
useEffect(() => { const highlightAllSources = async () => { for (const highlightSource of highlightSources) { await highlightTextInPdf( highlightSource.page_number, highlightSource.source ); } }; highlightAllSources(); -}, [onScrolled]); +}, [onScrolled, highlightSources]);Likely invalid or redundant comment.
Welcome to Codecov 🎉Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. Thanks for integrating Codecov - We've got you covered ☂️ |
* fix[chat]: metadata and filtering error margins * dev: add vscode file * fix: error in condition --------- Co-authored-by: Gabriele Venturi <[email protected]>
Summary by CodeRabbit
New Features
process_steps
table to streamline process management.Bug Fixes
Documentation
Chores