backend: Add better support for file content parsing with Python Interpreter #805

Merged
8 commits merged on Oct 22, 2024

tianjing-li (Collaborator) commented Oct 8, 2024

I tried various ways to get the Python Interpreter to work with files, including sharing Docker volumes between the backend and terrarium services, only to find in https://github.com/cohere-ai/cohere-terrarium?tab=readme-ov-file#sandbox-design that the Python sandbox does not support filesystem access.

This workaround instead tries to force the instructions to use the read file tools and pass the file content directly to the interpreter.
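
As a rough illustration of passing content directly, the minimal sketch below inlines the file text into the code sent to the sandbox, so the code never opens a path. The helper `build_interpreter_code`, the sample CSV, and the local `exec` call standing in for the sandbox are all hypothetical and are not taken from this PR.

```python
def build_interpreter_code(file_content: str, user_code: str) -> str:
    """Return a script that defines FILE_CONTENT inline, then runs user_code.

    repr() yields a valid Python string literal, so quotes and newlines in
    the file content survive without any filesystem access.
    """
    return f"FILE_CONTENT = {file_content!r}\n{user_code}"


# Hypothetical content, as a read-file tool might return it.
csv_text = "name,score\nada,3\ngrace,5\n"

# Code destined for the interpreter: it parses the inlined string with
# io.StringIO instead of reading a file from disk.
analysis = (
    "import csv, io\n"
    "rows = list(csv.DictReader(io.StringIO(FILE_CONTENT)))\n"
    "total = sum(int(row['score']) for row in rows)\n"
)

script = build_interpreter_code(csv_text, analysis)

# Run locally as a stand-in for the sandbox call (assumption for this sketch).
namespace = {}
exec(script, namespace)
print(namespace["total"])  # 8
```

The point of the sketch is only that the interpreter receives data as a string literal; how the toolkit actually hands content from the read file tool to the interpreter is decided by the tool instructions in this PR.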

AI Description

This PR introduces several changes to the codebase, primarily focused on file handling and tool configuration.

Summary

The PR changes the file handling system, adding new functions to read different file formats and updating existing ones. It also renames a tool and modifies its description, keeping the naming consistent across the codebase. Additionally, it removes references to LangChain, a framework for building applications with language models, and updates the default model for chat requests.

Changes

  • Makefile: Adds a new target, exec-terrarium, which executes a command in the cohere-toolkit-terrarium-1 container as the root user.
  • docker-compose.yml: Removes the mounting of the src/backend/data directory, which was used to sync uploaded files.
  • src/backend/chat/custom/tool_calls.py: Renames the TIMEOUT variable to TIMEOUT_SECONDS and sets its value to 60. This changes the timeout passed to the asyncio.wait_for function.
  • src/backend/config/configuration.template.yaml: Renames the tool read_document to read_file.
  • src/backend/config/tools.py: Modifies the description of the ToolName class to clarify the usage of the Python interpreter without internet access and provide guidelines for file handling.
  • src/backend/schemas/chat.py: Removes the user_id field from the BaseChatRequest class, which was previously used to store the conversation under a specific user.
  • src/backend/schemas/file.py: Adds a user_id field to the ConversationFilePublic class, allowing for user-specific file handling.
  • src/backend/services/file.py: Replaces the read_excel, read_docx, and read_parquet functions with new functions of the same names that have updated argument names and return types.
  • src/backend/tools/files.py: Renames the NAME attribute of the ReadFileTool class from read_document to read_file.
  • src/backend/tools/python_interpreter.py: Removes the LangchainPythonInterpreterToolInput class and the langchain_call and to_langchain_tool methods.
  • src/interfaces/assistants_web/src/cohere-client/generated/schemas.gen.ts: Updates the default model for chat requests to 'command-r-plus' and adds a user_id field to the $ConversationFilePublic type.
  • src/interfaces/assistants_web/src/cohere-client/generated/types.gen.ts: Adds a user_id field to the ConversationFilePublic type.
  • src/interfaces/assistants_web/src/constants/tools.ts: Renames the TOOL_READ_DOCUMENT_ID constant to TOOL_READ_FILE_ID.
  • src/interfaces/coral_web/src/constants.ts: Renames the TOOL_READ_DOCUMENT_ID constant to TOOL_READ_FILE_ID.
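
For the TIMEOUT_SECONDS item above, a minimal sketch of how a timeout constant is typically used with asyncio.wait_for. Only the constant name and the value 60 come from the description; `call_tool_with_timeout`, `slow_tool`, and the fallback dictionary are hypothetical and are not the code in tool_calls.py.

```python
import asyncio

TIMEOUT_SECONDS = 60  # name and value as stated in the PR description


async def call_tool_with_timeout(tool_call, timeout=TIMEOUT_SECONDS):
    """Await a tool call; return a fallback result if it exceeds the timeout."""
    try:
        return await asyncio.wait_for(tool_call, timeout=timeout)
    except asyncio.TimeoutError:
        # wait_for cancels the awaited call before raising.
        return {"error": f"tool call timed out after {timeout} seconds"}


async def slow_tool():
    # Hypothetical stand-in for a slow tool invocation.
    await asyncio.sleep(0.2)
    return {"result": "done"}


async def main():
    finished = await call_tool_with_timeout(slow_tool(), timeout=1)
    timed_out = await call_tool_with_timeout(slow_tool(), timeout=0.05)
    return finished, timed_out


finished, timed_out = asyncio.run(main())
print(finished)
print(timed_out)
```

Putting the unit in the constant name makes it clear at the call site that wait_for receives seconds.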

tianjing-li changed the title from "backend: Add more support the Python interpreter reading file contentsFix interpreter" to "backend: Add better support for file content parsing with Python Interpreter" on Oct 8, 2024
codecov-commenter commented Oct 9, 2024

Codecov Report

Attention: Patch coverage is 42.10526% with 11 lines in your changes missing coverage. Please review.

Project coverage is 79.82%. Comparing base (5e94f95) to head (b00c48b).
Report is 9 commits behind head on main.

Files with missing lines                   Patch %   Lines
src/backend/services/file.py               25.00%    9 Missing ⚠️
src/backend/routers/agent.py               0.00%     1 Missing ⚠️
src/backend/tools/python_interpreter.py    0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #805      +/-   ##
==========================================
- Coverage   79.92%   79.82%   -0.10%     
==========================================
  Files         242      243       +1     
  Lines       10305    10346      +41     
==========================================
+ Hits         8236     8259      +23     
- Misses       2069     2087      +18     

EugeneLightsOn (Collaborator) left a comment

Looks good to me. Just a few comments not related to the changes in this PR:
The validate_file method has unused params index and ctx, and these params are not passed here and here.

EugeneLightsOn (Collaborator) left a comment

Looks good to me

tianjing-li added this pull request to the merge queue on Oct 22, 2024
Merged via the queue into main with commit 4145ca0 on Oct 22, 2024
6 checks passed
tianjing-li deleted the fix-interpreter branch on October 22, 2024 at 16:33