Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Feature: Vision Support for Khoj #889

Merged
merged 37 commits into from
Sep 9, 2024

Conversation

MythicalCow
Copy link
Contributor

✨ Summary of Changes

  • New UI to show preview of image uploads
  • ChatML message changes to support gpt-4o vision based responses on images
  • AWS S3 image uploads for persistent image context in conversations
  • Database changes to have vision_enabled option in server admin panel while configuring models

👁️ Demo Images

Screenshot 2024-08-14 164719
image

🛠️ Feedback

  • User experience changes
  • Any bug fixes

@MythicalCow MythicalCow added upgrade New feature or request coverage Add content type to search and index plugin Improvements or additions to content or UI integrations try Experiment with feature or capability labels Aug 14, 2024
@MythicalCow MythicalCow self-assigned this Aug 14, 2024
Copy link
Member

@sabaimran sabaimran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. Still need to test it out locally but this looks great.

src/interface/web/app/chat/page.tsx Outdated Show resolved Hide resolved
src/interface/web/app/page.tsx Outdated Show resolved Hide resolved
src/interface/web/app/share/chat/page.tsx Outdated Show resolved Hide resolved
src/khoj/processor/conversation/utils.py Outdated Show resolved Hide resolved
src/khoj/processor/conversation/utils.py Outdated Show resolved Hide resolved
src/khoj/routers/api_chat.py Outdated Show resolved Hide resolved
src/khoj/routers/api_chat.py Outdated Show resolved Hide resolved
src/khoj/routers/storage.py Outdated Show resolved Hide resolved
src/khoj/routers/storage.py Outdated Show resolved Hide resolved
Copy link
Member

@debanjum debanjum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. But excited to get vision support in Khoj soon!

src/interface/web/package.json Outdated Show resolved Hide resolved
src/interface/web/yarn.lock Outdated Show resolved Hide resolved
src/interface/web/app/share/chat/layout.tsx Outdated Show resolved Hide resolved
src/interface/web/app/share/chat/layout.tsx Show resolved Hide resolved
src/khoj/routers/api_chat.py Outdated Show resolved Hide resolved
@thinker007
Copy link

please upgrade and merge to master asap

Copy link
Member

@debanjum debanjum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested out the new vision support. Feels neat to be able to talk about visuals with Khoj!

Some general feedback on UX and response generation:

  1. When requesting a vision enabled response from web app's home screen, the response generation UX doesn't make it obvious that Khoj understood and is responding to that image
  2. The image attached to the chat message doesn't seem to be passed to Khoj's train of thought? This would reduce response quality and make the train of thought UX confusing (as Khoj doesn't acknowledge its using the image to generate it's response)

src/interface/web/package.json Outdated Show resolved Hide resolved
Copy link
Member

@debanjum debanjum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look great, seems ready to merge?

This should speed up image loading and reduce storage costs
The image itself isn't strictly required to infer output mode and data
sources to reference to generate chat response

So only share placeholder text for attached image rather than the
actual image itself with those chat actors. This should reduce
response latency and consume less tokens.

- Minor
  - Reorder passing uploaded_image_url arg closer to query for better
    code readability
The /api/chat API endpoint has been updated to a POST endpoint from a
GET endpoint to support passing attached images from client
- Align rendering uploaded images to previous HTML DOM structure of
  chat messages
- Support adding image attached to chat message by user or khoj to clipboard
Copy link
Member

@sabaimran sabaimran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find, fixing the clients from GET -> POST.

@sabaimran sabaimran merged commit 549686a into khoj-ai:master Sep 9, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
coverage Add content type to search and index plugin Improvements or additions to content or UI integrations try Experiment with feature or capability upgrade New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants