[MacOS] add ability to send images to the model and not just text - The model can now summarize and describe images, useful to work with screen shots #104
Comments
Hi, thanks for using the app. This is a very interesting idea! I'll definitely look into it and add it in a future update. Thanks!
I have already implemented it in the version running on my machine. If you want, I can send it to you to check. I also modified the copy function so that it copies the entire conversation, not just the latest message; for my personal use case that works better.
That's great news! If you want, you can create a fork of the main project and push your changes there. I can then check them out.
Done. I have tested it thoroughly with Gemini Flash 2.0 and it works very well. The only problem is with the Outlook app: when forwarding or replying to an email, the text is not detected for some reason. It works fine on the emails themselves. I also tried all the other apps that I use and they work fine, so there is something about emails in the Outlook app that is interfering. Still investigating.
Fixed the problem with the Outlook app. Images are now detected in all apps, and text is no longer recognized as an image, which is what caused the error in Outlook.
@Joaov41 how do we run it?
Thank you! I'll review it today and include your contribution in the next version.
I pushed another update. In the previous version, images would only work with "Describe your changes", which meant users had to type what they wanted the LLM to do with the image. I have now incorporated the image capability into the regular options menu (summarize, key points, etc.), so the code treats the image as if it were regular text. Of course, this works better with images that contain text. But I have tested with images without text and, at least with Gemini, it works very well: if there is no text, the LLM just describes the image according to the requested option. It even creates a nice table with the description of the image. Very impressive.
Thanks! I've integrated your excellent code into the upcoming release, likely by the end of the week. I've also credited you as a macOS version contributor in the next update. The next version so far includes:
I've also been working on adding German, Spanish, French, and Russian languages to the app, but that might be available in version 3 and not 2.
I have been working on another update. A URL can now be copied to the clipboard through the share extension. When the shortcut is invoked, the code checks whether the clipboard contains a URL; if so, it extracts the page content, which can then be used as normal with summary, key points, or any other built-in or custom option.
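For anyone following along, the "is there a URL on the clipboard?" check described above could look roughly like the sketch below. It is not the code from the fork: the function name `webURL(fromClipboardText:)` is hypothetical, and the rule it applies (a single http or https link with a host and no whitespace) is an assumption about what should count as a URL.

```swift
import Foundation

/// Hypothetical helper: returns a URL only when the clipboard text is a
/// single http(s) link, so ordinary selected text is not mistaken for one.
func webURL(fromClipboardText text: String) -> URL? {
    let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)

    // Anything with inner whitespace is treated as regular text (assumption).
    guard trimmed.rangeOfCharacter(from: .whitespacesAndNewlines) == nil,
          let components = URLComponents(string: trimmed),
          let scheme = components.scheme?.lowercased(),
          scheme == "http" || scheme == "https",
          let host = components.host, !host.isEmpty
    else { return nil }

    return components.url
}
```

If this returns a URL, the app could fetch the page and hand the extracted text to the usual options; if it returns nil, the clipboard text would be used as-is.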
Great work! This could be very helpful and significantly increase the app's usefulness. I haven't had time to review your code yet, but I'll add the feature to the TODO list for version 4. Since version 3 is nearly complete, it would be more efficient to include this new capability in the next version.
May I ask what is to be expected in v4?
You mean it will extract the content of the URL, acting as a scraper?
Sure! V3 (Arriving This Week):
V4 (TBA):
Concepts for the future (Some might be added to V4 or later versions):
Exactly.
You mentioned PDF support. I had already been working on PDF support when I started on the image support and URL scraping, but it was not working yet. I think I have it now, although I followed a different approach. It does not go through Finder: the user can simply use the context menu on the PDF file (right click, Copy), and the app will extract the text from the PDF and send it to the LLM for use with the usual options. So this version includes image support, URL scraping, and PDF support, all through the clipboard. I have updated my fork in case you want to check it out.
Another update. I use a lot of video screenshots in my daily workflow, and since the Gemini models support video, why not add video functionality to the app? It will be very useful for my use case, so I have updated the code. Gemini only, of course.
@Joaov41 , that's some really cool work :) Here's a small suggestion, let me know what you think! This would solve 2 things:
Again, awesome work! :]
Thank you for your kind words and for your awesome suggestions; they are great. The code already clears the clipboard after every request, so that very valid concern would only apply when the app starts and the user already has something on the clipboard.
Hello everyone, I regret to announce that v3 will not be released this week. I'm currently occupied with my exams. I've decided to postpone the update until after my exams are over and then merge v3 and v4 into a single, comprehensive update. Sorry, and thank you for your understanding.
All the best with your exams, Arya! No worries at all.
Image detection and storage logic is inserted into the showPopup() method. This block is responsible for checking the clipboard for image data and storing it.
let supportedTypes: [NSPasteboard.PasteboardType] = [.tiff, .png, .pdf, .fileURL]
This array lists the clipboard types the code considers as image data.
for type in supportedTypes {
    if let data = pasteboard.data(forType: type) {
        print("[DEBUG] Found image data of size \(data.count) bytes for type: \(type.rawValue)")
        foundImages.append(data)
    }
}
Iterates over each supported image type.
If data for that type is found on the clipboard, it prints a debug message and adds the data to the foundImages array.
self.appState.selectedImages = foundImages
After detecting any images, this line saves the collected image data into the selectedImages property of AppState.
The code block appears after the simulation of Cmd+C and after retrieving any text from the clipboard.
It comes right before showing the popup window, ensuring that any detected images are stored in AppState for later use.
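A possible variant of the loop above, offered as a sketch rather than as the code in the fork: an app may publish the same image under several types at once (for example both PNG and TIFF), in which case appending every match would store one image more than once. The version below stops at the first type that yields data. The names `firstAvailableData` and `clipboardImageData` are hypothetical, and restricting the list to `.png` and `.tiff` is an assumption made for the example; the original list also includes `.pdf` and `.fileURL`.

```swift
import Foundation
#if canImport(AppKit)
import AppKit
#endif

/// Hypothetical helper: returns the data for the first type in
/// `preferredTypes` that `lookup` can supply, skipping empty payloads.
func firstAvailableData(preferredTypes: [String],
                        lookup: (String) -> Data?) -> (type: String, data: Data)? {
    for type in preferredTypes {
        if let data = lookup(type), !data.isEmpty {
            return (type, data)
        }
    }
    return nil
}

#if canImport(AppKit)
/// Hypothetical wrapper: reads at most one image representation from the
/// pasteboard, preferring PNG over TIFF (the order is an assumption).
func clipboardImageData(from pasteboard: NSPasteboard = .general) -> [Data] {
    let imageTypes: [NSPasteboard.PasteboardType] = [.png, .tiff]
    let match = firstAvailableData(preferredTypes: imageTypes.map(\.rawValue)) { raw in
        pasteboard.data(forType: NSPasteboard.PasteboardType(raw))
    }
    return match.map { [$0.data] } ?? []
}
#endif
```

Inside showPopup() this could be used as `self.appState.selectedImages = clipboardImageData()`. Keeping the selection logic in a plain function that takes a lookup closure also makes it easy to exercise without a real pasteboard.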
Full:
// MARK: - SHOW POPUP (Image handling added)