A Python tool that takes a screenshot of your screen, sends it to ChatGPT's vision API, and returns the coordinates of objects you're looking for.
- Install dependencies:
pip install -r requirements.txt- Set your OpenAI API key:
export OPENAI_API_KEY="your-api-key-here"Or create a .env file (not included in git) with:
OPENAI_API_KEY=your-api-key-here
from screen_finder import find_object_coordinates
# Find coordinates of a button
coords = find_object_coordinates("submit button")
if coords:
x, y = coords
print(f"Found at: ({x}, {y})")python screen_finder.py "submit button"
python screen_finder.py "login button"
python screen_finder.py "close icon"- Takes a screenshot of your entire screen using
pyautogui - Encodes the screenshot as base64
- Sends it to ChatGPT's vision API (gpt-4o) with a prompt asking for coordinates
- Parses the response to extract the (x, y) coordinates
- Returns the coordinates as a tuple
- The function returns the center coordinates of the object
- If the object is not found, it returns
None - Make sure you have a valid OpenAI API key with access to vision models
- The default model is
gpt-4owhich supports vision