support for streaming; multiple languages; api key provided via input; deploying site to public
gregsadetsky committed Dec 19, 2023
1 parent abef5f4 commit 3ee8a6c
Showing 15 changed files with 749 additions and 189 deletions.
32 changes: 32 additions & 0 deletions .github/workflows/vite-github-pages-deploy.yml
@@ -0,0 +1,32 @@
name: Vite Github Pages Deploy

on:
  # Runs on pushes targeting the default branch
  push:
    branches: ["master", "main"]
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  # Build job
  build:
    runs-on: ubuntu-latest
    environment:
      name: demo
      url: ${{ steps.deploy_to_pages.outputs.github_pages_url }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Vite Github Pages Deployer
        uses: skywarth/vite-github-pages-deployer@master
        id: deploy_to_pages
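
The workflow above does not show how the site's public base path is chosen, and the deployer action may well handle it on its own. As a rough sketch of the usual rule for Vite's `base` option on GitHub Pages (project sites are served from `/<repo>/`, while user sites and custom domains are served from `/`), a helper could look like the following. The function name `pagesBasePath` and its parameters are hypothetical and are not part of this commit.

```javascript
// Hypothetical helper (not from this commit): derive the Vite `base` path
// for a GitHub Pages deployment.
// `repository` is expected in "owner/name" form, which is the format of the
// GITHUB_REPOSITORY environment variable inside GitHub Actions.
function pagesBasePath(repository, hasCustomDomain) {
  // A custom domain (or an unknown repository) is served from the root.
  if (hasCustomDomain || !repository) return "/";

  const [owner, name] = repository.split("/");
  if (!owner || !name) return "/";

  // A user/organization site repository ("<owner>.github.io") is also
  // served from the root.
  if (name.toLowerCase() === `${owner.toLowerCase()}.github.io`) return "/";

  // A project site is served from "/<repo>/".
  return `/${name}/`;
}

console.log(pagesBasePath("gregsadetsky/sagittarius", false));
```

In a `vite.config.js`, the returned value would be passed as the `base` option. Since the README mentions a custom domain for the deployed site, `/` is likely the relevant case here, though that is an assumption.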
10 changes: 5 additions & 5 deletions README.md
@@ -9,7 +9,7 @@
- clone this repo, cd into it
- duplicate `.env.example` and name the copy `.env`
- fill out the `VITE_OPENAI_KEY=` value with your OpenAI api key. you must have access to the `gpt-4-vision-preview` model
-- you can also try out the Gemini API if you have a key -- fill out `VITE_GEMINI_KEY` in the same `.env`
+- you can also try out the Gemini API if you have a key -- fill out `VITE_GEMINI_KEY` in the same `.env`
- then, run:
- `npm install`
- `npm run dev`
@@ -19,10 +19,10 @@ note: the in-browser speech recognition works best in Google Chrome

## TODO

-- [ ] allow input of API keys as `<input>` on the page
-- [ ] deploy frontend to site i.e. sagittarius.greg.technology via vite+github actions
-- [ ] enable streaming output..!
+- [x] allow input of API keys as `<input>` on the page
+- [x] deploy frontend to site i.e. sagittarius.greg.technology via vite+github actions
+- [x] enable streaming output..!
- [ ] make new video with 1) uses of repo in the wild / forks 2) UI improvements 3) streaming output / comparison
-- [ ] enable selection of dictation language
+- [x] enable selection of dictation language
- [ ] add allcontributors bot
- [ ] add dependabot
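
The "enable streaming output" item is checked off here, but the streaming code itself is not visible in this part of the diff. As a minimal sketch of one way to consume a streamed response, assuming the OpenAI chat completions streaming format (server-sent `data: {...}` lines whose text lives in `choices[0].delta.content`, terminated by `data: [DONE]`), an incremental parser could look like this. `createStreamParser` and `onText` are hypothetical names, and the Gemini API streams in a different format that this sketch does not cover.

```javascript
// Hypothetical sketch (not the commit's code): incrementally parse an
// OpenAI-style server-sent-event stream and emit text deltas as they arrive.
function createStreamParser(onText) {
  let buffer = ""; // holds a trailing partial line between chunks
  let done = false;

  return {
    // Feed one decoded text chunk from the network into the parser.
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop(); // the last piece may be an incomplete line

      for (const rawLine of lines) {
        const line = rawLine.trim();
        if (done || !line.startsWith("data:")) continue;

        const payload = line.slice("data:".length).trim();
        if (payload === "[DONE]") {
          done = true;
          continue;
        }

        const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
        if (delta) onText(delta);
      }
    },
    // True once the terminating "[DONE]" event has been seen.
    get finished() {
      return done;
    },
  };
}

// Usage: chunks may split an event anywhere, even in the middle of the JSON.
let spoken = "";
const parser = createStreamParser((text) => { spoken += text; });
parser.push('data: {"choices":[{"delta":{"content":"Hel"}}]}\n\ndata: {"cho');
parser.push('ices":[{"delta":{"content":"lo"}}]}\n\ndata: [DONE]\n\n');
console.log(spoken, parser.finished);
```

In a browser, the chunks would typically come from reading `response.body` with a `TextDecoder`, and `onText` would append to the output element or queue the text for speech synthesis.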
1 change: 0 additions & 1 deletion docs/CNAME

This file was deleted.

10 changes: 0 additions & 10 deletions docs/README.md

This file was deleted.

1 change: 0 additions & 1 deletion docs/_config.yml

This file was deleted.

15 changes: 10 additions & 5 deletions index.html
@@ -16,25 +16,30 @@
<div class="switch-container">
<span class="label label-left"><img src="assets/OpenAI-GPT-4.png" id="gptLogo" /></span>
<div class="toggle-switch"><div class="toggle-slider" data-position="left"></div></div>
<span class="label label-right"><img src="assets/Google-Gemini-AI-Logo.png" id="geminiLogo" /><img src="assets/USA-flag.png" id="usaFlag">U.S. only</span>
<span class="label label-right"><img src="assets/Google-Gemini-AI-Logo.png" id="geminiLogo" /></span>
</div>

<div class="api-key-container">
API Key: <input type="password" id='apiKey' value=''>
</div>

<video autoplay playsinline webkit-playsinline muted hidden></video>

<canvas id="canvas" width="640" height="480"></canvas>

<div id="instruction"><button id="startButton">Start</button>Start speaking and ask the AI what it recognizes, including hand gestures.</div>
<div id="instruction"><button id="startButton">Start</button>Start speaking and ask the AI what it recognizes, including hand gestures.<br/>
Dictation & speech language: <select id='languageSelect'></select>
</div>

<div id="promptOutput"></div>

<div id="debugImages" style="display:none;"></div>

<div id="footer">
Only works on <img src="assets/Google_Chrome_icon.png" id="chromeLogo" /> Chrome Browser on Desktop<br />
Forked from: <a href="https://github.com/gregsadetsky/sagittarius" target="blank">github.com/gregsadetsky/sagittarius</a><br />
Best experienced using <img src="assets/Google_Chrome_icon.png" id="chromeLogo" /> Google Chrome on desktop<br />
Repo: <a href="https://github.com/gregsadetsky/sagittarius" target="blank">github.com/gregsadetsky/sagittarius</a><br />
OpenAI Model: <a href="https://platform.openai.com/docs/guides/vision" target="_blank">gpt-4-vision-preview</a><br />
Google Gemini Model: <a href="https://ai.google.dev/models/gemini" target="_blank">gemini-pro-vision</a><br />
<a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API" target="_blank">Web Speech API</a> for Speech Recognition and Speech Synthesis<br />
</div>
</div>
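
The new `<select id='languageSelect'>` is empty in the markup, so it is presumably filled in by script. The script that does so is not shown in this hunk. One possible approach, sketched here under the assumption that the options come from the voices reported by the Web Speech API (the commit may instead use a fixed list), is to collect the unique language tags and rebuild the options whenever the voice list changes. `uniqueLanguages` and `populateLanguageSelect` are hypothetical names.

```javascript
// Hypothetical sketch (not the commit's code).
// Collect the unique, sorted BCP 47 language tags from voice-like objects
// (anything with a `lang` string, such as SpeechSynthesisVoice).
function uniqueLanguages(voices) {
  return [...new Set(voices.map((voice) => voice.lang).filter(Boolean))].sort();
}

// Rebuild the <select> options, preselecting `preferred` when it is present.
function populateLanguageSelect(select, voices, preferred) {
  select.replaceChildren(); // drop any existing options
  for (const lang of uniqueLanguages(voices)) {
    const option = document.createElement("option");
    option.value = lang;
    option.textContent = lang;
    option.selected = lang === preferred;
    select.appendChild(option);
  }
}

// Browser-only wiring; skipped when run outside a page (e.g. in Node).
if (typeof document !== "undefined" && typeof speechSynthesis !== "undefined") {
  const select = document.getElementById("languageSelect");
  const refresh = () =>
    populateLanguageSelect(select, speechSynthesis.getVoices(), navigator.language);
  refresh();
  // Chrome loads voices asynchronously, so refresh again once they arrive.
  speechSynthesis.addEventListener("voiceschanged", refresh);
}
```

The selected value could then be assigned to the `lang` property of both the speech recognition instance and each `SpeechSynthesisUtterance`, so that dictation and spoken output use the same language.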
