support for streaming; multiple languages; api key provided via input; deploying site to public
gregsadetsky · Dec 19, 2023 · 3ee8a6c
1 parent abef5f4
Showing 15 changed files with 749 additions and 189 deletions.
diff --git a/.github/workflows/vite-github-pages-deploy.yml b/.github/workflows/vite-github-pages-deploy.yml
@@ -0,0 +1,32 @@
+name: Vite Github Pages Deploy
+
+on:
+  # Runs on pushes targeting the default branch
+  push:
+    branches: ["master", "main"]
+  # Allows you to run this workflow manually from the Actions tab
+  workflow_dispatch:
+
+# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+concurrency:
+  group: "pages"
+  cancel-in-progress: false
+
+jobs:
+  # Build job
+  build:
+    runs-on: ubuntu-latest
+    environment:
+      name: demo
+      url: ${{ steps.deploy_to_pages.outputs.github_pages_url }}
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+      - name: Vite Github Pages Deployer
+        uses: skywarth/vite-github-pages-deployer@master
+        id: deploy_to_pages
diff --git a/README.md b/README.md
@@ -9,7 +9,7 @@
 - clone this repo, cd into it
 - duplicate `.env.example` and name the copy `.env`
 - fill out the `VITE_OPENAI_KEY=` value with your OpenAI api key. you must have access to the `gpt-4-vision-preview` model
-    - you can also try out the Gemini API if you have a key -- fill out `VITE_GEMINI_KEY` in the same `.env`
+  - you can also try out the Gemini API if you have a key -- fill out `VITE_GEMINI_KEY` in the same `.env`
 - then, run:
 - `npm install`
 - `npm run dev`
@@ -19,10 +19,10 @@ note: the in-browser speech recognition works best in Google Chrome
 
 ## TODO
 
-- [ ] allow input of API keys as `<input>` on the page
-- [ ] deploy frontend to site i.e. sagittarius.greg.technology via vite+github actions
-- [ ] enable streaming output..!
+- [x] allow input of API keys as `<input>` on the page
+- [x] deploy frontend to site i.e. sagittarius.greg.technology via vite+github actions
+- [x] enable streaming output..!
 - [ ] make new video with 1) uses of repo in the wild / forks 2) UI improvements 3) streaming output / comparison
-- [ ] enable selection of dictation language
+- [x] enable selection of dictation language
 - [ ] add allcontributors bot
 - [ ] add dependabot
diff --git a/docs/CNAME b/docs/CNAME
diff --git a/docs/README.md b/docs/README.md
diff --git a/docs/_config.yml b/docs/_config.yml
diff --git a/index.html b/index.html
@@ -16,25 +16,30 @@
     <div class="switch-container">
       <span class="label label-left"><img src="assets/OpenAI-GPT-4.png" id="gptLogo" /></span>
       <div class="toggle-switch"><div class="toggle-slider" data-position="left"></div></div>
-      <span class="label label-right"><img src="assets/Google-Gemini-AI-Logo.png" id="geminiLogo" /><img src="assets/USA-flag.png" id="usaFlag">U.S. only</span>
+      <span class="label label-right"><img src="assets/Google-Gemini-AI-Logo.png" id="geminiLogo" /></span>
+    </div>
+
+    <div class="api-key-container">
+      API Key: <input type="password" id='apiKey' value=''>
     </div>
 
     <video autoplay playsinline webkit-playsinline muted hidden></video>
 
     <canvas id="canvas" width="640" height="480"></canvas>
 
-    <div id="instruction"><button id="startButton">Start</button>Start speaking and ask the AI what it recognizes, including hand gestures.</div>
+    <div id="instruction"><button id="startButton">Start</button>Start speaking and ask the AI what it recognizes, including hand gestures.<br/>
+    Dictation & speech language: <select id='languageSelect'></select>
+    </div>
 
     <div id="promptOutput"></div>
 
     <div id="debugImages" style="display:none;"></div>
 
     <div id="footer">
-      Only works on <img src="assets/Google_Chrome_icon.png" id="chromeLogo" /> Chrome Browser on Desktop<br />
-      Forked from: <a href="https://github.com/gregsadetsky/sagittarius" target="blank">github.com/gregsadetsky/sagittarius</a><br />
+      Best experienced using <img src="assets/Google_Chrome_icon.png" id="chromeLogo" /> Google Chrome on desktop<br />
+      Repo: <a href="https://github.com/gregsadetsky/sagittarius" target="blank">github.com/gregsadetsky/sagittarius</a><br />
       OpenAI Model: <a href="https://platform.openai.com/docs/guides/vision" target="_blank">gpt-4-vision-preview</a><br />
       Google Gemini Model: <a href="https://ai.google.dev/models/gemini" target="_blank">gemini-pro-vision</a><br />
-      <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API" target="_blank">Web Speech API</a> for Speech Recognition and Speech Synthesis<br />
     </div>
   </div>
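
Note on the new `<select id='languageSelect'>`: the hunk above adds the element empty, so its options have to be filled in by script, and that script is not shown in this part of the diff. The sketch below is one possible way to derive the option list, not the commit's actual code. It assumes the languages come from objects shaped like the Web Speech API's `SpeechSynthesisVoice` (only the `lang` field is used); `VoiceLike`, `uniqueLanguages` and `pickDefaultLanguage` are hypothetical names introduced for illustration.

```typescript
// Hypothetical helpers, not part of this commit.
// Minimal shape of a voice: only the BCP 47 language tag is needed here.
interface VoiceLike {
  lang: string; // e.g. "en-US"
}

// Collect the distinct, non-empty language tags and sort them alphabetically.
function uniqueLanguages(voices: VoiceLike[]): string[] {
  const tags = new Set<string>();
  for (const voice of voices) {
    if (voice.lang) {
      tags.add(voice.lang);
    }
  }
  return Array.from(tags).sort();
}

// Choose which tag to preselect: an exact match for the preferred tag,
// else the first tag sharing its primary subtag ("fr" in "fr-CA"),
// else the first available tag. Returns undefined for an empty list.
function pickDefaultLanguage(tags: string[], preferred: string): string | undefined {
  if (tags.indexOf(preferred) !== -1) {
    return preferred;
  }
  const primary = preferred.split("-")[0].toLowerCase();
  for (const tag of tags) {
    if (tag.split("-")[0].toLowerCase() === primary) {
      return tag;
    }
  }
  return tags[0];
}
```

In a browser, the input could plausibly be `speechSynthesis.getVoices()` with `navigator.language` as the preferred tag, and each returned tag would become one `<option>` appended to `#languageSelect`. Voices often load asynchronously, so the list may need to be rebuilt when the `voiceschanged` event fires.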