Interprets speech-to-text commands to execute keyboard input.

DeeboyEdx/voice-to-type

VOICE-TO-TYPE

Use your smart home device to command or dictate to your Windows computer.

Demo Video

Voice-to-type demo video

Project Description

This project started from a simple desire to dictate to my computer and grew into a quest for the kind of voice command functionality I have on my Android phone. By combining Python with various free tools, I built a functional solution. It may not rival Microsoft's Windows Speech Recognition, but it can be controlled from smart devices, which I personally find more valuable. This project could also benefit people with tactile input impairments.

Prerequisites

All requirements are FREE [1]. The only possible exceptions are the Pushbullet and/or IFTTT accounts, if you've already used up your free allotments.

  • Windows PC
  • Pushbullet account
  • A smart device from which to send Pushbullet messages. (These are two options I've used)
    • an Amazon Echo smart home assistant [recommended FREE option]
    • a device with Google Assistant [Potentially Free option]
      • and an IFTTT account
        • with an applet that links your Google and Pushbullet services

Setup

  1. Install the Push2Run application.

  2. Set up one or both of these smart home device connections to your Pushbullet service.

    Once these steps are complete, you will already be able to do a lot of things, such as shut down, reboot, run a Google search, run a YouTube search, open a program of your choice, etc. See more with these example cards.

  3. Install Python. I recommend checking the "Add python.exe to PATH" check box.

  4. Download this project's files to a directory of your choosing. Take note of the path as you will need it later.

    • change_audio_volume.py
    • dee_logging.py
    • keypress_functions.py
    • type.py
    • Push2Run_type_cards.p2r (optional)

    The other files are unnecessary.

  5. (optional) "Install" [2] NirCmd to enable synthesized "voice" responses from your computer. [3] This is a small command-line utility that allows you to do some useful tasks such as voice synthesis.

  6. Set up Push2Run (p2r) cards. By this step, you should be ready to import (or create) cards that will facilitate the connection between Push2Run and these Python project files. To import, simply drag the included Push2Run_type_cards.p2r file (a JSON file) into your Push2Run client. Feel free to discard the file once imported.

    What cards will be imported

    • Pause/Play
      Presses space bar
    • Full Screen
      Presses f key
    • Full Screen and Play
      Presses f and space bar
    • Type *
      Bypass command interpretation to simply type out the supplied text
    • Computer! Do Things
      A catch-all card. Attempts to interpret any messages which didn't trigger a Push2Run card as a command.
    • No matching phrases
      Same catch-all functionality as above card
  7. Change the path in the cards' Parameter field to the directory you chose in step 4, where you've placed this project's files. This can be done either in the p2r file before importing, or after importing within Push2Run's GUI. In the provided cards, the path is set to C:\Scripts\python\type\.
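If you prefer to change the path in the p2r file before importing, the edit can be scripted. The following is only a sketch: the readme says the file is JSON, but its internal layout is not described here, so the code simply walks every string value and swaps the default directory for yours. The function names and the UTF-8 encoding are assumptions, and the result should be checked in Push2Run's GUI after importing.

```python
import json
from pathlib import Path

# Directory used in the provided cards, as stated in this readme.
DEFAULT_DIR = "C:\\Scripts\\python\\type\\"


def replace_path(node, old, new):
    """Return a copy of a parsed JSON value with `old` replaced by `new`
    inside every string, at any nesting depth."""
    if isinstance(node, str):
        return node.replace(old, new)
    if isinstance(node, list):
        return [replace_path(item, old, new) for item in node]
    if isinstance(node, dict):
        return {key: replace_path(value, old, new) for key, value in node.items()}
    return node  # numbers, booleans and null are left untouched


def rewrite_card_file(src, dst, new_dir, old_dir=DEFAULT_DIR):
    """Read a .p2r file as JSON, swap the directory, and write the result to dst.
    Assumption: the file is UTF-8 (with or without a byte order mark)."""
    data = json.loads(Path(src).read_text(encoding="utf-8-sig"))
    updated = replace_path(data, old_dir, new_dir)
    Path(dst).write_text(json.dumps(updated, indent=2), encoding="utf-8")
```

For example, `rewrite_card_file("Push2Run_type_cards.p2r", "my_cards.p2r", "D:\\tools\\voice-to-type\\")` would write a copy pointing at a hypothetical `D:\tools\voice-to-type\` folder and leave the original file unchanged.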

    How to build your own cards

    Note that all these cards are set to the "Hidden" window state which is important to prevent a terminal window from being shown.

    Type card

    We'll start with the dictation card. With this, you'll be able to tell your computer to type out long sentences.

    Type card example

    Command card

    Next is the command card. With this, you'll be able to tell your computer to perform a multitude of physical inputs, either colloquially (ex. "minimize") or literally (ex. "press alt space n"). See more.

    Command card example

    Volume card

    With these cards, you'll be able to tell your computer to change the volume. You can also tell it to mute, un-mute, toggle mute, or even to "shut up". I'm still working out the kinks for this one so I did not include a card for volume adjustments in the included p2r file.

    Note that there are two cards as I found it more successful to separate them like so.

    Volume card example

    An additional brief Push2Run primer

    By this point, you will have an invocation keyword set up to indicate to your digital assistant to forward commands through your Pushbullet service which will be captured by Push2Run. In this readme's example scenarios, we will use the "tell my computer to ~" keywords (the default for both proposed routes) which colloquially just makes sense.

    • $ represents your variable. For example, let's say you've set up your Type card as below with "type $" as one of the entries in the 'Listen for' field... [4]

      You say: "tell my computer to type it is a lovely day period mark"

      type.py will receive: "-v it is a lovely day period mark" (-v being the verbatim flag). It will then format the string nicely and simulate the key presses to type it out on your computer: "It is a lovely day."

    • Within the "Listen for" field, the * is a throw-away catch-all. Its only purpose is to match miscellaneous phrases, not to capture text. For example...

      1. You say: "tell my computer to lower the gosh darn volume to 20 percent"
      2. Push2Run will match and throw away "lower the gosh darn".
      3. It will match the "volume" keyword to the 'Change Volume' card.
      4. And it will pass along "to 20 percent" to the script.
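To make the difference between * and $ concrete, here is a small illustration in Python. It is not Push2Run's actual matching code, which is not shown in this readme. It is a loose regex approximation in which * discards whatever it matches and $ captures it, and it assumes at most one $ per phrase. The function names are made up for this example.

```python
import re


def listen_for_to_regex(pattern):
    """Translate a 'Listen for' style phrase into a regular expression.
    '*' becomes a throw-away wildcard, '$' becomes the captured variable.
    Assumption: the phrase contains at most one '$'."""
    parts = []
    for token in pattern.split():
        if token == "*":
            parts.append(r"(?:.*?)")            # matched, then discarded
        elif token == "$":
            parts.append(r"(?P<variable>.+)")   # matched and kept
        else:
            parts.append(re.escape(token))      # literal keyword
    return re.compile(r"\s*".join(parts), re.IGNORECASE)


def match_phrase(pattern, spoken):
    """Return the captured variable ('' if the phrase has no '$'),
    or None when the spoken text does not match the phrase at all."""
    found = listen_for_to_regex(pattern).fullmatch(spoken.strip())
    if found is None:
        return None
    return found.groupdict().get("variable", "")
```

With a hypothetical "* volume $" entry, `match_phrase("* volume $", "lower the gosh darn volume to 20 percent")` keeps only "to 20 percent", which mirrors the four steps above.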


    Caveats, acknowledgements, and known bugs to fix

    • An internet connection is required for your computer to receive commands.
    • You must be logged into your computer for most, if not all, actions to succeed.
    • A digital assistant's attention span is short, so commands must be swift and to the point.
      • As such, performing multiple or complex actions utilizing this project may prove difficult. Thankfully the Alexa method has a follow-up mode which alleviates this pressure.
    • Giving literal key-press commands can range from tricky to nearly impossible, as it depends entirely on what the digital assistant thinks it heard, and assistants tend to listen for natural spoken language. For example, it may hear "end" when you say "n". I try to work around this with an equivalency dictionary, but it isn't perfect.
    • Log file location may differ depending on whether the script is executed from the console [5] or by Push2Run.
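The equivalency dictionary idea mentioned above can be sketched as a plain lookup table. This is an illustration rather than the project's actual dictionary: only the "end" to "n" pair comes from this readme, and the other entries and the function name are hypothetical.

```python
# Words the assistant may report, mapped to the key that was probably meant.
HEARD_AS = {
    "end": "n",  # the example given in this readme
    "are": "r",  # hypothetical entry
    "see": "c",  # hypothetical entry
    "why": "y",  # hypothetical entry
}


def normalize_key_names(spoken):
    """Lower-case the spoken text, split it into words, and replace any
    word found in HEARD_AS with the key name it likely stands for."""
    return [HEARD_AS.get(word, word) for word in spoken.lower().split()]
```

A blanket substitution like this also shows why the approach isn't perfect: "end" is a legitimate key name in its own right, so a real dictionary needs some context to decide which one was meant.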

How to use directly

Here's how to utilize these project files directly, without relying on Push2Run triggers.

  • To type out a string to your computer with basic formatting use...

    python type.py -v <string>

  • To give your computer a command (for example these) use...

    python type.py <command>

    Note that these commands will execute immediately so if you wish to type on or control a particular application, you will need to either execute the command in a hidden window or use a delay timer.
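One way to get the delay mentioned above is a small wrapper that waits before calling type.py, giving you time to focus the target application. This is a sketch under a few assumptions: it is run from the folder that contains type.py, the whole command is passed as a single quoted argument (which footnote 5 says is allowed), and the helper names are invented for this example.

```python
import subprocess
import sys
import time


def build_command(text, verbatim=False, script="type.py"):
    """Build the argument list for one type.py call.
    `verbatim=True` adds the -v flag described in this readme."""
    args = [sys.executable, script]
    if verbatim:
        args.append("-v")
    args.append(text)  # passed as one argument, like a quoted string on the console
    return args


def run_after(delay_seconds, text, verbatim=False):
    """Sleep, then run type.py and return its exit code."""
    time.sleep(delay_seconds)
    return subprocess.run(build_command(text, verbatim), check=False).returncode
```

For example, `run_after(5, "hello world period mark", verbatim=True)` leaves five seconds to click into a text editor before the typing starts.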

List of viable commands

Please note the following

  • You can chain commands together with the delimiters "and" and "then". [6]
  • Google Assistant will handily detect when you mean to use punctuation in your speech. However, to explicitly tell the script to produce a punctuation mark, you must say "mark" or "sign" afterwards. I acknowledge it's a mouthful. For example: "open curly bracket mark x closed curly bracket sign" -> "{ x }"
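The "mark" / "sign" rule can be pictured with a short sketch. This is not the code in type.py. It is a minimal illustration that assumes a small, hypothetical table of punctuation names, and it leaves out other formatting such as capitalizing the first letter.

```python
# Hypothetical table: spoken name (without "mark"/"sign") -> symbol.
PUNCTUATION = {
    "period": ".",
    "comma": ",",
    "exclamation": "!",
    "open curly bracket": "{",
    "closed curly bracket": "}",
}
ATTACH_LEFT = {".", ",", "!"}   # symbols written without a space before them
TRIGGERS = {"mark", "sign"}
MAX_NAME_WORDS = max(len(name.split()) for name in PUNCTUATION)


def apply_punctuation(spoken):
    """Replace '<name> mark' or '<name> sign' with the matching symbol."""
    out = []
    for word in spoken.split():
        if word.lower() in TRIGGERS:
            # Look back for the longest punctuation name ending right here.
            for n in range(min(MAX_NAME_WORDS, len(out)), 0, -1):
                name = " ".join(out[-n:]).lower()
                if name in PUNCTUATION:
                    del out[-n:]
                    out.append(PUNCTUATION[name])
                    break
            else:
                out.append(word)  # no name before the trigger: keep the word
        else:
            out.append(word)
    text = ""
    for token in out:
        if text and token not in ATTACH_LEFT:
            text += " "
        text += token
    return text
```

Because a name is only converted when it is followed by "mark" or "sign", a sentence that merely contains the word "period" is typed out as written.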

Typing

  • type a phrase of your choice comma with punctuation exclamation mark
  • type i'll be there at 6 pm period mark send

Colloquial

  • maximize
  • minimize
  • restore
  • minimize all
  • minimize everything else
  • move
  • resize
  • resize left
  • resize right
  • resize bottom
  • resize top
  • dock left
  • dock right
  • close program
  • change program

Media

  • pause
  • play
  • full screen (compatible with toggling full screen on most players)

Literal

  • alt tab [five times]
  • alt space n
  • shift r
  • etc.
  • control alt delete <- is a protected key combination thus will NOT work
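A literal command such as "alt tab five times" has two parts: the keys and an optional repeat count. The sketch below shows one way to separate them. It is an illustration only, with a hypothetical function name and number-word table, and it stops short of sending any key presses.

```python
# Hypothetical table of spoken numbers; digits such as "5" are handled too.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def parse_literal(spoken):
    """Split a literal command into (key names, repeat count).
    A trailing '<number> time(s)' sets the count; otherwise it is 1."""
    words = spoken.lower().split()
    repeat = 1
    if len(words) >= 3 and words[-1] in ("time", "times"):
        count = words[-2]
        if count.isdigit():
            repeat = int(count)
            words = words[:-2]
        elif count in NUMBER_WORDS:
            repeat = NUMBER_WORDS[count]
            words = words[:-2]
    return words, repeat
```

The returned key names could then be handed to whatever layer actually simulates the key presses, repeated `repeat` times.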

In Browser

  • go to website dot com
  • refresh
  • go back
  • go forward
  • new tab
  • close tab
  • reopen tab
  • change tab

Text

  • select all
  • cut
  • copy
  • paste
  • undo
  • redo
  • home
  • end
  • page up
  • page down
  • save
  • save as
  • emojis (don't get excited, just pulls up the menu)
  • change input language

System

  • show notifications
  • show time
  • show calendar
  • start dictation (uses Windows Speech Recognition)
  • show settings
  • take screenshot
  • save screenshot
  • open system menu
  • open control panel

Misc

  • wait 10 seconds and ...
  • type I see you exclamation mark after 3 minutes
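Chained and delayed commands like the two above can be thought of as a list of (delay, command) steps. The following sketch shows that idea and is not the parser used by type.py. It assumes the delimiters "and" and "then" from this readme plus the two delay phrasings shown above, and the names are invented for the example.

```python
import re

UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}
CHAIN = re.compile(r"\s+(?:and|then)\s+", re.IGNORECASE)
WAIT = re.compile(r"wait (\d+) (second|minute|hour)s?", re.IGNORECASE)
AFTER = re.compile(r"(.+) after (\d+) (second|minute|hour)s?", re.IGNORECASE)


def split_chain(spoken):
    """Split a spoken command on the delimiters 'and' and 'then'."""
    return [part.strip() for part in CHAIN.split(spoken.strip()) if part.strip()]


def parse_step(step):
    """Return (delay in seconds, command). A pure 'wait ...' step has an
    empty command; a step without any delay phrase has a delay of 0."""
    found = WAIT.fullmatch(step)
    if found:
        return int(found.group(1)) * UNIT_SECONDS[found.group(2).lower()], ""
    found = AFTER.fullmatch(step)
    if found:
        return int(found.group(2)) * UNIT_SECONDS[found.group(3).lower()], found.group(1)
    return 0, step
```

Splitting on every "and" is a blunt rule. A dictated sentence that contains "and" would be cut in two, so a real implementation would have to treat typing commands differently.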

Footnotes

  1. Aside from the Windows PC and a device with smart home assistant, of course. These devices are ubiquitous but I recognize accessibility to these devices is not universal.

  2. Download and extract to a location in your PATH environment variable OR this project's root folder.

  3. Currently, audio responses are only used to confirm volume adjustments and to inform the user when a command was not understood.

  4. You can list multiple "Listen for" phrases. Be sparing here, as the more variability you add, the greater your chances of stepping on another card's toes and causing unexpected results, as you may experience with the Volume cards later.

  5. To execute from the console, run python type.py DESIRED COMMAND HERE. Use the -v argument to skip interpretation and simply dictate: python type.py -v DESIRED SENTENCE HERE. You may put quotation marks around your command ("DESIRED COMMAND") if you wish.

  6. Actually by default, Push2Run also uses "and" as a delimiter to separate commands. Given that setting, I acknowledge that the "Full Screen and Play" card is redundant when you have separate "Full Screen" and "Pause/Play (press Space bar)" cards.
