Caster: Program by Voice
Caster is an open source collection of tools aimed at enabling programming entirely by voice. It runs on top of Dragonfly, which in turn runs on either Dragon NaturallySpeaking (DNS) or Windows Speech Recognition (WSR). Caster is written in Python and is extensible, allowing support for additional programming languages and programs. There is a presentation demonstrating coding by voice using Dragonfly. Caster is more feature-rich and user-friendly than plain Dragonfly.
Some of you might be asking: how is this useful to the average developer who doesn't want or need to program entirely by voice? Put simply, it can boost your productivity by adding the power of your voice to your development workflow. The sky's the limit when it comes to your voice, whether through advanced macros or taking control of any program on your computer. Caster can be used alongside a keyboard and mouse, not just instead of them.
Check out the Caster Repository! Ultimately I'd love to see the Sponge and Forge APIs integrated! Keep in mind that it is in active development but very functional. Most of the framework is complete, but the features need to be fleshed out.
I'm not the author of Caster. I brought Caster to the Sponge forums because I love the community. I need feedback on Caster and want to see if anyone's interested in collaborating on development. My fork of Caster. Questions are welcome! Feel free to fork and make pull requests!
Features
- Easy setup
- Fuzzy string matching for symbols
- "Alias" Commands - on-the-fly commands created by highlighting stuff
- "Record From History" - turn previously spoken commands into macros
- CCR for multiple languages, easily configure new languages
- Sikuli integration
- Four additional mouse navigation modes
- Text navigation by templates and punctuation-seek
- The Context Stack - asynchronous and context seeking commands
- Spec reduction via NodeRule
- Configurable settings
- Ergonomic break alarm (optional)
- Automatic fix macro for Dragon double first letter, other Dragon enhancements
Demos of a few of Caster's features.
Supported languages
- HTML
- Python
- C++
- JavaScript
- Java
- SQL
Supported Editors and IDEs
- Sublime
- Atom*
- JetBrains IDEs
- Android Studio
- Eclipse
- SQL Developer
- Microsoft Visual Studio
- Emacs
- Notepad++
Miscellaneous Programs
- Firefox
- Chrome
- Command Prompt
- Git Bash
- WinWord
Documentation
Caster Wiki
Caster Issue Tracker
Dragonfly Documentation
[spoiler=-The differences between WSR and DNS-]
As far as Caster features are concerned, there's no difference between Dragon NaturallySpeaking (DNS) and Windows Speech Recognition (WSR) besides the initial setup and how Caster is started. The only exception is described here, temporary as it may be.
There is a significant difference between WSR and DNS in terms of speech recognition accuracy, speed and app contextual awareness. DNS 13 pretty much works out of the box with very little training. WSR requires lots of training, and even then don't expect the accuracy that you might see from DNS. Caster improves WSR's app contextual awareness and accuracy, but it cannot make up for the underlying differences between the two speech recognition engines.
[/spoiler]
Caster Install Videos for WSR
- Video tutorial to install Dragonfly
- Video tutorial to install Caster
*Note: NatLink is not required for WSR. To launch Caster, click "_caster.py".
-The quick and dirty install guide for Caster for WSR-
Install Python 2.7.9 32-bit. Make sure to select the "add to path" option during the Python install. Then copy the Dragonfly repository to the desktop. Open the folder and, from CMD running as Administrator, execute "python setup.py install". Then run pip.bat from the Caster repository. The Caster repository can be placed anywhere, so long as it's not moved after running pip.bat.
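Before running pip.bat it can help to confirm which interpreter is actually on the path. The sketch below is not part of Caster or Dragonfly; `check_interpreter` is a hypothetical helper that simply reports whether an interpreter matches the Python 2.7, 32-bit setup described above.

```python
import struct
import sys


def check_interpreter(version_info, pointer_bits):
    """Return a list of problems relative to the Python 2.7 32-bit setup.

    Hypothetical helper for illustration; not part of Caster or Dragonfly.
    """
    problems = []
    if tuple(version_info[:2]) != (2, 7):
        problems.append(
            "expected Python 2.7, found %d.%d" % (version_info[0], version_info[1])
        )
    if pointer_bits != 32:
        problems.append("expected a 32-bit interpreter, found %d-bit" % pointer_bits)
    return problems


if __name__ == "__main__":
    # struct.calcsize("P") is the size of a C pointer in bytes (4 on 32-bit builds).
    bits = struct.calcsize("P") * 8
    for problem in check_interpreter(sys.version_info, bits):
        print(problem)
```

Run it with the same "python" command you would use for setup.py; no output means the interpreter matches.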
-How to use Caster-
Speech recognition prioritizes speech commands over text dictation. With Caster you will dictate with commands. The documentation currently available is geared towards furthering Caster development. The first places to look are the Wiki and CasterQuickReference.pdf in the Caster root directory.
Outside of the references above, the best place to discover commands and how they're used is the project itself, by opening up the files. Basic command phrases are pretty easy to spot and are generally self-explanatory.
[spoiler=-Example of Basic command phrases-]
Example from Atom.py
Spoken Command/Action-----------------> Shortcut keys------------->
#File Menu
"[open] new window": R(Key("cs-n"), rdescript="Atom: New Window"),
Example from a CCR module configjava.txt
Spoken Command/Action-----------------> Shortcut keys------------->
"convert to integer": Text("Integer.getInteger()")+ Key("left"),
[/spoiler]
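A note on reading those mappings: in a Dragonfly spec, square brackets mark optional words, so "[open] new window" matches both "open new window" and "new window". In a Key spec, the letters before the hyphen are modifiers (c = Ctrl, s = Shift, a = Alt, w = Windows), so "cs-n" is Ctrl+Shift+N. The sketch below only illustrates those two reading rules in plain Python. `expand_optional` and `describe_key` are hypothetical helpers, not Dragonfly or Caster functions, and they only handle the simple, un-nested forms seen in the examples.

```python
import itertools
import re

# Modifier letters used in Dragonfly Key specs.
MODIFIERS = {"a": "Alt", "c": "Ctrl", "s": "Shift", "w": "Win"}


def expand_optional(spec):
    """List the phrases matched by a spec with un-nested [optional] groups."""
    # re.split with a capture group alternates: literal, optional, literal, ...
    parts = re.split(r"\[([^\[\]]*)\]", spec)
    optional_count = len(parts) // 2
    phrases = []
    for choice in itertools.product([True, False], repeat=optional_count):
        words = []
        for index, part in enumerate(parts):
            # Even indexes are always spoken; odd ones only if chosen.
            if index % 2 == 0 or choice[index // 2]:
                words.extend(part.split())
        phrases.append(" ".join(words))
    return phrases


def describe_key(spec):
    """Turn a simple Key spec such as 'cs-n' into 'Ctrl+Shift+N'."""
    modifiers, _, key = spec.rpartition("-")
    names = [MODIFIERS[letter] for letter in modifiers]
    # Single characters are shown upper-case; named keys are left as-is.
    names.append(key.upper() if len(key) == 1 else key)
    return "+".join(names)


if __name__ == "__main__":
    print(expand_optional("[open] new window"))
    print(describe_key("cs-n"))
```

Real specs can also contain alternatives in parentheses and extras in angle brackets, which this sketch deliberately ignores.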
Below I've laid out the key file paths relative to the Caster root directory, each with a short description.
[spoiler=Key File Paths]
caster-master\caster\bin\data\ccr
- Continuous command recognition (CCR) modules, such as those for HTML
or Java, can be toggled. For instance, confightml.txt can be enabled
by saying "enable html" or disabled by saying "disable html". In
addition, you can refresh CCR module files on the fly by saying
"refresh (module name)", like "refresh html".
caster-master\caster\apps
- Application-specific commands, which become active automatically when an
application is launched. The file names correspond to the
applications.
caster-master\caster\dev.py
- A development module for general dev commands. Must be enabled
through settings.json
caster-master\caster\bin\data\settings.json
- Settings file where you can turn on and off various features.
[/spoiler]
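As a rough illustration of the naming pattern in the CCR folder, the sketch below maps a spoken toggle phrase to the module file it would refer to. It assumes every CCR module follows the config(name).txt pattern seen in confightml.txt and configjava.txt; `ccr_module_for` is a hypothetical helper and not how Caster actually handles these commands.

```python
import ntpath  # Windows-style path joining, regardless of the host OS

# Folder taken from the key file paths, relative to the Caster root.
CCR_DIR = ntpath.join("caster", "bin", "data", "ccr")
ACTIONS = ("enable", "disable", "refresh")


def ccr_module_for(phrase):
    """Return (action, module path) for phrases like 'enable html', else None.

    Assumption: module files are named config<name>.txt.
    """
    words = phrase.lower().split()
    if len(words) != 2 or words[0] not in ACTIONS:
        return None
    action, name = words
    return action, ntpath.join(CCR_DIR, "config%s.txt" % name)


if __name__ == "__main__":
    print(ccr_module_for("enable html"))
    print(ccr_module_for("refresh java"))
```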
Troubleshooting
[spoiler=General Tips and Troubleshooting]
Background noise can degrade dictation accuracy. This includes breathing and wind. Ideally the microphone should not be picking up sound when you are not dictating. Adjust the OS mic volume and gain, along with the mic's position, to achieve this.
Try not to max out the mic volume or gain, as that can cause artifacts that degrade speech recognition accuracy. Once you've found the appropriate mic volume, keep the microphone boom or microphone placement in the same position.
The quality of the sound card and I/O shielding can influence dictation accuracy. A sign of this problem is the microphone registering audio when there is no dictation or background noise. Try moving the microphone plug to the back of the PC. If that fails to help, or you don't have an alternate mic input, you could buy a USB sound card.
No microphone, a terrible sound card, poor I/O shielding, can't afford a USB sound card, or stuck with a crappy PC mic? There is a cheap or free option if you own an Android smartphone: use an app to turn your phone into a wireless microphone. The app wirelessly transmits your phone's microphone audio to the PC, which bypasses your computer's sound system completely. I use the WO Mic app with a pair of earbuds with a built-in mic. In addition, I use SoundWire to pipe my PC audio through my phone.
[/spoiler]
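One way to check whether the gain is set too high is to record a few seconds of speech and see how many samples hit the maximum value (clipping). The sketch below assumes the recording is already loaded as a list of 16-bit signed PCM samples; `clipping_ratio` is a hypothetical helper, and the 1% threshold is an arbitrary starting point, not a value from Dragon or WSR.

```python
def clipping_ratio(samples, full_scale=32767):
    """Return the fraction of samples at or beyond full scale.

    Assumes 16-bit signed PCM values (range -32768..32767).
    """
    if not samples:
        return 0.0
    clipped = sum(1 for sample in samples if abs(sample) >= full_scale)
    return clipped / float(len(samples))


if __name__ == "__main__":
    recording = [0, 1200, -800, 32767, -32768, 500]  # made-up sample values
    ratio = clipping_ratio(recording)
    if ratio > 0.01:  # arbitrary threshold, chosen only for illustration
        print("Gain is probably too high: %.0f%% of samples clipped" % (ratio * 100))
```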
[spoiler=-WSR Specific Optimizations-]
When using WSR it is necessary to go through the training at least two times.
- Start WSR.
- Right click on the WSR microphone Icon or WSR Status bar.
- Click
It may be necessary to add words to WSR dictionary.
- Start WSR.
- Right click on the WSR microphone Icon or WSR Status bar.
- Open Speech Dictionary
Disabling WSR Dictation Pad
- Start WSR.
- Right click on the WSR microphone Icon or WSR Status bar.
- Options
- Uncheck "Enabled Dictation Pad"
WSR Correction/Alternative Menu
The WSR correction menu is only available in WSR-supported programs like WordPad or Dictation Pad. If you're having trouble dictating, try speaking in WordPad, which can be launched by saying "Open WordPad". This lets WSR gather feedback on your dictation patterns, which increases accuracy. WSR brings up an alternatives panel automatically, or you can force a correction by saying "correct" or "correct (dictate a specific word)".
WSR is more sensitive than DNS to how clearly words are spoken. See which gives you higher accuracy: speaking naturally, or talking like a robot (halting between your words, but without mimicking the tone).
[/spoiler]