
Voice commands in Windows video games
(Using speech recognition in games)
Context
This is our second article about playing video games with diminished, or simply mediocre, manual dexterity.
Grim Dawn, Path of Exile and Warframe are used here as examples so as to provide concrete details. But the general principles are applicable to all sorts of games, and they bypass the mapping issues encountered with some gaming controllers.
Other articles
The previous article was about using the Razer Tartarus, with Grim Dawn being our example game.
The Tartarus is an evolution of the old Nostromo – a mini-keyboard peripheral with handy programmable buttons.
My Nostromo is more or less sufficient for Grim Dawn. Some of my characters require more buttons, but mapping the thumb joystick (to resummon undead monsters, as it happens) was sufficient.
I’ve since moved on to an Azeron peripheral, which turned out to be the best solution for me.
Path of Exile
Path of Exile requires even more buttons than what I could use on the Nostromo.
Even if you do a build with very few skills (such as the Icestorm build I use as of this writing) :
- The potions are essentially an additional action bar, as they can do a lot of critical things.
- Some skills with a “charged mode” (Vaal skills) were reworked in 2018. These often provide “oh crap” emergency buttons. Which is good for me. But… more buttons.
- With the Delve League evolution of the game, you need two more buttons. And one — the flares — is an emergency one.
Here’s an example of the bottom of the UI for a relatively simple character, without Delve buttons :
Flasks on the left, active skills on the right, trouble everywhere.
So what to do, heh ? The Nostromo has plenty of buttons, but my damaged hands can’t use them at speed.
Voice commands
I therefore added a layer of commands using my voice. The solution I went for involves :
- Using the Windows 10 voice recognition technology. It’s built-in.
- Using the Voice Attack software, which has a shareware version. It is an additional layer over Windows 10 speech recognition.
- Using a microphone. I just got a relatively cheap Sennheiser PC3, which was typical VoIP gear back then. There are much cheaper solutions (about $10). But I can’t articulate super-clearly so I’m wary about low-end mikes.
Once the mike’s plugged in, the first step is to open Control Panel > Ease of Access > Speech recognition. Then you start to train the speech recognition engine. Depending upon your accent, ability to articulate, etc., establishing the basics can take a while.
Here we want to teach specific words to Windows 10. So once the basic training was done, I opened a WordPad document. I repeated the words I intended to use, to see how it transcribed them within WordPad.
Whenever a word was transcribed wrong I would highlight it, say “correct”, and pick what I was saying in the corrections list. So Windows could learn how I pronounced these specific words.
I am still routing the sound to my normal speakers, not to my small headset. But Windows only sees one microphone — the headset’s — plugged in, so it offers to use that one. Thus the quality of the headset’s sound is unimportant.
Which also means that folks with hearing conditions such as tinnitus can use this approach.
Voice Attack
Then I launch Voice Attack. The shareware version has only one profile. I deleted the default commands, then entered new ones.
These all correspond to a single word, and all trigger a single keypress. So for instance, when Voice Attack hears the word “two”, I’ve programmed it to press the 2 key for 0.1 seconds (the default duration).
More intricate programming is possible, but not necessary here.
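For the programming-minded, each of these commands can be pictured as a tiny record: one word, one key, one hold duration. What follows is a minimal sketch in Python, not Voice Attack's own format (the software is set up through its interface). The names `VoiceCommand` and `resolve` are made up for illustration, and the sketch only looks a command up without pressing anything.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceCommand:
    """Hypothetical record for one command: a single word, a single key."""
    phrase: str                # the spoken word
    key: str                   # the key to press
    hold_seconds: float = 0.1  # press duration mentioned in the article


def resolve(heard: str, commands: list[VoiceCommand]) -> Optional[VoiceCommand]:
    """Return the command matching the recognised word, or None."""
    wanted = heard.strip().lower()
    for command in commands:
        if command.phrase.lower() == wanted:
            return command
    return None  # anything that is not a known word is ignored


commands = [VoiceCommand("two", "2")]
print(resolve("Two", commands))
print(resolve("too", commands))
```

The second lookup returns nothing, which is the point: a misheard word simply does not fire.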
The command array – flasks
First, the flasks and the like.
I’ll simply be using numbers “one” to “seven”, to include Delve League buttons. I won’t really be using “one” and “five” since they are mapped to my mouse and Nostromo, but let’s be comprehensive.
So when I say “two” in the mike, Voice Attack inputs “2” in the currently active window – Path of Exile. And the character takes a gulp of their second potion, which is bound to the 2 key in the Path of Exile UI.
Numbers aren’t perfect. The English language is full of homophones and bad at vowels. So “two” is real close to “too”, “four” is real close to “for”, etc.
The speech recognition interface could thus think you are saying a word other than “two” or “four”.
But teaching Windows 10 to learn how you count aloud is fairly easy, barring an accent (say, rural Irish) or articulatin’ difficulties it doesn’t like. From context, it can recognise you’re counting. So that accelerates its learning process.
The command array – active skills
I am going to need three commands here. Ideally I’d need but two, but most characters make constant use of the “attack without moving” key, so that’s one finger taken.
The three active skill buttons on the right are, in my UI, bound to the keys D, R and T.
Examples of PoE Player Characters – arc elementalist, SRS baron necromancer, cleave/fortify juggernaut, icestorm/ES scion.
Here the main issue is to select words that start with that letter and can’t easily be confused with other words.
My first instinct is to go for the radio alphabet, but even that doesn’t always work. For instance, “tango” for “T” is easily misunderstood as “tangle” in English. Likewise, “delta” for “D” tends to be heard as “delete”. As far as the software is concerned, “delete” is a far more common word than “delta”, so it errs toward “delete”.
Therefore I opted for the words “die”, “Romeo” and “take”. And spent a stretch teaching Windows 10 how I say those words. These are phonetically less ambiguous and start with the right letter.
I spent a while trying different words, and looking at what Windows 10 speech recognition would input into WordPad. If a word kept coming up wrong (such as “dog” being constantly mistaken for “Dr.”, presumably because the software interprets “doc”), I’d move on to test another one.
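This trial-and-error can be made a bit more systematic by keeping a tally of what you said versus what got transcribed. Here is a small Python sketch of that idea. The function name, the 75% cut-off and the sample tally are all made up for illustration, and none of the numbers are measurements.

```python
from collections import defaultdict


def recognition_rates(trials):
    """trials: (intended_word, transcribed_word) pairs noted by hand."""
    hits = defaultdict(int)
    total = defaultdict(int)
    for intended, transcribed in trials:
        word = intended.lower()
        total[word] += 1
        if transcribed.strip().lower() == word:
            hits[word] += 1
    return {word: hits[word] / total[word] for word in total}


# Invented tally, for illustration only.
trials = [
    ("dog", "Dr."), ("dog", "Dr."), ("dog", "dog"), ("dog", "Dr."),
    ("die", "die"), ("die", "die"), ("die", "die"), ("die", "dye"),
]
rates = recognition_rates(trials)

# Arbitrary cut-off: keep words transcribed correctly at least 3 times in 4.
keep = [word for word, rate in rates.items() if rate >= 0.75]
print(rates)
print(keep)
```

With this invented tally, “dog” would be dropped and “die” kept. Where to set the cut-off is a judgment call that depends on how much a missed command costs you.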
I’ve also added a command that presses “x” when I say “switch”, another unambiguous word. One of my characters (Oni-Goroshi raider) benefits from switching weapons against certain enemies.
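Put together, the whole Path of Exile command set fits in one small table. The Python sketch below writes it out as a dictionary, assuming each number word presses the matching digit key. The names `POE_COMMANDS`, `SKILL_WORDS` and `keys_for` are hypothetical, and this is not how Voice Attack stores its profile.

```python
# Spoken word -> key, as listed in this article (Path of Exile bindings).
POE_COMMANDS = {
    "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7",  # flasks and Delve buttons
    "die": "d", "romeo": "r", "take": "t",  # the three active skills
    "switch": "x",                          # weapon swap
}

SKILL_WORDS = ("die", "romeo", "take")


def keys_for(heard_words):
    """Translate recognised words into keys, skipping unknown words."""
    lowered = (word.lower() for word in heard_words)
    return [POE_COMMANDS[word] for word in lowered if word in POE_COMMANDS]


# Sanity check: each skill word starts with the letter of its key.
assert all(POE_COMMANDS[word] == word[0] for word in SKILL_WORDS)

print(keys_for(["two", "Romeo", "tangle", "switch"]))
```

The stray “tangle” is dropped rather than guessed at, which mirrors what happens in play: a misheard word does nothing and you say it again.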
Experience, part 1
Speech recognition isn’t 100% reliable – especially with a not-an-anchorperson elocution.
But that just means you have to be aware of the possibility of a command not being executed, and say the word a second time. In my case, it’s still better than manually fumbling for the additional keys and/or badly repositioning when I return my fingers to the default keys.
Sometimes, correct execution is critical. A typical example is drinking from a potion that dispels paralysis immediately after opening a strongbox that paralyses.
In this case I’ll still use the keys on the Nostromo. The sequence of actions is entirely predictable so I can look at the keys rather than the screen for a second or two.
Being able to use a character’s full skill array significantly lowers difficulty. For instance, my flying-burning-angry-skulls summoning witch can now make *much* more systematic use of :
- Her fanning lightning balls attack, which weakens monsters in several important ways.
- Her offering skill, which is a big boost to her speed and her minions’ speed. Speed is life.
- Her desecration skill, to use the offering skill in unfavourable circumstances.
- Her charge-up burst of speed skill.
This in turn allows me to focus much more clearly on monitoring the battle and dodging attacks.
Experience, part 2
This specific witch doesn’t have a lot of fancy potions.
OTOH, not having to find the key for a potion means that I can instead look at indicators for damage-over-time, curses, etc. and react correctly. Before, I would react to what I thought was happening, because I couldn’t both look at the indicators and glance at my Nostromo to press the correct potion key to counter the effect.
And it feels surprisingly fun giving verbal orders whilst playing, even if these are just nonsense keywords. Very Picard.
On the other hand, there’s often a delay between the order and the execution. It feels like a second, though it’s prolly less. Mind, I’m not going to have split-second timing anyway, so it’s not much of a problem for me.
But one does get into the habit of saying the keyword twice, since one can’t know whether the word wasn’t understood – or it’s the normal lag.
Findings
In summary :
- I can recommend this for persons with low manual dexterity and similar issues, barring speech issues.
It’s not perfect, but it’s very cheap (or free if you already have a microphone) and there’s no recurring cost. Removing the stress of occasionally having to fumble with keys is also good.
- For others, the small lag and the occasional speech recognition failure create a slight problem, and the approach solves little since you can use keys and buttons. Unless you like the *idea* of using voice commands, it is more efficient to stick with a keypad like a Razer Tartarus or use an MMO mouse in most applications.
- If you’re going to make heavy use of it and can wear a heavier headset, there are plenty of “gamer” cans with good built-in mikes. Right as of this writing (October of 2018) a Corsair Void Pro would be my pick.
But this field has been evolving quickly, with lots of much-better-than-the-year-before headsets emerging as online gaming voice chat has become such a big market.
I eventually had to give up on the Voice Attack solution, as the damage from my condition reached the throat as well as the hands. Hence the Azeron.
Warframe control scheme
Warframe is a shooter where the player characters have special abilities. In terms of control schemes, that creates a slightly different problem.
So as an addendum to the article, here’s a low-fingers-mobility setup using voice recog. Invert if left-handed. 🙂
Left hand
- Movement – Arrow keys block to the right of the keyboard.
- Jumping, sprint lock – Left shift and left control keys. Sliding is slightly awkward, but isn’t used that much.
- The inventory and the map overlay are bound to T and M. These aren’t urgent functions, so taking a second to press them is okay.
WASD would likely be a better choice. But by very little, and the arrow keys are much easier to navigate by touch since they stand alone.
Right hand
- Mouselook.
- Shooting and aiming – Mouse buttons. Melee channelling is bound to the central mouse button as if it were an alternate fire.
- Attacking is bound to both the left mouse button and mouse scroll up. The second really helps with hand/wrist strain on melee weapons and semi-auto guns.
- Sliding – A lateral mouse button. Toggle is recommended over press-to-keep-it-up.
The sprinting is a bit awkward, but I don’t use it much since I play solo. And I currently inhabit a Volt warframe, which has its own speed boost.
Voice/headset
- “Take” – presses T which is the Use button. Quite intuitive since it’s usually done to take loot from lockers, pick up stars, or take over terminals.
- “Romeo” – presses R to reload. Intuitive if you’ve used the radio alphabet.
- “Die” – presses E for a quick melee attack. Intuitive too !
- “Switch” – presses the weapon switch key, I forget where I bound it.
- “One”, “Two”, “Three”, “Four” – presses the number keys to activate warframe powers. I’d never be able to use them otherwise.
- “Smash” – Alternate fire, also bound to a mouse button. Sometimes it’s easier via vocal.
- “Zero” – Inventory slot where my K-drive is.
- “Six”, “Seven” – More inventory slots for Simaris captures.
- “Map” – toggles the mini-map. Used to check enemy or loot radar. Or when the exfiltration point is unclear, usually because it overlaps with an objective. Being able to do this on the fly is good.
Oddly enough, I find the voice commands more reliable and markedly faster in Warframe. The Windows 10 speech recognition is adaptable, so it’s likely learning from hearing me say the same small cluster of words over and over again.
On the other hand, having a kubrow or a kavat may interfere with speech recognition. If you tend to talk to your pet.