

Voice commands in Windows video games

(Using speech recognition in games)



This is our second article about playing video games with diminished, or simply mediocre, manual dexterity.

I’m writing this as a person with nerve damage. But it can be useful both for persons with more severe issues (such as missing fingers) and for people who are simply butterfingered.

Grim Dawn and Path of Exile are used here as examples so as to provide concrete details. But the general principles are applicable to all sorts of games, and they bypass the mapping issues encountered with some gaming controllers.


Previous article

The previous article was about using the Razer Tartarus, with Grim Dawn as our example game. The Tartarus is an evolution of the old Nostromo, a mini-keyboard peripheral with handy programmable buttons.

My Nostromo is more or less sufficient for Grim Dawn. Some of my characters require more buttons, but mapping the thumb joystick (to resummon undead monsters, as it happens) was sufficient.

Path of Exile

But Path of Exile has even more buttons. Even if you do a build with very few skills (such as the Icestorm build I use) :

  1. The potions are essentially an additional action bar, as they can do a lot of critical things.
  2. Some skills with a “charged mode” (Vaal skills) were reworked in 2018. These often provide “oh crap” emergency buttons. Which is good for me. But… more buttons.
  3. With the Delve League evolution of the game, you need two more buttons. And one — the flares — is an emergency thing.

Here’s an example of the bottom of the UI for a relatively simple character, without Delve buttons :

[Image: details of the Path of Exile UI controls]

Flasks on the left, active skills on the right, trouble everywhere.

So what to do, heh ? The Nostromo has plenty of buttons, but my hands can’t use them at speed.

Voice commands

So I add another layer of commands using my voice. The solution I went for involves :

  1. Using the Windows 10 voice recognition technology. It’s built-in.
  2. Using the Voice Attack software, which has a shareware version. It is an additional layer over Windows 10 speech recognition.
  3. Using a microphone. I just got a relatively cheap Sennheiser PC3, which is typical VoIP gear. There are much cheaper solutions (about $10). But I can’t articulate super-clearly, so I’m wary of low-end mikes.

Once the mike’s plugged in, the first step is to open Control Panel > Ease of Access > Speech recognition. Then you start to train the speech recognition engine. Depending upon your accent, ability to articulate, etc., establishing the basics can take a while.

Here we want to teach specific words to Windows 10. So once the basic training was done, I opened a WordPad document. I repeated the words I intended to use, to see how it transcribed them within WordPad.

Whenever a word was transcribed wrong I would highlight it, say “correct”, and pick what I was saying in the corrections list. So Windows could learn how I pronounced these specific words.

I am still routing the sound to my normal speakers, not to my small headset. But Windows only sees one microphone — the headset’s — plugged in, so it offers to use that one. Thus the quality of the headset’s sound is unimportant. And folks with hearing conditions such as tinnitus can thus use this approach.

Voice Attack

Then I launch Voice Attack. The shareware version has only one profile. I delete the default commands, then enter new ones.

These all correspond to a single word, and each triggers a single keypress. So for instance, when Voice Attack hears the word “two”, I’ve programmed it to press the 2 key for 0.1 seconds (the default duration). More intricate programming is possible, but not necessary here.


The command array – flasks

First, the flasks and the like. I’ll simply be using numbers “one” to “seven”, to include Delve League buttons. I won’t be really using “one” and “five” since they are mapped to my mouse and Nostromo, but let’s be comprehensive.

So when I say “two” in the mike, Voice Attack inputs “2” in the currently active window – Path of Exile. And the character takes a gulp of their second potion, which is bound to the 2 key in the Path of Exile UI.

Numbers aren’t perfect. The English language is full of homophones and bad at vowels. So “two” is real close to “too”, “four” is real close to “for”, etc. The speech recognition interface could thus think you are saying a word other than “two” or “four”.

But teaching Windows 10 to learn how you count aloud is fairly easy. From context, it can recognise you’re counting. So that accelerates its learning process.
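
The homophone problem can be pictured as a small lookup table. The Python sketch below is only an illustration of the idea, not something Windows or Voice Attack is known to expose in this form; the names `HOMOPHONES` and `canonical`, and the particular sound-alikes listed, are assumptions made up for the example.

```python
# Hypothetical table: words a recogniser might hear -> the number word meant.
# The entries are illustrative guesses, not an exhaustive or official list.
HOMOPHONES = {
    "won": "one",
    "to": "two",
    "too": "two",
    "for": "four",
    "fore": "four",
}


def canonical(heard: str) -> str:
    """Return the intended number word for a heard word.

    Words with no known sound-alike are passed through unchanged
    (apart from trimming and lower-casing).
    """
    word = heard.strip().lower()
    return HOMOPHONES.get(word, word)


if __name__ == "__main__":
    for heard in ("Too", "for", "three"):
        print(heard, "->", canonical(heard))
```

Training the recogniser, as described above, attacks the same problem from the other side: rather than cleaning up a wrong word afterwards, it makes the wrong word less likely to come out in the first place.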

The command array – active skills

I am going to need three commands here. Ideally I’d need but two, but most characters make constant use of the “attack without moving” key, so that’s one finger taken.

The three active skill buttons on the right happen, in my UI, to be bound to the keys D, R and T.

[Image: examples of PoE Player Characters – arc elementalist, SRS baron necromancer, cleave/fortify juggernaut, icestorm/ES scion.]

Here the main issue is to select words that start with that letter and can’t easily be confused with other words. My first instinct was to go for the radio alphabet, but even that doesn’t always work. For instance, “tango” for “T” is easily misunderstood as “tangle” in English. And “delta” gets heard as “delete”, because “delete” is a far more common word than “delta” for the software, so it errs toward “delete”.

Therefore I opted for the words “die”, “Romeo” and “take”, which are phonetically less ambiguous and start with the right letter. I then spent a stretch teaching Windows 10 how I say those words.

I spent a while trying different words, and looking at what Windows 10 speech recognition would input into WordPad. If a word kept coming up wrong (such as “dog” being constantly mistaken for “Dr.”, presumably because the software hears “doc”), I’d move on to test another one.

I’ve also added a command that presses “x” when I say “switch”, another unambiguous word. One of my characters (Oni-Goroshi raider) benefits from switching weapons against certain enemies.
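
Put together, the whole command set is just a small lookup table from spoken word to key. Here is a minimal Python sketch of it, purely as an illustration: Voice Attack is set up through its own interface rather than through code like this, and the names `KeyPress`, `COMMANDS` and `lookup` are made up for the example. The 0.1 second hold is the default duration mentioned earlier.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class KeyPress:
    """One simulated keypress (hypothetical type, for illustration only)."""
    key: str
    hold_seconds: float = 0.1  # default press duration quoted in the article


# Flasks and Delve buttons: "one" .. "seven" press the keys 1 .. 7.
NUMBER_WORDS = ["one", "two", "three", "four", "five", "six", "seven"]
COMMANDS: Dict[str, KeyPress] = {
    word: KeyPress(str(digit))
    for digit, word in enumerate(NUMBER_WORDS, start=1)
}

# Active skills and weapon swap, using the words chosen in the article.
COMMANDS.update({
    "die": KeyPress("d"),
    "romeo": KeyPress("r"),
    "take": KeyPress("t"),
    "switch": KeyPress("x"),
})


def lookup(spoken: str) -> Optional[KeyPress]:
    """Return the keypress for a recognised word, or None if it is unknown."""
    return COMMANDS.get(spoken.strip().lower())


if __name__ == "__main__":
    for word in ("two", "Romeo", "tango"):
        print(word, "->", lookup(word))
```

An unknown word simply returns `None`, which matches the behaviour one wants in play: a misheard word should do nothing rather than press a random key.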

Experience, part 1

Speech recognition isn’t 100% reliable – especially with a not-an-anchorperson elocution. But that just means you have to be aware of the possibility of a command not being executed, and say the word a second time. In my case, it’s still better than manually fumbling for the additional keys and/or badly repositioning when I return my fingers to the default keys.

Sometimes, correct execution is critical. A typical example is drinking from a potion that dispels paralysis immediately after opening a strongbox that paralyses. In this case I’ll still use the keys on the Nostromo. The sequence of actions is entirely predictable so I can look at the keys rather than the screen for a second or two.

Being able to use a character’s full skill array significantly lowers difficulty. For instance, my flying-burning-angry-skulls summoning witch can now make *much* more systematic use of :

  • Her fanning lightning balls attack, which weakens monsters in several important ways.
  • Her offering skill, which is a big boost to her speed and her minions’ speed. Speed is life.
  • Her desecration skill, to use the offering skill in unfavourable circumstances.
  • Her charge-up burst-of-speed skill.

This in turn allows me to focus much more clearly on monitoring the battle and dodging attacks.

Experience, part 2

This specific witch doesn’t have a lot of fancy potions. OTOH, not having to find the key for a potion means that I can instead look at indicators for damage-over-time, curses, etc. and react correctly. Otherwise I’d be reacting to what I think is happening, because I can’t both look at indicators and glance at my Nostromo to press the correct potion key to counter the effect.

And it feels surprisingly fun giving verbal orders whilst playing, even if these are just nonsense keywords. Very Picard.

On the other hand, there’s often a delay between the order and the execution. It feels like a second, though it’s prolly less. Mind, I’m not going to have split-second timing anyway, so it’s not much of a problem for me.

But one does get into the habit of saying the keyword twice, since one can’t know whether the word wasn’t understood – or it’s the normal lag.
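
One side effect of saying a keyword twice is that both utterances may end up recognised, so the key fires twice. The Python sketch below shows one way such a repeat could be filtered out. It is an assumption-laden illustration, not a description of how Windows or Voice Attack behave: the class name `RepeatFilter` and the 1.5 second window are invented for the example.

```python
from typing import Optional


class RepeatFilter:
    """Drop a word that repeats the last accepted word within a short window.

    Hypothetical helper for illustration; the 1.5 s default is a guess.
    """

    def __init__(self, window_seconds: float = 1.5) -> None:
        self.window_seconds = window_seconds
        self._last_word: Optional[str] = None
        self._last_time: float = 0.0

    def accept(self, word: str, now: float) -> bool:
        """Return True if the word should trigger its key, False if it is a repeat.

        `now` is a timestamp in seconds supplied by the caller.
        """
        is_repeat = (
            word == self._last_word
            and now - self._last_time < self.window_seconds
        )
        if is_repeat:
            return False
        # Remember only accepted words, so the window runs from the last trigger.
        self._last_word = word
        self._last_time = now
        return True


if __name__ == "__main__":
    repeat_filter = RepeatFilter()
    for word, when in (("two", 0.0), ("two", 0.8), ("two", 2.0)):
        print(word, when, repeat_filter.accept(word, when))
```

In real use the timestamp would come from something like `time.monotonic()`. Whether filtering is wanted at all depends on the command: a doubled flask press wastes charges, while a doubled attack skill is mostly harmless.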


In summary :

  1. I can recommend this for persons with low manual dexterity and similar issues, barring speech problems. It’s not perfect, but it’s very cheap (or free if you already have a microphone) and there’s no recurring cost. Removing the stress of occasionally having to fumble with keys is also good.
  2. For others, the small lag and the occasional speech recognition failure create a slight problem, and the approach solves little since you can use keys and buttons. Unless you like the *idea* of using voice commands, it is more efficient to stick with a keypad like a Razer Tartarus or use an MMO mouse in most applications.
  3. If you’re going to make heavy use of it and can wear a heavier headset, there are plenty of “gamer” cans with good built-in mikes. As of this writing (October of 2018) a Corsair Void Pro would be my pick. But this field has been evolving quickly of late, with lots of much-better-than-the-year-before headsets emerging as online gaming voice chat has become such a big market.
