Recognition Failures

September 4, 2010


Is a low and will lure in the and it on him as he or or all on
in him or her a
I will will you lead and in the air and you will be relieved him is to be on an and he and I and a are a new to I did go will tear him in you will
are on it and he is


Pope pope bob pool room and may allow the rain and where are you laughing out loud if that I’ll be out block wall
at gallaudet at me like a button back the mall and at a Hanukkah lamp
do not count back up again and I am at a lightly a lot of that belongs

For a blond-haired attended Brigham at what cost login
that the old whose head of the tobacco: is that the cat Ballou


Pulled up the
Pull up the
It brought up the

And blot out the
And to live out the
And you move out by
In out by
A doubt the ha had eight
And our that coffee

That as an odd that,
Out of the
If for a living out of the old who couldn’t and
Guard and 1/2


Crews are Hussein had the
Whole outfit that the house but not all advantage banking
Main thing that kind of buying a bad bill that allows aunt who
Can talk all of the zoo
Is your
Media, and not let
How many of whom have been
But the wind is oceanfront scorer or a lot more if


aug 26?, sept. 2nd. spontaneous text transformed by Automated Speech Recognition failures

I’ve used Automated Speech Recognition (ASR) programs before and I don’t have a lot of love for them. Speaker-independent ASR is not very reliable, even with a good headset mic, though if you train a model to the speaker it can get pretty good. But a lot of times you don’t have the option of a good mic or speaker-specific training. Anyways, I started playing around with ASR at home, mostly to show my kid. Sometimes I got good results, and sometimes I didn’t, so I pulled out all the sentences that the ASR got completely wrong.

  1. The first stanza up there is the result of us testing out Dragon NaturallySpeaking 9 on an XP box. This was without training a speaker model, and using a microphone that (if memory serves) came from an 80s-era Walkman, i.e. not particularly suited for the task! I think I was just saying things like “hello computer”, though towards the middle my kid started yelling random words while I was talking.
  2. The second stanza is the result of us testing Windows Speech Recognition on a Windows 7 laptop. I had a good microphone by this point, but we didn’t train an acoustic model. (After I trained an acoustic model, the recognizer got a lot more words right, but I gotta say that Microsoft designed the interface so poorly that Windows Speech Recognition is not really usable in general.)
  3. For the third stanza I designed a game for my kid. She would say some words, then see what the ASR returned, which would be the first line. Then she would read the first line and see what the ASR returned for that, which would be the second line. Then she would read the second line and see what the ASR returned for that, which would be the third line, and so on. I guess I would call this method 907d701b-6499-4145-8cb2-3a4f95363109 (telephone game solitaire with ASR). We did this with Windows Speech Recognition on a Windows 7 laptop, using the good mic and an acoustic model trained on her voice, although the recognizer seemed to have trouble picking up the signal during training.
  4. The fourth stanza used the same ASR, model, and mic as the third stanza, but by this point my kid was just yelling and screaming and saying random stuff into the mic lol
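
For anyone who wants to try the game from the third stanza without a kid handy, here is a rough sketch of the loop in Python. It is not what we actually ran (we just talked into the laptop). The `recognize` function is a hypothetical stand-in for "read the line aloud and take whatever the ASR hands back", and the toy mishearing table in the example is made up for illustration.

```python
from typing import Callable, List


def telephone_solitaire(
    seed: str,
    recognize: Callable[[str], str],  # hypothetical: speak the text, return the ASR output
    max_rounds: int = 10,
) -> List[str]:
    """Feed each ASR result back in as the next thing to read aloud.

    Returns the lines the recognizer produced, in order. Stops early if the
    recognizer returns exactly what was read (nothing left to transform).
    """
    lines: List[str] = []
    current = seed
    for _ in range(max_rounds):
        heard = recognize(current)
        if heard == current:
            break  # the ASR got it right, so the chain has settled
        lines.append(heard)
        current = heard
    return lines


if __name__ == "__main__":
    # Made-up mishearings standing in for a real recognizer.
    toy_mishearings = {"hello": "hollow", "hollow": "follow"}

    def fake_recognize(text: str) -> str:
        return toy_mishearings.get(text, text)

    for line in telephone_solitaire("hello", fake_recognize):
        print(line)
```

With a real recognizer the chain rarely settles, so `max_rounds` is what ends the stanza.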