Speech-to-Text Directory Assistance Comes to Asterisk

Since the invention of the telephone, the most critical component has been the ability to match people's names to their phone numbers. Ma Bell did this with live operators (including my aunt) for many years. Then came automated lookups where you called a number for directory assistance and actually spoke the name of the person you wished to call. A computer then converted your speech to text and looked up the number in a database. Typically, the number was spoken back to the caller, who then could place the call. Or "for a few cents more" the lookup service would actually place the call for you. For anyone who struggles to remember numbers, this became a godsend when many metropolitan areas switched to 10-digit dialing.

Today, we'll show you how trivial it is to implement this yourself on any Asterisk® server using Google's new (free) speech-to-text web service, which we introduced a few weeks ago. It's a 2-minute drill using PBX in a Flash™ with Incredible PBX™. We'll be using a MySQL database to demonstrate the concept today, but it could easily be tweaked for use with any ODBC-compatible database. ODBC demos are included in Incredible PBX and can be reached by dialing 222 or 223.

Many years ago we demonstrated how to quickly place calls to your friends by dialing the first three letters of their names from any phone connected to your Asterisk server using our freely available AsteriDex™ database. This has been incorporated into Incredible PBX and is available by dialing 412 from phones connected to your PBX in a Flash server. Today we'll tweak that application to replace 3-letter calling with the spoken names of the people you want to call. When you're finished, you'll be able to pick up any phone on your Asterisk server, dial 4-1-2, speak the name of an individual or company in your AsteriDex database, and have Asterisk automatically place the call for you.

Legal Disclaimer. What we're demonstrating today is how to use a publicly accessible web resource to respond to queries using a phone connected to your Asterisk server. We're assuming that Google has its legal bases covered and has a right to provide the public service it is offering. We are not vouching for Google or the services being offered in any way. By using our tutorial, YOU AGREE TO ASSUME ALL RISKS, LEGAL AND OTHERWISE, ASSOCIATED WITH USE OF THIS FREELY ACCESSIBLE WEB TOOL. NO WARRANTY EXPRESS OR IMPLIED IS BEING PROVIDED BY US INCLUDING ANY IMPLIED WARRANTY OF FITNESS FOR USE OR MERCHANTABILITY. You, of course, have an absolute right not to read our articles or implement our code if you have reservations of any kind or are unwilling to assume all risks associated with such use. Sorry for the legalese, but it's the time in which we live, I'm afraid. Plain English: "Don't Shoot the Messenger!"

Prerequisites. The easiest setup for this is a new PIAF2™ server. Once you have it running, install Incredible PBX 3 by logging into your server as root and issuing the command: install-incredpbx3. For complete instructions on Incredible PBX 3, here's the link to the Nerd Vittles tutorial. If you'd prefer not to go the Incredible route, then simply install AsteriDex 4 and then add the CallWho extension. Finally, you'll need to run the Wolfram Alpha for Asterisk one-click installer. This gets Google's speech-to-text components installed on your server. Now you're ready to tweak the CallWho app to use speech-to-text lookups through Google instead of 3-letter dialing.

Editing nv-callwho.php. Log in as root and edit nv-callwho.php in /var/lib/asterisk/agi-bin:

cd /var/lib/asterisk/agi-bin
nano -w nv-callwho.php

Press Ctrl-W and search for "where dialcode =" (without the quotes). Change that text to "where name =".

Now save the file with the change: Ctrl-X, Y, then press Enter.
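If you'd rather script the edit than make it by hand in nano, the same substitution can be sketched with sed. This is only a minimal sketch and not part of the original tutorial: the function name patch_callwho is made up for illustration, and it assumes the file contains the literal, lowercase text where dialcode = exactly as shown above.

```shell
# Hypothetical helper (not part of AsteriDex): switch nv-callwho.php from
# matching on the 3-letter dialcode to matching on the full name.
patch_callwho() {
    file="$1"
    # Keep a backup next to the original before changing anything.
    cp -p "$file" "$file.bak" || return 1
    # Rewrite the file from the backup, replacing the SQL fragment.
    # Assumes the literal, lowercase text "where dialcode =" is present.
    sed 's/where dialcode =/where name =/g' "$file.bak" > "$file"
}

# Example call, left commented out so nothing runs by accident:
# patch_callwho /var/lib/asterisk/agi-bin/nv-callwho.php
```

If the result isn't what you expected, copying nv-callwho.php.bak back over nv-callwho.php restores the original script.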

If you'd prefer to use the latest, greatest (preconfigured) version, ignore the above and issue the following commands instead:

cd /var/lib/asterisk/agi-bin
wget http://nerd.bz/xnyJR3
tar zxvf callwho21.tgz
rm callwho21.tgz

Tweaking Your Custom Dialplan. While still logged in as root, you'll also need to edit extensions_custom.conf in /etc/asterisk:

cd /etc/asterisk
nano -w extensions_custom.conf

Press Ctrl-W. Search for 412. Now scroll down to the following lines:

exten => 412,9,Read(DIALCODE,beep,3)
exten => 412,10,NoOp(Name lookup: ${DIALCODE})
exten => 412,11,AGI(nv-callwho.php,${DIALCODE})

Replace those lines with the following three lines, making sure that none of them wraps onto a second line:

exten => 412,9(record),agi(speech-recog.agi,en-US)
exten => 412,10,Noop(= Script returned: ${status} , ${id} , ${confidence} , ${utterance} =)
exten => 412,11,AGI(nv-callwho.php,${utterance})

Next, adjust the spoken prompts in priorities 6 and 8 of extension 412 (the lines beginning exten => 412,6 and exten => 412,8) to say something like this: "At the beep say the name of the person or company you wish to call. Then press the pound key."

Now save the file with your changes: Ctrl-X, Y, then press Enter.

Finally, reload your Asterisk dialplan: asterisk -rx "dialplan reload"
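Since a single wrapped or mistyped line can break the 412 extension, a quick sanity check of the file may be worth running alongside the reload. The sketch below is our own illustration, not part of Incredible PBX: check_412 is a hypothetical function name, and it assumes the old Read(DIALCODE,beep,3) line was replaced rather than commented out.

```shell
# Hypothetical sanity check: confirm the 412 extension now uses the
# speech-recognition lines and no longer reads a 3-digit dial code.
check_412() {
    conf="$1"
    # The new speech-recognition step must be present.
    grep -qF 'exten => 412,9(record),agi(speech-recog.agi,en-US)' "$conf" || return 1
    # The lookup must receive the recognized utterance.
    grep -qF 'exten => 412,11,AGI(nv-callwho.php,${utterance})' "$conf" || return 1
    # The old keypad prompt must be gone (assumes it was replaced,
    # not merely commented out).
    if grep -qF 'Read(DIALCODE,beep,3)' "$conf"; then
        return 1
    fi
    return 0
}

# Example call, left commented out:
# check_412 /etc/asterisk/extensions_custom.conf && echo "412 looks good"
```

A non-zero return means one of the three lines doesn't match what's shown above, which is usually a sign of word wrap or a stray character.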

Test Drive. To test things out, pick up a phone connected to your Asterisk server and dial 412. When prompted for the person or company to call, say "American Airlines" and then press the pound key.

Tweaking AsteriDex. You may need to make some minor adjustments to entries in your AsteriDex database to accommodate speech-to-text queries. For example, the sample entries include American Airlines and Delta Air Lines. Google transcribes the spoken words "air lines" as "airlines" so you'll need to modify the Delta entry, or it won't find a match. Similarly, there's a sample entry for "Emery Worldwide" but Google transcribes the spoken words as "emory worldwide." While capitalization doesn't matter, emory will not match emery. But, with a little tweaking, you'll have a very impressive, homegrown directory assistance service to impress all of your Friends and Family™. Enjoy!
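For those who prefer the command line, an entry such as Delta Air Lines could be renamed with a one-line SQL statement. Here's a rough sketch that only builds the statement. The function name rename_entry_sql is hypothetical, the table user1 and column name are taken from the $query line in nv-callwho.php, and the database name and login in the commented example are assumptions you'd need to adjust for your own server. It escapes single quotes only, so treat it as a sketch for ordinary names rather than a general-purpose tool.

```shell
# Hypothetical helper: print an UPDATE statement that renames one entry.
# Table "user1" and column "name" come from the article's $query line.
rename_entry_sql() {
    # Double any single quotes so the names are valid SQL string literals.
    old=$(printf '%s' "$1" | sed "s/'/''/g")
    new=$(printf '%s' "$2" | sed "s/'/''/g")
    printf "UPDATE user1 SET name = '%s' WHERE name = '%s';\n" "$new" "$old"
}

# Example, left commented out. The database name "asteridex" and the
# root login are assumptions about your setup:
# rename_entry_sql 'Delta Air Lines' 'Delta Airlines' | mysql -u root -p asteridex
```

Running the function by itself just prints the statement, so you can eyeball it before piping it to mysql.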

Fuzzy Search Update. After we went to press, one of our favorite pundits on the PIAF Forum suggested that perhaps implementing fuzzy logic searches with MySQL would improve results, particularly with proper names. Great idea! It solved both the Delta Air Lines and Emery Worldwide lookup issues. And it turned out it was incredibly simple to implement. All that was required was replacing the existing $query command in nv-callwho.php (as explained above) with the following. This now has been incorporated into the preconfigured AGI script which is available for download above.

$query = "SELECT * FROM user1 where strcmp(soundex(name), soundex('$dialcode')) = 0";
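To see why a particular name does or doesn't match, it can help to look at the SOUNDEX codes MySQL generates for the stored entry and for what Google heard. The sketch below builds such a query using the same STRCMP/SOUNDEX test as the $query line above. The function name soundex_compare_sql and the column aliases are our own inventions, and the database name in the commented example is an assumption.

```shell
# Hypothetical helper: print a query comparing the SOUNDEX codes of a stored
# name and a recognized phrase, using the same test as the $query line.
soundex_compare_sql() {
    # Double single quotes so both values are valid SQL string literals.
    stored=$(printf '%s' "$1" | sed "s/'/''/g")
    heard=$(printf '%s' "$2" | sed "s/'/''/g")
    printf "SELECT SOUNDEX('%s') AS stored_code, SOUNDEX('%s') AS heard_code, STRCMP(SOUNDEX('%s'), SOUNDEX('%s')) = 0 AS is_match;\n" \
        "$stored" "$heard" "$stored" "$heard"
}

# Example, left commented out ("asteridex" and the root login are assumptions):
# soundex_compare_sql 'Emery Worldwide' 'emory worldwide' | mysql -u root -p asteridex
```

If is_match comes back as 1, the fuzzy lookup will treat the two spellings as the same entry.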

For additional enhancements, see this thread on the PIAF Forum.

Asterisk TTS Bug. Be advised that certain newer releases of Asterisk have a text-to-speech bug which abnormally terminates TTS messages that have an embedded comma. If you have stored names in AsteriDex using Lastname, Firstname format, this may pose a problem. The simple solution is to either remove the commas or change them to periods. In the alternative, you can add the following line of code immediately below all existing lines of code beginning with $msg in nv-callwho.php. This, too, has been incorporated into the preconfigured AGI script above.

$msg = str_replace( ",", ".", $msg );

Originally published: Monday, January 30, 2012



One Response to “Speech-to-Text Directory Assistance Comes to Asterisk”

  1. speech rec guy says:

    Wondering if anyone has tried integrating any SaaS models for ASR or TTS… I tried iSpeech (https://www.ispeech.org/developers) for mobile development and it sounds pretty good. Do you think they have an 8khz version?
