Let’s face it. Voice recognition with Google has been hit and miss, and that’s on a good day. So we’re delighted to shift gears and introduce a new platform powered by IBM Watson’s Speech-to-Text (STT) engine. While it’s not free, that’s really theoretical for most of our readers. Your first month on the platform is entirely free. And, after that, you get 1,000 minutes a month of free voice recognition services. If you still want more, it’s 2¢ a minute.

We first introduced IBM’s STT platform back in March when we documented how to use the service to transcribe voicemails and deliver them via email. Today, we’re introducing the Incredible Voice Dialer for Asterisk. It runs on all of the major Incredible PBX platforms: CentOS, Wazo, and Issabel. It’s married to our AsteriDex phonebook application that is deployed with Incredible PBX using MySQL, MariaDB, or SQLite3 depending upon platform.

The way it works is a user picks up an extension on your PBX and dials 411. The caller will be prompted for the name of the person or company to call. Once the caller says the name, the Incredible Voice Dialer will send the recording to IBM’s Watson STT engine for transcription. The result is then passed to AsteriDex where the text will be matched against the phone number saved for that person or company. The number is then passed to your default outbound trunk to place the call. All of the magic happens in less than two seconds, and the call begins ringing at your destination. You can try it out for yourself on our demo server this week. Just dial: , choose option 1 when the IVR answers, and then say "Delta Airlines" or "American Airlines" when prompted for a name. The queries support wildcard matching. If you say "Delta", you’ll still be connected to Delta Airlines.

What About the Quality? Here’s the bottom line. Speech recognition isn’t all that useful if it fails miserably in recognizing everyday speech. The good news is that IBM Watson’s speech recognition engine is now the best in the business. If you want more details, read the article below which will walk you through IBM’s latest speech recognition breakthrough:

Creating an IBM Bluemix Speech to Text Account

1. Create Bluemix account here.

2. Confirm your registration by replying to email from IBM.

3. Login to Bluemix using your new credentials.

4. Agree to terms and conditions, name your organization, and name your space (STT).

5. Choose Watson Speech to Text service and click Create.

6. When Speech to Text-kb opens, click Service Credentials tab (on the left).

7. In Actions column, click View Credentials. Write down your username and password.

8. Logout by clicking on image icon in upper right corner of dialog window.


Install Voice Dialer with Incredible PBX for Wazo

1. Login to your server as root using SSH/Putty and issue the following commands:

cd /
wget http://incrediblepbx.com/ibmstt-411-wazo.tar.gz
tar zxvf ibmstt-411-wazo.tar.gz
rm -f ibmstt-411-wazo.tar.gz
sed -i '\:// BEGIN Call by Name:,\:// END Call by Name:d' /etc/asterisk/extensions_extra.d/xivo-extrafeatures.conf
sed -i '/\[xivo-extrafeatures\]/r /tmp/411.txt' /etc/asterisk/extensions_extra.d/xivo-extrafeatures.conf
asterisk -rx "dialplan reload"

2. Edit /var/lib/asterisk/agi-bin/getnumber.sh and insert your IBM credentials from step #7 above into these variables:


3. Save the file.


Install Voice Dialer on Other Incredible PBX Platforms

1. Login to your server as root using SSH/Putty and issue the following commands:

cd /
wget http://incrediblepbx.com/ibmstt-411.tar.gz
tar zxvf ibmstt-411.tar.gz
rm -f ibmstt-411.tar.gz
sed -i '\:// BEGIN Call by Name:,\:// END Call by Name:d' /etc/asterisk/extensions_custom.conf
sed -i '/\[from-internal-custom\]/r /tmp/411.txt' /etc/asterisk/extensions_custom.conf
asterisk -rx "dialplan reload"

2. Edit /var/lib/asterisk/agi-bin/getnumber.sh and insert your IBM credentials from step #7 above into these variables:


3. Save the file.


Take Incredible Voice Dialer for a Test Drive

1. From an extension connected to your PBX, dial 411. When prompted for the name to call, say "Delta Airlines" or "American Airlines."

2. Quicker than you could actually dial the number, you’ll be connected.


Building Voice-Enabled Applications with Asterisk

All of our code is open source, GPL2 code so you’re more than welcome to use it, learn from it, and then build your own voice-enabled applications. Just abide by the terms of the license and share. When you review /var/lib/asterisk/agi-bin/getnumber.sh, you’ll see that it’s incredibly easy to change the backend database. Here’s the Wazo flavor of the script:



# sending the recording to IBM Watson for transcription
curl -k -u $API_USERNAME:$API_PASSWORD -X POST --limit-rate 40000 --header "Content-Type: audio/wav" --data-binary @/tmp/$thisfile.wav "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?continuous=true&model=en-US_NarrowbandModel" 1>/tmp/$thisfile.txt

# grabbing the text out of the IBM Watson response
msg=`cat /tmp/$thisfile.txt | grep transcript | cut -f 2 -d ":" | cut -f 2 -d '"' | sed 's| *$||' | sed -e "s/b(.)/u1/g"`%

# passing text to MySQL (1st line) or SQLite3 (2nd line) for name lookup. answer is num2call.
#num2call=$(mysql -uroot -ppassw0rd asteridex -ss -N -e "SELECT user1.out FROM user1 where name LIKE '$msg'");
num2call=`/usr/bin/sqlite3 /var/lib/asterisk/agi-bin/asteridex.sqlite "select out from user1 where name LIKE '$msg'"`

# clearing out our temporary files
rm -f /tmp/$thisfile.*

# passing the results to the Asterisk dialplan
echo "SET VARIABLE PTY2CALL "\""$msg"\"""
echo "SET VARIABLE NUM2CALL "\""$num2call"\"""

# we're done with the AGI bash script so let's exit gracefully
exit 0

The Asterisk dialplan code could be modified for any number of applications. Here’s what it looks like on the Incredible PBX 13 platform. It’s slightly different with Wazo to accomodate their dialplan syntax.

;# // BEGIN Call by Name
exten => 411,1,Answer
exten => 411,n,Playback(custom/411)
exten => 411,n,Set(RANDFILE=${RAND(8000,8599)})
exten => 411,n,Record(/tmp/${RANDFILE}.wav,3,10)
exten => 411,n,Playback(/tmp/${RANDFILE})
exten => 411,n,AGI(getnumber.sh,${RANDFILE})
exten => 411,n,NoOp(Party to call : ${PTY2CALL})
exten => 411,n,NoOp(Number to call: ${NUM2CALL})
exten => 411,n,Goto(outbound-allroutes,${NUM2CALL},1)
exten => 411,n,Hangup()
;# // END Call by Name

There’s nothing magical about it. (1) It answers the call to 411. (2) It plays back a recording that prompts the user to say the name of the person or company to call. (3) It generates a random number to use for the filenames associated with the STT process. (4) It records the caller’s speech and saves it to the random filename as a .wav file which IBM STT can understand. (5) It passes the call to the AGI bash script to send the recording to IBM Watson and obtain the transcription and to pass the text to MySQL or SQLite3 to lookup the text in the AsteriDex database. (6) We display the called party’s name on the Asterisk CLI. (7) We display the called party’s phone number on the Asterisk CLI. (8) We place the call using the PBX’s default outbound route. (9) We hangup the call when it’s completed.

Published: Monday, October 9, 2017  

Need help with Asterisk? Visit the PBX in a Flash Forum.


Special Thanks to Our Generous Sponsors

FULL DISCLOSURE: RentPBX, Amazon, Vitelity, DigitalOcean, Vultr, Digium, Sangoma, 3CX, TelecomsXchange and others have provided financial support to Nerd Vittles and our open source projects through advertising, referral revenue, and/or merchandise. We’ve chosen these providers not the other way around. Our decisions are based upon their corporate reputation and the quality of their offerings and their pricing. Our recommendations regarding technology are reached without regard to financial compensation except in situations in which comparable products at comparable pricing are available from multiple sources. In this limited case, we support our sponsors because our sponsors support us.

Awesome Vitelity Special. Vitelity has generously offered a terrific discount for Nerd Vittles readers. You now can get an almost half-price DID from our special Vitelity sign-up link. If you’re seeking the best flexibility in choosing an area code and phone number plus the lowest entry level pricing plus high quality calls, then Vitelity is the hands-down winner. Vitelity provides Tier A DID inbound service in over 3,000 rate centers throughout the US and Canada. When you use our special link to sign up, Nerd Vittles gets a few shekels down the road to support our open source development efforts while you get an incredible signup deal as well. The going rate for Vitelity’s DID service is $7.95 a month which includes up to 4,000 incoming minutes on two simultaneous channels with terminations priced at 1.45¢ per minute. Not any more! For our users, here’s a deal you can’t (and shouldn’t) refuse! Sign up now, and you can purchase a Tier A DID with unlimited incoming calls and four simultaneous channels for just $3.99 a month. To check availability of local numbers and tiers of service from Vitelity, click here. NOTE: You can only use the Nerd Vittles sign-up link to order your DIDs, or you won’t get the special pricing! Vitelity’s rate is just 1.44¢ per minute for outbound calls in the U.S. There is a $35 prepay when you sign up. This covers future usage. Any balance is refundable if you decide to discontinue service with Vitelity.

RentPBX, a long-time partner and supporter of PIAF project, is offering generous discounts for Nerd Vittles readers. For all of your Incredible PBX hosting needs, sign up at www.RentPBX.com and use code NOGOTCHAS to get the special pricing. The code will lower the price to $14.99/month, originally $24.99/month. It’s less than 50¢/day.

Some Recent Nerd Vittles Articles of Interest…

Print Friendly, PDF & Email

Be Sociable, Share!


This article has 1 comment

  1. I think that 5% rate is totally misleading in this situation. Maybe it can do that on connected speech but a person or company name doesn’t give it enough context. And it cannot distinguish between eg "Geoff" and "Jeff". That is useless when trying to match the textual names in Asteridex. (Even a human with no input other than hearing the word cannot know if "Geoff" or "Jeff" was intended.)

    And even if it could achieve that 5% error rate, that would still be annoyingly high. My old mobile phone does a good job of voice dialing because it doesn’t do STT followed by a text match. It does a fuzzy match on the spoken audio signal against the set of previously recorded spoken audio signals constructed as each name was added to the contact database.

    [WM: Your approach would certainly work. I’m not aware of that many smartphones that actually store audio clips linked to phonebook entries. However, we’ve had excellent results with AsteriDex queries just using a plain-text database. The simple answer to your "Geoff" or "Jeff" issue is to actually try a call, watch the Asterisk CLI, and see which spelling the STT engine returns. Then simply adjust the AsteriDex entry accordingly. For most users, there are probably a handful of these entries to work through so it’s really not a showstopper. And our success rate far exceeds the 95% threshold mentioned in our article.]