Let’s face it. Voice recognition with Google has been hit and miss, and that’s on a good day. So we’re delighted to shift gears and introduce a new platform powered by IBM Watson’s Speech-to-Text (STT) engine. While it’s not free, that’s really theoretical for most of our readers. Your first month on the platform is entirely free. And, after that, you get 1,000 minutes a month of free voice recognition services. If you still want more, it’s 2¢ a minute.

We first introduced IBM’s STT platform back in March when we documented how to use the service to transcribe voicemails and deliver them via email. Today, we’re introducing the Incredible Voice Dialer for Asterisk. It runs on all of the major Incredible PBX platforms: CentOS, Wazo, and Issabel. It’s married to our AsteriDex phonebook application that is deployed with Incredible PBX using MySQL, MariaDB, or SQLite3 depending upon platform.

The way it works is a user picks up an extension on your PBX and dials 411. The caller will be prompted for the name of the person or company to call. Once the caller says the name, the Incredible Voice Dialer will send the recording to IBM’s Watson STT engine for transcription. The result is then passed to AsteriDex where the text will be matched against the phone number saved for that person or company. The number is then passed to your default outbound trunk to place the call. All of the magic happens in less than two seconds, and the call begins ringing at your destination. You can try it out for yourself on our demo server this week. Just dial: , choose option 1 when the IVR answers, and then say "Delta Airlines" or "American Airlines" when prompted for a name. The queries support wildcard matching. If you say "Delta", you’ll still be connected to Delta Airlines.

What About the Quality? Here’s the bottom line. Speech recognition isn’t all that useful if it fails miserably in recognizing everyday speech. The good news is that IBM Watson’s speech recognition engine is now the best in the business. If you want more details, read the article below which will walk you through IBM’s latest speech recognition breakthrough:


Creating an IBM Bluemix Speech to Text Account

NOV. 1 UPDATE: IBM has moved the goal posts effective December 1, 2018:

1. Create Bluemix account here.

2. Confirm your registration by replying to email from IBM.

3. Login to Bluemix using your new credentials.

4. Agree to terms and conditions, name your organization, and name your space (STT).

5. Choose Watson Speech to Text service and click Create.

6. When Speech to Text-kb opens, click Service Credentials tab (on the left).

7. In Actions column, click View Credentials. Write down your username and password.

8. Logout by clicking on image icon in upper right corner of dialog window.

 

Install Voice Dialer with Incredible PBX for Wazo

1. Login to your server as root using SSH/Putty and issue the following commands:

cd /
wget http://incrediblepbx.com/ibmstt-411-wazo.tar.gz
tar zxvf ibmstt-411-wazo.tar.gz
rm -f ibmstt-411-wazo.tar.gz
sed -i '\:// BEGIN Call by Name:,\:// END Call by Name:d' /etc/asterisk/extensions_extra.d/xivo-extrafeatures.conf
sed -i '/\[xivo-extrafeatures\]/r /tmp/411.txt' /etc/asterisk/extensions_extra.d/xivo-extrafeatures.conf
asterisk -rx "dialplan reload"

2. Edit /var/lib/asterisk/agi-bin/getnumber.sh and insert your IBM credentials from step #7 above into these variables:

API_USERNAME="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
API_PASSWORD="XXXXXXXXXXXX"

3. Save the file.

 

Install Voice Dialer on Other Incredible PBX Platforms

1. Login to your server as root using SSH/Putty and issue the following commands:

cd /
wget http://incrediblepbx.com/ibmstt-411.tar.gz
tar zxvf ibmstt-411.tar.gz
rm -f ibmstt-411.tar.gz
sed -i '\:// BEGIN Call by Name:,\:// END Call by Name:d' /etc/asterisk/extensions_custom.conf
sed -i '/\[from-internal-custom\]/r /tmp/411.txt' /etc/asterisk/extensions_custom.conf
asterisk -rx "dialplan reload"

2. Edit /var/lib/asterisk/agi-bin/getnumber.sh and insert your IBM credentials from step #7 above into these variables:

API_USERNAME="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
API_PASSWORD="XXXXXXXXXXXX"

3. Save the file.

 

Take Incredible Voice Dialer for a Test Drive

1. From an extension connected to your PBX, dial 411. When prompted for the name to call, say "Delta Airlines" or "American Airlines."

2. Quicker than you could actually dial the number, you’ll be connected.

 

Building Voice-Enabled Applications with Asterisk

All of our code is open source, GPL2 code so you’re more than welcome to use it, learn from it, and then build your own voice-enabled applications. Just abide by the terms of the license and share. When you review /var/lib/asterisk/agi-bin/getnumber.sh, you’ll see that it’s incredibly easy to change the backend database. Here’s the Wazo flavor of the script:

API_USERNAME="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
API_PASSWORD="XXXXXXXXXXXX"

thisfile="$1"

# sending the recording to IBM Watson for transcription
curl -k -u $API_USERNAME:$API_PASSWORD -X POST --limit-rate 40000 --header "Content-Type: audio/wav" --data-binary @/tmp/$thisfile.wav "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?continuous=true&model=en-US_NarrowbandModel" 1>/tmp/$thisfile.txt

# grabbing the text out of the IBM Watson response
msg=`cat /tmp/$thisfile.txt | grep transcript | cut -f 2 -d ":" | cut -f 2 -d '"' | sed 's| *$||' | sed -e "s/b(.)/u1/g"`%

# passing text to MySQL (1st line) or SQLite3 (2nd line) for name lookup. answer is num2call.
#num2call=$(mysql -uroot -ppassw0rd asteridex -ss -N -e "SELECT user1.out FROM user1 where name LIKE '$msg'");
num2call=`/usr/bin/sqlite3 /var/lib/asterisk/agi-bin/asteridex.sqlite "select out from user1 where name LIKE '$msg'"`

# clearing out our temporary files
rm -f /tmp/$thisfile.*

# passing the results to the Asterisk dialplan
echo "SET VARIABLE PTY2CALL "\""$msg"\"""
echo "SET VARIABLE NUM2CALL "\""$num2call"\"""

# we're done with the AGI bash script so let's exit gracefully
exit 0

The Asterisk dialplan code could be modified for any number of applications. Here’s what it looks like on the Incredible PBX 13 platform. It’s slightly different with Wazo to accomodate their dialplan syntax.

;# // BEGIN Call by Name
exten => 411,1,Answer
exten => 411,n,Playback(custom/411)
exten => 411,n,Set(RANDFILE=${RAND(8000,8599)})
exten => 411,n,Record(/tmp/${RANDFILE}.wav,3,10)
exten => 411,n,Playback(/tmp/${RANDFILE})
exten => 411,n,AGI(getnumber.sh,${RANDFILE})
exten => 411,n,NoOp(Party to call : ${PTY2CALL})
exten => 411,n,NoOp(Number to call: ${NUM2CALL})
exten => 411,n,Goto(outbound-allroutes,${NUM2CALL},1)
exten => 411,n,Hangup()
;# // END Call by Name

There’s nothing magical about it. (1) It answers the call to 411. (2) It plays back a recording that prompts the user to say the name of the person or company to call. (3) It generates a random number to use for the filenames associated with the STT process. (4) It records the caller’s speech and saves it to the random filename as a .wav file which IBM STT can understand. (5) It passes the call to the AGI bash script to send the recording to IBM Watson and obtain the transcription and to pass the text to MySQL or SQLite3 to lookup the text in the AsteriDex database. (6) We display the called party’s name on the Asterisk CLI. (7) We display the called party’s phone number on the Asterisk CLI. (8) We place the call using the PBX’s default outbound route. (9) We hangup the call when it’s completed.

Published: Monday, October 9, 2017  




Need help with Asterisk? Visit the PBX in a Flash Forum.


 

Special Thanks to Our Generous Sponsors

FULL DISCLOSURE: RentPBX, Amazon, Skyetel, Vitelity, DigitalOcean, Vultr, Digium, Sangoma, 3CX, TelecomsXchange and others have provided financial support to Nerd Vittles and our open source projects through advertising, referral revenue, and/or merchandise. We’ve chosen these providers not the other way around. Our decisions are based upon their corporate reputation and the quality of their offerings and pricing. Our recommendations regarding technology are reached without regard to financial compensation except in situations in which comparable products at comparable pricing are available from multiple sources. In this limited case, we support our sponsors because our sponsors support us.

Skyetel $50 Free Trial: Enjoy state-of-the-art VoIP service with a $50 free trial and free number porting when you sign up for a Skyetel account. No restrictions on number of simultaneous calls, and triple data center redundancy assures that you never experience a failed call. Tutorial and sign up details available here.

Special Thanks to Vitelity. Vitelity is now Voyant Communications and has halted new registrations for the time being. Our special thanks to Vitelity for their unwavering financial support over many years and to the many Nerd Vittles readers who continue to enjoy the benefits of their service offerings. We will keep everyone posted on further developments.
 


RentPBX, a long-time partner and supporter of PIAF project, is offering generous discounts for Nerd Vittles readers. For all of your Incredible PBX hosting needs, sign up at www.RentPBX.com and use code NOGOTCHAS to get the special pricing. The code will lower the price to $14.99/month, originally $24.99/month. It’s less than 50¢/day.


Some Recent Nerd Vittles Articles of Interest…

Print Friendly, PDF & Email

Be Sociable, Share!

Tags:

This article has 1 comment

  1. I think that 5% rate is totally misleading in this situation. Maybe it can do that on connected speech but a person or company name doesn’t give it enough context. And it cannot distinguish between eg "Geoff" and "Jeff". That is useless when trying to match the textual names in Asteridex. (Even a human with no input other than hearing the word cannot know if "Geoff" or "Jeff" was intended.)

    And even if it could achieve that 5% error rate, that would still be annoyingly high. My old mobile phone does a good job of voice dialing because it doesn’t do STT followed by a text match. It does a fuzzy match on the spoken audio signal against the set of previously recorded spoken audio signals constructed as each name was added to the contact database.

    [WM: Your approach would certainly work. I’m not aware of that many smartphones that actually store audio clips linked to phonebook entries. However, we’ve had excellent results with AsteriDex queries just using a plain-text database. The simple answer to your "Geoff" or "Jeff" issue is to actually try a call, watch the Asterisk CLI, and see which spelling the STT engine returns. Then simply adjust the AsteriDex entry accordingly. For most users, there are probably a handful of these entries to work through so it’s really not a showstopper. And our success rate far exceeds the 95% threshold mentioned in our article.]