Using Google Cloud Speech Services with the 3CX Call Flow Designer
Introduction
Google Cloud offers Text To Speech (TTS) and Speech To Text (STT) as cloud services. 3CX supports both: TTS as an alternative engine for text to speech, and STT to provide speech recognition through the Voice Input component. To use this feature you need 3CX Phone System v16 Update 6 or later.
This guide describes how to create the Google Cloud account, enable the Text to Speech / Speech to Text services, and use these within a CFD application.
💡 Tip: The project for this example application is available via the CFD Demos GitHub page, and is installed along with the 3CX Call Flow Designer in your Windows user documents folder, i.e. “C:\Users\YourUsername\Documents\3CX Call Flow Designer Demos”.
Text to Speech with Google Cloud
A call flow often needs to play audio that cannot be pre-recorded, e.g. a name, a place, or a task description obtained from a database. In these cases Text to Speech (TTS) can be used to create WAV files on the fly, which the CFD app then plays back to the caller.
The 3CX Call Flow Designer includes the Text to Speech Audio Prompt, used when configuring prompts with the Prompt Playback component, the Menu component, the User Input component, and so on.
The CFD app converts text to speech in real time, just before playing the message to the caller. It invokes a web service to get the audio stream, and saves it to a local WAV file. Finally, when the call ends, the WAV files are automatically removed to keep the installation clean.
To use TTS, you can use the engine provided by Google Cloud or Amazon Web Services. This guide explains how to set up a CFD app using Google Cloud. To use TTS with Amazon Web Services, please refer to this guide.
Before selecting the Google Cloud engine, check the language coverage and available voices.
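The convert, save, and clean-up cycle described above can be sketched in Python. This is only an illustration of the flow, not the code the CFD generates: the function names are hypothetical, the request body follows the public Cloud Text-to-Speech REST API (`text:synthesize`), and the HTTP call and authentication are left out so the sketch runs offline.

```python
import base64
import os
import tempfile

# Public REST endpoint of the Cloud Text-to-Speech API (v1).
TTS_ENDPOINT = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_tts_request(text, voice_name="en-US-Standard-B", language_code="en-US"):
    """Return the JSON body for a text:synthesize call (hypothetical helper)."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        # LINEAR16 audio is returned as uncompressed PCM with a WAV header.
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


def save_prompt(response, directory):
    """Decode the base64 'audioContent' field and write it to a WAV file."""
    audio = base64.b64decode(response["audioContent"])
    fd, path = tempfile.mkstemp(suffix=".wav", dir=directory)
    with os.fdopen(fd, "wb") as handle:
        handle.write(audio)
    return path


def cleanup(paths):
    """Delete the generated files, mirroring the clean-up at the end of a call."""
    for path in paths:
        if os.path.exists(path):
            os.remove(path)
```

In a real client the body from `build_tts_request` would be sent to `TTS_ENDPOINT` with an access token obtained from the service account, and the parsed JSON response passed to `save_prompt`.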
Speech to Text with Google Cloud
You can use the Voice Input component to enable speech recognition from the caller, and convert the result to text. For example, you can ask your customers to verbally specify:
- an alphanumeric ID, so you can do a database lookup with it.
- the name of a person or department within your company to automatically connect the caller to the appropriate destination.
The component streams the caller's audio to Google Cloud in real time, receives the recognized text back as it becomes available, and validates the input in the process.
Before you start working on a CFD project with speech recognition, please check the supported languages.
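To give a feel for what a recognition request looks like, here is a minimal Python sketch. Note the assumptions: the Voice Input component streams audio, whereas this sketch uses the simpler synchronous `speech:recognize` REST method of the Cloud Speech-to-Text API; the 8000 Hz sample rate is an assumed telephony value, and the helper names are hypothetical. The HTTP call itself is omitted.

```python
import base64

# Public REST endpoint of the Cloud Speech-to-Text API (v1), synchronous method.
STT_ENDPOINT = "https://speech.googleapis.com/v1/speech:recognize"


def build_stt_request(pcm_bytes, language_code="en-US", sample_rate_hz=8000):
    """Return the JSON body for a speech:recognize call (hypothetical helper)."""
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,  # assumed telephony rate
            "languageCode": language_code,
        },
        # Audio is sent inline as base64-encoded bytes.
        "audio": {"content": base64.b64encode(pcm_bytes).decode("ascii")},
    }


def best_transcript(response):
    """Return the top transcript from a recognize response, or '' if none."""
    results = response.get("results", [])
    if not results or not results[0].get("alternatives"):
        return ""
    return results[0]["alternatives"][0].get("transcript", "")
```

An empty string from `best_transcript` corresponds to the case where nothing was recognized, which the call flow would treat as silence or invalid input.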
Step 1: Create a Google Cloud Account
Before you start working on your CFD project, you need a Google Cloud account. To create it, go to the Google Cloud Console, and follow the instructions to activate your account.
Step 2: Create a Google Cloud Project
Once your Google Cloud account is active, go to the Google Cloud Console and:
- Create a new project with an appropriate name, e.g. “3CX TTS and STT”, and then click on “CREATE”.
- Enable the required APIs by going to the “Menu” > “APIs & Services” > “Dashboard” and then clicking “+ ENABLE APIS & SERVICES”. Select and enable these services:
- Cloud Speech-to-Text API.
- Cloud Text-to-Speech API.
- To create a service account, so the CFD app can authenticate with Google Cloud and use this project, go to the “Menu” > “IAM & Admin” > “Service Accounts” and click on “+ CREATE SERVICE ACCOUNT”.
- Fill in the service account details with appropriate values and click on “CREATE”.
- Set the “Role” to “Project Owner” and click on “CONTINUE”.
- In the “Grant users access to this service account” section, leave the fields empty and click on “DONE”.
- In the new row containing the service account details, click the 3 dots on the right side and select “Manage keys”.
- Select “ADD KEY” > “Create new key”. Select JSON and click “CREATE” to download a JSON file to your computer. Store this file in a secure location, as it grants access to your cloud resources. You need this file to configure “Online Services” in your CFD application.
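Before selecting the downloaded file in the CFD, it can be useful to confirm that it really is a service account key. The Python sketch below is an optional sanity check, not part of the 3CX tooling; the function names are hypothetical, and the fields checked are ones that a Google Cloud service account key file contains.

```python
import json

# Fields present in a Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key(key):
    """Return a list of problems in a parsed key; an empty list means it looks usable."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


def check_key_file(path):
    """Load the JSON key from disk and run the same checks."""
    with open(path, encoding="utf-8") as handle:
        return check_key(json.load(handle))
```

Calling `check_key_file` with the path of the downloaded file returns an empty list when all expected fields are present. The check never prints the private key, so it is safe to run in a shared terminal.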
Step 3: Create the CFD Project
With your Google Cloud account ready, you can proceed to create the Call Flow Designer project:
- Open the CFD and go to “File” > “New” > “Project”, select the folder where you want to save it, and enter a name for the project, e.g. “SpeechToTextDemo”.
- Go to the “Tools” > “Online Services” menu and:
- Under “Text To Speech” select:
- “Online Service”: Google Cloud
- “Service Account Key JSON File”: select the JSON file downloaded in the previous step.
- Under “Speech To Text” select:
- “Online Service”: Google Cloud
- “Service Account Key JSON File”: the JSON file is already selected, as it is the same for TTS and STT.
These settings are used for every Text To Speech Audio Prompt and Voice Input component in your project.
Step 4: Add a “Voice Input” Component
The “Voice Input” component lets you configure the prompts that ask the caller for input, so in this demo both Text to Speech and Speech to Text are used in the same component. A “Prompt Playback” component is added first to provide a welcome message, before moving to the “Voice Input”.
To add the “Prompt Playback” component:
- Drag a “Prompt Playback” component from the toolbox, and drop it into the design view of the “Main” callflow. Then select the added component and go to the “Properties” to rename it to “Welcome”.
- From the “Properties”, open the “Prompt Collection Editor”, clicking the button on the right of the “Prompts” property.
- Click “Add” to add a new prompt to the collection, and change the type to “Text to Speech Audio Prompt”.
- Select the Voice to use. The drop down list of voices is ordered by language, so you can easily find the options available for the language you need to use.
🛈 Note: The voices available for Google Cloud are listed here. If Google Cloud releases a new voice that is not yet included in this drop down list, you can enter the value from the “Voice name” column to use it. If you want a specific voice to be pre-filled, you can set it from “Tools” > “Options” > “Component Templates” > “Text To Speech”. For this demo the “en-US-Standard-B (English - US, Male)” voice profile is used.
- Select the Type of text:
- “Text” - The value of the “Text” property is considered as plain text by the TTS engine, converting it to speech just as it is. “Text” is set for this example to represent typical usage.
- “SSML” (Speech Synthesis Markup Language) - The value of the “Text” property is considered XML according to the SSML specification. With SSML you can control various aspects of speech such as pronunciation, volume, pitch, and speech rate. For more information, see the Google guide on using SSML.
- Enter an expression for the Text. Depending on the type selected in the previous step, the expression must return plain text to convert to speech, or XML according to the SSML specification. For this demo this static text can be used:
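To illustrate the difference between the two text types, the Python sketch below builds the same message once as plain text and once as SSML. It is a hypothetical example and not the demo's own prompt text; the wording, pause length, and speech rate are assumptions chosen for illustration. The `<speak>`, `<break>` and `<prosody>` elements come from the SSML specification.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def plain_text_prompt(department):
    """Message for the "Text" type: spoken exactly as written."""
    return f"Connecting you to {department}."


def ssml_prompt(department):
    """Message for the "SSML" type: adds a pause and slows down the name."""
    safe = escape(department)  # SSML is XML, so &, < and > must be escaped
    return (
        "<speak>"
        'Connecting you to <break time="300ms"/>'
        f'<prosody rate="slow">{safe}</prosody>.'
        "</speak>"
    )


def is_well_formed(ssml):
    """Return True if the string parses as XML with a <speak> root."""
    try:
        return ET.fromstring(ssml).tag == "speak"
    except ET.ParseError:
        return False
```

Checking well-formedness before sending SSML is worthwhile, because dynamic values such as names read from a database may contain characters that are not valid in XML unless escaped.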
To add the “Voice Input” component:
- Drag a “Voice Input” component from the toolbox, and drop it into the design view of the “Main” callflow. Then select the component added, go to the “Properties” and rename it to “AskForDepartment”.
- Double click the added component to launch the configuration dialog and set:
- “Input Timeout”: 3 seconds. This means that the component listens until it recognizes something or the caller stays silent for 3 seconds.
- “Max Retries”: 3. This means that the component repeats the prompts asking for input up to 3 times when the caller stays silent or the input is not valid.
- “Language Code”: select “en-US”.
- For the prompts, use “Text to Speech Audio Prompts” and configure the following texts for each prompt:
- “Initial Prompts”: "Please say the name of the department you want to connect to, for example Sales, Support or Marketing."
- “Subsequent Prompts”: "Let's give it another try. Say the name of the department you want to connect to, for example Sales, Support or Marketing."
- “Timeout Prompts”: "Sorry, I couldn't hear you."
- “Invalid Input Prompts”: "Sorry, you need to say one of the valid options, for example Sales, Support or Marketing."
- For the “Dictionary”, you can define three (3) valid options: “Sales”, “Support” and “Marketing”. This means that the “Voice Input” component tries to identify one of these terms in the recognized text. When it does, the speech recognition ends and the component moves on.
- When the “Voice Input” component recognizes an entry from the dictionary, the execution continues in the “Valid Input” branch. In this case you need to check what was recognized using a Create a Condition component, and then transfer the call to the appropriate destination. Add this component from the toolbox into the “Valid Input” branch, name it “CheckRecognition”, and configure the component with three (3) branches: “Sales”, “Support” and “Marketing”.
- For the Sales branch, use the following expression in the Condition property:
- For the Support branch, use the following expression in the Condition property:
- For the Marketing branch, leave the Condition empty, so the branch is executed when the first 2 branches are skipped.
- In each of these branches, add a “Transfer” component configured to transfer the call to the appropriate destination.
- Finally, add another “Transfer” component to the “Invalid Input” branch, so that the call can be transferred to the receptionist.
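The behavior configured above (dictionary matching, retries, and branching to a destination) can be summarized in a short Python simulation. This is a sketch of the logic only and not how the CFD implements it. It assumes that “Max Retries” limits the total number of attempts, that silence is represented by an empty transcript, and that running out of attempts leads to the “Invalid Input” branch. The destination names are hypothetical.

```python
import re

DICTIONARY = ("Sales", "Support", "Marketing")

# Hypothetical destinations for the "Transfer" components.
DESTINATIONS = {"Sales": "sales-queue", "Support": "support-queue"}
DEFAULT_DESTINATION = "marketing-queue"  # branch with the empty condition
RECEPTIONIST = "receptionist"            # "Invalid Input" branch


def match_dictionary(transcript, dictionary=DICTIONARY):
    """Return the first dictionary entry (in dictionary order) found as a word, else None."""
    words = re.findall(r"[a-z']+", transcript.lower())
    for entry in dictionary:
        if entry.lower() in words:
            return entry
    return None


def voice_input(attempts, max_retries=3):
    """Simulate the component: one transcript per attempt, '' meaning silence."""
    for transcript in attempts[:max_retries]:
        entry = match_dictionary(transcript)
        if entry is not None:
            return ("valid", entry)
    return ("invalid", None)


def handle_call(attempts):
    """Pick the transfer destination the call flow would end up using."""
    outcome, entry = voice_input(attempts)
    if outcome == "invalid":
        return RECEPTIONIST
    # Sales and Support have explicit conditions; anything else falls through
    # to the last branch, as with the empty Marketing condition.
    return DESTINATIONS.get(entry, DEFAULT_DESTINATION)
```

For example, `handle_call(["", "billing", "sales team"])` models a caller who stays silent, then says something outside the dictionary, and finally says “sales team” on the third attempt, which resolves to the Sales destination.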
Step 5: Build and Deploy to 3CX Phone System
The project is now ready to build and upload to the 3CX Phone System server:
- Select “Build” > “Build All” and the CFD generates the file “SpeechToTextDemo.zip”.
- Go to the “3CX Management Console” > “Advanced” > “Call Flow Apps” > “Add/Update”, and upload the file created by the CFD in the previous step.
- The Call Flow app is ready to use. Make a call to it to test the app. Note that the first text to speech conversion and the first speech recognition might have a short delay. This is caused by the authentication procedure and only happens the first time you call the app.
See Also
- Learn more about CFD components.
- Automated Telephone Ordering Voice app with CRM integration via the 3CX API.
- Sending emails from a CFD voice app.
- Routing Calls Based on the Time of Day.
- Using the Authentication Component to Validate Customers.
- Using the Credit Card Component.
- Text to Speech and Speech to Text with the 3CX Call Flow Designer.
- Using the Loop component to navigate upwards
- Registering and making callbacks
- Using the survey component
- Using the CRM Lookup component
- See how to integrate your PBX with a CRM via the 3CX API.
Last Updated
This document was last updated on 10th August 2021
https://www.3cx.com/docs/cfd-google-cloud-speech/