Time to Get Started with Chat and Voice

I have spent the last two and a half years of my career focusing on a technology that, back then, was easily dismissible. So much so that others at my work doubted we could build a successful business around it. At the time, chat bots had gained a somewhat notorious reputation for underwhelming users because of the bots’ limitations. From a technology implementation perspective, what was a clear attempt at providing narrow, but useful, conversational experiences became a target of Turing test ridicule. “No way this is AI,” they said. It was fair but misplaced criticism. In the past two years, chat bots have been gaining steam across the consumer and enterprise space. Bots are filling a real need.

Users who have a smartphone love their messaging apps. Look at the average user’s phone and you will find apps like WhatsApp, WeChat, Snapchat, Facebook Messenger and so on. You know what you will not find? A mobile app for a local mechanic or a local flower shop. Users, millennials especially, heavily prefer messaging to calling. Messaging is convenient and, importantly, asynchronous. If we interact with friends using messaging apps, why should we interact with businesses any differently? The writing is on the wall, and companies from Facebook to Twitter and Apple are on board.

Of equal relevance are digital assistants like Alexa, Cortana and Google Assistant. As these become more and more integrated with our daily activities, our expectations around communicating with computer agents using natural language become more ingrained. I just attended the VOICE 2018 conference in Newark, NJ. The stories shared around our interactions with voice assistants resonated, especially as they reflect real usage in our homes. For instance, children love Alexa. They love asking her all kinds of questions, watching fun videos and, most recently, playing games by using gadgets like Echo Buttons. Nursing homes and the elderly stand to benefit as well; there is something human about being able to speak to Alexa at any time, especially for those living alone. For everyone in between, it acts as an appointment assistant, a task tracker or a glorified kitchen timer. As we become accustomed to these voice interactions, expecting the same level of natural language comprehension with all kinds of computer agents will become second nature.

As one would expect, there is significant overlap between the technologies powering both chat bots and voice experiences. At the end of the day, a conversational experience is composed of a per-user state machine. An incoming user message gets distilled into an intent and an optional set of entities. The state machine takes three pieces as input (the user’s current state, the incoming intent and the entities) and transitions the user to the next state. For example, if I begin a conversation with a bot, I may be in a Begin state. If I say, “What is the current weather?”, the state machine would transition me to the CurrentWeather state, in which the right business logic to fetch the weather and generate a response would be executed. The collection of all these state transitions is the conversation. Natural Language Understanding (NLU) technologies such as Microsoft’s LUIS, Rasa NLU and Google’s Dialogflow, among many others, are the Narrow AI behind conversational experiences. There are also many options for developing the conversation engine that powers the state machine, such as Microsoft’s Bot Framework, Google’s Dialogflow, Amazon’s Lex, Watson Assistant and many others. Once we have an NLU system and a conversation engine, our last task is to build the business logic that responds to the combination of a user’s context and their input intents and entities.
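To make the per-user state machine concrete, here is a minimal sketch in Python. It is not how any of the products named above work internally. The keyword-matching toy_nlu function is a hypothetical stand-in for a real NLU service, and the names Conversation, TRANSITIONS, HANDLERS and the GetCurrentWeather intent are invented for illustration. Only the Begin and CurrentWeather states come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """An utterance distilled into an intent and optional entities."""
    intent: str
    entities: dict = field(default_factory=dict)


def toy_nlu(text: str) -> Message:
    """Hypothetical stand-in for an NLU service; plain keyword matching."""
    lowered = text.lower()
    if "weather" in lowered:
        entities = {}
        if " in " in lowered:
            # Treat everything after the last " in " as a location entity.
            start = lowered.rindex(" in ") + len(" in ")
            entities["location"] = text[start:].rstrip("?.! ")
        return Message("GetCurrentWeather", entities)
    return Message("None")


# (current state, intent) -> next state. Assumed names, for illustration only.
TRANSITIONS = {
    ("Begin", "GetCurrentWeather"): "CurrentWeather",
    ("CurrentWeather", "GetCurrentWeather"): "CurrentWeather",
}


def current_weather(entities: dict) -> str:
    """Placeholder business logic; a real bot would call a weather API here."""
    location = entities.get("location", "your area")
    return f"(placeholder) Fetching the current weather for {location}."


# State -> business logic that runs when a user enters that state.
HANDLERS = {"CurrentWeather": current_weather}


class Conversation:
    """Tracks one state per user and applies transitions to incoming text."""

    def __init__(self) -> None:
        self.user_states: dict = {}

    def handle(self, user_id: str, text: str) -> str:
        state = self.user_states.get(user_id, "Begin")
        message = toy_nlu(text)
        next_state = TRANSITIONS.get((state, message.intent))
        if next_state is None:
            # No matching transition: leave the user's state unchanged.
            return "Sorry, I did not understand that."
        self.user_states[user_id] = next_state
        return HANDLERS[next_state](message.entities)
```

Creating a Conversation and calling handle("user-1", "What is the current weather?") moves that user from Begin to CurrentWeather and returns the placeholder reply. An unrecognized message leaves the user’s state untouched. In a real bot, the NLU call and the transition table would typically live in whichever NLU service and conversation engine you choose, leaving only the handlers as your own code.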

The process of building voice and chat bots is very similar across the different tools. Many approaches leave the NLU and conversation engine piece in the cloud and only call into your business logic as necessary. In my book, Practical Bot Development: Designing and Building Bots with Node.js and Microsoft Bot Framework, I make the explicit choice of using Microsoft’s Bot Framework, one of the more flexible options in the market and one that my team has used across more than a dozen production bots. Microsoft’s approach gives developers the flexibility to implement their own conversation engine logic and thus is a great teaching tool. In the book, we make the journey from developing simple bots connected to Facebook Messenger to powering a Twilio phone conversation or Alexa skill using the same technology. We integrate with Google’s OAuth and connect a chat bot to Google’s Calendar API. We discuss the ins and outs of NLU using LUIS, Adaptive Cards, dynamic graphics generation, human handover, bot analytics and many other topics. The goal is to excite and equip developers with the skills to build fun and impactful conversational experiences!

This is where it gets interesting: once we have the skills to build conversational experiences, what then? The truth is that this is still a new space, and we are still learning what it takes to build a truly engaging chat bot or voice skill. So much so that the technology to build these experiences is evolving at a breakneck pace. Although frustrating when writing a book, this should excite you! We know so little about this new way of interacting that the platforms are constantly improving the ways in which we communicate with users. The space needs innovators and forward-looking developers willing to showcase new and experimental applications that make users’ lives easier. There is no better time to jump into this space than now. Join us!