Multi Language Chat Bot Suggested Architecture

Natural conversations, by their very nature, allow for the flexibility of switching language mid-conversation. In fact, for multilingual individuals such as my brothers and me, switching between languages allows us to emphasize certain concepts without explicitly stating so. We generally speak in Polish (English if our wives are present), with English to fill in words we don’t know in Polish and Spanish to provide emphasis or a callback to something that happened during our childhood in Puerto Rico. Chat bots, in their current state without Artificial General Intelligence, do not allow for this nuance of language choice. However, given the state of language recognition and machine translation, we can implement a somewhat intelligent multilingual chat bot. In fact, I design and develop the code for an automated approach in my book. In this post, I outline the general automatic approach. Afterwards, I highlight its downsides and list the problems that need to be solved when creating a production-quality multi language chat bot experience.

A Naive Approach

I call the fully automated approach naive. This is the type of approach most projects start off with. It’s fairly easy to put in place and moves the project into the multilingual realm quickly. It also comes with its own set of challenges. Before I dive into those, let’s review the approach. Assuming we have a working English natural language model and English content, the bot can implement multilingual conversations as follows.

  1. Receive user input…
  2. … in their native language.
  3. Detect the user input language and store it in the user’s preferences.
  4. If the incoming message is not in English, translate it into English.
  5. Send English user utterance to NLU platform.
  6. Execute logic and render English output.
  7. If the user’s language is not English, translate the output into the user’s native language.
  8. Send response back to user.
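
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the code from my book: the `detect`, `translate` and `nlu` callables and the `NaiveBot` class are hypothetical stand-ins for whatever language detection, machine translation and NLU services the bot actually wraps.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical service signatures; real detection, translation and NLU
# clients would be wrapped to match them.
DetectFn = Callable[[str], str]               # text -> language code
TranslateFn = Callable[[str, str, str], str]  # (text, source, target) -> text
NluFn = Callable[[str], str]                  # English utterance -> intent name


@dataclass
class NaiveBot:
    detect: DetectFn
    translate: TranslateFn
    nlu: NluFn
    responses: Dict[str, str]                 # intent name -> English response
    preferences: Dict[str, str] = field(default_factory=dict)

    def handle(self, user_id: str, text: str) -> str:
        # Step 3: detect the language and store it per user.
        lang = self.detect(text)
        self.preferences[user_id] = lang
        # Step 4: translate non-English input into English.
        english = text if lang == "en" else self.translate(text, lang, "en")
        # Steps 5-6: resolve the intent and render the English output.
        intent = self.nlu(english)
        reply = self.responses.get(intent, "Sorry, I did not understand that.")
        # Step 7: translate the output back into the user's language.
        if lang != "en":
            reply = self.translate(reply, "en", lang)
        # Step 8: hand the response back to the channel.
        return reply
```

Note that the language is re-detected on every message, which is exactly what makes this version vulnerable to short, ambiguous utterances.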

This approach works but the conversation quality is off. Although machine translation has improved by leaps and bounds, there are still cases in which the conversation feels stiff and culturally disconnected. There are three areas where this approach suffers.

  • Input utterance cultural nuances: utterance translation can sometimes feel awkward, especially for heavy slang or for highly proprietary language. NLU model performance suffers as a result.
  • Ambiguous-language utterances affect conversation flow: a word like no or mama can easily switch the conversation into another language. For example, in some language detection engines, the word no gets consistently classified as Spanish. If the bot were to ask a yes/no question, answering no would trigger a response in Spanish.
  • Output translation branding quality: although automatic machine translation is a good start, companies and brands that want fine-tuned control over their bot’s output will cringe at the output generated by the machine translation service.

Moving to a Hybrid Managed Approach

I address each issue separately. The answers to these problems vary based on risk aversion, content quality and available resources. I highlight options for each item as we go.

Multi Language NLU

Ideally, I like my chat bot solutions to have an NLU model for each supported language. Obviously, the cost of creating and maintaining these models can be significant. For multi language solutions, I always ask for the highest priority languages that a client would like to support. If an enterprise can support 90% of employees by getting two languages working well, then we can limit the NLU scope to those two languages, while using the automatic approach for any other languages. In many of my projects, I use Microsoft’s LUIS. I might create one model for English and another one for Simplified Chinese. That way, Chinese users don’t pay the translation tax of lost nuance. Project stakeholders also need to decide whether the chat bot should support an arbitrary number of languages or limit the valid inputs to languages with an NLU model. If it supports arbitrary languages, the automatic approach above is applied to the languages without a native model.
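
As a rough sketch of that routing decision, assume each NLU model is wrapped in a plain callable keyed by language code. The `resolve_intent` function, its parameters and the `allow_arbitrary` flag are names I made up for illustration; they are not part of LUIS or any other SDK.

```python
from typing import Callable, Dict, Optional

NluFn = Callable[[str], str]                  # utterance -> intent name
TranslateFn = Callable[[str, str, str], str]  # (text, source, target) -> text


def resolve_intent(
    text: str,
    lang: str,
    native_models: Dict[str, NluFn],
    translate: TranslateFn,
    fallback_lang: str = "en",
    allow_arbitrary: bool = True,
) -> Optional[str]:
    """Use a native NLU model when one exists, else fall back to translation."""
    model = native_models.get(lang)
    if model is not None:
        # Managed path: no translation, so no nuance is lost.
        return model(text)
    if not allow_arbitrary:
        # Closed set of languages: signal that this one is not supported.
        return None
    # Automatic path: translate, then query the fallback (English) model.
    translated = translate(text, lang, fallback_lang)
    return native_models[fallback_lang](translated)
```

With an English and a Simplified Chinese model registered, a Chinese utterance goes straight to its own model, while a Spanish one is translated and sent to the English model, or rejected if the stakeholders chose a closed set.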

Ambiguous Language Detection

The issue with ambiguous language detection is that short utterances may be valid utterances in multiple languages. Further complicating the matter, translation APIs such as Microsoft’s and Google’s do not return options and confidence levels. There are numerous approaches to resolving the ambiguous language problem. Two possible approaches are (1) run a concatenation of the last N user utterances through the language recognition engine, or (2) maintain a list of ambiguous words that we ignore for language detection and use the language of the user’s last utterance instead. Both are different flavors of simply treating the user’s language preference as a conversation-level rather than message-level property. If we are interested in supporting switching between languages mid-conversation, a mix of both approaches works well.
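
Here is one way the mix of both approaches might look. It is a sketch under assumptions: `detect` is a hypothetical wrapper around the language detection API, the ambiguous word list holds only the two examples from this post, and the window size of 3 is arbitrary.

```python
from collections import deque
from typing import Callable, Deque, FrozenSet


class ConversationLanguage:
    """Tracks one user's language at the conversation level."""

    def __init__(
        self,
        detect: Callable[[str], str],  # hypothetical: text -> language code
        ambiguous: FrozenSet[str] = frozenset({"no", "mama"}),
        window: int = 3,
        default: str = "en",
    ) -> None:
        self.detect = detect
        self.ambiguous = ambiguous
        self.recent: Deque[str] = deque(maxlen=window)  # last N utterances
        self.current = default

    def update(self, text: str) -> str:
        # Approach 2: ambiguous words never change the language.
        normalized = text.strip().lower().strip(".,!?¿¡ ")
        if normalized in self.ambiguous:
            return self.current
        # Approach 1: detect on the concatenation of the last N utterances.
        self.recent.append(text)
        self.current = self.detect(" ".join(self.recent))
        return self.current
```

The window size is the tuning knob: a larger N makes detection more stable, but a user who genuinely switches languages has to send more messages before the bot follows.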

Output Content Translation

As with the Multi Language NLU piece, I encourage clients to maintain the precise localized content sent by the chat bot, especially for public consumer or regulated industry use cases where any mistranslated content might result in either pain for a brand or fines. This, again, is a risk versus effort calculation that needs to be performed by the right stakeholders. The need to control localized content, and the effort involved in doing so, typically weighs on whether the bot supports arbitrary languages or not.
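
A simple way to picture managed content is a catalog keyed by message and language, with machine translation as an optional fallback. The `render` function, the catalog layout and the `allow_machine_translation` flag below are illustrative assumptions, not a prescribed format.

```python
from typing import Callable, Dict

TranslateFn = Callable[[str, str, str], str]  # (text, source, target) -> text
Catalog = Dict[str, Dict[str, str]]           # message id -> language -> text


def render(
    message_id: str,
    lang: str,
    catalog: Catalog,
    translate: TranslateFn,
    allow_machine_translation: bool = True,
    default_lang: str = "en",
) -> str:
    """Prefer human-maintained copy; machine-translate only when permitted."""
    variants = catalog[message_id]
    if lang in variants:
        # Managed path: the brand controls this exact wording.
        return variants[lang]
    base = variants[default_lang]
    if allow_machine_translation:
        # Automatic path: acceptable for lower-risk use cases.
        return translate(base, default_lang, lang)
    # Risk-averse path: show reviewed default-language copy instead.
    return base
```

A regulated client would likely ship with `allow_machine_translation=False` and grow the catalog language by language.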

Final Architecture

Based on all the above, here is what a more complete approach to a multilingual chat bot experience would look like.

The bot in this case:

  1. Receives user input…
  2. … in their native language.
  3. Detects the user input language and stores it in the user’s preferences. Language detection is based both on an API and on utterance ambiguity rules.
  4. Depending on the detected language…
    1. If we have an NLU model for the detected language, the bot queries that NLU model.
    2. If not, assuming we want to support all languages, the bot translates the user’s messages into English and uses the English NLU model to resolve intent. Assuming we want to support a closed set of languages, the bot may respond with a not recognized kind of message.
  5. Executes the chat bot logic and renders localized output.
  6. If the user’s language is not English and our bot supports arbitrary languages, the bot automatically translates the output into the user’s native language.
  7. Sends response back to user.
  7. Sends response back to user.
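
Put together, the flow could be sketched roughly as follows. Every name here is hypothetical: `HybridBot`, its fields and the `"fallback"` content key exist only for this illustration, and `detect_language` is assumed to already apply the conversation-level ambiguity rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class HybridBot:
    detect_language: Callable[[str, str], str]   # (user id, text) -> language
    nlu_models: Dict[str, Callable[[str], str]]  # language -> NLU model
    content: Dict[str, Dict[str, str]]           # intent -> language -> text
    translate: Callable[[str, str, str], str]    # (text, source, target)
    support_arbitrary_languages: bool = True
    not_supported_message: str = "Sorry, I do not support that language yet."

    def handle(self, user_id: str, text: str) -> str:
        # Step 3: conversation-aware language detection.
        lang = self.detect_language(user_id, text)
        # Step 4: native NLU model if available, else translate or refuse.
        model = self.nlu_models.get(lang)
        if model is not None:
            intent = model(text)
        elif self.support_arbitrary_languages:
            intent = self.nlu_models["en"](self.translate(text, lang, "en"))
        else:
            return self.not_supported_message
        # Step 5: run the bot logic (here a lookup) and render managed copy.
        variants = self.content.get(intent, self.content["fallback"])
        if lang in variants:
            return variants[lang]
        # Step 6: no managed copy, so translate the English output if allowed.
        reply = variants["en"]
        if self.support_arbitrary_languages:
            reply = self.translate(reply, "en", lang)
        # Step 7: return the response to the channel.
        return reply
```

Flipping `support_arbitrary_languages` is what moves a deployment between the managed and automatic behaviors for languages without their own model or content.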

The managed models and paths to automatic translation add nuance to the automatic approach. If we imagine a spectrum in which on one end we find the fully automatic approach and on the other end the fully managed approach, all implementations fall somewhere within this spectrum. Clients in regulated industries and heavily branded scenarios will lean towards the fully managed end and clients with internal or less precise use cases will typically find the automatic approach more effective and economical.

The hybrid managed/automatic implementation does take some effort but results in the best conversational experience. Let me know your experience!