What Are Intelligent Apps?

The Enterprise world is chock-full of marketing terminology attempting to capture the latest trends and hype. That is not to minimize the terms or imply that it is all hype; they do capture some important trends and patterns in the industry. The term Modern Apps is an example. At the time, it was easy to ask: well, if we aren’t developing modern apps, what are we developing? Good question. Old-school apps were monolithic logic hogs. Much of the business logic executed on the app itself, and the app was most likely designed solely for a desktop experience. Modern Apps treat web, desktop and mobile as first-class citizens, a nod to the proliferation of non-desktop devices in our lives, and take heavy advantage of server-side (read: cloud-based) logic.

Recently, a new term has crossed my path: Intelligent Apps. Where did this come from and what does it even mean?

An Abbreviated History

In recent years we have seen a huge uptick in interest in applied Deep Learning algorithms. With tools like TensorFlow, any developer can write custom Machine Learning algorithms to perform tasks such as image classification or even create components of a self-driving car. Part of why Deep Learning has become so prevalent is that the big tech companies have amassed tremendous experience managing and manipulating large sets of data. Given this scale, research and development efforts have been able to explore new Artificial Neural Network architectures that perform very well on certain tasks (see ConvNets for image classification).

Not only are developers equipped with great tools to create their own neural network architectures and algorithms, but the big tech companies have created REST APIs that implement some of the most common algorithms such as NLU (Natural Language Understanding), speech recognition, machine translation and sentiment analysis, to name a few. These APIs have been honed by years of research, and some have been integrated into the companies’ products. Facebook, Google, Amazon and Microsoft products are full of Machine Learning-based features. Facebook recognizes faces in images and recommends tagging friends in photos. Google’s Gmail makes response recommendations and recently added autocomplete functionality. Google’s Translate app uses machine learning for speech recognition, image recognition and machine translation. Amazon’s Alexa brings machine learning into the home, most obviously through speech recognition and NLU. Apple’s iOS includes many machine learning features, even the ability to integrate trained models into an iOS app via CoreML. This also applies to Enterprise Software. Microsoft’s Dynamics 365 CRM, for example, provides numerous insights based on historical data.

Tech companies are investing in this space at a high rate, pouring research and development into neural network architectures, model efficiency and development tooling. It’s an arms race! Because the techniques and algorithms are so readily available, integrating them into an application is no longer optional but a must-have for enhancing customer experiences and optimizing business decisions.

Enter Intelligent Apps

An Intelligent App is, then, any application that utilizes machine learning techniques, especially cognitive ones that mimic human intelligence tasks, to help users accomplish their goals more efficiently. The laziest example that usually comes to mind is chat bots, but that is a very limiting view. Let’s start with another example: computer vision. Computer vision is an incredibly fast-growing field. Think about inspecting water treatment facilities using drones and machine learning. Think about analyzing call center logs using speech recognition and natural language understanding. Think of the value of having up-to-date information about the sentiment of each customer’s phone call. All of these are possibilities and the future of our applications.

Part of the reason I wrote my book was not only to educate others on how to build chat bots, but also to teach developers how to begin using LUIS to build any NLU-enabled application. In addition, I dedicate an entire chapter to integrating chat bots with Microsoft’s Cognitive Services. I believe the end result is a robust introduction to creating intelligent applications through the lens of chat bots and digital assistants. There is really no better time to learn about this new category of apps than now. You can find Practical Bot Development on the Apress site here or wherever Apress books are sold.

In parallel, developers should become familiar with how machine learning works in general. The brave new AI world may not mean that each of us becomes a professional data scientist, but a deeper understanding of machine learning goes a long way toward building the right context and frame of mind for building these apps. The Chairman of Nokia certainly thinks so. As for myself, I took a number of data-science-in-Python courses on DataCamp to become more familiar with the many topics in this space. This kind of knowledge will pay dividends when developers need to integrate an application with an existing cognitive service or an in-house-developed deep learning model.

Posted by Szymon in General

Introducing Luis Version Tools

Microsoft’s LUIS is a super cool NLU (Natural Language Understanding) platform that our teams at Insight have been using on projects for over two years, since it was in preview and supported a maximum of 10 intents per application. Since then, LUIS has come a really long way in terms of features, performance, automation and governance. Last May, at Build, Microsoft announced the Bot Builder Tools, a set of mostly Node-based scripts that allow for easy scripting of the Azure Bot Service, QnA Maker and LUIS, as well as provide facilities to help author LUIS models (ludown) or front a set of child LUIS models with a dispatch model (dispatch). Ludown quickly became one of my favorite tools, and I was fortunate enough to help the team by reporting bugs and requesting feature enhancements. That work inspired much of what I present below.

The Problems with Authoring LUIS

LUIS allows users to create Applications. An Application can have one or more Versions and has two deployment slots: staging and production. A Version is a collection of intents and entities; it can be trained and published into either deployment slot. Version A can be published to the production slot for production apps to utilize, and Version B can be published to the staging slot for development/test applications to use. Any Version can be exported into JSON.

There are a number of issues that accumulate:

  • The LUIS JSON is very verbose.
  • There is no audit log; one cannot tell who changed what and when.
  • There is no easy way to tell the difference between two versions.
  • There is no clear direction on when to create a new version or any version naming conventions.
  • An Application having two slots is limiting. For many apps, there are more than two environments. In these cases, the two-slot model fails.

Microsoft provides the Authoring API (also accessible via the luis-apis package). The functionality provided by the API is the beginning of how we solve these problems.

Introducing luis-version

I created two tools to help fill the gap and provide easy automation for model and version management right within source control. The tools are based on ludown and luis-apis. We assume that the entire contents of a LUIS app are managed within a ludown file, which we call model.lu by default. The first tool is luis-version.

The goal of luis-version is to generate a new LUIS app version name if the contents of model.lu have changed since the last version. This is tracked via a file called .luis-app-version that sits within source control. The version name is the current UTC date formatted as YYMMDDHHmm; a new version generated right now would be named 1810271732. This might seem cryptic, but LUIS limits version names to 10 characters, so that’s what we ended up with for now. If a second developer runs luis-version with the same model.lu file and the same .luis-app-version in their working directory, luis-version recognizes that the hash is unchanged and reuses the version name.
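
Here is a minimal sketch of how that hash/tag pairing might be computed in Node. This is illustrative only, not the actual luis-version source; the function name and file handling are assumptions.


import * as crypto from "crypto";
import * as fs from "fs";

// Illustrative: compute a short content hash for model.lu and a YYMMDDHHmm tag,
// reusing the stored tag when the hash has not changed.
function resolveVersion(modelPath: string, trackerPath = ".luis-app-version"): { tag: string, hash: string } {
    const content = fs.readFileSync(modelPath, "utf8");
    const hash = crypto.createHash("sha256").update(content).digest("hex").substring(0, 8);

    if (fs.existsSync(trackerPath)) {
        const previous = JSON.parse(fs.readFileSync(trackerPath, "utf8"));
        if (previous.hash === hash) {
            return previous; // same content: reuse the old version name
        }
    }

    // current UTC date formatted as YYMMDDHHmm, e.g. 1810271732 (LUIS caps version names at 10 chars)
    const now = new Date();
    const pad = (n: number) => n.toString().padStart(2, "0");
    const tag = `${pad(now.getUTCFullYear() % 100)}${pad(now.getUTCMonth() + 1)}${pad(now.getUTCDate())}` +
        `${pad(now.getUTCHours())}${pad(now.getUTCMinutes())}`;

    const result = { tag, hash };
    fs.writeFileSync(trackerPath, JSON.stringify(result, null, 2));
    return result;
}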

After determining the version name, luis-version ensures that the version exists. If it does not, it creates it. It runs model.lu through ludown to generate the LUIS JSON and calls luis import version. You can pass an optional --publish flag to ensure that the new version is immediately published.
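
To make the flow concrete, here is a rough sketch of the ensure-and-publish sequence. The helper functions are hypothetical stand-ins for shelling out to ludown and the luis CLI; they are not part of the actual tool.


// Hypothetical wrappers around ludown and the luis CLI (e.g. invoked via child_process).
declare function versionExists(luisrc: string, tag: string): Promise<boolean>;
declare function ludownParse(modelFile: string): Promise<string>;      // ludown -> LUIS JSON
declare function importVersion(luisrc: string, tag: string, luisJson: string): Promise<void>;
declare function trainVersion(luisrc: string, tag: string): Promise<void>;
declare function publishVersion(luisrc: string, tag: string): Promise<void>;

async function ensureVersion(luisrc: string, modelFile: string, publish: boolean): Promise<void> {
    const { tag } = resolveVersion(modelFile);          // hash/tag logic from the sketch above
    if (!(await versionExists(luisrc, tag))) {
        const luisJson = await ludownParse(modelFile);  // run model.lu through ludown
        await importVersion(luisrc, tag, luisJson);     // import the generated JSON as a new version
    }
    await trainVersion(luisrc, tag);
    if (publish) {
        await publishVersion(luisrc, tag);              // only when --publish is passed
    }
}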

luis-version obeys the rules of .luisrc as documented here. In addition, you can pass a different luisrc file by using the --luisrc parameter. For example: luis-version --model model.lu --luisrc .luisrc.prod --publish.

And one can add the --verbose flag to see exactly what the utility is doing underneath.

Sample Walkthrough

Let’s say we have a LUIS model we want to manage in source control. We are familiar with the ludown format and use this sample to get started. We will call this file model.lu. Next, we create three new .luisrc files. The contents of each one are similar except for the appId. These are the three LUIS apps representing the dev, test and production environments. In my case, the authoringKey is the same for all three environments. I left the real appIds in here, though I have since deleted the applications.


.luisrc
{
  "authoringKey": "",
  "region": "westus",
  "appId": "c83b0094-8c19-4d73-bd91-689d91ccfd8c",
  "endpointBasePath": "https://westus.api.cognitive.microsoft.com/luis/api/v2.0"
}

.luisrc.test
{
  "authoringKey": "",
  "region": "westus",
  "appId": "8650ff9f-6be5-4b28-87ad-a61053d8dbdc",
  "endpointBasePath": "https://westus.api.cognitive.microsoft.com/luis/api/v2.0"
}

.luisrc.prod
{
  "authoringKey": "",
  "region": "westus",
  "appId": "d920f225-6a80-4857-bb11-baf73416ec96",
  "endpointBasePath": "https://westus.api.cognitive.microsoft.com/luis/api/v2.0"
}

I can now run the following command to deploy the latest model data into the development LUIS app.

luis-version --model model.lu --luisrc .luisrc --publish

Getting app id c83b0094-8c19-4d73-bd91-689d91ccfd8c...
Calculating hash...
Hash 59c0db24 and tag 1810271806 generated
Checking if version 1810271806 exists...
Version 1810271806 doesn't exist. Continuing...
Running ludown.
Importing version 1810271806...
Version 1810271806 imported.
Training 1810271806...
Done training ...
Publishing...
All done.

Note that the script created a file called .luis-app-version that contains the latest hash/version name based on the model.lu content. In this case, the file contents match the output hash/version.


{
  "tag": "1810271806",
  "hash": "59c0db24"
}

If we look at our dev LUIS application, we will note that we have created a new version that is published into the production slot. We can deploy the same model to the test and prod apps using the following commands.

luis-version --model model.lu --luisrc .luisrc.test --publish
luis-version --model model.lu --luisrc .luisrc.prod --publish

If the version already exists, it is simply retrained and, if the --publish flag is passed, published. If the .luis-app-version file exists with the same hash, the old version name is reused, as shown by this output.

Getting app id c83b0094-8c19-4d73-bd91-689d91ccfd8c...
Calculating hash...
Hash 59c0db24 and tag 1810271809 generated
Found old version with hash 59c0db24. Using version 1810271806
Checking if version 1810271806 exists...
Version 1810271806 exists...
Version exists. Not updating...
Training 1810271806...
Done training ...
Publishing...
All done.

If we were now to modify the model.lu file and run the same command, the script would create a new version and publish it.

Getting app id c83b0094-8c19-4d73-bd91-689d91ccfd8c...
Calculating hash...
Hash 5bc3e294 and tag 1810271809 generated
Checking if version 1810271809 exists...
Version 1810271809 doesn't exist. Continuing...
Running ludown.
Importing version 1810271809...
Version 1810271809 imported.
Training 1810271809...
Done training ...
Publishing...
All done.

Running luis list versions should result in three versions.


[
  {
    "version": "1.0",
    "createdDateTime": "2018-10-27T14:22:51.000Z",
    "lastModifiedDateTime": "2018-10-27T14:22:51.000Z",
    "lastTrainedDateTime": null,
    "lastPublishedDateTime": null,
    "endpointUrl": null,
    "assignedEndpointKey": null,
    "externalApiKeys": null,
    "intentsCount": 1,
    "entitiesCount": 0,
    "endpointHitsCount": 0,
    "trainingStatus": "NeedsTraining"
  },
  {
    "version": "1810271806",
    "createdDateTime": "2018-10-27T18:06:18.000Z",
    "lastModifiedDateTime": "2018-10-27T18:06:30.000Z",
    "lastTrainedDateTime": "2018-10-27T18:08:59.000Z",
    "lastPublishedDateTime": "2018-10-27T18:09:09.000Z",
    "endpointUrl": null,
    "assignedEndpointKey": null,
    "externalApiKeys": null,
    "intentsCount": 7,
    "entitiesCount": 3,
    "endpointHitsCount": 0,
    "trainingStatus": "Trained"
  },
  {
    "version": "1810271809",
    "createdDateTime": "2018-10-27T18:09:51.000Z",
    "lastModifiedDateTime": "2018-10-27T18:10:13.000Z",
    "lastTrainedDateTime": "2018-10-27T18:10:08.000Z",
    "lastPublishedDateTime": "2018-10-27T18:10:19.000Z",
    "endpointUrl": null,
    "assignedEndpointKey": null,
    "externalApiKeys": null,
    "intentsCount": 7,
    "entitiesCount": 3,
    "endpointHitsCount": 0,
    "trainingStatus": "Trained"
  }
]

Supporting Manual or Web App Editing

Not all users will be happy editing the model in a ludown file. Some team members might still want to use the web app UI to iterate, test and make sure the model is working correctly. That is fine. The second tool in the luis-version-tools NPM package is luis-lu-export. This script downloads the latest model and writes it to the destination file of your choice. For example, we can run the following command to get the latest online version.

luis-lu-export --luisrc .luisrc --model model.lu --version 1810271809

Any edits made online will be applied to model.lu. Before checking into source control, we can run luis-version --luisrc .luisrc --model model.lu --publish to ensure the .luis-app-version file is re-generated based on the latest model content and a new version is created. Then we can check all changes into source control.

In my experience, this manual editing of the model using the web app should only be allowed in the development version of the model. Test, QA, Prod, Integration, and all other environments should be generated directly from a ludown file.

What’s Next?

The scripts are in a state where they can be integrated into DevOps pipelines. Go ahead and submit feature requests and bug reports on GitHub. I’m very interested in how developers end up using the tools and in feedback on the approach. NPM package details here.

Posted by Szymon in LUIS Version Tools

Multi Language Chat Bot Suggested Architecture

Natural conversations, by their very nature, allow for the flexibility of switching language mid-conversation. In fact, for multilingual individuals such as my brothers and me, switching between languages allows us to emphasize certain concepts without explicitly stating so. We generally speak in Polish (English if our wives are present), English to fill in words we don’t know in Polish and Spanish to provide emphasis or a callback to something that happened in our childhood growing up in Puerto Rico. Chat bots, in their current state without Artificial General Intelligence, do not allow for the nuance of language choice. However, given the state of language recognition and machine translation, we can implement a somewhat intelligent multilingual chat bot. In fact, I design and develop the code for an automated approach in my book. In this post, I outline the general automatic approach, then highlight its downsides and list the different problems that need to be solved when creating a production-quality multi-language chat bot experience.

A Naive Approach

I call the fully automated approach naive. This is the type of approach most projects start with. It’s somewhat easy to put in place and moves the project into the multilingual realm quite quickly, but it comes with its own set of challenges. Before I dive into those, let’s review the approach. Assuming we have a working English natural language model and English content, the bot can implement multilingual conversations as follows (a code sketch follows the list).

  1. Receive user input…
  2. … in their native language.
  3. Detect the user input language and store it in the user’s preferences.
  4. If the incoming message is not English, translate it into English.
  5. Send the English user utterance to the NLU platform.
  6. Execute the bot logic and render English output.
  7. If the user’s language is not English, translate the output into the user’s native language.
  8. Send the response back to the user.
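
Here is a compact sketch of that loop. The LanguageServices interface and renderEnglishReply function are hypothetical placeholders; a real implementation would plug in its own detection, translation and NLU SDK calls.


// Hypothetical service interface; substitute your detection/translation/NLU SDKs.
interface LanguageServices {
    detectLanguage(text: string): Promise<string>;                        // e.g. "es"
    translate(text: string, from: string, to: string): Promise<string>;
    recognizeIntent(englishUtterance: string): Promise<{ intent: string; entities: any }>;
}

// Hypothetical bot logic that renders an English reply for a resolved intent.
declare function renderEnglishReply(intent: string, entities: any): string;

async function handleTurn(svc: LanguageServices, userText: string, userPrefs: { language?: string }): Promise<string> {
    // Steps 1-3: detect the language and remember it on the user profile.
    const lang = await svc.detectLanguage(userText);
    userPrefs.language = lang;

    // Step 4: normalize everything to English before hitting the NLU model.
    const englishText = lang === "en" ? userText : await svc.translate(userText, lang, "en");

    // Steps 5-6: resolve intent and render the English response.
    const { intent, entities } = await svc.recognizeIntent(englishText);
    const englishReply = renderEnglishReply(intent, entities);

    // Steps 7-8: translate the reply back into the user's language if needed.
    return lang === "en" ? englishReply : await svc.translate(englishReply, "en", lang);
}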

This approach works but the conversation quality is off. Although machine translation has improved by leaps and bounds, there are still cases in which the conversation feels stiff and culturally disconnected. There are three areas where this approach suffers.

  • Input utterance cultural nuances: utterance translation can sometimes feel awkward, especially for heavy slang or highly proprietary language. NLU model performance suffers as a result.
  • Ambiguous language utterances affect conversation flow: a word like no or mama can easily flip the conversation into another language. For example, some language detection engines consistently classify the word no as Spanish. If the bot asks a yes/no question, answering no would trigger a response in Spanish.
  • Output translation branding quality: although automatic machine translation is a good start, companies and brands that want fine-tuned control over their bot’s output will cringe at the text generated by the machine translation service.

Moving to a Hybrid Managed Approach

I address each issue separately. The answers to these problems vary based on risk aversion, content quality requirements and available resources. I highlight options for each item below.

Multi Language NLU

Ideally, I like my chat bot solutions to have an NLU model for each supported language. Obviously, the cost of creating and maintaining these models can be significant. For multi-language solutions, I always ask for the highest-priority languages that a client would like to support. If an enterprise can support 90% of employees by getting two languages working well, then we can limit the NLU scope to those two languages while using the automatic approach for any other languages. In many of my projects, I use Microsoft’s LUIS. I might create one model for English and another for Simplified Chinese; that way, Chinese users don’t pay the translation tax. Project stakeholders also need to decide whether the chat bot should support an arbitrary number of languages or limit valid inputs to languages with an NLU model. If it supports arbitrary languages, the automatic approach above is applied to languages without a native model.
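
One lightweight way to wire this up is a per-language model map with a fallback to the translate-then-English path. The configuration shape and the queryLuis helper are illustrative assumptions, and the sketch reuses the LanguageServices interface from the earlier example.


// Illustrative configuration: one LUIS app per natively supported language.
const nluAppsByLanguage: { [lang: string]: { appId: string; endpoint: string } } = {
    "en": { appId: "<english-app-id>", endpoint: "<endpoint>" },
    "zh-Hans": { appId: "<simplified-chinese-app-id>", endpoint: "<endpoint>" }
};

// Hypothetical LUIS query helper.
declare function queryLuis(model: { appId: string; endpoint: string }, utterance: string): Promise<{ intent: string }>;

async function resolveIntent(svc: LanguageServices, userText: string, lang: string): Promise<{ intent: string }> {
    const nativeModel = nluAppsByLanguage[lang];
    if (nativeModel) {
        // Query the language-specific model directly; no translation tax.
        return queryLuis(nativeModel, userText);
    }
    // Fall back to the automatic approach: translate to English first.
    const englishText = await svc.translate(userText, lang, "en");
    return queryLuis(nluAppsByLanguage["en"], englishText);
}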

Ambiguous Language Detection

The issue with ambiguous language detection is that short utterances may be valid in multiple languages. Further complicating the matter, translation APIs such as Microsoft’s and Google’s do not return alternatives with confidence levels. There are numerous approaches to resolving the ambiguous-language problem. Two possible approaches are (1) run a concatenation of the last N user utterances through the language recognition engine, or (2) maintain a list of ambiguous words that we ignore for language detection and use the language of the user’s last utterance instead. Both are different flavors of treating the user’s language preference as a conversation-level rather than a message-level property. If we are interested in supporting switching between languages mid-conversation, a mix of both approaches works well.
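
A rough sketch of combining the two approaches follows: an ambiguous-word list short-circuits detection, and a small rolling window of recent utterances gives the detector more signal. The word list and window size are assumptions to tune per project.


// Words that language detection engines tend to misclassify on their own.
const ambiguousWords = new Set(["no", "mama"]);

async function detectConversationLanguage(
    svc: LanguageServices,
    utterance: string,
    recentUtterances: string[],       // last few user messages, newest last
    lastDetectedLanguage: string): Promise<string> {

    const normalized = utterance.trim().toLowerCase();
    if (ambiguousWords.has(normalized)) {
        // Too short and ambiguous to trust the API; keep the conversation-level language.
        return lastDetectedLanguage;
    }

    // Concatenate the last few utterances to give the detector more context.
    const window = [...recentUtterances.slice(-3), utterance].join(" ");
    return svc.detectLanguage(window);
}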

Output Content Translation

Similar to the multi-language NLU piece, I encourage clients to maintain precise localized content sent by the chat bot, especially for public consumer or regulated-industry use cases where mistranslated content might result in pain for a brand or in fines. This, again, is a risk-versus-effort calculation that needs to be performed by the right stakeholders. The necessity of controlling localized content, and the effort involved in it, typically weighs on whether the bot supports arbitrary languages or not.

Final Architecture

Based on all the above, here is what a more complete approach to a multilingual chat bot experience looks like.

The bot in this case:

  1. Receives user input…
  2. … in their native language.
  3. Detects the user input language and stores it in the user’s preferences. Language detection is based on an API as well as on utterance ambiguity rules.
  4. Depending on the detected language…
    1. If we have an NLU model for the detected language, the bot queries that NLU model.
    2. If not, and we want to support all languages, the bot translates the user’s message into English and uses the English NLU model to resolve intent. If we want to support only a closed set of languages, the bot may respond with a not-recognized kind of message.
  5. Executes the chat bot logic and renders localized output.
  6. If the user’s language is not English and the bot supports arbitrary languages, the bot automatically translates the output into the user’s native language.
  7. Sends the response back to the user.

The managed models and paths to automatic translation add nuance to the automatic approach. If we imagine a spectrum in which on one end we find the fully automatic approach and on the other end the fully managed approach, all implementations fall somewhere within this spectrum. Clients in regulated industries and heavily branded scenarios will lean towards the fully managed end and clients with internal or less precise use cases will typically find the automatic approach more effective and economical.

The hybrid managed/automatic implementation does take some effort but results in the best conversational experience. Let me know your experience!

 

Posted by Szymon in Bots

Alexa Gadget Skill API: Let’s Make a Game

In this post of the Alexa Gadget Skill API series, we create a real game for Alexa and Echo Buttons. I figured I would create a game for my 18-month-old to play with; he can tell the difference between lit and unlit buttons. With this in mind, I created a game I call Whack-a-button. The game randomly lights up one or two buttons at a time, and the object is to press the lit buttons. Every time you press a lit button you gain a point; if you press an unlit button, you lose a point.

In the first two parts of this series we explored input handlers and setting light animations on the gadgets. We will pick up the work done in those two posts to create the Whack-a-button game.

Setting the Scene

It took me some time to build the code for this post. Getting something basic up and going was simple. Even now, I feel there are some rough edges to this. In particular, I had some problems with the input event handlers and the data I was receiving from them. This is due to one of two factors: either there is a bug in the simulator version of the input handlers or my input handler JSON is buggy. I’ll give more details when we get to that part.

My goal with this post was to create a framework in which the user can ask to play a specific game, but there is one skill that supports all of these games. I assumed that we would have a game object in the session attributes that stores the type of game and its current state. Each game would have a different state representation. When a game is started, all input calls are delegated to that game’s code. The code decides what to do given its internal state, and it decides when the game ends. At that point, we can kick the user back to a menu for choosing a game. For the purpose of this post, I have only implemented one game, but the pattern works for multiple games. Here’s a classy Visio diagram of the overall approach.

  1. The game starts with the Launch state.
  2. In this state we ask the user if they would like to play a game or ask them which game they want to play if we support more than one game.
  3. The user responds with a game selection.
  4. The internal state of our skill moves into the in-game state…
  5. And we initialize the game. In this case, it is just Whack-a-button. The diagram illustrates the interface we expect each game to implement. We call the interface IGameTurn, because we create a new instance at each user input.
  6. The game delegates to the RollCall functionality first, as we need to make sure that the buttons are correctly identified before the game starts.
  7. RollCall sends its input handler…
  8. The user presses the buttons and RollCall finishes…
  9. And passes control back into the game by using the resumeAfterRollCall() call.
  10. The game initializes itself and sends the first input handler to the user. In our sample code, this will be a confirmation to press any button to get started.
  11. At this point, any input event should be delegated over to the game’s handle() method. We also assume that any AMAZON.HelpIntent or AMAZON.CancelIntent will be handled by the game’s help() or cancel() methods.
  12. The game responds to incoming events as long as it lasts.
  13. The game transitions to a PostGameState in which the user can restart the game or ask for their score.
  14. The user can exit the skill or restart the game.

Show Us The Code!

There is a lot of new code in this post, and I’m going to do my best to walk through it. As always, feel free to skip ahead and jump into the GitHub repo yourself.

At the center of everything is the IGameTurn interface. Each game must implement this functionality.


export interface IGameTurn {
    initialize(): Response;
    handle(): Response;
    help(): Response;
    cancel(): Response;
    postGameSummary(): Response;
    resumeAfterRollCall(): Response;
}

When the game is first created, we call initialize(). Initialize should invoke the RollCall functionality. Once RollCall is done, the resumeAfterRollCall() call is made. We begin in the InLaunchStateHandler. If the user responds with AMAZON.YesIntent to playing the game, we call:


if (req.intent.name === "AMAZON.YesIntent") {
    const game = new WhackabuttonGame(handlerInput);
    return game.initialize();
}

initialize() is defined as:


public initialize(): Response {
    const game = new GameState();
    game.currentGame = GameType.WhackaButton;
    game.data = new WhackState();

    GameHelpers.setState(this.handlerInput, game);
    return RollCall.initialize(this.handlerInput, WHACKABUTTON_NUM_OF_BUTTONS);
}

GameState is defined as follows. Note that for each method, it resolves the right IGameTurn instance based on the selected game type.


export class GameState {
    public currentGame: GameType;
    public data: any;

    public static deleteState(handlerInput: HandlerInput): void {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        delete sessionAttr.game;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);
    }

    public static setInLaunchState(handlerInput: HandlerInput, val: boolean): void {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inLaunch = val;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);
    }

    public static setInPostGame(handlerInput: HandlerInput, val: boolean): void {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inPostGame = val;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);
    }

    public static getGameState(handlerInput: HandlerInput): GameState {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        const game = sessionAttr.game;
        return new GameState(game);
    }

    constructor(obj?: GameState) {
        this.currentGame = GameType.None;
        if (obj) {
            this.currentGame = obj.currentGame;
            this.data = obj.data;
        }
    }

    public reinit(handlerInput: HandlerInput): Response {
        const gameTurn = this.resolveGameTurn(handlerInput);
        return gameTurn.initialize();
    }

    public resumeGameFromRollcall(handlerInput: HandlerInput): Response {
        const gameTurn = this.resolveGameTurn(handlerInput);
        return gameTurn.resumeAfterRollCall();
    }

    public cancel(handlerInput: HandlerInput): Response {
        const gameTurn = this.resolveGameTurn(handlerInput);
        return gameTurn.cancel();
    }

    public help(handlerInput: HandlerInput): Response {
        const gameTurn = this.resolveGameTurn(handlerInput);
        return gameTurn.help();
    }

    public handleInput(handlerInput: HandlerInput): Response {
        const gameTurn = this.resolveGameTurn(handlerInput);
        return gameTurn.handle();
    }

    private resolveGameTurn(handlerInput: HandlerInput): IGameTurn {
        switch (this.currentGame) {
            case GameType.WhackaButton:
                return new WhackabuttonGame(handlerInput);
            default:
                throw new Error("Unsupported game type.");
        }
    }

}

export enum GameType {
    None,
    WhackaButton
}

At this point, RollCall takes over. Any request from the user hits the RollCallHandler. We change the RollCall‘s handleDone() method to the following:


const gameState = GameState.getGameState(handlerInput);
handlerInput.responseBuilder
    .addDirective(blackOutUnusedButtons)
    .addDirective(lightUpSelectedButtons);
return gameState.resumeGameFromRollcall(handlerInput);

For the Whack-a-button game, the resumeAfterRollCall() method looks as follows:


public resumeAfterRollCall(): Response {
    const gameState = GameHelpers.getState(this.handlerInput, new WhackState());
    const whackState = gameState.data;
    whackState.waitingOnConfirmation = true;
    whackState.pushAndTrimHandler(this.handlerInput.requestEnvelope.request.requestId);
    GameHelpers.setState(this.handlerInput, gameState);

    const confirmationInputHandler = this.generateConfirmationHandler(GameHelpers.getAvailableButtons(this.handlerInput));

    const resp = LocalizedStrings.whack_start();
    this.handlerInput.responseBuilder
        .speak(resp.speech)
        .reprompt(resp.reprompt)
        .addDirective(confirmationInputHandler);
    return this.handlerInput.responseBuilder.getResponse();
}

We initialize a new game state, set some Whack-a-button-specific state and ask the user to confirm when they are ready to start. The confirmation occurs by having the user press any of the selected buttons; that is the input handler that the this.generateConfirmationHandler(...) call generates.
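
For reference, the generated confirmation handler might look roughly like the sketch below, modeled on the StartInputHandler directive shown later in this post. The gadget ids, timeout and event names are placeholders rather than the exact output of generateConfirmationHandler.


// Sketch: any button-down on one of the in-game gadgets ends the handler and reports the match.
const confirmationInputHandlerSketch = {
    type: "GameEngine.StartInputHandler",
    timeout: 30000,
    proxies: [],
    recognizers: {
        any_button: {
            type: "match",
            anchor: "anywhere",
            fuzzy: true,
            pattern: [{ action: "down", gadgetIds: ["A", "B"] }]
        }
    },
    events: {
        confirmed: { meets: ["any_button"], reports: "matches", shouldEndInputHandler: true },
        failed: { meets: ["timed out"], reports: "history", shouldEndInputHandler: true }
    }
};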

At this point, control will flow into the InGameHandler. If there is a game object set and we receive either an InputHandlerEvent or an AMAZON.StopIntent, AMAZON.CancelIntent or AMAZON.HelpIntent, we delegate the action to the current game. Here is the code for the handler.


export class InGameHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        const result = !sessionAttr.inPostGame &&
            !sessionAttr.inRollcall &&
            sessionAttr.game &&
            (handlerInput.requestEnvelope.request.type === "GameEngine.InputHandlerEvent"
                || handlerInput.requestEnvelope.request.type === "IntentRequest");

        console.log(`InGameHandler: ${result}`);
        return result;
    }

    handle(handlerInput: HandlerInput): Response {
        console.log("executing in game state handler");
        const gameState = GameState.getGameState(handlerInput);
        if (handlerInput.requestEnvelope.request.type === "GameEngine.InputHandlerEvent") {
            return gameState.handleInput(handlerInput);
        } else if (handlerInput.requestEnvelope.request.type === "IntentRequest") {
            const intent = handlerInput.requestEnvelope.request.intent;

            if (intent.name === "AMAZON.CancelIntent" || intent.name === "AMAZON.StopIntent") {
                return gameState.cancel(handlerInput);
            } else if (intent.name === "AMAZON.HelpIntent") {
                return gameState.help(handlerInput);
            } else if (intent.name === "AMAZON.StopIntent") {
                return handlerInput.responseBuilder
                    .speak(LocalizedStrings.goodbye().speech)
                    .withShouldEndSession(true)
                    .getResponse();
            } else {
                // empty response for anything else  that comes in during game play
                return handlerInput.responseBuilder.getResponse();
            }
        }
        throw new Error("Unexpected event type. Not supported in roll call.");

    }
}

Now For the Real Stuff

When an event comes in, it’s the indication for our game to begin. From there, each turn proceeds as follows.

  1. If the game has been going for longer than GAME_DURATION_SECONDS, we finish by responding with the user’s score.
  2. We begin a turn by randomly selecting the buttons that we want the user to press.
  3. We set the buttons the user should press to a random, non-black color.
  4. Buttons the user should not press have their color set to black.
  5. We generate a new input handler with a timeout between MIN_TIME_TO_PRESS and MAX_TIME_TO_PRESS.
  6. If the user presses a black button, we deduct a point and indicate that they did something wrong.
  7. If the user presses a button they were supposed to press, we increase the score. If there are buttons left, we wait for those buttons to be pressed; otherwise we go back to step 1 for a new turn.

Selecting a random set of buttons and preparing the input handlers looks as follows:


// we select buttons randomly for the next turn
const shuffle = Utilities.shuffle(btns.slice(0));
const num = Utilities.randInt(1, shuffle.length);
console.log(`generating input handler with ${num} buttons.`);

const buttonsInPlay = shuffle.slice(0, num);
const buttonsNotInPlay = btns.filter(p => !buttonsInPlay.some(p1 => p1 === p));
console.log(`${buttonsInPlay.length} buttons in play for next turn: ${JSON.stringify(buttonsInPlay)}. ` +
        `Not in play: ${JSON.stringify(buttonsNotInPlay)}`);

// assign a random time duration to the turn, but make sure we don't go past the max game duration
const timeTilEnd = whackState.timeInMsUntilEnd();
console.log(`${timeTilEnd}ms left until end`);
const turnDuration = Math.min(Utilities.randInt(MIN_TIME_TO_PRESS, MAX_TIME_TO_PRESS), timeTilEnd);
whackState.expectedEvents = buttonsInPlay;
whackState.pushAndTrimHandler(this.handlerInput.requestEnvelope.request.requestId);
whackState.lastHandlerStartTime = moment().utc().format(Utilities.DT_FORMAT);
whackState.lastHandlerLength = turnDuration;

// generate the input handler
const startHandler = this.generateInputHandlerTemplate(btns, turnDuration);

// turn off buttons not assigned to this turn and turn on buttons assigned to the turn
const turnOffEverything = SetLightDirectiveBuilder.setLight(
    SkillAnimations.rollCallFinishedUnused(), buttonsNotInPlay.map(p => p.gadgetId));
const setLight = SetLightDirectiveBuilder.setLight(
    SkillAnimations.lightUpWhackaButton(turnDuration), buttonsInPlay.map(p => p.gadgetId));

 

I struggled with the right way to model the input handlers, and the complexity of the code probably increased as a result; I blame myself for not fully understanding the rules of how Alexa reports events. My first approach was to create one input handler for the entirety of the game, but this would not work well with the MAX_TIME_TO_PRESS concept; I want there to be time pressure involved. I also could not use the input handler’s shouldEndInputHandler functionality; if the current turn requires more than one button to be pressed, the same handler should be able to generate both events. If I had one handler that looked for button-down events anchored to anywhere and reported the matches, the reported match would always be the first match. Why does this matter? I want to see the latest event and its timestamp so I can verify whether I have already handled it. With the input handler below, any time I pressed a button once I would receive two calls into my endpoint, and the timestamp on the input event would be the same. Here is the input handler directive (gadgetId set to something easier to read).


{
    "type": "GameEngine.StartInputHandler",
    "proxies": [],
    "recognizers": {
        "btn1": {
            "type": "match",
            "anchor": "anywhere",
            "fuzzy": false,
            "pattern": [
                {
                    "action": "down",
                    "gadgetIds": [
                        "A"
                    ]
                }
            ]
        },
        "btn2": {
            "type": "match",
            "anchor": "anywhere",
            "fuzzy": false,
            "pattern": [
                {
                    "action": "down",
                    "gadgetIds": [
                        "B"
                    ]
                }
            ]
        }
    },
    "events": {
        "failed": {
            "meets": [
                "timed out"
            ],
            "reports": "history",
            "shouldEndInputHandler": true
        },
        "btn1": {
            "shouldEndInputHandler": false,
            "meets": [
                "btn1"
            ],
            "reports": "matches"
        },
        "btn2": {
            "shouldEndInputHandler": false,
            "meets": [
                "btn2"
            ],
            "reports": "matches"
        }
    },
    "timeout": 7708
}

And the two requests sent to my skill.


{
    "type": "GameEngine.InputHandlerEvent",
    "requestId": "amzn1.echo-api.request.ee60ad56-56a0-4b73-b4f5-48a7bee715b7",
    "timestamp": "2018-10-11T15:39:52Z",
    "locale": "en-US",
    "originatingRequestId": "amzn1.echo-api.request.a0b25097-030e-465c-9454-0c0e1caa0386",
    "events": [
        {
            "name": "btn1",
            "inputEvents": [
                {
                    "gadgetId": "A",
                    "timestamp": "2018-10-11T15:39:52.324Z",
                    "color": "000000",
                    "feature": "press",
                    "action": "down"
                }
            ]
        }
    ]
}

{
    "type": "GameEngine.InputHandlerEvent",
    "requestId": "amzn1.echo-api.request.8c4ab5ab-8580-4c7b-994e-598c35e192c5",
    "timestamp": "2018-10-11T15:39:52Z",
    "locale": "en-US",
    "originatingRequestId": "amzn1.echo-api.request.a0b25097-030e-465c-9454-0c0e1caa0386",
    "events": [
        {
            "name": "btn1",
            "inputEvents": [
                {
                    "gadgetId": "A",
                    "timestamp": "2018-10-11T15:39:52.324Z",
                    "color": "000000",
                    "feature": "press",
                    "action": "down"
                }
            ]
        }
    ]
}

 

Note that everything is the same EXCEPT the originatingRequestId. So it sounds like I need to start tracking the timestamp of the latest input event. It is not enough to use the request’s timestamp, since it doesn’t provide millisecond resolution; one could easily generate two real button presses within a second of each other. So I decided to track the latest input event timestamp and only consider events whose input event timestamp comes after it. BUT I also need to send a new input handler directive any time an event comes in, because matches reports the first input event only.

Ok enough cryptic text. Let’s see the code. Here is the code that selects the relevant events and the latest timestamp.


export function getEventsAndMaxTimeSince(
    events: services.gameEngine.InputHandlerEvent[],
    lastEvent: moment.Moment,
    timeoutEventName: string)
    : { maxTime: moment.Moment, events: Array<string> } {
    if (events.some(p => p.name! === timeoutEventName)) {
        return { maxTime: moment.utc(lastEvent), events: [timeoutEventName] };
    }
    const mapped = events
        .map(p => {
            const temp = p.inputEvents!.map(p1 => moment(p1.timestamp!).utc().valueOf());
            const max = moment.utc(Math.max.apply({}, temp));
            const diff = max.diff(lastEvent, "ms");
            console.log(`temp: ${JSON.stringify(temp)}`);
            console.log(`max: ${max.format(Utilities.DT_FORMAT)}`);
            return { max: max.valueOf(), maxMoment: max, diff: diff, name: p.name! };
        });

    console.log(`Mapping events last update${lastEvent.format(Utilities.DT_FORMAT)}: \n${JSON.stringify(mapped, null, 2)}`);
    const filtered = mapped.filter(p => p.diff > 0);
    let globalMax = Math.max.apply({}, filtered.map(p => p.max));
    if (!globalMax || isNaN(globalMax) || !isFinite(globalMax)) {
        console.log(`setting global max to ${lastEvent.valueOf()}`);
        globalMax = lastEvent.valueOf();
    }
    const resultGlobalMax = moment.utc(globalMax);
    console.log(`GLOBAL MAX ${resultGlobalMax.format(Utilities.DT_FORMAT)}`);

    const array = filtered.map(p => p.name);
    const result = { maxTime: resultGlobalMax, events: array };
    console.log(`returning result\n${JSON.stringify(result)}`);
    return result;
}

We get the constituent input event timestamps, select the maximum value, select the events whose maximum value is after the current latest value and then return those event names and the new maximum timestamp. If the event is a timeout event, we simply return as we have to generate a new turn.

Once we have the relevant events handy, we increase the score if we get an expected event, otherwise we increase the bad count.


private processRelevantEvents(relevantEvents: string[], whackState: WhackState): { good: string[], bad: string[] } {
    console.log(`received events ${JSON.stringify(relevantEvents)}`);
    const result: { good: string[], bad: string[] } = {
        good: [],
        bad: []
    };

    relevantEvents.forEach(evName => {
        // check if we are expecting this event
        const index = whackState.expectedEvents.findIndex(val => val.name === evName);
        if (index > -1) {
            // if we are, great. increase score and remove event from expected list.
            console.log(`increasing good`);
            result.good.push(whackState.expectedEvents[index].gadgetId);
            whackState.good++;
            whackState.expectedEvents.splice(index, 1);
        } else {
            // otherwise, increase bad count.
            console.log(`increasing bad.`);
            console.log(`still expecting number of buttons ${whackState.expectedEvents.length}`);
            result.bad.push(evName);
            whackState.bad++;
        }
    });

    return result;
}

If the user has any buttons left, we simply turn off any good buttons that were pressed, and we add a voice response if any bad buttons were pressed.


let rb = this.handlerInput.responseBuilder;
if (hasBad) {
    rb.speak(LocalizedStrings.whack_bad_answer().speech);
}

// need to turn off all good pressed buttons
if (goodPressedButtons.length > 0) {
    rb = rb.addDirective(SetLightDirectiveBuilder.setLight(SkillAnimations.rollCallFinishedUnused(), goodPressedButtons));
}
return rb.getResponse();

Deeper and Deeper

Another effect of the input handler issue described above is that this code also needs to generate a new input handler. The entire method looks as follows:


private buttonsOutstanding(
    whackState: WhackState,
    hasBad: boolean,
    goodPressedButtons: string[],
    btns: GameButton[]): Response
{
    console.log(`responding with acknowledgment and new handler; more buttons remaining`);

    const now = moment.utc();
    const turnDuration = whackState.lastHandlerLength - (now.diff(whackState.lastHandlerStartTime, "ms"));
    whackState.lastHandlerStartTime = now.format(Utilities.DT_FORMAT);
    whackState.lastHandlerLength = turnDuration;
    whackState.pushAndTrimHandler(this.handlerInput.requestEnvelope.request.requestId);

    const startHandler = this.generateInputHandlerTemplate(btns, turnDuration);
    let rb = this.handlerInput.responseBuilder.addDirective(startHandler);
    if (hasBad) {
        rb.speak(LocalizedStrings.whack_bad_answer().speech);
    }

    // need to turn off all good pressed buttons
    if (goodPressedButtons.length > 0) {
        rb.addDirective(SetLightDirectiveBuilder.setLight(SkillAnimations.rollCallFinishedUnused(), goodPressedButtons));
    }
    return rb.getResponse();
}

Amazon recommends that the skill ensure that input event requests are coming from the right originatingRequestId, since requests might come in late. The code that does this utilizes the lastHandlerIds property on the WhackState. The reason we use a list instead of a single value is that if we press button 1 and button 2 right after one another, the handler for button 1 would send a new input handler and reset the lastHandlerId, rendering the event from button 2 as junk. So we store the last few handlerIds.


let ev = inputHandlerEvent;
if (!whackState.lastHandlerIds.some(p => p === ev.originatingRequestId)) {
    console.warn(`SKIPPING MESSAGE.\nLAST HANDLER IDs: \n${JSON.stringify(whackState.lastHandlerIds, null, 2)}`
        + `\nORIGINATING REQUEST ID: ${ev.originatingRequestId}`);
    return this.handlerInput.responseBuilder.getResponse();
}

For completeness, this is what the WhackState type looks like.


class WhackState {
    public startTime: string | undefined;
    public good: number = 0;
    public bad: number = 0;
    public turn: number = 0;
    public waitingOnConfirmation: boolean = false;

    public expectedEvents: Array<GameButton> = [];
    public lastEventTime: string | undefined;
    public lastHandlerIds: Array<string> = [];
    public lastHandlerStartTime: string | undefined;
    public lastHandlerLength: number = 0;

    public initGame(): void {
        console.log(`initializing game. start time ${moment.utc(this.startTime).format(Utilities.DT_FORMAT)}`);

        this.waitingOnConfirmation = false;
        this.expectedEvents = [];
        this.bad = 0;
        this.good = 0;
        this.startTime = moment.utc().format(Utilities.DT_FORMAT);
        this.lastEventTime = this.startTime;
    }

    public pushAndTrimHandler(reqId: string): void {
        this.lastHandlerIds.push(reqId);
        while (this.lastHandlerIds.length > WHACKABUTTON_NUM_OF_BUTTONS + 2) {
            this.lastHandlerIds.shift();
        }
    }

    public timeInMsUntilEnd(): number {
        const now = moment.utc();
        const start = moment.utc(this.startTime);
        const end = start.add(GAME_DURATION_SECONDS, "s");
        const diff = end.diff(now, "ms");
        return diff;
    }


    public timeSinceStarted(): number {
        const now = moment.utc();
        const start = moment.utc(this.startTime);
        const diff = now.diff(start, "s");
        console.log(`it has been ${diff} seconds since the game started.`);
        return diff;
    }

}

Wrapping The Game Up

What happens when the game is done? We check the elapsed time any time user input or a timeout request comes in. If the game has lasted long enough, we send the result, transition to the post-game state and ask the user if they want to play again.


private finish(handlerInput: HandlerInput, finish: boolean): Response {
    const whackState = GameHelpers.getState(handlerInput, new WhackState()).data;
    GameState.setInPostGame(handlerInput, true);

    let resp = LocalizedStrings.whack_summary({
        score: whackState.good - whackState.bad,
        good: whackState.good,
        bad: whackState.bad
    });
    if (finish) {
        resp = LocalizedStrings.whack_finish({
            score: whackState.good - whackState.bad,
            good: whackState.good,
            bad: whackState.bad
        });
    }

    const turnOffEverything = SetLightDirectiveBuilder.setLight(
        SkillAnimations.rollCallFinishedUnused());

    return handlerInput.responseBuilder
        .speak(resp.speech)
        .reprompt(resp.reprompt)
        .addDirective(turnOffEverything)
        .getResponse();
}

At this point the user can either restart the game, ask for their score (I added a ScoreIntent to support this) or exit. The PostGameStateHandler implements this logic.


export class PostGameStateHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        const issupportedintent = handlerInput.requestEnvelope.request.type === "IntentRequest"
            && ["AMAZON.YesIntent",
                "AMAZON.NoIntent",
                "StartGameIntent",
                "ScoreIntent"]
                .some(p => p === (handlerInput.requestEnvelope.request as IntentRequest).intent.name);
        return sessionAttr.inPostGame && issupportedintent;
    }

    handle(handlerInput: HandlerInput): Response {
        console.log("executing in post game state handler");

        if (handlerInput.requestEnvelope.request.type === "IntentRequest") {
            const req = handlerInput.requestEnvelope.request as IntentRequest;
            if (req.intent.name === "AMAZON.YesIntent" || req.intent.name === "StartGameIntent") {
                GameState.deleteState(handlerInput);
                const game = new WhackabuttonGame(handlerInput);
                GameState.setInPostGame(handlerInput, false);
                return game.initialize();
            } else if (req.intent.name === "AMAZON.NoIntent") {
                GameState.deleteState(handlerInput);
                GameState.setInPostGame(handlerInput, false);
                return handlerInput.responseBuilder
                    .speak(LocalizedStrings.goodbye().speech)
                    .getResponse();
            } else if(req.intent.name === "ScoreIntent") {
                return new WhackabuttonGame(handlerInput).postGameSummary();
            }
        }

        const donotresp = LocalizedStrings.donotunderstand();
        return handlerInput.responseBuilder
            .speak(donotresp.speech)
            .reprompt(donotresp.reprompt)
            .getResponse();
    }
}

How Did It Go?

Building this was a lot of fun, but the development process was much more complicated than I expected. The number of events and the semantics of the requests that Alexa sends are rather confusing, so there is a bit of a learning curve. The simulator isn’t great at helping debug some of this, as timeouts and button presses do not show their JSON inside the simulator, so tracking down bugs was an exercise in diving into CloudWatch and figuring it out. I’ve seen inconsistent animation behavior; sometimes my animations wouldn’t play on the buttons at all. Sometimes, although the input events seem to show up in the simulator, they never flow into the skill, either from the simulator or from the real buttons. It would have helped to have unit tests, but… you know how it goes when playing with new tech.

As an exploratory exercise this was fairly successful. Let’s see how Teddy enjoyed the game.

As always, you can find the code in the GitHub repo. Enjoy!

Posted by Szymon in Alexa

Alexa Gadget Skill API: Using the SetLight Directive

In a previous post, we began developing an Alexa Gadget Skill and set up a simple roll call dialog. One thing our sample could really benefit from is providing visual feedback to the user when the Echo Buttons are pressed or selected during the roll call process. The Alexa Gadget Skills API gives us control over each gadget’s light. In this post, we dive into this functionality and have our skill take advantage of it.

Exploring the SetLight Directive

So far, we have been working with the GameEngineInterface. This interface allows us to set input handlers on gadgets and to receive users’ gadget input events. The interface that lets us control the device itself is the GadgetControllerInterface. It contains one directive called SetLight and one request called System.ExceptionEncountered. The System.ExceptionEncountered request is sent to our skill when the SetLight directive has failed for whatever reason. In this post, we focus on the SetLight directive.

The SetLight directive allows developers to set animations on the devices discovered during the roll call process. The following is an example of the directive:


 {
   "type": "GadgetController.SetLight",
   "version": 1,
   "targetGadgets": [ "gadgetId1", "gadgetId2" ],
   "parameters": {
      "triggerEvent": "none",
      "triggerEventTimeMs": 0,
      "animations": [ 
        {
          "repeat": 1,
          "targetLights": ["1"],
          "sequence": [ 
           {
              "durationMs": 10000,
              "blend": false,
              "color": "0000FF"
           }
          ] 
        }
      ]
    }
 }

The directive can act on either a specified collection of gadgets or, if the targetGadgets array remains empty, all paired gadgets. The parameters field contains the details of which event triggers the animation and the animation’s definition. The triggerEvent can be a button up, a button down or none, in which case the animation begins playing immediately. triggerEventTimeMs is a delay in milliseconds after the event occurs before the animation begins. The animations object includes instructions on how many times to repeat a sequence (repeat field), which lights on the gadget the animation is for (targetLights field) and a list of step-by-step instructions on how to execute the animation (sequence field). Each sequence step has a duration in milliseconds, a color in hex without the # character and a blend flag indicating whether the device should interpolate the color from its current state to the step color. A few additional items to note:

  • The targetLights array is simply [ "1" ] because the Echo Buttons have one light. Future gadgets might have more; this field will provide fine-tuned control over each light when those gadgets come out.
  • The number of sequence steps allowed is limited by the length of the targetGadgets array. The formula for the limit is 38 - targetGadgets.length * 3; for example, with two target gadgets, up to 32 steps are allowed. Of course, that might be subject to change, so please consult the official docs.
  • Each Echo Button can have one animation set per trigger. Any directive that sends a different animation for a trigger will overwrite whatever animation was set before.

Let us now turn our attention back to the roll call code. We would like to do the following:

  1. Set all lights to an animation when the skill launches.
  2. Set all lights to some color and fade out when the roll call initializes.
  3. Set a light to a solid color if a button has been selected.
  4. Once roll call is finished, set the buttons that are not used to black and set the used buttons to some animation indicating that they are in the game.

We first create something to help us build the animations JSON. As it turns out, the trivia game sample has a really cool animations helper that we can use. I went ahead and translated it to TypeScript. The helper now has two modules: BasicAnimations and ComplexAnimations. BasicAnimations contains many functions such as setting a static color (SolidAnimation), fade in/out (FadeInAnimation/FadeOutAnimation) or alternating between a color and black (BlinkAnimation). ComplexAnimations contains two functions, one of which is SpectrumAnimation, an animation that takes the light through any number of color transitions. The code for all of these is fairly easy to follow. Here is what two of them look like.


export module BasicAnimations {
    export function SolidAnimation(cycles: number, color: string, duration: number): Array<services.gadgetController.LightAnimation> {
        return [
            {
                "repeat": cycles,
                "targetLights": ["1"],
                "sequence": [
                    {
                        "durationMs": duration,
                        "blend": false,
                        "color": ColorHelper.validateColor(color)
                    }
                ]
            }
        ];
    }
    ...
    export function BlinkAnimation(cycles: number, color: string): Array<services.gadgetController.LightAnimation> {
        return [
            {
                "repeat": cycles,
                "targetLights": ["1"],
                "sequence": [
                    {
                        "durationMs": 500,
                        "blend": false,
                        "color": ColorHelper.validateColor(color)
                    }, {
                        "durationMs": 500,
                        "blend": false,
                        "color": "000000"
                    }
                ]
            }
        ];
    }
    ...
}

export module ComplexAnimations {
    export function SpectrumAnimation(cycles: number, color: string[]): Array<services.gadgetController.LightAnimation> {
        let colorSequence = [];
        for (let i = 0; i < color.length; i++) {

            colorSequence.push({
                "durationMs": 400,
                "color": ColorHelper.validateColor(color[i]),
                "blend": true
            });
        }
        return [
            {
                "repeat": cycles,
                "targetLights": ["1"],
                "sequence": colorSequence
            }
        ];
    }
    ...
}

Here is the generated JSON:


// Solid Animation
[
  {
    "repeat": 1,
    "targetLights": [
      "1"
    ],
    "sequence": [
      {
        "durationMs": 2000,
        "blend": false,
        "color": "ff0000"
      }
    ]
  }
]
// Fade In Animation
[
  {
    "repeat": 4,
    "targetLights": [
      "1"
    ],
    "sequence": [
      {
        "durationMs": 1,
        "blend": true,
        "color": "000000"
      },
      {
        "durationMs": 1000,
        "blend": true,
        "color": "ffd400"
      }
    ]
  }
]
// Spectrum Animation
[
  {
    "repeat": 3,
    "targetLights": [
      "1"
    ],
    "sequence": [
      {
        "durationMs": 400,
        "color": "ff0000",
        "blend": true
      },
      {
        "durationMs": 400,
        "color": "0000ff",
        "blend": true
      },
      {
        "durationMs": 400,
        "color": "00ff00",
        "blend": true
      },
      {
        "durationMs": 400,
        "color": "ffffff",
        "blend": true
      }
    ]
  }
]

As a next step, we ensure that our skill has a set of reusable animations. We create a module called SkillAnimations for this purpose. We create one method per distinct animation that we want to send to our users. The module looks like this:


export module SkillAnimations {
    ...
    export function rollCallInitialized(): Array<services.gadgetController.LightAnimation> {
        return BasicAnimations.CrossFadeAnimation(1, "yellow", "black", 5000, 15000);
    }

    export function rollCallButtonSelected(): Array<services.gadgetController.LightAnimation> {
        return BasicAnimations.SolidAnimation(1, "orange", 300);
    }
    ...
}

Lastly, we create a SetLightDirectiveBuilder module to build the SetLight directive instances. The goal of this code is to generate the directives given an animation, an optional array of targetGadgetIds, the triggering event and the delay.


export module SetLightDirectiveBuilder {
    ...
    function setLightImpl(animations: Array<services.gadgetController.LightAnimation>,
        on: services.gadgetController.TriggerEventType,
        targetGadgets?: string[],
        delayInMs?: number): interfaces.gadgetController.SetLightDirective {
        const result: interfaces.gadgetController.SetLightDirective = {
            type: "GadgetController.SetLight",
            version: 1,
            targetGadgets: targetGadgets,
            parameters: {
                triggerEvent: on,
                triggerEventTimeMs: delayInMs,
                animations: animations
            }
        };
        return result;
    }
}

We create three helpers so we do not have to pass the TriggerEventType parameter. A minor convenience.


export module SetLightDirectiveBuilder {
    export function setLight(animations: Array<services.gadgetController.LightAnimation>,
        targetGadgets?: string[],
        delayInMs?: number): interfaces.gadgetController.SetLightDirective {
        return setLightImpl(animations, "none", targetGadgets, delayInMs);
    }

    export function setLightOnButtonDown(animations: Array<services.gadgetController.LightAnimation>,
        targetGadgets?: string[],
        delayInMs?: number): interfaces.gadgetController.SetLightDirective {
        return setLightImpl(animations, "buttonDown", targetGadgets, delayInMs);
    }

    export function setLightOnButtonUp(animations: Array<services.gadgetController.LightAnimation>,
        targetGadgets?: string[],
        delayInMs?: number): interfaces.gadgetController.SetLightDirective {
        return setLightImpl(animations, "buttonUp", targetGadgets, delayInMs);
    }
    ...
}

Now, we can call the code:


    SetLightDirectiveBuilder.setLightOnButtonDown(
        BasicAnimations.FadeInAnimation(1, "yellow", 500), ["left", "right"])

and receive the following JSON to send back as part of our response from a skill.


{
  "type": "GadgetController.SetLight",
  "version": 1,
  "targetGadgets": [
    "left",
    "right"
  ],
  "parameters": {
    "triggerEvent": "buttonDown",
    "animations": [
      {
        "repeat": 1,
        "targetLights": [
          "1"
        ],
        "sequence": [
          {
            "durationMs": 1,
            "blend": true,
            "color": "000000"
          },
          {
            "durationMs": 500,
            "blend": true,
            "color": "ffd400"
          }
        ]
      }
    ]
  }
}

Excellent! Let us integrate these new features into our skill to see the lights in action!

Fire It Up

We’ve actually done most of the work we needed to do already. The Roll Call code we created in the previous part of this series is in a state where we can easily add the SetLight directive. The LaunchHandler changes only to add directives. Note that we add not only the skillLaunch animation, which cycles through white, purple and yellow for 5 cycles, but also the default button up and button down animations. We do this so that whenever a user presses any button, we provide some sort of color feedback.


return handlerInput.responseBuilder
    .speak(resp.speech)
    .reprompt(resp.reprompt)
    .addDirective(SetLightDirectiveBuilder.setLightOnButtonDown(SkillAnimations.buttonDown()))
    .addDirective(SetLightDirectiveBuilder.setLightOnButtonUp(SkillAnimations.buttonUp()))
    .addDirective(SetLightDirectiveBuilder.setLight(SkillAnimations.skillLaunch()))
    .getResponse();
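
For reference, here is a sketch of what the SkillAnimations entries used above might look like. The skillLaunch animation matches the description in the text (white, purple and yellow for 5 cycles); the buttonDown and buttonUp choices are illustrative rather than the exact ones in the repo, and FadeOutAnimation is assumed to share FadeInAnimation’s signature.


export module SkillAnimations {
    // Matches the description above: cycle through white, purple and yellow, 5 times.
    export function skillLaunch(): Array<services.gadgetController.LightAnimation> {
        return ComplexAnimations.SpectrumAnimation(5, ["white", "purple", "yellow"]);
    }

    // Illustrative default: a short blue flash when a button is pressed.
    export function buttonDown(): Array<services.gadgetController.LightAnimation> {
        return BasicAnimations.SolidAnimation(1, "blue", 200);
    }

    // Illustrative default: fade the blue back out when the button is released.
    export function buttonUp(): Array<services.gadgetController.LightAnimation> {
        return BasicAnimations.FadeOutAnimation(1, "blue", 300);
    }
}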

The other interesting change is specific to feedback when a button is selected during the roll call. Recall that when a user presses the first button in the roll call, our skill acknowledges this via a voice response and asks the user to press the second button. We would like the light to perform an animation at this point; it is a good visual cue for the user to know which buttons are selected and which are not.

We modify the handleButtonCheckin function in the RollCall module to send the directive for each gadgetId that was selected in the current request. We also do the math to ensure that the button count is reflected correctly if the skill received multiple button inputs simultaneously. I’m not certain this can actually occur, but since inputEvents is an array… better safe than sorry.


export module RollCall {
    ...
    export function handleButtonCheckin(handlerInput: HandlerInput, inputEvents: Array<services.gameEngine.InputHandlerEvent>): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();


        const directives = inputEvents.map(ev => {
            const gadgetIds = getGadgetIds(ev);
            return SetLightDirectiveBuilder.setLight(SkillAnimations.rollCallButtonSelected(), gadgetIds);
        });

        sessionAttr.rollcallButtonsCheckedIn += directives.length;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);
        const resp = LocalizedStrings.rollcall_checkin(numOfButtons - sessionAttr.rollcallButtonsCheckedIn);

        let temp = handlerInput.responseBuilder.speak(resp.speech);
        directives.forEach(p => temp.addDirective(p));
        return temp.getResponse();
    }

    function getGadgetIds(ev: services.gameEngine.InputHandlerEvent): string[] {
        const btns = ev!.inputEvents!.map(p => { return p.gadgetId!; });
        return btns;
    }
    ...
}

Beyond that, it’s smooth sailing. You can deploy this code into a skill by using ask deploy. Here is a short video of the current code working on my desk.

Code can be found in the GitHub repo.

Posted by Szymon in Alexa

Broken Alexa Interaction Model Slot Extraction When Using Dialog Management

I’m probably not the first one to notice this but it’s worth documenting for posterity. It’s also one of many minor inconsistencies in Alexa behavior that I hope will be fixed in the near future.

On one of my projects, we are integrating our Microsoft Bot Framework bot with an Alexa Interaction Model that includes Dialog Management. It’s a very interesting effort and allows us to get intimately familiar with how the Alexa Interaction Model works, versus Microsoft’s LUIS, a system that we have much more experience with.

Let me set the stage. We want to be able to train an Intent that requires multiple custom slot types. For the sake of the example, let’s say we need three slots: a number, a custom slot composed of two values (buy/sell) and a custom free text slot indicating the name of a product the user is buying. This last slot is the problem area I’d like to focus on. The slot is fairly free form and the set of values may change at any time, so we cannot simply populate all the possible values into the slot type definition. We have no problems with Alexa detecting an unseen slot value based on what we can assume is a common machine learning-based model. However, once we enable Dialog Management on the intent, the slot is no longer recognized!

Jumping In

Let’s say we want to create an intent called RegisterNeedIntent. This lets the user inform our skill when she wants to sell or purchase an amount of some product. For example, the user may want to buy thirty two pounds of butter or sell thirty two tulips. We will create three slot types: a custom slot type for the buy/sell distinction, the number slot to capture the amount of the product and a custom slot type representing the product in question.

The interaction model looks something like this:


{
  "interactionModel": {
      "languageModel": {
          "invocationName": "product catalog",
          "intents": [
              {
                  "name": "AMAZON.CancelIntent",
                  "samples": []
              },
              {
                  "name": "AMAZON.HelpIntent",
                  "samples": []
              },
              {
                  "name": "AMAZON.StopIntent",
                  "samples": []
              },
              {
                  "name": "RegisterNeedIntent",
                  "slots": [
                      {
                          "name": "Amount",
                          "type": "AMAZON.NUMBER"
                      },
                      {
                          "name": "Product",
                          "type": "Product"
                      },
                      {
                          "name": "Action",
                          "type": "Action"
                      }
                  ],
                  "samples": [
                      "{Action} {Product}",
                      "{Action} {Amount} pounds of {Product}",
                      "{Action} {Amount} {Product}",
                      "{Action} {Amount} packages of {Product}"
                  ]
              }
          ],
          "types": [
              {
                  "name": "Action",
                  "values": [
                      {
                          "name": {
                              "value": "sell"
                          }
                      },
                      {
                          "name": {
                              "value": "buy"
                          }
                      }
                  ]
              },
              {
                  "name": "Product",
                  "values": [
                      {
                          "name": {
                              "value": "flying squirrel"
                          }
                      },
                      {
                          "name": {
                              "value": "hammer"
                          }
                      }
                  ]
              }
          ]
      }
  }
}

Simple enough. We considered using the AMAZON.SearchQuery slot type, but it will not work here because it cannot be combined with other slot types. When we build this model, the user can enter an utterance like buy twenty boxes of notebooks. Even with such a small number of sample utterances, Alexa recognizes that the word notebooks is a Product slot. See the intent object passed into the skill below.


{
  "name": "RegisterNeedIntent",
  "confirmationStatus": "NONE",
  "slots": {
    "Product": {
      "name": "Product",
      "value": "notebooks",
      "resolutions": {
        "resolutionsPerAuthority": [
          {
            "authority": "amzn1.er-authority.echo-sdk.amzn1.ask.skill.a20631d9-6a23-413b-8bdc-68ea3c702a10.Product",
            "status": {
              "code": "ER_SUCCESS_NO_MATCH"
            }
          }
        ]
      },
      "confirmationStatus": "NONE"
    },
    "Action": {
      "name": "Action",
      "value": "buy",
      "resolutions": {
        "resolutionsPerAuthority": [
          {
            "authority": "amzn1.er-authority.echo-sdk.amzn1.ask.skill.a20631d9-6a23-413b-8bdc-68ea3c702a10.Action",
            "status": {
              "code": "ER_SUCCESS_MATCH"
            },
            "values": [
              {
                "value": {
                  "name": "buy",
                  "id": "0461ebd2b773878eac9f78a891912d65"
                }
              }
            ]
          }
        ]
      },
      "confirmationStatus": "NONE"
    },
    "Amount": {
      "name": "Amount",
      "value": "20",
      "confirmationStatus": "NONE"
    }
  }
}

The only caveat is that although the Action slot type has a resolution with status ER_SUCCESS_MATCH, the resolution for Product is ER_SUCCESS_NO_MATCH. Fair enough! notebooks does not exist in the Interaction Model. This is actually great. At this point, our skill code receives the raw user Product input, validates it against some known database and we’re in business. So far, this works like LUIS. LUIS’s equivalent of slots, entities, draws a distinction between dictionary lookups of known values, known as List Entities, and machine-learned Simple Entities. In essence, Amazon slots act as a combination of those two entity types.
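
To make that concrete, a handler along the following lines could branch on the resolution status and fall back to the raw value. This is only a sketch, not the project’s actual code, and lookupProduct is a stand-in for a real database lookup.


import { HandlerInput } from "ask-sdk-core";
import { IntentRequest, Response } from "ask-sdk-model";

// Stand-in for a lookup against our own product database.
function lookupProduct(rawValue: string): string | undefined {
    const known = ["notebooks", "hammer", "flying squirrel"];
    return known.find(p => p === rawValue.toLowerCase());
}

export function handleRegisterNeed(handlerInput: HandlerInput): Response {
    const intent = (handlerInput.requestEnvelope.request as IntentRequest).intent;
    const productSlot = intent.slots!["Product"];
    const authority = productSlot.resolutions && productSlot.resolutions.resolutionsPerAuthority
        ? productSlot.resolutions.resolutionsPerAuthority[0]
        : undefined;

    // ER_SUCCESS_NO_MATCH only means the value is not in the interaction model;
    // the raw utterance text is still available on the slot.
    const product = authority && authority.status.code === "ER_SUCCESS_MATCH"
        ? authority.values![0].value!.name
        : lookupProduct(productSlot.value || "");

    const speech = product
        ? `Got it, registering your need for ${product}.`
        : "Sorry, I don't know that product.";
    return handlerInput.responseBuilder.speak(speech).getResponse();
}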

Here is Where It Breaks

We know we want to include more slot types down the road. We also know that we want all of those slots collected before the intent is handled. That is a perfect use case for Dialog Management. The following Interaction Model is the result of adding the dialog and prompt sections.


{
  "interactionModel": {
      "languageModel": {
          "invocationName": "product catalog",
          "intents": [
              {
                  "name": "AMAZON.CancelIntent",
                  "samples": []
              },
              {
                  "name": "AMAZON.HelpIntent",
                  "samples": []
              },
              {
                  "name": "AMAZON.StopIntent",
                  "samples": []
              },
              {
                  "name": "RegisterNeedIntent",
                  "slots": [
                      {
                          "name": "Amount",
                          "type": "AMAZON.NUMBER",
                          "samples": [
                              "{Amount}",
                              "I want {Amount} units"
                          ]
                      },
                      {
                          "name": "Product",
                          "type": "Product",
                          "samples": [
                              "{Product}"
                          ]
                      },
                      {
                          "name": "Action",
                          "type": "Action",
                          "samples": [
                              "I want to {Action}",
                              "{Action}"
                          ]
                      }
                  ],
                  "samples": [
                      "{Action} {Product}",
                      "{Action} {Amount} pounds of {Product}",
                      "{Action} {Amount} {Product}",
                      "{Action} {Amount} packages of {Product}"
                  ]
              }
          ],
          "types": [
              {
                  "name": "Action",
                  "values": [
                      {
                          "name": {
                              "value": "sell"
                          }
                      },
                      {
                          "name": {
                              "value": "buy"
                          }
                      }
                  ]
              },
              {
                  "name": "Product",
                  "values": [
                      {
                          "name": {
                              "value": "flying squirrel"
                          }
                      },
                      {
                          "name": {
                              "value": "hammer"
                          }
                      }
                  ]
              }
          ]
      },
      "dialog": {
          "intents": [
              {
                  "name": "RegisterNeedIntent",
                  "confirmationRequired": true,
                  "prompts": {
                      "confirmation": "Confirm.Intent.1103891303624"
                  },
                  "slots": [
                      {
                          "name": "Amount",
                          "type": "AMAZON.NUMBER",
                          "confirmationRequired": false,
                          "elicitationRequired": true,
                          "prompts": {
                              "elicitation": "Elicit.Slot.1103891303624.102254088595"
                          }
                      },
                      {
                          "name": "Product",
                          "type": "Product",
                          "confirmationRequired": false,
                          "elicitationRequired": true,
                          "prompts": {
                              "elicitation": "Elicit.Slot.1103891303624.804409414770"
                          }
                      },
                      {
                          "name": "Action",
                          "type": "Action",
                          "confirmationRequired": false,
                          "elicitationRequired": true,
                          "prompts": {
                              "elicitation": "Elicit.Slot.1103891303624.279514542098"
                          }
                      }
                  ]
              }
          ]
      },
      "prompts": [
          {
              "id": "Elicit.Slot.1103891303624.102254088595",
              "variations": [
                  {
                      "type": "PlainText",
                      "value": "How many units do you want to buy?"
                  }
              ]
          },
          {
              "id": "Elicit.Slot.1103891303624.804409414770",
              "variations": [
                  {
                      "type": "PlainText",
                      "value": "Which product would you like to {Action} ?"
                  }
              ]
          },
          {
              "id": "Elicit.Slot.1103891303624.279514542098",
              "variations": [
                  {
                      "type": "PlainText",
                      "value": "Do you want to buy or sell?"
                  }
              ]
          },
          {
              "id": "Confirm.Intent.1103891303624",
              "variations": [
                  {
                      "type": "PlainText",
                      "value": "Do you want to {Action} {Amount} units of {Product} ?"
                  }
              ]
          }
      ]
  }
}

Looks good. When we run this model, it works, but only for values we have declared in the Product slot type (for example, buy one hammer). If we use the same utterance as before, we get the following intent from Alexa.


{
    "name": "RegisterNeedIntent",
    "confirmationStatus": "NONE",
    "slots": {
        "Product": {
            "name": "Product",
            "confirmationStatus": "NONE"
        },
        "Action": {
            "name": "Action",
            "value": "buy",
            "resolutions": {
                "resolutionsPerAuthority": [
                    {
                        "authority": "amzn1.er-authority.echo-sdk.amzn1.ask.skill.a20631d9-6a23-413b-8bdc-68ea3c702a10.Action",
                        "status": {
                            "code": "ER_SUCCESS_MATCH"
                        },
                        "values": [
                            {
                                "value": {
                                    "name": "buy",
                                    "id": "0461ebd2b773878eac9f78a891912d65"
                                }
                            }
                        ]
                    }
                ]
            },
            "confirmationStatus": "NONE"
        },
        "Amount": {
            "name": "Amount",
            "value": "20",
            "confirmationStatus": "NONE"
        }
    }
}

My expectation was that we would get the raw value along with the same resolution result stating there was no match, and that Dialog.Delegate would simply ask to fill the slot whenever the resolution was unmatched. In that case, our skill could resolve the right value from an external database based on the raw user input. If there was a match in our database, we would update the intent with the resolved product, confirm it, and delegate to the dialog engine to fill in any additional missing slots. Instead, we have a completely empty slot and no idea what the user said. Sigh.
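
For reference, this is roughly the handler we expected to be able to write once the raw value arrived. It is a sketch only; resolveProductFromDatabase is a hypothetical helper and the prompt text is illustrative.


import { HandlerInput } from "ask-sdk-core";
import { IntentRequest, Response } from "ask-sdk-model";

// Hypothetical lookup against our own catalog; returns the canonical product name.
function resolveProductFromDatabase(rawValue: string): string | undefined {
    const catalog: { [key: string]: string } = { "notebooks": "notebooks", "butter": "butter" };
    return catalog[rawValue.toLowerCase()];
}

export function handleInProgressRegisterNeed(handlerInput: HandlerInput): Response {
    const intent = (handlerInput.requestEnvelope.request as IntentRequest).intent;
    const rawProduct = intent.slots!["Product"].value;
    const resolved = rawProduct ? resolveProductFromDatabase(rawProduct) : undefined;

    if (resolved) {
        // Write the validated value back into the intent and hand control back
        // to the dialog engine to collect any remaining slots.
        intent.slots!["Product"].value = resolved;
        return handlerInput.responseBuilder
            .addDelegateDirective(intent)
            .getResponse();
    }

    // No match in our database: re-elicit the Product slot ourselves.
    return handlerInput.responseBuilder
        .speak("I couldn't find that product. Which product would you like?")
        .reprompt("Which product would you like?")
        .addElicitSlotDirective("Product", intent)
        .getResponse();
}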

We are not completely blocked, but the voice experience is hindered: we cannot collect multiple slots in one utterance if the intent leverages Dialog Management.

Posted by Szymon in Alexa

Alexa Gadget Skill API: Intro to Input Handlers and Roll Call

In a previous post, we set up a basic TypeScript Alexa boilerplate repo to begin Gadget Skill API development. My aim for this and future posts is to create a game on Alexa that my one-and-a-half-year-old son can play with, while also guiding readers through the journey of creating their own game. In this post, we create a skill that listens to input events using the Game Engine interface. In particular, we study roll call functionality and the process of developing a somewhat reusable roll call helper to figure out which gadgets are available to the skill. Much of the inspiration for these concepts comes from the Alexa trivia game sample, which is unfortunately somewhat difficult to follow.

Introduction to Roll Call and Input Handlers

The idea of a roll call is as follows. Say we have an Alexa device with four paired Echo Button gadgets. Our skill, however, only needs to use two of the four buttons. How do we determine which buttons to use? How do we grab a unique identifier for each button to control it? How do we associate the identifiers with a semantic name such as Player 2 or Button B? The roll call process allows us to answer these questions. Once it is completed, we will have the Alexa gadgetIds in our possession. This will allow us to control each device individually and create a compelling game.

To create a roll call we need to generate a GameEngine.StartInputHandler directive and send it as part of a response from our Alexa Skill. This can be a response to the LaunchRequest or to any other request. The directive instructs Alexa to start recognizing user actions on the gadgets. It can recognize events in a specific pattern, when a user has deviated from a pattern, or when a user has completed a portion of a pattern. These units of recognition are called recognizers; the three types just mentioned map to patternRecognizer, deviationRecognizer and progressRecognizer. A patternRecognizer, in the context of a roll call, is one in which each button has one down event, and it can recognize patterns on all gadgets or on specified gadgetIds. Each recognizer starts out false and is set to true as matching user input is collected. Once we have defined all of our input handler’s recognizers, we can define custom events. An event is sent to the skill when a set of recognizer conditions has been met; a developer can specify the list of recognizers that must be true or false, what data to report, a maximum number of invocations and a few other pieces of data. There is also an implicit timed out recognizer that we can utilize in our events.

Conveniently enough, an input handler may include a list of proxies. A proxy is an identifier for a button that can be used in a recognizer when a gadgetId is not yet known. The basic roll call sample input handler in the Alexa documentation shows how we can declare proxies and reference them in recognizers.


{
  "type": "GameEngine.StartInputHandler",
  "timeout": 10000,
  "proxies": [ "left", "middle", "right" ],
  "recognizers": {
    "all pressed": {
      "type": "match",
      "fuzzy": true,
      "anchor": "start",
      "pattern": [
        {
          "gadgetIds": [ "left" ],
          "action": "down"
        },
        {
          "gadgetIds": [ "middle" ],
          "action": "down"
        },
        {
          "gadgetIds": [ "right" ],
          "action": "down"
        }
      ]
    }
  },
  "events": {
    "complete": {
      "meets": [ "all pressed" ],
      "reports": "matches",
      "shouldEndInputHandler": true
    },
    "failed": {
      "meets": [ "timed out" ],
      "reports": "history",
      "shouldEndInputHandler": true
    }
  }
}

In this example, we expect the user to register three buttons called left, middle and right. We then define a recognizer composed of a button down event from each button. Whichever button is pressed first will be treated as left, the second as middle and the last one as right. If the first button is pressed twice, the pattern recognizer considers it two button down events from the left button and continues on its merry way; the fuzzy flag ensures that this doesn’t invalidate the recognizer. Once the recognizer evaluates to true, the complete event is sent to our Alexa Skill and the input handler is unregistered. Note that if the input handler times out after 10 seconds, our Alexa Skill will instead receive an event called failed.

In the rest of the post, we create a basic Alexa Skill that utilizes this roll call object to register three buttons. Once we have the button gadgetIds, we exit the skill. Of note is that the structure of the input handler is very flexible. We could easily add three extra events that call our skill when each individual button is pressed. For example, we could create a recognizer leftButtonDown that only recognizes the left button’s down event and declare an event called leftButtonPressed that is invoked when the leftButtonDown recognizer becomes true. The input handler stays active because we set shouldEndInputHandler to false, and we limit the event to 1 invocation.


"leftButtonPressed": {
  "meets": ["left"],
  "reports: "matches",
  "shouldEndInputHandler: false,
  "maximumInvocations: 1
}

This approach lets us assign semantic names to each button. For example, our skill could say Please press the button for player 1. Once an event comes in, the button is assigned to player1. The skill can then say Please press the button for player 2, and so on.

Getting Ready For a Basic Roll Call Skill

Since I am aiming to develop a single player game, we create a roll call helper that can register between 1 and 4 buttons. Let’s go ahead and see this in action. We are starting with a TypeScript Alexa Skill Boilerplate created in another post.

To create reusable roll call functionality, we must write code that generates the right input handler and can process each message from the Alexa Skills Kit correctly. We will be using the skill’s SessionAttributes to store the roll call state. At the end, we will also store the discovered buttons.
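
For reference, the roll call state we keep in the session attributes looks roughly like the interface below. The interface itself is only for illustration, but the property names match the ones used in the code later in this post.


// Illustrative shape of the roll call session state.
interface RollCallSessionAttributes {
    inRollcall?: boolean;                                  // a roll call is currently in progress
    rollcallTimeout?: number;                              // how many times the input handler has timed out
    rollcallButtonsCheckedIn?: number;                     // buttons that have already checked in
    rollcallResult?: Array<{ name: string, id: string }>;  // discovered buttons: proxy name plus gadgetId
}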

For functionality to be implemented further down the post, we add the AMAZON.YesIntent and AMAZON.NoIntent intents to models/en-US.json. We also assign the invocation name to be something friendlier. Any Alexa skill needs at least one custom intent, so, for now, we leave the HelloIntent in there.


{
  "interactionModel": {
    "languageModel": {
      "invocationName": "my games",
      "types": [],
      "intents": [
        {
          "name": "AMAZON.CancelIntent",
          "samples": []
        },
        {
          "name": "AMAZON.HelpIntent",
          "samples": []
        },
        {
          "name": "AMAZON.StopIntent",
          "samples": []
        },
        {
          "name": "AMAZON.YesIntent",
          "samples": []
        },
        {
          "name": "AMAZON.NoIntent",
          "samples": []
        },
        {
          "name": "HelloIntent",
          "samples": [
            "hello",
            "say hello",
            "say hello world"
          ]
        }
      ]
    }
  }
}

We also add a LocalizedStrings module, so we have one place for all of our speech strings. The code uses the i18next Node package. The Alexa Skills Kit SDK for Node doesn’t include any direction on how to accomplish localization, though some samples, like the trivia game, use i18next. The topic is beyond the scope of this blog post.


import * as i18next from "i18next";

export interface ILocalizationResult {
    speech: string;
    reprompt: string;
}

export module LocalizedStrings {

    export function donotunderstand(): ILocalizationResult {
        return {
            speech: i.t("donotunderstand_speech"),
            reprompt: i.t("donotunderstand_reprompt")
        };
    }

    export function welcome(): ILocalizationResult {
        return {
            speech: i.t("welcome_speech"),
            reprompt: i.t("welcome_reprompt")
        };
    }

    export function goodbye(): ILocalizationResult {
        return {
            speech: i.t("goodbye"),
            reprompt: ""
        };
    }
}

const i = i18next.init({
    lng: "en",
    debug: true,
    resources: {
        en: {
            translation: {
                "donotunderstand_speech": "I'm sorry, I didn't quite catch that.",
                "donotunderstand_reprompt": "Sorry, I didn't understand.",
                "goodbye": "Ok, good bye.",
                "welcome_speech": "Hello. Welcome to the games sample. Do you want to play a game?",
                "welcome_reprompt": "Do you want to play a game?"
            }
        }
    }
});

We add more strings to this as we go along.

We modify the LaunchHandler, so that we ask the user if they want to play a game first. If they say yes, we begin the roll call. Otherwise, we leave the skill.


export class LaunchHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const request = handlerInput.requestEnvelope.request;
        return request.type === "LaunchRequest";
    }

    handle(handlerInput: HandlerInput): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inLaunch = true;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);

        const resp = LocalizedStrings.welcome();

        return handlerInput.responseBuilder
            .speak(resp.speech)
            .reprompt(resp.reprompt).getResponse();
    }
}

Note, we set the inLaunch session attribute to true. We add an InLaunchStateHandler to handle the requests when we are in this state.


export class InLaunchStateHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        return sessionAttr.inLaunch;
    }

    handle(handlerInput: HandlerInput): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inLaunch = false;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);

        const req = handlerInput.requestEnvelope.request as IntentRequest;
        if (req) {
            if (req.intent.name === "AMAZON.YesIntent") {
                // proceed to roll call
                return RollCall.initialize(handlerInput);
            } else if (req.intent.name === "AMAZON.NoIntent") {
                // exit
                return handlerInput.responseBuilder
                    .speak(LocalizedStrings.goodbye().speech)
                    .getResponse();
            }
        }

        const donotresp = LocalizedStrings.donotunderstand();
        return handlerInput.responseBuilder
            .speak(donotresp.speech)
            .reprompt(donotresp.reprompt)
            .getResponse();
    }
}

A few things of note. If the handler receives an AMAZON.NoIntent, the skill exits. If the handler receives an AMAZON.YesIntent, we initialize a new roll call. The implication is that the RollCall.initialize function returns a Response object. Presumably, this will send the input handler JSON we discussed in the previous section. For now, we simply send some speech and set a boolean flag on the SessionAttributes.


export module RollCall {
    export function initialize(handlerInput: HandlerInput): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inRollcall = true;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);

        return handlerInput.responseBuilder
            .speak("Starting Roll Call")
            .reprompt("Waiting on input")
            .getResponse();
    }
}

We create a handler for the roll call state. For now, it simply exits the skill on any input.


export class RollCallHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        return sessionAttr.inRollcall;
    }

    handle(handlerInput: HandlerInput): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inRollcall = false;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);

        const resp = LocalizedStrings.goodbye();
        return handlerInput.responseBuilder
            .speak(resp.speech)
            .getResponse();
    }
}

At this point the code layout contains both a RollCall helper module and a RollCallHandler.

Let’s see how this works. We do this by initializing a new skill using ask new -n "My Games", copying over the lambda and models directories and then running ask deploy. We can now navigate to the skill’s Test tab, respond either yes or no to the launch response, and see that it routes us accordingly.

Diving Into the Roll Call

We now create the code to build the roll call input handler directive. In our RollCall module, we create a function called createRollCallDirective and a constant rollcallHandlerTemplate. The constant is a template for the input handler, and we generate new directives based on it.


export module RollCall {
    ...

    const rollcallHandlerTemplate: interfaces.gameEngine.StartInputHandlerDirective = {
        type: "GameEngine.StartInputHandler",
        proxies: [],
        recognizers: {
            "all pressed": {
                type: "match",
                fuzzy: true,
                anchor: "start",
                pattern: []
            }
        },
        events: {
            complete: {
                meets: ["all pressed"],
                reports: "matches",
                shouldEndInputHandler: true
            },
            failed: {
                meets: ["timed out"],
                reports: "history",
                shouldEndInputHandler: true
            }
        }
    };

    export function createRollCallDirective(numOfButtons: number, timeout?: number): interfaces.gameEngine.StartInputHandlerDirective {
        const handler = JSON.parse(JSON.stringify(rollcallHandlerTemplate));
        if (timeout) {
            handler.timeout = timeout;
        }

        if (numOfButtons > 4 || numOfButtons < 1) {
            throw new Error("Only 1-4 buttons are supported.");
        }

        for (let i = 0; i < numOfButtons; i++) {
            const proxy = "btn" + (i + 1);

            const patternStep: services.gameEngine.Pattern = {
                action: "down",
                gadgetIds: [proxy]
            };
            handler.proxies!.push(proxy);

            (handler.recognizers!["all pressed"] as services.gameEngine.PatternRecognizer)
                .pattern!.push(patternStep);

        }

        return handler;
    }
}

The code is basically adding a proxy and pattern step for every player. Here is what the code produces for three players.


{
  "type": "GameEngine.StartInputHandler",
  "proxies": [
    "btn1",
    "btn2",
    "btn3"
  ],
  "recognizers": {
    "all pressed": {
      "type": "match",
      "fuzzy": true,
      "anchor": "start",
      "pattern": [
        {
          "action": "down",
          "gadgetIds": [
            "btn1"
          ]
        },
        {
          "action": "down",
          "gadgetIds": [
            "btn2"
          ]
        },
        {
          "action": "down",
          "gadgetIds": [
            "btn3"
          ]
        }
      ]
    }
  },
  "events": {
    "complete": {
      "meets": [
        "all pressed"
      ],
      "reports": "matches",
      "shouldEndInputHandler": true
    },
    "failed": {
      "meets": [
        "timed out"
      ],
      "reports": "history",
      "shouldEndInputHandler": true
    }
  }
}

We now change the RollCall.initialize call to send the directive.


export module RollCall {
    export function initialize(handlerInput: HandlerInput): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inRollcall = true;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);

        return handlerInput.responseBuilder
            .speak("Starting Roll Call")
            .reprompt("Waiting on input")
            .addDirective(createRollCallDirective(2, 20000))
            .getResponse();
    }
}

If we were to try to run this code, our skill would fail because we have not yet enabled our skill to support the GameEngine and GadgetController interfaces. The official documentation shows how we can do it. Here are the necessary fields to add to our skill.json manifest.


{
  "publishingInformation": {
    "gadgetSupport": {
      "requirement": "REQUIRED", // or "OPTIONAL"
      "numPlayersMin": int,
      "numPlayersMax": int, // or null
      "minGadgetButtons": int,
      "maxGadgetButtons": int // or null     
    }
  },
  "apis": {
    "custom": {
      "interfaces": [
        {
          "type": "GAME_ENGINE"
        },
        {
          "type": "GADGET_CONTROLLER"
        }
      ]
    }
  }
}

We modify our skill manifest file to reflect this. Here are the values I utilized.


{
  "manifest": {
    "publishingInformation": {
       // other publishingInformation goes here
      "gadgetSupport": {
        "requirement": "REQUIRED",
        "numPlayersMin": 1,
        "numPlayersMax": 1,
        "minGadgetButtons": 1,
        "maxGadgetButtons": 4
      }
    },
    "apis": {
      "custom": {
        "endpoint": {
          "sourceDir": "lambda/custom"
        },
        "interfaces": [
          {
            "type": "GAME_ENGINE"
          },
          {
            "type": "GADGET_CONTROLLER"
          }
        ]
      }
    }
    // any additional manifest data here
  }
}

When we run ask deploy, the Test tab will now have button simulators!

We can run the skill in the simulator. After we answer “yes” when asked whether we want to play a game, our skill sends the message “Starting Roll Call” along with the generated directive JSON. At that point, you can use the simulator Echo Buttons. Press two different ones and you’ll notice that after the second press, the skill responds with “Ok, good bye.”

This is great news; this good bye message is generated by our RollCallHandler. That means the input handler is working as expected. We should verify that our 20 second timeout works as well. Since our RollCallHandler always closes the skill, we should see the same behavior if we don’t press the buttons. Go ahead and verify it.

We now make the following changes:

  1. Add functionality to the RollCall module to support input events from the buttons.
  2. Add timeout retry functionality, so that if the roll call times out, we give the user a chance to try again.
  3. Break roll call state handling into two handlers: one focused on events from buttons and the other on user utterances as a response to timeouts.

To address item 1, we add a method called handleInput, which calls into handleTimeoutOut or handleDone depending on which event was received. In handleDone, we retrieve the button gadgetIds from the message, assign them to our sessionAttributes, log them and exit the skill for the time being. handleTimeoutOut implements the logic for item 2: the skill asks the user if they would like to retry, and if there are two timeouts in a row, the skill exits (a sketch of handleTimeoutOut follows the code below). Here is the code for the updated RollCall module.


    export function initialize(handlerInput: HandlerInput): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inRollcall = true;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);

        const resp = LocalizedStrings.rollcall_start();
        return handlerInput.responseBuilder
            .speak(resp.speech)
            .reprompt(resp.reprompt)
            .addDirective(createRollCallDirective(2, 20000))
            .getResponse();
    }

    export function handleInput(handlerInput: HandlerInput,
        input: interfaces.gameEngine.InputHandlerEventRequest): Response {
        const inputEvents = input.events!;

        if (inputEvents.some(p => p.name === "failed")) {
            return handleTimeoutOut(handlerInput);
        } else {
            const complete = inputEvents.find(p => p.name === "complete");
            if (complete) {
                return handleDone(handlerInput, complete);
            } else {
               throw new Error("Unexpected event");
            }
        }
    }

    export function handleDone(handlerInput: HandlerInput,
        complete: services.gameEngine.InputHandlerEvent): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        clearSessionAttr(sessionAttr);
        const btns = complete!.inputEvents!.map(p => { return { name: "", id: p.gadgetId }; });
        for (let i = 0; i < btns.length; i++) {
            btns[i].name = "btn" + (i + 1);
        }

        sessionAttr.rollcallResult = btns;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);

        console.log(`Registered buttons: \n${JSON.stringify(btns, null, 2)}`);

        const resp = LocalizedStrings.rollcall_done();
        return handlerInput.responseBuilder
            .speak(resp.speech)
            .withShouldEndSession(true)
            .getResponse();
    }

    function clearSessionAttr(sessionAttr: { [key: string]: any }): void {
        delete sessionAttr.inRollcall;
        delete sessionAttr.rollcallTimeout;
    }

    ...

}
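
For completeness, here is roughly what the elided handleTimeoutOut looks like. The prompt strings are illustrative (the real project keeps them in LocalizedStrings), but the counter logic matches the behavior described above.


    // Sketch of the elided timeout handling. On the first timeout we ask the
    // user whether to retry; on the second consecutive timeout we close the skill.
    function handleTimeoutOut(handlerInput: HandlerInput): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.rollcallTimeout = (sessionAttr.rollcallTimeout || 0) + 1;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);

        if (sessionAttr.rollcallTimeout > 1) {
            return handlerInput.responseBuilder
                .speak(LocalizedStrings.goodbye().speech)
                .withShouldEndSession(true)
                .getResponse();
        }

        return handlerInput.responseBuilder
            .speak("I didn't hear from all the buttons. Do you want to try again?")
            .reprompt("Do you want to try the roll call again?")
            .getResponse();
    }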

For item 3, we create a RollCallTimeoutRetryHandler and move some logic over from RollCallHandler. The two handlers are shown below.


export class RollCallHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        return sessionAttr.inRollcall &&
            (handlerInput.requestEnvelope.request.type === "GameEngine.InputHandlerEvent");
    }

    handle(handlerInput: HandlerInput): Response {
        const inputEventRequest = handlerInput.requestEnvelope.request as interfaces.gameEngine.InputHandlerEventRequest;
        if (inputEventRequest) {
            return RollCall.handleInput(handlerInput, inputEventRequest);
        } else {
            throw new Error("Unexpected event type. Not supported in roll call.");
        }
    }
}

export class RollCallTimeoutRetryHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        return sessionAttr.inRollcall &&
            sessionAttr.rollcallTimeout > 0 &&
            handlerInput.requestEnvelope.request.type === "IntentRequest" &&
            (handlerInput.requestEnvelope.request.intent.name === "AMAZON.YesIntent" ||
                handlerInput.requestEnvelope.request.intent.name === "AMAZON.NoIntent");
    }

    handle(handlerInput: HandlerInput): Response {
        const intentRequest = handlerInput.requestEnvelope.request as IntentRequest;
        if (intentRequest) {
            if (intentRequest.intent.name === "AMAZON.YesIntent") {
                return RollCall.initialize(handlerInput);
            } else if (intentRequest.intent.name === "AMAZON.NoIntent") {
                const resp = LocalizedStrings.goodbye();
                return handlerInput.responseBuilder
                    .speak(resp.speech)
                    .withShouldEndSession(true)
                    .getResponse();
            }
        }
        throw new Error("Unexpected input type. Not supported.");
    }
}

We can go ahead and build and deploy the code to our skill. The happy path of pressing the two buttons works, and we can see the gadgetIds in our CloudWatch logs on AWS. Even though we did not know the button identifiers before, we now have them and can send specific commands to each button. Try letting the input handler time out and observe the behavior as well as the retry option.

Providing Feedback for a Button Tap

If there is anything odd about the experience, it is that when we press the first button there is no feedback of any sort. The buttons stay dark, and there is no acknowledgement of the button press. In fact, our skill doesn’t even know a button has been pressed. We previously suggested that we can get around this and receive those events. We will do so now.

First, we add code to generate the right input handler JSON. We modify the RollCall module’s createRollCallDirective function. The result is that we generate a recognizer and event for each individual button.


    export function createRollCallDirective(numOfButtons: number, timeout?: number): interfaces.gameEngine.StartInputHandlerDirective {
        const handler = JSON.parse(JSON.stringify(rollcallHandlerTemplate));
        if (timeout) {
            handler.timeout = timeout;
        }

        if (numOfButtons > 4 || numOfButtons < 1) {
            throw new Error("Only 1-4 buttons are supported.");
        }

        for (let i = 0; i < numOfButtons; i++) {
            const proxy = "btn" + (i + 1);
            const recognizer = "recognizer_" + proxy;
            const eventName = "event_" + proxy;

            const patternStep: services.gameEngine.Pattern = {
                action: "down",
                gadgetIds: [proxy]
            };
            handler.proxies!.push(proxy);

            (handler.recognizers!["all pressed"] as services.gameEngine.PatternRecognizer)
                .pattern!.push(patternStep);

            const newRecognizer: services.gameEngine.PatternRecognizer = {
                anchor: "end",
                fuzzy: true,
                type: "match",
                pattern: [patternStep]
            };

            handler.recognizers![recognizer] = newRecognizer;

            handler.events![eventName] = {
                shouldEndInputHandler: false,
                maximumInvocations: 1,
                meets: [recognizer],
                reports: "matches"
            };
        }

        return handler;
    }

The JSON produced by this code for three players is shown below.


{
  "type": "GameEngine.StartInputHandler",
  "proxies": [
    "btn1",
    "btn2",
    "btn3"
  ],
  "recognizers": {
    "all pressed": {
      "type": "match",
      "fuzzy": true,
      "anchor": "start",
      "pattern": [
        {
          "action": "down",
          "gadgetIds": [
            "btn1"
          ]
        },
        {
          "action": "down",
          "gadgetIds": [
            "btn2"
          ]
        },
        {
          "action": "down",
          "gadgetIds": [
            "btn3"
          ]
        }
      ]
    },
    "recognizer_btn1": {
      "anchor": "end",
      "fuzzy": true,
      "type": "match",
      "pattern": [
        {
          "action": "down",
          "gadgetIds": [
            "btn1"
          ]
        }
      ]
    },
    "recognizer_btn2": {
      "anchor": "end",
      "fuzzy": true,
      "type": "match",
      "pattern": [
        {
          "action": "down",
          "gadgetIds": [
            "btn2"
          ]
        }
      ]
    },
    "recognizer_btn3": {
      "anchor": "end",
      "fuzzy": true,
      "type": "match",
      "pattern": [
        {
          "action": "down",
          "gadgetIds": [
            "btn3"
          ]
        }
      ]
    }
  },
  "events": {
    "complete": {
      "meets": [
        "all pressed"
      ],
      "reports": "matches",
      "shouldEndInputHandler": true
    },
    "failed": {
      "meets": [
        "timed out"
      ],
      "reports": "history",
      "shouldEndInputHandler": true
    },
    "event_btn1": {
      "shouldEndInputHandler": false,
      "maximumInvocations": 1,
      "meets": [
        "recognizer_btn1"
      ],
      "reports": "matches"
    },
    "event_btn2": {
      "shouldEndInputHandler": false,
      "maximumInvocations": 1,
      "meets": [
        "recognizer_btn2"
      ],
      "reports": "matches"
    },
    "event_btn3": {
      "shouldEndInputHandler": false,
      "maximumInvocations": 1,
      "meets": [
        "recognizer_btn3"
      ],
      "reports": "matches"
    }
  }
}

It is worth taking a minor detour at this point. You may have noticed that the interfaces.gameEngine.InputHandlerEventRequest interface contains an array called events. The implication is that one button down may result in a request with more than one event. Using the directive above as an example, when we press the first button, we receive a request with one event called event_btn1. When we press the second button, we receive a second request with two events: event_btn2 and complete. This means that any logic we write to generate responses must take this behavior into account. For example, we are planning on adding a response message when the user presses a button; however, when the user presses the second button, the skill must also inform the user that the roll call is done and the game is starting. All that to say, we’ll need to be cognizant of these factors when building our skills.

For now, we will create a function called handleCheckin that handles the individual events. We update handleInput as well.


    export function handleInput(handlerInput: HandlerInput,
        input: interfaces.gameEngine.InputHandlerEventRequest): Response {
        const inputEvents = input.events!;

        if (inputEvents.some(p => p.name === "failed")) {
            return handleTimeoutOut(handlerInput);
        } else {
            const complete = inputEvents.find(p => p.name === "complete");
            if (complete) {
                return handleDone(handlerInput, complete);
            } else {
                return handleButtonCheckin(handlerInput);
            }
        }
    }

    export function handleButtonCheckin(handlerInput: HandlerInput): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.rollcallButtonsCheckedIn++;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);

        const resp = LocalizedStrings.rollcall_checkin(numOfButtons - sessionAttr.rollcallButtonsCheckedIn);
        return handlerInput.responseBuilder.speak(resp.speech).getResponse();
    }

When we deploy the skill we should be able to receive feedback when the first button is pressed.

The timeout behavior is a bit odd. If we press one button but never press the second one, our skill asks if we want to retry. At that point, the skill doesn’t remember the first button. We will leave the behavior as is for now.

We have made some really great progress towards setting up a game on Alexa’s Gadget Skills API. In further posts we will explore setting color animations on the buttons to provide user feedback and actually getting a game in place.

The code for this post can be found on GitHub.

 

Posted by Szymon in Alexa

An Alexa Node.js TypeScript Boilerplate Project

I am currently looking at the Alexa Gadget Skills API for the purpose of creating a fun game for my son and presenting on the experience at a few conferences. Even though the SDK is written in TypeScript, the Trivia Game sample is not. That irked me a bit. It was particularly painful as the sample does some odd things with globals that were difficult to track. So, before I create a series around how I created a few Echo Button enabled games on Alexa, I present a simple, bare-bones Alexa Skill TypeScript Boilerplate Project. You can find the code on GitHub.

The V2 version of the SDK has some really good improvements over the first version. The documentation is quite good. The core concepts that we should keep in mind for our boilerplate are:

  • The index file declares the skill and pulls in all necessary handlers to compose a lambda handler.
  • Each RequestHandler has two methods: canHandle and handle. Handlers are considered in the order they were registered; the first handler whose canHandle evaluates to true is selected and its handle method is invoked.
  • The index file registers instances of RequestInterceptor and ResponseInterceptor. A RequestInterceptor is code that executes and can perform actions on an incoming message before a handler is selected. Likewise, a ResponseInterceptor is code that executes after a handler has finished executing. The most obvious use case for these two is message logging (see the sketch after this list), though, as per the sample above, developers can get much more creative.
  • Lastly, the index declares a custom ErrorHandler. In the previous version of the SDK, when an error occurred, the Alexa device responded with the dreaded “There was a problem with the skill’s response”. Now, we log the error and respond with a friendly message to the user.
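
As a concrete example of the interceptor point above, a minimal logging pair might look like the sketch below; the class names are my own, and it assumes the RequestInterceptor and ResponseInterceptor interfaces exported by ask-sdk-core.


import { HandlerInput, RequestInterceptor, ResponseInterceptor } from "ask-sdk-core";
import { Response } from "ask-sdk-model";

// Logs every incoming request envelope before a handler is selected.
export class LoggingRequestInterceptor implements RequestInterceptor {
    process(handlerInput: HandlerInput): void {
        console.log(`Request: ${JSON.stringify(handlerInput.requestEnvelope, null, 2)}`);
    }
}

// Logs the outgoing response after the selected handler has run.
export class LoggingResponseInterceptor implements ResponseInterceptor {
    process(handlerInput: HandlerInput, response?: Response): void {
        console.log(`Response: ${JSON.stringify(response, null, 2)}`);
    }
}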

We start by taking advantage of the Alexa Skills Kit CLI. If you are doing Alexa development without using it, what is wrong with you? Go ahead and get set up with it before continuing. If we create a new skill by using ask new -n TestSkill, we get a basic Skill Manifest (skill.json), the interaction model in the default culture (models/en-US.json) and code for the skill under lambda/custom.

The generated skill interaction model includes Amazon’s builtin intents: AMAZON.CancelIntent, AMAZON.HelpIntent and AMAZON.StopIntent. The default interaction model also includes a custom HelloWorldIntent. The auto-generated index.js file includes all the code to handle all intents, plus a LaunchRequest handler and a custom ErrorHandler. The full code is shown below.


/* eslint-disable  func-names */
/* eslint-disable  no-console */

const Alexa = require('ask-sdk-core');

const LaunchRequestHandler = {
  canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === 'LaunchRequest';
  },
  handle(handlerInput) {
    const speechText = 'Welcome to the Alexa Skills Kit, you can say hello!';

    return handlerInput.responseBuilder
      .speak(speechText)
      .reprompt(speechText)
      .withSimpleCard('Hello World', speechText)
      .getResponse();
  },
};

const HelloWorldIntentHandler = {
  canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === 'IntentRequest'
      && handlerInput.requestEnvelope.request.intent.name === 'HelloWorldIntent';
  },
  handle(handlerInput) {
    const speechText = 'Hello World!';

    return handlerInput.responseBuilder
      .speak(speechText)
      .withSimpleCard('Hello World', speechText)
      .getResponse();
  },
};

const HelpIntentHandler = {
  canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === 'IntentRequest'
      && handlerInput.requestEnvelope.request.intent.name === 'AMAZON.HelpIntent';
  },
  handle(handlerInput) {
    const speechText = 'You can say hello to me!';

    return handlerInput.responseBuilder
      .speak(speechText)
      .reprompt(speechText)
      .withSimpleCard('Hello World', speechText)
      .getResponse();
  },
};

const CancelAndStopIntentHandler = {
  canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === 'IntentRequest'
      && (handlerInput.requestEnvelope.request.intent.name === 'AMAZON.CancelIntent'
        || handlerInput.requestEnvelope.request.intent.name === 'AMAZON.StopIntent');
  },
  handle(handlerInput) {
    const speechText = 'Goodbye!';

    return handlerInput.responseBuilder
      .speak(speechText)
      .withSimpleCard('Hello World', speechText)
      .getResponse();
  },
};

const SessionEndedRequestHandler = {
  canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === 'SessionEndedRequest';
  },
  handle(handlerInput) {
    console.log(`Session ended with reason: ${handlerInput.requestEnvelope.request.reason}`);

    return handlerInput.responseBuilder.getResponse();
  },
};

const ErrorHandler = {
  canHandle() {
    return true;
  },
  handle(handlerInput, error) {
    console.log(`Error handled: ${error.message}`);

    return handlerInput.responseBuilder
      .speak('Sorry, I can\'t understand the command. Please say again.')
      .reprompt('Sorry, I can\'t understand the command. Please say again.')
      .getResponse();
  },
};

const skillBuilder = Alexa.SkillBuilders.custom();

exports.handler = skillBuilder
  .addRequestHandlers(
    LaunchRequestHandler,
    HelloWorldIntentHandler,
    HelpIntentHandler,
    CancelAndStopIntentHandler,
    SessionEndedRequestHandler
  )
  .addErrorHandlers(ErrorHandler)
  .lambda();

The code is pretty straightforward. Each handler’s canHandle method checks for the right request type and intent name to be present. The ErrorHandler logs the error and asks the user to repeat themselves. Lastly, the code composes the skill and returns an AWS Lambda handler.

We now turn this code into TypeScript. On top of that, we break the file up into different handler files and add intercept handlers. I work in Visual Studio Code, so I also enable tslint, a TypeScript linter, to get extra tips on cleaning up my TypeScript.

We first need to create the tsconfig.json file. This file basically provides options for the TypeScript compiler. More details can be found here. We will go ahead and write all of our TypeScript code in a folder called src. The destination will be the root of lambda/custom. That way, once tsc compiles everything, we run ask deploy to push the skill code into Lambda.

We use the following tsconfig.json file as a starting point.


{
    "include": [
        "src/**/*"
    ],
    "exclude": [

    ],
    "compilerOptions": {
        "lib": [
            "dom",
            "es2017"
        ],
        /* Basic Options */
        "target": "ES2015", /* Specify ECMAScript target version: 'ES3' (default), 'ES5', 'ES2015', 'ES2016', 'ES2017', or 'ESNEXT'. */
        "module": "commonjs", /* Specify module code generation: 'commonjs', 'amd', 'system', 'umd' or 'es2015'. */
        "sourceMap": true, /* Generates corresponding '.map' file. */
        "outDir": ".", /* Redirect output structure to the directory. */
        "rootDir": "./src", /* Specify the root directory of input files. Use to control the output directory structure with --outDir. */
        "moduleResolution": "node", /* Specify module resolution strategy: 'node' (Node.js) or 'classic' (TypeScript pre-1.6). */
        "strict": true,
        "noUnusedLocals": true
    }
}

We will place this file inside the lambda/custom directory. Next, we add two scripts to our package.json file.


  "scripts": {
    "build": "tsc",
    "watch": "tsc --watch"
  }

You should have the TypeScript compiler installed for this to work.
npm install -g typescript

Let’s confirm the setup works. Create the src directory and place the following content into an index.ts file.


console.log("Hello World!");

If we run npm run build in the lambda/custom directory, index.js gets created in the root. Good job! At this point the lambda/custom directory holds package.json, tsconfig.json, the src folder and the compiled index.js. (You can also run npm run watch, which recompiles any time a TypeScript file is modified.)

Next, we create the interceptors and handlers directories under src. Inside of those, we create the files to support all the different handlers and interceptors.
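Inferred from the imports in the index module shown later, the src layout ends up looking roughly like this (the actual file names in the repo may differ slightly):


src
├── index.ts
├── handlers
│   ├── builtin
│   │   ├── AMAZON.CANCEL.ts
│   │   ├── AMAZON.Help.ts
│   │   └── AMAZON.Stop.ts
│   ├── Error.ts
│   ├── HelloWorld.ts
│   ├── Launch.ts
│   └── SessionEndedRequst.ts
└── interceptors
    ├── RequestLogging.ts
    └── ResponseLogging.ts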

You can see where this is going. We now fill out the code for each of these files. Once done and compiled into JavaScript, we run ask deploy and our skill will just work.

We begin with the Launch handler in the Launch.ts file.


import { HandlerInput, RequestHandler } from "ask-sdk-core";
import { Response } from "ask-sdk-model";

export class LaunchHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const request = handlerInput.requestEnvelope.request;
        return request.type === "LaunchRequest";
    }

    handle(handlerInput: HandlerInput): Response {
        const speechText = "Welcome to the Alexa Skills Kit, you can say hello!";

        return handlerInput.responseBuilder
            .speak(speechText)
            .reprompt(speechText)
            .withSimpleCard("Hello World", speechText)
            .getResponse();
    }
}

Note that the ask-sdk-core and ask-sdk-model modules both include TypeScript declaration files making development in an environment like Visual Studio Code easier.

Every other handler looks similar. For example, the HelloWorldHandler looks as follows:


import { HandlerInput, RequestHandler } from "ask-sdk-core";
import { Response } from "ask-sdk-model";

export class HelloWorldHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const request = handlerInput.requestEnvelope.request;
        return request.type === "IntentRequest" && request.intent.name === "HelloWorldIntent";
    }

    handle(handlerInput: HandlerInput): Response {
        const speechText = "Hello World!";

        return handlerInput.responseBuilder
            .speak(speechText)
            .withSimpleCard("Hello World", speechText)
            .getResponse();
    }
}
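The session-ended handler deserves a quick note: in TypeScript, the reason field only exists once the request union is narrowed to SessionEndedRequest. A minimal sketch, which may differ slightly from the version in the repo:


import { HandlerInput, RequestHandler } from "ask-sdk-core";
import { Response } from "ask-sdk-model";

export class SessionEndedHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        return handlerInput.requestEnvelope.request.type === "SessionEndedRequest";
    }

    handle(handlerInput: HandlerInput): Response {
        const request = handlerInput.requestEnvelope.request;

        // Narrow the request union so TypeScript lets us read the reason field.
        if (request.type === "SessionEndedRequest") {
            console.log(`Session ended with reason: ${request.reason}`);
        }

        return handlerInput.responseBuilder.getResponse();
    }
}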

The CustomErrorHandler logs the error and provides a friendly response.


import { HandlerInput, ErrorHandler } from "ask-sdk-core";
import { Response } from "ask-sdk-model";

export class CustomErrorHandler implements ErrorHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        return true;
    }

    handle(handlerInput: HandlerInput, error: Error): Response {
        const request = handlerInput.requestEnvelope.request;

        console.log(`Error handled: ${error.message}`);
        console.log(`Original Request was: ${JSON.stringify(request, null, 2)}`);

        return handlerInput.responseBuilder
            .speak("Sorry, I can not understand the command.  Please say again.")
            .reprompt("Sorry, I can not understand the command.  Please say again.")
            .getResponse();
    }
}

The interceptors are similar; here is what the request-logging one looks like.


import { RequestInterceptor, HandlerInput } from "ask-sdk-core";

export class RequestLoggingInterceptor implements RequestInterceptor {
    process(handlerInput: HandlerInput): Promise<void> {
        return new Promise<void>((resolve) => {
            console.log("Incoming request:\n" + JSON.stringify(handlerInput.requestEnvelope.request));
            resolve();
        });
    }
}
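The response-side interceptor mirrors it; a minimal sketch, assuming it simply logs the outgoing response, could look like this:


import { HandlerInput, ResponseInterceptor } from "ask-sdk-core";
import { Response } from "ask-sdk-model";

export class ResponseLoggingInterceptor implements ResponseInterceptor {
    process(handlerInput: HandlerInput, response?: Response): Promise<void> {
        return new Promise<void>((resolve) => {
            // Log whatever response the selected handler produced (if any).
            console.log("Outgoing response:\n" + JSON.stringify(response));
            resolve();
        });
    }
}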

The last code we will show in this post is the index.ts file that puts it all together. It imports all of the different modules we just created and exposes them through the AWS Lambda handler.


import { SkillBuilders } from "ask-sdk-core";

import { BuiltinAmazonCancelHandler } from "./handlers/builtin/AMAZON.CANCEL";
import { BuiltinAmazonHelpHandler } from "./handlers/builtin/AMAZON.Help";
import { BuiltinAmazonStopHandler } from "./handlers/builtin/AMAZON.Stop";
import { LaunchHandler } from "./handlers/Launch";
import { HelloWorldHandler } from "./handlers/HelloWorld";
import { SessionEndedHandler } from "./handlers/SessionEndedRequst";

import { CustomErrorHandler } from "./handlers/Error";
import { RequestLoggingInterceptor } from "./interceptors/RequestLogging";
import { ResponseLoggingInterceptor } from "./interceptors/ResponseLogging";

function buildLambdaSkill(): any {
    return SkillBuilders.custom()
        .addRequestHandlers(
            new LaunchHandler(),
            new HelloWorldHandler(),
            new BuiltinAmazonCancelHandler(),
            new BuiltinAmazonHelpHandler(),
            new BuiltinAmazonStopHandler(),
            new SessionEndedHandler()
        )
        .addRequestInterceptors(new RequestLoggingInterceptor())
        .addResponseInterceptors(new ResponseLoggingInterceptor())
        .addErrorHandlers(new CustomErrorHandler())
        .lambda();
}

export let handler = buildLambdaSkill();

Once all the code is in place, we make sure that TypeScript compiled everything and run ask deploy. If all goes well, our skill and Lambda function will be updated to reflect the changes. Below is the Test tab's output as I type through a conversation. It all works great! Notice that I left the default invocation name as "greeter". We'll change that later on.

That's it! For now, I am not putting any effort into automated testing or a strong opinion about where the business logic should live. Some of this may change as I develop more skills. In the meantime, you can find all the code in this GitHub repo.

Posted by Szymon in Alexa

Dynamically Rendered Graphics for Conversational Experiences

About a year and a half ago, my team and I embarked on a journey to build a chat bot for a client in the financial industry. They had a remarkable amount of market and education data. One of our goals was to figure out the best way to consume all of that data and communicate it back to the user. In a text-only world, sending back this amount of data would be incredibly verbose.

To illustrate the point, let’s take a look at what data a financial stock quote may communicate. At a minimum, a quote is composed of the last price, change and change percentage for the latest trading session. In general, it is also useful to know the opening, high and low price for the day. The 52-week high and low are relevant as they give us more context around what the stock was doing over the last year. For example, in the Google Finance card below, we can tell that in the last year, Amazon had a low of $931 and since then has doubled. Crazy! A quote may have other information like the bid/ask prices and sizes. All this information is a Level I quote.

Say a user asked for an Amazon.com quote. What would a text message with all this data look like? Maybe something as follows:

The latest price for AMZN (Amazon Inc) was $1,788.02 at 9:48 AM EDT. This is a change of $8.80 (0.49%) for the day. The open price was $1,786.49 and the high and low are $1,801.83 and $1,741.64 respectively. The 52-week high and low are $1,880.05 and $931.75 respectively.

It should be clear that parsing through this text for every quote is mentally exhausting. It is not immediately clear if the stock is up or down. The color for the change is a nice touch in the card, something we lack in the text. The open, high, low and 52-week prices all blend in. If we were to ask for a few quotes in succession, we would develop a headache because of the massive amount of gymnastics the brain would have to go through. To many, all of this is obvious. It wasn’t to me when I first entered this space.

You sold me, now what?

Hopefully you agree that a graphical display of the financial data is easier to digest and more effective at conveying the information. In fact, this approach applies not only to financial data but to any data that a graphic conveys better. Take a chart of historical weather averages. Perhaps, as part of a weather bot, we would like to display a chart of the last month of temperatures: say, the Los Angeles daily highs and lows, as well as hourly temperatures.

How do we go about generating a graphic like this to incorporate in our bot’s response?

This question has come up in various projects that I've been a part of. HTML and CSS always seemed like a good approach. The problem is that it is difficult to find a library that can take arbitrary HTML/CSS input and produce a faithful, standards-compliant rendering. In fact, this is usually an exercise in futility. For instance, in our .Net based project we found some old libraries that ignored most modern web development techniques; we could only specify font sizes in pixels, inline. What we really wanted was a WebKit (Apple) or Chromium (Google) based library maintained by a reputable party to do the work for us.

Headless Browsers

Headless browsers have been around for some time. One of the better-known classics may be PhantomJS (development has been suspended as of March 2018). The concept is to run an entire instance of a browser without displaying any user interface. The main use case for these would be something like automated unit and functional JavaScript tests. If functional tests on Single Page Apps were failing, it would be useful to take a screenshot of what the app looked like at the time.

Google’s Chrome gained a headless mode in 2017. One of the more exciting projects, Puppeteer, is a Node API for Headless Chrome maintained by the Chrome Dev Tools team. With Puppeteer, we can run scripts like this one below from the examples. It loads a page, enters text into an input box to search for articles and then scrapes the resulting page (source: https://github.com/GoogleChrome/puppeteer/blob/master/examples/search.js).


const puppeteer = require('puppeteer');

(async() => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto('https://developers.google.com/web/');

  // Type into search box.
  await page.type('#searchbox input', 'Headless Chrome');

  // Wait for suggest overlay to appear and click "show all results".
  const allResultsSelector = '.devsite-suggest-all-results';
  await page.waitForSelector(allResultsSelector);
  await page.click(allResultsSelector);

  // Wait for the results page to load and display the results.
  const resultsSelector = '.gsc-results .gsc-thumbnail-inside a.gs-title';
  await page.waitForSelector(resultsSelector);

  // Extract the results from the page.
  const links = await page.evaluate(resultsSelector => {
    const anchors = Array.from(document.querySelectorAll(resultsSelector));
    return anchors.map(anchor => {
      const title = anchor.textContent.split('|')[0].trim();
      return `${title} - ${anchor.href}`;
    });
  }, resultsSelector);
  console.log(links.join('\n'));

  await browser.close();
})();

How can we leverage Puppeteer to fill our needs? We take advantage of the page.screenshot function, as shown in the code below. We first set the viewport to reflect the size of our screenshot. Notice that we ask Puppeteer to load the HTML using a data URL; an alternative is to write the file to a temporary folder on disk and point Chrome at it. When loading the content, we pass a waitUntil parameter set to load. There are some other options here that wait for the network to become idle. More information can be found here. Lastly, we take a screenshot. The omitBackground flag allows us to have transparent backgrounds in our screenshots. The screenshot resolves to a Node Buffer containing the PNG data.


async function renderHtml(html, width, height) {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    await page.setViewport({ width: width, height: height });
    await page.goto(`data:text/html,${html}`, { waitUntil: 'load' });

    // The default (binary) encoding yields a Node Buffer with the PNG bytes,
    // which the HTTP endpoint below can write out directly.
    const pageResultBuffer = await page.screenshot({ omitBackground: true });

    await page.close();
    await browser.close();
    return pageResultBuffer;
}


Once the buffer is created, we can do just about anything with it. We can send it down to a bot as an inline PNG data URL, or we can upload it to a blob store like S3 and have any channel use the image from the blob store. In the rest of this post, we will create a Node server that simply responds to GET requests with the weather graphic above for any city passed as a URL parameter.
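As a quick illustration of the inline option, a small hypothetical helper (not part of the sample repo) can wrap the buffer in a data URL that most chat channels accept as an image attachment:


// Hypothetical helper, not part of the sample repo: wrap the PNG buffer in a
// data URL so a channel can render it inline. The alternative is to upload the
// same buffer to a blob store (e.g. S3) and reference it by its public URL.
function toInlinePngUrl(pngBuffer: Buffer): string {
    return `data:image/png;base64,${pngBuffer.toString("base64")}`;
}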

There is one more implication of Headless Chrome that we have not yet explicitly spelled out. The HTML we pass can include all manner of SVG, JavaScript, external resource loading, etc. We can truly take advantage of the various Chrome features and even create an SPA. For our weather graphic use case, we will use a JavaScript charting library to draw the visualization. With all the libraries available out there, we can build some pretty nifty visualizations.

A Simple Weather Graphic Image Server

We will now walk through the creation of a simple Node server that generates these weather graphics for any city. As Facebook Messenger requires landscape images to have a 1.91:1 aspect ratio, we create a card of that size. We use C3.js, a charting library built on the well-known D3.js document manipulation library. Let us take a look at the card template HTML. Within it, we create a basic C3 timeseries chart that includes two x series: one for the daily high/low data and one for the hourly temperature data. Note that we use placeholders that will be replaced with the actual data used in the chart.


<html>

<head>
    <style>
        body {
            font-family: sans-serif;
            margin: 0;
            padding: 0;
            background: #ffffff;
        }
    </style>
    <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/d3/5.5.0/d3.min.js"></script>
    <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/c3/0.6.6/c3.min.js"></script>
    <link href="https://cdnjs.cloudflare.com/ajax/libs/c3/0.6.6/c3.min.css" rel="stylesheet" type="text/css">
</head>

<body>
    <div class="card">
        <div id="chart"></div>
    </div>
</body>

<script type="text/javascript">
    var chart = c3.generate({
        size: {
            width: 764,
            height: 400
        },
        data: {
            xFormat: '%Y-%m-%d-%H',
            xs: {
                'Low': 'x1',
                'High': 'x1',
                'Hourly': 'x2',
            },
            columns: [
                ['x1', { X }],
                ['x2', { X2 }],
                ['Low', { LOW }],
                ['High', { HIGH }],
                ['Hourly', { HR }]
            ]
        },
        point: {
            show: false
        },
        grid: {
            y: {
                show: true
            }
        },
        axis: {
            x: {
                type: 'timeseries',
                tick: {
                    count: 12,
                    format: '%Y-%m-%d'
                }
            }
        }
    });
</script>
</html>

As an example, if we were to set the following data in the columns:


columns: [
    ['x1', '2018-07-02-0','2018-07-03-0','2018-07-04-0','2018-07-05-0'],
    ['x2', '2018-07-02-0','2018-07-03-0','2018-07-04-0','2018-07-05-0'],
    ['Low', 63,64,63,62],
    ['High', 74,74,76,85],
    ['Hourly', 67,70,70,78]
]

We would see the following chart:

All that is left is to retrieve the data and transform it into the format required by C3.js, and we'll have the graphic we want.

I found a free trial weather API that we could use for this purpose: World Weather Online. On their web site, you can create an account and receive a trial key good for 500 API calls a day. With the key in our possession, we can retrieve data using a URL in this format:

https://api.worldweatheronline.com/premium/v1/past-weather.ashx?key={INSERT_YOUR_KEY_HERE}&q=los%20angeles&format=json&date=2018-07-31&enddate=2018-08-01&tp=1

The tp parameter corresponds to the frequency of data points; in this case, 1 means we receive hourly data. The q parameter is the name of the city. We can also pass the date and enddate parameters to bound the request. The result of the query above, trimmed for brevity, is:


{
    "data": {
        "request": [
            {
                "type": "City",
                "query": "Los Angeles, United States of America"
            }
        ],
        "weather": [
            {
                "date": "2018-07-31",
                "astronomy": [
                    {
                        "sunrise": "06:04 AM",
                        "sunset": "07:55 PM",
                        "moonrise": "10:24 PM",
                        "moonset": "09:27 AM",
                        "moon_phase": "Waning Gibbous",
                        "moon_illumination": "83"
                    }
                ],
                "maxtempC": "30",
                "maxtempF": "86",
                "mintempC": "24",
                "mintempF": "76",
                "totalSnow_cm": "0.0",
                "sunHour": "13.0",
                "uvIndex": "0",
                "hourly": [
                    {
                        "time": "0",
                        "tempC": "23",
                        "tempF": "74",
                        "windspeedMiles": "1",
                        "windspeedKmph": "1",
                        "winddirDegree": "193",
                        "winddir16Point": "SSW",
                        "weatherCode": "116",
                        "weatherIconUrl": [
                            {
                                "value": "http://cdn.worldweatheronline.net/images/wsymbols01_png_64/wsymbol_0004_black_low_cloud.png"
                            }
                        ],
                        "weatherDesc": [
                            {
                                "value": "Partly cloudy"
                            }
                        ],
                        "precipMM": "0.0",
                        "humidity": "72",
                        "visibility": "10",
                        "pressure": "1013",
                        "cloudcover": "4",
                        "HeatIndexC": "24",
                        "HeatIndexF": "75",
                        "DewPointC": "18",
                        "DewPointF": "65",
                        "WindChillC": "24",
                        "WindChillF": "75",
                        "WindGustMiles": "4",
                        "WindGustKmph": "6",
                        "FeelsLikeC": "24",
                        "FeelsLikeF": "75"
                    },
                    {
                        "time": "100",
                        "tempC": "23",
                        "tempF": "74",
                        "windspeedMiles": "1",
                        "windspeedKmph": "2",
                        "winddirDegree": "193",
                        "winddir16Point": "SSW",
                        "weatherCode": "116",
                        "weatherIconUrl": [
                            {
                                "value": "http://cdn.worldweatheronline.net/images/wsymbols01_png_64/wsymbol_0004_black_low_cloud.png"
                            }
                        ],
                        "weatherDesc": [
                            {
                                "value": "Partly cloudy"
                            }
                        ],
                        "precipMM": "0.0",
                        "humidity": "73",
                        "visibility": "10",
                        "pressure": "1012",
                        "cloudcover": "4",
                        "HeatIndexC": "24",
                        "HeatIndexF": "75",
                        "DewPointC": "19",
                        "DewPointF": "65",
                        "WindChillC": "24",
                        "WindChillF": "75",
                        "WindGustMiles": "4",
                        "WindGustKmph": "6",
                        "FeelsLikeC": "24",
                        "FeelsLikeF": "75"
                    },
…
}

For every day we have the minimum and maximum temperatures, and for every hour we have a temperature. We can use some code to retrieve and parse this into something useful; I used the code below. The API sometimes returned a timeout error, so I built in retry logic. In effect, we retrieve the last 30 days of data and transform the objects into a format we can easily use.


// Assumed dependencies for this snippet: rp is request-promise, moment is the
// moment library, and timeout is a small delay helper like the one below.
const rp = require('request-promise');
const moment = require('moment');

function timeout(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

async function getWeatherData(location) {
    const uri = `https://api.worldweatheronline.com/premium/v1/past-weather.ashx?key=${process.env.WEATHER_KEY}&q=${encodeURIComponent(location)}&format=json&date={start}&enddate={end}&tp=1`;
    const start = moment().add(-30, 'days');
    const end = moment().startOf('day');

    const data = [];
    let done = false;
    let errorCount = 0;
    while (!done) {
        const startStr = start.format('YYYY-MM-DD');
        const endStr = end.format('YYYY-MM-DD');
        const reqUri = uri.replace('{start}', startStr).replace('{end}', endStr);
        console.log(`fetching ${reqUri}`);

        try {
            const rawResponse = await rp({ uri: reqUri, json: true });
            // Flatten the API response into { date, min, max, hourly[] } objects.
            const response = rawResponse.data.weather.map(item => {
                return {
                    date: item.date + '-0',
                    min: item.mintempF,
                    max: item.maxtempF,
                    hourly: item.hourly.map(hr => {
                        let date = moment(item.date);
                        date.hour(parseInt(hr.time, 10) / 100);
                        date.minute(0); date.second(0);
                        return {
                            date: date.format('YYYY-MM-DD-HH'),
                            temp: hr.tempF
                        };
                    })
                };
            });
            response.forEach(item => { data.push(item); });
            done = true;
        } catch (error) {
            // Retry up to three times before giving up.
            errorCount++;
            if (errorCount >= 3) return null;
            console.error('error... retrying');
            await timeout(3 * 1000);
        }
    }

    return data;
}

The last piece of code creates the GET endpoint on our server using restify, retrieves the weather data, populates the template HTML, takes a screenshot using Headless Chrome and responds with the image.


// Assumed wiring for this snippet: fs to read the card template and a basic
// restify server (the repo's actual setup may differ).
const fs = require('fs');
const restify = require('restify');

const server = restify.createServer();

server.get('/api/:location', async (req, res, next) => {
    const location = req.params.location;
    const weatherData = await getWeatherData(location);

    if (weatherData == null) {
        // this means we got some error. we return Internal Server Error
        res.writeHead(500);
        res.end();
        next();
        return;
    }

    // Build the comma-separated column strings the C3 template expects.
    const x = weatherData.map(item => "'" + item.date + "'").join(',');
    const low = weatherData.map(item => item.min).join(',');
    const high = weatherData.map(item => item.max).join(',');

    const _x2 = [];
    const _hrs = [];
    weatherData.map(item => item.hourly).forEach(hr => hr.forEach(hri => _x2.push(hri.date)));
    weatherData.map(item => item.hourly).forEach(hr => hr.forEach(hri => _hrs.push(hri.temp)));
    const x2 = _x2.map(d => "'" + d + "'").join(',');
    const hrs = _hrs.join(',');

    // Substitute the placeholders in the card template with the real data.
    let data = fs.readFileSync('cardTemplate.html', 'utf8');
    data = data.replace('{ X }', x);
    data = data.replace('{ LOW }', low);
    data = data.replace('{ HIGH }', high);
    data = data.replace('{ X2 }', x2);
    data = data.replace('{ HR }', hrs);

    const cardData = await renderHtml(data, 764, 400);

    res.writeHead(200, {
        'Content-Type': 'image/png',
        'Content-Length': cardData.length
    });

    res.end(cardData);

    next();
});

server.listen(8080, () => console.log('listening at http://localhost:8080'));


The result is that we can run the server with npm start, navigate to a URL like http://localhost:8080/api/Miami and receive the following image.

Not bad for a few minutes of coding! I’ll assume that the low temperatures being higher than the hourly data is either a data quality issue or something I did wrong in the chart.

Conclusion

Clearly there's more work to be done to take this into a production environment. The result looks somewhat pixelated; we could render a larger image and then resample it back down to get a higher quality result. You may have noticed some slowness in rendering as well; since we are remotely loading JavaScript and CSS resources, we may want to serve them from the same machine instead.
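One low-effort way to get a sharper image, sketched below under the assumption that we keep the same data-URL loading approach, is to render at a higher deviceScaleFactor so Puppeteer produces a 2x-resolution PNG of the same layout, which the channel then scales down into a crisper card:


import * as puppeteer from "puppeteer";

// A sketch, not the sample repo's implementation: render the same layout at
// twice the device pixel density for a higher quality screenshot.
async function renderHtmlHiDpi(html: string, width: number, height: number): Promise<Buffer> {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    await page.setViewport({ width, height, deviceScaleFactor: 2 });
    await page.goto(`data:text/html,${html}`, { waitUntil: "load" });
    const buffer = (await page.screenshot({ omitBackground: true })) as Buffer;

    await browser.close();
    return buffer;
}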

Despite some issues, this is a sound approach and, with some fine tuning, can result in high quality visualizations for our bot experiences or any other application that needs static visualizations.

In the .Net project I referenced earlier, we actually created a standalone ASP.Net Core web app on Azure that called into Puppeteer scripts using ASP.Net Core Node Services. It works very well and performs great. We did not spend too much time optimizing and were able to get performance to around 300ms, which is sufficient for our purposes.

You can find the full code sample on GitHub.

We dive into this technique in further detail, in the context of bots, in my book, Practical Bot Development.

Posted by Szymon in Bots

Time to Get Started with Chat and Voice

I have spent the last two and a half years of my career focusing on a technology that, back then, was easily dismissible. So much so that others at my work doubted we could build a successful business around it. At the time, chat bots had gained a somewhat notorious reputation for underwhelming users because of the bots' limitations. From a technology implementation perspective, what was a clear attempt at providing narrow but useful conversational experiences became a target of Turing test ridicule. No way this is AI, they said. It was fair, but very misplaced, criticism. In the past two years, chat bots have been gaining steam across the consumer and enterprise space. Bots are filling a real need.

Users who have a smartphone love their messaging apps. Look at the average user's phone and you will find the likes of WhatsApp, WeChat, Snapchat, Facebook Messenger and so on. You know what you will not find? A mobile app for a local mechanic or a local flower shop. Users, millennials especially, heavily prefer messaging to calling. Messaging is convenient and, importantly, asynchronous. If we interact with friends using messaging apps, why should we interact with businesses any differently? The writing is on the wall, and companies from Facebook to Twitter and Apple are on board.

Of equal relevance are digital assistants like Alexa, Cortana and Google Assistant. As these become more and more integrated with our daily activities, our expectations around communicating with computer agents using natural language become more ingrained. I just attended the VOICE 2018 conference in Newark, NJ. The stories shared around our interactions with voice assistants resonated, especially as they reflect real usage in our homes. For instance, children love Alexa. They love asking her all kinds of questions, watching fun videos and, most recently, playing games by using gadgets like Echo Buttons. Nursing homes and the elderly stand to benefit as well; there is something human about being able to speak to Alexa at any time, especially for those living alone. For everyone in between, it acts as an appointment assistant, a task tracker or a glorified kitchen timer. As we become accustomed to these voice interactions, expecting the same level of natural language comprehension with all kinds of computer agents will become second nature.

As one would expect, there is significant overlap between the technologies powering both chat bots and voice experiences. At the end of the day, a conversational experience is composed of a per-user state machine. An incoming user message gets distilled into an intent and an optional set of entities. Given a user’s state, incoming intent and entities, the state machine takes the three pieces as input and transitions the user to the next state. For example, if I begin a conversation with a bot I may be in a Begin state. If I say, What is the current weather?, the state machine would transition me to the CurrentWeather state, in which the right business logic to fetch the weather and generate a response would be executed. The collection of all these state transitions is the conversation. Natural Language Understanding (NLU) technologies such as Microsoft’s LUIS, Rasa NLU and Google’s Dialogflow, among many others are the Narrow AI behind conversational experiences. There are also many options for developing the conversation engine that powers the state machine, such as Microsoft’s Bot Framework, Google’s Dialogflow, Amazon’s Lex, Watson Assistant and many others. Once we have an NLU system and a conversation engine, our last task is to build the business logic to provide responses to the combination of users’ context, and their input intents and entities.
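To make the state machine idea concrete, here is a toy TypeScript sketch, purely illustrative and not tied to any of the frameworks above, of a per-user transition keyed on the current state plus the recognized intent and entities:


// Illustrative only: a toy per-user state machine. Real conversation engines
// (Bot Framework, Dialogflow, Lex, etc.) layer persistence, dialog stacks and
// prompting on top of this basic loop.
type State = "Begin" | "CurrentWeather";

interface Recognition {
    intent: string;                       // e.g. "GetCurrentWeather"
    entities: { [name: string]: string }; // e.g. { city: "Los Angeles" }
}

const userStates = new Map<string, State>();

function handleMessage(userId: string, recognition: Recognition): string {
    const current = userStates.get(userId) || "Begin";

    // Transition based on the current state plus the incoming intent/entities.
    if (current === "Begin" && recognition.intent === "GetCurrentWeather") {
        userStates.set(userId, "CurrentWeather");
        const city = recognition.entities["city"] || "your location";
        return `Fetching the current weather for ${city}...`;
    }

    // Fallback: stay in the current state and ask for clarification.
    return "Sorry, I did not get that. You can ask about the current weather.";
}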

The process of building voice and chat bots is very similar across the different tools. Many approaches leave the NLU and conversation engine pieces in the cloud and only call into your business logic as necessary. In my book, Practical Bot Development: Designing and Building Bots with Node.js and Microsoft Bot Framework, I make the explicit choice of using Microsoft's Bot Framework, one of the more flexible options in the market and one that my team has used across more than a dozen production bots. Microsoft's approach allows developers the flexibility to implement their own conversation engine logic and, thus, is a great teaching tool. In the book, we make the journey from developing simple bots connected to Facebook Messenger to powering a Twilio phone conversation or an Alexa skill using the same technology. We integrate with Google's OAuth and connect a chat bot to Google's Calendar API. We discuss the ins and outs of NLU using LUIS, Adaptive Cards, dynamic graphics generation, human handover, bot analytics and many other topics. The goal is to excite and equip developers with the skills to build fun and impactful conversational experiences!

This is where it gets interesting; once we have the skills to build conversational experiences, what then? The truth is that this is still a new space and we are learning what it takes to build a truly engaging chat bot or voice skill. So much so that the technology to build these experiences is evolving at a breakneck pace. Although frustrating when writing a book, this should excite you! We know so little about this new way of interacting that the platforms are constantly improving the ways in which we communicate with users. The space needs innovators and forward-looking developers willing to showcase new and experimental applications that make users' lives easier. There is no better time to jump into this space than now. Join us!

Posted by Szymon in Bots