Month: October 2018

Introducing Luis Version Tools

Microsoft’s LUIS is a super cool NLU (Natural Language Understanding) platform that our teams at Insight have been using on projects for over two years, since the days when it was in preview and supported a maximum of 10 intents per application. Since then, LUIS has come a long way in terms of features, performance, automation, and governance. Last May, at Build, Microsoft announced the Bot Builder Tools, a set of mostly Node-based scripts that allow for easy scripting of everything from the Azure Bot Service to QnA Maker and LUIS, as well as tools that help author LUIS models (ludown) or provide a dispatch LUIS model fronting a set of child LUIS models (dispatch). Ludown quickly became one of my favorite tools, and I was fortunate enough to help the team by reporting bugs and requesting feature enhancements. That work inspired much of what I present below.

The Problems with Authoring LUIS

LUIS allows users to create Applications. An Application can have one or more Versions and has two deployment slots: staging and production. A Version is a collection of intents and entities; it can be trained and published into either deployment slot. Version A can be published to the production slot for production apps to utilize, and Version B can be published to the staging slot for development/test applications to use. Any Version can be exported as JSON.

There are a number of issues that accumulate:

  • The LUIS JSON is very verbose.
  • There is no audit log; one cannot tell who changed what and when.
  • There is no easy way to tell the difference between two versions.
  • There is no clear direction on when to create a new version or any version naming conventions.
  • An Application having two slots is limiting. For many apps, there are more than two environments. In these cases, the two-slot model fails.

Microsoft provides the Authoring API (also accessible via the luis-apis package). The functionality provided by the API is the beginning of how we solve these problems.

Introducing luis-version

I created two tools to help fill the gap and provide easy automation for model and version management right within source control. The tools are based on ludown and luis-apis. We assume that the entire contents of a LUIS app are managed within a ludown file, which we call model.lu by default. The first tool is luis-version.

The goal of luis-version is to generate a new LUIS app version name whenever the contents of model.lu have changed since the last version. This is tracked via a file called .luis-app-version that sits within source control. The version name is the current UTC date formatted as YYMMDDHHmm; a new version generated right now would be named 1810271732. This might seem cryptic, but LUIS limits version names to 10 characters, so that’s what we ended up with for now. If a second developer runs luis-version with the same model.lu file and the same .luis-app-version in their working directory, luis-version recognizes that the hash is the same and reuses the version name.
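As a concrete illustration, here is how the tag and hash might be computed in TypeScript. This is a sketch under assumptions: the actual luis-version implementation may use a different hash algorithm; only the tag format and the 8-character hash length match the output shown later in this post.

```typescript
import * as crypto from "crypto";

// Sketch: derive a short content hash of model.lu (8 hex characters, matching
// the sample output). The specific hash algorithm is an assumption here.
function modelHash(modelContent: string): string {
    return crypto.createHash("sha256").update(modelContent).digest("hex").slice(0, 8);
}

// Sketch: format a UTC time as the 10-character YYMMDDHHmm version tag.
function versionTag(now: Date): string {
    const pad = (n: number) => n.toString().padStart(2, "0");
    return (
        pad(now.getUTCFullYear() % 100) +
        pad(now.getUTCMonth() + 1) +   // getUTCMonth() is zero-based
        pad(now.getUTCDate()) +
        pad(now.getUTCHours()) +
        pad(now.getUTCMinutes())
    );
}
```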

After determining the version name, luis-version ensures that the version exists. If it does not, it creates it. It runs model.lu through ludown to generate the LUIS JSON and calls luis import version. You can pass an optional --publish flag to ensure that the new version is immediately published.

luis-version obeys the rules of .luisrc as documented here. In addition, you can pass a different luisrc file by using the --luisrc parameter. For example: luis-version --model model.lu --luisrc .luisrc.prod --publish.

And one can add the --verbose flag to see exactly what the utility is doing underneath.

Sample Walkthrough

Let’s say we have a LUIS model we want to manage in source control. We are familiar with the ludown format and use this sample to get started; we will call this file model.lu. Next, we create three .luisrc files. The contents of each are identical except for the appId; these identify the three LUIS apps representing the dev, test, and production environments. In my case, the authoringKey is the same for all three environments. I left the real appIds in here, though I’ve since deleted the applications.


.luisrc
{
  "authoringKey": "",
  "region": "westus",
  "appId": "c83b0094-8c19-4d73-bd91-689d91ccfd8c",
  "endpointBasePath": "https://westus.api.cognitive.microsoft.com/luis/api/v2.0"
}

.luisrc.test
{
  "authoringKey": "",
  "region": "westus",
  "appId": "8650ff9f-6be5-4b28-87ad-a61053d8dbdc",
  "endpointBasePath": "https://westus.api.cognitive.microsoft.com/luis/api/v2.0"
}

.luisrc.prod
{
  "authoringKey": "",
  "region": "westus",
  "appId": "d920f225-6a80-4857-bb11-baf73416ec96",
  "endpointBasePath": "https://westus.api.cognitive.microsoft.com/luis/api/v2.0"
}

I can now run the following command to deploy the latest model data into the development LUIS app.

luis-version --model model.lu --luisrc .luisrc --publish

Getting app id c83b0094-8c19-4d73-bd91-689d91ccfd8c...
Calculating hash...
Hash 59c0db24 and tag 1810271806 generated
Checking if version 1810271806 exists...
Version 1810271806 doesn't exist. Continuing...
Running ludown.
Importing version 1810271806...
Version 1810271806 imported.
Training 1810271806...
Done training ...
Publishing...
All done.

Note that the script created a file called .luis-app-version that contains the latest hash/version name based on the model.lu content. In this case, the file contents match the output hash/version.


{
  "tag": "1810271806",
  "hash": "59c0db24"
}

If we look at our dev LUIS application, we will note that we have created a new version that is published into the production slot. We can easily deploy the same model to test and prod using:

luis-version --model model.lu --luisrc .luisrc.test --publish
luis-version --model model.lu --luisrc .luisrc.prod --publish

If the version already exists, it is simply retrained and, if the flag is passed, republished. If the .luis-app-version file exists with the same hash, the old version name is reused, as shown by this output.

Getting app id c83b0094-8c19-4d73-bd91-689d91ccfd8c...
Calculating hash...
Hash 59c0db24 and tag 1810271809 generated
Found old version with hash 59c0db24. Using version 1810271806
Checking if version 1810271806 exists...
Version 1810271806 exists...
Version exists. Not updating...
Training 1810271806...
Done training ...
Publishing...
All done.
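The reuse decision in the log above boils down to a comparison against the stored file. Here is a minimal sketch; the chooseTag helper is an illustrative name, not necessarily how the real tool structures this internally.

```typescript
// Shape of the .luis-app-version file tracked in source control.
interface VersionFile {
    tag: string;
    hash: string;
}

// Sketch of the reuse decision: keep the stored tag when the model content
// hash is unchanged, otherwise use the freshly generated tag.
function chooseTag(currentHash: string, freshTag: string, stored?: VersionFile): string {
    if (stored && stored.hash === currentHash) {
        return stored.tag;   // model unchanged since last run: reuse the version name
    }
    return freshTag;         // model changed (or no file yet): new version name
}
```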

If we now modify the model.lu file and run the same command, the script creates a new version and publishes it.

Getting app id c83b0094-8c19-4d73-bd91-689d91ccfd8c...
Calculating hash...
Hash 5bc3e294 and tag 1810271809 generated
Checking if version 1810271809 exists...
Version 1810271809 doesn't exist. Continuing...
Running ludown.
Importing version 1810271809...
Version 1810271809 imported.
Training 1810271809...
Done training ...
Publishing...
All done.

Running luis list versions should result in three versions.


[
  {
    "version": "1.0",
    "createdDateTime": "2018-10-27T14:22:51.000Z",
    "lastModifiedDateTime": "2018-10-27T14:22:51.000Z",
    "lastTrainedDateTime": null,
    "lastPublishedDateTime": null,
    "endpointUrl": null,
    "assignedEndpointKey": null,
    "externalApiKeys": null,
    "intentsCount": 1,
    "entitiesCount": 0,
    "endpointHitsCount": 0,
    "trainingStatus": "NeedsTraining"
  },
  {
    "version": "1810271806",
    "createdDateTime": "2018-10-27T18:06:18.000Z",
    "lastModifiedDateTime": "2018-10-27T18:06:30.000Z",
    "lastTrainedDateTime": "2018-10-27T18:08:59.000Z",
    "lastPublishedDateTime": "2018-10-27T18:09:09.000Z",
    "endpointUrl": null,
    "assignedEndpointKey": null,
    "externalApiKeys": null,
    "intentsCount": 7,
    "entitiesCount": 3,
    "endpointHitsCount": 0,
    "trainingStatus": "Trained"
  },
  {
    "version": "1810271809",
    "createdDateTime": "2018-10-27T18:09:51.000Z",
    "lastModifiedDateTime": "2018-10-27T18:10:13.000Z",
    "lastTrainedDateTime": "2018-10-27T18:10:08.000Z",
    "lastPublishedDateTime": "2018-10-27T18:10:19.000Z",
    "endpointUrl": null,
    "assignedEndpointKey": null,
    "externalApiKeys": null,
    "intentsCount": 7,
    "entitiesCount": 3,
    "endpointHitsCount": 0,
    "trainingStatus": "Trained"
  }
]

Supporting Manual or Web App Editing

Not all users will be happy editing the model via the ludown file. Some team members might still want to use the web app UI to iterate, test, and make sure the model is working correctly. That is fine. The second tool in the luis-version-tools NPM package is luis-lu-export. This script downloads the latest model and writes it to the destination file of your choice. For example, we can run the following command to get the latest online version.

luis-lu-export --luisrc .luisrc --model model.lu --version 1810271809

Any edits made online will be applied to model.lu. Before checking into source control, we can run luis-version --luisrc .luisrc --model model.lu --publish to ensure the .luis-app-version file is regenerated based on the latest model content and a new version is created. At that point, we can check all changes into source control.

In my experience, this manual editing of the model via the web app should only be allowed for the development version of the model. Test, QA, Prod, Integration, and all other environments should be generated directly from a ludown file.

What’s Next?

The scripts are in a state where they can be integrated into DevOps pipelines. Go ahead and submit feature requests and bug reports on GitHub. I’m very interested in how developers end up using the tools and in feedback on the approach. NPM package details here.

Posted by Szymon in LUIS Version Tools

Multi Language Chat Bot Suggested Architecture

Natural conversations, by their very nature, allow for the flexibility of switching language mid-conversation. In fact, for multilingual individuals such as my brothers and me, switching between languages allows us to emphasize certain concepts without explicitly stating so. We generally speak in Polish (English if our wives are present), English to fill in words we don’t know in Polish, and Spanish to provide emphasis or a callback to something that happened in our childhood growing up in Puerto Rico. Chat bots, in their current state without Artificial General Intelligence, do not allow for the nuance of language choice. However, given the state of language recognition and machine translation, we can implement a somewhat intelligent multilingual chat bot. In fact, I design and develop the code for an automated approach in my book. In this post, I outline the general automatic approach, highlight its downsides, and list the different problems that need to be solved when creating a production-quality multi-language chat bot experience.

A Naive Approach

I call the fully automated approach naive. This is the type of approach most projects start off with. It’s somewhat easy to put in place and moves the project into the multilingual realm quite quickly, but it comes with its own set of challenges. Before I dive into those, let’s review the approach. Assuming we have a working English natural language model and English content, the bot can implement multilingual conversations as follows.

  1. Receive user input…
  2. … in their native language.
  3. Detect the user input language and store in user’s preferences.
  4. If incoming message is not English, translate into English.
  5. Send English user utterance to NLU platform.
  6. Execute logic and render English output.
  7. If user’s language was not English, translate output into user’s native language.
  8. Send response back to user.
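The steps above can be sketched as a single turn-handling function. The LanguageServices interface and its detectLanguage, translate, and queryNlu methods are hypothetical stand-ins for whatever detection, translation, and NLU APIs the bot actually uses:

```typescript
// Injected services so the pipeline stays testable; these are illustrative
// stand-ins, not a specific vendor API.
interface LanguageServices {
    detectLanguage(text: string): string;                       // e.g. "en", "es"
    translate(text: string, from: string, to: string): string;
    queryNlu(englishUtterance: string): string;                 // returns the English reply
}

// One turn of the naive pipeline: detect, normalize to English, resolve
// intent, then translate the reply back to the user's language.
function handleTurn(userText: string, svc: LanguageServices): string {
    // Steps 1-3: detect the user's language (a real bot would also persist it).
    const userLang = svc.detectLanguage(userText);

    // Step 4: translate into English before hitting the NLU model.
    const english = userLang === "en" ? userText : svc.translate(userText, userLang, "en");

    // Steps 5-6: resolve intent and render the English response.
    const englishReply = svc.queryNlu(english);

    // Steps 7-8: translate the reply back into the user's language if needed.
    return userLang === "en" ? englishReply : svc.translate(englishReply, "en", userLang);
}
```

In a real bot, the detected language would be stored in the user’s preferences rather than re-detected from scratch on every message.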

This approach works but the conversation quality is off. Although machine translation has improved by leaps and bounds, there are still cases in which the conversation feels stiff and culturally disconnected. There are three areas where this approach suffers.

  • Input utterance cultural nuances: utterance translation can sometimes feel awkward, especially for heavy slang or highly proprietary language. NLU model performance suffers as a result.
  • Ambiguous utterances affect conversation flow: a word like no or mama can easily flip the conversation into another language. For example, some language detection engines consistently classify the word no as Spanish. If the bot asks a yes/no question, answering no triggers a response in Spanish.
  • Output translation branding quality: although automatic machine translation is a good start, companies and brands that want fine-tuned control over their bot’s output will cringe at the output generated by the machine translation service.

Moving to a Hybrid Managed Approach

I address each issue separately. The answers to these problems vary based on risk aversion, content quality, and available resources. I highlight options for each item as we progress through the items.

Multi Language NLU

Ideally, I like my chat bot solutions to have an NLU model for each supported language. Obviously, the cost of creating and maintaining these models can be significant. For multi-language solutions, I always ask for the highest-priority languages that a client would like to support. If an enterprise can support 90% of employees by getting two languages working well, then we can limit the NLU scope to those two languages while using the automatic approach for any other languages. In many of my projects, I use Microsoft’s LUIS; I might create one model for English and another for Simplified Chinese. That way, Chinese users don’t suffer the nuanced translation tax. Project stakeholders also need to decide whether the chat bot should support an arbitrary number of languages or limit valid inputs to languages with an NLU model. If it supports arbitrary languages, the automatic approach above is applied to the non-natively supported ones.

Ambiguous Language Detection

The issue with ambiguous language detection is that short utterances may be valid in multiple languages. Further complicating the matter, translation APIs such as Microsoft’s and Google’s do not return candidate languages with confidence levels. There are numerous approaches to resolving the ambiguous language problem. Two possibilities are to (1) run a concatenation of the last N user utterances through the language recognition engine, or (2) maintain a list of ambiguous words that we ignore for language detection, using the language of the user’s last utterance instead. Both are flavors of treating the user’s language preference as a conversation-level rather than message-level property. If we want to support switching languages mid-conversation, a mix of both approaches works well.
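As a rough illustration, here is how the two strategies might be combined in TypeScript. The AMBIGUOUS_WORDS list and the injected detect function are illustrative assumptions, not any specific detection API:

```typescript
// Illustrative list of words that are valid in several languages and should
// not flip the conversation language on their own.
const AMBIGUOUS_WORDS = new Set(["no", "si", "mama", "ok"]);

// Resolve the language for the current utterance, treating the user's
// language preference as a conversation-level property.
function resolveLanguage(
    utterance: string,
    recentUtterances: string[],            // last N messages, oldest first
    lastDetectedLang: string,              // conversation-level preference
    detect: (text: string) => string       // raw language detection API
): string {
    const normalized = utterance.trim().toLowerCase();

    // Strategy 2: a short ambiguous word keeps the conversation's current language.
    if (AMBIGUOUS_WORDS.has(normalized)) {
        return lastDetectedLang;
    }

    // Strategy 1: detect over a concatenation of recent utterances for more context.
    const context = [...recentUtterances, utterance].join(" ");
    return detect(context);
}
```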

Output Content Translation

Similar to the Multi Language NLU piece, I encourage clients to maintain the precise localized content sent by the chat bot, especially for public consumer or regulated-industry use cases where mistranslated content might result in pain for the brand or even fines. This, again, is a risk-versus-effort calculation that needs to be performed by the right stakeholders. The necessity of controlling localized content, and the effort involved, typically hinges on whether the bot supports arbitrary languages or not.

Final Architecture

Based on all the above, here is what a more complete approach to a multilingual chat bot experience looks like.

The bot in this case:

  1. Receives user input…
  2. … in their native language.
  3. Detects the user input language and stores it in the user’s preferences. Language detection is based both on an API and on utterance ambiguity rules.
  4. Depending on the detected language…
    1. If we have an NLU model for the detected language, the bot queries that NLU model.
    2. If not, and we want to support all languages, the bot translates the user’s message into English and uses the English NLU model to resolve intent. If we instead support only a closed set of languages, the bot may respond with a not recognized kind of message.
  5. Executes the chat bot logic and renders localized output.
  6. If the user’s language is not English and our bot supports arbitrary languages, the bot automatically translates the output into the user’s native language.
  7. Sends the response back to the user.
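Step 4 is the heart of the hybrid approach, and it can be sketched as a routing decision. The HybridServices interface here is an illustrative assumption, not a real API:

```typescript
// Illustrative services for the hybrid routing decision.
interface HybridServices {
    nluModels: Set<string>;                // languages with a dedicated NLU model, e.g. {"en", "zh-Hans"}
    supportArbitraryLanguages: boolean;    // open vs. closed language set
    translateToEnglish(text: string, from: string): string;
    queryNlu(text: string, modelLang: string): string;
}

// Route the utterance: native model when available, translate-then-English-NLU
// as a fallback, or null to signal a "not recognized" response.
function resolveIntent(utterance: string, lang: string, svc: HybridServices): string | null {
    if (svc.nluModels.has(lang)) {
        // 4a. Dedicated model: no translation tax.
        return svc.queryNlu(utterance, lang);
    }
    if (svc.supportArbitraryLanguages) {
        // 4b. Fall back to the automatic approach through the English model.
        return svc.queryNlu(svc.translateToEnglish(utterance, lang), "en");
    }
    // Closed language set: the caller renders a "not recognized" message.
    return null;
}
```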

The managed models and paths to automatic translation add nuance to the automatic approach. If we imagine a spectrum in which on one end we find the fully automatic approach and on the other end the fully managed approach, all implementations fall somewhere within this spectrum. Clients in regulated industries and heavily branded scenarios will lean towards the fully managed end and clients with internal or less precise use cases will typically find the automatic approach more effective and economical.

The hybrid managed/automatic implementation does take some effort but results in the best conversational experience. Let me know your experience!

 

Posted by Szymon in Bots

Alexa Gadget Skill API: Let’s Make a Game

In this post of the Alexa Gadget Skill API series, we create a real game for Alexa and Echo Buttons. I figured I would create a game for my 18-month-old to play with, since he can tell the difference between lit and unlit buttons. With this in mind, I created a game called Whack-a-button. The game randomly lights up one or two buttons at a time, and the object is to press the lit buttons. Every time you press a lit button you gain a point; if you press an unlit button, you lose a point.

In the first two parts of this series we explored input handlers and setting light animations on the gadgets. We will pick up the work done in those two posts to create the Whack-a-button game.

Setting the Scene

It took me some time to build the code for this post. Getting something basic up and running was simple, but even now I feel there are some rough edges. In particular, I had problems with the input event handlers and the data I was receiving from them. This is due to one of two factors: either there is a bug in the simulator version of the input handlers, or my input handler JSON is buggy. I’ll give more details when we get to that part.

My goal with this post was to create a framework in which the user can ask to play a specific game, but one skill supports all of these games. I assumed that we would have a game object in the session attributes storing the type of game and its current state; each game would have a different state representation. When a game is started, all input calls are delegated to that game’s code. The code decides what is to be done given its internal state, and it decides when the game ends. At that point, we can kick the user back to a menu for selecting a game. For the purposes of this post, I have implemented only one game, but the pattern works for multiple games. Here’s a classy Visio diagram of the overall approach.

  1. The game starts with the Launch state.
  2. In this state we ask the user if they would like to play a game or ask them which game they want to play if we support more than one game.
  3. The user responds with a game selection.
  4. The internal state of our skill moves into the in game state…
  5. And we initialize the game. In this case, it is just Whack-a-button. The diagram illustrates the interface we expect each game to implement. We call the interface IGameTurn, because we create a new instance at each user input.
  6. The game delegates to the RollCall functionality first, as we need to make sure that the buttons are correctly identified before the game starts.
  7. RollCall sends its input handler…
  8. The user presses the buttons and RollCall finishes…
  9. And passes control back into the game via the resumeAfterRollCall() call.
  10. The game initializes itself and sends the first input handler to the user. In our sample code, this will be a confirmation to press any button to get started.
  11. At this point, any input event should be delegated over to the game handle() method. We also assume that any AMAZON.HelpIntent or AMAZON.CancelIntent will be handled by the game’s help() or cancel() methods.
  12. The game responds to incoming events as long as it lasts.
  13. The game transitions to a PostGameState in which the user can restart the game or ask for their score.
  14. The user can exit the skill or restart the game.

Show Us The Code!

There is a lot of new code in this post, and I’m going to do my best to walk through it. As always, feel free to skip ahead and jump into the Github repo yourself.

At the center of everything is the IGameTurn interface. Each game must implement this functionality.


export interface IGameTurn {
    initialize(): Response;
    handle(): Response;
    help(): Response;
    cancel(): Response;
    postGameSummary(): Response;
    resumeAfterRollCall(): Response;
}

When the game is first created, we call initialize(). Initialize should invoke the RollCall functionality. Once RollCall is done, the resumeAfterRollCall() call is made. We begin in the InLaunchStateHandler. If the user responds with AMAZON.YesIntent to playing the game, we call:


if (req.intent.name === "AMAZON.YesIntent") {
    const game = new WhackabuttonGame(handlerInput);
    return game.initialize();
}

initialize() is defined as:


public initialize(): Response {
    const game = new GameState();
    game.currentGame = GameType.WhackaButton;
    game.data = new WhackState();

    GameHelpers.setState(this.handlerInput, game);
    return RollCall.initialize(this.handlerInput, WHACKABUTTON_NUM_OF_BUTTONS);
}

GameState is defined as follows. Note that for each method, it resolves the right IGameTurn instance based on the selected game type.


export class GameState {
    public currentGame: GameType;
    public data: any;

    public static deleteState(handlerInput: HandlerInput): void {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        delete sessionAttr.game;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);
    }

    public static setInLaunchState(handlerInput: HandlerInput, val: boolean): void {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inLaunch = val;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);
    }

    public static setInPostGame(handlerInput: HandlerInput, val: boolean): void {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inPostGame = val;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);
    }

    public static getGameState(handlerInput: HandlerInput): GameState {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        const game = sessionAttr.game;
        return new GameState(game);
    }

    constructor(obj?: GameState) {
        this.currentGame = GameType.None;
        if (obj) {
            this.currentGame = obj.currentGame;
            this.data = obj.data;
        }
    }

    public reinit(handlerInput: HandlerInput): Response {
        const gameTurn = this.resolveGameTurn(handlerInput);
        return gameTurn.initialize();
    }

    public resumeGameFromRollcall(handlerInput: HandlerInput): Response {
        const gameTurn = this.resolveGameTurn(handlerInput);
        return gameTurn.resumeAfterRollCall();
    }

    public cancel(handlerInput: HandlerInput): Response {
        const gameTurn = this.resolveGameTurn(handlerInput);
        return gameTurn.cancel();
    }

    public help(handlerInput: HandlerInput): Response {
        const gameTurn = this.resolveGameTurn(handlerInput);
        return gameTurn.help();
    }

    public handleInput(handlerInput: HandlerInput): Response {
        const gameTurn = this.resolveGameTurn(handlerInput);
        return gameTurn.handle();
    }

    private resolveGameTurn(handlerInput: HandlerInput): IGameTurn {
        switch (this.currentGame) {
            case GameType.WhackaButton:
                return new WhackabuttonGame(handlerInput);
            default:
                throw new Error("Unsupported game type.");
        }
    }

}

export enum GameType {
    None,
    WhackaButton
}

At this point, RollCall takes over. Any request from the user hits the RollCallHandler. We change RollCall’s handleDone() method to the following:


const gameState = GameState.getGameState(handlerInput);
handlerInput.responseBuilder
    .addDirective(blackOutUnusedButtons)
    .addDirective(lightUpSelectedButtons);
return gameState.resumeGameFromRollcall(handlerInput);

For the Whack-a-button game, the resumeAfterRollCall() method looks as follows:


public resumeAfterRollCall(): Response {
    const gameState = GameHelpers.getState(this.handlerInput, new WhackState());
    const whackState = gameState.data;
    whackState.waitingOnConfirmation = true;
    whackState.pushAndTrimHandler(this.handlerInput.requestEnvelope.request.requestId);
    GameHelpers.setState(this.handlerInput, gameState);

    const confirmationInputHandler = this.generateConfirmationHandler(GameHelpers.getAvailableButtons(this.handlerInput));

    const resp = LocalizedStrings.whack_start();
    this.handlerInput.responseBuilder
        .speak(resp.speech)
        .reprompt(resp.reprompt)
        .addDirective(confirmationInputHandler);
    return this.handlerInput.responseBuilder.getResponse();
}

We initialize a new game state, set some Whack-a-button-specific state, and ask the user to confirm when they are ready to start. The confirmation occurs when the user presses any of the selected buttons; that is the input handler generated by the this.generateConfirmationHandler(...) call.

At this point, control will flow into the InGameHandler. If there is a game object set and we receive an InputHandlerEvent, an AMAZON.StopIntent, an AMAZON.CancelIntent, or an AMAZON.HelpIntent, we delegate the action to the current game. Here is the code for the handler.


export class InGameHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        const result = !sessionAttr.inPostGame &&
            !sessionAttr.inRollcall &&
            sessionAttr.game &&
            (handlerInput.requestEnvelope.request.type === "GameEngine.InputHandlerEvent"
                || handlerInput.requestEnvelope.request.type === "IntentRequest");

        console.log(`InGameHandler: ${result}`);
        return result;
    }

    handle(handlerInput: HandlerInput): Response {
        console.log("executing in game state handler");
        const gameState = GameState.getGameState(handlerInput);
        if (handlerInput.requestEnvelope.request.type === "GameEngine.InputHandlerEvent") {
            return gameState.handleInput(handlerInput);
        } else if (handlerInput.requestEnvelope.request.type === "IntentRequest") {
            const intent = handlerInput.requestEnvelope.request.intent;

            if (intent.name === "AMAZON.CancelIntent" || intent.name === "AMAZON.StopIntent") {
                return gameState.cancel(handlerInput);
            } else if (intent.name === "AMAZON.HelpIntent") {
                return gameState.help(handlerInput);
            } else {
                // empty response for anything else that comes in during game play
                return handlerInput.responseBuilder.getResponse();
            }
        }
        throw new Error("Unexpected event type. Not supported in roll call.");

    }
}

Now For the Real Stuff

When an event comes in, it signals our game to take its next turn.

  1. If the game has been going for longer than GAME_DURATION_SECONDS, we finish by responding with the user’s score.
  2. We begin a turn by randomly selecting the buttons we want the user to press.
  3. We set the buttons the user should press to a random, non-black color.
  4. Buttons the user should not press have their color set to black.
  5. We generate a new input handler with a timeout between MIN_TIME_TO_PRESS and MAX_TIME_TO_PRESS.
  6. If the user presses a black button, we deduct a point and indicate they did something wrong.
  7. If the user presses a button she was supposed to press, we increase the score. If there are lit buttons left, we wait for those to be pressed; otherwise we go back to step 1 for a new turn.

Selecting a random set of buttons and preparing the input handlers looks as follows:


// we select buttons randomly for the next turn
const shuffle = Utilities.shuffle(btns.slice(0));
const num = Utilities.randInt(1, shuffle.length);
console.log(`generating input handler with ${num} buttons.`);

const buttonsInPlay = shuffle.slice(0, num);
const buttonsNotInPlay = btns.filter(p => !buttonsInPlay.some(p1 => p1 === p));
console.log(`${buttonsInPlay.length} buttons in play for next turn: ${JSON.stringify(buttonsInPlay)}. ` +
        `Not in play: ${JSON.stringify(buttonsNotInPlay)}`);

// assign a random time duration to the turn, but make sure we don't go past the max game duration
const timeTilEnd = whackState.timeInMsUntilEnd();
console.log(`${timeTilEnd}ms left until end`);
const turnDuration = Math.min(Utilities.randInt(MIN_TIME_TO_PRESS, MAX_TIME_TO_PRESS), timeTilEnd);
whackState.expectedEvents = buttonsInPlay;
whackState.pushAndTrimHandler(this.handlerInput.requestEnvelope.request.requestId);
whackState.lastHandlerStartTime = moment().utc().format(Utilities.DT_FORMAT);
whackState.lastHandlerLength = turnDuration;

// generate the input handler
const startHandler = this.generateInputHandlerTemplate(btns, turnDuration);

// turn off buttons not assigned to this turn and turn on buttons assigned to the turn
const turnOffEverything = SetLightDirectiveBuilder.setLight(
    SkillAnimations.rollCallFinishedUnused(), buttonsNotInPlay.map(p => p.gadgetId));
const setLight = SetLightDirectiveBuilder.setLight(
    SkillAnimations.lightUpWhackaButton(turnDuration), buttonsInPlay.map(p => p.gadgetId));

 

I struggled with the right way to model the input handlers, and the complexity of the code probably increased as a result; I blame myself for not fully understanding the rules of how Alexa reports events.

My first approach was to create one input handler for the entirety of the game, but this would not work well with the MAX_TIME_TO_PRESS concept; I want there to be time pressure involved. I also could not use the input handler’s shouldEndInputHandler functionality; if the current turn requires more than one button to be pressed, the same handler should be able to generate both events.

If I had one handler that looked for button down events anchored to anywhere and reported the matches, the reported match would always be the first match. Why does this matter? Well, I want to see the latest event and its timestamp so I can verify whether I have already handled it. With the input handler below, any time I pressed a button once, I would receive two calls into my endpoint, and the timestamp on the input event would be the same. Here is the input handler directive (gadgetId set to something easier to read).


{
    "type": "GameEngine.StartInputHandler",
    "proxies": [],
    "recognizers": {
        "btn1": {
            "type": "match",
            "anchor": "anywhere",
            "fuzzy": false,
            "pattern": [
                {
                    "action": "down",
                    "gadgetIds": [
                        "A"
                    ]
                }
            ]
        },
        "btn2": {
            "type": "match",
            "anchor": "anywhere",
            "fuzzy": false,
            "pattern": [
                {
                    "action": "down",
                    "gadgetIds": [
                        "B"
                    ]
                }
            ]
        }
    },
    "events": {
        "failed": {
            "meets": [
                "timed out"
            ],
            "reports": "history",
            "shouldEndInputHandler": true
        },
        "btn1": {
            "shouldEndInputHandler": false,
            "meets": [
                "btn1"
            ],
            "reports": "matches"
        },
        "btn2": {
            "shouldEndInputHandler": false,
            "meets": [
                "btn2"
            ],
            "reports": "matches"
        }
    },
    "timeout": 7708
}

And the two requests sent to my skill.


{
    "type": "GameEngine.InputHandlerEvent",
    "requestId": "amzn1.echo-api.request.ee60ad56-56a0-4b73-b4f5-48a7bee715b7",
    "timestamp": "2018-10-11T15:39:52Z",
    "locale": "en-US",
    "originatingRequestId": "amzn1.echo-api.request.a0b25097-030e-465c-9454-0c0e1caa0386",
    "events": [
        {
            "name": "btn1",
            "inputEvents": [
                {
                    "gadgetId": "A",
                    "timestamp": "2018-10-11T15:39:52.324Z",
                    "color": "000000",
                    "feature": "press",
                    "action": "down"
                }
            ]
        }
    ]
}

{
    "type": "GameEngine.InputHandlerEvent",
    "requestId": "amzn1.echo-api.request.8c4ab5ab-8580-4c7b-994e-598c35e192c5",
    "timestamp": "2018-10-11T15:39:52Z",
    "locale": "en-US",
    "originatingRequestId": "amzn1.echo-api.request.a0b25097-030e-465c-9454-0c0e1caa0386",
    "events": [
        {
            "name": "btn1",
            "inputEvents": [
                {
                    "gadgetId": "A",
                    "timestamp": "2018-10-11T15:39:52.324Z",
                    "color": "000000",
                    "feature": "press",
                    "action": "down"
                }
            ]
        }
    ]
}

 

Note that everything is the same EXCEPT the requestId; even the originatingRequestId is identical. So I need to start tracking the timestamp of the latest input event. The request's own timestamp is not enough, since it doesn't provide millisecond resolution, and one could easily generate two real button presses within a second of each other. So... I decided to track the latest input-event timestamp and only consider events whose input event occurred after that timestamp. BUT I also need to send a new input handler directive any time an event comes in, because matches reports only the first input event.
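To make the resolution problem concrete, here is a small illustration using the timestamps from the payloads above (the `press2` timestamp and the variable names are mine, invented for the example):

```typescript
// The request-level timestamp only has second resolution, so two quick
// presses can share it; the per-input-event timestamp carries milliseconds.
const requestTs = Date.parse("2018-10-11T15:39:52Z");    // second resolution
const press1 = Date.parse("2018-10-11T15:39:52.324Z");   // millisecond resolution
const press2 = Date.parse("2018-10-11T15:39:52.871Z");   // hypothetical second press, same second

console.log(press1 - requestTs); // 324: the milliseconds the request timestamp loses
console.log(press2 > press1);    // true: the two presses stay distinguishable
```

Comparing on the input-event timestamps keeps two presses within the same second distinguishable, which is exactly what the request timestamp cannot do.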

Ok enough cryptic text. Let’s see the code. Here is the code that selects the relevant events and the latest timestamp.


export function getEventsAndMaxTimeSince(
    events: services.gameEngine.InputHandlerEvent[],
    lastEvent: moment.Moment,
    timeoutEventName: string)
    : { maxTime: moment.Moment, events: string[] } {
    if (events.some(p => p.name! === timeoutEventName)) {
        return { maxTime: moment.utc(lastEvent), events: [timeoutEventName] };
    }
    const mapped = events
        .map(p => {
            const temp = p.inputEvents!.map(p1 => moment(p1.timestamp!).utc().valueOf());
            const max = moment.utc(Math.max.apply({}, temp));
            const diff = max.diff(lastEvent, "ms");
            console.log(`temp: ${JSON.stringify(temp)}`);
            console.log(`max: ${max.format(Utilities.DT_FORMAT)}`);
            return { max: max.valueOf(), maxMoment: max, diff: diff, name: p.name! };
        });

    console.log(`Mapping events, last update ${lastEvent.format(Utilities.DT_FORMAT)}: \n${JSON.stringify(mapped, null, 2)}`);
    const filtered = mapped.filter(p => p.diff > 0);
    let globalMax = Math.max.apply({}, filtered.map(p => p.max));
    if (!globalMax || isNaN(globalMax) || !isFinite(globalMax)) {
        console.log(`setting global max to ${lastEvent.valueOf()}`);
        globalMax = lastEvent.valueOf();
    }
    const resultGlobalMax = moment.utc(globalMax);
    console.log(`GLOBAL MAX ${resultGlobalMax.format(Utilities.DT_FORMAT)}`);

    const array = filtered.map(p => p.name);
    const result = { maxTime: resultGlobalMax, events: array };
    console.log(`returning result\n${JSON.stringify(result)}`);
    return result;
}

We get the constituent input event timestamps, select the maximum value, select the events whose maximum value is after the current latest value and then return those event names and the new maximum timestamp. If the event is a timeout event, we simply return as we have to generate a new turn.
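The selection logic can be sketched without moment; the types, names, and simplifications below are mine, not from the skill's code:

```typescript
// Simplified sketch of the event-selection logic: given the raw events and
// the last timestamp we already handled (epoch ms), keep only event names
// whose newest input event is strictly newer, and advance the watermark.
interface RawInputEvent { timestamp: string; }
interface RawEvent { name: string; inputEvents: RawInputEvent[]; }

function selectNewEvents(events: RawEvent[], lastMs: number): { maxMs: number, names: string[] } {
    const mapped = events.map(e => ({
        name: e.name,
        // newest constituent input-event timestamp for this event
        max: Math.max(...e.inputEvents.map(i => Date.parse(i.timestamp)))
    }));
    const fresh = mapped.filter(e => e.max > lastMs);
    const maxMs = fresh.length > 0 ? Math.max(...fresh.map(e => e.max)) : lastMs;
    return { maxMs, names: fresh.map(e => e.name) };
}

// Feeding in the duplicate delivery from the payloads above yields no names,
// because its input-event timestamp is not strictly newer than the watermark.
const last = Date.parse("2018-10-11T15:39:52.324Z");
const dup = selectNewEvents(
    [{ name: "btn1", inputEvents: [{ timestamp: "2018-10-11T15:39:52.324Z" }] }], last);
console.log(dup.names); // []: the duplicate is filtered out
```

The strict `>` comparison is what drops the second, duplicate delivery of the same press while still letting a genuinely later press through.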

Once we have the relevant events handy, we increase the score if we get an expected event, otherwise we increase the bad count.


private processRelevantEvents(relevantEvents: string[], whackState: WhackState): { good: string[], bad: string[] } {
    console.log(`received events ${JSON.stringify(relevantEvents)}`);
    const result: { good: string[], bad: string[] } = {
        good: [],
        bad: []
    };

    relevantEvents.forEach(evName => {
        // check if we are expecting this event
        const index = whackState.expectedEvents.findIndex(val => val.name === evName);
        if (index > -1) {
            // if we are, great. increase score and remove event from expected list.
            console.log(`increasing good`);
            result.good.push(whackState.expectedEvents[index].gadgetId);
            whackState.good++;
            whackState.expectedEvents.splice(index, 1);
        } else {
            // otherwise, increase bad count.
            console.log(`increasing bad.`);
            console.log(`still expecting number of buttons ${whackState.expectedEvents.length}`);
            result.bad.push(evName);
            whackState.bad++;
        }
    });

    return result;
}

If the user has any buttons left, we simply turn off any good buttons that were pressed, and we add a voice response if any bad buttons were pressed.


let rb = this.handlerInput.responseBuilder;
if (hasBad) {
    rb.speak(LocalizedStrings.whack_bad_answer().speech);
}

// need to turn off all good pressed buttons
if (goodPressedButtons.length > 0) {
    rb = rb.addDirective(SetLightDirectiveBuilder.setLight(SkillAnimations.rollCallFinishedUnused(), goodPressedButtons));
}
return rb.getResponse();

Deeper and Deeper

Another effect of the input handler issue presented above is that the code needs to generate a new input handler. The entire method looks as follows:


private buttonsOutstanding(
    whackState: WhackState,
    hasBad: boolean,
    goodPressedButtons: string[],
    btns: GameButton[]): Response
{
    console.log(`responding with acknowledgment and new handler; more buttons remaining`);

    const now = moment.utc();
    const turnDuration = whackState.lastHandlerLength - (now.diff(whackState.lastHandlerStartTime, "ms"));
    whackState.lastHandlerStartTime = now.format(Utilities.DT_FORMAT);
    whackState.lastHandlerLength = turnDuration;
    whackState.pushAndTrimHandler(this.handlerInput.requestEnvelope.request.requestId);

    const startHandler = this.generateInputHandlerTemplate(btns, turnDuration);
    let rb = this.handlerInput.responseBuilder.addDirective(startHandler);
    if (hasBad) {
        rb.speak(LocalizedStrings.whack_bad_answer().speech);
    }

    // need to turn off all good pressed buttons
    if (goodPressedButtons.length > 0) {
        rb.addDirective(SetLightDirectiveBuilder.setLight(SkillAnimations.rollCallFinishedUnused(), goodPressedButtons));
    }
    return rb.getResponse();
}

Amazon recommends that the skill verify that input event requests reference the correct originatingRequestId, since requests might arrive late. The code that does this uses the lastHandlerIds property on the WhackState. The reason we keep a list instead of a single value is that if button 1 and button 2 are pressed one right after the other, the handler for button 1 sends a new input handler and resets the last handler ID, which would make the event from button 2 look like junk. So we store the last few handler IDs.


let ev = inputHandlerEvent;
if (!whackState.lastHandlerIds.some(p => p === ev.originatingRequestId)) {
    console.warn(`SKIPPING MESSAGE.\nLAST HANDLER IDs: \n${JSON.stringify(whackState.lastHandlerIds, null, 2)}`
        + `\nORIGINATING REQUEST ID: ${ev.originatingRequestId}`);
    return this.handlerInput.responseBuilder.getResponse();
}
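To see why a single stored ID is not enough, here is a hypothetical walk-through (the handler IDs and helper names are made up for illustration):

```typescript
// Press 1 arrives under handler-1 and we answer with a replacement handler,
// but the near-simultaneous press 2 is still stamped with handler-1.
// Keeping a short list of recent IDs accepts both.
const recentHandlerIds: string[] = [];
const register = (id: string) => {
    recentHandlerIds.push(id);
    while (recentHandlerIds.length > 2) { recentHandlerIds.shift(); } // keep the last two
};
const accept = (originatingId: string) => recentHandlerIds.includes(originatingId);

register("handler-1");            // the turn's original input handler
// press 1 is handled, and a replacement input handler is issued:
register("handler-2");
console.log(accept("handler-1")); // true: press 2's late event is still valid
console.log(accept("handler-0")); // false: genuinely stale events are dropped
```

With only one stored ID, the `accept("handler-1")` check would fail the moment the replacement handler was registered, and the second press would be silently discarded.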

For completeness, this is what the WhackState type looks like.


class WhackState {
    public startTime: string | undefined;
    public good: number = 0;
    public bad: number = 0;
    public turn: number = 0;
    public waitingOnConfirmation: boolean = false;

    public expectedEvents: Array<{ name: string, gadgetId: string }> = [];
    public lastEventTime: string | undefined;
    public lastHandlerIds: string[] = [];
    public lastHandlerStartTime: string | undefined;
    public lastHandlerLength: number = 0;

    public initGame(): void {
        console.log(`initializing game. start time ${moment.utc(this.startTime).format(Utilities.DT_FORMAT)}`);

        this.waitingOnConfirmation = false;
        this.expectedEvents = [];
        this.bad = 0;
        this.good = 0;
        this.startTime = moment.utc().format(Utilities.DT_FORMAT);
        this.lastEventTime = this.startTime;
    }

    public pushAndTrimHandler(reqId: string): void {
        this.lastHandlerIds.push(reqId);
        while (this.lastHandlerIds.length > WHACKABUTTON_NUM_OF_BUTTONS + 2) {
            this.lastHandlerIds.shift();
        }
    }

    public timeInMsUntilEnd(): number {
        const now = moment.utc();
        const start = moment.utc(this.startTime);
        const end = start.add(GAME_DURATION_SECONDS, "s");
        const diff = end.diff(now, "ms");
        return diff;
    }


    public timeSinceStarted(): number {
        const now = moment.utc();
        const start = moment.utc(this.startTime);
        const diff = now.diff(start, "s");
        console.log(`it has been ${diff} seconds since the game started.`);
        return diff;
    }

}

Wrapping The Game Up

What happens when the game is done? We check for the time elapsed anytime user input or a time out request comes in. If the game has lasted long enough, we send the result, transition to the InLaunchStateHandler and ask the user if they want to play again.


private finish(handlerInput: HandlerInput, finish: boolean): Response {
    const whackState = GameHelpers.getState(handlerInput, new WhackState()).data;
    GameState.setInPostGame(handlerInput, true);

    let resp = LocalizedStrings.whack_summary({
        score: whackState.good - whackState.bad,
        good: whackState.good,
        bad: whackState.bad
    });
    if (finish) {
        resp = LocalizedStrings.whack_finish({
            score: whackState.good - whackState.bad,
            good: whackState.good,
            bad: whackState.bad
        });
    }

    const turnOffEverything = SetLightDirectiveBuilder.setLight(
        SkillAnimations.rollCallFinishedUnused());

    return handlerInput.responseBuilder
        .speak(resp.speech)
        .reprompt(resp.reprompt)
        .addDirective(turnOffEverything)
        .getResponse();
}

At this point the user can either restart the game, ask for their score (I added a ScoreIntent to support this) or exit out. The PostGameStateHandler implements this logic.


export class PostGameStateHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        const issupportedintent = handlerInput.requestEnvelope.request.type === "IntentRequest"
            && ["AMAZON.YesIntent",
                "AMAZON.NoIntent",
                "StartGameIntent",
                "ScoreIntent"]
                .some(p => p === (handlerInput.requestEnvelope.request as IntentRequest).intent.name);
        return sessionAttr.inPostGame && issupportedintent;
    }

    handle(handlerInput: HandlerInput): Response {
        console.log("executing in post game state handler");

        if (handlerInput.requestEnvelope.request.type === "IntentRequest") {
            const req = handlerInput.requestEnvelope.request as IntentRequest;
            if (req.intent.name === "AMAZON.YesIntent" || req.intent.name === "StartGameIntent") {
                GameState.deleteState(handlerInput);
                const game = new WhackabuttonGame(handlerInput);
                GameState.setInPostGame(handlerInput, false);
                return game.initialize();
            } else if (req.intent.name === "AMAZON.NoIntent") {
                GameState.deleteState(handlerInput);
                GameState.setInPostGame(handlerInput, false);
                return handlerInput.responseBuilder
                    .speak(LocalizedStrings.goodbye().speech)
                    .getResponse();
            } else if (req.intent.name === "ScoreIntent") {
                return new WhackabuttonGame(handlerInput).postGameSummary();
            }
        }

        const donotresp = LocalizedStrings.donotunderstand();
        return handlerInput.responseBuilder
            .speak(donotresp.speech)
            .reprompt(donotresp.reprompt)
            .getResponse();
    }
}

How Did It Go?

Building this was a lot of fun, but the development process was much more complicated than I expected. The number of events and the semantics of the requests that Alexa sends are rather confusing, so there is a bit of a learning curve. The simulator isn't great at helping debug this: timeouts and button presses do not show their JSON inside the simulator, so figuring out bugs was an exercise in diving into CloudWatch logs. I've also seen inconsistent animation behavior; sometimes my animations wouldn't play on the buttons at all. And sometimes, although the input events seem to show up in the simulator, they never flow into the skill, whether from the simulator or the real buttons. It would have helped to have unit tests but... you know how it goes when playing with a new technology.

As an exploratory exercise this was fairly successful. Let’s see how Teddy enjoyed the game.

As always, you can find the code in the GitHub repo. Enjoy!

Posted by Szymon in Alexa