
Alexa Gadget Skill API: Let’s Make a Game

In this post of the Alexa Gadget Skill API series, we create a real game for Alexa and Echo Buttons. I figured I would create a game for my 18-month-old to play with, on the theory that he can tell the difference between lit and unlit buttons. With this in mind, I created a game I call Whack-a-button. The game randomly lights up one or two buttons at a time; the object is to press the lit buttons. Every time you press a lit button you gain a point. If you press an unlit button, you lose a point.

In the first two parts of this series we explored input handlers and setting light animations on the gadgets. We will pick up the work done in those two posts to create the Whack-a-button game.

Setting the Scene

It took me some time to build the code for this post. Getting something basic up and running was simple, but even now I feel there are some rough edges. In particular, I had problems with the input event handlers and the data I was receiving from them. This is due to one of two factors: either there is a bug in the simulator version of the input handlers, or my input handler JSON is buggy. I’ll give more details when we get to that part.

My goal with this post was to create a framework in which the user can ask to play a specific game, but one skill supports all of the games. I assumed we would have a game object in the session attributes storing the type of game and its current state; each game has a different state representation. When a game is started, all input calls are delegated to that game’s code. The game code decides what is to be done given its internal state, and it decides when the game ends. At that point, we can kick it back to a menu for choosing the game. For the purposes of this post I have implemented only one game, but the pattern works for multiple games. Here’s a classy Visio diagram of the overall approach.

  1. The game starts with the Launch state.
  2. In this state we ask the user if they would like to play a game or ask them which game they want to play if we support more than one game.
  3. The user responds with a game selection.
  4. The internal state of our skill moves into the in-game state…
  5. And we initialize the game. In this case, it is just Whack-a-button. The diagram illustrates the interface we expect each game to implement. We call the interface IGameTurn, because we create a new instance at each user input.
  6. The game delegates to the RollCall functionality first, as we need to make sure that the buttons are correctly identified before the game starts.
  7. The RollCall sends its input handler…
  8. The user pressed the buttons and RollCall finished…
  9. And passes control back into the game by using the resumeAfterRollCall() call.
  10. The game initializes itself and sends the first input handler to the user. In our sample code, this will be a confirmation to press any button to get started.
  11. At this point, any input event should be delegated over to the game handle() method. We also assume that any AMAZON.HelpIntent or AMAZON.CancelIntent will be handled by the game’s help() or cancel() methods.
  12. The game responds to incoming events as long as it lasts.
  13. The game transitions to a PostGameState in which the user can restart the game or ask for their score.
  14. The user can exit the skill or restart the game.
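The dispatch between these stages is driven by flags in the session attributes; the `inLaunch`, `inRollcall`, `inPostGame` flags and the `game` object all appear in the handlers later in this post. As an illustrative sketch (my own names, not code from the repo), the routing decision boils down to something like this:

```typescript
// Illustrative sketch only: how session-attribute flags pick which
// state owns an incoming request.
type SessionAttributes = {
    inLaunch?: boolean;
    inRollcall?: boolean;
    inPostGame?: boolean;
    game?: { currentGame: string; data: unknown };
};

function resolveStage(attr: SessionAttributes): "launch" | "rollcall" | "inGame" | "postGame" {
    if (attr.inRollcall) { return "rollcall"; }   // RollCall owns the buttons first
    if (attr.inPostGame) { return "postGame"; }   // score / restart menu
    if (attr.game) { return "inGame"; }           // an active game gets all input
    return "launch";                              // otherwise we are in the launch menu
}
```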

Show Us The Code!

There is a lot of new code in this post, and I’m going to do my best to walk through it. As always, feel free to skip ahead and jump into the GitHub repo yourself.

At the center of everything is the IGameTurn interface. Each game must implement this functionality.


export interface IGameTurn {
    initialize(): Response;
    handle(): Response;
    help(): Response;
    cancel(): Response;
    postGameSummary(): Response;
    resumeAfterRollCall(): Response;
}

When the game is first created, we call initialize(). Initialize should invoke the RollCall functionality. Once RollCall is done, the resumeAfterRollCall() call is made. We begin in the InLaunchStateHandler. If the user responds with AMAZON.YesIntent to playing the game, we call:


if (req.intent.name === "AMAZON.YesIntent") {
    const game = new WhackabuttonGame(handlerInput);
    return game.initialize();
}

initialize() is defined as:


public initialize(): Response {
    const game = new GameState();
    game.currentGame = GameType.WhackaButton;
    game.data = new WhackState();

    GameHelpers.setState(this.handlerInput, game);
    return RollCall.initialize(this.handlerInput, WHACKABUTTON_NUM_OF_BUTTONS);
}

GameState is defined as follows. Note that for each method, it resolves the right IGameTurn instance based on the selected game type.


export class GameState {
    public currentGame: GameType;
    public data: any;

    public static deleteState(handlerInput: HandlerInput): void {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        delete sessionAttr.game;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);
    }

    public static setInLaunchState(handlerInput: HandlerInput, val: boolean): void {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inLaunch = val;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);
    }

    public static setInPostGame(handlerInput: HandlerInput, val: boolean): void {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inPostGame = val;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);
    }

    public static getGameState(handlerInput: HandlerInput): GameState {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        const game = sessionAttr.game;
        return new GameState(game);
    }

    constructor(obj?: GameState) {
        this.currentGame = GameType.None;
        if (obj) {
            this.currentGame = obj.currentGame;
            this.data = obj.data;
        }
    }

    public reinit(handlerInput: HandlerInput): Response {
        const gameTurn = this.resolveGameTurn(handlerInput);
        return gameTurn.initialize();
    }

    public resumeGameFromRollcall(handlerInput: HandlerInput): Response {
        const gameTurn = this.resolveGameTurn(handlerInput);
        return gameTurn.resumeAfterRollCall();
    }

    public cancel(handlerInput: HandlerInput): Response {
        const gameTurn = this.resolveGameTurn(handlerInput);
        return gameTurn.cancel();
    }

    public help(handlerInput: HandlerInput): Response {
        const gameTurn = this.resolveGameTurn(handlerInput);
        return gameTurn.help();
    }

    public handleInput(handlerInput: HandlerInput): Response {
        const gameTurn = this.resolveGameTurn(handlerInput);
        return gameTurn.handle();
    }

    private resolveGameTurn(handlerInput: HandlerInput): IGameTurn {
        switch (this.currentGame) {
            case GameType.WhackaButton:
                return new WhackabuttonGame(handlerInput);
            default:
                throw new Error("Unsupported game type.");
        }
    }

}

export enum GameType {
    None,
    WhackaButton
}
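The GameHelpers.setState and GameHelpers.getState calls that appear throughout the game code are not shown in this post. Here is a minimal sketch of what they plausibly do, with the HandlerInput surface reduced to just the attributes manager; the repo’s actual implementation may differ:

```typescript
// Hypothetical sketch of the GameHelpers referenced in the game code;
// only the attributes-manager surface of HandlerInput is modeled here.
interface AttributesManagerLike {
    getSessionAttributes(): { [key: string]: any };
    setSessionAttributes(attr: { [key: string]: any }): void;
}

interface HandlerInputLike {
    attributesManager: AttributesManagerLike;
}

class GameHelpers {
    // Persist the game object into the session attributes.
    public static setState(handlerInput: HandlerInputLike, game: any): void {
        const attr = handlerInput.attributesManager.getSessionAttributes();
        attr.game = game;
        handlerInput.attributesManager.setSessionAttributes(attr);
    }

    // Read the game back out. Session attributes round-trip as plain JSON,
    // so class instances come back as plain objects; the caller passes a
    // default data payload for the case where nothing is stored yet.
    public static getState(handlerInput: HandlerInputLike, defaultData: any): any {
        const attr = handlerInput.attributesManager.getSessionAttributes();
        const game = attr.game || { currentGame: "None", data: defaultData };
        if (game.data === undefined) {
            game.data = defaultData;
        }
        return game;
    }
}
```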

At this point, RollCall takes over. Any request from the user hits the RollCallHandler. We change the RollCall‘s handleDone() method to the following:


const gameState = GameState.getGameState(handlerInput);
handlerInput.responseBuilder
    .addDirective(blackOutUnusedButtons)
    .addDirective(lightUpSelectedButtons);
return gameState.resumeGameFromRollcall(handlerInput);

For the Whack-a-button game, the resumeAfterRollCall() method looks as follows:


public resumeAfterRollCall(): Response {
    const gameState = GameHelpers.getState(this.handlerInput, new WhackState());
    const whackState = gameState.data;
    whackState.waitingOnConfirmation = true;
    whackState.pushAndTrimHandler(this.handlerInput.requestEnvelope.request.requestId);
    GameHelpers.setState(this.handlerInput, gameState);

    const confirmationInputHandler = this.generateConfirmationHandler(GameHelpers.getAvailableButtons(this.handlerInput));

    const resp = LocalizedStrings.whack_start();
    this.handlerInput.responseBuilder
        .speak(resp.speech)
        .reprompt(resp.reprompt)
        .addDirective(confirmationInputHandler);
    return this.handlerInput.responseBuilder.getResponse();
}

We initialize a new game state, set some Whack-a-button specific state, and ask the user to confirm when they are ready to start. The confirmation occurs by the user pressing any of the selected buttons; that is the input handler that this.generateConfirmationHandler(...) generates.
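generateConfirmationHandler(...) itself is not shown in the post. Here is a hedged sketch of the kind of GameEngine.StartInputHandler directive it plausibly builds, in the style of the directive JSON shown later in this post; the recognizer and event names and the 30-second timeout are my own placeholders:

```typescript
// Hypothetical sketch of generateConfirmationHandler: an input handler
// that completes as soon as any of the given buttons is pressed.
function generateConfirmationHandler(gadgetIds: string[]) {
    return {
        type: "GameEngine.StartInputHandler",
        timeout: 30000,                    // placeholder: how long to wait for the press
        proxies: [],
        recognizers: {
            anyButtonDown: {
                type: "match",
                anchor: "end",
                fuzzy: true,
                pattern: [{ action: "down", gadgetIds: gadgetIds }]
            }
        },
        events: {
            confirmed: {
                meets: ["anyButtonDown"],
                reports: "matches",
                shouldEndInputHandler: true   // one press is enough to start the game
            },
            failed: {
                meets: ["timed out"],         // built-in timeout recognizer
                reports: "history",
                shouldEndInputHandler: true
            }
        }
    };
}
```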

At this point, control flows into the InGameHandler. If there is a game object set and we receive an InputHandlerEvent, an AMAZON.StopIntent, an AMAZON.CancelIntent or an AMAZON.HelpIntent, we delegate the action to the current game. Here is the code for the handler.


export class InGameHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        const result = !sessionAttr.inPostGame &&
            !sessionAttr.inRollcall &&
            !!sessionAttr.game &&
            (handlerInput.requestEnvelope.request.type === "GameEngine.InputHandlerEvent"
                || handlerInput.requestEnvelope.request.type === "IntentRequest");

        console.log(`InGameHandler: ${result}`);
        return result;
    }

    handle(handlerInput: HandlerInput): Response {
        console.log("executing in game state handler");
        const gameState = GameState.getGameState(handlerInput);
        if (handlerInput.requestEnvelope.request.type === "GameEngine.InputHandlerEvent") {
            return gameState.handleInput(handlerInput);
        } else if (handlerInput.requestEnvelope.request.type === "IntentRequest") {
            const intent = handlerInput.requestEnvelope.request.intent;

            if (intent.name === "AMAZON.CancelIntent" || intent.name === "AMAZON.StopIntent") {
                return gameState.cancel(handlerInput);
            } else if (intent.name === "AMAZON.HelpIntent") {
                return gameState.help(handlerInput);
            } else {
                // empty response for anything else that comes in during game play
                return handlerInput.responseBuilder.getResponse();
            }
        }
        throw new Error("Unexpected request type. Not supported in the in-game state.");

    }
}

Now For the Real Stuff

When an event comes in, it is the signal for our game to begin. From there, each turn runs through the following steps.

  1. If the game has been going for longer than GAME_DURATION_SECONDS, we finish by responding with the user’s score.
  2. We begin a turn by randomly selecting some buttons that we want the user to press.
  3. Buttons the user should press are set to a random color other than black.
  4. Buttons the user should not press have their color set to black.
  5. We generate a new input handler with a timeout between MIN_TIME_TO_PRESS and MAX_TIME_TO_PRESS.
  6. If the user presses a black button, we deduct a point and indicate they did something wrong.
  7. If the user presses a button she was supposed to press, we increase the score. If there are buttons left, we wait for those buttons to be pressed; otherwise we go back to step 1 for a new turn.

Selecting a random set of buttons and preparing the input handlers looks as follows:


// we select buttons randomly for the next turn
const shuffle = Utilities.shuffle(btns.slice(0));
const num = Utilities.randInt(1, shuffle.length);
console.log(`generating input handler with ${num} buttons.`);

const buttonsInPlay = shuffle.slice(0, num);
const buttonsNotInPlay = btns.filter(p => !buttonsInPlay.some(p1 => p1 === p));
console.log(`${buttonsInPlay.length} buttons in play for next turn: ${JSON.stringify(buttonsInPlay)}. ` +
        `Not in play: ${JSON.stringify(buttonsNotInPlay)}`);

// assign a random time duration to the turn, but make sure we don't go past the max game duration
const timeTilEnd = whackState.timeInMsUntilEnd();
console.log(`${timeTilEnd}ms left until end`);
const turnDuration = Math.min(Utilities.randInt(MIN_TIME_TO_PRESS, MAX_TIME_TO_PRESS), timeTilEnd);
whackState.expectedEvents = buttonsInPlay;
whackState.pushAndTrimHandler(this.handlerInput.requestEnvelope.request.requestId);
whackState.lastHandlerStartTime = moment().utc().format(Utilities.DT_FORMAT);
whackState.lastHandlerLength = turnDuration;

// generate the input handler
const startHandler = this.generateInputHandlerTemplate(btns, turnDuration);

// turn off buttons not assigned to this turn and turn on buttons assigned to the turn
const turnOffEverything = SetLightDirectiveBuilder.setLight(
    SkillAnimations.rollCallFinishedUnused(), buttonsNotInPlay.map(p => p.gadgetId));
const setLight = SetLightDirectiveBuilder.setLight(
    SkillAnimations.lightUpWhackaButton(turnDuration), buttonsInPlay.map(p => p.gadgetId));

 

I struggled with the right way to model the input handlers, and the complexity of the code probably increased as a result; I blame myself for not fully understanding the rules of how Alexa reports events. My first approach was to create one input handler for the entirety of the game, but this would not work well with the MAX_TIME_TO_PRESS concept; I want there to be time pressure involved. I also could not use the input handler’s shouldEndInputHandler functionality; if the current turn requires more than one button to be pressed, the same handler should be able to generate both events. If I had one handler that looked for button down events anchored to anywhere and reported the matches, the reported match would always be the first match. Why does this matter? I want to see the latest event and its timestamp so I can verify whether I have already handled it. With the input handler below, any time I pressed a button once, I would receive two calls into my endpoint, and the timestamp on the input event would be the same. Here is the input handler directive (gadgetId set to something easier to read).


{
    "type": "GameEngine.StartInputHandler",
    "proxies": [],
    "recognizers": {
        "btn1": {
            "type": "match",
            "anchor": "anywhere",
            "fuzzy": false,
            "pattern": [
                {
                    "action": "down",
                    "gadgetIds": [
                        "A"
                    ]
                }
            ]
        },
        "btn2": {
            "type": "match",
            "anchor": "anywhere",
            "fuzzy": false,
            "pattern": [
                {
                    "action": "down",
                    "gadgetIds": [
                        "B"
                    ]
                }
            ]
        }
    },
    "events": {
        "failed": {
            "meets": [
                "timed out"
            ],
            "reports": "history",
            "shouldEndInputHandler": true
        },
        "btn1": {
            "shouldEndInputHandler": false,
            "meets": [
                "btn1"
            ],
            "reports": "matches"
        },
        "btn2": {
            "shouldEndInputHandler": false,
            "meets": [
                "btn2"
            ],
            "reports": "matches"
        }
    },
    "timeout": 7708
}

And the two requests sent to my skill.


{
    "type": "GameEngine.InputHandlerEvent",
    "requestId": "amzn1.echo-api.request.ee60ad56-56a0-4b73-b4f5-48a7bee715b7",
    "timestamp": "2018-10-11T15:39:52Z",
    "locale": "en-US",
    "originatingRequestId": "amzn1.echo-api.request.a0b25097-030e-465c-9454-0c0e1caa0386",
    "events": [
        {
            "name": "btn1",
            "inputEvents": [
                {
                    "gadgetId": "A",
                    "timestamp": "2018-10-11T15:39:52.324Z",
                    "color": "000000",
                    "feature": "press",
                    "action": "down"
                }
            ]
        }
    ]
}

{
    "type": "GameEngine.InputHandlerEvent",
    "requestId": "amzn1.echo-api.request.8c4ab5ab-8580-4c7b-994e-598c35e192c5",
    "timestamp": "2018-10-11T15:39:52Z",
    "locale": "en-US",
    "originatingRequestId": "amzn1.echo-api.request.a0b25097-030e-465c-9454-0c0e1caa0386",
    "events": [
        {
            "name": "btn1",
            "inputEvents": [
                {
                    "gadgetId": "A",
                    "timestamp": "2018-10-11T15:39:52.324Z",
                    "color": "000000",
                    "feature": "press",
                    "action": "down"
                }
            ]
        }
    ]
}

 

Note that everything is the same EXCEPT the requestId; even the originatingRequestId is identical. So it sounds like I need to start tracking the timestamp of the latest event. It is not enough to use the request’s timestamp, since it doesn’t provide millisecond resolution; one could easily generate two real button presses within a second of each other. So… I decided I would track the latest input event timestamp and only consider events whose input event timestamp is after my latest recorded one. BUT I also need to send a new input handler directive anytime an event comes in, because matches reports the first input event only.

OK, enough cryptic text. Let’s see the code. Here is the code that selects the relevant events and the latest timestamp.


export function getEventsAndMaxTimeSince(
    events: services.gameEngine.InputHandlerEvent[],
    lastEvent: moment.Moment,
    timeoutEventName: string)
    : { maxTime: moment.Moment, events: string[] } {
    if (events.some(p => p.name! === timeoutEventName)) {
        return { maxTime: moment.utc(lastEvent), events: [timeoutEventName] };
    }
    const mapped = events
        .map(p => {
            const temp = p.inputEvents!.map(p1 => moment(p1.timestamp!).utc().valueOf());
            const max = moment.utc(Math.max.apply({}, temp));
            const diff = max.diff(lastEvent, "ms");
            console.log(`temp: ${JSON.stringify(temp)}`);
            console.log(`max: ${max.format(Utilities.DT_FORMAT)}`);
            return { max: max.valueOf(), maxMoment: max, diff: diff, name: p.name! };
        });

    console.log(`Mapping events last update${lastEvent.format(Utilities.DT_FORMAT)}: \n${JSON.stringify(mapped, null, 2)}`);
    const filtered = mapped.filter(p => p.diff > 0);
    let globalMax = Math.max.apply({}, filtered.map(p => p.max));
    if (!globalMax || isNaN(globalMax) || !isFinite(globalMax)) {
        console.log(`setting global max to ${lastEvent.valueOf()}`);
        globalMax = lastEvent.valueOf();
    }
    const resultGlobalMax = moment.utc(globalMax);
    console.log(`GLOBAL MAX ${resultGlobalMax.format(Utilities.DT_FORMAT)}`);

    const array = filtered.map(p => p.name);
    const result = { maxTime: resultGlobalMax, events: array };
    console.log(`returning result\n${JSON.stringify(result)}`);
    return result;
}

We get the constituent input event timestamps, select the maximum value, select the events whose maximum value is after the current latest value and then return those event names and the new maximum timestamp. If the event is a timeout event, we simply return as we have to generate a new turn.
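Stripped of moment and the logging, the selection logic amounts to: keep only events whose newest input-event timestamp is strictly after the last timestamp we processed, and advance that high-water mark. A minimal restatement (my own simplification, using epoch milliseconds):

```typescript
// Minimal restatement of the event-selection logic, using epoch
// milliseconds instead of moment objects.
interface SimpleEvent {
    name: string;
    timestamps: number[];   // input-event timestamps for this named event
}

function selectNewEvents(events: SimpleEvent[], lastSeenMs: number): { maxTime: number, names: string[] } {
    // For each reported event, take its newest input-event timestamp...
    const mapped = events.map(e => ({ name: e.name, max: Math.max(...e.timestamps) }));
    // ...keep only events strictly newer than the last one we processed
    // (this is what drops the duplicate deliveries shown above)...
    const fresh = mapped.filter(e => e.max > lastSeenMs);
    // ...and advance the high-water mark only if something survived.
    const maxTime = fresh.length > 0 ? Math.max(...fresh.map(e => e.max)) : lastSeenMs;
    return { maxTime: maxTime, names: fresh.map(e => e.name) };
}
```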

Once we have the relevant events handy, we increase the score if we get an expected event, otherwise we increase the bad count.


private processRelevantEvents(relevantEvents: string[], whackState: WhackState): { good: string[], bad: string[] } {
    console.log(`received events ${JSON.stringify(relevantEvents)}`);
    const result: { good: string[], bad: string[] } = {
        good: [],
        bad: []
    };

    relevantEvents.forEach(evName => {
        // check if we are expecting this event
        const index = whackState.expectedEvents.findIndex(val => val.name === evName);
        if (index > -1) {
            // if we are, great. increase score and remove event from expected list.
            console.log(`increasing good`);
            result.good.push(whackState.expectedEvents[index].gadgetId);
            whackState.good++;
            whackState.expectedEvents.splice(index, 1);
        } else {
            // otherwise, increase bad count.
            console.log(`increasing bad.`);
            console.log(`still expecting number of buttons ${whackState.expectedEvents.length}`);
            result.bad.push(evName);
            whackState.bad++;
        }
    });

    return result;
}

If the user has any buttons left, we simply turn off any good buttons that were pressed, and we add a voice response if any bad buttons were pressed.


let rb = this.handlerInput.responseBuilder;
if (hasBad) {
    rb.speak(LocalizedStrings.whack_bad_answer().speech);
}

// need to turn off all good pressed buttons
if (goodPressedButtons.length > 0) {
    rb = rb.addDirective(SetLightDirectiveBuilder.setLight(SkillAnimations.rollCallFinishedUnused(), goodPressedButtons));
}
return rb.getResponse();

Deeper and Deeper

Another effect of the input handler issue presented above is that the code above needs to generate a new input handler. The entire method looks as follows:


private buttonsOutstanding(
    whackState: WhackState,
    hasBad: boolean,
    goodPressedButtons: string[],
    btns: GameButton[]): Response
{
    console.log(`responding with acknowledgment and new handler; more buttons remaining`);

    const now = moment.utc();
    const turnDuration = whackState.lastHandlerLength - (now.diff(whackState.lastHandlerStartTime, "ms"));
    whackState.lastHandlerStartTime = now.format(Utilities.DT_FORMAT);
    whackState.lastHandlerLength = turnDuration;
    whackState.pushAndTrimHandler(this.handlerInput.requestEnvelope.request.requestId);

    const startHandler = this.generateInputHandlerTemplate(btns, turnDuration);
    let rb = this.handlerInput.responseBuilder.addDirective(startHandler);
    if (hasBad) {
        rb.speak(LocalizedStrings.whack_bad_answer().speech);
    }

    // need to turn off all good pressed buttons
    if (goodPressedButtons.length > 0) {
        rb.addDirective(SetLightDirectiveBuilder.setLight(SkillAnimations.rollCallFinishedUnused(), goodPressedButtons));
    }
    return rb.getResponse();
}

Amazon recommends that the skill ensure that input event requests come from the right originatingRequestId, since requests might come in late. The code that does this utilizes the lastHandlerIds property on the WhackState. The reason we use a list instead of one value is that if we press button 1 and button 2 right after each other, the handler for button 1 would send a new input handler and reset the last handler ID, rendering the event from button 2 as junk. So we store the last few handler IDs.


let ev = inputHandlerEvent;
if (!whackState.lastHandlerIds.some(p => p === ev.originatingRequestId)) {
    console.warn(`SKIPPING MESSAGE.\nLAST HANDLER IDs: \n${JSON.stringify(whackState.lastHandlerIds, null, 2)}`
        + `\nORIGINATING REQUEST ID: ${ev.originatingRequestId}`);
    return this.handlerInput.responseBuilder.getResponse();
}

For completeness, this is what the WhackState type looks like.


class WhackState {
    public startTime: string | undefined;
    public good: number = 0;
    public bad: number = 0;
    public turn: number = 0;
    public waitingOnConfirmation: boolean = false;

    public expectedEvents: GameButton[] = [];
    public lastEventTime: string | undefined;
    public lastHandlerIds: string[] = [];
    public lastHandlerStartTime: string | undefined;
    public lastHandlerLength: number = 0;

    public initGame(): void {
        console.log(`initializing game. start time ${moment.utc(this.startTime).format(Utilities.DT_FORMAT)}`);

        this.waitingOnConfirmation = false;
        this.expectedEvents = [];
        this.bad = 0;
        this.good = 0;
        this.startTime = moment.utc().format(Utilities.DT_FORMAT);
        this.lastEventTime = this.startTime;
    }

    public pushAndTrimHandler(reqId: string): void {
        this.lastHandlerIds.push(reqId);
        while (this.lastHandlerIds.length > WHACKABUTTON_NUM_OF_BUTTONS + 2) {
            this.lastHandlerIds.shift();
        }
    }

    public timeInMsUntilEnd(): number {
        const now = moment.utc();
        const start = moment.utc(this.startTime);
        const end = start.add(GAME_DURATION_SECONDS, "s");
        const diff = end.diff(now, "ms");
        return diff;
    }


    public timeSinceStarted(): number {
        const now = moment.utc();
        const start = moment.utc(this.startTime);
        const diff = now.diff(start, "s");
        console.log(`it has been ${diff} seconds since the game started.`);
        return diff;
    }

}

Wrapping The Game Up

What happens when the game is done? We check the elapsed time whenever user input or a timeout request comes in. If the game has lasted long enough, we send the result, transition to the InLaunchStateHandler and ask the user if they want to play again.


private finish(handlerInput: HandlerInput, finish: boolean): Response {
    const whackState = GameHelpers.getState(handlerInput, new WhackState()).data;
    GameState.setInPostGame(handlerInput, true);

    let resp = LocalizedStrings.whack_summary({
        score: whackState.good - whackState.bad,
        good: whackState.good,
        bad: whackState.bad
    });
    if (finish) {
        resp = LocalizedStrings.whack_finish({
            score: whackState.good - whackState.bad,
            good: whackState.good,
            bad: whackState.bad
        });
    }

    const turnOffEverything = SetLightDirectiveBuilder.setLight(
        SkillAnimations.rollCallFinishedUnused());

    return handlerInput.responseBuilder
        .speak(resp.speech)
        .reprompt(resp.reprompt)
        .addDirective(turnOffEverything)
        .getResponse();
}

At this point the user can either restart the game, ask for their score (I added a ScoreIntent to support this) or exit. The PostGameStateHandler implements this logic.


export class PostGameStateHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        const issupportedintent = handlerInput.requestEnvelope.request.type === "IntentRequest"
            && ["AMAZON.YesIntent",
                "AMAZON.NoIntent",
                "StartGameIntent",
                "ScoreIntent"]
                .some(p => p === (handlerInput.requestEnvelope.request as IntentRequest).intent.name);
        return sessionAttr.inPostGame && issupportedintent;
    }

    handle(handlerInput: HandlerInput): Response {
        console.log("executing in post game state handler");

        if (handlerInput.requestEnvelope.request.type === "IntentRequest") {
            const req = handlerInput.requestEnvelope.request as IntentRequest;
            if (req.intent.name === "AMAZON.YesIntent" || req.intent.name === "StartGameIntent") {
                GameState.deleteState(handlerInput);
                const game = new WhackabuttonGame(handlerInput);
                GameState.setInPostGame(handlerInput, false);
                return game.initialize();
            } else if (req.intent.name === "AMAZON.NoIntent") {
                GameState.deleteState(handlerInput);
                GameState.setInPostGame(handlerInput, false);
                return handlerInput.responseBuilder
                    .speak(LocalizedStrings.goodbye().speech)
                    .getResponse();
            } else if (req.intent.name === "ScoreIntent") {
                return new WhackabuttonGame(handlerInput).postGameSummary();
            }
        }

        const donotresp = LocalizedStrings.donotunderstand();
        return handlerInput.responseBuilder
            .speak(donotresp.speech)
            .reprompt(donotresp.reprompt)
            .getResponse();
    }
}

How Did It Go?

Building this was a lot of fun, but the development process was much more complicated than I expected. The number of events and the semantics of the requests that Alexa sends are rather confusing, so there is a bit of a learning curve. The simulator isn’t great at helping debug some of this, as timeouts and button presses do not show their JSON inside the simulator, so tracking down bugs was an exercise in diving into CloudWatch and figuring things out. I’ve seen inconsistent animation behavior; sometimes my animations wouldn’t play on the buttons at all. Sometimes, although the input events seem to show up in the simulator, they never flow into the skill, either from the simulator or from the real buttons. It would have helped to have unit tests, but… you know how it goes when playing with new tech.

As an exploratory exercise this was fairly successful. Let’s see how Teddy enjoyed the game.

As always, you can find the code in the GitHub repo. Enjoy!

Posted by Szymon in Alexa

Alexa Gadget Skill API: Using the SetLight Directive

In a previous post, we began developing an Alexa Gadget Skill and set up a simple roll call dialog. One thing that our sample could really benefit from is visual feedback when the Echo Buttons are pressed or selected during the roll call process. The Alexa Gadget Skills API gives us control over each gadget’s light. In this post we dive into this functionality and have our skill take advantage of it.

Exploring the SetLight Directive

So far, we have been working with the GameEngineInterface. This interface allows us to set input handlers on gadgets and to receive users’ gadget input events. The interface that lets us control the device itself is the GadgetControllerInterface. It contains one directive, SetLight, and one request, System.ExceptionEncountered. The System.ExceptionEncountered request is sent to our skill when the SetLight directive has failed for whatever reason. In this post, we look at the SetLight directive.

The SetLight directive allows developers to set animations on the devices discovered during the roll call process. The following is an example of the directive:


 {
   "type": "GadgetController.SetLight",
   "version": 1,
   "targetGadgets": [ "gadgetId1", "gadgetId2" ],
   "parameters": {
      "triggerEvent": "none",
      "triggerEventTimeMs": 0,
      "animations": [ 
        {
          "repeat": 1,
          "targetLights": ["1"],
          "sequence": [ 
           {
              "durationMs": 10000,
              "blend": false,
              "color": "0000FF"
           }
          ] 
        }
      ]
    }
 }

The directive can act on either a specified collection of gadgets or, if the targetGadgets array is empty, all paired gadgets. The parameters field describes which event triggers the animation and defines the animation itself. The triggerEvent can be a button up, a button down or none, in which case the animation begins playing immediately. triggerEventTimeMs is a delay in milliseconds after the event occurs before the animation begins. The animations object includes how many times to repeat a sequence (repeat field), which lights on the gadget the animation is for (targetLights field) and step-by-step instructions on how to execute the animation (sequence field). Each sequence step has a duration in milliseconds, a color in hex without the # character and a blend flag indicating whether the device should interpolate from its current color to the step color. A few additional items to note.

  • The targetLights array is simply [ "1" ] because Echo Buttons have a single light. Future gadgets might have more; this field will provide fine-tuned control over each light when those gadgets come out.
  • The number of sequence steps allowed is limited by the length of the targetGadgets array. The formula for the limit is: 38 - targetGadgets.length * 3. Of course, that might be subject to change, so please consult the official docs.
  • Each Echo Button can have one animation set per trigger. Any directive that sends a different animation for a trigger will overwrite whatever animation was set before.
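As a quick sanity check on the second bullet, the step budget can be computed directly (a small sketch of the documented formula; verify the current numbers against the official docs):

```typescript
// Maximum number of sequence steps allowed across all animations,
// per the documented formula: 38 minus 3 for each targeted gadget.
function maxSequenceSteps(targetGadgets: string[]): number {
    return 38 - targetGadgets.length * 3;
}

// Targeting two gadgets, as in the directive above, leaves 32 steps.
maxSequenceSteps(["gadgetId1", "gadgetId2"]); // 32
```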

Let us now turn our attention back to the roll call code. We would like to do the following:

  1. Set all lights to an animation when the skill launches.
  2. Set all lights to some color and fade out when the roll call initializes.
  3. Set a light to a solid color if a button has been selected.
  4. Once roll call is finished, set the buttons that are not used to black and set the used buttons to some animation indicating that they are in the game.

We first create something to help us build the animations JSON. As it turns out, the trivia game sample has a really cool animations helper that we can use. I went ahead and translated it to TypeScript. The helper now has two modules: BasicAnimations and ComplexAnimations. BasicAnimations contains many functions such as setting a static color (SolidAnimation), fade in/out (FadeInAnimation/FadeOutAnimation) or alternating between a color and black (BlinkAnimation). ComplexAnimations contains two functions, one of which is SpectrumAnimation, an animation that takes the light through any number of color transitions. The code for all of these is fairly easy to follow. Here is what two of them look like.


export module BasicAnimations {
    export function SolidAnimation(cycles: number, color: string, duration: number): Array<services.gadgetController.LightAnimation> {
        return [
            {
                "repeat": cycles,
                "targetLights": ["1"],
                "sequence": [
                    {
                        "durationMs": duration,
                        "blend": false,
                        "color": ColorHelper.validateColor(color)
                    }
                ]
            }
        ];
    }
    ...
    export function BlinkAnimation(cycles: number, color: string): Array<services.gadgetController.LightAnimation> {
        return [
            {
                "repeat": cycles,
                "targetLights": ["1"],
                "sequence": [
                    {
                        "durationMs": 500,
                        "blend": false,
                        "color": ColorHelper.validateColor(color)
                    }, {
                        "durationMs": 500,
                        "blend": false,
                        "color": "000000"
                    }
                ]
            }
        ];
    }
    ...
}

export module ComplexAnimations {
    export function SpectrumAnimation(cycles: number, color: string[]): Array<services.gadgetController.LightAnimation> {
        let colorSequence = [];
        for (let i = 0; i < color.length; i++) {

            colorSequence.push({
                "durationMs": 400,
                "color": ColorHelper.validateColor(color[i]),
                "blend": true
            });
        }
        return [
            {
                "repeat": cycles,
                "targetLights": ["1"],
                "sequence": colorSequence
            }
        ];
    }
    ...
}

Here is the generated JSON:


// Solid Animation
[
  {
    "repeat": 1,
    "targetLights": [
      "1"
    ],
    "sequence": [
      {
        "durationMs": 2000,
        "blend": false,
        "color": "ff0000"
      }
    ]
  }
]
// Fade In Animation
[
  {
    "repeat": 4,
    "targetLights": [
      "1"
    ],
    "sequence": [
      {
        "durationMs": 1,
        "blend": true,
        "color": "000000"
      },
      {
        "durationMs": 1000,
        "blend": true,
        "color": "ffd400"
      }
    ]
  }
]
// Spectrum Animation
[
  {
    "repeat": 3,
    "targetLights": [
      "1"
    ],
    "sequence": [
      {
        "durationMs": 400,
        "color": "ff0000",
        "blend": true
      },
      {
        "durationMs": 400,
        "color": "0000ff",
        "blend": true
      },
      {
        "durationMs": 400,
        "color": "00ff00",
        "blend": true
      },
      {
        "durationMs": 400,
        "color": "ffffff",
        "blend": true
      }
    ]
  }
]
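For reference, the Fade In JSON above could be produced by a helper along these lines (a sketch; the repo’s actual FadeInAnimation may differ in signature, and the ColorHelper validation shown earlier is omitted):

```typescript
interface AnimationStep {
    durationMs: number;
    blend: boolean;
    color: string;
}

interface LightAnimation {
    repeat: number;
    targetLights: string[];
    sequence: AnimationStep[];
}

// Fade from black up to the target color. Each cycle starts with a
// 1 ms black step so the blend has a defined starting point.
function fadeInAnimation(cycles: number, color: string, durationMs: number): LightAnimation[] {
    return [{
        repeat: cycles,
        targetLights: ["1"],
        sequence: [
            { durationMs: 1, blend: true, color: "000000" },
            { durationMs, blend: true, color },
        ],
    }];
}
```

Calling `fadeInAnimation(4, "ffd400", 1000)` yields the Fade In Animation JSON shown above.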

As a next step, we ensure that our skill has a set of reusable animations. We create a module called SkillAnimations for this purpose. We create one method per distinct animation that we want to send to our users. The module looks like this:


export module SkillAnimations {
    ...
    export function rollCallInitialized(): Array<services.gadgetController.LightAnimation> {
        return BasicAnimations.CrossFadeAnimation(1, "yellow", "black", 5000, 15000);
    }

    export function rollCallButtonSelected(): Array<services.gadgetController.LightAnimation> {
        return BasicAnimations.SolidAnimation(1, "orange", 300);
    }
    ...
}

Lastly, we create a SetLightDirectiveBuilder module to build the SetLight directive instances. The goal of this code is to generate the directives given an animation, an optional array of targetGadgetIds, the triggering event and the delay.


export module SetLightDirectiveBuilder {
    ...
    function setLightImpl(animations: Array<services.gadgetController.LightAnimation>,
        on: services.gadgetController.TriggerEventType,
        targetGadgets?: string[],
        delayInMs?: number): interfaces.gadgetController.SetLightDirective {
        const result: interfaces.gadgetController.SetLightDirective = {
            type: "GadgetController.SetLight",
            version: 1,
            targetGadgets: targetGadgets,
            parameters: {
                triggerEvent: on,
                triggerEventTimeMs: delayInMs,
                animations: animations
            }
        };
        return result;
    }
}

We create three helpers so we do not have to pass the TriggerEventType parameter. A minor convenience.


export module SetLightDirectiveBuilder {
    export function setLight(animations: Array<services.gadgetController.LightAnimation>,
        targetGadgets?: string[],
        delayInMs?: number): interfaces.gadgetController.SetLightDirective {
        return setLightImpl(animations, "none", targetGadgets, delayInMs);
    }

    export function setLightOnButtonDown(animations: Array<services.gadgetController.LightAnimation>,
        targetGadgets?: string[],
        delayInMs?: number): interfaces.gadgetController.SetLightDirective {
        return setLightImpl(animations, "buttonDown", targetGadgets, delayInMs);
    }

    export function setLightOnButtonUp(animations: Array<services.gadgetController.LightAnimation>,
        targetGadgets?: string[],
        delayInMs?: number): interfaces.gadgetController.SetLightDirective {
        return setLightImpl(animations, "buttonUp", targetGadgets, delayInMs);
    }
    ...
}

Now, we can call the code:


    SetLightDirectiveBuilder.setLightOnButtonDown(
        BasicAnimations.FadeInAnimation(1, "yellow", 500), ["left", "right"])

and receive the following JSON to send back as part of our response from a skill.


{
  "type": "GadgetController.SetLight",
  "version": 1,
  "targetGadgets": [
    "left",
    "right"
  ],
  "parameters": {
    "triggerEvent": "buttonDown",
    "animations": [
      {
        "repeat": 1,
        "targetLights": [
          "1"
        ],
        "sequence": [
          {
            "durationMs": 1,
            "blend": true,
            "color": "000000"
          },
          {
            "durationMs": 500,
            "blend": true,
            "color": "ffd400"
          }
        ]
      }
    ]
  }
}

Excellent! Let us integrate these new features into our skill to see the lights in action!

Fire It Up

We’ve actually done most of the work already. The Roll Call code we created in the previous part of this series is in a state where we can easily add the SetLight directive. The LaunchHandler changes only to add directives. Note that we not only add the skillLaunch animation, which cycles through white, purple, and yellow for 5 cycles; we also add the default button down and up animations. We do this so that whenever a user presses any button, we provide some sort of color feedback.


return handlerInput.responseBuilder
    .speak(resp.speech)
    .reprompt(resp.reprompt)
    .addDirective(SetLightDirectiveBuilder.setLightOnButtonDown(SkillAnimations.buttonDown()))
    .addDirective(SetLightDirectiveBuilder.setLightOnButtonUp(SkillAnimations.buttonUp()))
    .addDirective(SetLightDirectiveBuilder.setLight(SkillAnimations.skillLaunch()))
    .getResponse();

The other interesting change is specific to feedback when a button is selected during the roll call. Recall that when a user presses the first button in the roll call, our skill acknowledges this via a voice response and asks the user to press the second button. We would like the light to perform an animation at this point; a good visual cue that lets the user know which buttons are selected and which are not.

We modify the handleButtonCheckin function in the RollCall module to send the directive for each gadgetId that was selected in the current request. We also do the math to ensure that the button count is reflected correctly if the skill received multiple button inputs simultaneously. I’m not certain this can actually occur, but since inputEvents is an array… better safe than sorry.


export module RollCall {
    ...
    export function handleButtonCheckin(handlerInput: HandlerInput, inputEvents: Array<services.gameEngine.InputHandlerEvent>): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();


        const directives = inputEvents.map(ev => {
            const gadgetIds = getGadgetIds(ev);
            return SetLightDirectiveBuilder.setLight(SkillAnimations.rollCallButtonSelected(), gadgetIds);
        });

        sessionAttr.rollcallButtonsCheckedIn += directives.length;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);
        const resp = LocalizedStrings.rollcall_checkin(numOfButtons - sessionAttr.rollcallButtonsCheckedIn);

        let temp = handlerInput.responseBuilder.speak(resp.speech);
        directives.forEach(p => temp.addDirective(p));
        return temp.getResponse();
    }

    function getGadgetIds(ev: services.gameEngine.InputHandlerEvent): string[] {
        const btns = ev!.inputEvents!.map(p => { return p.gadgetId!; });
        return btns;
    }
    ...
}

Beyond that, it’s smooth sailing. You can deploy this code into a skill by using ask deploy. Here is a short video of the current code working on my desk.

Code can be found in the Github repo.

Posted by Szymon in Alexa

Broken Alexa Interaction Model Slot Extraction When Using Dialog Management

I’m probably not the first one to notice this but it’s worth documenting for posterity. It’s also one of many minor inconsistencies in Alexa behavior that I hope will be fixed in the near future.

On one of my projects, we are integrating our Microsoft Bot Framework bot with an Alexa Interaction Model that includes Dialog Management. It’s a very interesting effort and allows us to get intimately familiar with how the Alexa Interaction Model works versus Microsoft’s LUIS, a system with which we have much more experience.

Let me set the stage. We want to train an Intent that requires multiple custom slot types. For the sake of the example, let’s say we need three slots: a number, a custom slot with two values (buy/sell) and a custom free text slot indicating the name of a product the user is buying. This last slot is the problem area I’d like to focus on. The slot is fairly free form and its set of values may change at any time, so we cannot simply populate all the possible values into the slot type definition. We have no problems with Alexa detecting an unseen slot value based on what we can assume is a common machine learning-based model. However, once we enable Dialog Management on the intent, the slot is no longer recognized!

Jumping In

Let’s say we want to create an intent called RegisterNeedIntent. This lets the user inform our skill when she wants to sell or purchase an amount of some product. For example, the user may want to buy thirty two pounds of butter or sell thirty two tulips. We will create three slot types: a custom slot type for the buy/sell distinction, the number slot to capture the amount of the product and a custom slot type representing the product in question.

The interaction model looks something like this:


{
  "interactionModel": {
      "languageModel": {
          "invocationName": "product catalog",
          "intents": [
              {
                  "name": "AMAZON.CancelIntent",
                  "samples": []
              },
              {
                  "name": "AMAZON.HelpIntent",
                  "samples": []
              },
              {
                  "name": "AMAZON.StopIntent",
                  "samples": []
              },
              {
                  "name": "RegisterNeedIntent",
                  "slots": [
                      {
                          "name": "Amount",
                          "type": "AMAZON.NUMBER"
                      },
                      {
                          "name": "Product",
                          "type": "Product"
                      },
                      {
                          "name": "Action",
                          "type": "Action"
                      }
                  ],
                  "samples": [
                      "{Action} {Product}",
                      "{Action} {Amount} pounds of {Product}",
                      "{Action} {Amount} {Product}",
                      "{Action} {Amount} packages of {Product}"
                  ]
              }
          ],
          "types": [
              {
                  "name": "Action",
                  "values": [
                      {
                          "name": {
                              "value": "sell"
                          }
                      },
                      {
                          "name": {
                              "value": "buy"
                          }
                      }
                  ]
              },
              {
                  "name": "Product",
                  "values": [
                      {
                          "name": {
                              "value": "flying squirrel"
                          }
                      },
                      {
                          "name": {
                              "value": "hammer"
                          }
                      }
                  ]
              }
          ]
      }
  }
}

Simple enough. We considered using the AMAZON.SearchQuery slot type, but it will not work because it cannot be combined with other slot types. When we build this model, the user can enter an utterance like buy twenty boxes of notebooks. Even with this small a set of sample utterances, Alexa recognizes that the word notebooks is a Product slot. See the intent object passed into the skill below.


{
  "name": "RegisterNeedIntent",
  "confirmationStatus": "NONE",
  "slots": {
    "Product": {
      "name": "Product",
      "value": "notebooks",
      "resolutions": {
        "resolutionsPerAuthority": [
          {
            "authority": "amzn1.er-authority.echo-sdk.amzn1.ask.skill.a20631d9-6a23-413b-8bdc-68ea3c702a10.Product",
            "status": {
              "code": "ER_SUCCESS_NO_MATCH"
            }
          }
        ]
      },
      "confirmationStatus": "NONE"
    },
    "Action": {
      "name": "Action",
      "value": "buy",
      "resolutions": {
        "resolutionsPerAuthority": [
          {
            "authority": "amzn1.er-authority.echo-sdk.amzn1.ask.skill.a20631d9-6a23-413b-8bdc-68ea3c702a10.Action",
            "status": {
              "code": "ER_SUCCESS_MATCH"
            },
            "values": [
              {
                "value": {
                  "name": "buy",
                  "id": "0461ebd2b773878eac9f78a891912d65"
                }
              }
            ]
          }
        ]
      },
      "confirmationStatus": "NONE"
    },
    "Amount": {
      "name": "Amount",
      "value": "20",
      "confirmationStatus": "NONE"
    }
  }
}

The only caveat is that although the Action slot type has a resolution with status ER_SUCCESS_MATCH, the resolution for Product is ER_SUCCESS_NO_MATCH. Fair enough! notebooks does not exist in the Interaction Model. This is actually great. At this point, our skill code receives the raw user Product input, validates it against some known database and we’re in business. So far, this works like LUIS. LUIS’ slot equivalent, entities, draws a distinction between dictionary lookups of known values, known as List Entities, and machine learned Simple Entities. In essence, Amazon slots act as a combination of those two entity types.
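That validation step could look something like this (a sketch; resolveProduct and the knownProducts set are illustrative stand-ins for whatever database lookup your skill actually performs):

```typescript
interface SlotResolution {
    status: { code: string };
}

interface Slot {
    name: string;
    value?: string;
    resolutions?: { resolutionsPerAuthority: SlotResolution[] };
}

// When entity resolution reports no match, fall back to checking the
// raw spoken value against our own product catalog.
function resolveProduct(slot: Slot, knownProducts: Set<string>): string | undefined {
    if (!slot.value) return undefined;
    const code = slot.resolutions?.resolutionsPerAuthority[0]?.status.code;
    if (code === "ER_SUCCESS_MATCH") return slot.value;
    return knownProducts.has(slot.value.toLowerCase()) ? slot.value : undefined;
}
```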

Here is Where It Breaks

We know we want to include more slot types down the road. We also know that we want all of those slots to be filled before our skill acts on the intent. A perfect use case for Dialog Management. The following Interaction Model is the result of adding the dialogs and prompts.


{
  "interactionModel": {
      "languageModel": {
          "invocationName": "product catalog",
          "intents": [
              {
                  "name": "AMAZON.CancelIntent",
                  "samples": []
              },
              {
                  "name": "AMAZON.HelpIntent",
                  "samples": []
              },
              {
                  "name": "AMAZON.StopIntent",
                  "samples": []
              },
              {
                  "name": "RegisterNeedIntent",
                  "slots": [
                      {
                          "name": "Amount",
                          "type": "AMAZON.NUMBER",
                          "samples": [
                              "{Amount}",
                              "I want {Amount} units"
                          ]
                      },
                      {
                          "name": "Product",
                          "type": "Product",
                          "samples": [
                              "{Product}"
                          ]
                      },
                      {
                          "name": "Action",
                          "type": "Action",
                          "samples": [
                              "I want to {Action}",
                              "{Action}"
                          ]
                      }
                  ],
                  "samples": [
                      "{Action} {Product}",
                      "{Action} {Amount} pounds of {Product}",
                      "{Action} {Amount} {Product}",
                      "{Action} {Amount} packages of {Product}"
                  ]
              }
          ],
          "types": [
              {
                  "name": "Action",
                  "values": [
                      {
                          "name": {
                              "value": "sell"
                          }
                      },
                      {
                          "name": {
                              "value": "buy"
                          }
                      }
                  ]
              },
              {
                  "name": "Product",
                  "values": [
                      {
                          "name": {
                              "value": "flying squirrel"
                          }
                      },
                      {
                          "name": {
                              "value": "hammer"
                          }
                      }
                  ]
              }
          ]
      },
      "dialog": {
          "intents": [
              {
                  "name": "RegisterNeedIntent",
                  "confirmationRequired": true,
                  "prompts": {
                      "confirmation": "Confirm.Intent.1103891303624"
                  },
                  "slots": [
                      {
                          "name": "Amount",
                          "type": "AMAZON.NUMBER",
                          "confirmationRequired": false,
                          "elicitationRequired": true,
                          "prompts": {
                              "elicitation": "Elicit.Slot.1103891303624.102254088595"
                          }
                      },
                      {
                          "name": "Product",
                          "type": "Product",
                          "confirmationRequired": false,
                          "elicitationRequired": true,
                          "prompts": {
                              "elicitation": "Elicit.Slot.1103891303624.804409414770"
                          }
                      },
                      {
                          "name": "Action",
                          "type": "Action",
                          "confirmationRequired": false,
                          "elicitationRequired": true,
                          "prompts": {
                              "elicitation": "Elicit.Slot.1103891303624.279514542098"
                          }
                      }
                  ]
              }
          ]
      },
      "prompts": [
          {
              "id": "Elicit.Slot.1103891303624.102254088595",
              "variations": [
                  {
                      "type": "PlainText",
                      "value": "How many units do you want to buy?"
                  }
              ]
          },
          {
              "id": "Elicit.Slot.1103891303624.804409414770",
              "variations": [
                  {
                      "type": "PlainText",
                      "value": "Which product would you like to {Action} ?"
                  }
              ]
          },
          {
              "id": "Elicit.Slot.1103891303624.279514542098",
              "variations": [
                  {
                      "type": "PlainText",
                      "value": "Do you want to buy or sell?"
                  }
              ]
          },
          {
              "id": "Confirm.Intent.1103891303624",
              "variations": [
                  {
                      "type": "PlainText",
                      "value": "Do you want to {Action} {Amount} units of {Product} ?"
                  }
              ]
          }
      ]
  }
}

Looks good. When we run this model, it works but only for values we have declared in the Product slot (for example, buy one hammer). If we use the same utterance as before, we get the following intent from Alexa.


{
    "name": "RegisterNeedIntent",
    "confirmationStatus": "NONE",
    "slots": {
        "Product": {
            "name": "Product",
            "confirmationStatus": "NONE"
        },
        "Action": {
            "name": "Action",
            "value": "buy",
            "resolutions": {
                "resolutionsPerAuthority": [
                    {
                        "authority": "amzn1.er-authority.echo-sdk.amzn1.ask.skill.a20631d9-6a23-413b-8bdc-68ea3c702a10.Action",
                        "status": {
                            "code": "ER_SUCCESS_MATCH"
                        },
                        "values": [
                            {
                                "value": {
                                    "name": "buy",
                                    "id": "0461ebd2b773878eac9f78a891912d65"
                                }
                            }
                        ]
                    }
                ]
            },
            "confirmationStatus": "NONE"
        },
        "Amount": {
            "name": "Amount",
            "value": "20",
            "confirmationStatus": "NONE"
        }
    }
}

My expectation is we would get the raw value and the same resolution result stating there was no match. Consequently, the Dialog.Delegate would simply ask to fill the slot if the resolution was unmatched. In our case, we could have code in our skill to resolve the right value from an external database on the raw user input. If there was a match in our database, we would update the intent with the right resolved product with a confirmation, and delegate the dialog engine to fill in additional missing slots. Instead, we have a completely empty slot and no idea what the user said. Sigh.
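Had the raw value come through, that workaround could be sketched like this (hypothetical code; it builds the documented Dialog.Delegate directive shape by hand, and delegateWithResolvedProduct is an illustrative name, not a real API):

```typescript
interface Intent {
    name: string;
    confirmationStatus: string;
    slots: { [name: string]: { name: string; value?: string; confirmationStatus: string } };
}

// If our database recognizes the raw product, write the resolved value
// back into the intent and hand control back to the dialog engine so it
// can elicit any remaining slots.
function delegateWithResolvedProduct(intent: Intent, resolved: string) {
    intent.slots["Product"].value = resolved;
    return {
        type: "Dialog.Delegate",
        updatedIntent: intent,
    };
}
```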

We are not completely blocked, but the voice experience is hindered: we cannot collect multiple slots in one utterance if the intent leverages Dialog Management.

Posted by Szymon in Alexa

Alexa Gadget Skill API: Intro to Input Handlers and Roll Call

In a previous post, we set up a basic TypeScript Alexa boilerplate repo to begin Gadget Skill API development. My aim for this and future posts is to create a game on Alexa that my one and a half year old son can play with, while also guiding readers through the journey of creating their own game. In this post, we create a skill that listens to input events using the Game Engine interface. In particular, we study roll call functionality and develop a somewhat reusable roll call helper to figure out which gadgets are available to the skill. Much of the inspiration for these concepts comes from the Alexa trivia game sample, which is unfortunately somewhat difficult to follow.

Introduction to Roll Call and Input Handlers

The idea of a roll call is as follows. Say we have an Alexa device with four paired Echo Button gadgets. Our skill, however, only needs two of the four buttons. How do we determine which buttons to use? How do we grab a unique identifier for each button so we can control it? How do we associate the identifiers with a semantic name such as Player 2 or Button B? The roll call process answers these questions. Once it completes, we have the Alexa gadgetIds in our possession, which lets us control each device individually and create a compelling game.

To create a roll call we need to generate a GameEngine.StartInputHandler directive and send it as part of a response from our Alexa Skill. This can be a response to the LaunchRequest or to any other request. The directive instructs Alexa gadgets to start recognizing user actions. The units of recognition are called recognizers, and they come in three types: a patternRecognizer matches a specific pattern of gadget events, a deviationRecognizer fires when the user deviates from a pattern, and a progressRecognizer fires when the user has completed a portion of a pattern. In the context of a roll call, a patternRecognizer is one in which each button has one down event; it can match patterns across all gadgets or on specific gadgetIds.

Once we define all of our input handler’s recognizers, we can define custom events. Recognizers start out false and flip to true as matching user input is collected. An event is sent to the skill when its set of recognizer conditions has been met; for each event, a developer can specify the list of recognizers that must be true or false, what data to report, a maximum number of invocations and a few other pieces of data. There is also an implicit timed out recognizer that we can use in our events.

Conveniently enough, an input handler may include a list of proxies. A proxy is an identifier for a button that can be utilized in a recognizer when a gadgetId is not known. The basic roll call sample input handler in the Alexa documentation shows how we can use proxies and use them in recognizers.


{
  "type": "GameEngine.StartInputHandler",
  "timeout": 10000,
  "proxies": [ "left", "middle", "right" ],
  "recognizers": {
    "all pressed": {
      "type": "match",
      "fuzzy": true,
      "anchor": "start",
      "pattern": [
        {
          "gadgetIds": [ "left" ],
          "action": "down"
        },
        {
          "gadgetIds": [ "middle" ],
          "action": "down"
        },
        {
          "gadgetIds": [ "right" ],
          "action": "down"
        }
      ]
    }
  },
  "events": {
    "complete": {
      "meets": [ "all pressed" ],
      "reports": "matches",
      "shouldEndInputHandler": true
    },
    "failed": {
      "meets": [ "timed out" ],
      "reports": "history",
      "shouldEndInputHandler": true
    }
  }
}

In this example, we expect the user to register three buttons called left, middle and right. We then define a recognizer composed of a button down event from each button. Whichever button is pressed first is treated as left, the second as middle and the last one as right. If the first button is pressed twice, the pattern recognizer treats it as two button down events from the left button and continues on its merry way; the fuzzy flag on the recognizer ensures that the extra press does not invalidate it. Once the recognizer becomes true, the complete event is sent to our Alexa Skill and the input handler is unregistered. Note that if the input handler times out after 10 seconds, our Alexa Skill receives an event called failed instead.

In the rest of the post, we create a basic Alexa Skill that utilizes this roll call object to register three buttons. Once we have the button gadgetIds we exit the skill. Of note is that the structure of the input handler is very flexible. We could easily add three extra events to call our skill when each individual button is pressed. For example, we create a recognizer leftButtonDown that only recognizes the left button’s down event and declare an event called leftButtonPressed that is invoked when the leftButtonDown recognizer becomes true. The input handler stays active as we set shouldEndInputHandler to false and limit the event to 1 invocation.


"leftButtonPressed": {
  "meets": ["leftButtonDown"],
  "reports": "matches",
  "shouldEndInputHandler": false,
  "maximumInvocations": 1
}

This approach lets us assign semantic names to each button. For example, our skill could say Please press the button for player 1. Once an event comes in, the button is assigned to player1. The skill can then say Please press the button for player 2, and so on.
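That assignment step can be sketched as follows (assignPlayer is an illustrative helper, not code from the repo):

```typescript
// Map each gadgetId to the next free player name, in press order.
// Returns the assigned name, the existing name if the button already
// checked in, or undefined once all player slots are taken.
function assignPlayer(
    players: Map<string, string>,
    gadgetId: string,
    maxPlayers: number
): string | undefined {
    if (players.has(gadgetId)) return players.get(gadgetId);
    if (players.size >= maxPlayers) return undefined;
    const name = `player${players.size + 1}`;
    players.set(gadgetId, name);
    return name;
}
```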

Getting Ready For a Basic Roll Call Skill

Since I am aiming to develop a single player game, we create a roll call helper that can register between 1 and 4 buttons. Let’s go ahead and see this in action. We are starting with a TypeScript Alexa Skill Boilerplate created in another post.

To create reusable roll call functionality, we must write code that generates the right input handler and can process each message from the Alexa Skills Kit correctly. We will be using the skill’s SessionAttributes to store the roll call state. At the end, we will also store the discovered buttons.
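A sketch of what that session state might look like (field names are illustrative; the actual roll call module may use different ones):

```typescript
// Roll call progress, stored in the skill's session attributes between turns.
interface RollCallState {
    inProgress: boolean;       // a StartInputHandler directive is active
    buttonsCheckedIn: number;  // how many buttons have registered so far
    buttonsNeeded: number;     // how many buttons the game asked for
    gadgetIds: string[];       // discovered buttons, in check-in order
}

function createRollCallState(buttonsNeeded: number): RollCallState {
    return { inProgress: true, buttonsCheckedIn: 0, buttonsNeeded, gadgetIds: [] };
}
```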

For functionality to be implemented further down the post, we add the AMAZON.YesIntent and AMAZON.NoIntent intents to models/en-US.json. We also assign the invocation name to be something friendlier. Any Alexa skill needs at least one custom intent, so, for now, we leave the HelloIntent in there.


{
  "interactionModel": {
    "languageModel": {
      "invocationName": "my games",
      "types": [],
      "intents": [
        {
          "name": "AMAZON.CancelIntent",
          "samples": []
        },
        {
          "name": "AMAZON.HelpIntent",
          "samples": []
        },
        {
          "name": "AMAZON.StopIntent",
          "samples": []
        },
        {
          "name": "AMAZON.YesIntent",
          "samples": []
        },
        {
          "name": "AMAZON.NoIntent",
          "samples": []
        },
        {
          "name": "HelloIntent",
          "samples": [
            "hello",
            "say hello",
            "say hello world"
          ]
        }
      ]
    }
  }
}

We also add a LocalizedStrings module, so we have one place for all of our speech strings. The code uses the i18next Node package. The Alexa Skills Kit SDK for Node doesn’t include any direction on how to accomplish localization, though some samples, like Trivia, use i18n. The topic is beyond the scope of this blog post.


import * as i18next from "i18next";

export interface ILocalizationResult {
    speech: string;
    reprompt: string;
}

export module LocalizedStrings {

    export function donotunderstand(): ILocalizationResult {
        return {
            speech: i.t("donotunderstand_speech"),
            reprompt: i.t("donotunderstand_reprompt")
        };
    }

    export function welcome(): ILocalizationResult {
        return {
            speech: i.t("welcome_speech"),
            reprompt: i.t("welcome_reprompt")
        };
    }

    export function goodbye(): ILocalizationResult {
        return {
            speech: i.t("goodbye"),
            reprompt: ""
        };
    }
}

const i = i18next.init({
    lng: "en",
    debug: true,
    resources: {
        en: {
            translation: {
                "donotunderstand_speech": "I'm sorry, I didn't quite catch that.",
                "donotunderstand_reprompt": "Sorry, I didn't understand.",
                "goodbye": "Ok, good bye.",
                "welcome_speech": "Hello. Welcome to the games sample. Do you want to play a game?",
                "welcome_reprompt": "Do you want to play a game?"
            }
        }
    }
});

We add more strings to this as we go along.

We modify the LaunchHandler, so that we ask the user if they want to play a game first. If they say yes, we begin the roll call. Otherwise, we leave the skill.


export class LaunchHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const request = handlerInput.requestEnvelope.request;
        return request.type === "LaunchRequest";
    }

    handle(handlerInput: HandlerInput): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inLaunch = true;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);

        const resp = LocalizedStrings.welcome();

        return handlerInput.responseBuilder
            .speak(resp.speech)
            .reprompt(resp.reprompt).getResponse();
    }
}

Note that we set the inLaunch session attribute to true. We add an InLaunchStateHandler to handle requests while we are in this state.


export class InLaunchStateHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        return sessionAttr.inLaunch;
    }

    handle(handlerInput: HandlerInput): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inLaunch = false;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);

        const req = handlerInput.requestEnvelope.request;
        // Guard on the request type; the cast alone is always truthy and
        // would crash on req.intent for non-intent requests.
        if (req.type === "IntentRequest") {
            if (req.intent.name === "AMAZON.YesIntent") {
                // proceed to roll call
                return RollCall.initialize(handlerInput);
            } else if (req.intent.name === "AMAZON.NoIntent") {
                // exit
                return handlerInput.responseBuilder
                    .speak(LocalizedStrings.goodbye().speech)
                    .getResponse();
            }
        }

        const donotresp = LocalizedStrings.donotunderstand();
        return handlerInput.responseBuilder
            .speak(donotresp.speech)
            .reprompt(donotresp.reprompt)
            .getResponse();
    }
}

A few things of note. If the handler receives an AMAZON.NoIntent, the skill exits. If the handler receives an AMAZON.YesIntent, we initialize a new roll call. The implication is that the RollCall.initialize function returns a Response object. Eventually, this will send the input handler JSON we discussed in the previous section. For now, we simply send some speech and set a boolean flag on the SessionAttributes.


export module RollCall {
    export function initialize(handlerInput: HandlerInput): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inRollcall = true;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);

        return handlerInput.responseBuilder
            .speak("Starting Roll Call")
            .reprompt("Waiting on input")
            .getResponse();
    }
}

We create a handler for the roll call state. For now, it simply exits the skill on any input.


export class RollCallHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        return sessionAttr.inRollcall;
    }

    handle(handlerInput: HandlerInput): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inRollcall = false;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);

        const resp = LocalizedStrings.goodbye();
        return handlerInput.responseBuilder
            .speak(resp.speech)
            .getResponse();
    }
}

The code layout now looks as follows. Note that we have both a RollCall helper and a RollCall handler.

Let’s see how this works. We do this by initializing a new skill using ask new -n My Games, copying over the lambda and models directories, and then running ask deploy. We can now navigate to the skill’s Test tab. We can respond either “yes” or “no” to the launch prompt, and the skill routes us accordingly.

Diving Into the Roll Call

We now create the code to build the roll call input handler. In our RollCall module, we create a function called createRollCallDirective and a constant named rollcallHandlerTemplate. The constant is a template for the input handler, and we generate new directives based on it.


export module RollCall {
    ...

    const rollcallHandlerTemplate: interfaces.gameEngine.StartInputHandlerDirective = {
        type: "GameEngine.StartInputHandler",
        proxies: [],
        recognizers: {
            "all pressed": {
                type: "match",
                fuzzy: true,
                anchor: "start",
                pattern: []
            }
        },
        events: {
            complete: {
                meets: ["all pressed"],
                reports: "matches",
                shouldEndInputHandler: true
            },
            failed: {
                meets: ["timed out"],
                reports: "history",
                shouldEndInputHandler: true
            }
        }
    };

    export function createRollCallDirective(numOfButtons: number, timeout?: number): interfaces.gameEngine.StartInputHandlerDirective {
        const handler = JSON.parse(JSON.stringify(rollcallHandlerTemplate));
        if (timeout) {
            handler.timeout = timeout;
        }

        if (numOfButtons > 4 || numOfButtons < 1) {
            throw new Error("Only 1-4 buttons are supported.");
        }

        for (let i = 0; i < numOfButtons; i++) {
            const proxy = "btn" + (i + 1);

            const patternStep: services.gameEngine.Pattern = {
                action: "down",
                gadgetIds: [proxy]
            };
            handler.proxies!.push(proxy);

            (handler.recognizers!["all pressed"] as services.gameEngine.PatternRecognizer)
                .pattern!.push(patternStep);

        }

        return handler;
    }
}

The code is basically adding a proxy and pattern step for every player. Here is what the code produces for three players.


{
  "type": "GameEngine.StartInputHandler",
  "proxies": [
    "btn1",
    "btn2",
    "btn3"
  ],
  "recognizers": {
    "all pressed": {
      "type": "match",
      "fuzzy": true,
      "anchor": "start",
      "pattern": [
        {
          "action": "down",
          "gadgetIds": [
            "btn1"
          ]
        },
        {
          "action": "down",
          "gadgetIds": [
            "btn2"
          ]
        },
        {
          "action": "down",
          "gadgetIds": [
            "btn3"
          ]
        }
      ]
    }
  },
  "events": {
    "complete": {
      "meets": [
        "all pressed"
      ],
      "reports": "matches",
      "shouldEndInputHandler": true
    },
    "failed": {
      "meets": [
        "timed out"
      ],
      "reports": "history",
      "shouldEndInputHandler": true
    }
  }
}

We now change the RollCall.initialize call to send the directive.


export module RollCall {
    export function initialize(handlerInput: HandlerInput): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inRollcall = true;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);

        return handlerInput.responseBuilder
            .speak("Starting Roll Call")
            .reprompt("Waiting on input")
            .addDirective(createRollCallDirective(2, 20000))
            .getResponse();
    }
}

If we were to run this code, our skill would fail because we have not yet declared support for the GameEngine and GadgetController interfaces. This page shows us how we can do it. Here are the necessary fields to add to our skill.json manifest.


{
  "publishingInformation": {
    "gadgetSupport": {
      "requirement": "REQUIRED", // or "OPTIONAL"
      "numPlayersMin": int,
      "numPlayersMax": int, // or null
      "minGadgetButtons": int,
      "maxGadgetButtons": int // or null     
    }
  },
  "apis": {
    "custom": {
      "interfaces": [
        {
          "type": "GAME_ENGINE"
        },
        {
          "type": "GADGET_CONTROLLER"
        }
      ]
    }
  }
}

We modify our skill manifest file to reflect this. Here are the values I utilized.


{
  "manifest": {
    "publishingInformation": {
       // other publishingInformation goes here
      "gadgetSupport": {
        "requirement": "REQUIRED",
        "numPlayersMin": 1,
        "numPlayersMax": 1,
        "minGadgetButtons": 1,
        "maxGadgetButtons": 4
      }
    },
    "apis": {
      "custom": {
        "endpoint": {
          "sourceDir": "lambda/custom"
        },
        "interfaces": [
          {
            "type": "GAME_ENGINE"
          },
          {
            "type": "GADGET_CONTROLLER"
          }
        ]
      }
    }
    // any additional manifest data here
  }
}

When we run ask deploy, the Test tab will now have button simulators!

We can now run the skill in the simulator. After we answer “yes” when asked if we want to play a game, our skill sends the message “Starting Roll Call” and the generated directive JSON. At that point, you can use the simulated Echo Buttons. Press two different ones and you’ll notice that after the second press, the skill responds with “Ok, good bye.”

This is great news; the good bye message is generated by our RollCallHandler, which means the input handler is working as expected. We should verify that our 20-second timeout works as well. Since our RollCallHandler always closes the skill, we should see the same behavior if we don’t press the buttons. Go ahead and verify it.

We now make the following changes:

  1. Add functionality to the RollCall module to support input events from the buttons.
  2. Add timeout retry functionality, so that if the roll call times out, we give the user a chance to try again.
  3. Break roll call state handling into two handlers: one focused on events from buttons and the other on user utterances as a response to timeouts.

To address item 1, we add a method called handleInput, which calls into handleTimeout or handleDone depending on which event was received. In handleDone, we retrieve the button gadgetIds from the message, assign them to our sessionAttributes, log them and exit the skill for the time being. handleTimeout implements the logic for item 2. The skill asks the user if they would like to retry. If there are two timeouts in a row, the skill exits. Here is the code for the updated RollCall module.


    export function initialize(handlerInput: HandlerInput): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        sessionAttr.inRollcall = true;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);

        const resp = LocalizedStrings.rollcall_start();
        return handlerInput.responseBuilder
            .speak(resp.speech)
            .reprompt(resp.reprompt)
            .addDirective(createRollCallDirective(2, 20000))
            .getResponse();
    }

    export function handleInput(handlerInput: HandlerInput,
        input: interfaces.gameEngine.InputHandlerEventRequest): Response {
        const inputEvents = input.events!;

        if (inputEvents.some(p => p.name === "failed")) {
            return handleTimeout(handlerInput);
        } else {
            const complete = inputEvents.find(p => p.name === "complete");
            if (complete) {
                return handleDone(handlerInput, complete);
            } else {
               throw new Error("Unexpected event");
            }
        }
    }

    export function handleDone(handlerInput: HandlerInput,
        complete: services.gameEngine.InputHandlerEvent): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        clearSessionAttr(sessionAttr);
        const btns = complete.inputEvents!.map(p => { return { name: "", id: p.gadgetId }; });
        for (let i = 0; i < btns.length; i++) {
            btns[i].name = "btn" + (i + 1);
        }

        sessionAttr.rollcallResult = btns;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);

        console.log(`Registered buttons: \n${JSON.stringify(btns, null, 2)}`);

        const resp = LocalizedStrings.rollcall_done();
        return handlerInput.responseBuilder
            .speak(resp.speech)
            .withShouldEndSession(true)
            .getResponse();
    }

    function clearSessionAttr(sessionAttr: { [key: string]: any }): void {
        delete sessionAttr.inRollcall;
        delete sessionAttr.rollcallTimeout;
    }

    ...

}

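The handleTimeout implementation is elided above. Stripped of the Response building, the retry rule described earlier (offer one retry, exit after two timeouts in a row) reduces to a counter in the session attributes. Here is a dependency-free sketch; the names RollcallState and registerTimeout are mine, not the repo’s:

```typescript
// Simplified view of the session attributes the roll call cares about.
interface RollcallState {
    inRollcall?: boolean;
    rollcallTimeout?: number;
}

// Returns true if the skill should offer a retry, false if it should exit.
function registerTimeout(attr: RollcallState): boolean {
    attr.rollcallTimeout = (attr.rollcallTimeout || 0) + 1;
    if (attr.rollcallTimeout >= 2) {
        // Two timeouts in a row: clean up and give up.
        delete attr.inRollcall;
        return false;
    }
    return true; // ask the user whether to try the roll call again
}
```

The rollcallTimeout > 0 check in the retry handler below is what routes the user’s yes/no answer back into this flow.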
For item 3, we create a RollCallTimeoutRetryHandler and move some logic over from RollCallHandler. The two handlers are shown below.


export class RollCallHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        return sessionAttr.inRollcall &&
            (handlerInput.requestEnvelope.request.type === "GameEngine.InputHandlerEvent");
    }

    handle(handlerInput: HandlerInput): Response {
        const inputEventRequest = handlerInput.requestEnvelope.request as interfaces.gameEngine.InputHandlerEventRequest;
        if (inputEventRequest) {
            return RollCall.handleInput(handlerInput, inputEventRequest);
        } else {
            throw new Error("Unexpected event type. Not supported in roll call.");
        }
    }
}

export class RollCallTimeoutRetryHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        return sessionAttr.inRollcall &&
            sessionAttr.rollcallTimeout > 0 &&
            handlerInput.requestEnvelope.request.type === "IntentRequest" &&
            (handlerInput.requestEnvelope.request.intent.name === "AMAZON.YesIntent" ||
                handlerInput.requestEnvelope.request.intent.name === "AMAZON.NoIntent");
    }

    handle(handlerInput: HandlerInput): Response {
        const intentRequest = handlerInput.requestEnvelope.request as IntentRequest;
        if (intentRequest) {
            if (intentRequest.intent.name === "AMAZON.YesIntent") {
                return RollCall.initialize(handlerInput);
            } else if (intentRequest.intent.name === "AMAZON.NoIntent") {
                const resp = LocalizedStrings.goodbye();
                return handlerInput.responseBuilder
                    .speak(resp.speech)
                    .withShouldEndSession(true)
                    .getResponse();
            }
        }
        throw new Error("Unexpected input type. Not supported.");
    }
}

We can go ahead and build and deploy the code to our skill. The happy path of pressing the two buttons works, and we can see the gadgetIds in our CloudWatch logs on AWS. Even though we did not know the button identifiers before, we now have them and can send specific commands to each button. Try letting the input handler time out and observe the behavior, as well as the retry option.

Providing Feedback for a Button Tap

One odd thing about the experience is that when we press the first button, there is no feedback of any sort. The buttons stay dark, and there is no acknowledgement of a button press. In fact, our skill doesn’t even know a button has been pressed. We previously suggested that we can get around this and receive those events. We will do so now.

First, we add code to generate the right input handler JSON. We modify the RollCall module’s createRollCallDirective function. The result is that we generate a recognizer and event for each individual button.


    export function createRollCallDirective(numOfButtons: number, timeout?: number): interfaces.gameEngine.StartInputHandlerDirective {
        const handler = JSON.parse(JSON.stringify(rollcallHandlerTemplate));
        if (timeout) {
            handler.timeout = timeout;
        }

        if (numOfButtons > 4 || numOfButtons < 1) {
            throw new Error("Only 1-4 buttons are supported.");
        }

        for (let i = 0; i < numOfButtons; i++) {
            const proxy = "btn" + (i + 1);
            const recognizer = "recognizer_" + proxy;
            const eventName = "event_" + proxy;

            const patternStep: services.gameEngine.Pattern = {
                action: "down",
                gadgetIds: [proxy]
            };
            handler.proxies!.push(proxy);

            (handler.recognizers!["all pressed"] as services.gameEngine.PatternRecognizer)
                .pattern!.push(patternStep);

            const newRecognizer: services.gameEngine.PatternRecognizer = {
                anchor: "end",
                fuzzy: true,
                type: "match",
                pattern: [patternStep]
            };

            handler.recognizers![recognizer] = newRecognizer;

            handler.events![eventName] = {
                shouldEndInputHandler: false,
                maximumInvocations: 1,
                meets: [recognizer],
                reports: "matches"
            };
        }

        return handler;
    }

The JSON produced by this code for three players is shown below.


{
  "type": "GameEngine.StartInputHandler",
  "proxies": [
    "btn1",
    "btn2",
    "btn3"
  ],
  "recognizers": {
    "all pressed": {
      "type": "match",
      "fuzzy": true,
      "anchor": "start",
      "pattern": [
        {
          "action": "down",
          "gadgetIds": [
            "btn1"
          ]
        },
        {
          "action": "down",
          "gadgetIds": [
            "btn2"
          ]
        },
        {
          "action": "down",
          "gadgetIds": [
            "btn3"
          ]
        }
      ]
    },
    "recognizer_btn1": {
      "anchor": "end",
      "fuzzy": true,
      "type": "match",
      "pattern": [
        {
          "action": "down",
          "gadgetIds": [
            "btn1"
          ]
        }
      ]
    },
    "recognizer_btn2": {
      "anchor": "end",
      "fuzzy": true,
      "type": "match",
      "pattern": [
        {
          "action": "down",
          "gadgetIds": [
            "btn2"
          ]
        }
      ]
    },
    "recognizer_btn3": {
      "anchor": "end",
      "fuzzy": true,
      "type": "match",
      "pattern": [
        {
          "action": "down",
          "gadgetIds": [
            "btn3"
          ]
        }
      ]
    }
  },
  "events": {
    "complete": {
      "meets": [
        "all pressed"
      ],
      "reports": "matches",
      "shouldEndInputHandler": true
    },
    "failed": {
      "meets": [
        "timed out"
      ],
      "reports": "history",
      "shouldEndInputHandler": true
    },
    "event_btn1": {
      "shouldEndInputHandler": false,
      "maximumInvocations": 1,
      "meets": [
        "recognizer_btn1"
      ],
      "reports": "matches"
    },
    "event_btn2": {
      "shouldEndInputHandler": false,
      "maximumInvocations": 1,
      "meets": [
        "recognizer_btn2"
      ],
      "reports": "matches"
    },
    "event_btn3": {
      "shouldEndInputHandler": false,
      "maximumInvocations": 1,
      "meets": [
        "recognizer_btn3"
      ],
      "reports": "matches"
    }
  }
}

It is worth taking a minor detour at this point. You may have noticed that the interfaces.gameEngine.InputHandlerEventRequest interface contains an array called events. The implication is that one button down may result in a request with more than one event. Using the directive above as an example, when we press a button, we receive a request with one event called event_btn1. When we press the second button, we receive a second request with two events: event_btn2 and complete. This means that any logic we write to generate responses must take this behavior into account. For example, we plan to add a response message when the user presses a button. However, when the user presses the second button, the skill must also inform the user that the roll call is done and the game is starting. All that to say, we’ll need to be cognizant of these factors when building our skills.
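
To illustrate, here is roughly what the events array might contain on the second press. The gadgetIds are made up for the example (the real ones are long, opaque identifiers):

```typescript
// Hypothetical payload for the second button press: the per-button event
// and the terminal "complete" event arrive in the same request.
const secondPressEvents = [
    { name: "event_btn2", inputEvents: [{ gadgetId: "gadgetId2", action: "down" }] },
    {
        name: "complete",
        inputEvents: [
            { gadgetId: "gadgetId1", action: "down" },
            { gadgetId: "gadgetId2", action: "down" }
        ]
    }
];

// Terminal events must win: check for them before treating the
// request as a plain per-button check-in.
const sawComplete = secondPressEvents.some(e => e.name === "complete");
console.log(sawComplete); // true
```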

For now, we create a function called handleButtonCheckin that handles the individual events. We update handleInput as well.


    export function handleInput(handlerInput: HandlerInput,
        input: interfaces.gameEngine.InputHandlerEventRequest): Response {
        const inputEvents = input.events!;

        if (inputEvents.some(p => p.name === "failed")) {
            return handleTimeout(handlerInput);
        } else {
            const complete = inputEvents.find(p => p.name === "complete");
            if (complete) {
                return handleDone(handlerInput, complete);
            } else {
                return handleButtonCheckin(handlerInput);
            }
        }
    }

    export function handleButtonCheckin(handlerInput: HandlerInput): Response {
        const sessionAttr = handlerInput.attributesManager.getSessionAttributes();
        // The counter is undefined on the first press, so initialize it here
        // instead of incrementing it into NaN.
        sessionAttr.rollcallButtonsCheckedIn = (sessionAttr.rollcallButtonsCheckedIn || 0) + 1;
        handlerInput.attributesManager.setSessionAttributes(sessionAttr);

        // numOfButtons is assumed to be a module-level constant in RollCall
        // (2 in this post, matching the initialize call).
        const resp = LocalizedStrings.rollcall_checkin(numOfButtons - sessionAttr.rollcallButtonsCheckedIn);
        return handlerInput.responseBuilder.speak(resp.speech).getResponse();
    }

When we deploy the skill, we should receive feedback when the first button is pressed.

The timeout behavior is a bit odd. If we press one button but never press the second one, our skill asks if we want to retry. At that point, the skill doesn’t remember the first button. We will leave the behavior as is for now.

We have made some really great progress towards setting up a game on Alexa’s Gadget Skills API. In further posts we will explore setting color animations on the buttons to provide user feedback and actually getting a game in place.

The code for this post can be found on GitHub.

 

Posted by Szymon in Alexa

An Alexa Node.js TypeScript Boilerplate Project

I am currently looking at the Alexa Gadget Skills API for the purpose of creating a fun game for my son and presenting on the experience at a few conferences. Even though the SDK is written in TypeScript, the Trivia Game sample is not. That irked me a bit. It was particularly painful as the sample does some odd things using globals that were difficult to track. So, before I create a series around how I created a few Echo Button enabled games on Alexa, I present a simple, bare-bones Alexa Skill TypeScript Boilerplate Project. You can find the code on GitHub.

The V2 version of the SDK has some really good improvements over the first version. The documentation is quite good. The core concepts that we should keep in mind for our boilerplate are:

  • The index file declares the skill and pulls in all necessary handlers to compose a lambda handler.
  • Each RequestHandler has two methods: canHandle and handle. Handlers are called in the order they were registered. The first handler whose canHandle evaluates to true is selected for processing, and its handle method is invoked.
  • The index file registers instances of RequestInterceptor and ResponseInterceptor. A RequestInterceptor is code that executes and can perform actions on an incoming message before a handler is selected. Likewise, a ResponseInterceptor is code that executes after a handler has finished executing. The most obvious use case for these two is message logging, though, as per the sample above, developers can get much more creative.
  • Lastly, the index declares a custom ErrorHandler. In the previous version of the SDK, when an error occurs, the Alexa device responds with the dreaded “There was a problem with the skill’s response”. Now, we log the error and respond with a friendly message to the user.
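
To make the interceptor concept concrete, a minimal logging pair might look like the following sketch. The class names are mine, not the boilerplate’s; the real classes would implement RequestInterceptor and ResponseInterceptor from ask-sdk-core, which are stubbed here with a plain local type so the snippet stands alone:

```typescript
// Stand-in for the part of HandlerInput these interceptors touch.
interface EnvelopeCarrier {
    requestEnvelope: object;
}

// Runs before a handler is selected; logs the incoming request.
export class LoggingRequestInterceptor {
    process(handlerInput: EnvelopeCarrier): void {
        console.log(`Request: ${JSON.stringify(handlerInput.requestEnvelope)}`);
    }
}

// Runs after the selected handler finishes; logs the outgoing response.
export class LoggingResponseInterceptor {
    process(handlerInput: EnvelopeCarrier, response?: object): void {
        console.log(`Response: ${JSON.stringify(response)}`);
    }
}
```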

We start by taking advantage of the Alexa Skills Kit CLI. If you are doing Alexa development without using it, what is wrong with you? Go ahead and get set up with it before continuing. If we go ahead and create a new skill by using ask new -n TestSkill, we get a basic Skill Manifest (skill.json), the interaction model in the default culture (models/en-US.json) and code for the skill under lambda/custom.

The generated skill interaction model includes Amazon’s builtin intents: AMAZON.CancelIntent, AMAZON.HelpIntent and AMAZON.StopIntent. The default interaction model also includes a custom HelloWorldIntent. The auto-generated index.js file includes all the code to handle all intents, plus a LaunchRequest handler and a custom ErrorHandler. The full code is shown below.


/* eslint-disable  func-names */
/* eslint-disable  no-console */

const Alexa = require('ask-sdk-core');

const LaunchRequestHandler = {
  canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === 'LaunchRequest';
  },
  handle(handlerInput) {
    const speechText = 'Welcome to the Alexa Skills Kit, you can say hello!';

    return handlerInput.responseBuilder
      .speak(speechText)
      .reprompt(speechText)
      .withSimpleCard('Hello World', speechText)
      .getResponse();
  },
};

const HelloWorldIntentHandler = {
  canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === 'IntentRequest'
      && handlerInput.requestEnvelope.request.intent.name === 'HelloWorldIntent';
  },
  handle(handlerInput) {
    const speechText = 'Hello World!';

    return handlerInput.responseBuilder
      .speak(speechText)
      .withSimpleCard('Hello World', speechText)
      .getResponse();
  },
};

const HelpIntentHandler = {
  canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === 'IntentRequest'
      && handlerInput.requestEnvelope.request.intent.name === 'AMAZON.HelpIntent';
  },
  handle(handlerInput) {
    const speechText = 'You can say hello to me!';

    return handlerInput.responseBuilder
      .speak(speechText)
      .reprompt(speechText)
      .withSimpleCard('Hello World', speechText)
      .getResponse();
  },
};

const CancelAndStopIntentHandler = {
  canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === 'IntentRequest'
      && (handlerInput.requestEnvelope.request.intent.name === 'AMAZON.CancelIntent'
        || handlerInput.requestEnvelope.request.intent.name === 'AMAZON.StopIntent');
  },
  handle(handlerInput) {
    const speechText = 'Goodbye!';

    return handlerInput.responseBuilder
      .speak(speechText)
      .withSimpleCard('Hello World', speechText)
      .getResponse();
  },
};

const SessionEndedRequestHandler = {
  canHandle(handlerInput) {
    return handlerInput.requestEnvelope.request.type === 'SessionEndedRequest';
  },
  handle(handlerInput) {
    console.log(`Session ended with reason: ${handlerInput.requestEnvelope.request.reason}`);

    return handlerInput.responseBuilder.getResponse();
  },
};

const ErrorHandler = {
  canHandle() {
    return true;
  },
  handle(handlerInput, error) {
    console.log(`Error handled: ${error.message}`);

    return handlerInput.responseBuilder
      .speak('Sorry, I can\'t understand the command. Please say again.')
      .reprompt('Sorry, I can\'t understand the command. Please say again.')
      .getResponse();
  },
};

const skillBuilder = Alexa.SkillBuilders.custom();

exports.handler = skillBuilder
  .addRequestHandlers(
    LaunchRequestHandler,
    HelloWorldIntentHandler,
    HelpIntentHandler,
    CancelAndStopIntentHandler,
    SessionEndedRequestHandler
  )
  .addErrorHandlers(ErrorHandler)
  .lambda();

The code is pretty straightforward. Each handler’s canHandle method checks for the right request type and intent name to be present. The ErrorHandler logs the error and asks the user to repeat themselves. Lastly, the code composes the skill and returns an AWS Lambda handler.

We now turn this code into TypeScript. On top of that, we break the file up into different handler files and add intercept handlers. I work in Visual Studio Code, so I also enable tslint, a TypeScript linter, to get extra tips for fixing up my TypeScript.

We first need to create the tsconfig.json file. This file provides options for the TypeScript compiler. More details can be found here. We will go ahead and write all of our TypeScript code in a folder called src. The destination will be the root directory. That way, once tsc compiles everything, we run ask deploy to push the skill code into Lambda.

We use the following tsconfig.json file as a starting point.


{
    "include": [
        "src/**/*"
    ],
    "exclude": [

    ],
    "compilerOptions": {
        "lib": [
            "dom",
            "es2017"
        ],
        /* Basic Options */
        "target": "ES2015", /* Specify ECMAScript target version: 'ES3' (default), 'ES5', 'ES2015', 'ES2016', 'ES2017', or 'ESNEXT'. */
        "module": "commonjs", /* Specify module code generation: 'commonjs', 'amd', 'system', 'umd' or 'es2015'. */
        "sourceMap": true, /* Generates corresponding '.map' file. */
        "outDir": ".", /* Redirect output structure to the directory. */
        "rootDir": "./src", /* Specify the root directory of input files. Use to control the output directory structure with --outDir. */
        "moduleResolution": "node", /* Specify module resolution strategy: 'node' (Node.js) or 'classic' (TypeScript pre-1.6). */
        "strict": true,
        "noUnusedLocals": true
    }
}

We will place this file inside the lambda/custom directory. Next, we add two scripts into our package.json file.


  "scripts": {
    "build": "tsc",
    "watch": "tsc --watch"
  }

You should have the Node.js TypeScript package installed for this to work.
npm install -g typescript

Let’s confirm the setup works. Create the src directory and place the following content into an index.ts file.


console.log("Hello World!");

If we run npm run build in the lambda/custom directory, the index.js gets created in the root. Good job! Our directory layout should look as follows. (You can also run npm run watch, which recompiles any time a TypeScript file is modified.)

Next, we create the interceptors and handlers directories. Inside of those, we create the files to support all the different handlers and interceptors.

You can see where this is going. We now fill out the code for each of these files. Once done and compiled into JavaScript, we run ask deploy and our skill will just work.

We begin with the Launch handler in the Launch.ts file.


import { HandlerInput, RequestHandler } from "ask-sdk-core";
import { Response } from "ask-sdk-model";

export class LaunchHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const request = handlerInput.requestEnvelope.request;
        return request.type === "LaunchRequest";
    }

    handle(handlerInput: HandlerInput): Response {
        const speechText = "Welcome to the Alexa Skills Kit, you can say hello!";

        return handlerInput.responseBuilder
            .speak(speechText)
            .reprompt(speechText)
            .withSimpleCard("Hello World", speechText)
            .getResponse();
    }
}

Note that the ask-sdk-core and ask-sdk-model modules both include TypeScript declaration files, making development in an environment like Visual Studio Code easier.

Every other handler looks similar. For example, the HelloWorldHandler looks as follows:


import { HandlerInput, RequestHandler } from "ask-sdk-core";
import { Response } from "ask-sdk-model";

export class HelloWorldHandler implements RequestHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        const request = handlerInput.requestEnvelope.request;
        return request.type === "IntentRequest" && request.intent.name === "HelloWorldIntent";
    }

    handle(handlerInput: HandlerInput): Response {
        const speechText = "Hello World!";

        return handlerInput.responseBuilder
            .speak(speechText)
            .withSimpleCard("Hello World", speechText)
            .getResponse();
    }
}

The CustomErrorHandler logs the error and provides a friendly response.


import { HandlerInput, ErrorHandler } from "ask-sdk-core";
import { Response } from "ask-sdk-model";

export class CustomErrorHandler implements ErrorHandler {
    canHandle(handlerInput: HandlerInput): boolean {
        return true;
    }

    handle(handlerInput: HandlerInput, error: Error): Response {
        const request = handlerInput.requestEnvelope.request;

        console.log(`Error handled: ${error.message}`);
        console.log(`Original Request was: ${JSON.stringify(request, null, 2)}`);

        return handlerInput.responseBuilder
            .speak("Sorry, I cannot understand the command. Please say it again.")
            .reprompt("Sorry, I cannot understand the command. Please say it again.")
            .getResponse();
    }
}

The interceptors are similar; here is what the request-logging one looks like.


import { RequestInterceptor, HandlerInput } from "ask-sdk-core";

export class RequestLoggingInterceptor implements RequestInterceptor {
    process(handlerInput: HandlerInput): Promise<void> {
        return new Promise((resolve) => {
            console.log("Incoming request:\n" + JSON.stringify(handlerInput.requestEnvelope.request));
            resolve();
        });
    }
}

The last code we will show in this post is the index.ts file that puts it all together. It imports all of the different modules we just created and ensures they are exposed to the Lambda handler.


import { SkillBuilders } from "ask-sdk-core";

import { BuiltinAmazonCancelHandler } from "./handlers/builtin/AMAZON.CANCEL";
import { BuiltinAmazonHelpHandler } from "./handlers/builtin/AMAZON.Help";
import { BuiltinAmazonStopHandler } from "./handlers/builtin/AMAZON.Stop";
import { LaunchHandler } from "./handlers/Launch";
import { HelloWorldHandler } from "./handlers/HelloWorld";
import { SessionEndedHandler } from "./handlers/SessionEndedRequest";

import { CustomErrorHandler } from "./handlers/Error";
import { RequestLoggingInterceptor } from "./interceptors/RequestLogging";
import { ResponseLoggingInterceptor } from "./interceptors/ResponseLogging";

function buildLambdaSkill(): any {
    return SkillBuilders.custom()
        .addRequestHandlers(
            new LaunchHandler(),
            new HelloWorldHandler(),
            new BuiltinAmazonCancelHandler(),
            new BuiltinAmazonHelpHandler(),
            new BuiltinAmazonStopHandler(),
            new SessionEndedHandler()
        )
        .addRequestInterceptors(new RequestLoggingInterceptor())
        .addResponseInterceptors(new ResponseLoggingInterceptor())
        .addErrorHandlers(new CustomErrorHandler())
        .lambda();
}

export let handler = buildLambdaSkill();

Once all the code is in place, we make sure that TypeScript has compiled everything and run ask deploy. If all goes well, our skill and Lambda function will be updated to reflect the changes. Below, I show the Test tab's output as I type through a conversation. It all works great! Notice that I left the default invocation name as "greeter"; we'll change that later on.

That’s it! For now, I am not putting any effort into supporting automated testing or providing a strong opinion as to where the business logic lives. Some of this may change as I develop more skills. In the meantime, you can find all the code in this GitHub repo.

Posted by Szymon in Alexa