
Cheryl Platz on Multimodality

“There’s a lot of potential here and there’s a lot of need.”

Cheryl Platz is an accomplished interaction designer who has designed multimodal experiences for Amazon, Microsoft, the Walt Disney Company, and more. The focus of our conversation is her new book on the subject, Design Beyond Devices.

Listen to the show

Download episode 51

Show notes

Some show notes may include Amazon affiliate links. I get a small commission for purchases made through these links.

Read the transcript

Jorge: Cheryl, welcome to the show.

Cheryl: Thank you so much for having me, Jorge. It's great to be here.

Jorge: Well, I'm very excited to have you here. For folks who might not know you, would you please mind introducing yourself?

About Cheryl

Cheryl: No problem. My name is Cheryl Platz, and I am a designer who is inspired by complexity and natural user interfaces, and I've had a rather interesting career, or you might say a couple of careers: from video game production to enterprise software to consumer-grade natural user interfaces like Alexa and Cortana. Currently I work at the Bill and Melinda Gates Foundation as a principal user experience designer. I also own my own design education company called Ideaplatz, through which I… well, I used to travel the world giving workshops and talks about the work I did, particularly about natural user interfaces: teaching people how to do voice interaction and, soon, multimodal interaction design work, to help people level up their own work and move a step into the future of design.

Natural user interfaces

Jorge: What is a natural user interface?

Cheryl: Well, it's not a keyboard. The term was sort of an umbrella term given to things you didn't need a peripheral for. So, a mouse or a keyboard: they're not inherently natural, because you need a manmade object to mediate and manipulate the interaction. Whereas a gesture is… at least you could argue, "natural," right? We could have a semantic debate about whether waving at your Kinect on the Xbox is actually natural. But at the time it was created, it was called a natural user interface because it was more natural than a mouse. Same with speaking to your device: it's considered more of a natural user interface than clicking a mouse, for example. It's interesting to think about whether a category of interactions that I call "ambient interactions" is natural, where you express your intent upfront and devices interact with you based on sensors and their interpretation of the environment. Like, you walk into the house and the lights come on. Is that a natural user interaction? I think that's interesting to think about moving forward, too.

Jorge: What I'm hearing there is that the interaction happens as a result of you using your body somehow, the capabilities of your body, without the assistance of some kind of peripheral… you mentioned a keyboard, a mouse, right?

Cheryl: Yes. You can use sensors, obviously, because the computer needs a way to interpret it somehow, but you don't have to manipulate something. You can just move in space freely.

Multimodality

Jorge: Which brings us squarely to the subject of our conversation today. You've just published a book called Design Beyond Devices. And the subject of the book, if I'm going to try to summarize it in one word, is multimodality.

Cheryl: Yes! When I was starting to work on the book and describe it to others… the subtitle of the book is Creating Multimodal Cross-Device Experiences, and everybody's on board for cross-device experiences. Like, "Oh yeah! We've got a lot of devices. That's a definite problem! I understand you there." But when I would say multimodality, a lot of people would be like, "Ahhh, what is that?" And it made sense to me at the time, because I had been working on multimodal experiences for a long while, since working on video games and on the Nintendo DS, which was a multimodal game system. But it was really interesting to go out in the world and see that, even in the design world, the word multimodality is multimodal. It has multiple definitions.

For the purposes of my book, the definition we're working with is that multimodality is an exchange between a device and a human where multiple input or output modalities can be used simultaneously or sequentially, depending on context and preference. So, if we think about the traditional desktop-to-human or laptop-to-human relationship, you have your keyboard and mouse and your monitors. There was one output, for the most part: the dominant output is visual. And the dominant input is haptic, where you're using your hands to manipulate physical input devices. It's not really super multimodal, and it's certainly not optimized for multimodality.

You could argue that occasionally there's a secondary output in audio. And some designers are doing a little bit of kinetic input when they use a Wacom tablet or something like that. But it's not the default way of working, and there's so much more potential there. Think about what's happened in the last few years: the arrival of smart speakers, the arrival of voice search on Google, the fact that most of our customers are deeply comfortable speaking to their devices now, the arrival of Kinect back in the 2010–2011 timeframe, and the fact that some customers are even comfortable waving to their devices and gesturing at them now. There's so much more potential than just moving a mouse and keyboard around.

But I can say from experience that if you're trying to do more than just move a keyboard or mouse, it drastically increases the complexity of the design experience. And that's why I brought this book into being. There are a lot of considerations. There are greater burdens on your customer research to understand your customers' context and views, and to understand what form of multimodal manifestation makes sense for them. Just because you have a ton of outputs and inputs doesn't mean that you should lean on all of them all the time. If your customer is in their living room and they're always going to be near a remote control, maybe you don't want to lean on voice as much as on the remote, because it's just faster to press the channel button. Versus the kitchen, which is a dynamic environment where the hands are often full, and voice becomes far more compelling. Those are two environments we're very familiar with, but your customers may have other contextual limitations we're not aware of; hence the additional requirements on your research. There's just so much to dig into, which is why I wrote the book.

And, just to make things even more interesting, when we start dealing with more potential to express ourselves on a natural human level, the ethical considerations become more complex too. Particularly with speech, because speech is directly wired to the emotional centers of our brain, and there's really nothing we can do about that in our current technological state. Our brain is just going to interpret spoken signals as people and evaluate them based on our social contract. So, there's an additional burden on designers to consider a multimodal experience's impact within a greater system and its impact on the human being.

All of those things are explored in the book: how to take a multimodal experience, an idea you have, and deepen your customer research to get better context; how to build a systems design framework to represent your customer's context within the system, so the system can adapt to them as opposed to the other way around; and how to evaluate the ethical validity of, for lack of a better term, your work, and perhaps suss out potential problems and deal with them at an earlier stage. There's also a deep exploration of all of the current technologies you could be working with and where they fall on a spectrum of multimodality.

Jorge: There's a lot there to take in, right? And I think that many designers working today in… gosh, I'm not happy with the phrase "digital design," but designing for things that will manifest in digital systems… are working on problems that will manifest as screen-based work. And it feels like even if we were to stick to that mode, there's so much to learn. Are there many designers working on these multimodal problems as opposed to screen-based problems?

Cheryl: Well, it's interesting, because I think a lot of designers are actually working in multimodal contexts and don't realize that they're already doing so. Especially on mobile phones, because they're designing for touch but also potentially gesture. So, we're working with motion-based input and touch-based input. They may also be supporting voice. Siri, for example, is inherently multimodal: you speak to it, then it shows you output in addition to speaking to you, and you can continue your interaction on the phone. And on the desktop, sometimes your experiences are multimodal, but you're just not paying any attention to the multimodal side of things.

With screen readers, you have customers who need a multimodal way to interact with your system, or need your system to allow choice, and there's just not attention being paid to that choice. So, it's a subpar experience for your customers. If your experience is visually dominant and your output is primarily screens, your customers who have visual disabilities are not getting a great experience. They're interacting with your system with a screen reader, and you may or may not, depending on your commitment to accessibility, be giving them a good experience with a connection to a screen reader and attention to that part of your service design. But by incorporating voice design or acoustic design, even just the speech output part of things, into your work and becoming a little more intentionally multimodal, you can differentiate your work, potentially expand your market, and become more inclusive all at the same time.

Those customers are there. They've always been there. And in many cases, especially if you're working with government clients, you have to deal with those customers. It's just: are you being intentional about it? And the thing is, especially with desktops and laptops, the sensors, the capability to expand and be more creative and more dynamic and more intentionally multimodal, that's all there. Ten years ago, the ability to be more acoustically dominant, to process speech or things like that, was not there. But now we have cameras; we have far-field microphones built into devices. That's all there for the taking, if we want to be a bit more intentional about those relationships with our customers.

Jorge: If I might reflect that back to you, what I'm hearing there is that, if we perceive ourselves as being screen-based designers primarily, that's for legacy reasons. And we're doing ourselves a disservice by not acknowledging the fact that we are working with a much richer palette now that allows us to do things like create systems that are more accessible, even if we're not explicitly aiming to design for something like Siri or Cortana.

Cheryl: That's very well said. Now, to the root of your original question: designers who are working deeply with both voice and screen at the same time are, right now, a smaller subset, to be sure. When I was working at Amazon on the launch of the Echo Show, it was us, for the most part, and maybe a couple of automotive designers at various automotive manufacturers around the world. So that was a daunting challenge. Part of the reason this book exists is to take some of the complexity we dealt with away from you, so that it's possible for you to add that level. Because at the time, you needed Amazon's resources to have that level of experience, because it was a very complicated problem: "Okay, take the smart speaker, add screens to it, and rationalize all of your choices again with the screen." Hopefully, this can take a little bit of that burden away.

But the other point I want to make, coming back to the idea that all of us are kind of multimodal designers whether we recognize it or not: I certainly don't mean to imply that we're all going to become that designer working on the Echo Show, trying to pull voice and screen and everything together. As an industry, we have specialties. Some of us are visually focused; some of us are interaction focused. And there are going to be designers who really lean into this kind of work, the people who are really living at the intersection of screens and visuals and haptics, and there will be folks for whom that's a career passion; they love it, it's great. But there is still going to be a need for folks who are visually specialized. Adding that multimodal layer to your experience does not remove the need for a polished visual experience. It does not remove the need for a well-thought-out voice experience. It just adds complexity.

If it were easy, everybody would be multimodal already. But I also contend that even if you're one of those folks in those verticals, if you're just in voice or visuals, it's really helpful to understand the big picture and some of the concepts of this book: how you expand your understanding of your customer's context, because that is changing; how you expand your understanding of the different technologies that may be in use in your devices, because that is changing; and, for any designer, how you understand the implications and ethics of the use of your devices in this new world order, because that is changing. All of those things are really important when we start expanding the capabilities of these experiences.

CROW

Jorge: When I saw the title of the book, Design Beyond Devices, my mind went to, "Well, if I'm 'designing beyond devices,' I have to think maybe deeper or more abstractly than user interface." And my apprehension going into it was, "Boy, that can get pretty abstract pretty quickly!" I was very gratified to see that there are very practical frameworks throughout the book to help you tackle different aspects of design work in this complex and potentially abstract problem space. I was hoping that you would tell us a little bit more about one of these, which I felt was central to the book: CROW, which is an acronym.

Cheryl: Yes, I'd be happy to. And first of all, thank you, because I'm just so glad to hear that the frameworks resonated. I am a pragmatist. I love design theory, but it's important to me that people have something to apply; even in my talks, I want people to be able to walk away with something. And the CROW framework is particularly close to my heart because it's inspired by my time in improvisational theater. Outside of design, I am a professional improviser, and yes, that is something you can do professionally. It doesn't really pay a living wage; that's another subject (you should pay artists). I work with a theater called Unexpected Productions in Seattle, which is in the Pike Place Market when there's not a pandemic. Fun fact: they're the reason the gum wall exists; that's also another story.

At our theater, when improvisers step on stage and there's an audience waiting for them, the audience expects that a scene is going to be compelling. So we, as performers, need a framework for making a scene compelling quickly, for bringing audience members into an invisible context. For almost as long as the theater has existed, we've had a framework that we teach our students (and I do teach improv with Unexpected Productions), and that framework is called CROW. It's a shorthand for four elements of a compelling scene in our vernacular: "C" stands for character, "R" for relationship, "O" for objective, and "W" for where. Our goal in improv is to define a little bit of each of these elements as quickly as possible. If we can establish where we are, our relationship to the other people in the scene, what our character's objective is in the scene, and something about our character, whether it's an affectation or their occupation, it's much more compelling for the audience than just two people having a conversation with no context.

Now, obviously designers are not trying to create an improv scene, but they are trying to reverse-engineer what's compelling about a customer's world. So I've taken the CROW framework, done that reverse-engineering for you, and turned it into a series of questions and prompts to help you pull all of the interesting context out of a customer's environment. This is important because when you're working on multimodal or cross-device experiences, what your customer's doing is deeply important. What your customer's objective is in the moment is going to tell you why they switched devices or why they chose a particular modality. What defines your customer's character, how they define themselves in their identity, is going to tell you how your experience affects them. If they're transgender and you've chosen a voice with a specific gender that expresses itself in a specific way, it may have a particular emotional impact on them that you didn't expect.

So, what is your customer's relationship to the other people in the room? Do they trust the other people in the room? If you've got a spoken interface and they don't trust the other people in the room, they're certainly not going to do banking. That's a problem we've had with Cortana from the beginning, right? Not necessarily banking, but it was built for people doing productivity stuff in open workplaces, and you feel embarrassed around those coworkers because your relationship with them is that you want to impress them and be professional around them. And then the "where": we've been able to assume so much about our customers' context for so long. Like, they are in an office, in an open workspace, with a desktop and a keyboard… or they're at home in front of a desktop and keyboard. But now that our devices are smaller and more capable, and there are smart speakers and phones everywhere, those assumptions are just all out the window.

And that was true when I started writing this book, and then a pandemic happened. And wow, all the assumptions are super… like, they're out the window and six miles away, completely invisible now. You can see that if you look at LinkedIn listings. It's tragic how many jobs have been lost in this market, but I've seen a lot of movement on UX researchers. There are a lot of new listings for UX researchers, because the companies that are really with it understand that, wow, we just… we don't understand anything anymore! And it's particularly important when you're trying to build a system that's going to allow customers to switch back and forth between voice and touch and gesture and all that stuff. You need to understand: Are they moving between rooms? What devices are within arm's reach? Who else is around them? What are the ambient conditions? All of that stuff.

So, CROW: character, relationship, objective, and where. In the book, I walk through that and give you specific questions for each of those prompts, so that you can explore with your team which of those things you think you know already and which you need to build into your customer research that you might not have already. I've also put together a set of four worksheets that you can use to power up a small workshop at your own firm. Those are on my website, and they're free; anybody can download them, because I've given a couple of talks on this concept of capturing customer context, which is chapter two of my book. And just a quick plug: I'm giving a workshop on this concept at Interaction 21 in February, and there are spots available at the time of this recording. So, feel free to join me if you're passionate about this subject.

Designing the underlying structures

Jorge: A lot of design work is done towards making a tangible artifact, something that you can put in front of people and test. And certainly, many of the chapters in the book deal with the manifestations of some of these design decisions. But they are informed by an underlying set of concepts, like a conceptual model that underlies the whole thing, which is informed by things like CROW. And I'm wondering about the design work at that level. So, I understand the design work towards making a screen-based artifact that you can test with people or a voice-based system that you can simulate through Wizard of Oz-type research sessions. But what about the modeling of the underlying concepts that inform all of these things?

I feel like I'm asking the question and it sounds a little abstract to my own ears, so I'll try to make it concrete with an example. If I'm interacting with the smart cylinder in my living room (we have a HomePod here at home), I'm making requests of that thing that require my understanding a certain vocabulary and a certain set of conceptual objects, right? Like, I need to understand that it conceives of music as being contained in a structure of albums, artists, and songs. And I can learn this grammar over time. And I experience that conceptual structure not just by talking to the cylinder, but also by opening the Apple Music app on my computer. I'm wondering about the design work that happens on the conceptual structures, as opposed to the manifestation in the cylinder. You know what I mean?

Cheryl: Yeah. It was an interesting journey. When I joined the Alexa team, my main assignment was to design Alexa notifications, the Alexa notification system. And it's not just writing the notifications; it was designing the system that everything else is built upon. Before you could build the notifications, you needed to figure out how you could interrupt the activities on all of these devices. And that manifested in a couple of different ways. One of the reasons I was (I wouldn't call it comfortable, because this was definitely outside of a normal comfort zone; it was a really big task)… coming from a computer science background, you do describe algorithms and conceptual frameworks on a somewhat regular basis, in tech specs and things like that. So I just fell back on that to start, using flow diagrams and falling back on my old game design days. In game design, too, you're using prose to describe conceptual frameworks that will turn into gameplay, upfront, because you can't necessarily go straight to a prototype; you need your developers to be on your side. My documentation usually takes a pyramid-shaped structure: I'll start with prose and then keep boiling it down into simpler concepts until I get to something I think everybody can grok, and that's usually the core.

And with notifications, it was about finding those patterns. In chapter four, we talk about the patterns of the interruption matrix. An interruption matrix, as I manifested it for the first time on Alexa notifications, has, on one axis, all of the different activities your customer might be engaged in, and, on the other axis, all of the different ways that customer might be interrupted. Each cell is a pattern, a way the UI might manifest. But all the rows and columns have to be patterns too, because if you think about all of the different things you can do on a system like Alexa, there are hundreds and hundreds. You would not want a table like that, trust me, because they had one for Echo and it was a lot. So, we had to boil that down. Those conceptual frameworks were all about boiling things down, looking for patterns, sifting, squishing, something that I'm sure feels very much at home for you and the kind of work that you do. And just continuing to push towards the top of that pyramid and finding something common that we can all talk about.

The same is true of the spectrum of multimodality in chapter seven, which is a two-by-two grid. One axis is how close your customer is to the device, and the other is how much information your customer needs: whether it's just a little bit or really rich information. If you have those two pieces of information, we can start to have a really rich conversation about what types of multimodal design choices you should be making, or can be making, for your customer. And again, that was the way to get to the top of the pyramid from a lot of complicated conversations we were having. Talking about the Echo Show versus the Echo: do we want it to be as verbose? Versus the Fire TV: with the Fire TV, we don't want it to talk to you as much, because you have the remote and it just seems talky. But with the Echo Show, we wanted it to say as much as the Echo did, because we didn't know whether you were going to be looking at it. So how do you turn that into something we can all quickly talk about? It's always pushing towards the top of that pyramid.

The multimodal future

Jorge: In the book, there are several, I'm going to say interests, but maybe they're even passions of yours, that come across. Things like Star Trek as a precedent for some of this stuff, right? Another one is the work of the Walt Disney Imagineers, the people who design theme parks. As more and more of these ideas become mainstream through things like the Echo and Cortana and Siri, what are you most excited about in our multimodal future?

Cheryl: Honestly, inclusive design, and in particular its manifestations in healthcare. I mentioned in chapter one that I now identify as disabled, and that's been a journey for me. I knew I had bad medical luck; I've been in and out of doctors' offices, like a German car, for a very long time. And I am German, so I can make that joke. But this year, I found out that it's a genetic condition that I've had my whole life. So I've spent a lot of time in doctors' offices. I've spent a lot of time watching doctors struggle with mouse-based systems, like the Epic records system. And I also talked to Anna Abovyan, one of the 10 experts I interviewed, who was working at 3M on multimodal systems that help doctors with dictation and AI augmentation of the dictation experience. There's just so much there, just in that one example! I've spent a ton of time at physical therapists, and there's so much there. I want an Alexa app that talks me through my physical therapy, because looking at my phone while doing physical therapy doesn't make a lot of sense. There's just so much potential, particularly in the medical space, and there are so many disabled folks using technology who are continually left behind, even on things we think we've gotten right.

You know, there are still entertainment apps, like TikTok, that don't have automatic captions, and we need to think about that. But there's so much more potential than even that baseline, and I'm looking forward to seeing how that expands. When I was working at Microsoft on Power Apps, we never got around to it, but I thought: it would be really arduous to create an initial app via a screen reader, connect all the backend stuff, and do the initial layout. Why couldn't you just use a conversational app to express what you wanted? "I want a list view over here, and I want a preview over here, and I want it connected to this database," and get your initial app set up. If you had something like that, you could do it with voice. If you're mobility impaired, you could start building an app. And those improvements help everybody. They seem strange at first, and they seem quirky, and it's easy to get stuff like conversational design wrong, so I don't mean to imply that chatbots or anything like that are the solution to everything, because they are not, and they can easily be done wrong. But particularly in healthcare, and in inclusive design in general, there's just so much there.

Jorge: And so much of it applies to designers working on all sorts of design challenges, not just those that are explicitly about multimodality.

Cheryl: Absolutely. A couple of days ago at a talk, someone asked, "How do you want this book to affect people?" And one point I made goes back to the change the pandemic has caused and the changes to our assumptions. I used to design the conference room experiences for the Gates Foundation, making sure that when we bring everybody into a room for a very important meeting, everything goes smoothly. Now whiteboards aren't really a thing, right? Are people going to touch whiteboard markers anymore? Are people going to touch the in-room consoles that we spent so much time working on? What do we do now? How do we get around that problem? And when we talk about inclusivity, it's horrible, but COVID is a disability factory, you know? With long COVID and the impact it's having on people's mobility, the problems are getting bigger and they're getting more urgent. And so there's a lot of potential here and there's a lot of need.

Closing

Jorge: Well, it seems like it's the right book for the right time. So, thank you so much for sharing it with us, Cheryl. Where can folks follow up with you?

Cheryl: Well, the good news is lots of places, because I'm addicted to social media. I have two accounts to choose from on Twitter. If you want everything, all of the tweets, I am @FunnyGodmother, and that's a mix of my design work, random musings, some stuff on improv, everything. But if you just want design-focused content, you can follow my company account, @IdeaPlatz. I'm also on TikTok as @FunnyGodmother, which is a mix of design content and talk about my time in video games; there's a very popular series of stories from that time. And I'm @FunnyGodmother on Instagram. On my company website, Ideaplatz, you can see those worksheets I described, if you want a taste of what the CROW workshop worksheets might look like for you, and there are lots of links to videos of my past talks about voice design, et cetera. And of course, there's cherylplatz.com if you're curious about my career or other trivia bits like that, or want to see embarrassing videos of me doing past acting stuff. So, that's all there too.

Jorge: Fantastic. Thank you so much for being with us, Cheryl.

Cheryl: Thank you. This was a really fun conversation. It was great to be here, Jorge.