WEBVTT 00:00:03.375 --> 00:00:09.641 There are several different ways that interfaces can help people think more fluidly 00:00:09.641 --> 00:00:14.706 by distributing their cognition into artifacts in the world. 00:00:14.706 --> 00:00:20.403 When interfaces help people distribute cognition, it can encourage experimentation; 00:00:20.403 --> 00:00:25.207 it can scaffold learning and reduce errors through redundancy; 00:00:25.207 --> 00:00:28.073 it can show only the differences that matter; 00:00:28.073 --> 00:00:32.409 it can convert slow calculation into fast perception; 00:00:32.409 --> 00:00:35.679 it can support chunking, especially by experts; 00:00:35.679 --> 00:00:37.822 it can increase the efficiency of interactions; 00:00:37.822 --> 00:00:40.566 and it can facilitate collaboration. 00:00:40.566 --> 00:00:44.047 Let’s go through these one at a time. 00:00:46.031 --> 00:00:52.613 Here’s a video game that, I bet, many of you know: This is the Tetris video game. 00:00:52.613 --> 00:00:59.861 And there was a very clever study that David Kirsh and Paul Maglio at UC San Diego ran, 00:00:59.861 --> 00:01:05.704 where they looked at people playing Tetris, and what they found was really interesting. 00:01:05.704 --> 00:01:10.829 In Tetris, as you may know, you can use keys to move objects around on the screen. 00:01:10.829 --> 00:01:18.456 And at a certain point you can hit the space bar to drop the object and try to complete a row of blocks. 00:01:18.456 --> 00:01:21.594 Moving and rotating pieces on the screen may seem like a waste of time 00:01:21.594 --> 00:01:25.303 because you have a limited amount of time before the block hits the bottom. 00:01:25.303 --> 00:01:28.850 So, how do you use that time most efficiently? 00:01:28.850 --> 00:01:34.004 It turns out that people move the block around the screen 00:01:34.004 --> 00:01:39.072 more than they, in some purely theoretical sense, need to.
00:01:39.072 --> 00:01:44.437 So, in essence, they’re trying out different places that the block could go. 00:01:44.437 --> 00:01:50.623 Maybe this is just for novices: when you’re learning Tetris you need to feel things out, 00:01:50.623 --> 00:01:54.430 but as you become more of an expert that’s no longer the case? 00:01:54.430 --> 00:01:55.919 Exactly the opposite! 00:01:55.919 --> 00:02:02.900 Kirsh and Maglio found that experts actually relied more heavily on moving objects in the world. 00:02:02.900 --> 00:02:07.906 What they were doing was distributing the cognitive effort of “what if” scenarios: 00:02:07.906 --> 00:02:10.666 What if I placed it over here? What if I placed it over there? 00:02:10.666 --> 00:02:13.175 You absolutely can do that in your mind, 00:02:13.175 --> 00:02:18.615 but it turns out that in this case, for most people, it’s cheaper to do it out in the world, 00:02:18.615 --> 00:02:24.418 turning the cognitive task of reasoning about all of the what-if scenarios 00:02:24.418 --> 00:02:29.606 into a perceptual task of “Oh yeah, that block would work perfectly right there.” 00:02:29.606 --> 00:02:34.628 In particular, it saves you the effort of having to mentally rotate the different pieces 00:02:34.628 --> 00:02:38.318 to figure out how they would fit on the screen. 00:02:38.318 --> 00:02:43.955 Here’s another example from the learning sciences: These are called Montessori blocks. 00:02:43.955 --> 00:02:51.349 What we have here are beads that provide a physical representation for numbers. 00:02:51.349 --> 00:02:57.291 Especially for young children, numbers are really abstract concepts, difficult to get your head around. 00:02:57.291 --> 00:03:04.274 And these physical instantiations can help teach addition, multiplication and other simple arithmetic operations.
00:03:04.274 --> 00:03:11.317 So, for example, if I’m going to take three and multiply it by three to get three squared, 00:03:11.317 --> 00:03:18.328 well, I can see that I have three by three: I have a square, and I can see that it’s composed of nine beads. 00:03:18.328 --> 00:03:21.776 And there are other addition and multiplication operations that you could do with these as well. 00:03:21.776 --> 00:03:24.317 And by having this redundant information, 00:03:24.317 --> 00:03:28.648 by taking an abstract concept and “reifying” it, making it concrete, 00:03:28.648 --> 00:03:32.344 it can help scaffold learning and reduce errors. 00:03:32.344 --> 00:03:36.023 The Montessori blocks example and the Tetris example 00:03:36.023 --> 00:03:42.138 show us the power of providing a visual or physical instantiation of abstract ideas. 00:03:42.138 --> 00:03:45.615 So what makes a good representation of this sort? 00:03:45.615 --> 00:03:50.960 Well, it should show the information that you need and nothing else. 00:03:50.960 --> 00:03:54.752 And these representations should 00:03:54.752 --> 00:04:01.056 enable the kinds of tasks that users want to do, like comparison, exploration, and problem solving. 00:04:01.056 --> 00:04:03.790 And if that seems too abstract or maybe obvious, 00:04:03.790 --> 00:04:07.122 let’s take a look at this example from the London Underground. 00:04:07.122 --> 00:04:10.489 This subway map was introduced in the 1930s, 00:04:10.489 --> 00:04:16.560 and it was one of the very first maps to introduce a brand-new idea in map design: 00:04:16.560 --> 00:04:24.548 abstracting the layout of the tracks from the underlying physical geography.
00:04:24.548 --> 00:04:29.586 Prior to this London subway map, maps would show what the geography actually was, 00:04:29.586 --> 00:04:31.921 so long things were long and short things were short; 00:04:31.921 --> 00:04:34.872 and if the tracks wandered, because that’s the way the lines actually ran, 00:04:34.872 --> 00:04:38.934 then the tracks on the map would wander too. 00:04:38.934 --> 00:04:42.918 What the Underground map designers realized was that 00:04:42.918 --> 00:04:50.018 the most common task for subway riders is to figure out how to get from point A to point B, 00:04:50.018 --> 00:04:54.881 and all of this additional detail of faithfulness to the underlying geography 00:04:54.881 --> 00:05:00.521 was getting in the way of that A-to-B task more than it was helping it. 00:05:00.521 --> 00:05:04.617 So what they did is they stripped out a lot of that unnecessary detail, 00:05:04.617 --> 00:05:09.604 turning the tracks into vertical, diagonal, and horizontal lines. 00:05:09.604 --> 00:05:15.846 There is still some correspondence between the layout on the map and the layout in the real world: 00:05:15.846 --> 00:05:18.184 north is north, and south is south, 00:05:18.184 --> 00:05:22.192 and things roughly head in the direction that they do in the real world, 00:05:22.192 --> 00:05:24.252 but the detail is stripped out. 00:05:24.252 --> 00:05:29.646 And this makes it much easier to figure out how to get between stations and where to make connections. 00:05:29.646 --> 00:05:32.088 Another thing that they did was introduce 00:05:32.088 --> 00:05:38.082 what, decades later, we would call a “focus plus context” representation for the map. 00:05:38.082 --> 00:05:42.445 In the center of London, the subway stations are very densely packed. 00:05:42.445 --> 00:05:46.355 So that area is expanded out: it consumes more of the map real estate. 00:05:46.355 --> 00:05:50.869 As you get out toward the suburbs, the stations are fewer and farther apart.
00:05:50.869 --> 00:05:55.355 Rather than that area taking up 90% of the map because it’s 90% of the space, 00:05:55.355 --> 00:05:57.991 those stations are actually scrunched, 00:05:57.991 --> 00:06:00.829 because if you know you need to go northeast to a particular station, 00:06:00.829 --> 00:06:06.504 then the exact distances involved are usually less relevant. 00:06:06.504 --> 00:06:11.281 Now, what constitutes a good representation is, of course, task-specific. 00:06:11.281 --> 00:06:15.811 And so what you can see is that by making some tasks easier, 00:06:15.811 --> 00:06:19.433 like getting from A to B when you know A and B, 00:06:19.433 --> 00:06:23.650 or being able to navigate the center of London more effectively, 00:06:23.650 --> 00:06:26.285 you’ve compromised on other tasks. 00:06:26.285 --> 00:06:28.871 For example, somebody may need to make decisions 00:06:28.871 --> 00:06:32.748 about what stop to get off at based on the underlying topography, which the map no longer shows. 00:06:32.748 --> 00:06:39.739 Another task that’s compromised is judging distance: because the distances between the stations on the map 00:06:39.739 --> 00:06:43.998 don’t line up with the distances between stations in the world, 00:06:43.998 --> 00:06:48.219 you can make poor judgments about what’s close and what’s far: 00:06:48.219 --> 00:06:53.177 In the center you may believe things are far apart when they are really close, 00:06:53.177 --> 00:06:58.197 and out in the suburbs, you may believe from the map that things are closer than they actually are. 00:06:58.197 --> 00:07:03.806 And so nearly all representation design is about fitness to task. 00:07:03.806 --> 00:07:06.878 Here’s a temperature map from Weather Underground. 00:07:06.878 --> 00:07:10.177 It shows the temperature at each location in the Bay Area, 00:07:10.177 --> 00:07:17.011 geo-referenced so that the temperature number is placed right on top of that physical location.
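The two Underground-map techniques described above can be sketched in a few lines of Python. The first redraws tracks as horizontal, vertical, and diagonal segments. The second gives the dense center more room than the sparse suburbs. This is a minimal illustration under stated assumptions, not a reconstruction of how the actual map was drawn. It assumes station positions are plain (x, y) coordinates, and the function names (`snap_octilinear`, `fisheye`, `distort_point`) and the `distortion` parameter are hypothetical. For the focus-plus-context step it swaps in the graphical-fisheye function associated with Sarkar and Brown, G(r) = (d + 1)r / (dr + 1), as one standard way to magnify a focus and compress the periphery.

```python
import math


def snap_octilinear(dx: float, dy: float) -> tuple[float, float]:
    """Snap a track segment to the nearest multiple of 45 degrees, keeping its length.

    Every segment becomes horizontal, vertical, or diagonal, as on a schematic map.
    """
    length = math.hypot(dx, dy)
    step = math.pi / 4  # 45 degrees
    snapped = round(math.atan2(dy, dx) / step) * step
    return (length * math.cos(snapped), length * math.sin(snapped))


def fisheye(r: float, distortion: float = 3.0) -> float:
    """Sarkar-and-Brown-style graphical fisheye for a normalized radius r in [0, 1].

    Distances near the focus (r near 0) are magnified and distances near the
    edge (r near 1) are compressed; fisheye(0) == 0 and fisheye(1) == 1.
    """
    return (distortion + 1) * r / (distortion * r + 1)


def distort_point(x: float, y: float, cx: float, cy: float,
                  radius: float, distortion: float = 3.0) -> tuple[float, float]:
    """Push a station radially away from the focus (cx, cy), e.g. central London.

    Assumes stations of interest lie within `radius` of the focus; points at
    or beyond it (and the focus itself) are returned unchanged.
    """
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r == 0 or r >= radius:
        return (x, y)
    new_r = fisheye(r / radius, distortion) * radius
    return (cx + dx * new_r / r, cy + dy * new_r / r)


if __name__ == "__main__":
    # A slightly wandering segment becomes a clean horizontal one.
    print(snap_octilinear(1.0, 0.2))
    # Equal real-world spacing: the central gap grows, the outer gap shrinks.
    print(fisheye(0.2) - fisheye(0.1), fisheye(1.0) - fisheye(0.9))
```

With `distortion = 3`, two stations 0.1 apart near the center end up about 0.19 apart, while two stations 0.1 apart near the edge end up about 0.03 apart. This reproduces the trade-off described for the map: navigation in the center gets easier, but distances read off the map become misleading.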
00:07:17.011 --> 00:07:21.302 What do you think are the benefits and drawbacks of this representation? 00:07:21.302 --> 00:07:26.039 What’s it good for, and what’s it a problem for? 00:07:30.670 --> 00:07:35.438 If you know the physical location that you’d like the temperature for, 00:07:35.438 --> 00:07:39.376 say, for example, “What’s the temperature along the coast?” 00:07:39.376 --> 00:07:45.707 and you don’t care about or don’t know the exact name of the town, this works incredibly well. 00:07:45.707 --> 00:07:50.024 It’s also a reasonable interface, in some sense, 00:07:50.024 --> 00:07:54.481 for trying to get a good sense of what the temperatures are like in the area overall: 00:07:54.481 --> 00:08:00.425 I can see, for example, that as I head inland the temperature tends to get warmer. 00:08:00.425 --> 00:08:03.958 But there are a lot of ways in which we could make this better. 00:08:03.958 --> 00:08:10.147 Right now, every temperature is shown identically no matter what temperature it is, 00:08:10.147 --> 00:08:15.138 which means it’s hard to scan: I have to read every single temperature one by one. 00:08:15.138 --> 00:08:22.289 I could make this better if instead I had some visual property of the temperature number, 00:08:22.289 --> 00:08:29.088 like its color or size, correspond to the temperature. 00:08:29.088 --> 00:08:36.533 But if we’re going to start mapping the variables of the map to something like color, 00:08:36.533 --> 00:08:39.448 we need to be careful to get it right. 00:08:39.448 --> 00:08:43.353 This is an example that comes from Edward Tufte. 00:08:43.353 --> 00:08:49.315 His books on visual design and the graphical representation of data are fantastic, 00:08:49.315 --> 00:08:50.856 and I strongly encourage you to read them. 00:08:50.856 --> 00:08:58.596 In this map, we see how a computer scientist might color-code a map of Japan.
00:08:58.596 --> 00:09:04.235 This is showing the height above or below sea level as color, 00:09:04.235 --> 00:09:11.963 and what you can see is that the depth below sea level is represented by the rainbow color spectrum, Roy G. Biv. 00:09:11.963 --> 00:09:21.007 Now, one of the challenges of hue is that it’s not an additive representation. 00:09:21.007 --> 00:09:26.901 We don’t perceive a strong ordering among the colors. 00:09:26.901 --> 00:09:32.961 It’s a substitutive representation: red and yellow are qualitatively different, 00:09:32.961 --> 00:09:37.113 but we don’t automatically perceive a more-than or less-than relationship between the two. 00:09:37.113 --> 00:09:39.095 Another problem with this representation 00:09:39.095 --> 00:09:45.093 is that it’s very difficult to tell at a glance what’s shallower and what’s deeper in the sea. 00:09:45.093 --> 00:09:55.938 Conversely, the individual isosurfaces, the individual regions of a particular depth, really pop out. 00:09:55.938 --> 00:10:01.948 To me, for example, the yellow depth band pops out very strongly, 00:10:01.948 --> 00:10:05.846 and that shape really comes to the attentional fore, 00:10:05.846 --> 00:10:10.942 which, if you were making a rock-and-roll poster for the Fillmore, would be awesome. 00:10:10.942 --> 00:10:15.266 But if you’re trying to get a sense of the contours of the sea floor, 00:10:15.266 --> 00:10:23.435 what becomes salient to you, the outline of what’s 400 meters below sea level, is probably just not that relevant. 00:10:23.435 --> 00:10:28.301 So how could we continue our theme of using color as a representational cue 00:10:28.301 --> 00:10:32.805 but have it be more meaningful than what you see in this case, 00:10:32.805 --> 00:10:35.565 where we’re mapping depth to the color spectrum? 00:10:35.565 --> 00:10:39.643 Here’s Edward Tufte’s redesign, which I think is much better. 00:10:39.643 --> 00:10:43.701 There are a couple of things that I really like about the representation here.
00:10:43.701 --> 00:10:51.250 The first one is that all of the things that are above sea level are brown, a kind of earth tone. 00:10:51.250 --> 00:10:56.566 So, we’re leveraging our intuitions about the physical world and using them metaphorically for the map. 00:10:56.566 --> 00:11:00.874 So, the land-colored stuff is land. 00:11:00.874 --> 00:11:05.577 Similarly, all of the water is blue: the water-colored stuff is water. 00:11:05.577 --> 00:11:15.846 And furthermore, we can see how the intensity, or the luminance, of that blue changes with depth. 00:11:15.846 --> 00:11:22.773 Deeper water is a darker blue, which corresponds to our physical intuitions. 00:11:22.773 --> 00:11:29.651 Of course, water doesn’t really get that much darker across the range of depths that we’re talking about here; 00:11:29.651 --> 00:11:34.762 our intuition about darker colors being deeper comes from much shallower depths. 00:11:34.762 --> 00:11:41.224 But the idea is that you can leverage something we all know: water right by the shore is a paler color, 00:11:41.224 --> 00:11:44.359 and as it gets deeper, it gets darker. 00:11:44.359 --> 00:11:51.860 So this is a really wonderful way to see that these darker areas here are deeper than these shallow areas here. 00:11:51.860 --> 00:11:58.196 With the London Underground map, we saw how what makes a good representation of the map 00:11:58.196 --> 00:12:03.869 was intrinsically tied up with the task that the user is doing. 00:12:03.869 --> 00:12:11.972 Similarly, what makes a good representation also depends on the user’s expertise. 00:12:11.972 --> 00:12:18.696 And a wonderful example of this comes from Herb Simon and Bill Chase in the early 1970s. 00:12:18.696 --> 00:12:24.965 They were looking at chess as an exemplar domain for trying to understand expertise.
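The difference between the two Japan maps described above can be made concrete with a small color-mapping sketch. It assumes elevations in meters, negative below sea level. The 4,000 m maximum depth, the lightness range, and the brown RGB value are made-up illustrative choices, and the function names are hypothetical, not taken from Tufte's work. `rainbow_color` sweeps hue at full brightness, like the first map. `depth_color` follows the redesign's idea: one earth tone for land, and a single blue hue whose lightness falls as depth increases. `approx_luminance` applies the Rec. 709 weights directly to the RGB values. This is only a rough proxy for perceived brightness, because it skips gamma linearization.

```python
import colorsys

RGB = tuple[float, float, float]

# An earth-tone brown for land (assumed value; any brown would do).
LAND: RGB = (0.63, 0.47, 0.35)


def rainbow_color(elevation_m: float, max_depth_m: float = 4000.0) -> RGB:
    """Hue-only encoding: depth sweeps from red toward violet at full brightness."""
    if elevation_m >= 0:
        return LAND  # land handled the same way in both schemes, for simplicity
    t = min(-elevation_m / max_depth_m, 1.0)
    return colorsys.hsv_to_rgb(0.8 * t, 1.0, 1.0)


def depth_color(elevation_m: float, max_depth_m: float = 4000.0) -> RGB:
    """Luminance encoding: one blue hue, paler near shore and darker when deeper."""
    if elevation_m >= 0:
        return LAND
    t = min(-elevation_m / max_depth_m, 1.0)
    lightness = 0.85 - 0.6 * t  # 0.85 at the shoreline, 0.25 at max depth
    return colorsys.hls_to_rgb(0.58, lightness, 0.7)  # hue 0.58 is a blue


def approx_luminance(rgb: RGB) -> float:
    """Rough brightness using Rec. 709 weights (no gamma correction)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


if __name__ == "__main__":
    for depth in (-10, -1000, -2000, -3000, -4000):
        print(depth,
              round(approx_luminance(rainbow_color(depth)), 3),
              round(approx_luminance(depth_color(depth)), 3))
```

Run as a script, the rainbow column's brightness rises and then falls as depth increases, while the blue column falls steadily. That steady fall is what lets a reader judge "deeper" at a glance.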
00:12:24.965 --> 00:12:28.229 One of the things that they observed was that 00:12:28.229 --> 00:12:34.817 expert chess players were much better at remembering the configuration of a chess board. 00:12:34.817 --> 00:12:38.580 You can imagine a couple of hypotheses for this. 00:12:38.580 --> 00:12:46.659 One would be “Experts were born with a better memory for that kind of thing”; 00:12:46.659 --> 00:12:51.643 another would be “Experts, by virtue of their ten thousand hours of training, 00:12:51.643 --> 00:12:56.656 have built up that muscle and become very good at remembering that kind of thing.” 00:12:56.656 --> 00:13:00.265 Neither turns out to be the case. 00:13:00.265 --> 00:13:12.124 Experts are much better at remembering the configuration of a board, but only if it comes from an actual game! 00:13:12.124 --> 00:13:18.978 So if the pieces are arranged in a configuration that could arise in a real game, 00:13:18.978 --> 00:13:22.332 experts do a fantastic job of remembering it. 00:13:22.332 --> 00:13:28.536 But if you arrange the pieces on the board in a way that a game of chess could never produce, 00:13:28.536 --> 00:13:32.433 the experts actually do no better than novices at all. 00:13:32.433 --> 00:13:41.286 So what we’re seeing is that the experts’ ability to chunk things, and their higher effective memory capacity, 00:13:41.286 --> 00:13:46.597 come from being able to leverage their knowledge about the domain. 00:13:46.597 --> 00:13:51.547 Game design and user interface design are both concerned 00:13:51.547 --> 00:13:56.035 with how easy or hard it is for a user to accomplish a particular task. 00:13:56.035 --> 00:14:00.524 The difference is that game designers often want to make things hard, the right kind of hard, 00:14:00.524 --> 00:14:04.325 while interface designers want to make things easy.
00:14:04.325 --> 00:14:07.024 And so we can learn from this chess example 00:14:07.024 --> 00:14:10.047 and ask this question as interface designers: 00:14:10.047 --> 00:14:13.748 “Can we make interfaces more chunkable?” 00:14:13.748 --> 00:14:17.807 Can we design interactions that can be accomplished in one chunk, 00:14:17.807 --> 00:14:23.665 and that therefore place a lower load on memory and are easier for users to work with? 00:14:23.665 --> 00:14:26.455 A great example of this comes from Bill Buxton, 00:14:26.455 --> 00:14:32.650 who looked at moving text from one location in a document to another. 00:14:32.650 --> 00:14:37.055 In a typical desktop user interface today, 00:14:37.055 --> 00:14:43.543 you would use either a keyboard shortcut or a menu command to cut some text, 00:14:43.543 --> 00:14:47.588 then move the cursor to a new location, and then paste that text. 00:14:47.588 --> 00:14:54.479 That’s three different operations, and if you got interrupted in between, 00:14:54.479 --> 00:14:59.351 you might forget what’s in the copy buffer; in fact, I’m sure that’s happened to all of us at some point. 00:14:59.351 --> 00:15:08.553 What Buxton realized was: what if you could turn all of this, through a gestural interface, into one command? 00:15:08.553 --> 00:15:16.572 So maybe I could grab the text that I want, drag it to a new location, and drop it right there. 00:15:16.572 --> 00:15:22.652 That’d be much better: there’s never a time when I could be interrupted and lose state, 00:15:22.652 --> 00:15:28.092 because all of the state is maintained in this one continuous gesture. 00:15:28.092 --> 00:15:31.709 We’ve all heard the saying that a picture is worth a thousand words.
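Buxton's point about chunking can be illustrated with a toy text editor. This is a hypothetical sketch, not Buxton's system: the `Editor` class and its methods are invented for illustration. Moving text with cut and paste takes three separate steps, and the clipboard holds state between them. Anything that happens in between can silently change the outcome. Here the interruption is the user copying something else. The `move` method does the same job in one operation, the programmatic analogue of a single drag-and-drop gesture, so there is no in-between state to lose.

```python
class Editor:
    """A toy editor holding one string of text and a single clipboard (hypothetical)."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.clipboard = ""

    # Three separate operations: the clipboard carries state between them.
    def cut(self, start: int, end: int) -> None:
        self.clipboard = self.text[start:end]
        self.text = self.text[:start] + self.text[end:]

    def copy(self, start: int, end: int) -> None:
        self.clipboard = self.text[start:end]

    def paste(self, pos: int) -> None:
        self.text = self.text[:pos] + self.clipboard + self.text[pos:]

    # One chunked operation, like grabbing text and dropping it elsewhere.
    def move(self, start: int, end: int, dest: int) -> None:
        """Move text[start:end] to index `dest`, measured in the original text."""
        if start < dest < end:
            raise ValueError("cannot drop text inside its own selection")
        fragment = self.text[start:end]
        rest = self.text[:start] + self.text[end:]
        if dest >= end:
            dest -= end - start  # account for the removed fragment
        self.text = rest[:dest] + fragment + rest[dest:]


if __name__ == "__main__":
    # Chunked: one call, no intermediate state.
    e = Editor("one three two ")
    e.move(10, 14, 4)
    print(repr(e.text))  # 'one two three '

    # Three steps with an interruption in the middle.
    e = Editor("one three two ")
    e.cut(10, 14)  # clipboard holds "two "
    e.copy(4, 9)   # interruption: the user copies "three" for something else
    e.paste(4)
    print(repr(e.text))  # 'one threethree '
```

The three-step version only goes wrong because state lives in the clipboard between steps. The single `move` call has no such window, which mirrors the claim that a continuous gesture cannot be interrupted halfway.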
00:15:31.709 --> 00:15:39.801 As interface designers, we’re tasked with representing information to the user, 00:15:39.801 --> 00:15:44.831 and one question that we commonly have to deal with is: 00:15:44.831 --> 00:15:50.518 Should we represent information visually, or should we represent it textually? 00:15:50.518 --> 00:15:54.025 The answer, of course, is that it depends. 00:15:54.025 --> 00:15:59.909 But one time when representing information visually can be much more effective 00:15:59.909 --> 00:16:06.026 is when you can convert slow reasoning tasks into fast perception tasks 00:16:06.026 --> 00:16:09.950 by making the relevant information visually salient. 00:16:09.950 --> 00:16:15.396 We saw that with the map example: In that case, both colorings of the map were visual, 00:16:15.396 --> 00:16:21.482 but with the good coloring, it was much easier to understand at a glance what’s going on. 00:16:21.482 --> 00:16:27.095 With the poor coloring, you have to reason about it much more slowly, and the wrong things keep popping out. 00:16:27.095 --> 00:16:31.365 If you think about a table of numbers, it can often be difficult to see trends, 00:16:31.365 --> 00:16:34.936 whereas if you represent that same information visually, 00:16:34.936 --> 00:16:40.482 the high points, the low points, the trends, and the outliers often pop out: 00:16:40.482 --> 99:59:59.999 all of that becomes salient and automatically visible to you.
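This small sketch illustrates turning a table of numbers into something perceptual by rendering the values as text bars. The readings below are invented sample values for a few Bay Area towns, not real data. `bar_chart` is a hypothetical helper. It assumes non-negative values and draws every bar from a zero baseline, so bar length stays proportional to the value.

```python
def bar_chart(values: dict[str, float], width: int = 40) -> str:
    """Render non-negative values as rows of '#', scaled so the largest fills `width`."""
    peak = max(values.values())
    label_width = max(len(name) for name in values)
    rows = []
    for name, value in values.items():
        # Zero baseline: bar length is proportional to the value itself.
        n = round(value / peak * width) if peak > 0 else 0
        rows.append(f"{name:<{label_width}}  {'#' * n} {value:g}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Invented sample temperatures (degrees F), for illustration only.
    temps = {
        "Half Moon Bay": 61,
        "San Francisco": 63,
        "Oakland": 68,
        "Walnut Creek": 84,
        "Livermore": 91,
    }
    print(bar_chart(temps))
```

Even in plain text, the warmer inland readings stand out immediately as longer bars. The same five numbers in a column would have to be read and compared one by one.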