WEBVTT
00:00:03.375 --> 00:00:09.641
There are several different ways that interfaces can help people think more fluidly
00:00:09.641 --> 00:00:14.706
by distributing their cognition into artifacts in the world.
00:00:14.706 --> 00:00:20.403
When interfaces help people distribute their cognition, it can encourage experimentation;
00:00:20.403 --> 00:00:25.207
it can scaffold learning and reduce errors through redundancy;
00:00:25.207 --> 00:00:28.073
it can show only the differences that matter;
00:00:28.073 --> 00:00:32.409
it can convert slow calculation into fast perception;
00:00:32.409 --> 00:00:35.679
it can support chunking, especially by experts;
00:00:35.679 --> 00:00:37.822
it can increase efficiency of interactions;
00:00:37.822 --> 00:00:40.566
and it can facilitate collaboration.
00:00:40.566 --> 00:00:44.047
Let’s go through these one at a time.
00:00:46.031 --> 00:00:52.613
Here’s a video game that, I bet, many of you know: This is the Tetris video game.
00:00:52.613 --> 00:00:59.861
And there was a very clever study that David Kirsh and Paul Maglio at UC San Diego ran,
00:00:59.861 --> 00:01:05.704
where they looked at Tetris players playing Tetris, and what they found was really interesting.
00:01:05.704 --> 00:01:10.829
In Tetris, as you may know, you can use keys to move objects around on the screen.
00:01:10.829 --> 00:01:18.456
And at a certain point you can hit the space bar to drop the object and try to complete a row of blocks.
00:01:18.456 --> 00:01:21.594
Moving and rotating pieces on the screen may seem like a waste of time
00:01:21.594 --> 00:01:25.303
because you have a limited amount of time before the block hits the bottom.
00:01:25.303 --> 00:01:28.850
So, how do you use that most efficiently?
00:01:28.850 --> 00:01:34.004
It turns out that people move the block around the screen
00:01:34.004 --> 00:01:39.072
more than they — in some purely theoretical sense — need to.
00:01:39.072 --> 00:01:44.437
So, in essence, I’m trying out different places that the blocks could go.
00:01:44.437 --> 00:01:50.623
Maybe this is just for novices: when you’re learning Tetris you need to feel things out,
00:01:50.623 --> 00:01:54.430
but as you become more of an expert that’s no longer the case?
00:01:54.430 --> 00:01:55.919
Exactly the opposite!
00:01:55.919 --> 00:02:02.900
Kirsh and Maglio found that experts actually relied more heavily on moving objects in the world.
00:02:02.900 --> 00:02:07.906
And what they were doing is they were distributing the cognitive effort of “what if” scenarios —
00:02:07.906 --> 00:02:10.666
What if I placed it over here? What if I placed it over there?
00:02:10.666 --> 00:02:13.175
You absolutely can do that in your mind,
00:02:13.175 --> 00:02:18.615
but it turns out that in this case, for most people, it’s cheaper to do it out in the world
00:02:18.615 --> 00:02:24.418
to be able to turn the cognitive task of reasoning about all of the what-if scenarios
00:02:24.418 --> 00:02:29.606
into a perceptual task of “Oh yeah, that block would work perfectly right there.”
00:02:29.606 --> 00:02:34.628
It saves you the effort of having to, in particular, mentally rotate the different pieces
00:02:34.628 --> 00:02:38.318
to figure out how they would fit into the screen.
00:02:38.318 --> 00:02:43.955
Here’s another example from the learning sciences: These are called Montessori blocks.
00:02:43.955 --> 00:02:51.349
And what we have here are beads that provide a physical representation for numbers.
00:02:51.349 --> 00:02:57.291
Especially for young children, numbers are really abstract concepts, difficult to get your head around.
00:02:57.291 --> 00:03:04.274
And these physical instantiations can help teach addition, multiplication and other simple arithmetic operations.
00:03:04.274 --> 00:03:11.317
So, for example, if I’m going to take three and multiply it by three to get three squared,
00:03:11.317 --> 00:03:18.328
well, I can see that I have three by three — I have a square and I can see that it’s composed of nine beads.
00:03:18.328 --> 00:03:21.776
And there are other addition and multiplication operations that you could do with these also.
00:03:21.776 --> 00:03:24.317
And by having this redundant information,
00:03:24.317 --> 00:03:28.648
by taking an abstract concept, reifying it, and making it concrete,
00:03:28.648 --> 00:03:32.344
it can help scaffold learning and reduce errors.
00:03:32.344 --> 00:03:36.023
The Montessori blocks example and the Tetris example
00:03:36.023 --> 00:03:42.138
show us the power of providing a visual or physical instantiation of abstract ideas.
00:03:42.138 --> 00:03:45.615
So what makes a good representation of this sort?
00:03:45.615 --> 00:03:50.960
Well, you should show the information that you need and nothing else.
00:03:50.960 --> 00:03:54.752
And, what these representations should do is
00:03:54.752 --> 00:04:01.056
enable the kinds of tasks that users want to do, like comparison and exploration and problem solving.
00:04:01.056 --> 00:04:03.790
And if that seems too abstract or maybe obvious,
00:04:03.790 --> 00:04:07.122
let’s take a look at this example from the London Underground.
00:04:07.122 --> 00:04:10.489
This subway map was introduced about a century ago
00:04:10.489 --> 00:04:16.560
and it was one of the very first maps to introduce a brand-new idea in map design:
00:04:16.560 --> 00:04:24.548
abstracting the layout of the tracks from the underlying physical geography.
00:04:24.548 --> 00:04:29.586
Prior to this London subway map, the maps would show what the geography was
00:04:29.586 --> 00:04:31.921
and so long things were long and short things were short;
00:04:31.921 --> 00:04:34.872
and if the tracks wandered because that’s the way that it works,
00:04:34.872 --> 00:04:38.934
then the tracks on the map would wander because that’s the way things worked.
00:04:38.934 --> 00:04:42.918
And what the Underground map designers realized was that
00:04:42.918 --> 00:04:50.018
the most common task for subway riders is to figure out how to get from point A to point B,
00:04:50.018 --> 00:04:54.881
and all of this additional detail of faithfulness to the underlying topography
00:04:54.881 --> 00:05:00.521
was getting in the way of that A-to-B task more than it was helping it.
00:05:00.521 --> 00:05:04.617
And so what they did is they stripped out a lot of that unnecessary detail,
00:05:04.617 --> 00:05:09.604
turning this into vertical, diagonal, and horizontal lines.
00:05:09.604 --> 00:05:15.846
So there is some correspondence between the layout on the map and the layout in the real world:
00:05:15.846 --> 00:05:18.184
North is north, and south is south,
00:05:18.184 --> 00:05:22.192
and things roughly head in the direction that they do in the real world,
00:05:22.192 --> 00:05:24.252
but the detail is stripped out.
00:05:24.252 --> 00:05:29.646
And this makes it much easier to be able to figure out how to get between connections.
00:05:29.646 --> 00:05:32.088
Another thing that they did is they introduced
00:05:32.088 --> 00:05:38.082
what a century later we would call a “focus plus context” representation for the map.
00:05:38.082 --> 00:05:42.445
In the center of London, the subway stations are very densely packed.
00:05:42.445 --> 00:05:46.355
So that area is expanded out: it consumes more of the map real estate.
00:05:46.355 --> 00:05:50.869
As you get out toward the suburbs, the stations are fewer and further between.
00:05:50.869 --> 00:05:55.355
As opposed to that taking up 90% of the map because it’s 90% of the space,
00:05:55.355 --> 00:05:57.991
those stations are actually scrunched,
00:05:57.991 --> 00:06:00.829
because if you know you need to go northeast to a particular station,
00:06:00.829 --> 00:06:06.504
then the exact distances involved are most of the time less relevant.
00:06:06.504 --> 00:06:11.281
Now, of course, what constitutes a good representation is task-specific.
00:06:11.281 --> 00:06:15.811
And so what you can see is that by making some tasks easier —
00:06:15.811 --> 00:06:19.433
like getting from A to B when you know A and B,
00:06:19.433 --> 00:06:23.650
or being able to navigate the center of London more effectively —
00:06:23.650 --> 00:06:26.285
you’ve compromised on other tasks.
00:06:26.285 --> 00:06:28.871
That’s harder for somebody who may need to make decisions
00:06:28.871 --> 00:06:32.748
about what stop to get off at based on some underlying topography.
00:06:32.748 --> 00:06:39.739
Another task that’s compromised: because the distances between the stations on the map
00:06:39.739 --> 00:06:43.998
don’t line up with the distances between stations in the world,
00:06:43.998 --> 00:06:48.219
you can make poor judgments about what’s close and what’s far:
00:06:48.219 --> 00:06:53.177
In the center you may believe things are far apart when they are really close,
00:06:53.177 --> 00:06:58.197
and out in the suburbs, you may believe from the map that things are closer than they actually are.
00:06:58.197 --> 00:07:03.806
And so nearly all representation design is about fitness to task.
00:07:03.806 --> 00:07:06.878
Here’s a temperature map from the Weather Underground.
00:07:06.878 --> 00:07:10.177
It shows the temperature at each location in the Bay Area,
00:07:10.177 --> 00:07:17.011
geo-referenced so that the temperature number is placed right on top of that physical location.
00:07:17.011 --> 00:07:21.302
What do you think are the benefits and drawbacks of this representation?
00:07:21.302 --> 00:07:26.039
What’s it good for and what’s it a problem for?
00:07:30.670 --> 00:07:35.438
If you know the physical coordinate that you’d like the temperature for,
00:07:35.438 --> 00:07:39.376
say for example, “What temperature is it along the coast?”
00:07:39.376 --> 00:07:45.707
and I don’t care or don’t know the exact name of the town, this works incredibly well.
00:07:45.707 --> 00:07:50.024
It’s also a reasonable interface, in some sense,
00:07:50.024 --> 00:07:54.481
for trying to get a good [inaudible] of what the temperatures are like in the area overall:
00:07:54.481 --> 00:08:00.425
I can see, for example, as I head inland the temperature tends to get warmer.
00:08:00.425 --> 00:08:03.958
There are a lot of ways in which we could make this better.
00:08:03.958 --> 00:08:10.147
So, right now, every temperature is shown identically no matter what temperature it is
00:08:10.147 --> 00:08:15.138
which means it’s hard to scan: I have to read every single temperature one by one.
00:08:15.138 --> 00:08:22.289
I could make this better if instead I had the temperature number somehow,
00:08:22.289 --> 00:08:29.088
through its color or size, correspond to the weather.
00:08:29.088 --> 00:08:36.533
But if we’re going to start mapping the variables of the map to something like color,
00:08:36.533 --> 00:08:39.448
we need to be careful to get it right.
00:08:39.448 --> 00:08:43.353
This is an example that comes from Edward Tufte.
00:08:43.353 --> 00:08:49.315
His books on visual design and the graphical representations of data are fantastic
00:08:49.315 --> 00:08:50.856
and I strongly encourage you to read them.
00:08:50.856 --> 00:08:58.596
In this map, we see how a computer scientist might make a mapping for Japan.
00:08:58.596 --> 00:09:04.235
This is showing the height above or below sea level as color,
00:09:04.235 --> 00:09:11.963
and what you can see is the depth below sea level is represented by the color spectrum, ROY G. BIV.
00:09:11.963 --> 00:09:21.007
Now, one of the challenges of hue is that it’s not an additive representation.
00:09:21.007 --> 00:09:26.901
So it’s not really a strong ordering that we give to the colors perceptually.
00:09:26.901 --> 00:09:32.961
It’s a substitutive representation: red and yellow are qualitatively different,
00:09:32.961 --> 00:09:37.113
but we don’t automatically have a more-than or less-than relationship between the two.
00:09:37.113 --> 00:09:39.095
Another problem with this representation
00:09:39.095 --> 00:09:45.093
is that it’s very difficult at a glance to tell what’s higher and what’s lower in the sea.
00:09:45.093 --> 00:09:55.938
Conversely, the individual isosurfaces — the individual chunks of a particular depth — really pop out.
00:09:55.938 --> 00:10:01.948
Like, to me, for example, the yellow depth pops out very strongly
00:10:01.948 --> 00:10:05.846
and that shape really comes to the attentional fore,
00:10:05.846 --> 00:10:10.942
which, if you are making a rock-and-roll poster for the Fillmore, would be awesome,
00:10:10.942 --> 00:10:15.266
but if you’re trying to get a sense of the contours of the sea,
00:10:15.266 --> 00:10:23.435
what becomes salient to you, the outline of what’s 400 meters below sea level, is probably just not that relevant.
00:10:23.435 --> 00:10:28.301
So how could we continue our theme of using color as a representational cue
00:10:28.301 --> 00:10:32.805
but have it be more meaningful than you might see in this case
00:10:32.805 --> 00:10:35.565
where we’re mapping it to the color spectrum?
00:10:35.565 --> 00:10:39.643
And here’s Edward Tufte’s redesign which I think is much better.
00:10:39.643 --> 00:10:43.701
There’s a couple of things that I really like about the representation here.
00:10:43.701 --> 00:10:51.250
The first one is that all of the things that are above sea level are brown — are kind of an earth tone.
00:10:51.250 --> 00:10:56.566
So, we’re leveraging our intuitions about the physical world and using that metaphorically for the map.
00:10:56.566 --> 00:11:00.874
So, the land-colored stuff is land.
00:11:00.874 --> 00:11:05.577
Similarly, all of the water is blue — the water-colored stuff is water.
00:11:05.577 --> 00:11:15.846
And furthermore, we can see how the intensity — or the luminance — of that color blue changes with depth.
00:11:15.846 --> 00:11:22.773
And the deeper blues are darker blue, which corresponds to our physical intuitions.
00:11:22.773 --> 00:11:29.651
And of course, water doesn’t really get that much darker at the kind of depth that we’re talking about here —
00:11:29.651 --> 00:11:34.762
our intuition about darker colors being deeper comes from much shallower depths.
00:11:34.762 --> 00:11:41.224
But the idea is that you can leverage this thing that we all know: water right by the shore is a paler color
00:11:41.224 --> 00:11:44.359
and as you get more of it, it gets darker.
00:11:44.359 --> 00:11:51.860
So this is a really wonderful way to see that these darker areas here are deeper than these shallow areas here.
00:11:51.860 --> 00:11:58.196
With the London Underground map, we saw how the representation of the map —
00:11:58.196 --> 00:12:03.869
what makes a good representation — was intrinsically tied up with the task that the user is doing.
00:12:03.869 --> 00:12:11.972
Similarly, what makes a good representation is also dependent on what a user’s expertise is.
00:12:11.972 --> 00:12:18.696
And a wonderful example of this comes from Herb Simon and Bill Chase in the early 1970s.
00:12:18.696 --> 00:12:24.965
They were looking at chess as an exemplar domain for trying to understand expertise.
00:12:24.965 --> 00:12:28.229
One of the things that they observed was that
00:12:28.229 --> 00:12:34.817
expert chess players were much better at being able to remember the configuration of a chess board.
00:12:34.817 --> 00:12:38.580
You can imagine a couple hypotheses for this.
00:12:38.580 --> 00:12:46.659
So, one of them would be “Experts were born with a better memory for that kind of thing”;
00:12:46.659 --> 00:12:51.643
Or, similarly, “Experts by virtue of their ten thousand hours of training
00:12:51.643 --> 00:12:56.656
have trained themselves up to build up that muscle and be very good at remembering that kind of thing.”
00:12:56.656 --> 00:13:00.265
Neither turns out to be the case.
00:13:00.265 --> 00:13:12.124
Experts are much better at remembering the configuration of a board, but only if it’s an actual game!
00:13:12.124 --> 00:13:18.978
So if the configuration of the chess board is the configuration of how a chess board could be,
00:13:18.978 --> 00:13:22.332
experts do a fantastic job of remembering it.
00:13:22.332 --> 00:13:28.536
But if you arrange the pieces on the board in a way that a chess game could never reach,
00:13:28.536 --> 00:13:32.433
the experts actually do no better than novices at all.
00:13:32.433 --> 00:13:41.286
And so what we’re seeing is that the ability of experts to chunk things and have higher memory capacity
00:13:41.286 --> 00:13:46.597
is because they are able to leverage their knowledge about the domain.
00:13:46.597 --> 00:13:51.547
Game design and user interface design are both concerned
00:13:51.547 --> 00:13:56.035
with how easy or hard it is for a user to accomplish a particular task.
00:13:56.035 --> 00:14:00.524
The difference is that often game designers want to make it hard, the right kind of hard,
00:14:00.524 --> 00:14:04.325
and interface designers want to make things easy.
00:14:04.325 --> 00:14:07.024
And so we can learn from this chess example
00:14:07.024 --> 00:14:10.047
and we can ask this question as interface designers:
00:14:10.047 --> 00:14:13.748
“Can we make interfaces more chunkable?”
00:14:13.748 --> 00:14:17.807
Can we make interactions that can be accomplished in one chunk
00:14:17.807 --> 00:14:23.665
and therefore place a lower load on our memory and make it easier for users to work with?
00:14:23.665 --> 00:14:26.455
A great example of this comes from Bill Buxton
00:14:26.455 --> 00:14:32.650
who looked at being able to move text between locations in a document.
00:14:32.650 --> 00:14:37.055
And in a common desktop user interface today,
00:14:37.055 --> 00:14:43.543
one common operation would be either a keyboard shortcut or a menu command to cut some text,
00:14:43.543 --> 00:14:47.588
and then you move the cursor to a new location, and you paste that text.
00:14:47.588 --> 00:14:54.479
That’s three different operations and, in between, if you got interrupted,
00:14:54.479 --> 00:14:59.351
you might forget what’s in the copy buffer — in fact, I’m sure that’s happened to all of us at some point.
00:14:59.351 --> 00:15:08.553
What Buxton realized is what if you could turn all of this, through a gestural interface, into one command?
00:15:08.553 --> 00:15:16.572
So, maybe I could grab the text that I want, drag it to a new location, and drop it right there.
00:15:16.572 --> 00:15:22.652
That’d be much better: There’s never a time where I could be interrupted and lose state
00:15:22.652 --> 00:15:28.092
because all of the state is maintained in this continuous gesture.
00:15:28.092 --> 00:15:31.709
We’ve all heard the saying that a picture is worth ten thousand words.
00:15:31.709 --> 00:15:39.801
As interface designers, we’re tasked with the project of representing the information to the user
00:15:39.801 --> 00:15:44.831
and one task that we commonly have to deal with is:
00:15:44.831 --> 00:15:50.518
Should we represent information visually, or should we represent information textually?
00:15:50.518 --> 00:15:54.025
The answer of course is it depends.
00:15:54.025 --> 00:15:59.909
But one time when representing information visually can be much more effective
00:15:59.909 --> 00:16:06.026
is when you can convert slow reasoning tasks into fast perception tasks
00:16:06.026 --> 00:16:09.950
by virtue of making them visually salient.
00:16:09.950 --> 00:16:15.396
We saw that with the map example: In that case, both the colorings of the map were visual,
00:16:15.396 --> 00:16:21.482
but in one, the good coloring, it was much easier to understand at a glance what’s going on.
00:16:21.482 --> 00:16:27.095
With the poor coloring, you have to reason about it much more slowly, and the wrong things keep popping out.
00:16:27.095 --> 00:16:31.365
If you think about a table of numbers, it can often be difficult to see trends,
00:16:31.365 --> 00:16:34.936
whereas if you represent that same information visually,
00:16:34.936 --> 00:16:40.482
the high points, the low points, trends, and outliers often pop out;
00:16:40.482 --> 99:59:59.999
all of that becomes salient and automatically visible to you.