WEBVTT 00:00:01.426 --> 00:00:06.083 In many ways, the most creative, challenging, and under-appreciated aspect of interaction design 00:00:06.083 --> 00:00:08.464 is evaluating designs with people. 00:00:08.464 --> 00:00:11.566 The insights that you’ll get from testing designs with people 00:00:11.566 --> 00:00:16.017 can help you get new ideas, make changes, decide wisely, and fix bugs. 00:00:16.017 --> 00:00:20.811 One reason I think design is such an interesting field is its relationship to truth and objectivity. 00:00:20.811 --> 00:00:26.407 I find design so incredibly fascinating because we can say more in response to a question like: 00:00:26.407 --> 00:00:32.961 “How can we measure success?” than “It’s just personal preference” or “Whatever feels right.” 00:00:32.961 --> 00:00:37.474 At the same time, the answers are more complex and more open-ended, more subjective, 00:00:37.474 --> 00:00:41.545 and require more wisdom than just a number like 7 or 3. 00:00:41.545 --> 00:00:43.850 One of the things that we’re going to learn in this class 00:00:43.850 --> 00:00:48.372 is the different kinds of knowledge that you can get out of different kinds of methods. 00:00:48.372 --> 00:00:53.319 Why evaluate designs with people? Why learn about how people use interactive systems? 00:00:53.319 --> 00:00:58.444 I think one major reason for this is that it can be difficult to tell how good a user interface is 00:00:58.444 --> 00:01:03.974 until you’ve tried it out with actual users, and that’s because clients, designers, and developers 00:01:03.974 --> 00:01:07.107 may know too much about the domain and the user interface, 00:01:07.107 --> 00:01:11.376 or have acquired blinders through designing and building the user interface. 00:01:11.376 --> 00:01:15.415 At the same time they may not know enough about the user’s actual tasks. 00:01:15.415 --> 00:01:20.965 And while experience and theory can help, it can still be hard to predict what real users will actually do. 
00:01:21.888 --> 00:01:25.002 You might want to know, “Can people figure out how to use it?” 00:01:25.002 --> 00:01:28.859 or “Do they swear or giggle when using this interface?” 00:01:28.859 --> 00:01:31.224 “How does this design compare to that design?” 00:01:31.224 --> 00:01:35.337 and, “If we changed the interface, how does that change people’s behaviour?” 00:01:35.337 --> 00:01:39.499 “What new practices might emerge?” “How do things change over time?” 00:01:39.499 --> 00:01:44.714 These are all great questions to ask about an interface, and the answers to each will come from different methods. 00:01:44.714 --> 00:01:49.932 Having a broad toolbox of different methods can be especially valuable in emerging areas 00:01:49.932 --> 00:01:56.178 like mobile and social software, where people’s use practices can be particularly context-dependent 00:01:56.178 --> 00:02:00.681 and also evolve significantly over time in response to how other people use software, 00:02:00.681 --> 00:02:03.197 through network effects and things like that. 00:02:03.197 --> 00:02:08.534 To give you a flavour of this, I’d like to quickly run through some common types of empirical research in HCI. 00:02:08.534 --> 00:02:11.741 The examples I’ll show are mostly published work of one sort or another, 00:02:11.741 --> 00:02:14.024 because that’s the easiest stuff to share. 00:02:14.024 --> 00:02:18.654 If you have good examples from current systems out in the world, post them to the forum! 00:02:18.654 --> 00:02:21.130 I keep an archive of user interface examples, 00:02:21.130 --> 00:02:24.434 and I and the other students would love to see what you can come up with. 00:02:24.434 --> 00:02:27.176 One way to learn about the user experience of a design 00:02:27.176 --> 00:02:30.811 is to bring people into your lab or office and have them try it out. 00:02:30.811 --> 00:02:32.978 We often call these usability studies. 
00:02:32.978 --> 00:02:37.458 This “watch someone use my interface” approach is a common one in HCI. 00:02:37.458 --> 00:02:43.622 The basic strategy of traditional user-centred design is to iteratively bring people 00:02:43.622 --> 00:02:48.221 into your lab or office until you run out of time, and then release. 00:02:48.221 --> 00:02:52.312 And, if you had deep pockets, these rooms had a one-way mirror, 00:02:52.312 --> 00:02:54.684 and the development team was on the other side. 00:02:54.684 --> 00:02:59.245 In a leaner environment, this may just mean bringing people into your dorm room or office. 00:02:59.245 --> 00:03:01.672 You’ll learn a huge amount by doing this. 00:03:01.672 --> 00:03:04.702 Every single time that I or a student, friend, or colleague 00:03:04.702 --> 00:03:07.731 has watched somebody use a new interactive system, 00:03:07.731 --> 00:03:14.185 we’ve learned something, because as designers we get blinders to a system’s quirks, bugs, and false assumptions. 00:03:15.308 --> 00:03:19.562 However, there are some major shortcomings to this approach. 00:03:19.562 --> 00:03:24.122 In particular, the setting probably isn’t very ecologically valid. 00:03:24.122 --> 00:03:29.463 In the real world, people may have different tasks, goals, motivations, and physical settings 00:03:29.463 --> 00:03:32.288 than your office or lab. 00:03:32.288 --> 00:03:35.354 This can be especially true for user interfaces that you think people might use on the go, 00:03:35.354 --> 00:03:38.405 like at a bus stop or while waiting in line. 
00:03:38.405 --> 00:03:40.827 Second, there can be a “please me” experimental bias: 00:03:40.827 --> 00:03:44.122 when you bring somebody in to try out a user interface, 00:03:44.122 --> 00:03:47.339 they know that they’re trying out the technology that you developed, 00:03:47.339 --> 00:03:50.966 and so they may work harder or be nicer 00:03:50.966 --> 00:03:54.593 than they would if they were using it outside the constraints of a lab setup, 00:03:54.593 --> 00:03:58.497 without the person who developed it watching right over them. 00:03:58.497 --> 00:04:03.338 Third, in its most basic form, where you’re trying out just one user interface, there is no comparison point. 00:04:03.338 --> 00:04:09.177 So while you can track when people laugh, or swear, or smile with joy, 00:04:09.177 --> 00:04:12.456 you won’t know whether they would’ve laughed more, sworn less, or smiled more 00:04:12.456 --> 00:04:14.974 if you’d had a different user interface. 00:04:14.974 --> 00:04:18.176 And finally, it requires bringing people to your physical location. 00:04:18.176 --> 00:04:20.596 While this is often a whole lot easier than a lot of people think, 00:04:20.596 --> 00:04:23.845 it can be a psychological burden, if nothing else. 00:04:24.307 --> 00:04:28.172 A very different way of getting feedback from people is to use a survey. 00:04:28.172 --> 00:04:31.150 Here is an example of a survey that I recently got from San Francisco 00:04:31.150 --> 00:04:34.127 asking about different street light designs. 00:04:34.127 --> 00:04:38.151 Surveys are great because you can quickly get feedback from a large number of people. 00:04:38.151 --> 00:04:41.353 And it’s relatively easy to compare multiple alternatives. 00:04:41.353 --> 00:04:44.385 You can also automatically tally the results. 00:04:44.385 --> 00:04:48.390 You don’t even need to build anything; you can just show screen shots or mock-ups. 
00:04:48.390 --> 00:04:50.532 One of the things that I’ve learned the hard way, though, 00:04:50.532 --> 00:04:55.144 is the difference between what people say they’re going to do and what they actually do. 00:04:55.144 --> 00:04:59.026 Ask people how often they exercise and you’ll probably get a much more optimistic answer 00:04:59.026 --> 00:05:02.060 than how often they really do exercise. 00:05:02.060 --> 00:05:05.173 The same holds for the street light example here. 00:05:05.173 --> 00:05:08.999 Trying to imagine what a number of different street light designs might be like 00:05:08.999 --> 00:05:12.191 is really different from actually observing them on the street 00:05:12.191 --> 00:05:15.384 and having them become part of normal everyday life. 00:05:15.384 --> 00:05:18.085 Still, it can be valuable to get feedback. 00:05:18.085 --> 00:05:20.439 Another self-report strategy is the focus group. 00:05:20.439 --> 00:05:26.046 In a focus group, you’ll gather together a small group of people to discuss a design or idea. 00:05:26.046 --> 00:05:31.372 The fact that focus groups involve a group of people is a double-edged sword. 00:05:31.372 --> 00:05:37.541 On one hand, people can tease out of their colleagues things that they might not have thought 00:05:37.541 --> 00:05:44.579 to say on their own; on the other hand, for a variety of psychological reasons, people may be inclined 00:05:44.579 --> 00:05:48.774 to say polite things or generate answers completely on the spot 00:05:48.774 --> 00:05:53.785 that are totally uncorrelated with what they believe or what they would actually do. 00:05:54.662 --> 00:05:59.982 Focus groups can be a particularly problematic method when you are trying to gather data 00:05:59.982 --> 00:06:04.135 about taboo topics or about cultural biases. 
00:06:04.135 --> 00:06:06.723 With those caveats — right now we’re just making a laundry list — 00:06:06.723 --> 00:06:12.312 I think that focus groups, like almost any other method, can play an important role in your toolbelt. 00:06:13.420 --> 00:06:16.574 Our third category of techniques is to get feedback from experts. 00:06:16.574 --> 00:06:22.905 For example, in this class we’re going to do a bunch of peer critique for your weekly project assignments. 00:06:22.905 --> 00:06:25.370 In addition to having users try your interface, 00:06:25.370 --> 00:06:29.775 it can be important to eat your own dog food and use the tools that you built yourself. 00:06:29.775 --> 00:06:35.069 When you are getting feedback from experts, it can often be helpful to have some kind of structured format, 00:06:35.069 --> 00:06:38.558 much like the rubrics you’ll see in your project assignments. 00:06:38.558 --> 00:06:44.881 And, for getting feedback on user interfaces, one common approach to this structured feedback 00:06:44.881 --> 00:06:48.390 is called heuristic evaluation, and you’ll learn how to do that in this class; 00:06:48.390 --> 00:06:51.051 it was pioneered by Jakob Nielsen. 00:06:51.051 --> 00:06:53.496 Our next genre is comparative experiments: 00:06:53.496 --> 00:06:57.565 taking two or more distinct options and comparing their performance to each other. 00:06:57.565 --> 00:07:00.183 These comparisons can take place in lots of different ways: 00:07:00.183 --> 00:07:04.061 They can be in the lab; they can be in the field; they can be online. 00:07:04.061 --> 00:07:06.543 These experiments can be more-or-less controlled, 00:07:06.543 --> 00:07:10.125 and they can take place over shorter or longer durations. 
00:07:10.125 --> 00:07:14.235 What you’re trying to learn here is which option is more effective, 00:07:14.235 --> 00:07:16.998 and, more often, what are the active ingredients — 00:07:16.998 --> 00:07:21.422 what are the variables that matter in creating the user experience that you seek. 00:07:22.006 --> 00:07:26.714 Here’s an example: My former PhD student Joel Brandt and his colleague at Adobe 00:07:26.714 --> 00:07:30.847 ran a number of studies comparing help interfaces for programmers. 00:07:32.139 --> 00:07:38.319 In particular, they compared a more traditional search-style user interface for finding programming help 00:07:38.319 --> 00:07:43.443 with a search interface that integrated programming help directly into your environment. 00:07:43.443 --> 00:07:46.979 By running these comparisons they were able to see how programmers’ behaviour differed 00:07:46.979 --> 00:07:50.588 as the help user interface changed. 00:07:50.588 --> 00:07:53.698 Comparative experiments have an advantage over surveys 00:07:53.698 --> 00:07:57.230 in that you get to see actual behaviour as opposed to self-report, 00:07:57.230 --> 00:08:02.329 and they can be better than usability studies because you’re comparing multiple alternatives. 00:08:02.329 --> 00:08:06.780 This enables you to see what works better or worse, or at least what works differently. 00:08:06.780 --> 00:08:10.366 I find that comparative feedback is also often much more actionable. 00:08:11.166 --> 00:08:13.938 However, if you are running controlled experiments online, 00:08:13.938 --> 00:08:18.079 you don’t get to see much about the person on the other side of the screen. 00:08:18.079 --> 00:08:20.774 And if you are inviting people into your office or lab, 00:08:20.774 --> 00:08:24.111 the behaviour you’re measuring might not be very realistic. 00:08:24.111 --> 00:08:30.283 If realistic longitudinal behaviour is what you’re after, participant observation may be the approach for you. 
00:08:30.283 --> 00:08:36.419 This approach is just what it sounds like: observing what people actually do in their actual work environment. 00:08:36.419 --> 00:08:40.226 And this more long-term evaluation can be important for uncovering things 00:08:40.226 --> 00:08:44.131 that you might not see in shorter-term, more controlled scenarios. 00:08:44.131 --> 00:08:48.015 For example, my colleagues Bob Sutton and Andrew Hargadon studied brainstorming. 00:08:48.015 --> 00:08:51.655 The prior literature on brainstorming had focused mostly on questions like 00:08:51.655 --> 00:08:54.402 “Do people come up with more ideas?” 00:08:54.402 --> 00:08:56.829 What Bob and Andrew realized by going into the field 00:08:56.829 --> 00:09:00.517 was that brainstorming serves a number of other functions as well. 00:09:00.517 --> 00:09:05.365 For example, brainstorming provides a way for members of the design team 00:09:05.365 --> 00:09:08.081 to demonstrate their creativity to their peers; 00:09:08.081 --> 00:09:13.210 it allows them to pass along knowledge that can then be reused in other projects; 00:09:13.210 --> 00:09:19.057 and it creates a fun, exciting environment that people like to work in and that clients like to participate in. 00:09:19.057 --> 00:09:22.206 In a real ecosystem, all of these things are important, 00:09:22.206 --> 00:09:25.514 in addition to the ideas that people come up with. 00:09:26.191 --> 00:09:32.908 Nearly all experiments seek to build a theory on some level — I don’t mean anything fancy by this, 00:09:32.908 --> 00:09:37.309 just that we take some things to be more relevant, and other things less relevant. 
00:09:37.309 --> 00:09:39.250 We might, for example, assume 00:09:39.250 --> 00:09:43.068 that the ordering of search results may play an important role in what people click on, 00:09:43.068 --> 00:09:46.415 but that the batting average of the Detroit Tigers doesn’t, 00:09:46.415 --> 00:09:49.763 unless, of course, somebody’s searching for baseball. 00:09:49.763 --> 00:09:55.093 If you have a theory that is sufficiently formal, mathematically, that you can make predictions, 00:09:55.093 --> 00:10:00.037 then you can compare alternative interfaces using that model, without having to bring people in. 00:10:00.037 --> 00:10:05.576 And we’ll go over that in this class a little bit, with respect to input models. 00:10:05.576 --> 00:10:10.072 This makes it possible to try out a number of alternatives really fast. 00:10:10.072 --> 00:10:12.286 Consequently, when people use simulations, 00:10:12.286 --> 00:10:16.378 it’s often in conjunction with something like Monte Carlo optimization. 00:10:16.378 --> 00:10:19.934 One example of this can be found in the ShapeWriter system, 00:10:19.934 --> 00:10:22.735 where Shumin Zhai and colleagues figured out how to build a keyboard 00:10:22.735 --> 00:10:26.122 where people could enter an entire word in a single stroke. 00:10:26.122 --> 00:10:31.247 They were able to do this with the benefit of formal models and optimization-based approaches. 00:10:31.247 --> 00:10:34.402 Simulation has mostly been used for input techniques 00:10:34.402 --> 00:10:39.795 because people’s motor performance is probably the most well-quantified area of HCI. 00:10:39.795 --> 00:10:42.701 And, while we won’t get to it much in this intro course, 00:10:42.701 --> 00:10:46.266 simulation can also be used for higher-level cognitive tasks; 00:10:46.266 --> 00:10:48.497 for example, Pete Pirolli and colleagues at PARC 00:10:48.497 --> 00:10:51.528 have built impressive models of people’s web-searching behaviour. 
00:10:52.467 --> 00:10:57.253 These models enable them to estimate, for example, which links somebody is most likely to click on 00:10:57.253 --> 00:11:00.238 by looking at the relevant link text. 00:11:00.238 --> 00:11:05.072 That’s our whirlwind tour of a number of empirical methods that this class will introduce. 00:11:05.072 --> 00:11:09.481 You’ll want to pick the right method for the right task, and here are some issues to consider: 00:11:09.481 --> 00:11:13.187 One is reliability: if you did it again, would you get the same thing? 00:11:13.187 --> 00:11:18.544 Another is generalizability and realism — does this hold for people other than 18-year-old 00:11:18.544 --> 00:11:23.135 upper-middle-class students who are doing this for course credit or a gift certificate? 00:11:23.135 --> 00:11:28.546 Is this behaviour also what you’d see in the real world, or only in a more stilted lab environment? 00:11:28.546 --> 00:11:30.864 Comparisons are important, because they can tell you 00:11:30.879 --> 00:11:34.351 how the user experience would change with different interface choices, 00:11:34.351 --> 00:11:38.553 as opposed to just a “people liked it” study. 00:11:38.553 --> 00:11:42.784 It’s also important to think about how to achieve these insights efficiently, 00:11:42.784 --> 00:11:48.747 and not chew up a lot of resources, especially when your goal is practical. 00:11:48.747 --> 00:11:54.252 My experience as a designer, researcher, teacher, consultant, advisor and mentor has taught me 00:11:54.252 --> 00:12:01.340 that evaluating designs with people is both easier and more valuable than many people expect, 00:12:01.340 --> 00:12:04.704 and there’s an incredible lightbulb moment that happens 00:12:04.704 --> 00:12:08.831 when you actually get designs in front of people and see how they use them. 00:12:08.831 --> 00:12:12.945 So, to sum up this video, I’d like to ask what could be the most important question: 00:12:12.945 --> 99:59:59.999 “What do you want to learn?”
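[Editor's note: the lecture above describes comparing interface alternatives with a formal input model plus Monte Carlo sampling, but names no specific model. The sketch below uses Fitts' law, the standard quantified model of pointing; the coefficients, canvas size, and the two button layouts are all made-up illustrations, not material from the lecture.]

```python
import math
import random

# Fitts' law (Shannon formulation): predicted time to acquire a target of
# width W at distance D is MT = a + b * log2(D/W + 1).
# These coefficients are illustrative placeholders, not measured values.
A, B = 0.1, 0.15  # intercept (seconds), slope (seconds per bit)

def movement_time(distance, width):
    return A + B * math.log2(distance / width + 1)

def mean_acquisition_time(targets, trials=10_000, rng=None):
    """Monte Carlo estimate: average predicted time to reach a randomly
    chosen target from a random start point on a 100x100 canvas."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    total = 0.0
    for _ in range(trials):
        sx, sy = rng.uniform(0, 100), rng.uniform(0, 100)
        tx, ty, width = rng.choice(targets)
        dist = max(math.hypot(tx - sx, ty - sy), 1e-9)  # avoid D == 0
        total += movement_time(dist, width)
    return total / trials

# Two hypothetical layouts, each a list of (x, y, width) buttons:
# small targets in the corners vs. larger targets near the centre.
layout_small_corners = [(5, 5, 4), (95, 5, 4), (5, 95, 4), (95, 95, 4)]
layout_large_center = [(40, 50, 12), (60, 50, 12)]

print("corners:", mean_acquisition_time(layout_small_corners))
print("centre: ", mean_acquisition_time(layout_large_center))
```

Because the model is purely predictive, plugging in coefficients measured for a real device would let you rank candidate layouts before recruiting anyone — the "compare alternative interfaces using that model, without having to bring people in" idea from the lecture.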