WEBVTT 00:00:01.426 --> 00:00:06.083 In many ways, the most creative, challenging, and under-appreciated aspect of interaction design 00:00:06.083 --> 00:00:08.464 is evaluating designs with people. 00:00:08.464 --> 00:00:11.566 The insights that you’ll get from testing designs with people 00:00:11.566 --> 00:00:16.017 can help you get new ideas, make changes, decide wisely, and fix bugs. 00:00:16.017 --> 00:00:20.811 One reason I think design is such an interesting field is its relationship to truth and objectivity. 00:00:20.811 --> 00:00:26.407 I find design so incredibly fascinating because we can say more in response to a question like: 00:00:26.407 --> 00:00:32.961 “How can we measure success?” than “It’s just personal preference” or “Whatever feels right.” 00:00:32.961 --> 00:00:37.474 At the same time, the answers are more complex and more open-ended, more subjective, 00:00:37.474 --> 00:00:41.545 and require more wisdom than just a number like 7 or 3. 00:00:41.545 --> 00:00:43.850 One of the things that we’re going to learn in this class 00:00:43.850 --> 00:00:48.372 is the different kinds of knowledge that you can get out of different kinds of methods. 00:00:48.372 --> 00:00:53.319 Why evaluate designs with people? Why learn about how people use interactive systems? 00:00:53.319 --> 00:00:58.444 I think one major reason for this is that it can be difficult to tell how good a user interface is 00:00:58.444 --> 00:01:03.974 until you’ve tried it out with actual users, and that’s because clients, designers, and developers 00:01:03.974 --> 00:01:07.107 may know too much about the domain and the user interface, 00:01:07.107 --> 00:01:11.376 or have acquired blinders through designing and building the user interface. 00:01:11.376 --> 00:01:15.415 At the same time they may not know enough about the user’s actual tasks. 00:01:15.415 --> 00:01:20.965 And while experience and theory can help, it can still be hard to predict what real users will actually do. 
00:01:21.888 --> 00:01:25.002 You might want to know, “Can people figure out how to use it?” 00:01:25.002 --> 00:01:28.859 or “Do they swear or giggle when using this interface?” 00:01:28.859 --> 00:01:31.224 “How does this design compare to that design?” 00:01:31.224 --> 00:01:35.337 and, “If we changed the interface, how does that change people’s behaviour?” 00:01:35.337 --> 00:01:39.499 “What new practices might emerge?” “How do things change over time?” 00:01:39.499 --> 00:01:44.714 These are all great questions to ask about an interface, and the answers to each will come from different methods. 00:01:44.714 --> 00:01:49.932 Having a broad toolbox of different methods can be especially valuable in emerging areas 00:01:49.932 --> 00:01:56.178 like mobile and social software, where people’s use practices can be particularly context-dependent 00:01:56.178 --> 00:02:00.681 and also evolve significantly over time in response to how other people use software, 00:02:00.681 --> 00:02:03.197 through network effects and things like that. 00:02:03.197 --> 00:02:08.534 To give you a flavour of this, I’d like to quickly run through some common types of empirical research in HCI. 00:02:08.534 --> 00:02:11.741 The examples I’ll show are mostly published work of one sort or another, 00:02:11.741 --> 00:02:14.024 because that’s the easiest stuff to share. 00:02:14.024 --> 00:02:18.654 If you have good examples from current systems out in the world, post them to the forum! 00:02:18.654 --> 00:02:21.130 I keep an archive of user interface examples, 00:02:21.130 --> 00:02:24.434 and I and the other students would love to see what you can come up with. 00:02:24.434 --> 00:02:27.176 One way to learn about the user experience of a design 00:02:27.176 --> 00:02:30.811 is to bring people into your lab or office and have them try it out. 00:02:30.811 --> 00:02:32.978 We often call these usability studies. 
00:02:32.978 --> 00:02:37.458 This “watch someone use my interface” approach is a common one in HCI. 00:02:37.458 --> 00:02:43.622 The basic strategy of traditional user-centred design is to iteratively bring people 00:02:43.622 --> 00:02:48.221 into your lab or office until you run out of time, and then release. 00:02:48.221 --> 00:02:52.312 And, if you had deep pockets, these rooms had a one-way mirror, 00:02:52.312 --> 00:02:54.684 and the development team was on the other side. 00:02:54.684 --> 00:02:59.245 In a leaner environment, this may just mean bringing people into your dorm room or office. 00:02:59.245 --> 00:03:01.672 You’ll learn a huge amount by doing this. 00:03:01.672 --> 00:03:04.702 Every single time that I or a student, friend, or colleague 00:03:04.702 --> 00:03:07.731 has watched somebody use a new interactive system, 00:03:07.731 --> 00:03:14.185 we’ve learned something, because as designers we get blinders to a system’s quirks, bugs, and false assumptions. 00:03:15.308 --> 00:03:19.562 However, there are some major shortcomings to this approach. 00:03:19.562 --> 00:03:24.122 In particular, the setting probably isn’t very ecologically valid. 00:03:24.122 --> 00:03:29.463 In the real world, people may have different tasks, goals, motivations, and physical settings 00:03:29.463 --> 00:03:32.288 than your office or lab. 00:03:32.288 --> 00:03:35.354 This can be especially true for user interfaces that you think people might use on the go, 00:03:35.354 --> 00:03:38.405 like at a bus stop or while waiting in line. 
00:03:38.405 --> 00:03:40.827 Second, there can be a “please me” experimental bias: 00:03:40.827 --> 00:03:44.122 when you bring somebody in to try out a user interface, 00:03:44.122 --> 00:03:47.339 they know that they’re trying out the technology that you developed, 00:03:47.339 --> 00:03:50.966 and so they may work harder or be nicer 00:03:50.966 --> 00:03:54.593 than they would if they were using it outside the constraints of a lab setup, 00:03:54.593 --> 00:03:58.497 without the person who developed it watching right over them. 00:03:58.497 --> 00:04:03.338 Third, in its most basic form, where you’re trying out just one user interface, there is no comparison point. 00:04:03.338 --> 00:04:09.177 So while you can track when people laugh, or swear, or smile with joy, 00:04:09.177 --> 00:04:12.456 you won’t know whether they would’ve laughed more, sworn less, or smiled more 00:04:12.456 --> 00:04:14.974 if you’d had a different user interface. 00:04:14.974 --> 00:04:18.176 And finally, it requires bringing people to your physical location. 00:04:18.176 --> 00:04:20.596 While this is often a whole lot easier than a lot of people think, 00:04:20.596 --> 00:04:23.845 it can be a psychological burden, if nothing else. 00:04:24.307 --> 00:04:28.172 A very different way of getting feedback from people is to use a survey. 00:04:28.172 --> 00:04:31.150 Here is an example of a survey that I recently got from San Francisco 00:04:31.150 --> 00:04:34.127 asking about different street light designs. 00:04:34.127 --> 00:04:38.151 Surveys are great because you can quickly get feedback from a large number of people. 00:04:38.151 --> 00:04:41.353 And it’s relatively easy to compare multiple alternatives. 00:04:41.353 --> 00:04:44.385 You can also automatically tally the results. 00:04:44.385 --> 00:04:48.390 You don’t even need to build anything; you can just show screen shots or mock-ups. 
00:04:48.390 --> 00:04:50.532 One of the things that I’ve learned the hard way, though, 00:04:50.532 --> 00:04:55.144 is the difference between what people say they’re going to do and what they actually do. 00:04:55.144 --> 00:04:59.026 Ask people how often they exercise and you’ll probably get a much more optimistic answer 00:04:59.026 --> 00:05:02.060 than how often they really do exercise. 00:05:02.060 --> 00:05:05.173 The same holds for the street light example here. 00:05:05.173 --> 00:05:08.999 Trying to imagine what a number of different street light designs might be like 00:05:08.999 --> 00:05:12.191 is really different from actually observing them on the street 00:05:12.191 --> 00:05:15.384 and having them become part of normal everyday life. 00:05:15.384 --> 00:05:18.085 Still, it can be valuable to get feedback. 00:05:18.085 --> 00:05:20.439 Another self-report strategy is the focus group. 00:05:20.439 --> 00:05:26.046 In a focus group, you’ll gather together a small group of people to discuss a design or idea. 00:05:26.046 --> 00:05:31.372 The fact that focus groups involve a group of people is a double-edged sword. 00:05:31.372 --> 00:05:37.541 On one hand, people can tease out of their colleagues things that they might not have thought 00:05:37.541 --> 00:05:44.579 to say on their own; on the other hand, for a variety of psychological reasons, people may be inclined 00:05:44.579 --> 00:05:48.774 to say polite things or generate answers completely on the spot 00:05:48.774 --> 00:05:53.785 that are totally uncorrelated with what they believe or what they would actually do. 00:05:54.662 --> 00:05:59.982 Focus groups can be a particularly problematic method when you are trying to gather data 00:05:59.982 --> 00:06:04.135 about taboo topics or about cultural biases. 
00:06:04.135 --> 00:06:06.723 With those caveats — right now we’re just making a laundry list — 00:06:06.723 --> 00:06:12.312 I think that focus groups, like almost any other method, can play an important role in your toolbelt. 00:06:13.420 --> 00:06:16.574 Our third category of techniques is to get feedback from experts. 00:06:16.574 --> 00:06:22.905 For example, in this class we’re going to do a bunch of peer critique for your weekly project assignments. 00:06:22.905 --> 00:06:25.370 In addition to having users try your interface, 00:06:25.370 --> 00:06:29.775 it can be important to eat your own dog food and use the tools that you built yourself. 00:06:29.775 --> 00:06:35.069 When you are getting feedback from experts, it can often be helpful to have some kind of structured format, 00:06:35.069 --> 00:06:38.558 much like the rubrics you’ll see in your project assignments. 00:06:38.558 --> 00:06:44.881 And, for getting feedback on user interfaces, one common approach to this structured feedback 00:06:44.881 --> 00:06:48.390 is called heuristic evaluation, and you’ll learn how to do that in this class; 00:06:48.390 --> 00:06:51.051 it was pioneered by Jakob Nielsen. 00:06:51.051 --> 00:06:53.496 Our next genre is comparative experiments: 00:06:53.496 --> 00:06:57.565 taking two or more distinct options and comparing their performance to each other. 00:06:57.565 --> 00:07:00.183 These comparisons can take place in lots of different ways: 00:07:00.183 --> 00:07:04.061 They can be in the lab; they can be in the field; they can be online. 00:07:04.061 --> 00:07:06.543 These experiments can be more-or-less controlled, 00:07:06.543 --> 00:07:10.125 and they can take place over shorter or longer durations. 
00:07:10.125 --> 00:07:14.235 What you’re trying to learn here is which option is more effective, 00:07:14.235 --> 00:07:16.998 and, more often, what are the active ingredients — 00:07:16.998 --> 00:07:21.422 what are the variables that matter in creating the user experience that you seek. 00:07:22.006 --> 00:07:26.714 Here’s an example: My former PhD student Joel Brandt and his colleague at Adobe 00:07:26.714 --> 00:07:30.847 ran a number of studies comparing help interfaces for programmers. 00:07:32.139 --> 00:07:38.319 In particular, they compared a more traditional search-style user interface for finding programming help 00:07:38.319 --> 00:07:43.443 with a search interface that integrated programming help directly into your environment. 00:07:43.443 --> 00:07:46.979 By running these comparisons they were able to see how programmers’ behaviour differed 00:07:46.979 --> 00:07:50.588 as the help user interface changed. 00:07:50.588 --> 00:07:53.698 Comparative experiments have an advantage over surveys 00:07:53.698 --> 00:07:57.230 in that you get to see actual behaviour as opposed to self-report, 00:07:57.230 --> 00:08:02.329 and they can be better than usability studies because you’re comparing multiple alternatives. 00:08:02.329 --> 00:08:06.780 This enables you to see what works better or worse, or at least what works differently. 00:08:06.780 --> 00:08:10.366 I find that comparative feedback is also often much more actionable. 00:08:11.166 --> 00:08:13.938 However, if you are running controlled experiments online, 00:08:13.938 --> 00:08:18.079 you don’t get to see much about the person on the other side of the screen. 00:08:18.079 --> 00:08:20.774 And if you are inviting people into your office or lab, 00:08:20.774 --> 00:08:24.111 the behaviour you’re measuring might not be very realistic. 00:08:24.111 --> 00:08:30.283 If realistic longitudinal behaviour is what you’re after, participant observation may be the approach for you. 
00:08:30.283 --> 00:08:36.419 This approach is just what it sounds like: observing what people actually do in their actual work environment. 00:08:36.419 --> 00:08:40.226 And this more long-term evaluation can be important for uncovering things 00:08:40.226 --> 00:08:44.131 that you might not see in shorter-term, more controlled scenarios. 00:08:44.131 --> 00:08:48.015 For example, my colleagues Bob Sutton and Andrew Hargadon studied brainstorming. 00:08:48.015 --> 00:08:51.655 The prior literature on brainstorming had focused mostly on questions like 00:08:51.655 --> 00:08:54.402 “Do people come up with more ideas?” 00:08:54.402 --> 00:08:56.829 What Bob and Andrew realized by going into the field 00:08:56.829 --> 00:09:00.517 was that brainstorming serves a number of other functions as well. 00:09:00.517 --> 00:09:05.365 For example, brainstorming provides a way for members of the design team 00:09:05.365 --> 00:09:08.081 to demonstrate their creativity to their peers; 00:09:08.081 --> 00:09:13.210 it allows them to pass along knowledge that can then be reused in other projects; 00:09:13.210 --> 00:09:19.057 and it creates a fun, exciting environment that people like to work in and that clients like to participate in. 00:09:19.057 --> 00:09:22.206 In a real ecosystem, all of these things are important, 00:09:22.206 --> 00:09:25.514 in addition to the ideas that people come up with. 00:09:26.191 --> 00:09:32.908 Nearly all experiments seek to build a theory on some level — I don’t mean anything fancy by this, 00:09:32.908 --> 00:09:37.309 just that we take some things to be more relevant, and other things less relevant. 
00:09:37.309 --> 00:09:39.250 We might, for example, assume 00:09:39.250 --> 00:09:43.068 that the ordering of search results may play an important role in what people click on, 00:09:43.068 --> 00:09:46.415 but that the batting average of the Detroit Tigers doesn’t, 00:09:46.415 --> 00:09:49.763 unless, of course, somebody’s searching for baseball. 00:09:49.763 --> 00:09:55.093 If you have a theory that is sufficiently formal, mathematically, that you can make predictions, 00:09:55.093 --> 00:10:00.037 then you can compare alternative interfaces using that model, without having to bring people in. 00:10:00.037 --> 00:10:05.576 And we’ll go over that in this class a little bit, with respect to input models. 00:10:05.576 --> 00:10:10.072 This makes it possible to try out a number of alternatives really fast. 00:10:10.072 --> 00:10:12.286 Consequently, when people use simulations, 00:10:12.286 --> 00:10:16.378 it’s often in conjunction with something like Monte Carlo optimization. 00:10:16.378 --> 00:10:19.934 One example of this can be found in the ShapeWriter system, 00:10:19.934 --> 00:10:22.735 where Shumin Zhai and colleagues figured out how to build a keyboard 00:10:22.735 --> 00:10:26.122 where people could enter an entire word in a single stroke. 00:10:26.122 --> 00:10:31.247 They were able to do this with the benefit of formal models and optimization-based approaches. 00:10:31.247 --> 00:10:34.402 Simulation has mostly been used for input techniques 00:10:34.402 --> 00:10:39.795 because people’s motor performance is probably the most well-quantified area of HCI. 00:10:39.795 --> 00:10:42.701 And, while we won’t get to it much in this intro course, 00:10:42.701 --> 00:10:46.266 simulation can also be used for higher-level cognitive tasks; 00:10:46.266 --> 00:10:48.497 for example, Pete Pirolli and colleagues at PARC 00:10:48.497 --> 00:10:51.528 have built impressive models of people’s web-searching behaviour. 
00:10:52.467 --> 00:10:57.253 These models enable them to estimate, for example, which links somebody is most likely to click on 00:10:57.253 --> 00:11:00.238 by looking at the relevant link text. 00:11:00.238 --> 00:11:05.072 That’s our whirlwind tour of a number of empirical methods that this class will introduce. 00:11:05.072 --> 00:11:09.481 You’ll want to pick the right method for the right task, and here are some issues to consider: 00:11:09.481 --> 00:11:13.187 One is reliability: if you did it again, would you get the same thing? 00:11:13.187 --> 00:11:18.544 Another is generalizability and realism — does this hold for people other than 18-year-old 00:11:18.544 --> 00:11:23.135 upper-middle-class students who are doing this for course credit or a gift certificate? 00:11:23.135 --> 00:11:28.546 Is this behaviour also what you’d see in the real world, or only in a more stilted lab environment? 00:11:28.546 --> 00:11:30.864 Comparisons are important, because they can tell you 00:11:30.879 --> 00:11:34.351 how the user experience would change with different interface choices, 00:11:34.351 --> 00:11:38.553 as opposed to just a “people liked it” study. 00:11:38.553 --> 00:11:42.784 It’s also important to think about how to achieve these insights efficiently, 00:11:42.784 --> 00:11:48.747 and not chew up a lot of resources, especially when your goal is practical. 00:11:48.747 --> 00:11:54.252 My experience as a designer, researcher, teacher, consultant, advisor and mentor has taught me 00:11:54.252 --> 00:12:01.340 that evaluating designs with people is both easier and more valuable than many people expect, 00:12:01.340 --> 00:12:04.704 and there’s an incredible lightbulb moment that happens 00:12:04.704 --> 00:12:08.831 when you actually get designs in front of people and see how they use them. 00:12:08.831 --> 00:12:12.945 So, to sum up this video, I’d like to ask what could be the most important question: 00:12:12.945 --> 99:59:59.999 “What do you want to learn?”
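[Editor's note: the lecture above describes comparing interface alternatives with a formal input model plus Monte Carlo sampling, but names no specific model. The sketch below uses Fitts' law, the standard quantified model of pointing; the coefficients, canvas size, and the two button layouts are all made-up illustrations, not material from the lecture.]

```python
import math
import random

# Fitts' law (Shannon formulation): predicted time to acquire a target of
# width W at distance D is MT = a + b * log2(D/W + 1).
# These coefficients are illustrative placeholders, not measured values.
A, B = 0.1, 0.15  # intercept (seconds), slope (seconds per bit)

def movement_time(distance, width):
    return A + B * math.log2(distance / width + 1)

def mean_acquisition_time(targets, trials=10_000, rng=None):
    """Monte Carlo estimate: average predicted time to reach a randomly
    chosen target from a random start point on a 100x100 canvas."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    total = 0.0
    for _ in range(trials):
        sx, sy = rng.uniform(0, 100), rng.uniform(0, 100)
        tx, ty, width = rng.choice(targets)
        dist = max(math.hypot(tx - sx, ty - sy), 1e-9)  # avoid D == 0
        total += movement_time(dist, width)
    return total / trials

# Two hypothetical layouts, each a list of (x, y, width) buttons:
# small targets in the corners vs. larger targets near the centre.
layout_small_corners = [(5, 5, 4), (95, 5, 4), (5, 95, 4), (95, 95, 4)]
layout_large_center = [(40, 50, 12), (60, 50, 12)]

print("corners:", mean_acquisition_time(layout_small_corners))
print("centre: ", mean_acquisition_time(layout_large_center))
```

Because the model is purely predictive, plugging in coefficients measured for a real device would let you rank candidate layouts before recruiting anyone — the "compare alternative interfaces using that model, without having to bring people in" idea from the lecture.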