In this video we are going to introduce a technique called Heuristic Evaluation.

As we talked about at the beginning of the course, there are lots of different ways to evaluate software. The one you might be most familiar with is empirical methods, where, at some level of formality, you have actual people trying out your software. It's also possible to use formal methods, where you build a model of how people behave in a particular situation, and that enables you to predict how different user interfaces will work. Or, if you can't build a closed-form formal model, you can try out your interface in simulation and run automated tests that can detect usability bugs and ineffective designs. This works especially well for low-level issues; it's harder to do for higher-level ones.
What we're going to talk about today is critique-based approaches, where people give you feedback directly, based on their expertise or on a set of heuristics.

As any of you who have ever taken an art or design class know, peer critique can be an incredibly effective form of feedback, and it can help you make your designs even better. You can get peer critique at almost any stage of your design process, but I'd like to highlight a few points where I think it can be particularly valuable. First, it's really valuable to get peer critique before user testing, because that helps you avoid wasting your users on problems that a critique would catch anyway; you want to focus the valuable resource of user testing on issues that other people wouldn't be able to pick up on. Second, the rich qualitative feedback that peer critique provides can be really valuable before redesigning your application, because it can show you which parts of your app you probably want to keep, and which parts are more problematic and deserve redesign. Third, sometimes you know there are problems, and you need data to convince other stakeholders to make the changes; peer critique, especially if it's structured, can be a great way to get the feedback you need to make the changes that you know need to happen. And lastly, this kind of structured peer critique can be really valuable before releasing software, because it helps you do a final sanding of the entire design and smooth out any rough edges.
As with most types of evaluation, it's usually helpful to begin with a clear goal, even if what you ultimately learn is completely unexpected. And so what we're going to talk about today is a particular technique called Heuristic Evaluation. Heuristic Evaluation was created by Jakob Nielsen and colleagues, about twenty years ago now, and its goal is to find usability problems in a design. I first learned about Heuristic Evaluation when I TA'd James Landay's Intro to HCI course, and I've been using it and teaching it ever since. It's a really valuable technique because it lets you get feedback quickly, and it's a high bang-for-the-buck strategy. The slides I have here are based on James's slides for this course, and the materials are all available on Jakob Nielsen's website.
The basic idea of heuristic evaluation is that you provide a set of people, often other stakeholders on the design team or outside design experts, with a set of heuristics or principles, and they use those to look for problems in your design. Each of them does this independently at first, walking through a variety of tasks with your design to look for these bugs, and you'll see that different evaluators find different problems. Only at the end do they get back together, communicate, and discuss what they found. This "independent first, gather afterwards" structure is how you get a "wisdom of crowds" benefit from having multiple evaluators.

One of the reasons we're talking about this early in the class is that it's a technique you can use either on a working user interface or on sketches of user interfaces. So heuristic evaluation works really well in conjunction with paper prototypes and other rapid, low-fidelity techniques that you may be using to get your design ideas out quickly.
Here are Nielsen's ten heuristics, and they're a pretty darn good set. That said, there's nothing magic about these heuristics. They do a pretty good job of covering many of the problems you'll see in many user interfaces, but you can add any that you want and get rid of any that aren't appropriate for your system.
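For reference, the titles of the ten, roughly as Nielsen publishes them, are:

1. Visibility of system status
2. Match between the system and the real world
3. User control and freedom
4. Consistency and standards
5. Error prevention
6. Recognition rather than recall
7. Flexibility and efficiency of use
8. Aesthetic and minimalist design
9. Help users recognize, diagnose, and recover from errors
10. Help and documentation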
We're going to go over the content of these ten heuristics in the next couple of lectures; in this lecture I'd like to introduce the process that you'll use with them.
So here's what you're going to have your evaluators do: give them a couple of tasks to perform with your design, and have them step through each task carefully, several times. While they're doing this, they'll keep the list of usability principles at hand as a reminder of things to pay attention to. Now, which principles should you use? I think Nielsen's ten heuristics are a fantastic start, and you can augment them with anything else that's relevant to your domain. If you have particular design goals that you would like your design to achieve, include those in the list. If you have goals that you've set from a competitive analysis of designs already out there, that's great too. And if there are things that you've seen your own or other designs excel at, those are important goals as well and can be included in your list of heuristics. Then, obviously, the important part is that you take what you learn from these evaluators and use those violations of the heuristics as a way of fixing problems and redesigning.
Let's talk a little more about why you might want multiple evaluators rather than just one. The graph on this slide is adapted from Jakob Nielsen's work on heuristic evaluation. Each black square is a problem that a particular evaluator found: an individual evaluator is a row of this matrix, there are about twenty evaluators in this set, and the columns are the problems. What you can see is that some problems were found by relatively few evaluators, while other things were found by almost everybody. We'll call the stuff on the right the easy problems and the stuff on the left the hard problems. In aggregate, what we can say is that no evaluator found every problem, and some evaluators found more than others, so there are better and worse people to do this.

So why not have lots of evaluators? Well, as you add more evaluators, they do find more problems, but the gains taper off over time; you lose the benefit eventually, and from a cost-benefit perspective it just stops making sense after a certain point. So where's the peak of this curve? It of course depends on the user interface you're working with, how much you're paying people, how much time is involved, all sorts of factors. Jakob Nielsen's rule of thumb for these kinds of user interfaces and heuristic evaluation is that three to five people tends to work pretty well, and that's been my experience too.
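If you want a feel for why the curve tapers, Nielsen and Landauer fit a simple diminishing-returns model to data like this. Here's a rough sketch in Python; the 0.31 discovery rate is the average they report across projects, so treat it as an assumption rather than something you should count on for your own interface:

```python
# Sketch of the Nielsen-Landauer diminishing-returns model.
# ASSUMPTION: lambda_ = 0.31 is their reported average rate at which a
# single evaluator finds problems; real projects vary widely.
def problems_found(i, n_total=100.0, lambda_=0.31):
    """Expected share of n_total problems found by i independent
    evaluators, each finding a fraction lambda_ of problems on average."""
    return n_total * (1 - (1 - lambda_) ** i)

for i in (1, 3, 5, 10, 15):
    print(f"{i:2d} evaluators -> ~{problems_found(i):.0f}% of problems")
```

With these numbers, one evaluator finds about a third of the problems, three to five evaluators find roughly two-thirds to 85 percent, and going from ten to fifteen barely moves the needle, which is exactly the shape of the curve on the slide.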
And I think one of the reasons people use heuristic evaluation is definitely that it can be an extremely cost-effective way of finding problems. In one study that Jakob Nielsen ran, he estimated the benefit of the problems found through heuristic evaluation at $500,000, while the cost of performing it was just over $10,000, giving an estimated 48-fold benefit-cost ratio for that particular user interface. Obviously, these numbers are back-of-the-envelope, and your mileage will vary.
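Just to make that arithmetic concrete, here's the back-of-the-envelope calculation; I'm assuming the commonly cited cost figure of about $10,500, so treat the exact numbers as illustrative:

```python
# Back-of-the-envelope benefit-cost check for Nielsen's study.
# ASSUMPTION: the $10,500 cost is the commonly cited figure ("just over
# $10,000" in the lecture); the benefit estimate is from the same study.
estimated_benefit = 500_000   # estimated value of the problems found
estimated_cost = 10_500       # estimated cost of running the evaluation
print(f"benefit-cost ratio: ~{estimated_benefit / estimated_cost:.0f}x")  # ~48x
```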
You can think about how to estimate the benefit you'd get from something like this. If you have an in-house software tool, you can use something like productivity increases: if you're making an expense-reporting system or another in-house system, making people's time more efficiently used is a big usability win. And if you've got software that you're selling on the open market, you can think about the benefit in terms of sales or other measures like that.
One thing we can take from that graph is that evaluators are more likely to find the severe problems, and that's good news: with a relatively small number of people, you're pretty likely to stumble across the most important stuff. However, as we saw, even the best evaluator in this particular case found only about a third of the problems in the system. And that's why ganging up a number of evaluators, say five, is going to get you most of the benefit that you're going to be able to achieve.
If we compare heuristic evaluation with user testing, one thing we see is that heuristic evaluation can often be a lot faster: it takes just an hour or two per evaluator, whereas the mechanics of getting a user test up and running can take longer, not even counting the fact that you may have to build the software first. Also, heuristic evaluation results come pre-interpreted, because your evaluators directly hand you problems and things to fix, which saves you the time of inferring from a usability test what the problem or solution might be. Conversely, experts walking through your system can generate false positives, problems that wouldn't actually come up in a real environment; this does indeed happen, and so user testing is, almost by definition, going to be more accurate.

At the end of the day, I think it's valuable to alternate methods. All of the different feedback techniques you'll learn in this class can be valuable, and by cycling through them you can often get the benefits of each: heuristic evaluation and user testing will surface different problems, and by running heuristic evaluation early in the design process, you'll avoid wasting the real users you may bring in later on.
So now that we've seen the benefits, what are the steps? The first thing to do is get all of your evaluators up to speed on the story behind your software, including any domain knowledge they might need, and tell them about the scenario you're going to have them step through. Then, obviously, you have the evaluation phase, where people work through the interface. Afterwards, each person assigns severity ratings, individually at first; then you aggregate those into group severity ratings and produce a combined report. And finally, once you have this aggregated report, you can share it with the design team, and the design team can discuss what to do with it.
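Here's a minimal sketch of what that aggregation step might look like; the problems and ratings below are made up for illustration, and averaging is just one reasonable way to combine the individual ratings:

```python
from statistics import mean

# Hypothetical individual severity ratings (0-4) from three evaluators;
# each evaluator rates every problem on the shared list after the walkthrough.
ratings = {
    "weight field cannot be edited": [3, 2, 3],
    "inconsistent button labels":    [2, 2, 1],
    "no undo after deleting entry":  [4, 3, 4],
}

# Aggregate into a group severity and sort the report, worst problems first.
report = sorted(ratings.items(), key=lambda kv: mean(kv[1]), reverse=True)
for problem, scores in report:
    print(f"severity {mean(scores):.1f}  {problem}  (individual: {scores})")
```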
Doing this kind of expert review can be really taxing, so for each of the scenarios that you lay out in your design, it can be valuable to have the evaluator go through the scenario twice: the first time, they just get a sense of it; the second time, they can focus on more specific elements. If you've got a walk-up-and-use system, like a ticket machine, you may want to give people no background information at all, because if your users are people just getting off the bus or the train and walking up to your machine without any prior information, that's the experience you want your evaluators to have. On the other hand, if you're building a genomics system or another expert user interface, you'll want to make sure that whatever training you would give to real users, you also give to your evaluators. In other words, whatever the background is, it should be realistic.
When your evaluators walk through your interface, it's important that they produce a list of very specific problems and explain each problem with regard to one of the design heuristics; you don't want people to just say, "I don't like it." And to make these results maximally useful to the design team, you'll want each problem listed separately so that it can be dealt with efficiently. Separate listings also help you avoid recording the same problem over and over again: if there's a problematic element repeated on every single screen, you don't want to list it at every single screen; you want to list it once so that it can be fixed once. These problems can be very detailed, like "the name of something is confusing," or they can have more to do with the flow of the user interface, or the architecture of the user experience, and not be tied to a specific interface element. Your evaluators may also find that something is missing that ought to be there, and this can sometimes be ambiguous with early prototypes, like paper prototypes. So you'll want to clarify ahead of time whether the user interface is something you believe to be complete, or whether there are elements intentionally missing. And of course, sometimes there are features that are obviously going to be there because they're implied by the user interface; relax about those.
After your evaluators have gone through the interface, they each independently assign a severity rating to every problem they've found. That enables you to allocate resources for fixing those problems; it can also give you feedback on how well you're doing on the usability of your system in general, and a kind of benchmark for your efforts in this vein. The severity measure your evaluators come up with combines several things: the frequency, the impact, and the pervasiveness of the problem they're seeing. Something that appears in only one place may be a smaller deal than something that shows up throughout the entire user interface. Similarly, some things, like misaligned text, may be inelegant but aren't deal-killers for your software. Here is the severity rating scale that Nielsen created; you can obviously use anything you want. It ranges from zero to four, where zero means that, at the end of the day, your evaluator decides it actually is not a usability problem, all the way up to four, something really catastrophic that has to get fixed right away.
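For reference, Nielsen's published descriptions of the five levels run roughly like this:

0: I don't agree that this is a usability problem at all.
1: A cosmetic problem only; it need not be fixed unless extra time is available.
2: A minor usability problem; fixing it should be given low priority.
3: A major usability problem; it's important to fix and should be given high priority.
4: A usability catastrophe; it's imperative to fix this before the product is released.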
And here is an example of a particular problem that our TA Robby found when he took CS147 as a student. He walked through somebody's mobile interface that had a weight-entry element, and he realized that once you entered your weight, there was no way to edit it after the fact. That's kind of clunky; you wish you could fix it, but it's maybe not a disaster. What you see here is that he's listed the issue, given it a severity rating, named the heuristic that it violates, and then described exactly what the problem is.
And finally, after all your evaluators have gone through the interface, listed their problems, and combined them in terms of severity and importance, you'll want to debrief with the design team. This is a nice chance to discuss general issues in the user interface and the qualitative feedback, and it gives you a chance to go through each item and suggest improvements for how you can address the problems. In this debrief session, it can be valuable for the development team to estimate the amount of effort it would take to fix each problem. So, for example, if something is a one on your severity scale and not too big a deal, maybe a wording issue that's dirt simple to fix, that tells you to go ahead and fix it. Conversely, you may have something that's a catastrophe and takes a lot more effort, but its importance will lead you to fix it anyway. And there will be other things where the importance, relative to the cost involved, just doesn't make it worth dealing with right now. This debrief session can also be a great way to brainstorm future design ideas, especially while you've got all the stakeholders in the room and the issues with the user interface are fresh in their minds.

In the next two videos, we'll go through Nielsen's ten heuristics and talk more about what they mean.