WEBVTT

00:00:00.057 --> 00:00:01.092
So we spent a bunch of time

00:00:01.092 --> 00:00:03.032
in the last couple of lectures

00:00:03.032 --> 00:00:05.082
talking about different kinds of testing

00:00:05.082 --> 00:00:08.021
about unit testing versus integration testing

00:00:08.021 --> 00:00:10.010
We talked about how you use RSpec

00:00:10.010 --> 00:00:12.049
to really isolate the parts of your code you want to test

00:00:12.049 --> 00:00:14.090
you’ve also, you know, because of homework 3,

00:00:14.090 --> 00:00:18.017
and other stuff, we have been doing BDD,

00:00:18.017 --> 00:00:20.062
where we’ve been using Cucumber to turn user stories

00:00:20.062 --> 00:00:22.095
into, essentially, integration and acceptance tests

00:00:22.095 --> 00:00:25.061
So you’ve seen testing in a couple of different levels

00:00:25.061 --> 00:00:27.063
and the goal here is to make a few remarks

00:00:27.063 --> 00:00:29.092
to, you know, back up a little bit

00:00:29.092 --> 00:00:33.001
and see the big picture, and tie those things together

00:00:33.001 --> 00:00:34.095
So this sort of spans material

00:00:34.095 --> 00:00:37.000
that covers three or four sections in the book

00:00:37.000 --> 00:00:39.061
and I want to just hit the high points in lecture

00:00:39.061 --> 00:00:41.046
So a question that comes up

00:00:41.046 --> 00:00:43.025
I’m sure it’s come up for all of you

00:00:43.025 --> 00:00:44.052
as you have been doing homework

00:00:44.052 --> 00:00:45.069
is: “How much testing is enough?”

00:00:45.069 --> 00:00:48.049
And, sadly, for a long time

00:00:48.049 --> 00:00:51.009
kind of if you asked this question in industry

00:00:51.009 --> 00:00:52.017
the answer was basically

00:00:52.017 --> 00:00:53.017
“Well, we have a shipping deadline,

00:00:53.017 --> 00:00:54.099
so however much testing we can do

00:00:54.099 --> 00:00:56.066
before that deadline, that’s how much.”

00:00:56.066 --> 00:00:58.015
That’s what you have time for.

00:00:58.015 --> 00:01:00.002
So, you know, that’s a little flip

00:01:00.002 --> 00:01:01.011
obviously not very good

00:01:01.011 --> 00:01:02.054
So you can do a bit better, right?

00:01:02.054 --> 00:01:03.070
There are some static measures

00:01:03.070 --> 00:01:06.003
like how many lines of code your app has

00:01:06.003 --> 00:01:08.021
and how many lines of tests do you have?

00:01:08.021 --> 00:01:10.029
And it’s not unusual in industry

00:01:10.029 --> 00:01:12.068
in a well-tested piece of software

00:01:12.068 --> 00:01:14.057
for the number of lines of tests

00:01:14.057 --> 00:01:17.073
to go far beyond the number of lines of code

00:01:17.073 --> 00:01:19.075
So, integer multiples are not unusual

00:01:19.075 --> 00:01:21.084
And I think even for sort of, you know,

00:01:21.084 --> 00:01:23.022
research code or classwork

00:01:23.022 --> 00:01:26.085
a ratio of, you know, maybe 1.5 is not unreasonable

00:01:26.085 --> 00:01:30.005
so one and a half times the amount of test code

00:01:30.005 --> 00:01:32.024
as you have application code

00:01:32.024 --> 00:01:34.022
And in a lot of production systems

00:01:34.022 --> 00:01:35.027
where they really care about testing

00:01:35.027 --> 00:01:36.091
it is much higher than that
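
NOTE
A minimal Ruby sketch of measuring the test-to-code ratio discussed above. It assumes a hypothetical Rails-style layout with application code under app/ and RSpec specs under spec/. It also counts only non-blank lines that are not whole-line comments, which is a simplifying assumption. Rails' built-in rails stats task reports a similar "Code to Test Ratio".
```ruby
# Count "lines of code": non-blank lines that are not whole-line comments.
def code_lines(paths)
  paths.sum do |path|
    File.readlines(path).count do |line|
      stripped = line.strip
      !stripped.empty? && !stripped.start_with?("#")
    end
  end
end
# Assumed layout: app/**/*.rb is application code, spec/**/*.rb is test code.
def test_to_code_ratio(root)
  app_loc  = code_lines(Dir.glob(File.join(root, "app", "**", "*.rb")))
  test_loc = code_lines(Dir.glob(File.join(root, "spec", "**", "*.rb")))
  return nil if app_loc.zero?
  test_loc.fdiv(app_loc)
end
```
A result of 1.5 corresponds to the "one and a half times" ratio suggested for classwork.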

00:01:36.091 --> 00:01:38.015
So maybe a better question to ask:

00:01:38.015 --> 00:01:39.047
Rather than saying “How much testing is enough?”

00:01:39.047 --> 00:01:42.049
is to ask “How good is the testing I am doing now?

00:01:42.049 --> 00:01:44.035
How thorough is it?”

00:01:44.035 --> 00:01:45.056
Later in this semester

00:01:45.056 --> 00:01:46.056
Professor Sen will talk

00:01:46.056 --> 00:01:48.018
a little bit about formal methods

00:01:48.018 --> 00:01:50.085
and sort of what’s at the frontiers of testing and debugging

00:01:50.085 --> 00:01:52.068
But a couple of things that we can talk about

00:01:52.068 --> 00:01:54.007
based on what you already know

00:01:54.007 --> 00:01:57.074
are some basic concepts of test coverage

00:01:57.074 --> 00:01:59.054
And although I would say

00:01:59.054 --> 00:02:01.001
you know, we’ve been saying all along

00:02:01.001 --> 00:02:03.003
that formal methods don’t really work on big systems

00:02:03.003 --> 00:02:05.033
I think that statement, in my personal opinion

00:02:05.033 --> 00:02:07.001
is actually a lot less true than it used to be

00:02:07.001 --> 00:02:09.019
I think there are a number of specific places

00:02:09.019 --> 00:02:10.052
especially in testing and debugging

00:02:10.052 --> 00:02:12.084
where formal methods are actually making fast progress

00:02:12.084 --> 00:02:15.075
and Koushik Sen is one of the leaders in that

00:02:15.075 --> 00:02:17.094
So you’ll have the opportunity to hear more about that later

00:02:17.094 --> 00:02:21.043
but for the moment, for the bread and butter,

00:02:21.043 --> 00:02:22.073
let’s talk about coverage measurement

00:02:22.073 --> 00:02:24.047
because this is where the rubber meets the road

00:02:24.047 --> 00:02:26.020
in terms of how you’d be evaluated

00:02:26.020 --> 00:02:28.063
if you are doing this for real

00:02:28.063 --> 00:02:29.052
So what are some basics?

00:02:29.052 --> 00:02:30.078
Here’s a really simple class you can use

00:02:30.078 --> 00:02:32.090
to talk about different ways to measure

00:02:32.090 --> 00:02:34.080
how well our tests cover this code

00:02:34.080 --> 00:02:36.063
And there are a few different levels

00:02:36.063 --> 00:02:37.085
with different terminologies

00:02:37.085 --> 00:02:40.073
It’s not really universal across all software houses

00:02:40.073 --> 00:02:42.064
But one common set of terminology

00:02:42.064 --> 00:02:43.064
that the book uses

00:02:43.064 --> 00:02:44.068
is we could talk about S0

00:02:44.068 --> 00:02:47.045
where we’d just mean you’ve called every method once

00:02:47.045 --> 00:02:50.045
So you know, if you call foo, and you call bar, you’re done

00:02:50.045 --> 00:02:52.015
That’s S0 coverage: not terribly thorough

00:02:52.015 --> 00:02:54.068
A little more stringent, S1, is

00:02:54.068 --> 00:02:56.013
you could say, we’re calling every method

00:02:56.013 --> 00:02:57.028
from every place that it could be called

00:02:57.028 --> 00:02:58.082
So what does that mean?

00:02:58.082 --> 00:03:00.007
It means, for example

00:03:00.007 --> 00:03:01.012
it’s not enough to call bar

00:03:01.012 --> 00:03:02.095
You have to make sure that you call it

00:03:02.095 --> 00:03:05.057
at least once from in here

00:03:05.057 --> 00:03:07.016
as well as calling it once

00:03:07.016 --> 00:03:10.037
from any exterior function that might call it
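
NOTE
A minimal sketch of the difference between S0 and S1. It uses a hypothetical two-method class, where foo calls bar, and a hand-written CallLog helper (also hypothetical) that records the call site (file and line) of each invocation. S0 only asks that every method was called at least once. S1 also asks that every call site was exercised. Real coverage tools instrument code automatically rather than by hand.
```ruby
# Hypothetical helper: remembers the call sites (file:line) that invoked each method.
module CallLog
  def self.calls
    @calls ||= Hash.new { |hash, name| hash[name] = [] }
  end
  def self.record(method_name)
    # Frame 1 is the instrumented method itself; frame 2 is whoever called it.
    site = caller_locations(2, 1).first
    calls[method_name] << "#{site.path}:#{site.lineno}"
  end
  def self.reset
    @calls = nil
  end
end
class SimpleClass
  def foo(x)
    CallLog.record(:foo)
    bar(x) + 1          # one call site for bar
  end
  def bar(x)
    CallLog.record(:bar)
    x * 2
  end
end
# S0: every listed method was called at least once, from anywhere.
def s0_met?(method_names)
  method_names.all? { |name| CallLog.calls[name].any? }
end
# Number of distinct places from which a method has been called so far.
def call_sites_exercised(method_name)
  CallLog.calls[method_name].uniq.size
end
```
Calling only foo satisfies S0, because bar also runs via foo. However, bar has then been reached from one call site only. A direct call to bar from client code adds the second site that S1 asks for.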

00:03:10.037 --> 00:03:12.081
C0, which is what SimpleCov measures

00:03:12.081 --> 00:03:15.099
(those of you who’ve gotten SimpleCov up and running)

00:03:15.099 --> 00:03:18.052
basically says you’ve executed every statement

00:03:18.052 --> 00:03:20.004
you’ve touched every statement in your code once

00:03:20.004 --> 00:03:22.048
But the caveat there is that

00:03:22.048 --> 00:03:25.058
conditionals really just count as a single statement

00:03:25.058 --> 00:03:28.091
So no matter which branch of this “if” you took

00:03:28.091 --> 00:03:31.074
as long as you took one branch or the other

00:03:31.074 --> 00:03:33.035
you’ve executed the “if” statement

00:03:33.035 --> 00:03:35.066
So even C0 is still, you know, sort of superficial coverage

00:03:35.066 --> 00:03:37.026
But, as we will see

00:03:37.026 --> 00:03:39.023
the way that you will want to read this information is:

00:03:39.023 --> 00:03:41.079
if you are getting <i>bad</i> coverage at the C0 level

00:03:41.079 --> 00:03:44.007
then you have really really bad coverage

00:03:44.007 --> 00:03:46.008
So if you are not even achieving

00:03:46.008 --> 00:03:47.037
this simple level of superficial coverage

00:03:47.037 --> 00:03:50.002
then your testing is probably deficient

00:03:50.002 --> 00:03:51.091
C1 is the next step up from that

00:03:51.091 --> 00:03:53.071
We could say:

00:03:53.071 --> 00:03:55.019
Well, we have to take every branch in both directions

00:03:55.019 --> 00:03:56.061
So, when we are doing this “if” statement

00:03:56.061 --> 00:03:58.066
we have to make sure that

00:03:58.066 --> 00:03:59.092
we do the “if x” part once

00:03:59.092 --> 00:04:05.013
and the “if not x” part at least once to meet C1
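
NOTE
A minimal sketch of why C0 can be met while C1 is not. It assumes a hypothetical apply_discount method instrumented by hand: each statement and each branch outcome records a tag in a HITS table. A single test with a member customer executes every statement (100% C0) but never takes the implicit "not member" branch, so C1 is only 50%. In recent Ruby versions the built-in Coverage module can also collect branch data (Coverage.start(lines: true, branches: true)), so tools built on it need not stop at C0.
```ruby
# Hit counter for hand-placed probes (hypothetical instrumentation, for illustration).
HITS = Hash.new(0)
def probe(tag)
  HITS[tag] += 1
end
STATEMENTS = [:stmt_start, :stmt_discount, :stmt_return].freeze
BRANCHES   = [:member_true, :member_false].freeze
def apply_discount(price, member)
  probe(:stmt_start)
  total = price
  if member
    probe(:member_true)
    probe(:stmt_discount)
    total = price * 9 / 10     # 10% off, integer arithmetic
  else
    # No statement lives here; only the branch outcome is recorded.
    probe(:member_false)
  end
  probe(:stmt_return)
  total
end
# Fraction of the given probes that fired at least once.
def coverage_of(tags)
  tags.count { |tag| HITS[tag] > 0 }.fdiv(tags.size)
end
```
A second test with member set to false adds no new statement coverage but brings C1 to 100%.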

00:04:05.013 --> 00:04:08.036
You can augment that with decision coverage

00:04:08.036 --> 00:04:09.063
saying: Well, if we’re gonna…

00:04:09.063 --> 00:04:12.036
If we have “if” statements where the condition

00:04:12.036 --> 00:04:13.088
is made up of multiple terms

00:04:13.088 --> 00:04:15.071
we have to make sure that every subexpression

00:04:15.071 --> 00:04:17.097
has been evaluated in both directions

00:04:17.097 --> 00:04:19.067
In other words, that means that

00:04:19.067 --> 00:04:22.041
if we’re going to fail this “if” statement

00:04:22.041 --> 00:04:24.034
we have to make sure to fail it at least once

00:04:24.034 --> 00:04:26.044
because y was false and at least once because z was false

00:04:26.044 --> 00:04:28.088
In other words, any subexpression that could

00:04:28.088 --> 00:04:31.021
independently change the outcome of the condition

00:04:31.021 --> 00:04:34.048
has to be exercised in both directions
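
NOTE
A minimal sketch of the subexpression-level check described above, for a hypothetical two-term condition y && z. The lecture calls this decision coverage. Elsewhere it is often called condition coverage, and the requirement that each subexpression independently change the outcome is what MC/DC (modified condition/decision coverage) formalizes. Each test case is a pair [y, z]. The helpers report three things: whether a set of cases meets branch coverage, whether each subexpression took both values, and whether each one was shown to flip the outcome on its own.
```ruby
# Hypothetical two-term condition, as in "if y && z".
DECISION = ->(y, z) { y && z }
# C1: the "if" as a whole went both ways.
def branch_coverage_met?(cases)
  cases.map { |y, z| DECISION.call(y, z) }.uniq.size == 2
end
# Each subexpression (index 0 is y, index 1 is z) was seen both true and false.
def each_condition_both_ways?(cases)
  [0, 1].all? { |i| cases.map { |c| c[i] }.uniq.size == 2 }
end
# For each subexpression, some pair of cases differs only in that subexpression
# and produces a different outcome, i.e. it independently changed the result.
def independent_effect_shown?(cases)
  [0, 1].all? do |i|
    other = 1 - i
    cases.combination(2).any? do |a, b|
      a[i] != b[i] && a[other] == b[other] && DECISION.call(*a) != DECISION.call(*b)
    end
  end
end
```
With cases [[true, true], [false, true]], the "if" goes both ways, so C1 is met, yet z is never false. Adding [true, false] covers that case and also shows z flipping the outcome by itself.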

00:04:34.048 --> 00:04:36.003
And then,

00:04:36.003 --> 00:04:38.052
kind of, the one that, you know, a lot of people aspire to

00:04:38.052 --> 00:04:41.026
but there is disagreement on how much more valuable it is

00:04:41.026 --> 00:04:42.083
is C2, where you take every path through the code

00:04:42.083 --> 00:04:45.053
Obviously, this is kind of difficult because

00:04:45.053 --> 00:04:48.033
it tends to be exponential in the number of conditions

00:04:48.033 --> 00:04:53.008
And in general it’s difficult

00:04:53.008 --> 00:04:55.031
to evaluate if you’ve taken every path through the code

00:04:55.031 --> 00:04:57.001
There are formal techniques that you can use

00:04:57.001 --> 00:04:58.083
to tell you where the holes are

00:04:58.083 --> 00:05:01.031
but the bottom line is that

00:05:01.031 --> 00:05:03.004
in most commercial software houses

00:05:03.004 --> 00:05:04.089
there is, I would say, not complete consensus

00:05:04.089 --> 00:05:06.070
on how much more valuable C2 is

00:05:06.070 --> 00:05:08.068
compared to C0 or C1
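
NOTE
A minimal sketch of why path coverage grows exponentially. It assumes a hypothetical method with three independent, sequential "if" statements that records which way each one went. Two tests (all true, all false) already take every branch in both directions, which meets C1. But there are 2**3 = 8 distinct paths, and with n such conditions there are 2**n.
```ruby
# Three independent "if" statements in sequence; returns the path actually taken.
def route(a, b, c)
  path = []
  if a then path << :a_true else path << :a_false end
  if b then path << :b_true else path << :b_false end
  if c then path << :c_true else path << :c_false end
  path
end
# Distinct paths exercised by a set of test cases.
def paths_taken(cases)
  cases.map { |a, b, c| route(a, b, c) }.uniq
end
# Every branch outcome touched by the cases (what C1 cares about).
def branch_outcomes(cases)
  paths_taken(cases).flatten.uniq
end
ALL_CASES = [true, false].repeated_permutation(3).to_a
C1_CASES  = [[true, true, true], [false, false, false]]
```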

00:05:08.068 --> 00:05:10.013
So, I think, for the purpose of our class

00:05:10.013 --> 00:05:11.067
you get exposed to the idea

00:05:11.067 --> 00:05:13.020
of how you use coverage information

00:05:13.020 --> 00:05:16.040
SimpleCov takes advantage of some built-in Ruby features

00:05:16.040 --> 00:05:18.009
to give you C0 coverage

00:05:18.009 --> 00:05:19.062
[It] produces really nice reports

00:05:19.062 --> 00:05:21.025
We can sort of see it

00:05:21.025 --> 00:05:22.096
at the level of individual lines in your file

00:05:22.096 --> 00:05:24.091
You can see what your coverage is

00:05:24.091 --> 00:05:27.015
and I think that’s kind of a, you know

00:05:27.015 --> 00:05:31.018
a good start for where we are

00:05:31.018 --> 00:05:33.076
So, having seen these different flavours of tests,

00:05:33.076 --> 00:05:37.020
let’s step back and look at the big picture:

00:05:37.020 --> 00:05:38.098
what are the different kinds of tests

00:05:38.098 --> 00:05:40.078
that we’ve seen concretely?

00:05:40.078 --> 00:05:42.032
and what are the tradeoffs

00:05:42.032 --> 00:05:43.089
between using those different kinds of tests?

00:05:43.089 --> 00:05:47.016
So we’ve seen at the level of individual classes or methods

00:05:47.016 --> 00:05:50.009
we use RSpec, with extensive use of mocking and stubbing

00:05:50.009 --> 00:05:53.004
So, for example, when we test methods in the model

00:05:53.004 --> 00:05:55.057
that is an example of unit testing

00:05:55.057 --> 00:05:59.025
We also did something that is pretty similar to

00:05:59.025 --> 00:06:00.097
functional or module testing

00:06:00.097 --> 00:06:02.071
where there is more than one module participating

00:06:02.071 --> 00:06:04.065
So, for example when we did controller specs

00:06:04.065 --> 00:06:07.085
we saw that we simulate a POST action

00:06:07.085 --> 00:06:09.029
but remember that the POST action

00:06:09.029 --> 00:06:10.086
has to go through the routing subsystem

00:06:10.086 --> 00:06:12.042
before it gets to the controller

00:06:12.042 --> 00:06:14.048
Once the controller is done it will try to render a view

00:06:14.048 --> 00:06:16.007
So in fact there are other pieces

00:06:16.007 --> 00:06:17.067
that collaborate with the controller

00:06:17.067 --> 00:06:19.099
that have to be working in order for controller specs to pass

00:06:19.099 --> 00:06:21.051
So that’s somewhere in between:

00:06:21.051 --> 00:06:23.035
where we’re doing more than a single method

00:06:23.035 --> 00:06:25.000
touching more than a single class

00:06:25.000 --> 00:06:27.000
but we’re still concentrating [our] attention

00:06:27.000 --> 00:06:28.088
on a fairly narrow slice of the system at a time

00:06:28.088 --> 00:06:31.044
and we’re still using mocking and stubbing extensively

00:06:31.044 --> 00:06:35.030
to sort of isolate that behaviour that we want to test

00:06:35.030 --> 00:06:36.091
And then at the level of Cucumber scenarios

00:06:36.091 --> 00:06:38.047
these are more like integration or system tests

00:06:38.047 --> 00:06:41.069
They exercise complete paths through the application

00:06:41.069 --> 00:06:43.044
They probably touch a lot of different modules

00:06:43.044 --> 00:06:46.003
They make minimal use of mocks and stubs

00:06:46.003 --> 00:06:48.032
because part of the goal of an integration test

00:06:48.032 --> 00:06:50.099
is exactly to test the interaction between pieces

00:06:50.099 --> 00:06:53.021
So you don’t want to stub or control those interactions

00:06:53.021 --> 00:06:54.080
You actually want to let the system do

00:06:54.080 --> 00:06:56.030
what it would really do

00:06:56.030 --> 00:06:58.025
if this was a scenario happening in production

00:06:58.025 --> 00:07:00.069
So how would we compare these different kinds of tests?

00:07:00.069 --> 00:07:02.038
There are a few different axes we can look at

00:07:02.038 --> 00:07:05.007
One of them is how long they take to run

00:07:05.007 --> 00:07:06.090
Now, both RSpec and Cucumber

00:07:06.090 --> 00:07:09.013
have, kind of, high startup times and stuff like that

00:07:09.013 --> 00:07:10.008
But, as you’ll see

00:07:10.008 --> 00:07:11.090
as you start adding more and more RSpec tests

00:07:11.090 --> 00:07:14.038
and using autotest to run them in the background

00:07:14.038 --> 00:07:17.088
by and large, once RSpec kind of gets off the launching pad

00:07:17.088 --> 00:07:19.092
it runs specs really fast

00:07:19.092 --> 00:07:21.095
whereas running Cucumber features just takes a long time

00:07:21.095 --> 00:07:24.059
as it essentially fires up your entire application

00:07:24.059 --> 00:07:26.010
And later in this semester

00:07:26.010 --> 00:07:28.086
we’ll see a way to make Cucumber even slower—

00:07:28.086 --> 00:07:30.070
which is to have it fire up an entire browser

00:07:30.070 --> 00:07:33.045
basically act like a puppet, remote-controlling Firefox

00:07:33.045 --> 00:07:35.083
so you can test JavaScript code

00:07:35.083 --> 00:07:37.000
We’ll do that when we actually—

00:07:37.000 --> 00:07:40.032
I think we’ll be able to work with our friends at Sauce Labs

00:07:40.032 --> 00:07:42.080
so you can do that in the cloud. That will be exciting

00:07:42.080 --> 00:07:45.083
So, “run fast” versus “run slow”

00:07:45.083 --> 00:07:46.068
Resolution:

00:07:46.068 --> 00:07:48.025
If an error happens in your unit tests

00:07:48.025 --> 00:07:49.075
it’s usually pretty easy

00:07:49.075 --> 00:07:52.029
to figure out and track down what the source of that error is

00:07:52.029 --> 00:07:53.071
because the tests are so isolated

00:07:53.071 --> 00:07:56.025
You’ve stubbed out everything that doesn’t matter

00:07:56.025 --> 00:07:58.025
and you’re focusing on only the behaviour of interest

00:07:58.025 --> 00:07:59.076
So, if you’ve done a good job of doing that

00:07:59.076 --> 00:08:01.097
when something goes wrong in one of your tests

00:08:01.097 --> 00:08:03.045
there’s not a lot of places

00:08:03.045 --> 00:08:04.088
that something could have gone wrong

00:08:04.088 --> 00:08:07.041
In contrast, if you’re running a Cucumber scenario

00:08:07.041 --> 00:08:08.089
that’s got, you know, 10 steps

00:08:08.089 --> 00:08:10.031
and every step is touching

00:08:10.031 --> 00:08:11.061
a whole bunch of pieces of the app

00:08:11.061 --> 00:08:12.091
it could take a long time

00:08:12.091 --> 00:08:14.076
to actually get to the bottom of a bug

00:08:14.076 --> 00:08:16.014
So it is kind of a tradeoff

00:08:16.014 --> 00:08:17.054
in how well you can localize errors

00:08:17.054 --> 00:08:20.065
Coverage:

00:08:20.065 --> 00:08:23.002
It’s possible if you write a good suite

00:08:23.002 --> 00:08:24.072
of unit and functional tests

00:08:24.072 --> 00:08:26.020
you can get really high coverage

00:08:26.020 --> 00:08:27.085
You can run your SimpleCov report

00:08:27.085 --> 00:08:30.080
and you can actually identify specific lines in your files

00:08:30.080 --> 00:08:32.036
that have not been exercised by any test

00:08:32.036 --> 00:08:34.016
and then you can go write tests that cover them

00:08:34.016 --> 00:08:36.014
So, figuring out how to improve your coverage

00:08:36.014 --> 00:08:37.057
for example at the C0 level

00:08:37.057 --> 00:08:40.021
is something much more easily done with unit tests

00:08:40.021 --> 00:08:42.018
whereas, with a Cucumber test—

00:08:42.018 --> 00:08:43.078
with a Cucumber scenario—

00:08:43.078 --> 00:08:45.076
you <i>are</i> touching a lot of parts of the code

00:08:45.076 --> 00:08:47.080
but you are doing it very sparsely

00:08:47.080 --> 00:08:49.038
So, if your goal is to get your coverage up

00:08:49.038 --> 00:08:51.031
use the tools that are at the unit level

00:08:51.031 --> 00:08:53.007
so that you can focus on understanding

00:08:53.007 --> 00:08:54.074
what parts of your code are undertested

00:08:54.074 --> 00:08:56.055
and then you can write very targeted tests

00:08:56.055 --> 00:08:58.086
just to focus on them

00:08:58.086 --> 00:09:01.043
And, sort of, you know, putting those pieces together

00:09:01.043 --> 00:09:03.039
the unit tests

00:09:03.039 --> 00:09:05.059
because of their isolation and their fine resolution

00:09:05.059 --> 00:09:07.039
tend to use a lot of mocks

00:09:07.039 --> 00:09:09.012
to isolate the behaviours you don’t care about

00:09:09.012 --> 00:09:11.020
But that means that, by definition

00:09:11.020 --> 00:09:12.070
you’re not testing the interfaces

00:09:12.070 --> 00:09:14.099
and it’s sort of a “received wisdom” in software

00:09:14.099 --> 00:09:16.069
that a lot of the interesting bugs

00:09:16.069 --> 00:09:18.076
occur at the interfaces between pieces

00:09:18.076 --> 00:09:20.078
and not sort of within a class or within a method—

00:09:20.078 --> 00:09:22.040
those are sort of the easy bugs to track down

00:09:22.040 --> 00:09:24.026
And at the other extreme

00:09:24.026 --> 00:09:26.081
the more you get towards the integration testing extreme

00:09:26.081 --> 00:09:29.072
you’re supposed to rely less and less on mocks

00:09:29.072 --> 00:09:30.090
for that exact reason

00:09:30.090 --> 00:09:32.066
Now we saw, if you’re testing something like

00:09:32.066 --> 00:09:34.015
say, in a service-oriented architecture

00:09:34.015 --> 00:09:35.089
where you have to interact with a remote site

00:09:35.089 --> 00:09:37.028
you still end up

00:09:37.028 --> 00:09:38.094
having to do a fair amount of mocking and stubbing

00:09:38.094 --> 00:09:40.028
so that you don’t rely on the Internet

00:09:40.028 --> 00:09:41.067
in order for your tests to pass

00:09:41.067 --> 00:09:43.006
but, generally speaking

00:09:43.006 --> 00:09:47.014
you’re trying to remove as many of the mocks as you can

00:09:47.014 --> 00:09:48.095
and let the system run the way it would run in real life
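
NOTE
A minimal sketch of stubbing a remote site. It assumes a hypothetical RatingLookup service object that receives its HTTP client as a constructor argument. The test substitutes a hand-written StubHttp (also hypothetical) that returns canned data, or raises to simulate an outage, so the test never touches the Internet. In an RSpec suite the same effect is usually achieved with a test double, for example allow(client).to receive(:get).and_return(...).
```ruby
# Hypothetical service object that looks up a movie rating from a remote API.
class RatingLookup
  def initialize(http_client)
    @http = http_client      # injected so tests can substitute a stub
  end
  def rating_for(title)
    body = @http.get("/movies?title=#{title}")
    body.fetch("rating", "unknown")
  rescue IOError
    "unavailable"
  end
end
# Stub standing in for the remote site: canned responses, no network access.
class StubHttp
  attr_reader :requests
  def initialize(response: nil, error: nil)
    @response, @error, @requests = response, error, []
  end
  def get(path)
    @requests << path        # lets a test check what would have been requested
    raise @error if @error
    @response
  end
end
```
The stub also records requests, so a test can check the outgoing call as well as the result.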

00:09:48.095 --> 00:09:52.070
So, the good news is you <i>are</i> testing the interfaces

00:09:52.070 --> 00:09:54.074
<i>but</i> when something goes wrong in one of the interfaces

00:09:54.074 --> 00:09:57.053
because your resolution is not as good

00:09:57.053 --> 00:10:00.031
it may take longer to figure out what it is

00:10:00.031 --> 00:10:05.019
So, what’s sort of the high-order bit from this tradeoff

00:10:05.019 --> 00:10:07.024
is you don’t really want to rely

00:10:07.024 --> 00:10:08.076
too heavily on any one kind of test

00:10:08.076 --> 00:10:10.078
They serve different purposes and, depending on

00:10:10.078 --> 00:10:13.043
whether you’re trying to exercise your interfaces more

00:10:13.043 --> 00:10:15.089
or trying to improve your fine-grained coverage

00:10:15.089 --> 00:10:18.003
that affects how you develop your test suite

00:10:18.003 --> 00:10:20.065
and you’ll evolve it along with your software

00:10:20.065 --> 00:10:24.014
So, we’ve used a certain set of terminology in testing

00:10:24.014 --> 00:10:26.028
It’s the terminology that, by and large

00:10:26.028 --> 00:10:29.001
is most commonly used in the Rails community

00:10:29.001 --> 00:10:30.060
but there’s some variation

00:10:30.060 --> 00:10:33.069
[and] some other terms that you might hear

00:10:33.069 --> 00:10:35.018
if you go get a job somewhere

00:10:35.018 --> 00:10:36.093
for example, mutation testing,

00:10:36.093 --> 00:10:38.072
which we haven’t done

00:10:38.072 --> 00:10:40.024
This is an interesting idea that is covered in depth by

00:10:40.024 --> 00:10:43.037
Ammann and Offutt, who have, sort of

00:10:43.037 --> 00:10:44.093
the definitive book on software testing

00:10:44.093 --> 00:10:46.048
The idea is:

00:10:46.048 --> 00:10:48.000
Suppose I introduce a deliberate bug into my code:

00:10:48.000 --> 00:10:49.051
does that force some test to fail?

00:10:49.051 --> 00:10:53.003
Because, if I changed, you know, “if x” to “if not x”

00:10:53.003 --> 00:10:56.010
and no tests fail, then either I’m missing some coverage

00:10:56.010 --> 00:10:59.019
or my app is very strange and somehow nondeterministic
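
NOTE
A minimal sketch of the mutation-testing idea. It assumes a hypothetical absolute-value function and a test suite written as a list of [input, expected] pairs. The mutant negates the condition, which is the "if x" to "if not x" change mentioned above. A suite "kills" the mutant if it passes on the original and fails on the mutant. Tools such as the mutant gem generate and run many mutants automatically; this sketch does one by hand.
```ruby
# Original code under test.
ORIGINAL = ->(x) { x < 0 ? -x : x }
# Mutant: the condition is negated, everything else unchanged.
MUTANT = ->(x) { !(x < 0) ? -x : x }
# A suite is a list of [input, expected_output] pairs.
def suite_passes?(impl, suite)
  suite.all? { |input, expected| impl.call(input) == expected }
end
# Killed: the suite is green on the original but at least one case fails on the mutant.
def mutant_killed?(mutant, suite)
  suite_passes?(ORIGINAL, suite) && !suite_passes?(mutant, suite)
end
```
A suite containing only [0, 0] passes on both versions, because negating 0 gives 0. The mutant therefore survives and flags a gap in the tests. Adding cases such as [-4, 4] and [3, 3] kills it.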

00:10:59.019 --> 00:11:03.099
Fuzz testing, which Koushik Sen may talk more about

00:11:03.099 --> 00:11:07.085
basically, this is the “10,000 monkeys at typewriters

00:11:07.085 --> 00:11:09.024
throwing random input at your code”

00:11:09.024 --> 00:11:10.037
What’s interesting about it is that

00:11:10.037 --> 00:11:11.065
those tests we’ve been doing

00:11:11.065 --> 00:11:13.086
essentially are crafted to test the app

00:11:13.086 --> 00:11:15.058
the way it was designed

00:11:15.058 --> 00:11:16.088
whereas fuzz testing

00:11:16.088 --> 00:11:19.064
is about testing the app in ways it <i>wasn’t</i> meant to be used

00:11:19.064 --> 00:11:22.098
So, what happens if you throw enormous form submissions at it?

00:11:22.098 --> 00:11:25.036
What happens if you put control characters in your forms?

00:11:25.036 --> 00:11:27.062
What happens if you submit the same thing over and over?

00:11:27.062 --> 00:11:29.093
And, Koushik has a statistic that

00:11:29.093 --> 00:11:32.033
Microsoft finds up to 20% of their bugs

00:11:32.033 --> 00:11:34.064
using some variation of fuzz testing

00:11:34.064 --> 00:11:36.029
and that about 25%

00:11:36.029 --> 00:11:39.021
of the common Unix command-line programs

00:11:39.021 --> 00:11:40.092
can be made to crash

00:11:40.092 --> 00:11:44.018
[when] put through aggressive fuzz testing
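
NOTE
A minimal sketch of a fuzzer. It assumes a hypothetical, deliberately fragile parse_form method for "key=value&key=value" bodies. The fuzzer builds random strings from a small alphabet that includes the separators, a NUL byte, a newline and a non-ASCII letter. It calls the target on each string and collects every input that raises. The seed is fixed so a crash can be replayed. Real fuzzers use far larger and smarter input generators.
```ruby
# Hypothetical, deliberately fragile parser for "key=value&key2=value2" bodies.
def parse_form(body)
  body.split("&").map do |pair|
    key, value = pair.split("=", 2)
    [key, value.strip]   # bug: value is nil when a pair has no "="
  end.to_h
end
ALPHABET = ["a", "b", "=", "&", " ", "\x00", "\n", "\u00e9"].freeze
def random_input(rng, max_len)
  Array.new(rng.rand(0..max_len)) { ALPHABET.sample(random: rng) }.join
end
# Throws random inputs at target and returns [input, exception class] for each crash.
def fuzz(target, trials: 500, max_len: 20, seed: 1234)
  rng = Random.new(seed)
  crashes = []
  trials.times do
    input = random_input(rng, max_len)
    begin
      target.call(input)
    rescue StandardError => e
      crashes << [input, e.class]
    end
  end
  crashes
end
```
Any input containing a pair with no "=" (for example just "a") crashes parse_form, because value is nil. A fuzzer tends to hit that case quickly, even though tests written only for well-formed bodies would never try it.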

00:11:44.018 --> 00:11:46.089
Definition-use coverage is something that we haven’t done

00:11:46.089 --> 00:11:48.089
but it’s another interesting concept

00:11:48.089 --> 00:11:50.089
The idea is that at any point in my program

00:11:50.089 --> 00:11:52.062
there’s a place where I define—

00:11:52.062 --> 00:11:54.046
or I assign a value to some variable—

00:11:54.046 --> 00:11:56.000
and then there’s a place downstream

00:11:56.000 --> 00:11:57.075
where presumably I’m going to consume that value—

00:11:57.075 --> 00:11:59.058
someone’s going to use that value

00:11:59.058 --> 00:12:01.013
Have I covered every pair?

00:12:01.013 --> 00:12:02.059
In other words, do I have tests where every pair

00:12:02.059 --> 00:12:04.054
of defining a variable and using it somewhere

00:12:04.054 --> 00:12:07.014
is executed in some part of my test suite

00:12:07.014 --> 00:12:10.071
It’s sometimes called DU-coverage
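
NOTE
A minimal sketch of DU coverage. It assumes a hypothetical shipping_cost method, hand-instrumented to remember which definition of cost reaches which use. The definitions are d1 (the default) and d2 (the heavy-parcel override). The uses are u1 (express) and u2 (standard). The four pairs are listed by reading the code, and the helper reports what fraction some test has exercised.
```ruby
# The four (definition, use) pairs for the variable cost, found by reading the code.
DU_PAIRS = [[:d1, :u1], [:d1, :u2], [:d2, :u1], [:d2, :u2]].freeze
EXERCISED = []
def shipping_cost(weight, express)
  cost = 5                 # definition d1
  cost_def = :d1
  if weight > 2
    cost = 10              # definition d2 overrides d1 for heavy parcels
    cost_def = :d2
  end
  if express
    EXERCISED << [cost_def, :u1]
    cost * 2               # use u1
  else
    EXERCISED << [cost_def, :u2]
    cost                   # use u2
  end
end
# Fraction of def-use pairs that some test has exercised.
def du_coverage
  (DU_PAIRS & EXERCISED).size.fdiv(DU_PAIRS.size)
end
```
The two calls shipping_cost(1, true) and shipping_cost(5, false) take both branches of both "if" statements, so C1 is met. However, they exercise only [d1, u1] and [d2, u2], so du_coverage returns 0.5. Adding the two mixed calls raises it to 1.0.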

00:12:10.071 --> 00:12:14.011
Other terms that I think are not as widely used anymore:

00:12:14.011 --> 00:12:17.071
blackbox versus whitebox, or blackbox versus glassbox

00:12:17.071 --> 00:12:20.025
Roughly, a blackbox test is one that is written from

00:12:20.025 --> 00:12:22.041
the point of view of the external specification of the thing

00:12:22.041 --> 00:12:24.022
[For example:] “This is a hash table

00:12:24.022 --> 00:12:26.015
When I put in a key I should get back a value

00:12:26.015 --> 00:12:28.011
If I delete the key the value shouldn’t be there”

00:12:28.011 --> 00:12:29.099
That’s a blackbox test because it doesn’t say

00:12:29.099 --> 00:12:32.028
anything about how the hash table is implemented

00:12:32.028 --> 00:12:34.072
and it doesn’t try to stress the implementation

00:12:34.072 --> 00:12:36.056
A corresponding whitebox test might be:

00:12:36.056 --> 00:12:38.008
“I know something about the hash function

00:12:38.008 --> 00:12:39.098
and I’m going to deliberately create

00:12:39.098 --> 00:12:41.088
hash keys in my test cases

00:12:41.088 --> 00:12:43.078
that cause a lot of hash collisions

00:12:43.078 --> 00:12:45.095
to make sure that I’m testing that part of the functionality”

00:12:45.095 --> 00:12:49.007
Now, a C0 test coverage tool, like SimpleCov

00:12:49.007 --> 00:12:52.001
would reveal this: if all you had were blackbox tests

00:12:52.001 --> 00:12:53.028
you might find that

00:12:53.028 --> 00:12:55.056
the collision-handling code wasn’t being hit very often

00:12:55.056 --> 00:12:56.075
And that might tip you off and say:

00:12:56.075 --> 00:12:58.028
“Ok, if I really want to strengthen that—

00:12:58.028 --> 00:13:00.008
for one, if I want to boost coverage for those tests

00:13:00.008 --> 00:13:02.006
now I have to write a whitebox or a glassbox test

00:13:02.006 --> 00:13:04.057
I have to look inside, see what the implementation does

00:13:04.057 --> 00:13:05.061
and find specific ways

00:13:05.061 --> 00:13:10.060
to try to break the implementation in evil ways”
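
NOTE
A minimal sketch of the hash-table example. It assumes a hypothetical ToyHashTable that chooses a bucket by summing the key's bytes, a deliberately weak hash unlike Ruby's real Hash. A blackbox test only checks the put/get/delete contract and never needs to know about buckets. A whitebox test uses that knowledge: "ab" and "ba" have the same byte sum, so they are guaranteed to collide. The collisions_handled counter stands in for what a C0 report would show and confirms that the collision path actually ran.
```ruby
# Toy chained hash table with a deliberately simple, hypothetical hash function.
class ToyHashTable
  attr_reader :collisions_handled
  def initialize(buckets = 8)
    @buckets = Array.new(buckets) { [] }
    @collisions_handled = 0
  end
  def put(key, value)
    chain = chain_for(key)
    entry = chain.find { |k, _| k == key }
    if entry
      entry[1] = value                               # overwrite existing key
    else
      @collisions_handled += 1 unless chain.empty?   # collision-handling path
      chain << [key, value]
    end
  end
  def get(key)
    entry = chain_for(key).find { |k, _| k == key }
    entry && entry[1]
  end
  def delete(key)
    chain_for(key).reject! { |k, _| k == key }
  end
  private
  # Whitebox knowledge: the bucket depends only on the sum of the key's bytes.
  def chain_for(key)
    @buckets[key.to_s.bytes.sum % @buckets.size]
  end
end
```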

00:13:10.060 --> 00:13:13.075
So, I think, testing is a kind of a way of life, right?

00:13:13.075 --> 00:13:16.069
We’ve gotten away from the phase of

00:13:16.069 --> 00:13:18.033
“We’d build the whole thing and then we’d test it”

00:13:18.033 --> 00:13:19.092
and we’ve gotten into the phase of

00:13:19.092 --> 00:13:20.077
“We’re testing as we go”

00:13:20.077 --> 00:13:22.048
Testing is really more like a development tool

00:13:22.048 --> 00:13:24.022
and like so many development tools

00:13:24.022 --> 00:13:25.062
the effectiveness of it depends

00:13:25.062 --> 00:13:27.013
on whether you’re using it in a tasteful manner

00:13:27.013 --> 00:13:31.002
So, you could say: “Well, let’s see—I kicked the tires

00:13:31.002 --> 00:13:33.048
You know, I fired up the browser, I tried a couple of things

00:13:33.048 --> 00:13:35.097
(claps hand) Looks like it works! Deploy it!”

00:13:35.097 --> 00:13:38.045
That’s obviously a little more cavalier than you’d want to be

00:13:38.045 --> 00:13:41.024
And, by the way, one of the things that we discovered

00:13:41.024 --> 00:13:43.077
with this online course just starting up

00:13:43.077 --> 00:13:45.090
when 60,000 people are enrolled in the course

00:13:45.090 --> 00:13:48.099
and 0.1% of those people have a problem

00:13:48.099 --> 00:13:50.083
you’d get 60 emails

00:13:50.083 --> 00:13:53.078
The corollary is: when your site is used by a lot of people

00:13:53.078 --> 00:13:55.089
some stupid bug that you didn’t find

00:13:55.089 --> 00:13:57.018
but that could have been found by testing

00:13:57.018 --> 00:13:59.080
could very quickly generate <i>a lot</i> of pain

00:13:59.080 --> 00:14:02.023
On the other hand, you don’t want to be dogmatic and say

00:14:02.023 --> 00:14:04.056
“Uh, until we have 100% coverage and every test is green

00:14:04.056 --> 00:14:06.005
we absolutely will not ship”

00:14:06.005 --> 00:14:07.012
That’s not healthy either

00:14:07.012 --> 00:14:08.048
And test quality

00:14:08.048 --> 00:14:10.057
doesn’t necessarily correlate with statement coverage:

00:14:10.057 --> 00:14:11.064
unless you can say something

00:14:11.064 --> 00:14:12.068
about the quality of your tests

00:14:12.068 --> 00:14:14.029
just because you’ve executed every line

00:14:14.029 --> 00:14:17.010
doesn’t mean that you’ve tested the interesting cases

00:14:17.010 --> 00:14:18.068
So, somewhere in between, you could say

00:14:18.068 --> 00:14:20.014
“Well, we’ll use coverage tools to identify

00:14:20.014 --> 00:14:23.004
undertested or poorly-tested parts of the code

00:14:23.004 --> 00:14:24.073
and we’ll use them as a guideline

00:14:24.073 --> 00:14:27.011
to sort of help improve our overall confidence level”

00:14:27.011 --> 00:14:29.024
But remember, Agile is about embracing change

00:14:29.024 --> 00:14:30.032
and dealing with it

00:14:30.032 --> 00:14:32.002
Part of change is that things will change in ways that cause

00:14:32.002 --> 00:14:33.038
bugs that you didn’t foresee

00:14:33.038 --> 00:14:34.031
and the right reaction is:

00:14:34.031 --> 00:14:36.026
Be comfortable enough with the testing tools

00:14:36.026 --> 00:14:37.064
[so] that you can quickly find those bugs

00:14:37.064 --> 00:14:39.025
Write a test that reproduces that bug

00:14:39.025 --> 00:14:40.062
And then make the test green

00:14:40.062 --> 00:14:41.061
Then you’ve really fixed it

00:14:41.061 --> 00:14:43.004
In other words, you’ve really fixed a bug only

00:14:43.004 --> 00:14:45.049
if you created a test that correctly failed

00:14:45.049 --> 00:14:46.088
to reproduce that bug

00:14:46.088 --> 00:14:48.055
and then you went back and fixed the code

00:14:48.055 --> 00:14:49.057
to make those tests pass
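
NOTE
A minimal sketch of the write-a-failing-test-first workflow. It assumes a hypothetical average method whose bug is integer division. The regression test is written against the reported symptom: the average of 1 and 2 should be 1.5. It must fail on the buggy version before the fix and pass afterwards. In an RSpec suite this would be an ordinary example such as expect(average([1, 2])).to eq(1.5).
```ruby
# Buggy version: Integer#/ truncates, so the average of 1 and 2 comes out as 1.
def average_buggy(scores)
  scores.sum / scores.size
end
# Fixed version: fdiv performs floating-point division.
def average_fixed(scores)
  scores.sum.fdiv(scores.size)
end
# Regression test reproducing the reported bug.
def regression_test_passes?(impl)
  impl.call([1, 2]) == 1.5
end
```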

00:14:49.057 --> 00:14:51.073
Similarly, you don’t want to say

00:14:51.073 --> 00:14:53.036
“Well, unit tests give you better coverage

00:14:53.036 --> 00:14:54.073
They’re more thorough and detailed

00:14:54.073 --> 00:14:56.044
So let’s focus all our energy on that”

00:14:56.044 --> 00:14:57.062
as opposed to

00:14:57.062 --> 00:14:58.093
“Oh, focus on integration tests

00:14:58.093 --> 00:15:00.006
because they’re more realistic, right?

00:15:00.006 --> 00:15:01.056
They reflect what the customer said they want

00:15:01.056 --> 00:15:03.034
So, if the integration tests are passing

00:15:03.034 --> 00:15:05.067
by definition we’re meeting a customer need”

00:15:05.067 --> 00:15:07.034
Again, both extremes are kind of unhealthy

00:15:07.034 --> 00:15:09.079
because each one of these can find problems

00:15:09.079 --> 00:15:11.031
that would be missed by the other

00:15:11.031 --> 00:15:12.060
So, having a good combination of them

00:15:12.060 --> 00:15:15.042
is kind of what it’s all about

00:15:15.042 --> 00:15:18.072
The last thing I want to leave you with, I think,

00:15:18.072 --> 00:15:20.036
in terms of testing, is “TDD versus

00:15:20.036 --> 00:15:22.005
what I call conventional debugging—

00:15:22.005 --> 00:15:24.004
i.e., the way that we all kind of do it

00:15:24.004 --> 00:15:25.051
even though we say we don’t”

00:15:25.051 --> 00:15:26.064
and we’re all trying to get better, right?

00:15:26.064 --> 00:15:27.085
We’re all kind of in the gutter

00:15:27.085 --> 00:15:29.036
Some of us are looking up at the stars

00:15:29.036 --> 00:15:31.011
trying to improve our practices

00:15:31.011 --> 00:15:33.099
But, having now lived with this for 3 or 4 years myself

00:15:33.099 --> 00:15:35.091
and—I’ll be honest—3 years ago I didn’t do TDD

00:15:35.091 --> 00:15:37.079
I do it now, because I find that it’s better

00:15:37.079 --> 00:15:40.081
and here’s my distillation of why I think it works for me

00:15:40.081 --> 00:15:43.032
Sorry, the colours are a little weird

00:15:43.032 --> 00:15:45.000
but the left column of the table

00:15:45.000 --> 00:15:46.034
says “Conventional debugging”

00:15:46.034 --> 00:15:47.044
and the right side says “TDD”

00:15:47.044 --> 00:15:49.069
So what’s the way I used to write code?

00:15:49.069 --> 00:15:51.056
Maybe some of you still do this

00:15:51.056 --> 00:15:53.013
I write a whole bunch of lines

00:15:53.013 --> 00:15:54.043
maybe a few tens of lines of code

00:15:54.043 --> 00:15:55.059
I’m <i>sure</i> they’re right—

00:15:55.059 --> 00:15:56.061
I mean, I <i>am</i> a good programmer, right?

00:15:56.061 --> 00:15:57.099
This is not that hard

00:15:57.099 --> 00:15:59.002
I run it. It doesn’t work.

00:15:59.002 --> 00:16:01.098
Ok, fire up the debugger, start putting in printf’s

00:16:01.098 --> 00:16:04.088
If I’d been using TDD what would I do instead?

00:16:04.088 --> 00:16:08.022
Well, I’d write a test first, and then only a <i>few</i> lines of code

00:16:08.022 --> 00:16:10.071
So as soon as the test goes from red to green

00:16:10.071 --> 00:16:12.064
I know I wrote code that works—

00:16:12.064 --> 00:16:15.013
or at least the parts of the behaviour that I had in mind

00:16:15.013 --> 00:16:16.096
Those parts of the behaviour work, because I had a test

00:16:16.096 --> 00:16:19.056
Ok, back to conventional debugging:

00:16:19.056 --> 00:16:21.073
I’m running my program, trying to find the bugs

00:16:21.073 --> 00:16:23.028
I start putting in printf’s everywhere

00:16:23.028 --> 00:16:24.062
to print out the values of things

00:16:24.062 --> 00:16:25.064
which, by the way, is a lot of fun

00:16:25.064 --> 00:16:26.073
when you’re trying to read them

00:16:26.073 --> 00:16:28.014
out of the 500 lines of log output

00:16:28.014 --> 00:16:29.035
that you’d get in a Rails app

00:16:29.035 --> 00:16:30.087
trying to find <i>your</i> printf’s

00:16:30.087 --> 00:16:32.035
you know, “I know what I’ll do—

00:16:32.035 --> 00:16:34.008
I’ll put in 75 asterisks before and after

00:16:34.008 --> 00:16:36.043
That will make it readable” (laughter)

00:16:36.043 --> 00:16:38.071
Ok, raise your hands if you don’t do this!

00:16:38.071 --> 00:16:40.090
Thank you for your honesty. (laughter) Ok.

00:16:40.090 --> 00:16:43.014
Or I could do the other thing. I could say:

00:16:43.014 --> 00:16:45.030
Instead of printing the value of a variable

00:16:45.030 --> 00:16:47.039
why don’t I write a test that inspects it

00:16:47.039 --> 00:16:48.079
with an expectation, using <i>should</i>,

00:16:48.079 --> 00:16:50.090
and I’ll know immediately in bright red letters

00:16:50.090 --> 00:16:53.033
if that expectation wasn’t met
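
NOTE
This is a hypothetical illustration of swapping a printf for an expectation. It uses Minitest's `assert_equal` in place of RSpec's `should`, and the `slugify` method is invented for this sketch. The commented-out lines show the print-and-search approach. The test checks the same value automatically.
```ruby
require 'minitest/autorun'
# Hypothetical method being debugged.
def slugify(title)
  title.strip.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-|-\z/, '')
end
# The printf approach: dump the value, then hunt for it in the log.
#   puts '*' * 75
#   puts slugify('  Hello, World!  ').inspect
#   puts '*' * 75
class SlugifyTest < Minitest::Test
  # The same check as an expectation: nothing to read in the log, and a
  # mismatch is reported as a failure showing expected vs. actual values.
  def test_punctuation_and_spaces_become_single_dashes
    assert_equal 'hello-world', slugify('  Hello, World!  ')
  end
end
```
Unlike the printf, the expectation stays in the suite and is checked again on every run.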

00:16:53.033 --> 00:16:56.005
Ok, I’m back on the conventional debugging side:

00:16:56.005 --> 00:16:58.090
I break out the big guns: I pull out the Ruby debugger

00:16:58.092 --> 00:17:02.044
I set a debug breakpoint, and I now start <i>tweaking</i> and say

00:17:02.044 --> 00:17:04.085
“Oh, let’s see, I have to get past that ‘if’ statement

00:17:04.085 --> 00:17:06.002
so I have to set that thing

00:17:06.002 --> 00:17:07.063
Oh, I have to call that method and so I need to…”

00:17:07.063 --> 00:17:08.065
No!

00:17:08.065 --> 00:17:10.087
I <i>could</i> instead—if I’m going to do that anyway—

00:17:10.087 --> 00:17:13.000
let’s just do it in a test file, set up some mocks and stubs

00:17:13.000 --> 00:17:16.045
to control the code path, make it go the way I want

00:17:16.045 --> 00:17:19.013
And now, “Ok, for sure I’ve fixed it!

00:17:19.013 --> 00:17:22.012
I’ll get out of the debugger, run it all again!”

00:17:22.012 --> 00:17:24.022
And, of course, 9 times out of 10, you didn’t fix it

00:17:24.022 --> 00:17:26.072
or you kind of partly fixed it but you didn’t completely fix it

00:17:26.072 --> 00:17:30.040
and now I have to do all these manual things all over again

00:17:30.040 --> 00:17:32.086
<i>or</i> I already have a bunch of tests

00:17:32.086 --> 00:17:34.031
and I can just rerun them automatically

00:17:34.031 --> 00:17:35.056
and if some of them fail, I just say

00:17:35.056 --> 00:17:36.087
“Oh, I didn’t fix the whole thing

00:17:36.087 --> 00:17:38.040
No problem, I’ll just go back!”

00:17:38.040 --> 00:17:39.096
So, the bottom line is that

00:17:39.096 --> 00:17:41.095
you know, you <i>could</i> do it on the left side

00:17:41.095 --> 00:17:45.004
but you’re using the same techniques in both cases

00:17:45.004 --> 00:17:48.062
The only difference is, in one case you’re doing it manually

00:17:48.062 --> 00:17:50.004
which is boring and error-prone

00:17:50.004 --> 00:17:51.078
In the other case you’re doing a little more work

00:17:51.078 --> 00:17:53.095
but you can make it automatic and repeatable

00:17:53.095 --> 00:17:55.071
and have, you know, some high confidence

00:17:55.071 --> 00:17:57.003
that as you change things in your code

00:17:57.003 --> 00:17:58.092
you are not breaking stuff that used to work

00:17:58.092 --> 00:18:00.091
and basically it’s more productive

00:18:00.091 --> 00:18:02.047
So you’re doing all the same things

00:18:02.047 --> 00:18:04.037
but with a small “delta” of extra work

00:18:04.037 --> 00:18:07.086
you get much higher leverage from your effort

00:18:07.086 --> 00:18:10.036
So that’s kind of my view of why TDD is a good thing

00:18:10.036 --> 00:18:11.088
It really doesn’t require new skills

00:18:11.088 --> 00:18:15.011
It just requires [you] to refactor your existing skills

00:18:15.011 --> 00:18:18.014
And, again, honest confessions, right?

00:18:18.014 --> 00:18:19.034
when I started doing this it was like

00:18:19.034 --> 00:18:21.049
“Ok, I’m gonna be teaching a course on Rails

00:18:21.049 --> 00:18:22.065
I should really focus on testing”

00:18:22.065 --> 00:18:24.032
So I went back to some code I had written

00:18:24.032 --> 00:18:26.087
that was <i>working</i>—you know, that was decent code—

00:18:26.087 --> 00:18:29.006
and I started trying to write tests for it

00:18:29.006 --> 00:18:31.019
and it was <i>so painful</i>

00:18:31.019 --> 00:18:33.033
because the code wasn’t written in a way that was testable

00:18:33.033 --> 00:18:34.097
There were all kinds of interactions

00:18:34.097 --> 00:18:36.038
There were, like, nested conditionals

00:18:36.038 --> 00:18:38.083
And if you wanted to isolate a particular statement

00:18:38.083 --> 00:18:41.070
and write a test that triggers just that statement

00:18:41.070 --> 00:18:44.000
the amount of stuff you’d have to set up in your test

00:18:44.000 --> 00:18:45.009
to have it happen

00:18:45.009 --> 00:18:46.040
(remember when we talked about mock train wrecks?)

00:18:46.040 --> 00:18:48.014
you have to set up all this infrastructure

00:18:48.014 --> 00:18:49.063
just to exercise <i>one</i> line of code

00:18:49.063 --> 00:18:51.015
and you do that and you go

00:18:51.015 --> 00:18:52.074
“Gawd, testing is really not worth it!

00:18:52.074 --> 00:18:54.034
I wrote 20 lines of setup

00:18:54.034 --> 00:18:56.059
so that I could test two lines in my function!”

00:18:56.059 --> 00:18:58.085
What that’s really telling you—as I now realize—

00:18:58.085 --> 00:19:00.042
is your function is <i>bad</i>

00:19:00.042 --> 00:19:01.049
It’s a badly written function

00:19:01.049 --> 00:19:02.052
It’s not a testable function

00:19:02.052 --> 00:19:03.088
It’s got too many moving parts

00:19:03.088 --> 00:19:06.026
whose dependencies <i>can’t</i> be broken

00:19:06.026 --> 00:19:07.070
There are no seams in my function

00:19:07.070 --> 00:19:11.008
that allow me to individually test the different behaviours
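
NOTE
One common way to add a seam is to pass the dependency in through the constructor (dependency injection) instead of creating it inside the method. This is a sketch under assumptions, not the author's code. `Greeter`, `FakeDirectory`, and the commented-out `HttpClient` are hypothetical names. The point is that each behaviour then needs one line of test setup instead of twenty.
```ruby
require 'minitest/autorun'
# Before (hard to test): the HTTP client is built inside the method, so a
# test must intercept the network just to check the greeting text.
#   def greeting(user_id)
#     name = HttpClient.new.get_name(user_id)  # hypothetical client
#     "Hello, #{name}!"
#   end
# After: the lookup object is passed in, which gives tests a seam.
class Greeter
  def initialize(directory)
    @directory = directory  # any object that responds to #name_for(id)
  end
  def greeting(user_id)
    name = @directory.name_for(user_id)
    name ? "Hello, #{name}!" : 'Hello, stranger!'
  end
end
# A test double needs no mocking framework: a tiny Struct is enough.
FakeDirectory = Struct.new(:names) do
  def name_for(id)
    names[id]
  end
end
class GreeterTest < Minitest::Test
  def test_known_user
    greeter = Greeter.new(FakeDirectory.new({ 1 => 'Ada' }))
    assert_equal 'Hello, Ada!', greeter.greeting(1)
  end
  def test_unknown_user
    greeter = Greeter.new(FakeDirectory.new({}))
    assert_equal 'Hello, stranger!', greeter.greeting(42)
  end
end
```
Production code would pass in a real directory object. The tests pass in a fake, so each branch of `greeting` can be checked in isolation.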

00:19:11.008 --> 00:19:12.083
And once you start doing Test First Development

00:19:12.083 --> 00:19:15.043
because you have to write your tests in small chunks

00:19:15.043 --> 00:19:17.053
it kind of makes this problem go away

00:19:17.053 --> 99:59:59.999
So that’s been my epiphany