0:00:00.057,0:00:01.092
So we spent a bunch of time
0:00:01.092,0:00:03.032
in the last couple of lectures
0:00:03.032,0:00:05.082
talking about different kinds of testing
0:00:05.082,0:00:08.021
about unit testing versus integration testing
0:00:08.021,0:00:10.010
We talked about how do you use RSpec
0:00:10.010,0:00:12.049
to really isolate the parts of your code you want to test
0:00:12.049,0:00:14.090
you’ve also, you know, because of homework 3,
0:00:14.090,0:00:18.017
and other stuff, we have been doing BDD,
0:00:18.017,0:00:20.062
where we’ve been using Cucumber to turn user stories
0:00:20.062,0:00:22.095
into, essentially, integration and acceptance tests
0:00:22.095,0:00:25.061
So you’ve seen testing in a couple of different levels
0:00:25.061,0:00:27.063
and the goal here is sort of to do a few remarks
0:00:27.063,0:00:29.092
to, you know, let’s back up a little bit
0:00:29.092,0:00:33.001
and see the big picture, and tie those things together
0:00:33.001,0:00:34.095
So this sort of spans material
0:00:34.095,0:00:37.000
that covers three or four sections in the book
0:00:37.000,0:00:39.061
and I want to just hit the high points in lecture
0:00:39.061,0:00:41.046
So a question that comes up
0:00:41.046,0:00:43.025
I’m sure it’s come up for all of you
0:00:43.025,0:00:44.052
as you have been doing homework
0:00:44.052,0:00:45.069
is: “How much testing is enough?”
0:00:45.069,0:00:48.049
And, sadly, for a long time
0:00:48.049,0:00:51.009
kind of if you asked this question in industry
0:00:51.009,0:00:52.017
the answer was basically
0:00:52.017,0:00:53.017
“Well, we have a shipping deadline,
0:00:53.017,0:00:54.099
so however much testing we can do
0:00:54.099,0:00:56.066
before that deadline, that’s how much.”
0:00:56.066,0:00:58.015
That’s what you have time for.
0:00:58.015,0:01:00.002
So, you know, that’s a little flip
0:01:00.002,0:01:01.011
obviously not very good
0:01:01.011,0:01:02.054
So you can do a bit better, right?
0:01:02.054,0:01:03.070
There’re some static measures
0:01:03.070,0:01:06.003
like how many lines of code does your app have
0:01:06.003,0:01:08.021
and how many lines of tests do you have?
0:01:08.021,0:01:10.029
And it’s not unusual in industry
0:01:10.029,0:01:12.068
in a well-tested piece of software
0:01:12.068,0:01:14.057
for the number of lines of tests
0:01:14.057,0:01:17.073
to go far beyond the number of lines of code
0:01:17.073,0:01:19.075
So, integer multiples are not unusual
0:01:19.075,0:01:21.084
And I think even for sort of, you know,
0:01:21.084,0:01:23.022
research code or classwork
0:01:23.022,0:01:26.085
a ratio of, you know, maybe 1.5 is not unreasonable
0:01:26.085,0:01:30.005
so one and a half times the amount of test code
0:01:30.005,0:01:32.024
as you have application code
0:01:32.024,0:01:34.022
And in a lot of production systems
0:01:34.022,0:01:35.027
where they really care about testing
0:01:35.027,0:01:36.091
it is much higher than that
0:01:36.091,0:01:38.015
So maybe a better question to ask:
0:01:38.015,0:01:39.047
Rather than saying “How much testing is enough?”
0:01:39.047,0:01:42.049
is to ask “How good is the testing I am doing now?
0:01:42.049,0:01:44.035
How thorough is it?”
0:01:44.035,0:01:45.056
Later in this semester
0:01:45.056,0:01:46.056
Professor Sen will talk
0:01:46.056,0:01:48.018
a little bit about formal methods
0:01:48.018,0:01:50.085
and sort of what’s at the frontiers of testing and debugging
0:01:50.085,0:01:52.068
But a couple of things that we can talk about
0:01:52.068,0:01:54.007
based on what you already know
0:01:54.007,0:01:57.074
is some basic concepts about test coverage
0:01:57.074,0:01:59.054
And although I would say
0:01:59.054,0:02:01.001
you know, we’ve been saying all along
0:02:01.001,0:02:03.003
formal methods, they don’t really work on big systems
0:02:03.003,0:02:05.033
I think that statement, in my personal opinion
0:02:05.033,0:02:07.001
is actually a lot less true than it used to be
0:02:07.001,0:02:09.019
I think there are a number of specific places
0:02:09.019,0:02:10.052
especially in testing and debugging
0:02:10.052,0:02:12.084
where formal methods are actually making fast progress
0:02:12.084,0:02:15.075
and Koushik Sen is one of the leaders in that
0:02:15.075,0:02:17.094
So you’ll have the opportunity to hear more about that later
0:02:17.094,0:02:21.043
but for the moment I think, kind of bread and butter
0:02:21.043,0:02:22.073
is let’s talk about coverage measurement
0:02:22.073,0:02:24.047
because this is where the rubber meets the road
0:02:24.047,0:02:26.020
in terms of how you’d be evaluated
0:02:26.020,0:02:28.063
if you are doing this for real
0:02:28.063,0:02:29.052
So what’s some basics?
0:02:29.052,0:02:30.078
Here’s a really simple class you can use
0:02:30.078,0:02:32.090
to talk about different ways to measure
0:02:32.090,0:02:34.080
how our test covers this code
0:02:34.080,0:02:36.063
And there’re a few different levels
0:02:36.063,0:02:37.085
with different terminologies
0:02:37.085,0:02:40.073
It’s not really universal across all software houses
0:02:40.073,0:02:42.064
But one common set of terminology
0:02:42.064,0:02:43.064
that the book exposes
0:02:43.064,0:02:44.068
is we could talk about S0
0:02:44.068,0:02:47.045
where we’d just mean you’ve called every method once
0:02:47.045,0:02:50.045
So you know, if you call foo, and you call bar, you’re done
0:02:50.045,0:02:52.015
That’s S0 coverage: not terribly thorough
0:02:52.015,0:02:54.068
A little more stringent, S1, is
0:02:54.068,0:02:56.013
you could say, we’re calling every method
0:02:56.013,0:02:57.028
from every place that it could be called
0:02:57.028,0:02:58.082
So what does that mean?
0:02:58.082,0:03:00.007
It means, for example
0:03:00.007,0:03:01.012
it’s not enough to call bar
0:03:01.012,0:03:02.095
You have to make sure that you call it
0:03:02.095,0:03:05.057
at least once from in here
0:03:05.057,0:03:07.016
as well as calling it once
0:03:07.016,0:03:10.037
from any exterior function that might call it
0:03:10.037,0:03:12.081
C0 which is what SimpleCov measures
0:03:12.081,0:03:15.099
(those of you who’ve gotten SimpleCov up and running)
0:03:15.099,0:03:18.052
basically says you’ve executed every statement
0:03:18.052,0:03:20.004
you’ve touched every statement in your code once
0:03:20.004,0:03:22.048
But the caveat there is that
0:03:22.048,0:03:25.058
conditionals really just count as a single statement
0:03:25.058,0:03:28.091
So, if you, no matter which branch of this “if” you took
0:03:28.091,0:03:31.074
as long as you touched one or the other branch
0:03:31.074,0:03:33.035
you’ve executed the “if” statement
0:03:33.035,0:03:35.066
So even C0 is still, you know, sort of superficial coverage
0:03:35.066,0:03:37.026
But, as we will see
0:03:37.026,0:03:39.023
the way that you will want to read this information is:
0:03:39.023,0:03:41.079
if you are getting bad coverage at the C0 level
0:03:41.079,0:03:44.007
then you have really really bad coverage
0:03:44.007,0:03:46.008
So if you are not kind of making
0:03:46.008,0:03:47.037
this simple level of superficial coverage
0:03:47.037,0:03:50.002
then your testing is probably deficient
0:03:50.002,0:03:51.091
C1 is the next step up from that
0:03:51.091,0:03:53.071
We could say:
0:03:53.071,0:03:55.019
Well, we have to take every branch in both directions
0:03:55.019,0:03:56.061
So, when we are doing this “if” statement
0:03:56.061,0:03:58.066
we have to make sure that
0:03:58.066,0:03:59.092
we do the “if x” part once
0:03:59.092,0:04:05.013
and the “if not x” part at least once to meet C1
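[Editor's sketch] The class on the slide is not reproduced in this transcript, so the method below is a hypothetical stand-in (the name shipping_cost and its body are assumptions). It is meant to show how a single test can execute every statement, which satisfies C0, while still missing one direction of a branch, which C1 requires.

```ruby
# Hypothetical method, not the class from the lecture slide.
def shipping_cost(member)
  cost = 5
  if member
    cost = 0
  end
  cost
end

# This one call executes every statement, so C0 is satisfied...
member_cost = shipping_cost(true)

# ...but the "if not member" direction is only taken by a second call,
# which is what C1 additionally asks for.
guest_cost = shipping_cost(false)
```

With only the first call, a statement-level report would look complete even though the non-member path was never tried.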
0:04:05.013,0:04:08.036
You can augment that with decision coverage
0:04:08.036,0:04:09.063
saying: Well, if we’re gonna…
0:04:09.063,0:04:12.036
If we have “if” statements where the condition
0:04:12.036,0:04:13.088
is made up of multiple terms
0:04:13.088,0:04:15.071
we have to make sure that every subexpression
0:04:15.071,0:04:17.097
has been evaluated both directions
0:04:17.097,0:04:19.067
In other words, that means that
0:04:19.067,0:04:22.041
if we’re going to fail this “if” statement
0:04:22.041,0:04:24.034
we have to make sure to fail it at least once
0:04:24.034,0:04:26.044
because y was false and at least once because z was false
0:04:26.044,0:04:28.088
In other words, any subexpression that could
0:04:28.088,0:04:31.021
independently change the outcome of the condition
0:04:31.021,0:04:34.048
has to be exercised in both directions
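[Editor's sketch] A small illustration of exercising each subexpression in both directions. The variable names y and z echo the lecture; the method eligible? and its return values are assumptions made up for this example.

```ruby
# Hypothetical method with a two-term condition.
def eligible?(y, z)
  if y && z
    :granted
  else
    :denied
  end
end

# Each row is one test input: [y, z].
cases = [
  [true,  true],   # condition succeeds
  [false, true],   # condition fails because y is false
  [true,  false]   # condition fails because z is false
]

outcomes = cases.map { |y, z| eligible?(y, z) }
```

Note that C1 alone could be met with just [true, true] and [false, false]; in the second of those, short-circuit evaluation means z is never even looked at, which is the gap this stricter criterion is meant to close.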
0:04:34.048,0:04:36.003
And then,
0:04:36.003,0:04:38.052
kind of, the one that, you know, a lot of people aspire to
0:04:38.052,0:04:41.026
but there is disagreement on how much more valuable it is
0:04:41.026,0:04:42.083
is you take every path through the code
0:04:42.083,0:04:45.053
Obviously, this is kind of difficult because
0:04:45.053,0:04:48.033
it tends to be exponential in the number of conditions
0:04:48.033,0:04:53.008
And in general it’s difficult
0:04:53.008,0:04:55.031
to evaluate if you’ve taken every path through the code
0:04:55.031,0:04:57.001
There are formal techniques that you can use
0:04:57.001,0:04:58.083
to tell you where the holes are
0:04:58.083,0:05:01.031
but the bottom line is that
0:05:01.031,0:05:03.004
in most commercial software houses
0:05:03.004,0:05:04.089
there is, I would say, not complete consensus
0:05:04.089,0:05:06.070
on how much more valuable C2 is
0:05:06.070,0:05:08.068
compared to C0 or C1
0:05:08.068,0:05:10.013
So, I think, for the purpose of our class
0:05:10.013,0:05:11.067
you get exposed to the idea
0:05:11.067,0:05:13.020
of how you use coverage information
0:05:13.020,0:05:16.040
SimpleCov takes advantage of some built-in Ruby features
0:05:16.040,0:05:18.009
to give you C0 coverage
0:05:18.009,0:05:19.062
[It] does really nice reports
0:05:19.062,0:05:21.025
We can sort of see it
0:05:21.025,0:05:22.096
at the level of individual lines in your file
0:05:22.096,0:05:24.091
You can see what your coverage is
0:05:24.091,0:05:27.015
and I think that’s kind of a, you know
0:05:27.015,0:05:31.018
a good start for where we are
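[Editor's sketch] The built-in Ruby feature that SimpleCov relies on is the standard-library Coverage module. The sketch below uses that module directly, without SimpleCov, to show the raw per-line execution counts that a C0 report is built from. The temporary file and the method sign_label are assumptions invented for the example.

```ruby
require 'coverage'
require 'tempfile'

# Hypothetical code to measure, written to a throwaway file because
# Coverage only tracks files loaded after Coverage.start.
source = <<~RUBY
  def sign_label(x)
    if x > 0
      "positive"
    else
      "non-positive"
    end
  end
  sign_label(5)
RUBY

file = Tempfile.new(['coverage_demo', '.rb'])
file.write(source)
file.close

Coverage.start
load file.path
result = Coverage.result

# Look the file up by its base name, since the recorded path may be normalized.
_path, line_counts = result.find { |path, _| path.end_with?(File.basename(file.path)) }

# Entries are execution counts per line; nil marks lines with nothing to execute.
uncovered_lines = line_counts.each_with_index
                             .select { |count, _| count == 0 }
                             .map { |_, index| index + 1 }
file.unlink
```

Here only the x > 0 direction runs, so the body of the else branch should be the line reported as never executed; that is the kind of line a SimpleCov report highlights.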
0:05:31.018,0:05:33.076
So, having seen sort of different flavours of tests
0:05:33.076,0:05:37.020
Stepping back and looking back at the big picture
0:05:37.020,0:05:38.098
what are the different kind of tests
0:05:38.098,0:05:40.078
that we’ve seen concretely?
0:05:40.078,0:05:42.032
and what are the tradeoffs
0:05:42.032,0:05:43.089
between using those different kinds of tests?
0:05:43.089,0:05:47.016
So we’ve seen at the level of individual classes or methods
0:05:47.016,0:05:50.009
we use RSpec, with extensive use of mocking and stubbing
0:05:50.009,0:05:53.004
So, for example when we’re testing methods in the model
0:05:53.004,0:05:55.057
that will be an example of unit testing
0:05:55.057,0:05:59.025
We also did something that is pretty similar to
0:05:59.025,0:06:00.097
functional or module testing
0:06:00.097,0:06:02.071
where there is more than one module participating
0:06:02.071,0:06:04.065
So, for example when we did controller specs
0:06:04.065,0:06:07.085
we saw that—we simulate a POST action
0:06:07.085,0:06:09.029
but remember that the POST action
0:06:09.029,0:06:10.086
has to go through the routing subsystem
0:06:10.086,0:06:12.042
before it gets to the controller
0:06:12.042,0:06:14.048
Once the controller is done it will try to render a view
0:06:14.048,0:06:16.007
So in fact there’s other pieces
0:06:16.007,0:06:17.067
that collaborate with the controller
0:06:17.067,0:06:19.099
that have to be working in order for controller specs to pass
0:06:19.099,0:06:21.051
So that’s somewhere in between:
0:06:21.051,0:06:23.035
where we’re doing more than a single method
0:06:23.035,0:06:25.000
touching more than a single class
0:06:25.000,0:06:27.000
but we’re still concentrating [our] attention
0:06:27.000,0:06:28.088
on a fairly narrow slice of the system at a time
0:06:28.088,0:06:31.044
and we’re still using mocking and stubbing extensively
0:06:31.044,0:06:35.030
to sort of isolate that behaviour that we want to test
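[Editor's sketch] To make the isolation idea concrete without depending on RSpec, here is a plain-Ruby sketch with a hand-rolled stub. The Cart class, the price_for method and the fixed price are all hypothetical; in the course you would express the same thing with RSpec's own stubbing facilities.

```ruby
# Hypothetical class under test: it depends on a collaborator for prices.
class Cart
  def initialize(price_source)
    @price_source = price_source
    @items = []
  end

  def add(item)
    @items << item
    self
  end

  def total
    @items.sum { |item| @price_source.price_for(item) }
  end
end

# Hand-rolled stub: stands in for the real collaborator and always answers 10,
# so a failure here can only come from Cart itself.
stub_prices = Object.new
def stub_prices.price_for(_item)
  10
end

unit_total = Cart.new(stub_prices).add(:book).add(:pen).total
```

The price paid for that isolation is that nothing here checks whether the real price source and Cart actually agree on the interface.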
0:06:35.030,0:06:36.091
And then at the level of Cucumber scenarios
0:06:36.091,0:06:38.047
these are more like integration or system tests
0:06:38.047,0:06:41.069
They exercise complete paths throughout the application
0:06:41.069,0:06:43.044
They probably touch a lot of different modules
0:06:43.044,0:06:46.003
They make minimal use of mocks and stubs
0:06:46.003,0:06:48.032
because part of the goal of an integration test
0:06:48.032,0:06:50.099
is exactly to test the interaction between pieces
0:06:50.099,0:06:53.021
So you don’t want to stub or control those interactions
0:06:53.021,0:06:54.080
You actually want to let the system do
0:06:54.080,0:06:56.030
what it would really do
0:06:56.030,0:06:58.025
if this was a scenario happening in production
0:06:58.025,0:07:00.069
So how would we compare these different kinds of tests?
0:07:00.069,0:07:02.038
There’s a few different axes we can look at
0:07:02.038,0:07:05.007
One of them is how long they take to run
0:07:05.007,0:07:06.090
Now, both RSpec and Cucumber
0:07:06.090,0:07:09.013
have, kind of, high startup times and stuff like that
0:07:09.013,0:07:10.008
But, as you’ll see
0:07:10.008,0:07:11.090
as you start adding more and more RSpec tests
0:07:11.090,0:07:14.038
and using autotest to run them in the background
0:07:14.038,0:07:17.088
by and large, once RSpec kind of gets off the launching pad
0:07:17.088,0:07:19.092
it runs specs really fast
0:07:19.092,0:07:21.095
whereas running Cucumber features just takes a long time
0:07:21.095,0:07:24.059
as it essentially fires up your entire application
0:07:24.059,0:07:26.010
And later in this semester
0:07:26.010,0:07:28.086
we’ll see a way to make Cucumber even slower—
0:07:28.086,0:07:30.070
which is to have it fire up an entire browser
0:07:30.070,0:07:33.045
basically act like a puppet, remote-controlling Firefox
0:07:33.045,0:07:35.083
so you can test JavaScript code
0:07:35.083,0:07:37.000
We’ll do that when we actually—
0:07:37.000,0:07:40.032
I think we’ll be able to work with our friends at Sauce Labs
0:07:40.032,0:07:42.080
so you can do that in the cloud—That will be exciting
0:07:42.080,0:07:45.083
So, “run fast” versus “run slow”
0:07:45.083,0:07:46.068
Resolution:
0:07:46.068,0:07:48.025
If an error happens in your unit tests
0:07:48.025,0:07:49.075
it’s usually pretty easy
0:07:49.075,0:07:52.029
to figure out and track down what the source of that error is
0:07:52.029,0:07:53.071
because the tests are so isolated
0:07:53.071,0:07:56.025
You’ve stubbed out everything that doesn’t matter
0:07:56.025,0:07:58.025
and you’re focusing on only the behaviour of interest
0:07:58.025,0:07:59.076
So, if you’ve done a good job of doing that
0:07:59.076,0:08:01.097
when something goes wrong in one of your tests
0:08:01.097,0:08:03.045
there’s not a lot of places
0:08:03.045,0:08:04.088
that something could have gone wrong
0:08:04.088,0:08:07.041
In contrast, if you’re running a Cucumber scenario
0:08:07.041,0:08:08.089
that’s got, you know, 10 steps
0:08:08.089,0:08:10.031
and every step is touching
0:08:10.031,0:08:11.061
a whole bunch of pieces of the app
0:08:11.061,0:08:12.091
it could take a long time
0:08:12.091,0:08:14.076
to actually get to the bottom of a bug
0:08:14.076,0:08:16.014
So it is kind of a tradeoff
0:08:16.014,0:08:17.054
between how well can you localize errors
0:08:17.054,0:08:20.065
Coverage:
0:08:20.065,0:08:23.002
It’s possible if you write a good suite
0:08:23.002,0:08:24.072
of unit and functional tests
0:08:24.072,0:08:26.020
you can get really high coverage
0:08:26.020,0:08:27.085
You can run your SimpleCov report
0:08:27.085,0:08:30.080
and you can actually identify specific lines in your files
0:08:30.080,0:08:32.036
that have not been exercised by any test
0:08:32.036,0:08:34.016
and then you can go write tests that cover them
0:08:34.016,0:08:36.014
So, figuring out how to improve your coverage
0:08:36.014,0:08:37.057
for example at the C0 level
0:08:37.057,0:08:40.021
is something much more easily done with unit tests
0:08:40.021,0:08:42.018
whereas, with a Cucumber test—
0:08:42.018,0:08:43.078
with a Cucumber scenario—
0:08:43.078,0:08:45.076
you are touching a lot of parts of the code
0:08:45.076,0:08:47.080
but you are doing it very sparsely
0:08:47.080,0:08:49.038
So, if your goal is to get your coverage up
0:08:49.038,0:08:51.031
use the tools that are at the unit level
0:08:51.031,0:08:53.007
so that you can focus on understanding
0:08:53.007,0:08:54.074
what parts of my code are undertested
0:08:54.074,0:08:56.055
and then you can write very targeted tests
0:08:56.055,0:08:58.086
just to focus on them
0:08:58.086,0:09:01.043
And, sort of, you know, putting those pieces together
0:09:01.043,0:09:03.039
the unit tests
0:09:03.039,0:09:05.059
because of their isolation and their fine resolution
0:09:05.059,0:09:07.039
tend to use a lot of mocks
0:09:07.039,0:09:09.012
to isolate the behaviours you don’t care about
0:09:09.012,0:09:11.020
But that means that, by definition
0:09:11.020,0:09:12.070
you’re not testing the interfaces
0:09:12.070,0:09:14.099
and it’s sort of a “received wisdom” in software
0:09:14.099,0:09:16.069
that a lot of the interesting bugs
0:09:16.069,0:09:18.076
occur at the interfaces between pieces
0:09:18.076,0:09:20.078
and not sort of within a class or within a method—
0:09:20.078,0:09:22.040
those are sort of the easy bugs to track down
0:09:22.040,0:09:24.026
And at the other extreme
0:09:24.026,0:09:26.081
the more you get towards the integration testing extreme
0:09:26.081,0:09:29.072
you’re supposed to rely less and less on mocks
0:09:29.072,0:09:30.090
for that exact reason
0:09:30.090,0:09:32.066
Now we saw, if you’re testing something like
0:09:32.066,0:09:34.015
say, in a service-oriented architecture
0:09:34.015,0:09:35.089
where you have to interact with the remote site
0:09:35.089,0:09:37.028
you still end up
0:09:37.028,0:09:38.094
having to do a fair amount of mocking and stubbing
0:09:38.094,0:09:40.028
so that you don’t rely on the Internet
0:09:40.028,0:09:41.067
in order for your tests to pass
0:09:41.067,0:09:43.006
but, generally speaking
0:09:43.006,0:09:47.014
you’re trying to remove as many of the mocks as you can
0:09:47.014,0:09:48.095
and let the system run the way it would run in real life
0:09:48.095,0:09:52.070
So, the good news is you are testing the interfaces
0:09:52.070,0:09:54.074
but when something goes wrong in one of the interfaces
0:09:54.074,0:09:57.053
because your resolution is not as good
0:09:57.053,0:10:00.031
it may take longer to figure out what it is
0:10:00.031,0:10:05.019
So, what’s sort of the high-order bit from this tradeoff
0:10:05.019,0:10:07.024
is you don’t really want to rely
0:10:07.024,0:10:08.076
too heavily on any one kind of test
0:10:08.076,0:10:10.078
They serve different purposes and, depending on
0:10:10.078,0:10:13.043
are you trying to exercise your interfaces more
0:10:13.043,0:10:15.089
or are you trying to improve your fine-grain coverage
0:10:15.089,0:10:18.003
that affects how you develop your test suite
0:10:18.003,0:10:20.065
and you’ll evolve it along with your software
0:10:20.065,0:10:24.014
So, we’ve used a certain set of terminology in testing
0:10:24.014,0:10:26.028
It’s the terminology that, by and large
0:10:26.028,0:10:29.001
is most commonly used in the Rails community
0:10:29.001,0:10:30.060
but there’s some variation
0:10:30.060,0:10:33.069
[and] some other terms that you might hear
0:10:33.069,0:10:35.018
if you go get a job somewhere
0:10:35.018,0:10:36.093
and you hear about mutation testing
0:10:36.093,0:10:38.072
which we haven’t done
0:10:38.072,0:10:40.024
This is an interesting idea that was, I think, invented by
0:10:40.024,0:10:43.037
Ammann and Offutt, who have, sort of
0:10:43.037,0:10:44.093
the definitive book on software testing
0:10:44.093,0:10:46.048
The idea is:
0:10:46.048,0:10:48.000
Suppose I introduced a deliberate bug into my code
0:10:48.000,0:10:49.051
does that force some test to fail?
0:10:49.051,0:10:53.003
Because, if I changed, you know, “if x” to “if not x”
0:10:53.003,0:10:56.010
and no tests fail, then either I’m missing some coverage
0:10:56.010,0:10:59.019
or my app is very strange and somehow nondeterministic
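[Editor's sketch] A hand-made illustration of the mutation idea: flip one condition and see whether any test notices. The lambdas and the two toy suites are assumptions for the example; real mutation tools generate and run the mutants automatically.

```ruby
# Original code and a deliberate mutant with the condition negated
# ("if x" changed to "if not x").
original = ->(age) { (age >= 18) ? :adult : :minor }
mutant   = ->(age) { !(age >= 18) ? :adult : :minor }

# A thorough suite, written as a function of the implementation under test.
thorough_suite = lambda do |impl|
  impl.call(30) == :adult && impl.call(10) == :minor
end

# A weak suite: it runs the code but barely checks the result.
weak_suite = ->(impl) { [:adult, :minor].include?(impl.call(30)) }

original_passes      = thorough_suite.call(original)   # real code is accepted
mutant_killed        = !thorough_suite.call(mutant)    # mutant makes a test fail
mutant_survives_weak = weak_suite.call(mutant)         # weak suite misses the bug
```

Both suites execute the same line, so a C0 report would rate them equally; only the mutant tells them apart.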
0:10:59.019,0:11:03.099
Fuzz testing, which Koushik Sen may talk more about
0:11:03.099,0:11:07.085
basically, this is the “10,000 monkeys at typewriters
0:11:07.085,0:11:09.024
throwing random input at your code”
0:11:09.024,0:11:10.037
What’s interesting about it is that
0:11:10.037,0:11:11.065
those tests we’ve been doing
0:11:11.065,0:11:13.086
essentially are crafted to test the app
0:11:13.086,0:11:15.058
the way it was designed
0:11:15.058,0:11:16.088
and these, you know, fuzz testing
0:11:16.088,0:11:19.064
is about testing the app in ways it wasn’t meant to be used
0:11:19.064,0:11:22.098
So, what happens if you throw enormous form submissions
0:11:22.098,0:11:25.036
What happens if you put control characters in your forms?
0:11:25.036,0:11:27.062
What happens if you submit the same thing over and over?
0:11:27.062,0:11:29.093
And, Koushik has a statistic that
0:11:29.093,0:11:32.033
Microsoft finds up to 20% of their bugs
0:11:32.033,0:11:34.064
using some variation of fuzz testing
0:11:34.064,0:11:36.029
and that about 25%
0:11:36.029,0:11:39.021
of the common Unix command-line programs
0:11:39.021,0:11:40.092
can be made to crash
0:11:40.092,0:11:44.018
[when] put through aggressive fuzz testing
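[Editor's sketch] A very small random fuzzer, to show the shape of the technique rather than any production tool. The function parse_pair is hypothetical, and the choice of 1,000 inputs of up to 50 ASCII characters is arbitrary.

```ruby
# Hypothetical function under test: parses "key=value" form input.
def parse_pair(input)
  key, value = input.split('=', 2)
  raise ArgumentError, 'missing key' if key.nil? || key.empty?
  [key, value.to_s]
end

rng = Random.new(42)  # fixed seed so any failure can be replayed
crashes = []

1_000.times do
  length = rng.rand(0..50)
  # Random bytes 0..127, which includes control characters.
  input = Array.new(length) { rng.rand(0..127).chr }.join
  begin
    parse_pair(input)
  rescue ArgumentError
    # Cleanly rejecting bad input is acceptable behaviour.
  rescue StandardError => e
    crashes << [input, e.class]  # anything else is a finding worth a real test
  end
end
```

Any entry that ends up in crashes is an input the code was not designed for; the natural next step is to turn it into a regular, repeatable test case.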
0:11:44.018,0:11:46.089
Defining-use coverage is something that we haven’t done
0:11:46.089,0:11:48.089
but it’s another interesting concept
0:11:48.089,0:11:50.089
The idea is that at any point in my program
0:11:50.089,0:11:52.062
there’s a place where I define—
0:11:52.062,0:11:54.046
or I assign a value to some variable—
0:11:54.046,0:11:56.000
and then there’s a place downstream
0:11:56.000,0:11:57.075
where presumably I’m going to consume that value—
0:11:57.075,0:11:59.058
someone’s going to use that value
0:11:59.058,0:12:01.013
Have I covered every pair?
0:12:01.013,0:12:02.059
In other words, do I have tests where every pair
0:12:02.059,0:12:04.054
of defining a variable and using it somewhere
0:12:04.054,0:12:07.014
is executed at some point in my test suite?
0:12:07.014,0:12:10.071
It’s sometimes called DU-coverage
0:12:10.071,0:12:14.011
And other terms that I think are not as widely used anymore
0:12:14.011,0:12:17.071
blackbox versus whitebox, or blackbox versus glassbox
0:12:17.071,0:12:20.025
Roughly, a blackbox test is one that is written from
0:12:20.025,0:12:22.041
the point of view of the external specification of the thing
0:12:22.041,0:12:24.022
[For example:] “This is a hash table
0:12:24.022,0:12:26.015
When I put in a key I should get back a value
0:12:26.015,0:12:28.011
If I delete the key the value shouldn’t be there”
0:12:28.011,0:12:29.099
That’s a blackbox test because it doesn’t say
0:12:29.099,0:12:32.028
anything about how the hash table is implemented
0:12:32.028,0:12:34.072
and it doesn’t try to stress the implementation
0:12:34.072,0:12:36.056
A corresponding whitebox test might be:
0:12:36.056,0:12:38.008
“I know something about the hash function
0:12:38.008,0:12:39.098
and I’m going to deliberately create
0:12:39.098,0:12:41.088
hash keys in my test cases
0:12:41.088,0:12:43.078
that cause a lot of hash collisions
0:12:43.078,0:12:45.095
to make sure that I’m testing that part of the functionality”
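[Editor's sketch] The hash table example can be made concrete with a toy implementation. TinyHashTable and its deliberately naive hash function are assumptions for illustration; the blackbox checks use only put, get and delete, while the whitebox check exploits knowledge of the hash function to force a collision.

```ruby
# Hypothetical toy hash table using chaining.
class TinyHashTable
  def initialize(size = 8)
    @buckets = Array.new(size) { [] }
  end

  # Naive hash: String#sum adds up the bytes, so anagrams collide.
  def bucket_index(key)
    key.sum % @buckets.size
  end

  def put(key, value)
    bucket = @buckets[bucket_index(key)]
    pair = bucket.find { |k, _| k == key }
    if pair
      pair[1] = value
    else
      bucket << [key, value]
    end
  end

  def get(key)
    pair = @buckets[bucket_index(key)].find { |k, _| k == key }
    pair && pair[1]
  end

  def delete(key)
    @buckets[bucket_index(key)].reject! { |k, _| k == key }
  end
end

table = TinyHashTable.new

# Blackbox: only the external specification is used.
table.put("apple", 1)
value_after_put = table.get("apple")
table.delete("apple")
value_after_delete = table.get("apple")

# Whitebox: "ab" and "ba" have the same byte sum, so they share a bucket.
table.put("ab", :first)
table.put("ba", :second)
keys_collide   = table.bucket_index("ab") == table.bucket_index("ba")
collision_safe = table.get("ab") == :first && table.get("ba") == :second
```

Without the last four lines, the branch of put and get that walks a bucket holding more than one pair would only be exercised by luck.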
0:12:45.095,0:12:49.007
Now, a C0 test coverage tool, like SimpleCov
0:12:49.007,0:12:52.001
would reveal that if all you had is blackbox tests
0:12:52.001,0:12:53.028
you might find that
0:12:53.028,0:12:55.056
the collision coverage code wasn’t being hit very often
0:12:55.056,0:12:56.075
And that might tip you off and say:
0:12:56.075,0:12:58.028
“Ok, if I really want to strengthen that—
0:12:58.028,0:13:00.008
for one, if I want to boost coverage for those tests
0:13:00.008,0:13:02.006
now I have to write a whitebox or a glassbox test
0:13:02.006,0:13:04.057
I have to look inside, see what the implementation does
0:13:04.057,0:13:05.061
and find specific ways
0:13:05.061,0:13:10.060
to try to break the implementation in evil ways”
0:13:10.060,0:13:13.075
So, I think, testing is a kind of a way of life, right?
0:13:13.075,0:13:16.069
We’ve gotten away from the phase of
0:13:16.069,0:13:18.033
“We’d build the whole thing and then we’d test it”
0:13:18.033,0:13:19.092
and we’ve gotten into the phase of
0:13:19.092,0:13:20.077
“We’re testing as we go”
0:13:20.077,0:13:22.048
Testing is really more like a development tool
0:13:22.048,0:13:24.022
and like so many development tools
0:13:24.022,0:13:25.062
the effectiveness of it depends
0:13:25.062,0:13:27.013
on whether you’re using it in a tasteful manner
0:13:27.013,0:13:31.002
So, you could say: “Well, let’s see—I kicked the tires
0:13:31.002,0:13:33.048
You know, I fired up the browser, I tried a couple of things
0:13:33.048,0:13:35.097
(claps hand) Looks like it works! Deploy it!”
0:13:35.097,0:13:38.045
That’s obviously a little more cavalier than you’d want to be
0:13:38.045,0:13:41.024
And, by the way, one of the things that we discovered
0:13:41.024,0:13:43.077
with this online course just starting up
0:13:43.077,0:13:45.090
when 60,000 people are enrolled in the course
0:13:45.090,0:13:48.099
and 0.1% of those people have a problem
0:13:48.099,0:13:50.083
you’d get 60 emails
0:13:50.083,0:13:53.078
The corollary is: when your site is used by a lot of people
0:13:53.078,0:13:55.089
some stupid bug that you didn’t find
0:13:55.089,0:13:57.018
but that could have been found by testing
0:13:57.018,0:13:59.080
could very quickly generate *a lot* of pain
0:13:59.080,0:14:02.023
On the other hand, you don’t want to be dogmatic and say
0:14:02.023,0:14:04.056
“Uh, until we have 100% coverage and every test is green
0:14:04.056,0:14:06.005
we absolutely will not ship”
0:14:06.005,0:14:07.012
That’s not healthy either
0:14:07.012,0:14:08.048
And the test quality
0:14:08.048,0:14:10.057
doesn’t necessarily correlate with the statement
0:14:10.057,0:14:11.064
unless you can say something
0:14:11.064,0:14:12.068
about the quality of your tests
0:14:12.068,0:14:14.029
just because you’ve executed every line
0:14:14.029,0:14:17.010
doesn’t mean that you’ve tested the interesting cases
0:14:17.010,0:14:18.068
So, somewhere in between, you could say
0:14:18.068,0:14:20.014
“Well, we’ll use coverage tools to identify
0:14:20.014,0:14:23.004
undertested or poorly-tested parts of the code
0:14:23.004,0:14:24.073
and we’ll use them as a guideline
0:14:24.073,0:14:27.011
to sort of help improve our overall confidence level”
0:14:27.011,0:14:29.024
But remember, Agile is about embracing change
0:14:29.024,0:14:30.032
and dealing with it
0:14:30.032,0:14:32.002
Part of change is that things will change that will cause
0:14:32.002,0:14:33.038
bugs that you didn’t foresee
0:14:33.038,0:14:34.031
and the right reaction is:
0:14:34.031,0:14:36.026
Be comfortable enough with the testing tools
0:14:36.026,0:14:37.064
[so] that you can quickly find those bugs
0:14:37.064,0:14:39.025
Write a test that reproduces that bug
0:14:39.025,0:14:40.062
And then make the test green
0:14:40.062,0:14:41.061
Then you’ve really fixed it
0:14:41.061,0:14:43.004
That means, the way that you really fix a bug is
0:14:43.004,0:14:45.049
if you created a test that correctly failed
0:14:45.049,0:14:46.088
to reproduce that bug
0:14:46.088,0:14:48.055
and then you went back and fixed the code
0:14:48.055,0:14:49.057
to make those tests pass
0:14:49.057,0:14:51.073
Similarly, you don’t want to say
0:14:51.073,0:14:53.036
“Well, unit tests give you better coverage
0:14:53.036,0:14:54.073
They’re more thorough and detailed
0:14:54.073,0:14:56.044
So let’s focus all our energy on that”
0:14:56.044,0:14:57.062
as opposed to
0:14:57.062,0:14:58.093
“Oh, focus on integration tests
0:14:58.093,0:15:00.006
because they’re more realistic, right?
0:15:00.006,0:15:01.056
They reflect what the customer said they want
0:15:01.056,0:15:03.034
So, if the integration tests are passing
0:15:03.034,0:15:05.067
by definition we’re meeting a customer need”
0:15:05.067,0:15:07.034
Again, both extremes are kind of unhealthy
0:15:07.034,0:15:09.079
because each one of these can find problems
0:15:09.079,0:15:11.031
that would be missed by the other
0:15:11.031,0:15:12.060
So, having a good combination of them
0:15:12.060,0:15:15.042
is kind of what it is all about
0:15:15.042,0:15:18.072
The last thing I want to leave you with is, I think
0:15:18.072,0:15:20.036
in terms of testing, is “TDD versus
0:15:20.036,0:15:22.005
what I call conventional debugging—
0:15:22.005,0:15:24.004
i.e., the way that we all kind of do it
0:15:24.004,0:15:25.051
even though we say we don’t”
0:15:25.051,0:15:26.064
and we’re all trying to get better, right?
0:15:26.064,0:15:27.085
We’re all kind of in the gutter
0:15:27.085,0:15:29.036
Some of us are looking up at the stars
0:15:29.036,0:15:31.011
trying to improve our practices
0:15:31.011,0:15:33.099
But, having now lived with this for 3 or 4 years myself
0:15:33.099,0:15:35.091
and—I’ll be honest—3 years ago I didn’t do TDD
0:15:35.091,0:15:37.079
I do it now, because I find that it’s better
0:15:37.079,0:15:40.081
and here’s my distillation of why I think it works for me
0:15:40.081,0:15:43.032
Sorry, the colours are a little weird
0:15:43.032,0:15:45.000
but on the left column of the table
0:15:45.000,0:15:46.034
[it] says “Conventional debugging”
0:15:46.034,0:15:47.044
and the right side says “TDD”
0:15:47.044,0:15:49.069
So what’s the way I used to write code?
0:15:49.069,0:15:51.056
Maybe some of you still do this
0:15:51.056,0:15:53.013
I write a whole bunch of lines
0:15:53.013,0:15:54.043
maybe a few tens of lines of code
0:15:54.043,0:15:55.059
I’m sure they’re right—
0:15:55.059,0:15:56.061
I mean, I am a good programmer, right?
0:15:56.061,0:15:57.099
This is not that hard
0:15:57.099,0:15:59.002
I run it – It doesn’t work
0:15:59.002,0:16:01.098
Ok, fire up the debugger – Start putting in printf’s
0:16:01.098,0:16:04.088
If I’d been using TDD what would I do instead?
0:16:04.088,0:16:08.022
Well I’d write a few lines of code, having written a test first
0:16:08.022,0:16:10.071
So as soon as the test goes from red to green
0:16:10.071,0:16:12.064
I know I wrote code that works—
0:16:12.064,0:16:15.013
or at least the parts of the behaviour that I had in mind
0:16:15.013,0:16:16.096
Those parts of the behaviour work, because I had a test
0:16:16.096,0:16:19.056
Ok, back to conventional debugging:
0:16:19.056,0:16:21.073
I’m running my program, trying to find the bugs
0:16:21.073,0:16:23.028
I start putting in printf’s everywhere
0:16:23.028,0:16:24.062
to print out the values of things
0:16:24.062,0:16:25.064
which by the way is a lot of fun
0:16:25.064,0:16:26.073
when you’re trying to read them
0:16:26.073,0:16:28.014
out of the 500 lines of log output
0:16:28.014,0:16:29.035
that you’d get in a Rails app
0:16:29.035,0:16:30.087
trying to find your printf’s
0:16:30.087,0:16:32.035
you know, “I know what I’ll do—
0:16:32.035,0:16:34.008
I’ll put in 75 asterisks before and after
0:16:34.008,0:16:36.043
That will make it readable” (laughter)
0:16:36.043,0:16:38.071
Who don’t—Ok, raise your hands if you don’t do this!
0:16:38.071,0:16:40.090
Thank you for your honesty. (laughter) Ok.
0:16:40.090,0:16:43.014
Or— Or I could do the other thing, I could say:
0:16:43.014,0:16:45.030
Instead of printing the value of a variable
0:16:45.030,0:16:47.039
why don’t I write a test that inspects it
0:16:47.039,0:16:48.079
or sets an expectation with “should”
0:16:48.079,0:16:50.090
and I’ll know immediately in bright red letters
0:16:50.090,0:16:53.033
if that expectation wasn’t met
0:16:53.033,0:16:56.005
Ok, I’m back on the conventional debugging side:
0:16:56.005,0:16:58.090
I break out the big guns: I pull out the Ruby debugger
0:16:58.092,0:17:02.044
I set a debug breakpoint, and I now start tweaking and say
0:17:02.044,0:17:04.085
“Oh, let’s see, I have to get past that ‘if’ statement
0:17:04.085,0:17:06.002
so I have to set that thing
0:17:06.002,0:17:07.063
Oh, I have to call that method and so I need to…”
0:17:07.063,0:17:08.065
No!
0:17:08.065,0:17:10.087
I could instead—if I’m going to do that anyway—
0:17:10.087,0:17:13.000
let’s just do it in a file, set up some mocks and stubs
0:17:13.000,0:17:16.045
to control the code path, make it go the way I want
0:17:16.045,0:17:19.013
And now, “Ok, for sure I’ve fixed it!
0:17:19.013,0:17:22.012
I’ll get out of the debugger, run it all again!”
0:17:22.012,0:17:24.022
And, of course, 9 times out of 10, you didn’t fix it
0:17:24.022,0:17:26.072
or you kind of partly fixed it but you didn’t completely fix it
0:17:26.072,0:17:30.040
and now I have to do all these manual things all over again
0:17:30.040,0:17:32.086
or I already have a bunch of tests
0:17:32.086,0:17:34.031
and I can just rerun them automatically
0:17:34.031,0:17:35.056
and, if some of them fail, I can say
0:17:35.056,0:17:36.087
“Oh, I didn’t fix the whole thing
0:17:36.087,0:17:38.040
No problem, I’ll just go back!”
0:17:38.040,0:17:39.096
So, the bottom line is that
0:17:39.096,0:17:41.095
you know, you could do it on the left side
0:17:41.095,0:17:45.004
but you’re using the same techniques in both cases
0:17:45.004,0:17:48.062
The only difference is, in one case you’re doing it manually
0:17:48.062,0:17:50.004
which is boring and error-prone
0:17:50.004,0:17:51.078
In the other case you’re doing a little more work
0:17:51.078,0:17:53.095
but you can make it automatic and repeatable
0:17:53.095,0:17:55.071
and have, you know, some high confidence
0:17:55.071,0:17:57.003
that as you change things in your code
0:17:57.003,0:17:58.092
you are not breaking stuff that used to work
0:17:58.092,0:18:00.091
and basically it’s more productive
0:18:00.091,0:18:02.047
So you’re doing all the same things
0:18:02.047,0:18:04.037
but with a, kind of, “delta” of extra work
0:18:04.037,0:18:07.086
you are using your effort at a much higher leverage
0:18:07.086,0:18:10.036
So that’s kind of my view of why TDD is a good thing
0:18:10.036,0:18:11.088
Really, it doesn’t require new skills
0:18:11.088,0:18:15.011
It just requires you to refactor your existing skills
0:18:15.011,0:18:18.014
I also tried when I—again, honest confessions, right?—
0:18:18.014,0:18:19.034
when I started doing this it was like
0:18:19.034,0:18:21.049
“Ok, I’m gonna be teaching a course on Rails
0:18:21.049,0:18:22.065
I should really focus on testing.”
0:18:22.065,0:18:24.032
So I went back to some code I had written
0:18:24.032,0:18:26.087
that was working—you know, that was decent code—
0:18:26.087,0:18:29.006
and I started trying to write tests for it
0:18:29.006,0:18:31.019
and it was *so painful*
0:18:31.019,0:18:33.033
because the code wasn’t written in a way that was testable
0:18:33.033,0:18:34.097
There were all kinds of interactions
0:18:34.097,0:18:36.038
There were, like, nested conditionals
0:18:36.038,0:18:38.083
And if you wanted to isolate a particular statement
0:18:38.083,0:18:41.070
and have a test trigger just that statement
0:18:41.070,0:18:44.000
the amount of stuff you’d have to set up in your test
0:18:44.000,0:18:45.009
to have it happen—
0:18:45.009,0:18:46.040
remember when we talked about mock train wrecks?
0:18:46.040,0:18:48.014
you have to set up all this infrastructure
0:18:48.014,0:18:49.063
just to reach one line of code
0:18:49.063,0:18:51.015
and you do that and you go
0:18:51.015,0:18:52.074
“Gawd, testing is really not worth it!
0:18:52.074,0:18:54.034
I wrote 20 lines of setup
0:18:54.034,0:18:56.059
so that I could test two lines in my function!”
0:18:56.059,0:18:58.085
What that’s really telling you—as I now realize—
0:18:58.085,0:19:00.042
is your function is bad
0:19:00.042,0:19:01.049
It’s a badly written function
0:19:01.049,0:19:02.052
It’s not a testable function
0:19:02.052,0:19:03.088
It’s got too many moving parts
0:19:03.088,0:19:06.026
whose dependencies can be broken
0:19:06.026,0:19:07.070
There are no seams in my function
0:19:07.070,0:19:11.008
that allow me to individually test the different behaviours
0:19:11.008,0:19:12.083
And once you start doing Test First Development
0:19:12.083,0:19:15.043
because you have to write your tests in small chunks
0:19:15.043,0:19:17.053
it kind of makes this problem go away
0:19:17.053,9:59:59.000
So that’s been my epiphany