WEBVTT
00:00:00.057 --> 00:00:01.092
So we spent a bunch of time
00:00:01.092 --> 00:00:03.032
in the last couple of lectures
00:00:03.032 --> 00:00:05.082
talking about different kinds of testing
00:00:05.082 --> 00:00:08.021
about unit testing versus integration testing
00:00:08.021 --> 00:00:10.010
We talked about how do you use RSpec
00:00:10.010 --> 00:00:12.049
to really isolate the parts of your code you want to test
00:00:12.049 --> 00:00:14.090
you’ve also, you know, because of homework 3,
00:00:14.090 --> 00:00:18.017
and other stuff, we have been doing BDD,
00:00:18.017 --> 00:00:20.062
where we’ve been using Cucumber to turn user stories
00:00:20.062 --> 00:00:22.095
into, essentially, integration and acceptance tests
00:00:22.095 --> 00:00:25.061
So you’ve seen testing in a couple of different levels
00:00:25.061 --> 00:00:27.063
and the goal here is sort of to make a few remarks
00:00:27.063 --> 00:00:29.092
to, you know, let’s back up a little bit
00:00:29.092 --> 00:00:33.001
and see the big picture, and tie those things together
00:00:33.001 --> 00:00:34.095
So this sort of spans material
00:00:34.095 --> 00:00:37.000
that covers three or four sections in the book
00:00:37.000 --> 00:00:39.061
and I want to just hit the high points in lecture
00:00:39.061 --> 00:00:41.046
So a question that comes up
00:00:41.046 --> 00:00:43.025
I’m sure it’s come up for all of you
00:00:43.025 --> 00:00:44.052
as you have been doing homework
00:00:44.052 --> 00:00:45.069
is: “How much testing is enough?”
00:00:45.069 --> 00:00:48.049
And, sadly, for a long time
00:00:48.049 --> 00:00:51.009
kind of if you asked this question in industry
00:00:51.009 --> 00:00:52.017
the answer was basically
00:00:52.017 --> 00:00:53.017
“Well, we have a shipping deadline,
00:00:53.017 --> 00:00:54.099
so however much testing we can do
00:00:54.099 --> 00:00:56.066
before that deadline, that’s how much.”
00:00:56.066 --> 00:00:58.015
That’s what you have time for.
00:00:58.015 --> 00:01:00.002
So, you know, that’s a little flip
00:01:00.002 --> 00:01:01.011
obviously not very good
00:01:01.011 --> 00:01:02.054
So you can do a bit better, right?
00:01:02.054 --> 00:01:03.070
There’re some static measures
00:01:03.070 --> 00:01:06.003
like how many lines of code does your app have
00:01:06.003 --> 00:01:08.021
and how many lines of tests do you have?
00:01:08.021 --> 00:01:10.029
And it’s not unusual in industry
00:01:10.029 --> 00:01:12.068
in a well-tested piece of software
00:01:12.068 --> 00:01:14.057
for the number of lines of tests
00:01:14.057 --> 00:01:17.073
to go far beyond the number of lines of code
00:01:17.073 --> 00:01:19.075
So, integer multiples are not unusual
00:01:19.075 --> 00:01:21.084
And I think even for sort of, you know,
00:01:21.084 --> 00:01:23.022
research code or classwork
00:01:23.022 --> 00:01:26.085
a ratio of, you know, maybe 1.5 is not unreasonable
00:01:26.085 --> 00:01:30.005
so one and a half times the amount of test code
00:01:30.005 --> 00:01:32.024
as you have application code
00:01:32.024 --> 00:01:34.022
And in a lot of production systems
00:01:34.022 --> 00:01:35.027
where they really care about testing
00:01:35.027 --> 00:01:36.091
it is much higher than that
00:01:36.091 --> 00:01:38.015
So maybe a better question to ask:
00:01:38.015 --> 00:01:39.047
Rather than saying “How much testing is enough?”
00:01:39.047 --> 00:01:42.049
is to ask “How good is the testing I am doing now?
00:01:42.049 --> 00:01:44.035
How thorough is it?”
00:01:44.035 --> 00:01:45.056
Later in this semester
00:01:45.056 --> 00:01:46.056
Professor Sen will talk
00:01:46.056 --> 00:01:48.018
a little bit about formal methods
00:01:48.018 --> 00:01:50.085
and sort of what’s at the frontiers of testing and debugging
00:01:50.085 --> 00:01:52.068
But a couple of things that we can talk about
00:01:52.068 --> 00:01:54.007
based on what you already know
00:01:54.007 --> 00:01:57.074
is some basic concepts about test coverage
00:01:57.074 --> 00:01:59.054
And although I would say
00:01:59.054 --> 00:02:01.001
you know, we’ve been saying all along
00:02:01.001 --> 00:02:03.003
formal methods, they don’t really work on big systems
00:02:03.003 --> 00:02:05.033
I think that statement, in my personal opinion
00:02:05.033 --> 00:02:07.001
is actually a lot less true than it used to be
00:02:07.001 --> 00:02:09.019
I think there are a number of specific places
00:02:09.019 --> 00:02:10.052
especially in testing and debugging
00:02:10.052 --> 00:02:12.084
where formal methods are actually making fast progress
00:02:12.084 --> 00:02:15.075
and Koushik Sen is one of the leaders in that
00:02:15.075 --> 00:02:17.094
So you’ll have the opportunity to hear more about that later
00:02:17.094 --> 00:02:21.043
but for the moment I think, kind of bread and butter
00:02:21.043 --> 00:02:22.073
is let’s talk about coverage measurement
00:02:22.073 --> 00:02:24.047
because this is where the rubber meets the road
00:02:24.047 --> 00:02:26.020
in terms of how you’d be evaluated
00:02:26.020 --> 00:02:28.063
if you are doing this for real
00:02:28.063 --> 00:02:29.052
So what’s some basics?
00:02:29.052 --> 00:02:30.078
Here’s a really simple class you can use
00:02:30.078 --> 00:02:32.090
to talk about different ways to measure
00:02:32.090 --> 00:02:34.080
how well our tests cover this code
00:02:34.080 --> 00:02:36.063
And there’re a few different levels
00:02:36.063 --> 00:02:37.085
with different terminologies
00:02:37.085 --> 00:02:40.073
It’s not really universal across all software houses
00:02:40.073 --> 00:02:42.064
But one common set of terminology
00:02:42.064 --> 00:02:43.064
that the book exposes
00:02:43.064 --> 00:02:44.068
is we could talk about S0
00:02:44.068 --> 00:02:47.045
where we’d just mean you’ve called every method once
00:02:47.045 --> 00:02:50.045
So you know, if you call foo, and you call bar, you’re done
00:02:50.045 --> 00:02:52.015
That’s S0 coverage: not terribly thorough
00:02:52.015 --> 00:02:54.068
A little more stringent, S1, is
00:02:54.068 --> 00:02:56.013
you could say, we’re calling every method
00:02:56.013 --> 00:02:57.028
from every place that it could be called
00:02:57.028 --> 00:02:58.082
So what does that mean?
00:02:58.082 --> 00:03:00.007
It means, for example
00:03:00.007 --> 00:03:01.012
it’s not enough to call bar
00:03:01.012 --> 00:03:02.095
You have to make sure that you call it
00:03:02.095 --> 00:03:05.057
at least once from in here
00:03:05.057 --> 00:03:07.016
as well as calling it once
00:03:07.016 --> 00:03:10.037
from any exterior function that might call it
00:03:10.037 --> 00:03:12.081
C0 which is what SimpleCov measures
00:03:12.081 --> 00:03:15.099
(those of you who’ve gotten SimpleCov up and running)
00:03:15.099 --> 00:03:18.052
basically says you’ve executed every statement
00:03:18.052 --> 00:03:20.004
you’ve touched every statement in your code once
00:03:20.004 --> 00:03:22.048
But the caveat there is that
00:03:22.048 --> 00:03:25.058
conditionals really just count as a single statement
00:03:25.058 --> 00:03:28.091
So, no matter which branch of this “if” you took
00:03:28.091 --> 00:03:31.074
as long as you touched one or the other branch
00:03:31.074 --> 00:03:33.035
you’ve executed the “if” statement
00:03:33.035 --> 00:03:35.066
So even C0 is still, you know, sort of superficial coverage
00:03:35.066 --> 00:03:37.026
But, as we will see
00:03:37.026 --> 00:03:39.023
the way that you will want to read this information is:
00:03:39.023 --> 00:03:41.079
if you are getting bad coverage at the C0 level
00:03:41.079 --> 00:03:44.007
then you have really really bad coverage
00:03:44.007 --> 00:03:46.008
So if you are not kind of making
00:03:46.008 --> 00:03:47.037
this simple level of superficial coverage
00:03:47.037 --> 00:03:50.002
then your testing is probably deficient
00:03:50.002 --> 00:03:51.091
C1 is the next step up from that
00:03:51.091 --> 00:03:53.071
We could say:
00:03:53.071 --> 00:03:55.019
Well, we have to take every branch in both directions
00:03:55.019 --> 00:03:56.061
So, when we are doing this “if” statement
00:03:56.061 --> 00:03:58.066
we have to make sure that
00:03:58.066 --> 00:03:59.092
we do the “if x” part once
00:03:59.092 --> 00:04:05.013
and the “if not x” part at least once to meet C1
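
NOTE
Editor's sketch, not the class from the lecture slide: a minimal, hypothetical
Ruby method (shipping_cost is an invented name) suggesting how a single test
can execute every statement (C0) while leaving one branch direction untested,
which is the gap that C1 is meant to close.
```ruby
# Hypothetical example; the name and the numbers are assumptions.
def shipping_cost(weight, express)
  cost = weight * 2
  cost += 10 if express   # one statement, but two possible branch outcomes
  cost
end
# A single call runs every statement, so C0 is already satisfied.
c0_only = [shipping_cost(3, true)]
# C1 also needs a call where the condition is false.
c1 = [shipping_cost(3, true), shipping_cost(3, false)]
```
With only the first call, a statement-level report would show every line as
covered, even though the non-express path was never exercised.
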
00:04:05.013 --> 00:04:08.036
You can augment that with decision coverage
00:04:08.036 --> 00:04:09.063
saying: Well, if we’re gonna…
00:04:09.063 --> 00:04:12.036
If we have “if” statements where the condition
00:04:12.036 --> 00:04:13.088
is made up of multiple terms
00:04:13.088 --> 00:04:15.071
we have to make sure that every subexpression
00:04:15.071 --> 00:04:17.097
has been evaluated in both directions
00:04:17.097 --> 00:04:19.067
In other words, that means that
00:04:19.067 --> 00:04:22.041
if we’re going to fail this “if” statement
00:04:22.041 --> 00:04:24.034
we have to make sure to fail it at least once
00:04:24.034 --> 00:04:26.044
because y was false and at least once because z was false
00:04:26.044 --> 00:04:28.088
In other words, any subexpression that could
00:04:28.088 --> 00:04:31.021
independently change the outcome of the condition
00:04:31.021 --> 00:04:34.048
has to be exercised in both directions
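
NOTE
Editor's sketch under stated assumptions: allowed? is a hypothetical method
with a two-term condition, standing in for the "if" on the slide. The three
cases below are one possible set in which the condition passes once, fails
once because of y alone, and fails once because of z alone.
```ruby
# Hypothetical method; y and z mirror the variable names used in the lecture.
def allowed?(y, z)
  if y && z
    :allowed
  else
    :denied
  end
end
# Each key is an input pair [y, z]; each value is the expected result.
cases = {
  [true, true]  => :allowed,  # condition succeeds
  [false, true] => :denied,   # fails because y is false
  [true, false] => :denied    # fails because z is false
}
results = cases.map { |(y, z), expected| allowed?(y, z) == expected }
```
Plain C1 would be satisfied by the first case plus either of the other two;
the third case is what the stricter criterion adds.
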
00:04:34.048 --> 00:04:36.003
And then,
00:04:36.003 --> 00:04:38.052
kind of, the one that, you know, a lot of people aspire to
00:04:38.052 --> 00:04:41.026
but there is disagreement on how much more valuable it is
00:04:41.026 --> 00:04:42.083
is you take every path through the code
00:04:42.083 --> 00:04:45.053
Obviously, this is kind of difficult because
00:04:45.053 --> 00:04:48.033
it tends to be exponential in the number of conditions
00:04:48.033 --> 00:04:53.008
And in general it’s difficult
00:04:53.008 --> 00:04:55.031
to evaluate if you’ve taken every path through the code
00:04:55.031 --> 00:04:57.001
There are formal techniques that you can use
00:04:57.001 --> 00:04:58.083
to tell you where the holes are
00:04:58.083 --> 00:05:01.031
but the bottom line is that
00:05:01.031 --> 00:05:03.004
in most commercial software houses
00:05:03.004 --> 00:05:04.089
there is, I would say, not complete consensus
00:05:04.089 --> 00:05:06.070
on how much more valuable C2 is
00:05:06.070 --> 00:05:08.068
compared to C0 or C1
00:05:08.068 --> 00:05:10.013
So, I think, for the purpose of our class
00:05:10.013 --> 00:05:11.067
you get exposed to the idea
00:05:11.067 --> 00:05:13.020
of how you use coverage information
00:05:13.020 --> 00:05:16.040
SimpleCov takes advantage of some built-in Ruby features
00:05:16.040 --> 00:05:18.009
to give you C0 coverage
00:05:18.009 --> 00:05:19.062
[It] does really nice reports
00:05:19.062 --> 00:05:21.025
We can sort of see it
00:05:21.025 --> 00:05:22.096
at the level of individual lines in your file
00:05:22.096 --> 00:05:24.091
You can see what your coverage is
00:05:24.091 --> 00:05:27.015
and I think that’s kind of a, you know
00:05:27.015 --> 00:05:31.018
a good start for where we are
00:05:31.018 --> 00:05:33.076
So, having seen sort of different flavours of tests
00:05:33.076 --> 00:05:37.020
Stepping back and looking back at the big picture
00:05:37.020 --> 00:05:38.098
what are the different kinds of tests
00:05:38.098 --> 00:05:40.078
that we’ve seen concretely?
00:05:40.078 --> 00:05:42.032
and what are the tradeoffs
00:05:42.032 --> 00:05:43.089
between using those different kinds of tests?
00:05:43.089 --> 00:05:47.016
So we’ve seen at the level of individual classes or methods
00:05:47.016 --> 00:05:50.009
we use RSpec, with extensive use of mocking and stubbing
00:05:50.009 --> 00:05:53.004
So, for example, when we test methods in the model
00:05:53.004 --> 00:05:55.057
that will be an example of unit testing
00:05:55.057 --> 00:05:59.025
We also did something that is pretty similar to
00:05:59.025 --> 00:06:00.097
functional or module testing
00:06:00.097 --> 00:06:02.071
where there is more than one module participating
00:06:02.071 --> 00:06:04.065
So, for example when we did controller specs
00:06:04.065 --> 00:06:07.085
we saw that—we simulate a POST action
00:06:07.085 --> 00:06:09.029
but remember that the POST action
00:06:09.029 --> 00:06:10.086
has to go through the routing subsystem
00:06:10.086 --> 00:06:12.042
before it gets to the controller
00:06:12.042 --> 00:06:14.048
Once the controller is done it will try to render a view
00:06:14.048 --> 00:06:16.007
So in fact there’s other pieces
00:06:16.007 --> 00:06:17.067
that collaborate with the controller
00:06:17.067 --> 00:06:19.099
that have to be working in order for controller specs to pass
00:06:19.099 --> 00:06:21.051
So that’s somewhere in between:
00:06:21.051 --> 00:06:23.035
where we’re doing more than a single method
00:06:23.035 --> 00:06:25.000
touching more than a single class
00:06:25.000 --> 00:06:27.000
but we’re still concentrating [our] attention
00:06:27.000 --> 00:06:28.088
on a fairly narrow slice of the system at a time
00:06:28.088 --> 00:06:31.044
and we’re still using mocking and stubbing extensively
00:06:31.044 --> 00:06:35.030
to sort of isolate that behaviour that we want to test
00:06:35.030 --> 00:06:36.091
And then at the level of Cucumber scenarios
00:06:36.091 --> 00:06:38.047
these are more like integration or system tests
00:06:38.047 --> 00:06:41.069
They exercise complete paths throughout the application
00:06:41.069 --> 00:06:43.044
They probably touch a lot of different modules
00:06:43.044 --> 00:06:46.003
They make minimal use of mocks and stubs
00:06:46.003 --> 00:06:48.032
because part of the goal of an integration test
00:06:48.032 --> 00:06:50.099
is exactly to test the interaction between pieces
00:06:50.099 --> 00:06:53.021
So you don’t want to stub or control those interactions
00:06:53.021 --> 00:06:54.080
You actually want to let the system do
00:06:54.080 --> 00:06:56.030
what it would really do
00:06:56.030 --> 00:06:58.025
if this was a scenario happening in production
00:06:58.025 --> 00:07:00.069
So how would we compare these different kinds of tests?
00:07:00.069 --> 00:07:02.038
There’s a few different axes we can look at
00:07:02.038 --> 00:07:05.007
One of them is how long they take to run
00:07:05.007 --> 00:07:06.090
Now, both RSpec and Cucumber
00:07:06.090 --> 00:07:09.013
have, kind of, high startup times and stuff like that
00:07:09.013 --> 00:07:10.008
But, as you’ll see
00:07:10.008 --> 00:07:11.090
as you start adding more and more RSpec tests
00:07:11.090 --> 00:07:14.038
and using autotest to run them in the background
00:07:14.038 --> 00:07:17.088
by and large, once RSpec kind of gets off the launching pad
00:07:17.088 --> 00:07:19.092
it runs specs really fast
00:07:19.092 --> 00:07:21.095
whereas running Cucumber features just takes a long time
00:07:21.095 --> 00:07:24.059
as it essentially fires up your entire application
00:07:24.059 --> 00:07:26.010
And later in this semester
00:07:26.010 --> 00:07:28.086
we’ll see a way to make Cucumber even slower—
00:07:28.086 --> 00:07:30.070
which is to have it fire up an entire browser
00:07:30.070 --> 00:07:33.045
basically act like a puppet, remote-controlling Firefox
00:07:33.045 --> 00:07:35.083
so you can test JavaScript code
00:07:35.083 --> 00:07:37.000
We’ll do that when we actually—
00:07:37.000 --> 00:07:40.032
I think we’ll be able to work with our friends at Sauce Labs
00:07:40.032 --> 00:07:42.080
so you can do that in the cloud—That will be exciting
00:07:42.080 --> 00:07:45.083
So, “run fast” versus “run slow”
00:07:45.083 --> 00:07:46.068
Resolution:
00:07:46.068 --> 00:07:48.025
If an error happens in your unit tests
00:07:48.025 --> 00:07:49.075
it’s usually pretty easy
00:07:49.075 --> 00:07:52.029
to figure out and track down what the source of that error is
00:07:52.029 --> 00:07:53.071
because the tests are so isolated
00:07:53.071 --> 00:07:56.025
You’ve stubbed out everything that doesn’t matter
00:07:56.025 --> 00:07:58.025
and you’re focusing on only the behaviour of interest
00:07:58.025 --> 00:07:59.076
So, if you’ve done a good job of doing that
00:07:59.076 --> 00:08:01.097
when something goes wrong in one of your tests
00:08:01.097 --> 00:08:03.045
there’s not a lot of places
00:08:03.045 --> 00:08:04.088
that something could have gone wrong
00:08:04.088 --> 00:08:07.041
In contrast, if you’re running a Cucumber scenario
00:08:07.041 --> 00:08:08.089
that’s got, you know, 10 steps
00:08:08.089 --> 00:08:10.031
and every step is touching
00:08:10.031 --> 00:08:11.061
a whole bunch of pieces of the app
00:08:11.061 --> 00:08:12.091
it could take a long time
00:08:12.091 --> 00:08:14.076
to actually get to the bottom of a bug
00:08:14.076 --> 00:08:16.014
So there is kind of a tradeoff
00:08:16.014 --> 00:08:17.054
in how well you can localize errors
00:08:17.054 --> 00:08:20.065
Coverage:
00:08:20.065 --> 00:08:23.002
It’s possible if you write a good suite
00:08:23.002 --> 00:08:24.072
of unit and functional tests
00:08:24.072 --> 00:08:26.020
you can get really high coverage
00:08:26.020 --> 00:08:27.085
You can run your SimpleCov report
00:08:27.085 --> 00:08:30.080
and you can actually identify specific lines in your files
00:08:30.080 --> 00:08:32.036
that have not been exercised by any test
00:08:32.036 --> 00:08:34.016
and then you can go write tests that cover them
00:08:34.016 --> 00:08:36.014
So, figuring out how to improve your coverage
00:08:36.014 --> 00:08:37.057
for example at the C0 level
00:08:37.057 --> 00:08:40.021
is something much more easily done with unit tests
00:08:40.021 --> 00:08:42.018
whereas, with a Cucumber test—
00:08:42.018 --> 00:08:43.078
with a Cucumber scenario—
00:08:43.078 --> 00:08:45.076
you are touching a lot of parts of the code
00:08:45.076 --> 00:08:47.080
but you are doing it very sparsely
00:08:47.080 --> 00:08:49.038
So, if your goal is to get your coverage up
00:08:49.038 --> 00:08:51.031
use the tools that are at the unit level
00:08:51.031 --> 00:08:53.007
so that you can focus on understanding
00:08:53.007 --> 00:08:54.074
what parts of my code are undertested
00:08:54.074 --> 00:08:56.055
and then you can write very targeted tests
00:08:56.055 --> 00:08:58.086
just to focus on them
00:08:58.086 --> 00:09:01.043
And, sort of, you know, putting those pieces together
00:09:01.043 --> 00:09:03.039
the unit tests
00:09:03.039 --> 00:09:05.059
because of their isolation and their fine resolution
00:09:05.059 --> 00:09:07.039
tend to use a lot of mocks
00:09:07.039 --> 00:09:09.012
to isolate the behaviours you don’t care about
00:09:09.012 --> 00:09:11.020
But that means that, by definition
00:09:11.020 --> 00:09:12.070
you’re not testing the interfaces
00:09:12.070 --> 00:09:14.099
and it’s sort of a “received wisdom” in software
00:09:14.099 --> 00:09:16.069
that a lot of the interesting bugs
00:09:16.069 --> 00:09:18.076
occur at the interfaces between pieces
00:09:18.076 --> 00:09:20.078
and not sort of within a class or within a method—
00:09:20.078 --> 00:09:22.040
those are sort of the easy bugs to track down
00:09:22.040 --> 00:09:24.026
And at the other extreme
00:09:24.026 --> 00:09:26.081
the more you get towards the integration testing extreme
00:09:26.081 --> 00:09:29.072
you’re supposed to rely less and less on mocks
00:09:29.072 --> 00:09:30.090
for that exact reason
00:09:30.090 --> 00:09:32.066
Now we saw, if you’re testing something like
00:09:32.066 --> 00:09:34.015
say, in a service-oriented architecture
00:09:34.015 --> 00:09:35.089
where you have to interact with a remote site
00:09:35.089 --> 00:09:37.028
you still end up
00:09:37.028 --> 00:09:38.094
having to do a fair amount of mocking and stubbing
00:09:38.094 --> 00:09:40.028
so that you don’t rely on the Internet
00:09:40.028 --> 00:09:41.067
in order for your tests to pass
00:09:41.067 --> 00:09:43.006
but, generally speaking
00:09:43.006 --> 00:09:47.014
you’re trying to remove as many of the mocks as you can
00:09:47.014 --> 00:09:48.095
and let the system run the way it would run in real life
00:09:48.095 --> 00:09:52.070
So, the good news is you are testing the interfaces
00:09:52.070 --> 00:09:54.074
but when something goes wrong in one of the interfaces
00:09:54.074 --> 00:09:57.053
because your resolution is not as good
00:09:57.053 --> 00:10:00.031
it may take longer to figure out what it is
00:10:00.031 --> 00:10:05.019
So, what’s sort of the high-order bit from this tradeoff
00:10:05.019 --> 00:10:07.024
is you don’t really want to rely
00:10:07.024 --> 00:10:08.076
too heavily on any one kind of test
00:10:08.076 --> 00:10:10.078
They serve different purposes and, depending on
00:10:10.078 --> 00:10:13.043
are you trying to exercise your interfaces more
00:10:13.043 --> 00:10:15.089
or are you trying to improve your fine-grain coverage
00:10:15.089 --> 00:10:18.003
that affects how you develop your test suite
00:10:18.003 --> 00:10:20.065
and you’ll evolve it along with your software
00:10:20.065 --> 00:10:24.014
So, we’ve used a certain set of terminology in testing
00:10:24.014 --> 00:10:26.028
It’s the terminology that, by and large
00:10:26.028 --> 00:10:29.001
is most commonly used in the Rails community
00:10:29.001 --> 00:10:30.060
but there’s some variation
00:10:30.060 --> 00:10:33.069
[and] some other terms that you might hear
00:10:33.069 --> 00:10:35.018
if you go get a job somewhere
00:10:35.018 --> 00:10:36.093
and you hear about mutation testing
00:10:36.093 --> 00:10:38.072
which we haven’t done
00:10:38.072 --> 00:10:40.024
This is an interesting idea that was, I think, invented by
00:10:40.024 --> 00:10:43.037
Ammann and Offutt, who have, sort of
00:10:43.037 --> 00:10:44.093
the definitive book on software testing
00:10:44.093 --> 00:10:46.048
The idea is:
00:10:46.048 --> 00:10:48.000
Suppose I introduced a deliberate bug into my code
00:10:48.000 --> 00:10:49.051
does that force some test to fail?
00:10:49.051 --> 00:10:53.003
Because, if I changed, you know, “if x” to “if not x”
00:10:53.003 --> 00:10:56.010
and no tests fail, then either I’m missing some coverage
00:10:56.010 --> 00:10:59.019
or my app is very strange and somehow nondeterministic
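
NOTE
Editor's sketch of the idea just described, not a real mutation-testing tool:
the lambdas below are hypothetical stand-ins for application code, a
hand-written mutant, and two tiny test suites. A suite "kills" the mutant if
it fails when run against it.
```ruby
# Original behaviour and a mutant with "if x" flipped to "if not x".
original = ->(x) { x ? :yes : :no }
mutant   = ->(x) { !x ? :yes : :no }
# Each suite returns true when all of its checks pass for the given code.
weak_suite   = ->(impl) { [:yes, :no].include?(impl.call(true)) }
strong_suite = ->(impl) { impl.call(true) == :yes && impl.call(false) == :no }
# Both suites pass on the original code.
baseline_ok = weak_suite.call(original) && strong_suite.call(original)
# Only the stronger suite notices the deliberate bug.
weak_kills   = !weak_suite.call(mutant)
strong_kills = !strong_suite.call(mutant)
```
A surviving mutant, as with weak_suite here, is the signal that some
behaviour is executed by the tests but never actually checked.
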
00:10:59.019 --> 00:11:03.099
Fuzz testing, which Koushik Sen may talk more about
00:11:03.099 --> 00:11:07.085
basically, this is the “10,000 monkeys at typewriters
00:11:07.085 --> 00:11:09.024
throwing random input at your code”
00:11:09.024 --> 00:11:10.037
What’s interesting about it is that
00:11:10.037 --> 00:11:11.065
those tests we’ve been doing
00:11:11.065 --> 00:11:13.086
essentially are crafted to test the app
00:11:13.086 --> 00:11:15.058
the way it was designed
00:11:15.058 --> 00:11:16.088
and these, you know, fuzz testing
00:11:16.088 --> 00:11:19.064
is about testing the app in ways it wasn’t meant to be used
00:11:19.064 --> 00:11:22.098
So, what happens if you throw enormous form submissions?
00:11:22.098 --> 00:11:25.036
What happens if you put control characters in your forms?
00:11:25.036 --> 00:11:27.062
What happens if you submit the same thing over and over?
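
NOTE
Editor's sketch, assuming a hypothetical parse_age method as a stand-in for
code that handles a form field. It throws random byte strings, including
control characters, at that method and counts any exception that escapes.
Real fuzzing tools are far more sophisticated; this only illustrates the idea.
```ruby
# Hypothetical input handler: returns an Integer, or nil for bad input.
def parse_age(input)
  Integer(input.strip, 10)
rescue ArgumentError
  nil
end
rng = Random.new(42)   # fixed seed so a failing run can be reproduced
crashes = 0
1000.times do
  # Random length, random bytes: nothing like what the form was designed for.
  junk = Array.new(rng.rand(0..50)) { rng.rand(0..255).chr }.join
  begin
    parse_age(junk)
  rescue StandardError
    crashes += 1       # an exception escaped the handler
  end
end
```
Any nonzero crashes count would point at an input the handler was never
designed for, which is exactly what fuzz testing is looking for.
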
00:11:27.062 --> 00:11:29.093
And, Koushik has a statistic that
00:11:29.093 --> 00:11:32.033
Microsoft finds up to 20% of their bugs
00:11:32.033 --> 00:11:34.064
using some variation of fuzz testing
00:11:34.064 --> 00:11:36.029
and that about 25%
00:11:36.029 --> 00:11:39.021
of the common Unix command-line programs
00:11:39.021 --> 00:11:40.092
can be made to crash
00:11:40.092 --> 00:11:44.018
[when] put through aggressive fuzz testing
00:11:44.018 --> 00:11:46.089
Defining-use coverage is something that we haven’t done
00:11:46.089 --> 00:11:48.089
but it’s another interesting concept
00:11:48.089 --> 00:11:50.089
The idea is that at any point in my program
00:11:50.089 --> 00:11:52.062
there’s a place where I define—
00:11:52.062 --> 00:11:54.046
or I assign a value to some variable—
00:11:54.046 --> 00:11:56.000
and then there’s a place downstream
00:11:56.000 --> 00:11:57.075
where presumably I’m going to consume that value—
00:11:57.075 --> 00:11:59.058
someone’s going to use that value
00:11:59.058 --> 00:12:01.013
Have I covered every pair?
00:12:01.013 --> 00:12:02.059
In other words, do I have tests where every pair
00:12:02.059 --> 00:12:04.054
of defining a variable and using it somewhere
00:12:04.054 --> 00:12:07.014
is executed at some point in my test suite
00:12:07.014 --> 00:12:10.071
It’s sometimes called DU-coverage
00:12:10.071 --> 00:12:14.011
And other terms that I think are not as widely used anymore
00:12:14.011 --> 00:12:17.071
blackbox versus whitebox, or blackbox versus glassbox
00:12:17.071 --> 00:12:20.025
Roughly, a blackbox test is one that is written from
00:12:20.025 --> 00:12:22.041
the point of view of the external specification of the thing
00:12:22.041 --> 00:12:24.022
[For example:] “This is a hash table
00:12:24.022 --> 00:12:26.015
When I put in a key I should get back a value
00:12:26.015 --> 00:12:28.011
If I delete the key the value shouldn’t be there”
00:12:28.011 --> 00:12:29.099
That’s a blackbox test because it doesn’t say
00:12:29.099 --> 00:12:32.028
anything about how the hash table is implemented
00:12:32.028 --> 00:12:34.072
and it doesn’t try to stress the implementation
00:12:34.072 --> 00:12:36.056
A corresponding whitebox test might be:
00:12:36.056 --> 00:12:38.008
“I know something about the hash function
00:12:38.008 --> 00:12:39.098
and I’m going to deliberately create
00:12:39.098 --> 00:12:41.088
hash keys in my test cases
00:12:41.088 --> 00:12:43.078
that cause a lot of hash collisions
00:12:43.078 --> 00:12:45.095
to make sure that I’m testing that part of the functionality”
00:12:45.095 --> 00:12:49.007
Now, a C0 test coverage tool, like SimpleCov
00:12:49.007 --> 00:12:52.001
would reveal that if all you had is blackbox tests
00:12:52.001 --> 00:12:53.028
you might find that
00:12:53.028 --> 00:12:55.056
the collision-handling code wasn’t being hit very often
00:12:55.056 --> 00:12:56.075
And that might tip you off and say:
00:12:56.075 --> 00:12:58.028
“Ok, if I really want to strengthen that—
00:12:58.028 --> 00:13:00.008
for one, if I want to boost coverage for those tests
00:13:00.008 --> 00:13:02.006
now I have to write a whitebox or a glassbox test
00:13:02.006 --> 00:13:04.057
I have to look inside, see what the implementation does
00:13:04.057 --> 00:13:05.061
and find specific ways
00:13:05.061 --> 00:13:10.060
to try to break the implementation in evil ways”
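
NOTE
Editor's sketch of the hash table example, under stated assumptions: ToyTable
is a hypothetical chained hash table with a deliberately simple hash function
(key length modulo bucket count), chosen only so that collisions are easy to
craft. It is not how Ruby's own Hash works.
```ruby
# Hypothetical toy hash table with separate chaining, for illustration only.
class ToyTable
  def initialize(buckets = 4)
    @buckets = Array.new(buckets) { [] }
  end
  def put(key, value)
    delete(key)
    bucket_for(key) << [key, value]
  end
  def get(key)
    pair = bucket_for(key).find { |k, _| k == key }
    pair && pair[1]
  end
  def delete(key)
    bucket_for(key).reject! { |k, _| k == key }
  end
  private
  def bucket_for(key)
    @buckets[key.length % @buckets.size]
  end
end
table = ToyTable.new
# Blackbox: only the external contract (put, get, delete).
table.put("a", 1)
found = table.get("a")
table.delete("a")
blackbox_ok = found == 1 && table.get("a").nil?
# Whitebox: keys of equal length are known to land in the same bucket.
table.put("ab", 1)
table.put("cd", 2)
table.put("ef", 3)
table.delete("cd")
whitebox_ok = table.get("ab") == 1 && table.get("cd").nil? && table.get("ef") == 3
```
The whitebox part only makes sense to someone who has read bucket_for; that
is what lets it aim at the collision path on purpose.
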
00:13:10.060 --> 00:13:13.075
So, I think, testing is a kind of a way of life, right?
00:13:13.075 --> 00:13:16.069
We’ve gotten away from the phase of
00:13:16.069 --> 00:13:18.033
“We’d build the whole thing and then we’d test it”
00:13:18.033 --> 00:13:19.092
and we’ve gotten into the phase of
00:13:19.092 --> 00:13:20.077
“We’re testing as we go”
00:13:20.077 --> 00:13:22.048
Testing is really more like a development tool
00:13:22.048 --> 00:13:24.022
and like so many development tools
00:13:24.022 --> 00:13:25.062
the effectiveness of it depends
00:13:25.062 --> 00:13:27.013
on whether you’re using it in a tasteful manner
00:13:27.013 --> 00:13:31.002
So, you could say: “Well, let’s see—I kicked the tires
00:13:31.002 --> 00:13:33.048
You know, I fired up the browser, I tried a couple of things
00:13:33.048 --> 00:13:35.097
(claps hands) Looks like it works! Deploy it!”
00:13:35.097 --> 00:13:38.045
That’s obviously a little more cavalier than you’d want to be
00:13:38.045 --> 00:13:41.024
And, by the way, one of the things that we discovered
00:13:41.024 --> 00:13:43.077
with this online course just starting up
00:13:43.077 --> 00:13:45.090
when 60,000 people are enrolled in the course
00:13:45.090 --> 00:13:48.099
and 0.1% of those people have a problem
00:13:48.099 --> 00:13:50.083
you’d get 60 emails
00:13:50.083 --> 00:13:53.078
The corollary is: when your site is used by a lot of people
00:13:53.078 --> 00:13:55.089
some stupid bug that you didn’t find
00:13:55.089 --> 00:13:57.018
but that could have been found by testing
00:13:57.018 --> 00:13:59.080
could very quickly generate *a lot* of pain
00:13:59.080 --> 00:14:02.023
On the other hand, you don’t want to be dogmatic and say
00:14:02.023 --> 00:14:04.056
“Uh, until we have 100% coverage and every test is green
00:14:04.056 --> 00:14:06.005
we absolutely will not ship”
00:14:06.005 --> 00:14:07.012
That’s not healthy either
00:14:07.012 --> 00:14:08.048
And the test quality
00:14:08.048 --> 00:14:10.057
doesn’t necessarily correlate with statement coverage
00:14:10.057 --> 00:14:11.064
unless you can say something
00:14:11.064 --> 00:14:12.068
about the quality of your tests
00:14:12.068 --> 00:14:14.029
just because you’ve executed every line
00:14:14.029 --> 00:14:17.010
doesn’t mean that you’ve tested the interesting cases
00:14:17.010 --> 00:14:18.068
So, somewhere in between, you could say
00:14:18.068 --> 00:14:20.014
“Well, we’ll use coverage tools to identify
00:14:20.014 --> 00:14:23.004
undertested or poorly-tested parts of the code
00:14:23.004 --> 00:14:24.073
and we’ll use them as a guideline
00:14:24.073 --> 00:14:27.011
to sort of help improve our overall confidence level”
00:14:27.011 --> 00:14:29.024
But remember, Agile is about embracing change
00:14:29.024 --> 00:14:30.032
and dealing with it
00:14:30.032 --> 00:14:32.002
Part of change is that things will change in ways that cause
00:14:32.002 --> 00:14:33.038
bugs that you didn’t foresee
00:14:33.038 --> 00:14:34.031
and the right reaction is:
00:14:34.031 --> 00:14:36.026
Be comfortable enough with the testing tools
00:14:36.026 --> 00:14:37.064
[so] that you can quickly find those bugs
00:14:37.064 --> 00:14:39.025
Write a test that reproduces that bug
00:14:39.025 --> 00:14:40.062
And then make the test green
00:14:40.062 --> 00:14:41.061
Then you’ve really fixed it
00:14:41.061 --> 00:14:43.004
That means, the way that you really fix a bug is
00:14:43.004 --> 00:14:45.049
if you created a test that correctly failed
00:14:45.049 --> 00:14:46.088
to reproduce that bug
00:14:46.088 --> 00:14:48.055
and then you went back and fixed the code
00:14:48.055 --> 00:14:49.057
to make those tests pass
00:14:49.057 --> 00:14:51.073
Similarly, you don’t want to say
00:14:51.073 --> 00:14:53.036
“Well, unit tests give you better coverage
00:14:53.036 --> 00:14:54.073
They’re more thorough and detailed
00:14:54.073 --> 00:14:56.044
So let’s focus all our energy on that”
00:14:56.044 --> 00:14:57.062
as opposed to
00:14:57.062 --> 00:14:58.093
“Oh, focus on integration tests
00:14:58.093 --> 00:15:00.006
because they’re more realistic, right?
00:15:00.006 --> 00:15:01.056
They reflect what the customer said they want
00:15:01.056 --> 00:15:03.034
So, if the integration tests are passing
00:15:03.034 --> 00:15:05.067
by definition we’re meeting a customer need”
00:15:05.067 --> 00:15:07.034
Again, both extremes are kind of unhealthy
00:15:07.034 --> 00:15:09.079
because each one of these can find problems
00:15:09.079 --> 00:15:11.031
that would be missed by the other
00:15:11.031 --> 00:15:12.060
So, having a good combination of them
00:15:12.060 --> 00:15:15.042
is kind of what it is all about
00:15:15.042 --> 00:15:18.072
The last thing I want to leave you with is, I think
00:15:18.072 --> 00:15:20.036
in terms of testing, is “TDD versus
00:15:20.036 --> 00:15:22.005
what I call conventional debugging—
00:15:22.005 --> 00:15:24.004
i.e., the way that we all kind of do it
00:15:24.004 --> 00:15:25.051
even though we say we don’t”
00:15:25.051 --> 00:15:26.064
and we’re all trying to get better, right?
00:15:26.064 --> 00:15:27.085
We’re all kind of in the gutter
00:15:27.085 --> 00:15:29.036
Some of us are looking up at the stars
00:15:29.036 --> 00:15:31.011
trying to improve our practices
00:15:31.011 --> 00:15:33.099
But, having now lived with this for 3 or 4 years myself
00:15:33.099 --> 00:15:35.091
and—I’ll be honest—3 years ago I didn’t do TDD
00:15:35.091 --> 00:15:37.079
I do it now, because I find that it’s better
00:15:37.079 --> 00:15:40.081
and here’s my distillation of why I think it works for me
00:15:40.081 --> 00:15:43.032
Sorry, the colours are a little weird
00:15:43.032 --> 00:15:45.000
but on the left column of the table
00:15:45.000 --> 00:15:46.034
[it] says “Conventional debugging”
00:15:46.034 --> 00:15:47.044
and the right side says “TDD”
00:15:47.044 --> 00:15:49.069
So what’s the way I used to write code?
00:15:49.069 --> 00:15:51.056
Maybe some of you still do this
00:15:51.056 --> 00:15:53.013
I write a whole bunch of lines
00:15:53.013 --> 00:15:54.043
maybe a few tens of lines of code
00:15:54.043 --> 00:15:55.059
I’m sure they’re right—
00:15:55.059 --> 00:15:56.061
I mean, I am a good programmer, right?
00:15:56.061 --> 00:15:57.099
This is not that hard
00:15:57.099 --> 00:15:59.002
I run it – It doesn’t work
00:15:59.002 --> 00:16:01.098
Ok, fire up the debugger – Start putting in printf’s
00:16:01.098 --> 00:16:04.088
If I’d been using TDD what would I do instead?
00:16:04.088 --> 00:16:08.022
Well I’d write a few lines of code, having written a test first
00:16:08.022 --> 00:16:10.071
So as soon as the test goes from red to green
00:16:10.071 --> 00:16:12.064
I know I wrote code that works—
00:16:12.064 --> 00:16:15.013
or at least the parts of the behaviour that I had in mind
00:16:15.013 --> 00:16:16.096
Those parts of the behaviour work, because I had a test
00:16:16.096 --> 00:16:19.056
Ok, back to conventional debugging:
00:16:19.056 --> 00:16:21.073
I’m running my program, trying to find the bugs
00:16:21.073 --> 00:16:23.028
I start putting in printf’s everywhere
00:16:23.028 --> 00:16:24.062
to print out the values of things
00:16:24.062 --> 00:16:25.064
which by the way is a lot of fun
00:16:25.064 --> 00:16:26.073
when you’re trying to read them
00:16:26.073 --> 00:16:28.014
out of the 500 lines of log output
00:16:28.014 --> 00:16:29.035
that you’d get in a Rails app
00:16:29.035 --> 00:16:30.087
trying to find your printf’s
00:16:30.087 --> 00:16:32.035
you know, “I know what I’ll do—
00:16:32.035 --> 00:16:34.008
I’ll put in 75 asterisks before and after
00:16:34.008 --> 00:16:36.043
That will make it readable” (laughter)
00:16:36.043 --> 00:16:38.071
Who don’t—Ok, raise your hands if you don’t do this!
00:16:38.071 --> 00:16:40.090
Thank you for your honesty. (laughter) Ok.
00:16:40.090 --> 00:16:43.014
Or— Or I could do the other thing, I could say:
00:16:43.014 --> 00:16:45.030
Instead of printing the value of a variable
00:16:45.030 --> 00:16:47.039
why don’t I write a test that inspects it
00:16:47.039 --> 00:16:48.079
with an expectation, such as “should”,
00:16:48.079 --> 00:16:50.090
and I’ll know immediately in bright red letters
00:16:50.090 --> 00:16:53.033
if that expectation wasn’t met
00:16:53.033 --> 00:16:56.005
Ok, I’m back on the conventional debugging side:
00:16:56.005 --> 00:16:58.090
I break out the big guns: I pull out the Ruby debugger
00:16:58.092 --> 00:17:02.044
I set a debug breakpoint, and I now start tweaking and say
00:17:02.044 --> 00:17:04.085
“Oh, let’s see, I have to get past that ‘if’ statement
00:17:04.085 --> 00:17:06.002
so I have to set that thing
00:17:06.002 --> 00:17:07.063
Oh, I have to call that method and so I need to…”
00:17:07.063 --> 00:17:08.065
No!
00:17:08.065 --> 00:17:10.087
I could instead—if I’m going to do that anyway—
00:17:10.087 --> 00:17:13.000
let’s just do it in a file, set up some mocks and stubs
00:17:13.000 --> 00:17:16.045
to control the code path, make it go the way I want
00:17:16.045 --> 00:17:19.013
And now, “Ok, for sure I’ve fixed it!
00:17:19.013 --> 00:17:22.012
I’ll get out of the debugger, run it all again!”
00:17:22.012 --> 00:17:24.022
And, of course, 9 times out of 10, you didn’t fix it
00:17:24.022 --> 00:17:26.072
or you kind of partly fixed it but you didn’t completely fix it
00:17:26.072 --> 00:17:30.040
and now I have to do all these manual things all over again
00:17:30.040 --> 00:17:32.086
or I already have a bunch of tests
00:17:32.086 --> 00:17:34.031
and I can just rerun them automatically
00:17:34.031 --> 00:17:35.056
and if some of them fail, I can say
00:17:35.056 --> 00:17:36.087
“Oh, I didn’t fix the whole thing
00:17:36.087 --> 00:17:38.040
No problem, I’ll just go back!”
00:17:38.040 --> 00:17:39.096
So, the bottom line is that
00:17:39.096 --> 00:17:41.095
you know, you could do it on the left side
00:17:41.095 --> 00:17:45.004
but you’re using the same techniques in both cases
00:17:45.004 --> 00:17:48.062
The only difference is, in one case you’re doing it manually
00:17:48.062 --> 00:17:50.004
which is boring and error-prone
00:17:50.004 --> 00:17:51.078
In the other case you’re doing a little more work
00:17:51.078 --> 00:17:53.095
but you can make it automatic and repeatable
00:17:53.095 --> 00:17:55.071
and have, you know, some high confidence
00:17:55.071 --> 00:17:57.003
that as you change things in your code
00:17:57.003 --> 00:17:58.092
you are not breaking stuff that used to work
00:17:58.092 --> 00:18:00.091
and basically it’s more productive
00:18:00.091 --> 00:18:02.047
So you’re doing all the same things
00:18:02.047 --> 00:18:04.037
but with a, kind of, “delta” extra work
00:18:04.037 --> 00:18:07.086
you are using your effort at a much higher leverage
00:18:07.086 --> 00:18:10.036
So that’s kind of my view of why TDD is a good thing
00:18:10.036 --> 00:18:11.088
It’s really, it doesn’t require new skills
00:18:11.088 --> 00:18:15.011
It just requires [you] to refactor your existing skills
00:18:15.011 --> 00:18:18.014
I also tried when I—again, honest confessions, right?—
00:18:18.014 --> 00:18:19.034
when I started doing this it was like
00:18:19.034 --> 00:18:21.049
“Ok, I’m gonna be teaching a course on Rails
00:18:21.049 --> 00:18:22.065
I should really focus on testing”
00:18:22.065 --> 00:18:24.032
So I went back to some code I had written
00:18:24.032 --> 00:18:26.087
that was working—you know, that was decent code—
00:18:26.087 --> 00:18:29.006
and I started trying to write tests for it
00:18:29.006 --> 00:18:31.019
and it was *so painful*
00:18:31.019 --> 00:18:33.033
because the code wasn’t written in a way that was testable
00:18:33.033 --> 00:18:34.097
There were all kinds of interactions
00:18:34.097 --> 00:18:36.038
There were, like, nested conditionals
00:18:36.038 --> 00:18:38.083
And if you wanted to isolate a particular statement
00:18:38.083 --> 00:18:41.070
and have a test trigger just that statement
00:18:41.070 --> 00:18:44.000
the amount of stuff you’d have to set up in your test
00:18:44.000 --> 00:18:45.009
to have it happen—
00:18:45.009 --> 00:18:46.040
remember when we talked about mock train wrecks—
00:18:46.040 --> 00:18:48.014
you have to set up all this infrastructure
00:18:48.014 --> 00:18:49.063
just to get one line of code
00:18:49.063 --> 00:18:51.015
and you do that and you go
00:18:51.015 --> 00:18:52.074
“Gawd, testing is really not worth it!
00:18:52.074 --> 00:18:54.034
I wrote 20 lines of setup
00:18:54.034 --> 00:18:56.059
so that I could test two lines in my function!”
00:18:56.059 --> 00:18:58.085
What that’s really telling you—as I now realize—
00:18:58.085 --> 00:19:00.042
is your function is bad
00:19:00.042 --> 00:19:01.049
It’s a badly written function
00:19:01.049 --> 00:19:02.052
It’s not a testable function
00:19:02.052 --> 00:19:03.088
It’s got too many moving parts
00:19:03.088 --> 00:19:06.026
whose dependencies can be broken
00:19:06.026 --> 00:19:07.070
There are no seams in my function
00:19:07.070 --> 00:19:11.008
that allow me to individually test the different behaviours
00:19:11.008 --> 00:19:12.083
And once you start doing Test First Development
00:19:12.083 --> 00:19:15.043
because you have to write your tests in small chunks
00:19:15.043 --> 00:19:17.053
it kind of makes this problem go away
00:19:17.053 --> 99:59:59.999
So that’s been my epiphany