1
00:00:00,057 --> 00:00:01,092
So we spent a bunch of time

2
00:00:01,092 --> 00:00:03,032
in the last couple of lectures

3
00:00:03,032 --> 00:00:05,082
talking about different kinds of testing

4
00:00:05,082 --> 00:00:08,021
about unit testing versus integration testing

5
00:00:08,021 --> 00:00:10,010
We talked about how you use RSpec

6
00:00:10,010 --> 00:00:12,049
to really isolate the parts of your code you want to test

7
00:00:12,049 --> 00:00:14,090
you’ve also, you know, because of homework 3,

8
00:00:14,090 --> 00:00:18,017
and other stuff, we have been doing BDD,

9
00:00:18,017 --> 00:00:20,062
where we’ve been using Cucumber to turn user stories

10
00:00:20,062 --> 00:00:22,095
into, essentially, integration and acceptance tests

11
00:00:22,095 --> 00:00:25,061
So you’ve seen testing in a couple of different levels

12
00:00:25,061 --> 00:00:27,063
and the goal here is sort of to make a few remarks

13
00:00:27,063 --> 00:00:29,092
to, you know, let’s back up a little bit

14
00:00:29,092 --> 00:00:33,001
and see the big picture, and tie those things together

15
00:00:33,001 --> 00:00:34,095
So this sort of spans material

16
00:00:34,095 --> 00:00:37,000
that covers three or four sections in the book

17
00:00:37,000 --> 00:00:39,061
and I want to just hit the high points in lecture

18
00:00:39,061 --> 00:00:41,046
So a question that comes up

19
00:00:41,046 --> 00:00:43,025
I’m sure it’s come up for all of you

20
00:00:43,025 --> 00:00:44,052
as you have been doing homework

21
00:00:44,052 --> 00:00:45,069
is: “How much testing is enough?”

22
00:00:45,069 --> 00:00:48,049
And, sadly, for a long time

23
00:00:48,049 --> 00:00:51,009
kind of if you asked this question in industry

24
00:00:51,009 --> 00:00:52,017
the answer was basically

25
00:00:52,017 --> 00:00:53,017
“Well, we have a shipping deadline,

26
00:00:53,017 --> 00:00:54,099
so however much testing we can do

27
00:00:54,099 --> 00:00:56,066
before that deadline, that’s how much.”

28
00:00:56,066 --> 00:00:58,015
That’s what you have time for.

29
00:00:58,015 --> 00:01:00,002
So, you know, that’s a little flip

30
00:01:00,002 --> 00:01:01,011
obviously not very good

31
00:01:01,011 --> 00:01:02,054
So you can do a bit better, right?

32
00:01:02,054 --> 00:01:03,070
There’re some static measures

33
00:01:03,070 --> 00:01:06,003
like how many lines of code does your app have

34
00:01:06,003 --> 00:01:08,021
and how many lines of tests do you have?

35
00:01:08,021 --> 00:01:10,029
And it’s not unusual in industry

36
00:01:10,029 --> 00:01:12,068
in a well-tested piece of software

37
00:01:12,068 --> 00:01:14,057
for the number of lines of tests

38
00:01:14,057 --> 00:01:17,073
to go far beyond the number of lines of code

39
00:01:17,073 --> 00:01:19,075
So, integer multiples are not unusual

40
00:01:19,075 --> 00:01:21,084
And I think even for sort of, you know,

41
00:01:21,084 --> 00:01:23,022
research code or classwork

42
00:01:23,022 --> 00:01:26,085
a ratio of, you know, maybe 1.5 is not unreasonable

43
00:01:26,085 --> 00:01:30,005
so one and a half times the amount of test code

44
00:01:30,005 --> 00:01:32,024
as you have application code

45
00:01:32,024 --> 00:01:34,022
And in a lot of production systems

46
00:01:34,022 --> 00:01:35,027
where they really care about testing

47
00:01:35,027 --> 00:01:36,091
it is much higher than that

48
00:01:36,091 --> 00:01:38,015
So maybe a better question to ask:

49
00:01:38,015 --> 00:01:39,047
Rather than saying “How much testing is enough?”

50
00:01:39,047 --> 00:01:42,049
is to ask “How good is the testing I am doing now?

51
00:01:42,049 --> 00:01:44,035
How thorough is it?”

52
00:01:44,035 --> 00:01:45,056
Later in this semester

53
00:01:45,056 --> 00:01:46,056
Professor Sen will talk

54
00:01:46,056 --> 00:01:48,018
a little bit about formal methods

55
00:01:48,018 --> 00:01:50,085
and sort of what’s at the frontiers of testing and debugging

56
00:01:50,085 --> 00:01:52,068
But a couple of things that we can talk about

57
00:01:52,068 --> 00:01:54,007
based on what you already know

58
00:01:54,007 --> 00:01:57,074
is some basic concepts about test coverage

59
00:01:57,074 --> 00:01:59,054
And although I would say

60
00:01:59,054 --> 00:02:01,001
you know, we’ve been saying all along

61
00:02:01,001 --> 00:02:03,003
formal methods, they don’t really work on big systems

62
00:02:03,003 --> 00:02:05,033
I think that statement, in my personal opinion

63
00:02:05,033 --> 00:02:07,001
is actually a lot less true than it used to be

64
00:02:07,001 --> 00:02:09,019
I think there are a number of specific places

65
00:02:09,019 --> 00:02:10,052
especially in testing and debugging

66
00:02:10,052 --> 00:02:12,084
where formal methods are actually making fast progress

67
00:02:12,084 --> 00:02:15,075
and Koushik Sen is one of the leaders in that

68
00:02:15,075 --> 00:02:17,094
So you’ll have the opportunity to hear more about that later

69
00:02:17,094 --> 00:02:21,043
but for the moment I think, kind of bread and butter

70
00:02:21,043 --> 00:02:22,073
is let’s talk about coverage measurement

71
00:02:22,073 --> 00:02:24,047
because this is where the rubber meets the road

72
00:02:24,047 --> 00:02:26,020
in terms of how you’d be evaluated

73
00:02:26,020 --> 00:02:28,063
if you are doing this for real

74
00:02:28,063 --> 00:02:29,052
So what are some basics?

75
00:02:29,052 --> 00:02:30,078
Here’s a really simple class you can use

76
00:02:30,078 --> 00:02:32,090
to talk about different ways to measure

77
00:02:32,090 --> 00:02:34,080
how our tests cover this code

78
00:02:34,080 --> 00:02:36,063
And there’re a few different levels

79
00:02:36,063 --> 00:02:37,085
with different terminologies

80
00:02:37,085 --> 00:02:40,073
It’s not really universal across all software houses

81
00:02:40,073 --> 00:02:42,064
But one common set of terminology

82
00:02:42,064 --> 00:02:43,064
that the book exposes

83
00:02:43,064 --> 00:02:44,068
is we could talk about S0

84
00:02:44,068 --> 00:02:47,045
where we’d just mean you’ve called every method once

85
00:02:47,045 --> 00:02:50,045
So you know, if you call foo, and you call bar, you’re done

86
00:02:50,045 --> 00:02:52,015
That’s S0 coverage: not terribly thorough

87
00:02:52,015 --> 00:02:54,068
A little more stringent, S1, is

88
00:02:54,068 --> 00:02:56,013
you could say, we’re calling every method

89
00:02:56,013 --> 00:02:57,028
from every place that it could be called

90
00:02:57,028 --> 00:02:58,082
So what does that mean?

91
00:02:58,082 --> 00:03:00,007
It means, for example

92
00:03:00,007 --> 00:03:01,012
it’s not enough to call bar

93
00:03:01,012 --> 00:03:02,095
You have to make sure that you call it

94
00:03:02,095 --> 00:03:05,057
at least once from in here

95
00:03:05,057 --> 00:03:07,016
as well as calling it once

96
00:03:07,016 --> 00:03:10,037
from any exterior function that might call it

97
00:03:10,037 --> 00:03:12,081
C0 which is what SimpleCov measures

98
00:03:12,081 --> 00:03:15,099
(those of you who’ve gotten SimpleCov up and running)

99
00:03:15,099 --> 00:03:18,052
basically says you’ve executed every statement

100
00:03:18,052 --> 00:03:20,004
you’ve touched every statement in your code once

101
00:03:20,004 --> 00:03:22,048
But the caveat there is that

102
00:03:22,048 --> 00:03:25,058
conditionals really just count as a single statement

103
00:03:25,058 --> 00:03:28,091
So, if you, no matter which branch of this “if” you took

104
00:03:28,091 --> 00:03:31,074
as long as you touched one or the other branch

105
00:03:31,074 --> 00:03:33,035
you’ve executed the “if” statement

106
00:03:33,035 --> 00:03:35,066
So even C0 is still, you know, sort of superficial coverage

107
00:03:35,066 --> 00:03:37,026
But, as we will see

108
00:03:37,026 --> 00:03:39,023
the way that you will want to read this information is:

109
00:03:39,023 --> 00:03:41,079
if you are getting <i>bad</i> coverage at the C0 level

110
00:03:41,079 --> 00:03:44,007
then you have really really bad coverage

111
00:03:44,007 --> 00:03:46,008
So if you are not kind of making

112
00:03:46,008 --> 00:03:47,037
this simple level of superficial coverage

113
00:03:47,037 --> 00:03:50,002
then your testing is probably deficient

114
00:03:50,002 --> 00:03:51,091
C1 is the next step up from that

115
00:03:51,091 --> 00:03:53,071
We could say:

116
00:03:53,071 --> 00:03:55,019
Well, we have to take every branch in both directions

117
00:03:55,019 --> 00:03:56,061
So, when we are doing this “if” statement

118
00:03:56,061 --> 00:03:58,066
we have to make sure that

119
00:03:58,066 --> 00:03:59,092
we do the “if x” part once

120
00:03:59,092 --> 00:04:05,013
and the “if not x” part at least once to meet C1

121
00:04:05,013 --> 00:04:08,036
You can augment that with decision coverage

122
00:04:08,036 --> 00:04:09,063
saying: Well, if we’re gonna…

123
00:04:09,063 --> 00:04:12,036
If we have “if” statements where the condition

124
00:04:12,036 --> 00:04:13,088
is made up of multiple terms

125
00:04:13,088 --> 00:04:15,071
we have to make sure that every subexpression

126
00:04:15,071 --> 00:04:17,097
has been evaluated both directions

127
00:04:17,097 --> 00:04:19,067
In other words, that means that

128
00:04:19,067 --> 00:04:22,041
if we’re going to fail this “if” statement

129
00:04:22,041 --> 00:04:24,034
we have to make sure to fail it at least once

130
00:04:24,034 --> 00:04:26,044
because y was false and at least once because z was false

131
00:04:26,044 --> 00:04:28,088
In other words, any subexpression that could

132
00:04:28,088 --> 00:04:31,021
independently change the outcome of the condition

133
00:04:31,021 --> 00:04:34,048
has to be exercised in both directions

134
00:04:34,048 --> 00:04:36,003
And then,

135
00:04:36,003 --> 00:04:38,052
kind of, the one that, you know, a lot of people aspire to

136
00:04:38,052 --> 00:04:41,026
but there is disagreement on how much more valuable it is

137
00:04:41,026 --> 00:04:42,083
is you take every path through the code

138
00:04:42,083 --> 00:04:45,053
Obviously, this is kind of difficult because

139
00:04:45,053 --> 00:04:48,033
it tends to be exponential in the number of conditions

140
00:04:48,033 --> 00:04:53,008
And in general it’s difficult

141
00:04:53,008 --> 00:04:55,031
to evaluate if you’ve taken every path through the code

142
00:04:55,031 --> 00:04:57,001
There are formal techniques that you can use

143
00:04:57,001 --> 00:04:58,083
to tell you where the holes are

144
00:04:58,083 --> 00:05:01,031
but the bottom line is that

145
00:05:01,031 --> 00:05:03,004
in most commercial software houses

146
00:05:03,004 --> 00:05:04,089
there is, I would say, not complete consensus

147
00:05:04,089 --> 00:05:06,070
on how much more valuable C2 is

148
00:05:06,070 --> 00:05:08,068
compared to C0 or C1

149
00:05:08,068 --> 00:05:10,013
So, I think, for the purpose of our class

150
00:05:10,013 --> 00:05:11,067
you get exposed to the idea

151
00:05:11,067 --> 00:05:13,020
of how you use coverage information

152
00:05:13,020 --> 00:05:16,040
SimpleCov takes advantage of some built-in Ruby features

153
00:05:16,040 --> 00:05:18,009
to give you C0 coverage

154
00:05:18,009 --> 00:05:19,062
[It] does really nice reports

155
00:05:19,062 --> 00:05:21,025
We can sort of see it

156
00:05:21,025 --> 00:05:22,096
at the level of individual lines in your file

157
00:05:22,096 --> 00:05:24,091
You can see what your coverage is

158
00:05:24,091 --> 00:05:27,015
and I think that’s kind of a, you know

159
00:05:27,015 --> 00:05:31,018
a good start for where we are

160
00:05:31,018 --> 00:05:33,076
So, having seen sort of different flavours of tests

161
00:05:33,076 --> 00:05:37,020
Stepping back and looking back at the big picture

162
00:05:37,020 --> 00:05:38,098
what are the different kinds of tests

163
00:05:38,098 --> 00:05:40,078
that we’ve seen concretely?

164
00:05:40,078 --> 00:05:42,032
and what are the tradeoffs

165
00:05:42,032 --> 00:05:43,089
between using those different kinds of tests?

166
00:05:43,089 --> 00:05:47,016
So we’ve seen at the level of individual classes or methods

167
00:05:47,016 --> 00:05:50,009
we use RSpec, with extensive use of mocking and stubbing

168
00:05:50,009 --> 00:05:53,004
So, for example, when we test methods in the model

169
00:05:53,004 --> 00:05:55,057
that will be an example of unit testing

170
00:05:55,057 --> 00:05:59,025
We also did something that is pretty similar to

171
00:05:59,025 --> 00:06:00,097
functional or module testing

172
00:06:00,097 --> 00:06:02,071
where there is more than one module participating

173
00:06:02,071 --> 00:06:04,065
So, for example when we did controller specs

174
00:06:04,065 --> 00:06:07,085
we saw that—we simulate a POST action

175
00:06:07,085 --> 00:06:09,029
but remember that the POST action

176
00:06:09,029 --> 00:06:10,086
has to go through the routing subsystem

177
00:06:10,086 --> 00:06:12,042
before it gets to the controller

178
00:06:12,042 --> 00:06:14,048
Once the controller is done it will try to render a view

179
00:06:14,048 --> 00:06:16,007
So in fact there’s other pieces

180
00:06:16,007 --> 00:06:17,067
that collaborate with the controller

181
00:06:17,067 --> 00:06:19,099
that have to be working in order for controller specs to pass

182
00:06:19,099 --> 00:06:21,051
So that’s somewhere in between:

183
00:06:21,051 --> 00:06:23,035
where we’re doing more than a single method

184
00:06:23,035 --> 00:06:25,000
touching more than a single class

185
00:06:25,000 --> 00:06:27,000
but we’re still concentrating [our] attention

186
00:06:27,000 --> 00:06:28,088
on a fairly narrow slice of the system at a time

187
00:06:28,088 --> 00:06:31,044
and we’re still using mocking and stubbing extensively

188
00:06:31,044 --> 00:06:35,030
to sort of isolate that behaviour that we want to test

189
00:06:35,030 --> 00:06:36,091
And then at the level of Cucumber scenarios

190
00:06:36,091 --> 00:06:38,047
these are more like integration or system tests

191
00:06:38,047 --> 00:06:41,069
They exercise complete paths throughout the application

192
00:06:41,069 --> 00:06:43,044
They probably touch a lot of different modules

193
00:06:43,044 --> 00:06:46,003
They make minimal use of mocks and stubs

194
00:06:46,003 --> 00:06:48,032
because part of the goal of an integration test

195
00:06:48,032 --> 00:06:50,099
is exactly to test the interaction between pieces

196
00:06:50,099 --> 00:06:53,021
So you don’t want to stub or control those interactions

197
00:06:53,021 --> 00:06:54,080
You actually want to let the system do

198
00:06:54,080 --> 00:06:56,030
what it would really do

199
00:06:56,030 --> 00:06:58,025
if this was a scenario happening in production

200
00:06:58,025 --> 00:07:00,069
So how would we compare these different kinds of tests?

201
00:07:00,069 --> 00:07:02,038
There’s a few different axes we can look at

202
00:07:02,038 --> 00:07:05,007
One of them is how long they take to run

203
00:07:05,007 --> 00:07:06,090
Now, both RSpec and Cucumber

204
00:07:06,090 --> 00:07:09,013
have, kind of, high startup times and stuff like that

205
00:07:09,013 --> 00:07:10,008
But, as you’ll see

206
00:07:10,008 --> 00:07:11,090
as you start adding more and more RSpec tests

207
00:07:11,090 --> 00:07:14,038
and using autotest to run them in the background

208
00:07:14,038 --> 00:07:17,088
by and large, once RSpec kind of gets off the launching pad

209
00:07:17,088 --> 00:07:19,092
it runs specs really fast

210
00:07:19,092 --> 00:07:21,095
whereas running Cucumber features just takes a long time

211
00:07:21,095 --> 00:07:24,059
as it essentially fires up your entire application

212
00:07:24,059 --> 00:07:26,010
And later in this semester

213
00:07:26,010 --> 00:07:28,086
we’ll see a way to make Cucumber even slower—

214
00:07:28,086 --> 00:07:30,070
which is to have it fire up an entire browser

215
00:07:30,070 --> 00:07:33,045
basically act like a puppet, remote-controlling Firefox

216
00:07:33,045 --> 00:07:35,083
so you can test JavaScript code

217
00:07:35,083 --> 00:07:37,000
We’ll do that when we actually—

218
00:07:37,000 --> 00:07:40,032
I think we’ll be able to work with our friends at Sauce Labs

219
00:07:40,032 --> 00:07:42,080
so you can do that in the cloud—That will be exciting

220
00:07:42,080 --> 00:07:45,083
So, “run fast” versus “run slow”

221
00:07:45,083 --> 00:07:46,068
Resolution:

222
00:07:46,068 --> 00:07:48,025
If an error happens in your unit tests

223
00:07:48,025 --> 00:07:49,075
it’s usually pretty easy

224
00:07:49,075 --> 00:07:52,029
to figure out and track down what the source of that error is

225
00:07:52,029 --> 00:07:53,071
because the tests are so isolated

226
00:07:53,071 --> 00:07:56,025
You’ve stubbed out everything that doesn’t matter

227
00:07:56,025 --> 00:07:58,025
and you’re focusing on only the behaviour of interest

228
00:07:58,025 --> 00:07:59,076
So, if you’ve done a good job of doing that

229
00:07:59,076 --> 00:08:01,097
when something goes wrong in one of your tests

230
00:08:01,097 --> 00:08:03,045
there’s not a lot of places

231
00:08:03,045 --> 00:08:04,088
that something could have gone wrong

232
00:08:04,088 --> 00:08:07,041
In contrast, if you’re running a Cucumber scenario

233
00:08:07,041 --> 00:08:08,089
that’s got, you know, 10 steps

234
00:08:08,089 --> 00:08:10,031
and every step is touching

235
00:08:10,031 --> 00:08:11,061
a whole bunch of pieces of the app

236
00:08:11,061 --> 00:08:12,091
it could take a long time

237
00:08:12,091 --> 00:08:14,076
to actually get to the bottom of a bug

238
00:08:14,076 --> 00:08:16,014
So it is kind of a tradeoff

239
00:08:16,014 --> 00:08:17,054
between how well can you localize errors

240
00:08:17,054 --> 00:08:20,065
Coverage:

241
00:08:20,065 --> 00:08:23,002
It’s possible if you write a good suite

242
00:08:23,002 --> 00:08:24,072
of unit and functional tests

243
00:08:24,072 --> 00:08:26,020
you can get really high coverage

244
00:08:26,020 --> 00:08:27,085
You can run your SimpleCov report

245
00:08:27,085 --> 00:08:30,080
and you can actually identify specific lines in your files

246
00:08:30,080 --> 00:08:32,036
that have not been exercised by any test

247
00:08:32,036 --> 00:08:34,016
and then you can go write tests that cover them

248
00:08:34,016 --> 00:08:36,014
So, figuring out how to improve your coverage

249
00:08:36,014 --> 00:08:37,057
for example at the C0 level

250
00:08:37,057 --> 00:08:40,021
is something much more easily done with unit tests

251
00:08:40,021 --> 00:08:42,018
whereas, with a Cucumber test—

252
00:08:42,018 --> 00:08:43,078
with a Cucumber scenario—

253
00:08:43,078 --> 00:08:45,076
you <i>are</i> touching a lot of parts of the code

254
00:08:45,076 --> 00:08:47,080
but you are doing it very sparsely

255
00:08:47,080 --> 00:08:49,038
So, if your goal is to get your coverage up

256
00:08:49,038 --> 00:08:51,031
use the tools that are at the unit level

257
00:08:51,031 --> 00:08:53,007
so that you can focus on understanding

258
00:08:53,007 --> 00:08:54,074
what parts of my code are undertested

259
00:08:54,074 --> 00:08:56,055
and then you can write very targeted tests

260
00:08:56,055 --> 00:08:58,086
just to focus on them

261
00:08:58,086 --> 00:09:01,043
And, sort of, you know, putting those pieces together

262
00:09:01,043 --> 00:09:03,039
the unit tests

263
00:09:03,039 --> 00:09:05,059
because of their isolation and their fine resolution

264
00:09:05,059 --> 00:09:07,039
tend to use a lot of mocks

265
00:09:07,039 --> 00:09:09,012
to isolate the behaviours you don’t care about

266
00:09:09,012 --> 00:09:11,020
But that means that, by definition

267
00:09:11,020 --> 00:09:12,070
you’re not testing the interfaces

268
00:09:12,070 --> 00:09:14,099
and it’s sort of a “received wisdom” in software

269
00:09:14,099 --> 00:09:16,069
that a lot of the interesting bugs

270
00:09:16,069 --> 00:09:18,076
occur at the interfaces between pieces

271
00:09:18,076 --> 00:09:20,078
and not sort of within a class or within a method—

272
00:09:20,078 --> 00:09:22,040
those are sort of the easy bugs to track down

273
00:09:22,040 --> 00:09:24,026
And at the other extreme

274
00:09:24,026 --> 00:09:26,081
the more you get towards the integration testing extreme

275
00:09:26,081 --> 00:09:29,072
you’re supposed to rely less and less on mocks

276
00:09:29,072 --> 00:09:30,090
for that exact reason

277
00:09:30,090 --> 00:09:32,066
Now we saw, if you’re testing something like

278
00:09:32,066 --> 00:09:34,015
say, in a service-oriented architecture

279
00:09:34,015 --> 00:09:35,089
where you have to interact with the remote site

280
00:09:35,089 --> 00:09:37,028
you still end up

281
00:09:37,028 --> 00:09:38,094
having to do a fair amount of mocking and stubbing

282
00:09:38,094 --> 00:09:40,028
so that you don’t rely on the Internet

283
00:09:40,028 --> 00:09:41,067
in order for your tests to pass

284
00:09:41,067 --> 00:09:43,006
but, generally speaking

285
00:09:43,006 --> 00:09:47,014
you’re trying to remove as many of the mocks as you can

286
00:09:47,014 --> 00:09:48,095
and let the system run the way it would run in real life

287
00:09:48,095 --> 00:09:52,070
So, the good news is you <i>are</i> testing the interfaces

288
00:09:52,070 --> 00:09:54,074
<i>but</i> when something goes wrong in one of the interfaces

289
00:09:54,074 --> 00:09:57,053
because your resolution is not as good

290
00:09:57,053 --> 00:10:00,031
it may take longer to figure out what it is

291
00:10:00,031 --> 00:10:05,019
So, what’s sort of the high-order bit from this tradeoff

292
00:10:05,019 --> 00:10:07,024
is you don’t really want to rely

293
00:10:07,024 --> 00:10:08,076
too heavily on any one kind of test

294
00:10:08,076 --> 00:10:10,078
They serve different purposes and, depending on

295
00:10:10,078 --> 00:10:13,043
are you trying to exercise your interfaces more

296
00:10:13,043 --> 00:10:15,089
or are you trying to improve your fine-grained coverage

297
00:10:15,089 --> 00:10:18,003
that affects how you develop your test suite

298
00:10:18,003 --> 00:10:20,065
and you’ll evolve it along with your software

299
00:10:20,065 --> 00:10:24,014
So, we’ve used a certain set of terminology in testing

300
00:10:24,014 --> 00:10:26,028
It’s the terminology that, by and large

301
00:10:26,028 --> 00:10:29,001
is most commonly used in the Rails community

302
00:10:29,001 --> 00:10:30,060
but there’s some variation

303
00:10:30,060 --> 00:10:33,069
[and] some other terms that you might hear

304
00:10:33,069 --> 00:10:35,018
if you go get a job somewhere

305
00:10:35,018 --> 00:10:36,093
and you hear about mutation testing

306
00:10:36,093 --> 00:10:38,072
which we haven’t done

307
00:10:38,072 --> 00:10:40,024
This is an interesting idea that was, I think, invented by

308
00:10:40,024 --> 00:10:43,037
Ammann and Offutt, who have, sort of

309
00:10:43,037 --> 00:10:44,093
the definitive book on software testing

310
00:10:44,093 --> 00:10:46,048
The idea is:

311
00:10:46,048 --> 00:10:48,000
Suppose I introduced a deliberate bug into my code

312
00:10:48,000 --> 00:10:49,051
does that force some test to fail?

313
00:10:49,051 --> 00:10:53,003
Because, if I changed, you know, “if x” to “if not x”

314
00:10:53,003 --> 00:10:56,010
and no tests fail, then either I’m missing some coverage

315
00:10:56,010 --> 00:10:59,019
or my app is very strange and somehow nondeterministic

316
00:10:59,019 --> 00:11:03,099
Fuzz testing, which Koushik Sen may talk more about

317
00:11:03,099 --> 00:11:07,085
basically, this is the “10,000 monkeys at typewriters

318
00:11:07,085 --> 00:11:09,024
throwing random input at your code”

319
00:11:09,024 --> 00:11:10,037
What’s interesting about it is that

320
00:11:10,037 --> 00:11:11,065
those tests we’ve been doing

321
00:11:11,065 --> 00:11:13,086
essentially are crafted to test the app

322
00:11:13,086 --> 00:11:15,058
the way it was designed

323
00:11:15,058 --> 00:11:16,088
and these, you know, fuzz testing

324
00:11:16,088 --> 00:11:19,064
is about testing the app in ways it <i>wasn’t</i> meant to be used

325
00:11:19,064 --> 00:11:22,098
So, what happens if you throw enormous form submissions at it?

326
00:11:22,098 --> 00:11:25,036
What happens if you put control characters in your forms?

327
00:11:25,036 --> 00:11:27,062
What happens if you submit the same thing over and over?

328
00:11:27,062 --> 00:11:29,093
And, Koushik has a statistic that

329
00:11:29,093 --> 00:11:32,033
Microsoft finds up to 20% of their bugs

330
00:11:32,033 --> 00:11:34,064
using some variation of fuzz testing

331
00:11:34,064 --> 00:11:36,029
and that about 25%

332
00:11:36,029 --> 00:11:39,021
of the common Unix command-line programs

333
00:11:39,021 --> 00:11:40,092
can be made to crash

334
00:11:40,092 --> 00:11:44,018
[when] put through aggressive fuzz testing

335
00:11:44,018 --> 00:11:46,089
Define-use coverage is something that we haven’t done

336
00:11:46,089 --> 00:11:48,089
but it’s another interesting concept

337
00:11:48,089 --> 00:11:50,089
The idea is that at any point in my program

338
00:11:50,089 --> 00:11:52,062
there’s a place where I define—

339
00:11:52,062 --> 00:11:54,046
or I assign a value to some variable—

340
00:11:54,046 --> 00:11:56,000
and then there’s a place downstream

341
00:11:56,000 --> 00:11:57,075
where presumably I’m going to consume that value—

342
00:11:57,075 --> 00:11:59,058
someone’s going to use that value

343
00:11:59,058 --> 00:12:01,013
Have I covered every pair?

344
00:12:01,013 --> 00:12:02,059
In other words, do I have tests where every pair

345
00:12:02,059 --> 00:12:04,054
of defining a variable and using it somewhere

346
00:12:04,054 --> 00:12:07,014
is executed in some part of my test suite?

347
00:12:07,014 --> 00:12:10,071
It’s sometimes called DU-coverage

348
00:12:10,071 --> 00:12:14,011
And other terms that I think are not as widely used anymore

349
00:12:14,011 --> 00:12:17,071
blackbox versus whitebox, or blackbox versus glassbox

350
00:12:17,071 --> 00:12:20,025
Roughly, a blackbox test is one that is written from

351
00:12:20,025 --> 00:12:22,041
the point of view of the external specification of the thing

352
00:12:22,041 --> 00:12:24,022
[For example:] “This is a hash table

353
00:12:24,022 --> 00:12:26,015
When I put in a key I should get back a value

354
00:12:26,015 --> 00:12:28,011
If I delete the key the value shouldn’t be there”

355
00:12:28,011 --> 00:12:29,099
That’s a blackbox test because it doesn’t say

356
00:12:29,099 --> 00:12:32,028
anything about how the hash table is implemented

357
00:12:32,028 --> 00:12:34,072
and it doesn’t try to stress the implementation

358
00:12:34,072 --> 00:12:36,056
A corresponding whitebox test might be:

359
00:12:36,056 --> 00:12:38,008
“I know something about the hash function

360
00:12:38,008 --> 00:12:39,098
and I’m going to deliberately create

361
00:12:39,098 --> 00:12:41,088
hash keys in my test cases

362
00:12:41,088 --> 00:12:43,078
that cause a lot of hash collisions

363
00:12:43,078 --> 00:12:45,095
to make sure that I’m testing that part of the functionality”

364
00:12:45,095 --> 00:12:49,007
Now, a C0 test coverage tool, like SimpleCov

365
00:12:49,007 --> 00:12:52,001
would reveal that if all you had is blackbox tests

366
00:12:52,001 --> 00:12:53,028
you might find that

367
00:12:53,028 --> 00:12:55,056
the collision-handling code wasn’t being hit very often

368
00:12:55,056 --> 00:12:56,075
And that might tip you off and say:

369
00:12:56,075 --> 00:12:58,028
“Ok, if I really want to strengthen that—

370
00:12:58,028 --> 00:13:00,008
for one, if I want to boost coverage for those tests

371
00:13:00,008 --> 00:13:02,006
now I have to write a whitebox or a glassbox test

372
00:13:02,006 --> 00:13:04,057
I have to look inside, see what the implementation does

373
00:13:04,057 --> 00:13:05,061
and find specific ways

374
00:13:05,061 --> 00:13:10,060
to try to break the implementation in evil ways”

375
00:13:10,060 --> 00:13:13,075
So, I think, testing is a kind of a way of life, right?

376
00:13:13,075 --> 00:13:16,069
We’ve gotten away from the phase of

377
00:13:16,069 --> 00:13:18,033
“We’d build the whole thing and then we’d test it”

378
00:13:18,033 --> 00:13:19,092
and we’ve gotten into the phase of

379
00:13:19,092 --> 00:13:20,077
“We’re testing as we go”

380
00:13:20,077 --> 00:13:22,048
Testing is really more like a development tool

381
00:13:22,048 --> 00:13:24,022
and like so many development tools

382
00:13:24,022 --> 00:13:25,062
the effectiveness of it depends

383
00:13:25,062 --> 00:13:27,013
on whether you’re using it in a tasteful manner

384
00:13:27,013 --> 00:13:31,002
So, you could say: “Well, let’s see—I kicked the tires

385
00:13:31,002 --> 00:13:33,048
You know, I fired up the browser, I tried a couple of things

386
00:13:33,048 --> 00:13:35,097
(claps hands) Looks like it works! Deploy it!”

387
00:13:35,097 --> 00:13:38,045
That’s obviously a little more cavalier than you’d want to be

388
00:13:38,045 --> 00:13:41,024
And, by the way, one of the things that we discovered

389
00:13:41,024 --> 00:13:43,077
with this online course just starting up

390
00:13:43,077 --> 00:13:45,090
when 60,000 people are enrolled in the course

391
00:13:45,090 --> 00:13:48,099
and 0.1% of those people have a problem

392
00:13:48,099 --> 00:13:50,083
you’d get 60 emails

393
00:13:50,083 --> 00:13:53,078
The corollary is: when your site is used by a lot of people

394
00:13:53,078 --> 00:13:55,089
some stupid bug that you didn’t find

395
00:13:55,089 --> 00:13:57,018
but that could have been found by testing

396
00:13:57,018 --> 00:13:59,080
could very quickly generate <i>a lot</i> of pain

397
00:13:59,080 --> 00:14:02,023
On the other hand, you don’t want to be dogmatic and say

398
00:14:02,023 --> 00:14:04,056
“Uh, until we have 100% coverage and every test is green

399
00:14:04,056 --> 00:14:06,005
we absolutely will not ship”

400
00:14:06,005 --> 00:14:07,012
That’s not healthy either

401
00:14:07,012 --> 00:14:08,048
And the test quality

402
00:14:08,048 --> 00:14:10,057
doesn’t necessarily correlate with statement coverage

403
00:14:10,057 --> 00:14:11,064
unless you can say something

404
00:14:11,064 --> 00:14:12,068
about the quality of your tests

405
00:14:12,068 --> 00:14:14,029
just because you’ve executed every line

406
00:14:14,029 --> 00:14:17,010
doesn’t mean that you’ve tested the interesting cases

407
00:14:17,010 --> 00:14:18,068
So, somewhere in between, you could say

408
00:14:18,068 --> 00:14:20,014
“Well, we’ll use coverage tools to identify

409
00:14:20,014 --> 00:14:23,004
undertested or poorly-tested parts of the code

410
00:14:23,004 --> 00:14:24,073
and we’ll use them as a guideline

411
00:14:24,073 --> 00:14:27,011
to sort of help improve our overall confidence level”

412
00:14:27,011 --> 00:14:29,024
But remember, Agile is about embracing change

413
00:14:29,024 --> 00:14:30,032
and dealing with it

414
00:14:30,032 --> 00:14:32,002
Part of change is that things will change in ways that cause

415
00:14:32,002 --> 00:14:33,038
bugs that you didn’t foresee

416
00:14:33,038 --> 00:14:34,031
and the right reaction is:

417
00:14:34,031 --> 00:14:36,026
Be comfortable enough with the testing tools

418
00:14:36,026 --> 00:14:37,064
[so] that you can quickly find those bugs

419
00:14:37,064 --> 00:14:39,025
Write a test that reproduces that bug

420
00:14:39,025 --> 00:14:40,062
And then make the test green

421
00:14:40,062 --> 00:14:41,061
Then you’ve really fixed it

422
00:14:41,061 --> 00:14:43,004
That means, the way that you really fix a bug is

423
00:14:43,004 --> 00:14:45,049
if you created a test that correctly failed

424
00:14:45,049 --> 00:14:46,088
to reproduce that bug

425
00:14:46,088 --> 00:14:48,055
and then you went back and fixed the code

426
00:14:48,055 --> 00:14:49,057
to make those tests pass

427
00:14:49,057 --> 00:14:51,073
Similarly, you don’t want to say

428
00:14:51,073 --> 00:14:53,036
“Well, unit tests give you better coverage

429
00:14:53,036 --> 00:14:54,073
They’re more thorough and detailed

430
00:14:54,073 --> 00:14:56,044
So let’s focus all our energy on that”

431
00:14:56,044 --> 00:14:57,062
as opposed to

432
00:14:57,062 --> 00:14:58,093
“Oh, focus on integration tests

433
00:14:58,093 --> 00:15:00,006
because they’re more realistic, right?

434
00:15:00,006 --> 00:15:01,056
They reflect what the customer said they want

435
00:15:01,056 --> 00:15:03,034
So, if the integration tests are passing

436
00:15:03,034 --> 00:15:05,067
by definition we’re meeting a customer need”

437
00:15:05,067 --> 00:15:07,034
Again, both extremes are kind of unhealthy

438
00:15:07,034 --> 00:15:09,079
because each one of these can find problems

439
00:15:09,079 --> 00:15:11,031
that would be missed by the other

440
00:15:11,031 --> 00:15:12,060
So, having a good combination of them

441
00:15:12,060 --> 00:15:15,042
is kind of what it is all about

442
00:15:15,042 --> 00:15:18,072
The last thing I want to leave you with is, I think

443
00:15:18,072 --> 00:15:20,036
in terms of testing, is “TDD versus

444
00:15:20,036 --> 00:15:22,005
what I call conventional debugging—

445
00:15:22,005 --> 00:15:24,004
i.e., the way that we all kind of do it

446
00:15:24,004 --> 00:15:25,051
even though we say we don’t”

447
00:15:25,051 --> 00:15:26,064
and we’re all trying to get better, right?

448
00:15:26,064 --> 00:15:27,085
We’re all kind of in the gutter

449
00:15:27,085 --> 00:15:29,036
Some of us are looking up at the stars

450
00:15:29,036 --> 00:15:31,011
trying to improve our practices

451
00:15:31,011 --> 00:15:33,099
But, having now lived with this for 3 or 4 years myself

452
00:15:33,099 --> 00:15:35,091
and—I’ll be honest—3 years ago I didn’t do TDD

453
00:15:35,091 --> 00:15:37,079
I do it now, because I find that it’s better

454
00:15:37,079 --> 00:15:40,081
and here’s my distillation of why I think it works for me

455
00:15:40,081 --> 00:15:43,032
Sorry, the colours are a little weird

456
00:15:43,032 --> 00:15:45,000
but on the left column of the table

457
00:15:45,000 --> 00:15:46,034
[it] says “Conventional debugging”

458
00:15:46,034 --> 00:15:47,044
and the right side says “TDD”

459
00:15:47,044 --> 00:15:49,069
So what’s the way I used to write code?

460
00:15:49,069 --> 00:15:51,056
Maybe some of you still do this

461
00:15:51,056 --> 00:15:53,013
I write a whole bunch of lines

462
00:15:53,013 --> 00:15:54,043
maybe a few tens of lines of code

463
00:15:54,043 --> 00:15:55,059
I’m <i>sure</i> they’re right—

464
00:15:55,059 --> 00:15:56,061
I mean, I <i>am</i> a good programmer, right?

465
00:15:56,061 --> 00:15:57,099
This is not that hard

466
00:15:57,099 --> 00:15:59,002
I run it – It doesn’t work

467
00:15:59,002 --> 00:16:01,098
Ok, fire up the debugger – Start putting in printf’s

468
00:16:01,098 --> 00:16:04,088
If I’d been using TDD what would I do instead?

469
00:16:04,088 --> 00:16:08,022
Well I’d write a <i>few</i> lines of code, having written a test first

470
00:16:08,022 --> 00:16:10,071
So as soon as the test goes from red to green

471
00:16:10,071 --> 00:16:12,064
I know I wrote code that works—

472
00:16:12,064 --> 00:16:15,013
or at least the parts of the behaviour that I had in mind

473
00:16:15,013 --> 00:16:16,096
Those parts of the behaviour work, because I had a test

474
00:16:16,096 --> 00:16:19,056
Ok, back to conventional debugging:

475
00:16:19,056 --> 00:16:21,073
I’m running my program, trying to find the bugs

476
00:16:21,073 --> 00:16:23,028
I start putting in printf’s everywhere

477
00:16:23,028 --> 00:16:24,062
to print out the values of things

478
00:16:24,062 --> 00:16:25,064
which by the way is a lot of fun

479
00:16:25,064 --> 00:16:26,073
when you’re trying to read them

480
00:16:26,073 --> 00:16:28,014
out of the 500 lines of log output

481
00:16:28,014 --> 00:16:29,035
that you’d get in a Rails app

482
00:16:29,035 --> 00:16:30,087
trying to find <i>your</i> printf’s

483
00:16:30,087 --> 00:16:32,035
you know, “I know what I’ll do—

484
00:16:32,035 --> 00:16:34,008
I’ll put in 75 asterisks before and after

485
00:16:34,008 --> 00:16:36,043
That will make it readable” (laughter)

486
00:16:36,043 --> 00:16:38,071
Who don’t—Ok, raise your hands if you don’t do this!

487
00:16:38,071 --> 00:16:40,090
Thank you for your honesty. (laughter) Ok.

488
00:16:40,090 --> 00:16:43,014
Or— Or I could do the other thing, I could say:

489
00:16:43,014 --> 00:16:45,030
Instead of printing the value of a variable

490
00:16:45,030 --> 00:16:47,039
why don’t I write a test that inspects it

491
00:16:47,039 --> 00:16:48,079
using an expectation such as “should”

492
00:16:48,079 --> 00:16:50,090
and I’ll know immediately in bright red letters

493
00:16:50,090 --> 00:16:53,033
if that expectation wasn’t met

494
00:16:53,033 --> 00:16:56,005
Ok, I’m back on the conventional debugging side:

495
00:16:56,005 --> 00:16:58,090
I break out the big guns: I pull out the Ruby debugger

496
00:16:58,092 --> 00:17:02,044
I set a debug breakpoint, and I now start <i>tweaking</i> and say

497
00:17:02,044 --> 00:17:04,085
“Oh, let’s see, I have to get past that ‘if’ statement

498
00:17:04,085 --> 00:17:06,002
so I have to set that thing

499
00:17:06,002 --> 00:17:07,063
Oh, I have to call that method and so I need to…”

500
00:17:07,063 --> 00:17:08,065
No!

501
00:17:08,065 --> 00:17:10,087
I <i>could</i> instead—if I’m going to do that anyway—

502
00:17:10,087 --> 00:17:13,000
let’s just do it in a file, set up some mocks and stubs

503
00:17:13,000 --> 00:17:16,045
to control the code path, make it go the way I want

504
00:17:16,045 --> 00:17:19,013
And now, “Ok, for sure I’ve fixed it!

505
00:17:19,013 --> 00:17:22,012
I’ll get out of the debugger, run it all again!”

506
00:17:22,012 --> 00:17:24,022
And, of course, 9 times out of 10, you didn’t fix it

507
00:17:24,022 --> 00:17:26,072
or you kind of partly fixed it but you didn’t completely fix it

508
00:17:26,072 --> 00:17:30,040
and now I have to do all these manual things all over again

509
00:17:30,040 --> 00:17:32,086
<i>or</i> I already have a bunch of tests

510
00:17:32,086 --> 00:17:34,031
and I can just rerun them automatically

511
00:17:34,031 --> 00:17:35,056
and I could, if some of them fail

512
00:17:35,056 --> 00:17:36,087
“Oh, I didn’t fix the whole thing

513
00:17:36,087 --> 00:17:38,040
No problem, I’ll just go back!”

514
00:17:38,040 --> 00:17:39,096
So, the bottom line is that

515
00:17:39,096 --> 00:17:41,095
you know, you <i>could</i> do it on the left side

516
00:17:41,095 --> 00:17:45,004
but you’re using the same techniques in both cases

517
00:17:45,004 --> 00:17:48,062
The only difference is, in one case you’re doing it manually

518
00:17:48,062 --> 00:17:50,004
which is boring and error-prone

519
00:17:50,004 --> 00:17:51,078
In the other case you’re doing a little more work

520
00:17:51,078 --> 00:17:53,095
but you can make it automatic and repeatable

521
00:17:53,095 --> 00:17:55,071
and have, you know, some high confidence

522
00:17:55,071 --> 00:17:57,003
that as you change things in your code

523
00:17:57,003 --> 00:17:58,092
you are not breaking stuff that used to work

524
00:17:58,092 --> 00:18:00,091
and basically it’s more productive

525
00:18:00,091 --> 00:18:02,047
So you’re doing all the same things

526
00:18:02,047 --> 00:18:04,037
but with a, kind of, “delta” extra work

527
00:18:04,037 --> 00:18:07,086
you are using your effort at a much higher leverage

528
00:18:07,086 --> 00:18:10,036
So that’s kind of my view of why TDD is a good thing

529
00:18:10,036 --> 00:18:11,088
It’s really, it doesn’t require new skills

530
00:18:11,088 --> 00:18:15,011
It just requires [you] to refactor your existing skills

531
00:18:15,011 --> 00:18:18,014
I also tried when I—again, honest confessions, right?—

532
00:18:18,014 --> 00:18:19,034
when I started doing this it was like

533
00:18:19,034 --> 00:18:21,049
“Ok, I’m gonna be teaching a course on Rails

534
00:18:21,049 --> 00:18:22,065
I should really focus on testing”

535
00:18:22,065 --> 00:18:24,032
So I went back to some code I had written

536
00:18:24,032 --> 00:18:26,087
that was <i>working</i>—you know, that was decent code—

537
00:18:26,087 --> 00:18:29,006
and I started trying to write tests for it

538
00:18:29,006 --> 00:18:31,019
and it was *so painful*

539
00:18:31,019 --> 00:18:33,033
because the code wasn’t written in a way that was testable

540
00:18:33,033 --> 00:18:34,097
There were all kinds of interactions

541
00:18:34,097 --> 00:18:36,038
There were, like, nested conditionals

542
00:18:36,038 --> 00:18:38,083
And if you wanted to isolate a particular statement

543
00:18:38,083 --> 00:18:41,070
and have a test trigger just that statement

544
00:18:41,070 --> 00:18:44,000
the amount of stuff you’d have to set up in your test

545
00:18:44,000 --> 00:18:45,009
to have it happen—

546
00:18:45,009 --> 00:18:46,040
remember when we talked about mock train wrecks—

547
00:18:46,040 --> 00:18:48,014
you have to set up all this infrastructure

548
00:18:48,014 --> 00:18:49,063
just to get <i>one</i> line of code

549
00:18:49,063 --> 00:18:51,015
and you do that and you go

550
00:18:51,015 --> 00:18:52,074
“Gawd, testing is really not worth it!

551
00:18:52,074 --> 00:18:54,034
I wrote 20 lines of setup

552
00:18:54,034 --> 00:18:56,059
so that I could test two lines in my function!”

553
00:18:56,059 --> 00:18:58,085
What that’s really telling you—as I now realize—

554
00:18:58,085 --> 00:19:00,042
is your function is <i>bad</i>

555
00:19:00,042 --> 00:19:01,049
It’s a badly written function

556
00:19:01,049 --> 00:19:02,052
It’s not a testable function

557
00:19:02,052 --> 00:19:03,088
It’s got too many moving parts

558
00:19:03,088 --> 00:19:06,026
whose dependencies <i>can’t</i> be broken

559
00:19:06,026 --> 00:19:07,070
There’s no seams in my function

560
00:19:07,070 --> 00:19:11,008
that allow me to individually test the different behaviours

561
00:19:11,008 --> 00:19:12,083
And once you start doing Test First Development

562
00:19:12,083 --> 00:19:15,043
because you have to write your tests in small chunks

563
00:19:15,043 --> 00:19:17,053
it kind of makes this problem go away

564
00:19:17,053 --> 99:59:59,999
So that’s been my epiphany