0:00:00.057,0:00:01.092 So we spent a bunch of time 0:00:01.092,0:00:03.032 in the last couple of lectures 0:00:03.032,0:00:05.082 talking about different kinds of testing 0:00:05.082,0:00:08.021 about unit testing versus integration testing 0:00:08.021,0:00:10.010 We talked about how do you use RSpec 0:00:10.010,0:00:12.049 to really isolate the parts of your code you want to test 0:00:12.049,0:00:14.090 you’ve also, you know, because of homework 3, 0:00:14.090,0:00:18.017 and other stuff, we have been doing BDD, 0:00:18.017,0:00:20.062 where we’ve been using Cucumber to turn user stories 0:00:20.062,0:00:22.095 into, essentially, integration and acceptance tests 0:00:22.095,0:00:25.061 So you’ve seen testing in a couple of different levels 0:00:25.061,0:00:27.063 and the goal here is sort of to do a few remarks 0:00:27.063,0:00:29.092 to, you know, let’s back up a little bit 0:00:29.092,0:00:33.001 and see the big picture, and tie those things together 0:00:33.001,0:00:34.095 So this sort of spans material 0:00:34.095,0:00:37.000 that covers three or four sections in the book 0:00:37.000,0:00:39.061 and I want to just hit the high points in lecture 0:00:39.061,0:00:41.046 So a question that comes up 0:00:41.046,0:00:43.025 I’m sure it’s come up for all of you 0:00:43.025,0:00:44.052 as you have been doing homework 0:00:44.052,0:00:45.069 is: “How much testing is enough?” 0:00:45.069,0:00:48.049 And, sadly, for a long time 0:00:48.049,0:00:51.009 kind of if you asked this question in industry 0:00:51.009,0:00:52.017 the answer was basically 0:00:52.017,0:00:53.017 “Well, we have a shipping deadline, 0:00:53.017,0:00:54.099 so however much testing we can do 0:00:54.099,0:00:56.066 before that deadline, that’s how much.” 0:00:56.066,0:00:58.015 That’s what you have time for. 0:00:58.015,0:01:00.002 So, you know, that’s a little flip 0:01:00.002,0:01:01.011 obviously not very good 0:01:01.011,0:01:02.054 So you can do a bit better, right? 
0:01:02.054,0:01:03.070 There’re some static measures 0:01:03.070,0:01:06.003 like how many lines of code does your app have 0:01:06.003,0:01:08.021 and how many lines of tests do you have? 0:01:08.021,0:01:10.029 And it’s not unusual in industry 0:01:10.029,0:01:12.068 in a well-tested piece of software 0:01:12.068,0:01:14.057 for the number of lines of tests 0:01:14.057,0:01:17.073 to go far beyond the number of lines of code 0:01:17.073,0:01:19.075 So, integer multiples are not unusual 0:01:19.075,0:01:21.084 And I think even for sort of, you know, 0:01:21.084,0:01:23.022 research code or classwork 0:01:23.022,0:01:26.085 a ratio of, you know, maybe 1.5 is not unreasonable 0:01:26.085,0:01:30.005 so one and a half times the amount of test code 0:01:30.005,0:01:32.024 as you have application code 0:01:32.024,0:01:34.022 And in a lot of production systems 0:01:34.022,0:01:35.027 where they really care about testing 0:01:35.027,0:01:36.091 it is much higher than that 0:01:36.091,0:01:38.015 So maybe a better question to ask: 0:01:38.015,0:01:39.047 Rather than saying “How much testing is enough?” 0:01:39.047,0:01:42.049 is to ask “How good is the testing I am doing now? 
0:01:42.049,0:01:44.035 How thorough is it?” 0:01:44.035,0:01:45.056 Later in this semester 0:01:45.056,0:01:46.056 Professor Sen will talk about 0:01:46.056,0:01:48.018 a little bit about formal methods 0:01:48.018,0:01:50.085 and sort of what’s at the frontiers of testing and debugging 0:01:50.085,0:01:52.068 But a couple of things that we can talk about 0:01:52.068,0:01:54.007 based on what you already know 0:01:54.007,0:01:57.074 is some basic concepts about test coverage 0:01:57.074,0:01:59.054 And although I would say 0:01:59.054,0:02:01.001 you know, we’ve been saying all along 0:02:01.001,0:02:03.003 formal methods, they don’t really work on big systems 0:02:03.003,0:02:05.033 I think that statement, in my personal opinion 0:02:05.033,0:02:07.001 is actually a lot less true than it used to be 0:02:07.001,0:02:09.019 I think there are a number of specific places 0:02:09.019,0:02:10.052 especially in testing and debugging 0:02:10.052,0:02:12.084 where formal methods are actually making fast progress 0:02:12.084,0:02:15.075 and Koushik Sen is one of the leaders in that 0:02:15.075,0:02:17.094 So you’ll have the opportunity to hear more about that later 0:02:17.094,0:02:21.043 but for the moment I think, kind of bread and butter 0:02:21.043,0:02:22.073 is let’s talk about coverage measurement 0:02:22.073,0:02:24.047 because this is where the rubber meets the road 0:02:24.047,0:02:26.020 in terms of how you’d be evaluated 0:02:26.020,0:02:28.063 if you are doing this for real 0:02:28.063,0:02:29.052 So what’s some basics? 
0:02:29.052,0:02:30.078 Here’s a really simple class you can use 0:02:30.078,0:02:32.090 to talk about different ways to measure 0:02:32.090,0:02:34.080 how our tests cover this code 0:02:34.080,0:02:36.063 And there’re a few different levels 0:02:36.063,0:02:37.085 with different terminologies 0:02:37.085,0:02:40.073 It’s not really universal across all software houses 0:02:40.073,0:02:42.064 But one common set of terminology 0:02:42.064,0:02:43.064 that the book exposes 0:02:43.064,0:02:44.068 is we could talk about S0 0:02:44.068,0:02:47.045 where we’d just mean you’ve called every method once 0:02:47.045,0:02:50.045 So you know, if you call foo, and you call bar, you’re done 0:02:50.045,0:02:52.015 That’s S0 coverage: not terribly thorough 0:02:52.015,0:02:54.068 A little more stringent, S1, is 0:02:54.068,0:02:56.013 you could say, we’re calling every method 0:02:56.013,0:02:57.028 from every place that it could be called 0:02:57.028,0:02:58.082 So what does that mean? 0:02:58.082,0:03:00.007 It means, for example 0:03:00.007,0:03:01.012 it’s not enough to call bar 0:03:01.012,0:03:02.095 You have to make sure that you call it 0:03:02.095,0:03:05.057 at least once from in here 0:03:05.057,0:03:07.016 as well as calling it once 0:03:07.016,0:03:10.037 from any exterior function that might call it 0:03:10.037,0:03:12.081 C0 which is what SimpleCov measures 0:03:12.081,0:03:15.099 (those of you who’ve gotten SimpleCov up and running) 0:03:15.099,0:03:18.052 basically says you’ve executed every statement 0:03:18.052,0:03:20.004 you’ve touched every statement in your code once 0:03:20.004,0:03:22.048 But the caveat there is that 0:03:22.048,0:03:25.058 conditionals really just count as a single statement 0:03:25.058,0:03:28.091 So, no matter which branch of this “if” you took 0:03:28.091,0:03:31.074 as long as you touched one or the other branch 0:03:31.074,0:03:33.035 you’ve executed the “if” statement 0:03:33.035,0:03:35.066 So even C0 is still, 
you know, sort of superficial coverage 0:03:35.066,0:03:37.026 But, as we will see 0:03:37.026,0:03:39.023 the way that you will want to read this information is: 0:03:39.023,0:03:41.079 if you are getting bad coverage at the C0 level 0:03:41.079,0:03:44.007 then you have really really bad coverage 0:03:44.007,0:03:46.008 So if you are not kind of making 0:03:46.008,0:03:47.037 this simple level of superficial coverage 0:03:47.037,0:03:50.002 then your testing is probably deficient 0:03:50.002,0:03:51.091 C1 is the next step up from that 0:03:51.091,0:03:53.071 We could say: 0:03:53.071,0:03:55.019 Well, we have to take every branch in both directions 0:03:55.019,0:03:56.061 So, when we are doing this “if” statement 0:03:56.061,0:03:58.066 we have to make sure that 0:03:58.066,0:03:59.092 we do the “if x” part once 0:03:59.092,0:04:05.013 and the “if not x” part at least once to meet C1 0:04:05.013,0:04:08.036 You can augment that with decision coverage 0:04:08.036,0:04:09.063 saying: Well, if we’re gonna… 0:04:09.063,0:04:12.036 If we have “if” statements where the condition 0:04:12.036,0:04:13.088 is made up of multiple terms 0:04:13.088,0:04:15.071 we have to make sure that every subexpression 0:04:15.071,0:04:17.097 has been evaluated in both directions 0:04:17.097,0:04:19.067 In other words, that means that 0:04:19.067,0:04:22.041 if we’re going to fail this “if” statement 0:04:22.041,0:04:24.034 we have to make sure to fail it at least once 0:04:24.034,0:04:26.044 because y was false and at least once because z was false 0:04:26.044,0:04:28.088 In other words, any subexpression that could 0:04:28.088,0:04:31.021 independently change the outcome of the condition 0:04:31.021,0:04:34.048 has to be exercised in both directions 0:04:34.048,0:04:36.003 And then, 0:04:36.003,0:04:38.052 kind of, the one that, you know, a lot of people aspire to 0:04:38.052,0:04:41.026 but there is disagreement on how much more valuable it is 0:04:41.026,0:04:42.083 is you take every path 
through the code 0:04:42.083,0:04:45.053 Obviously, this is kind of difficult because 0:04:45.053,0:04:48.033 it tends to be exponential in the number of conditions 0:04:48.033,0:04:53.008 And in general it’s difficult 0:04:53.008,0:04:55.031 to evaluate if you’ve taken every path through the code 0:04:55.031,0:04:57.001 There are formal techniques that you can use 0:04:57.001,0:04:58.083 to tell you where the holes are 0:04:58.083,0:05:01.031 but the bottom line is that 0:05:01.031,0:05:03.004 in most commercial software houses 0:05:03.004,0:05:04.089 there is, I would say, not complete consensus 0:05:04.089,0:05:06.070 on how much more valuable C2 is 0:05:06.070,0:05:08.068 compared to C0 or C1 0:05:08.068,0:05:10.013 So, I think, for the purpose of our class 0:05:10.013,0:05:11.067 you get exposed to the idea 0:05:11.067,0:05:13.020 of how you use coverage information 0:05:13.020,0:05:16.040 SimpleCov takes advantage of some built-in Ruby features 0:05:16.040,0:05:18.009 to give you C0 coverage 0:05:18.009,0:05:19.062 [It] does really nice reports 0:05:19.062,0:05:21.025 We can sort of see it 0:05:21.025,0:05:22.096 at the level of individual lines in your file 0:05:22.096,0:05:24.091 You can see what your coverage is 0:05:24.091,0:05:27.015 and I think that’s kind of a, you know 0:05:27.015,0:05:31.018 a good start for where we are 0:05:31.018,0:05:33.076 So, having seen, sort of, different flavours of tests 0:05:33.076,0:05:37.020 stepping back and looking at the big picture 0:05:37.020,0:05:38.098 what are the different kinds of tests 0:05:38.098,0:05:40.078 that we’ve seen concretely? 0:05:40.078,0:05:42.032 and what are the tradeoffs 0:05:42.032,0:05:43.089 between using those different kinds of tests? 
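To make the difference between C0 and C1 described above concrete, here is a minimal sketch in plain Ruby. The class is only a guess at the kind of example the lecture points to: the names MyClass, foo, bar, and the branch labels are assumptions, and the branches_taken bookkeeping is hand-written instrumentation added for illustration, not how SimpleCov itself works.

```ruby
# Hypothetical example class: foo has an outer "if x" and an inner "if y && z".
class MyClass
  ALL_BRANCHES = [:x_true, :x_false, :yz_true, :yz_false].freeze

  attr_reader :w, :branches_taken

  def initialize
    @branches_taken = []
  end

  def foo(x, y, z)
    if x
      @branches_taken << :x_true
      if y && z
        @branches_taken << :yz_true
        bar(0)
      else
        # Without this bookkeeping line the inner "if" would have no else
        # branch, so a statement-level (C0) report could not see this case.
        @branches_taken << :yz_false
      end
    else
      @branches_taken << :x_false
      bar(1)
    end
  end

  def bar(value)
    @w = value
  end
end

# Runs foo on each [x, y, z] triple and returns the branches never taken.
def untaken_branches(inputs)
  obj = MyClass.new
  inputs.each { |x, y, z| obj.foo(x, y, z) }
  MyClass::ALL_BRANCHES - obj.branches_taken
end

# These two calls reach both calls to bar, which is all the "real" code,
# yet the inner condition is never false, so C1 is not met.
statement_level_inputs = [[true, true, true], [false, false, false]]
puts untaken_branches(statement_level_inputs).inspect   # prints [:yz_false]

# One more input takes the inner condition in the other direction.
branch_level_inputs = statement_level_inputs + [[true, true, false]]
puts untaken_branches(branch_level_inputs).inspect      # prints []
```

Going one step further, to the per-subexpression criterion mentioned above, would need a fourth input such as [true, false, true], so that the inner condition fails once because of y and once because of z.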
0:05:43.089,0:05:47.016 So we’ve seen at the level of individual classes or methods 0:05:47.016,0:05:50.009 we use RSpec, with extensive use of mocking and stubbing 0:05:50.009,0:05:53.004 So, for example when we test methods in the model 0:05:53.004,0:05:55.057 that would be an example of unit testing 0:05:55.057,0:05:59.025 We also did something that is pretty similar to 0:05:59.025,0:06:00.097 functional or module testing 0:06:00.097,0:06:02.071 where there is more than one module participating 0:06:02.071,0:06:04.065 So, for example when we did controller specs 0:06:04.065,0:06:07.085 we saw that—we simulate a POST action 0:06:07.085,0:06:09.029 but remember that the POST action 0:06:09.029,0:06:10.086 has to go through the routing subsystem 0:06:10.086,0:06:12.042 before it gets to the controller 0:06:12.042,0:06:14.048 Once the controller is done it will try to render a view 0:06:14.048,0:06:16.007 So in fact there are other pieces 0:06:16.007,0:06:17.067 that collaborate with the controller 0:06:17.067,0:06:19.099 that have to be working in order for controller specs to pass 0:06:19.099,0:06:21.051 So that’s somewhere in between: 0:06:21.051,0:06:23.035 where we’re doing more than a single method 0:06:23.035,0:06:25.000 touching more than a single class 0:06:25.000,0:06:27.000 but we’re still concentrating [our] attention 0:06:27.000,0:06:28.088 on a fairly narrow slice of the system at a time 0:06:28.088,0:06:31.044 and we’re still using mocking and stubbing extensively 0:06:31.044,0:06:35.030 to sort of isolate that behaviour that we want to test 0:06:35.030,0:06:36.091 And then at the level of Cucumber scenarios 0:06:36.091,0:06:38.047 these are more like integration or system tests 0:06:38.047,0:06:41.069 They exercise complete paths through the application 0:06:41.069,0:06:43.044 They probably touch a lot of different modules 0:06:43.044,0:06:46.003 They make minimal use of mocks and stubs 0:06:46.003,0:06:48.032 because part of the goal of an 
integration test 0:06:48.032,0:06:50.099 is exactly to test the interaction between pieces 0:06:50.099,0:06:53.021 So you don’t want to stub or control those interactions 0:06:53.021,0:06:54.080 You actually want to let the system do 0:06:54.080,0:06:56.030 what it would really do 0:06:56.030,0:06:58.025 if this was a scenario happening in production 0:06:58.025,0:07:00.069 So how would we compare these different kinds of tests? 0:07:00.069,0:07:02.038 There are a few different axes we can look at 0:07:02.038,0:07:05.007 One of them is how long they take to run 0:07:05.007,0:07:06.090 Now, both RSpec and Cucumber 0:07:06.090,0:07:09.013 have, kind of, high startup times and stuff like that 0:07:09.013,0:07:10.008 But, as you’ll see 0:07:10.008,0:07:11.090 as you start adding more and more RSpec tests 0:07:11.090,0:07:14.038 and using autotest to run them in the background 0:07:14.038,0:07:17.088 by and large, once RSpec kind of gets off the launching pad 0:07:17.088,0:07:19.092 it runs specs really fast 0:07:19.092,0:07:21.095 whereas running Cucumber features just takes a long time 0:07:21.095,0:07:24.059 as it essentially fires up your entire application 0:07:24.059,0:07:26.010 And later in this semester 0:07:26.010,0:07:28.086 we’ll see a way to make Cucumber even slower— 0:07:28.086,0:07:30.070 which is to have it fire up an entire browser 0:07:30.070,0:07:33.045 basically act like a puppet, remote-controlling Firefox 0:07:33.045,0:07:35.083 so you can test JavaScript code 0:07:35.083,0:07:37.000 We’ll do that when we actually— 0:07:37.000,0:07:40.032 I think we’ll be able to work with our friends at Sauce Labs 0:07:40.032,0:07:42.080 so you can do that in the cloud—That will be exciting 0:07:42.080,0:07:45.083 So, “run fast” versus “run slow” 0:07:45.083,0:07:46.068 Resolution: 0:07:46.068,0:07:48.025 If an error happens in your unit tests 0:07:48.025,0:07:49.075 it’s usually pretty easy 0:07:49.075,0:07:52.029 to figure out and track down what the source of 
that error is 0:07:52.029,0:07:53.071 because the tests are so isolated 0:07:53.071,0:07:56.025 You’ve stubbed out everything that doesn’t matter 0:07:56.025,0:07:58.025 and you’re focusing on only the behaviour of interest 0:07:58.025,0:07:59.076 So, if you’ve done a good job of doing that 0:07:59.076,0:08:01.097 when something goes wrong in one of your tests 0:08:01.097,0:08:03.045 there’s not a lot of places 0:08:03.045,0:08:04.088 that something could have gone wrong 0:08:04.088,0:08:07.041 In contrast, if you’re running a Cucumber scenario 0:08:07.041,0:08:08.089 that’s got, you know, 10 steps 0:08:08.089,0:08:10.031 and every step is touching 0:08:10.031,0:08:11.061 a whole bunch of pieces of the app 0:08:11.061,0:08:12.091 it could take a long time 0:08:12.091,0:08:14.076 to actually get to the bottom of a bug 0:08:14.076,0:08:16.014 So it is kind of a tradeoff 0:08:16.014,0:08:17.054 in how well you can localize errors 0:08:17.054,0:08:20.065 Coverage: 0:08:20.065,0:08:23.002 It’s possible if you write a good suite 0:08:23.002,0:08:24.072 of unit and functional tests 0:08:24.072,0:08:26.020 you can get really high coverage 0:08:26.020,0:08:27.085 You can run your SimpleCov report 0:08:27.085,0:08:30.080 and you can actually identify specific lines in your files 0:08:30.080,0:08:32.036 that have not been exercised by any test 0:08:32.036,0:08:34.016 and then you can go write tests that cover them 0:08:34.016,0:08:36.014 So, figuring out how to improve your coverage 0:08:36.014,0:08:37.057 for example at the C0 level 0:08:37.057,0:08:40.021 is something much more easily done with unit tests 0:08:40.021,0:08:42.018 whereas, with a Cucumber test— 0:08:42.018,0:08:43.078 with a Cucumber scenario— 0:08:43.078,0:08:45.076 you are touching a lot of parts of the code 0:08:45.076,0:08:47.080 but you are doing it very sparsely 0:08:47.080,0:08:49.038 So, if your goal is to get your coverage up 0:08:49.038,0:08:51.031 use the tools that are at the unit 
level 0:08:51.031,0:08:53.007 so that you can focus on understanding 0:08:53.007,0:08:54.074 what parts of my code are undertested 0:08:54.074,0:08:56.055 and then you can write very targeted tests 0:08:56.055,0:08:58.086 just to focus on them 0:08:58.086,0:09:01.043 And, sort of, you know, putting those pieces together 0:09:01.043,0:09:03.039 the unit tests 0:09:03.039,0:09:05.059 because of their isolation and their fine resolution 0:09:05.059,0:09:07.039 tend to use a lot of mocks 0:09:07.039,0:09:09.012 to isolate the behaviours you don’t care about 0:09:09.012,0:09:11.020 But that means that, by definition 0:09:11.020,0:09:12.070 you’re not testing the interfaces 0:09:12.070,0:09:14.099 and it’s sort of a “received wisdom” in software 0:09:14.099,0:09:16.069 that a lot of the interesting bugs 0:09:16.069,0:09:18.076 occur at the interfaces between pieces 0:09:18.076,0:09:20.078 and not sort of within a class or within a method— 0:09:20.078,0:09:22.040 those are sort of the easy bugs to track down 0:09:22.040,0:09:24.026 And at the other extreme 0:09:24.026,0:09:26.081 the more you get towards the integration testing extreme 0:09:26.081,0:09:29.072 you’re supposed to rely less and less on mocks 0:09:29.072,0:09:30.090 for that exact reason 0:09:30.090,0:09:32.066 Now we saw, if you’re testing something like 0:09:32.066,0:09:34.015 say, in a service-oriented architecture 0:09:34.015,0:09:35.089 where you have to interact with a remote site 0:09:35.089,0:09:37.028 you still end up 0:09:37.028,0:09:38.094 having to do a fair amount of mocking and stubbing 0:09:38.094,0:09:40.028 so that you don’t rely on the Internet 0:09:40.028,0:09:41.067 in order for your tests to pass 0:09:41.067,0:09:43.006 but, generally speaking 0:09:43.006,0:09:47.014 you’re trying to remove as many of the mocks as you can 0:09:47.014,0:09:48.095 and let the system run the way it would run in real life 0:09:48.095,0:09:52.070 So, the good news is you are testing the interfaces 
0:09:52.070,0:09:54.074 but when something goes wrong in one of the interfaces 0:09:54.074,0:09:57.053 because your resolution is not as good 0:09:57.053,0:10:00.031 it may take longer to figure out what it is 0:10:00.031,0:10:05.019 So, what’s sort of the high-order bit from this tradeoff 0:10:05.019,0:10:07.024 is you don’t really want to rely 0:10:07.024,0:10:08.076 too heavily on any one kind of test 0:10:08.076,0:10:10.078 They serve different purposes and, depending on 0:10:10.078,0:10:13.043 are you trying to exercise your interfaces more 0:10:13.043,0:10:15.089 or are you trying to improve your fine-grain coverage 0:10:15.089,0:10:18.003 that affects how you develop your test suite 0:10:18.003,0:10:20.065 and you’ll evolve it along with your software 0:10:20.065,0:10:24.014 So, we’ve used a certain set of terminology in testing 0:10:24.014,0:10:26.028 It’s the terminology that, by and large 0:10:26.028,0:10:29.001 is most commonly used in the Rails community 0:10:29.001,0:10:30.060 but there’s some variation 0:10:30.060,0:10:33.069 [and] some other terms that you might hear 0:10:33.069,0:10:35.018 if you go get a job somewhere 0:10:35.018,0:10:36.093 and you hear about mutation testing 0:10:36.093,0:10:38.072 which we haven’t done 0:10:38.072,0:10:40.024 This is an interesting idea that was, I think, invented by 0:10:40.024,0:10:43.037 Ammann and Offutt, who have, sort of 0:10:43.037,0:10:44.093 the definitive book on software testing 0:10:44.093,0:10:46.048 The idea is: 0:10:46.048,0:10:48.000 Suppose I introduced a deliberate bug into my code 0:10:48.000,0:10:49.051 does that force some test to fail? 
0:10:49.051,0:10:53.003 Because, if I changed, you know, “if x” to “if not x” 0:10:53.003,0:10:56.010 and no tests fail, then either I’m missing some coverage 0:10:56.010,0:10:59.019 or my app is very strange and somehow nondeterministic 0:10:59.019,0:11:03.099 Fuzz testing, which Koushik Sen may talk more about 0:11:03.099,0:11:07.085 basically, this is the “10,000 monkeys at typewriters 0:11:07.085,0:11:09.024 throwing random input at your code” 0:11:09.024,0:11:10.037 What’s interesting about it is that 0:11:10.037,0:11:11.065 those tests we’ve been doing 0:11:11.065,0:11:13.086 essentially are crafted to test the app 0:11:13.086,0:11:15.058 the way it was designed 0:11:15.058,0:11:16.088 and these, you know, fuzz testing 0:11:16.088,0:11:19.064 is about testing the app in ways it wasn’t meant to be used 0:11:19.064,0:11:22.098 So, what happens if you throw enormous form submissions 0:11:22.098,0:11:25.036 What happens if you put control characters in your forms? 0:11:25.036,0:11:27.062 What happens if you submit the same thing over and over? 
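The fuzz-testing idea just described can be sketched in a few lines of plain Ruby. Everything named here is an assumption made up for illustration: parse_quantity stands in for some form-handling code, its contract (return an Integer from 1 to 99 or raise ArgumentError) is invented, and real fuzzers are far more sophisticated about how they generate input. The random generator is seeded so that a run can be repeated.

```ruby
# Hypothetical code under test: parses a quantity typed into a form field.
# Assumed contract: return an Integer from 1 to 99, or raise ArgumentError.
def parse_quantity(text)
  cleaned = text.strip
  raise ArgumentError, "not a number" unless cleaned.match?(/\A\d{1,2}\z/)
  value = cleaned.to_i
  raise ArgumentError, "out of range" if value.zero?
  value
end

# Builds a random string of up to 40 ASCII characters, control characters
# included, which is input the code was never designed for.
def random_input(rng)
  length = rng.rand(0..40)
  Array.new(length) { rng.rand(0..126).chr }.join
end

# Feeds `runs` random inputs to the given block and returns the inputs that
# broke the contract: any error other than ArgumentError, or a return value
# outside 1..99.
def fuzz(runs:, seed:)
  rng = Random.new(seed)
  failures = []
  runs.times do
    input = random_input(rng)
    begin
      result = yield(input)
      failures << input unless result.is_a?(Integer) && (1..99).cover?(result)
    rescue ArgumentError
      # Rejecting bad input cleanly is acceptable behaviour.
    rescue StandardError
      failures << input
    end
  end
  failures
end

failures = fuzz(runs: 10_000, seed: 42) { |input| parse_quantity(input) }
puts "inputs that broke the contract: #{failures.size}"
```

Any input that lands in failures is worth turning into an ordinary, hand-written test case, so that the problem stays fixed once it has been found.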
0:11:27.062,0:11:29.093 And, Koushik has a statistic that 0:11:29.093,0:11:32.033 Microsoft finds up to 20% of their bugs 0:11:32.033,0:11:34.064 using some variation of fuzz testing 0:11:34.064,0:11:36.029 and that about 25% 0:11:36.029,0:11:39.021 of the common Unix command-line programs 0:11:39.021,0:11:40.092 can be made to crash 0:11:40.092,0:11:44.018 [when] put through aggressive fuzz testing 0:11:44.018,0:11:46.089 Defining-use coverage is something that we haven’t done 0:11:46.089,0:11:48.089 but it’s another interesting concept 0:11:48.089,0:11:50.089 The idea is that at any point in my program 0:11:50.089,0:11:52.062 there’s a place where I define— 0:11:52.062,0:11:54.046 or I assign a value to some variable— 0:11:54.046,0:11:56.000 and then there’s a place downstream 0:11:56.000,0:11:57.075 where presumably I’m going to consume that value— 0:11:57.075,0:11:59.058 someone’s going to use that value 0:11:59.058,0:12:01.013 Have I covered every pair? 0:12:01.013,0:12:02.059 In other words, do I have tests where every pair 0:12:02.059,0:12:04.054 of defining a variable and using it somewhere 0:12:04.054,0:12:07.014 is executed at some part of my test suites 0:12:07.014,0:12:10.071 It’s sometimes called DU-coverage 0:12:10.071,0:12:14.011 And other terms that I think are not as widely used anymore 0:12:14.011,0:12:17.071 blackbox versus whitebox, or blackbox versus glassbox 0:12:17.071,0:12:20.025 Roughly, a blackbox test is one that is written from 0:12:20.025,0:12:22.041 the point of view of the external specification of the thing 0:12:22.041,0:12:24.022 [For example:] “This is a hash table 0:12:24.022,0:12:26.015 When I put in a key I should get back a value 0:12:26.015,0:12:28.011 If I delete the key the value shouldn’t be there” 0:12:28.011,0:12:29.099 That’s a blackbox test because it doesn’t say 0:12:29.099,0:12:32.028 anything about how the hash table is implemented 0:12:32.028,0:12:34.072 and it doesn’t try to stress the implementation 
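As a rough illustration of the blackbox test just described, the sketch below checks only the external behaviour of a hash table: put a key, get the value back, delete the key, value is gone. TinyHashTable and blackbox_failures are hypothetical names invented here. The same checks run unchanged against the toy implementation and against Ruby's built-in Hash, which is the point: nothing in the test depends on how the table works inside.

```ruby
# A deliberately small hash table using separate chaining.
# Hypothetical, for illustration only.
class TinyHashTable
  def initialize(bucket_count = 8)
    @buckets = Array.new(bucket_count) { [] }
  end

  def []=(key, value)
    bucket = bucket_for(key)
    pair = bucket.find { |k, _| k.eql?(key) }
    if pair
      pair[1] = value
    else
      bucket << [key, value]
    end
  end

  def [](key)
    pair = bucket_for(key).find { |k, _| k.eql?(key) }
    pair && pair[1]
  end

  def delete(key)
    bucket = bucket_for(key)
    index = bucket.index { |k, _| k.eql?(key) }
    index && bucket.delete_at(index)[1]
  end

  private

  def bucket_for(key)
    @buckets[key.hash % @buckets.size]
  end
end

# Blackbox checks: written purely from the external specification.
# Returns a list of failure messages (empty when everything passes).
def blackbox_failures(table)
  failures = []
  table["apple"] = 1
  failures << "stored value not returned" unless table["apple"] == 1
  table["apple"] = 2
  failures << "overwrite not visible" unless table["apple"] == 2
  failures << "missing key should give nil" unless table["pear"].nil?
  table.delete("apple")
  failures << "deleted key still present" unless table["apple"].nil?
  failures
end

puts blackbox_failures(TinyHashTable.new).inspect
puts blackbox_failures({}).inspect
```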
0:12:34.072,0:12:36.056 A corresponding whitebox test might be: 0:12:36.056,0:12:38.008 “I know something about the hash function 0:12:38.008,0:12:39.098 and I’m going to deliberately create 0:12:39.098,0:12:41.088 hash keys in my test cases 0:12:41.088,0:12:43.078 that cause a lot of hash collisions 0:12:43.078,0:12:45.095 to make sure that I’m testing that part of the functionality” 0:12:45.095,0:12:49.007 Now, a C0 test coverage tool, like SimpleCov 0:12:49.007,0:12:52.001 would reveal that if all you had is blackbox tests 0:12:52.001,0:12:53.028 you might find that 0:12:53.028,0:12:55.056 the collision coverage code wasn’t being hit very often 0:12:55.056,0:12:56.075 And that might tip you off and say: 0:12:56.075,0:12:58.028 “Ok, if I really want to strengthen that— 0:12:58.028,0:13:00.008 for one, if I want to boost coverage for those tests 0:13:00.008,0:13:02.006 now I have to write a whitebox or a glassbox test 0:13:02.006,0:13:04.057 I have to look inside, see what the implementation does 0:13:04.057,0:13:05.061 and find specific ways 0:13:05.061,0:13:10.060 to try to break the implementation in evil ways” 0:13:10.060,0:13:13.075 So, I think, testing is a kind of a way of life, right? 0:13:13.075,0:13:16.069 We’ve gotten away from the phase of 0:13:16.069,0:13:18.033 “We’d build the whole thing and then we’d test it” 0:13:18.033,0:13:19.092 and we’ve gotten into the phase of 0:13:19.092,0:13:20.077 “We’re testing as we go” 0:13:20.077,0:13:22.048 Testing is really more like a development tool 0:13:22.048,0:13:24.022 and like so many development tools 0:13:24.022,0:13:25.062 the effectiveness of it depends 0:13:25.062,0:13:27.013 on whether you’re using it in a tasteful manner 0:13:27.013,0:13:31.002 So, you could say: “Well, let’s see—I kicked the tires 0:13:31.002,0:13:33.048 You know, I fired up the browser, I tried a couple of things 0:13:33.048,0:13:35.097 (claps hand) Looks like it works! 
Deploy it!” 0:13:35.097,0:13:38.045 That’s obviously a little more cavalier than you’d want to be 0:13:38.045,0:13:41.024 And, by the way, one of the things that we discovered 0:13:41.024,0:13:43.077 with this online course just starting up 0:13:43.077,0:13:45.090 when 60,000 people are enrolled in the course 0:13:45.090,0:13:48.099 and 0.1% of those people have a problem 0:13:48.099,0:13:50.083 you’d get 60 emails 0:13:50.083,0:13:53.078 The corollary is: when your site is used by a lot of people 0:13:53.078,0:13:55.089 some stupid bug that you didn’t find 0:13:55.089,0:13:57.018 but that could have been found by testing 0:13:57.018,0:13:59.080 could very quickly generate *a lot* of pain 0:13:59.080,0:14:02.023 On the other hand, you don’t want to be dogmatic and say 0:14:02.023,0:14:04.056 “Uh, until we have 100% coverage and every test is green 0:14:04.056,0:14:06.005 we absolutely will not ship” 0:14:06.005,0:14:07.012 That’s not healthy either 0:14:07.012,0:14:08.048 And test quality 0:14:08.048,0:14:10.057 doesn’t necessarily correlate with statement coverage 0:14:10.057,0:14:11.064 Unless you can say something 0:14:11.064,0:14:12.068 about the quality of your tests 0:14:12.068,0:14:14.029 just because you’ve executed every line 0:14:14.029,0:14:17.010 doesn’t mean that you’ve tested the interesting cases 0:14:17.010,0:14:18.068 So, somewhere in between, you could say 0:14:18.068,0:14:20.014 “Well, we’ll use coverage tools to identify 0:14:20.014,0:14:23.004 undertested or poorly-tested parts of the code 0:14:23.004,0:14:24.073 and we’ll use them as a guideline 0:14:24.073,0:14:27.011 to sort of help improve our overall confidence level” 0:14:27.011,0:14:29.024 But remember, Agile is about embracing change 0:14:29.024,0:14:30.032 and dealing with it 0:14:30.032,0:14:32.002 Part of change is that things will change in ways that cause 0:14:32.002,0:14:33.038 bugs that you didn’t foresee 0:14:33.038,0:14:34.031 and the right reaction is: 0:14:34.031,0:14:36.026 Be 
comfortable enough with the testing tools 0:14:36.026,0:14:37.064 [so] that you can quickly find those bugs 0:14:37.064,0:14:39.025 Write a test that reproduces that bug 0:14:39.025,0:14:40.062 And then make the test green 0:14:40.062,0:14:41.061 Then you’ve really fixed it 0:14:41.061,0:14:43.004 That means, the way that you really fix a bug is 0:14:43.004,0:14:45.049 if you created a test that correctly failed 0:14:45.049,0:14:46.088 by reproducing that bug 0:14:46.088,0:14:48.055 and then you went back and fixed the code 0:14:48.055,0:14:49.057 to make that test pass 0:14:49.057,0:14:51.073 Similarly, you don’t want to say 0:14:51.073,0:14:53.036 “Well, unit tests give you better coverage 0:14:53.036,0:14:54.073 They’re more thorough and detailed 0:14:54.073,0:14:56.044 So let’s focus all our energy on that” 0:14:56.044,0:14:57.062 as opposed to 0:14:57.062,0:14:58.093 “Oh, focus on integration tests 0:14:58.093,0:15:00.006 because they’re more realistic, right? 0:15:00.006,0:15:01.056 They reflect what the customer said they want 0:15:01.056,0:15:03.034 So, if the integration tests are passing 0:15:03.034,0:15:05.067 by definition we’re meeting a customer need” 0:15:05.067,0:15:07.034 Again, both extremes are kind of unhealthy 0:15:07.034,0:15:09.079 because each one of these can find problems 0:15:09.079,0:15:11.031 that would be missed by the other 0:15:11.031,0:15:12.060 So, having a good combination of them 0:15:12.060,0:15:15.042 is kind of what it is all about 0:15:15.042,0:15:18.072 The last thing I want to leave you with is, I think 0:15:18.072,0:15:20.036 in terms of testing, is “TDD versus 0:15:20.036,0:15:22.005 what I call conventional debugging— 0:15:22.005,0:15:24.004 i.e., the way that we all kind of do it 0:15:24.004,0:15:25.051 even though we say we don’t” 0:15:25.051,0:15:26.064 and we’re all trying to get better, right? 
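The bug-fixing routine described above (write a test that reproduces the bug, watch it fail, then fix the code until it passes) might look roughly like this. The example is hypothetical: average_rating and the empty-list bug are invented for illustration, and the check is written in plain Ruby rather than RSpec so that it runs without any gems.

```ruby
# Hypothetical buggy code: reported to blow up when a movie has no ratings.
def average_rating_buggy(ratings)
  ratings.sum / ratings.size
end

# The fix: decide what an empty list should mean (here, nil) and handle it.
def average_rating_fixed(ratings)
  return nil if ratings.empty?
  ratings.sum.to_f / ratings.size
end

# The regression test. It takes the implementation as an argument only so
# this sketch can run it against both versions; in a real suite it would
# simply call the one production method.
def regression_test_passes?(implementation)
  implementation.call([]).nil? && implementation.call([4, 5]) == 4.5
rescue ZeroDivisionError
  false
end

puts regression_test_passes?(method(:average_rating_buggy))   # prints false (red)
puts regression_test_passes?(method(:average_rating_fixed))   # prints true (green)
```

Because the test stays in the suite afterwards, the same bug cannot quietly come back in a later change.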
0:15:26.064,0:15:27.085 We’re all kind of in the gutter 0:15:27.085,0:15:29.036 Some of us are looking up at the stars 0:15:29.036,0:15:31.011 trying to improve our practices 0:15:31.011,0:15:33.099 But, having now lived with this for 3 or 4 years myself 0:15:33.099,0:15:35.091 and—I’ll be honest—3 years ago I didn’t do TDD 0:15:35.091,0:15:37.079 I do it now, because I find that it’s better 0:15:37.079,0:15:40.081 and here’s my distillation of why I think it works for me 0:15:40.081,0:15:43.032 Sorry, the colours are a little weird 0:15:43.032,0:15:45.000 but on the left column of the table 0:15:45.000,0:15:46.034 [it] says “Conventional debugging” 0:15:46.034,0:15:47.044 and the right side says “TDD” 0:15:47.044,0:15:49.069 So what’s the way I used to write code? 0:15:49.069,0:15:51.056 Maybe some of you still do this 0:15:51.056,0:15:53.013 I write a whole bunch of lines 0:15:53.013,0:15:54.043 maybe a few tens of lines of code 0:15:54.043,0:15:55.059 I’m sure they’re right— 0:15:55.059,0:15:56.061 I mean, I am a good programmer, right? 0:15:56.061,0:15:57.099 This is not that hard 0:15:57.099,0:15:59.002 I run it – It doesn’t work 0:15:59.002,0:16:01.098 Ok, fire up the debugger – Start putting in printf’s 0:16:01.098,0:16:04.088 If I’d been using TDD what would I do instead? 
0:16:04.088,0:16:08.022 Well I’d write a few lines of code, having written a test first 0:16:08.022,0:16:10.071 So as soon as the test goes from red to green 0:16:10.071,0:16:12.064 I know I wrote code that works— 0:16:12.064,0:16:15.013 or at least the parts of the behaviour that I had in mind 0:16:15.013,0:16:16.096 Those parts of the behaviour work, because I had a test 0:16:16.096,0:16:19.056 Ok, back to conventional debugging: 0:16:19.056,0:16:21.073 I’m running my program, trying to find the bugs 0:16:21.073,0:16:23.028 I start putting in printf’s everywhere 0:16:23.028,0:16:24.062 to print out the values of things 0:16:24.062,0:16:25.064 which by the way is a lot of fun 0:16:25.064,0:16:26.073 when you’re trying to read them 0:16:26.073,0:16:28.014 out of the 500 lines of log output 0:16:28.014,0:16:29.035 that you’d get in a Rails app 0:16:29.035,0:16:30.087 trying to find your printf’s 0:16:30.087,0:16:32.035 you know, “I know what I’ll do— 0:16:32.035,0:16:34.008 I’ll put in 75 asterisks before and after 0:16:34.008,0:16:36.043 That will make it readable” (laughter) 0:16:36.043,0:16:38.071 Who doesn’t—Ok, raise your hands if you don’t do this! 0:16:38.071,0:16:40.090 Thank you for your honesty. (laughter) Ok. 
0:16:40.090,0:16:43.014 Or— Or I could do the other thing, I could say: 0:16:43.014,0:16:45.030 Instead of printing the value of a variable 0:16:45.030,0:16:47.039 why don’t I write a test that inspects it 0:16:47.039,0:16:48.079 in an expectation, such as one with “should” 0:16:48.079,0:16:50.090 and I’ll know immediately in bright red letters 0:16:50.090,0:16:53.033 if that expectation wasn’t met 0:16:53.033,0:16:56.005 Ok, I’m back on the conventional debugging side: 0:16:56.005,0:16:58.090 I break out the big guns: I pull out the Ruby debugger 0:16:58.092,0:17:02.044 I set a debug breakpoint, and I now start tweaking and say 0:17:02.044,0:17:04.085 “Oh, let’s see, I have to get past that ‘if’ statement 0:17:04.085,0:17:06.002 so I have to set that thing 0:17:06.002,0:17:07.063 Oh, I have to call that method and so I need to…” 0:17:07.063,0:17:08.065 No! 0:17:08.065,0:17:10.087 I could instead—if I’m going to do that anyway— 0:17:10.087,0:17:13.000 let’s just do it in a file, set up some mocks and stubs 0:17:13.000,0:17:16.045 to control the code path, make it go the way I want 0:17:16.045,0:17:19.013 And now, “Ok, for sure I’ve fixed it! 
0:17:19.013,0:17:22.012 I’ll get out of the debugger, run it all again!” 0:17:22.012,0:17:24.022 And, of course, 9 times out of 10, you didn’t fix it 0:17:24.022,0:17:26.072 or you kind of partly fixed it but you didn’t completely fix it 0:17:26.072,0:17:30.040 and now I have to do all these manual things all over again 0:17:30.040,0:17:32.086 or I already have a bunch of tests 0:17:32.086,0:17:34.031 and I can just rerun them automatically 0:17:34.031,0:17:35.056 and I could, if some of them fail 0:17:35.056,0:17:36.087 “Oh, I didn’t fix the whole thing 0:17:36.087,0:17:38.040 No problem, I’ll just go back!” 0:17:38.040,0:17:39.096 So, the bottom line is that 0:17:39.096,0:17:41.095 you know, you could do it on the left side 0:17:41.095,0:17:45.004 but you’re using the same techniques in both cases 0:17:45.004,0:17:48.062 The only difference is, in one case you’re doing it manually 0:17:48.062,0:17:50.004 which is boring and error-prone 0:17:50.004,0:17:51.078 In the other case you’re doing a little more work 0:17:51.078,0:17:53.095 but you can make it automatic and repeatable 0:17:53.095,0:17:55.071 and have, you know, some high confidence 0:17:55.071,0:17:57.003 that as you change things in your code 0:17:57.003,0:17:58.092 you are not breaking stuff that used to work 0:17:58.092,0:18:00.091 and basically it’s more productive 0:18:00.091,0:18:02.047 So you’re doing all the same things 0:18:02.047,0:18:04.037 but with a, kind of, “delta” extra work 0:18:04.037,0:18:07.086 you are using your effort at a much higher leverage 0:18:07.086,0:18:10.036 So that’s kind of my view of why TDD is a good thing 0:18:10.036,0:18:11.088 It’s really, it doesn’t require new skills 0:18:11.088,0:18:15.011 It just requires [you] to refactor your existing skills 0:18:15.011,0:18:18.014 I also tried when I—again, honest confessions, right?— 0:18:18.014,0:18:19.034 when I started doing this it was like 0:18:19.034,0:18:21.049 “Ok, I’m gonna be teaching a course on Rails 
0:18:21.049,0:18:22.065 I should really focus on testing” 0:18:22.065,0:18:24.032 So I went back to some code I had written 0:18:24.032,0:18:26.087 that was working—you know, that was decent code— 0:18:26.087,0:18:29.006 and I started trying to write tests for it 0:18:29.006,0:18:31.019 and it was *so painful* 0:18:31.019,0:18:33.033 because the code wasn’t written in a way that was testable 0:18:33.033,0:18:34.097 There were all kinds of interactions 0:18:34.097,0:18:36.038 There were, like, nested conditionals 0:18:36.038,0:18:38.083 And if you wanted to isolate a particular statement 0:18:38.083,0:18:41.070 and have it test—to trigger test—just that statement 0:18:41.070,0:18:44.000 the amount of stuff you’d have to set up in your test 0:18:44.000,0:18:45.009 to have it happen— 0:18:45.009,0:18:46.040 remember when we talked about mock train wrecks— 0:18:46.040,0:18:48.014 you have to set up all this infrastructure 0:18:48.014,0:18:49.063 just to get one line of code 0:18:49.063,0:18:51.015 and you do that and you go 0:18:51.015,0:18:52.074 “Gawd, testing is really not worth it! 0:18:52.074,0:18:54.034 I wrote 20 lines of setup 0:18:54.034,0:18:56.059 so that I could test two lines in my function!” 0:18:56.059,0:18:58.085 What that’s really telling you—as I now realize— 0:18:58.085,0:19:00.042 is your function is bad 0:19:00.042,0:19:01.049 It’s a badly written function 0:19:01.049,0:19:02.052 It’s not a testable function 0:19:02.052,0:19:03.088 It’s got too many moving parts 0:19:03.088,0:19:06.026 whose dependencies can be broken 0:19:06.026,0:19:07.070 There’s no seams in my function 0:19:07.070,0:19:11.008 that allow me to individually test the different behaviours 0:19:11.008,0:19:12.083 And once you start doing Test First Development 0:19:12.083,0:19:15.043 because you have to write your tests in small chunks 0:19:15.043,0:19:17.053 it kind of makes this problem go away 0:19:17.053,9:59:59.000 So that’s been my epiphany
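
The point about seams can be made concrete with a small sketch. It is not code from the lecture: the class `DiscountCalculator`, the stand-in `FixedClock`, and the helper `check` are hypothetical names invented for illustration, and plain Ruby checks stand in for RSpec expectations so the snippet runs on its own. The clock is handed in through the constructor, which gives a test a place to control the code path without the 20 lines of setup described above.

```ruby
require 'date'

# Hypothetical example class; the names are illustrative, not from the lecture.
class DiscountCalculator
  # The clock is injected. That is the seam: a test can swap in a fake clock
  # instead of arranging for the real date to be a weekend.
  def initialize(clock: Date)
    @clock = clock
  end

  # Returns a percentage discount: 0 below 100, otherwise 20 on weekends, 10 on weekdays.
  def discount_for(total)
    return 0 if total < 100
    weekend? ? 20 : 10
  end

  private

  def weekend?
    today = @clock.today
    today.saturday? || today.sunday?
  end
end

# Minimal hand-rolled stub that answers `today` with a fixed date.
FixedClock = Struct.new(:today)

# Tiny stand-in for an expectation: fails loudly instead of printing a value to read by eye.
def check(description, actual, expected)
  raise "#{description}: expected #{expected}, got #{actual}" unless actual == expected
  puts "ok - #{description}"
end

saturday = FixedClock.new(Date.new(2024, 6, 1)) # 2024-06-01 is a Saturday
monday   = FixedClock.new(Date.new(2024, 6, 3)) # 2024-06-03 is a Monday

check('weekend order gets 20', DiscountCalculator.new(clock: saturday).discount_for(150), 20)
check('weekday order gets 10', DiscountCalculator.new(clock: monday).discount_for(150), 10)
check('small order gets 0',    DiscountCalculator.new(clock: saturday).discount_for(50), 0)
```

Each behaviour gets its own one-line check because the only dependency, the current date, can be replaced at the seam. In an RSpec suite the same checks would be written as expectations and the fake clock as a double.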