WEBVTT 00:00:00.057 --> 00:00:01.092 So we spent a bunch of time 00:00:01.092 --> 00:00:03.032 in the last couple of lectures 00:00:03.032 --> 00:00:05.082 talking about different kinds of testing 00:00:05.082 --> 00:00:08.021 about unit testing versus integration testing 00:00:08.021 --> 00:00:10.010 We talked about how do you use RSpec 00:00:10.010 --> 00:00:12.049 to really isolate the parts of your code you want to test 00:00:12.049 --> 00:00:14.090 you’ve also, you know, because of homework 3, 00:00:14.090 --> 00:00:18.017 and other stuff, we have been doing BDD, 00:00:18.017 --> 00:00:20.062 where we’ve been using Cucumber to turn user stories 00:00:20.062 --> 00:00:22.095 into, essentially, integration and acceptance tests 00:00:22.095 --> 00:00:25.061 So you’ve seen testing in a couple of different levels 00:00:25.061 --> 00:00:27.063 and the goal here is sort of to do a few remarks 00:00:27.063 --> 00:00:29.092 to, you know, let’s back up a little bit 00:00:29.092 --> 00:00:33.001 and see the big picture, and tie those things together 00:00:33.001 --> 00:00:34.095 So this sort of spans material 00:00:34.095 --> 00:00:37.000 that covers three or four sections in the book 00:00:37.000 --> 00:00:39.061 and I want to just hit the high points in lecture 00:00:39.061 --> 00:00:41.046 So a question that comes up 00:00:41.046 --> 00:00:43.025 I’m sure it’s come up for all of you 00:00:43.025 --> 00:00:44.052 as you have been doing homework 00:00:44.052 --> 00:00:45.069 is: “How much testing is enough?” 00:00:45.069 --> 00:00:48.049 And, sadly, for a long time 00:00:48.049 --> 00:00:51.009 kind of if you asked this question in industry 00:00:51.009 --> 00:00:52.017 the answer was basically 00:00:52.017 --> 00:00:53.017 “Well, we have a shipping deadline, 00:00:53.017 --> 00:00:54.099 so however much testing we can do 00:00:54.099 --> 00:00:56.066 before that deadline, that’s how much.” 00:00:56.066 --> 00:00:58.015 That’s what you have time for. 
00:00:58.015 --> 00:01:00.002 So, you know, that’s a little flip 00:01:00.002 --> 00:01:01.011 obviously not very good 00:01:01.011 --> 00:01:02.054 So you can do a bit better, right? 00:01:02.054 --> 00:01:03.070 There’re some static measures 00:01:03.070 --> 00:01:06.003 like how many lines of code does your app have 00:01:06.003 --> 00:01:08.021 and how many lines of tests do you have? 00:01:08.021 --> 00:01:10.029 And it’s not unusual in industry 00:01:10.029 --> 00:01:12.068 in a well-tested piece of software 00:01:12.068 --> 00:01:14.057 for the number of lines of tests 00:01:14.057 --> 00:01:17.073 to go far beyond the number of lines of code 00:01:17.073 --> 00:01:19.075 So, integer multiples are not unusual 00:01:19.075 --> 00:01:21.084 And I think even for sort of, you know, 00:01:21.084 --> 00:01:23.022 research code or classwork 00:01:23.022 --> 00:01:26.085 a ratio of, you know, maybe 1.5 is not unreasonable 00:01:26.085 --> 00:01:30.005 so one and a half times the amount of test code 00:01:30.005 --> 00:01:32.024 as you have application code 00:01:32.024 --> 00:01:34.022 And in a lot of production systems 00:01:34.022 --> 00:01:35.027 where they really care about testing 00:01:35.027 --> 00:01:36.091 it is much higher than that 00:01:36.091 --> 00:01:38.015 So maybe a better question to ask: 00:01:38.015 --> 00:01:39.047 Rather than saying “How much testing is enough?” 00:01:39.047 --> 00:01:42.049 is to ask “How good is the testing I am doing now? 
00:01:42.049 --> 00:01:44.035 How thorough is it?” 00:01:44.035 --> 00:01:45.056 Later in this semester 00:01:45.056 --> 00:01:46.056 Professor Sen will talk about 00:01:46.056 --> 00:01:48.018 a little bit about formal methods 00:01:48.018 --> 00:01:50.085 and sort of what’s at the frontiers of testing and debugging 00:01:50.085 --> 00:01:52.068 But a couple of things that we can talk about 00:01:52.068 --> 00:01:54.007 based on what you already know 00:01:54.007 --> 00:01:57.074 is some basic concepts about test coverage 00:01:57.074 --> 00:01:59.054 And although I would say 00:01:59.054 --> 00:02:01.001 you know, we’ve been saying all along 00:02:01.001 --> 00:02:03.003 formal methods, they don’t really work on big systems 00:02:03.003 --> 00:02:05.033 I think that statement, in my personal opinion 00:02:05.033 --> 00:02:07.001 is actually a lot less true than it used to be 00:02:07.001 --> 00:02:09.019 I think there are a number of specific places 00:02:09.019 --> 00:02:10.052 especially in testing and debugging 00:02:10.052 --> 00:02:12.084 where formal methods are actually making fast progress 00:02:12.084 --> 00:02:15.075 and Koushik Sen is one of the leaders in that 00:02:15.075 --> 00:02:17.094 So you’ll have the opportunity to hear more about that later 00:02:17.094 --> 00:02:21.043 but for the moment I think, kind of bread and butter 00:02:21.043 --> 00:02:22.073 is let’s talk about coverage measurement 00:02:22.073 --> 00:02:24.047 because this is where the rubber meets the road 00:02:24.047 --> 00:02:26.020 in terms of how you’d be evaluated 00:02:26.020 --> 00:02:28.063 if you are doing this for real 00:02:28.063 --> 00:02:29.052 So what’s some basics? 
00:02:29.052 --> 00:02:30.078 Here’s a really simple class you can use 00:02:30.078 --> 00:02:32.090 to talk about different ways to measure 00:02:32.090 --> 00:02:34.080 how our test covers this code 00:02:34.080 --> 00:02:36.063 And there’re a few different levels 00:02:36.063 --> 00:02:37.085 with different terminologies 00:02:37.085 --> 00:02:40.073 It’s not really universal across all software houses 00:02:40.073 --> 00:02:42.064 But one common set of terminology 00:02:42.064 --> 00:02:43.064 that the book exposes 00:02:43.064 --> 00:02:44.068 is we could talk about S0 00:02:44.068 --> 00:02:47.045 where we’d just mean you’ve called every method once 00:02:47.045 --> 00:02:50.045 So you know, if you call foo, and you call bar, you’re done 00:02:50.045 --> 00:02:52.015 That’s S0 coverage: not terribly thorough 00:02:52.015 --> 00:02:54.068 A little more stringent, S1, is 00:02:54.068 --> 00:02:56.013 you could say, we’re calling every method 00:02:56.013 --> 00:02:57.028 from every place that it could be called 00:02:57.028 --> 00:02:58.082 So what does that mean? 
00:02:58.082 --> 00:03:00.007 It means, for example 00:03:00.007 --> 00:03:01.012 it’s not enough to call bar 00:03:01.012 --> 00:03:02.095 You have to make sure that you call it 00:03:02.095 --> 00:03:05.057 at least once from in here 00:03:05.057 --> 00:03:07.016 as well as calling it once 00:03:07.016 --> 00:03:10.037 from any exterior function that might call it 00:03:10.037 --> 00:03:12.081 C0 which is what SimpleCov measures 00:03:12.081 --> 00:03:15.099 (those of you who’ve gotten SimpleCov up and running) 00:03:15.099 --> 00:03:18.052 basically says you’ve executed every statement 00:03:18.052 --> 00:03:20.004 you’ve touched every statement in your code once 00:03:20.004 --> 00:03:22.048 But the caveat there is that 00:03:22.048 --> 00:03:25.058 conditionals really just count as a single statement 00:03:25.058 --> 00:03:28.091 So, if you, no matter which branch of this “if” you took 00:03:28.091 --> 00:03:31.074 as long as you touched one or the other branch 00:03:31.074 --> 00:03:33.035 you’ve executed the “if” statement 00:03:33.035 --> 00:03:35.066 So even C0 is still, you know, sort of superficial coverage 00:03:35.066 --> 00:03:37.026 But, as we will see 00:03:37.026 --> 00:03:39.023 the way that you will want to read this information is: 00:03:39.023 --> 00:03:41.079 if you are getting bad coverage at the C0 level 00:03:41.079 --> 00:03:44.007 then you have really really bad coverage 00:03:44.007 --> 00:03:46.008 So if you are not kind of making 00:03:46.008 --> 00:03:47.037 this simple level of superficial coverage 00:03:47.037 --> 00:03:50.002 then your testing is probably deficient 00:03:50.002 --> 00:03:51.091 C1 is the next step up from that 00:03:51.091 --> 00:03:53.071 We could say: 00:03:53.071 --> 00:03:55.019 Well, we have to take every branch in both directions 00:03:55.019 --> 00:03:56.061 So, when we are doing this “if” statement 00:03:56.061 --> 00:03:58.066 we have to make sure that 00:03:58.066 --> 00:03:59.092 we do the “if x” 
part once 00:03:59.092 --> 00:04:05.013 and the “if not x” part at least once to meet C1 00:04:05.013 --> 00:04:08.036 You can augment that with decision coverage 00:04:08.036 --> 00:04:09.063 saying: Well, if we’re gonna… 00:04:09.063 --> 00:04:12.036 If we have “if” statements where the condition 00:04:12.036 --> 00:04:13.088 is made up of multiple terms 00:04:13.088 --> 00:04:15.071 we have to make sure that every subexpression 00:04:15.071 --> 00:04:17.097 has been evaluated in both directions 00:04:17.097 --> 00:04:19.067 In other words, that means that 00:04:19.067 --> 00:04:22.041 if we’re going to fail this “if” statement 00:04:22.041 --> 00:04:24.034 we have to make sure to fail it at least once 00:04:24.034 --> 00:04:26.044 because y was false and at least once because z was false 00:04:26.044 --> 00:04:28.088 In other words, any subexpression that could 00:04:28.088 --> 00:04:31.021 independently change the outcome of the condition 00:04:31.021 --> 00:04:34.048 has to be exercised in both directions 00:04:34.048 --> 00:04:36.003 And then, 00:04:36.003 --> 00:04:38.052 kind of, the one that, you know, a lot of people aspire to 00:04:38.052 --> 00:04:41.026 but there is disagreement on how much more valuable it is 00:04:41.026 --> 00:04:42.083 is you take every path through the code 00:04:42.083 --> 00:04:45.053 Obviously, this is kind of difficult because 00:04:45.053 --> 00:04:48.033 it tends to be exponential in the number of conditions 00:04:48.033 --> 00:04:53.008 And in general it’s difficult 00:04:53.008 --> 00:04:55.031 to evaluate if you’ve taken every path through the code 00:04:55.031 --> 00:04:57.001 There are formal techniques that you can use 00:04:57.001 --> 00:04:58.083 to tell you where the holes are 00:04:58.083 --> 00:05:01.031 but the bottom line is that 00:05:01.031 --> 00:05:03.004 in most commercial software houses 00:05:03.004 --> 00:05:04.089 there is, I would say, not complete consensus 00:05:04.089 --> 00:05:06.070 on how much more 
valuable C2 is 00:05:06.070 --> 00:05:08.068 compared to C0 or C1 00:05:08.068 --> 00:05:10.013 So, I think, for the purpose of our class 00:05:10.013 --> 00:05:11.067 you get exposed to the idea 00:05:11.067 --> 00:05:13.020 of how you use coverage information 00:05:13.020 --> 00:05:16.040 SimpleCov takes advantage of some built-in Ruby features 00:05:16.040 --> 00:05:18.009 to give you C0 coverage 00:05:18.009 --> 00:05:19.062 [It] does really nice reports 00:05:19.062 --> 00:05:21.025 We can sort of see it 00:05:21.025 --> 00:05:22.096 at the level of individual lines in your file 00:05:22.096 --> 00:05:24.091 You can see what your coverage is 00:05:24.091 --> 00:05:27.015 and I think that’s kind of a, you know 00:05:27.015 --> 00:05:31.018 a good start for where we are 00:05:31.018 --> 00:05:33.076 So, having seen sort of different flavours of tests 00:05:33.076 --> 00:05:37.020 Stepping back and looking back at the big picture 00:05:37.020 --> 00:05:38.098 what are the different kinds of tests 00:05:38.098 --> 00:05:40.078 that we’ve seen concretely? 00:05:40.078 --> 00:05:42.032 and what are the tradeoffs 00:05:42.032 --> 00:05:43.089 between using those different kinds of tests? 
00:05:43.089 --> 00:05:47.016 So we’ve seen at the level of individual classes or methods 00:05:47.016 --> 00:05:50.009 we use RSpec, with extensive use of mocking and stubbing 00:05:50.009 --> 00:05:53.004 So, for example when we do testing methods in the model 00:05:53.004 --> 00:05:55.057 that will be an example of unit testing 00:05:55.057 --> 00:05:59.025 We also did something that is pretty similar to 00:05:59.025 --> 00:06:00.097 functional or module testing 00:06:00.097 --> 00:06:02.071 where there is more than one module participating 00:06:02.071 --> 00:06:04.065 So, for example when we did controller specs 00:06:04.065 --> 00:06:07.085 we saw that—we simulate a POST action 00:06:07.085 --> 00:06:09.029 but remember that the POST action 00:06:09.029 --> 00:06:10.086 has to go through the routing subsystem 00:06:10.086 --> 00:06:12.042 before it gets to the controller 00:06:12.042 --> 00:06:14.048 Once the controller is done it will try to render a view 00:06:14.048 --> 00:06:16.007 So in fact there’s other pieces 00:06:16.007 --> 00:06:17.067 that collaborate with the controller 00:06:17.067 --> 00:06:19.099 that have to be working in order for controller specs to pass 00:06:19.099 --> 00:06:21.051 So that’s somewhere inbetween: 00:06:21.051 --> 00:06:23.035 where we’re doing more than a single method 00:06:23.035 --> 00:06:25.000 touching more than a single class 00:06:25.000 --> 00:06:27.000 but we’re still concentrating [our] attention 00:06:27.000 --> 00:06:28.088 on a fairly narrow slice of the system at a time 00:06:28.088 --> 00:06:31.044 and we’re still using mocking and stubbing extensively 00:06:31.044 --> 00:06:35.030 to sort of isolate that behaviour that we want to test 00:06:35.030 --> 00:06:36.091 And then at the level of Cucumber scenarios 00:06:36.091 --> 00:06:38.047 these are more like integration or system tests 00:06:38.047 --> 00:06:41.069 They exercise complete paths throughout the application 00:06:41.069 --> 00:06:43.044 They 
probably touch a lot of different modules 00:06:43.044 --> 00:06:46.003 They make minimal use of mocks and stubs 00:06:46.003 --> 00:06:48.032 because part of the goal of an integration test 00:06:48.032 --> 00:06:50.099 is exactly to test the interaction between pieces 00:06:50.099 --> 00:06:53.021 So you don’t want to stub or control those interactions 00:06:53.021 --> 00:06:54.080 You actually want to let the system do 00:06:54.080 --> 00:06:56.030 what it would really do 00:06:56.030 --> 00:06:58.025 if this was a scenario happening in production 00:06:58.025 --> 00:07:00.069 So how would we compare these different kinds of tests? 00:07:00.069 --> 00:07:02.038 There’s a few different axes we can look at 00:07:02.038 --> 00:07:05.007 One of them is how long they take to run 00:07:05.007 --> 00:07:06.090 Now, both RSpec and Cucumber 00:07:06.090 --> 00:07:09.013 have, kind of, high startup times and stuff like that 00:07:09.013 --> 00:07:10.008 But, as you’ll see 00:07:10.008 --> 00:07:11.090 as you start adding more and more RSpec tests 00:07:11.090 --> 00:07:14.038 and using autotest to run them in the background 00:07:14.038 --> 00:07:17.088 by and large, once RSpec kind of gets off the launching pad 00:07:17.088 --> 00:07:19.092 it runs specs really fast 00:07:19.092 --> 00:07:21.095 whereas running Cucumber features just takes a long time 00:07:21.095 --> 00:07:24.059 as it essentially fires up your entire application 00:07:24.059 --> 00:07:26.010 And later in this semester 00:07:26.010 --> 00:07:28.086 we’ll see a way to make Cucumber even slower— 00:07:28.086 --> 00:07:30.070 which is to have it fire up an entire browser 00:07:30.070 --> 00:07:33.045 basically act like a puppet, remote-controlling Firefox 00:07:33.045 --> 00:07:35.083 so you can test JavaScript code 00:07:35.083 --> 00:07:37.000 We’ll do that when we actually— 00:07:37.000 --> 00:07:40.032 I think we’ll be able to work with our friends at Sauce Labs 00:07:40.032 --> 00:07:42.080 so you can 
do that in the cloud—That will be exciting 00:07:42.080 --> 00:07:45.083 So, “run fast” versus “run slow” 00:07:45.083 --> 00:07:46.068 Resolution: 00:07:46.068 --> 00:07:48.025 If an error happens in your unit tests 00:07:48.025 --> 00:07:49.075 it’s usually pretty easy 00:07:49.075 --> 00:07:52.029 to figure out and track down what the source of that error is 00:07:52.029 --> 00:07:53.071 because the tests are so isolated 00:07:53.071 --> 00:07:56.025 You’ve stubbed out everything that doesn’t matter 00:07:56.025 --> 00:07:58.025 and you’re focusing on only the behaviour of interest 00:07:58.025 --> 00:07:59.076 So, if you’ve done a good job of doing that 00:07:59.076 --> 00:08:01.097 when something goes wrong in one of your tests 00:08:01.097 --> 00:08:03.045 there’s not a lot of places 00:08:03.045 --> 00:08:04.088 that something could have gone wrong 00:08:04.088 --> 00:08:07.041 In contrast, if you’re running a Cucumber scenario 00:08:07.041 --> 00:08:08.089 that’s got, you know, 10 steps 00:08:08.089 --> 00:08:10.031 and every step is touching 00:08:10.031 --> 00:08:11.061 a whole bunch of pieces of the app 00:08:11.061 --> 00:08:12.091 it could take a long time 00:08:12.091 --> 00:08:14.076 to actually get to the bottom of a bug 00:08:14.076 --> 00:08:16.014 So it is kind of a tradeoff 00:08:16.014 --> 00:08:17.054 between how well can you localize errors 00:08:17.054 --> 00:08:20.065 Coverage: 00:08:20.065 --> 00:08:23.002 It’s possible if you write a good suite 00:08:23.002 --> 00:08:24.072 of unit and functional tests 00:08:24.072 --> 00:08:26.020 you can get really high coverage 00:08:26.020 --> 00:08:27.085 You can run your SimpleCov report 00:08:27.085 --> 00:08:30.080 and you can actually identify specific lines in your files 00:08:30.080 --> 00:08:32.036 that have not been exercised by any test 00:08:32.036 --> 00:08:34.016 and then you can go write tests that cover them 00:08:34.016 --> 00:08:36.014 So, figuring out how to improve your coverage 
00:08:36.014 --> 00:08:37.057 for example at the C0 level 00:08:37.057 --> 00:08:40.021 is something much more easily done with unit tests 00:08:40.021 --> 00:08:42.018 whereas, with a Cucumber test— 00:08:42.018 --> 00:08:43.078 with a Cucumber scenario— 00:08:43.078 --> 00:08:45.076 you are touching a lot of parts of the code 00:08:45.076 --> 00:08:47.080 but you are doing it very sparsely 00:08:47.080 --> 00:08:49.038 So, if your goal is to get your coverage up 00:08:49.038 --> 00:08:51.031 use the tools that are at the unit level 00:08:51.031 --> 00:08:53.007 so that you can focus on understanding 00:08:53.007 --> 00:08:54.074 what parts of my code are undertested 00:08:54.074 --> 00:08:56.055 and then you can write very targeted tests 00:08:56.055 --> 00:08:58.086 just to focus on them 00:08:58.086 --> 00:09:01.043 And, sort of, you know, putting those pieces together 00:09:01.043 --> 00:09:03.039 the unit tests 00:09:03.039 --> 00:09:05.059 because of their isolation and their fine resolution 00:09:05.059 --> 00:09:07.039 tend to use a lot of mocks 00:09:07.039 --> 00:09:09.012 to isolate the behaviours you don’t care about 00:09:09.012 --> 00:09:11.020 But that means that, by definition 00:09:11.020 --> 00:09:12.070 you’re not testing the interfaces 00:09:12.070 --> 00:09:14.099 and it’s sort of a “received wisdom” in software 00:09:14.099 --> 00:09:16.069 that a lot of the interesting bugs 00:09:16.069 --> 00:09:18.076 occur at the interfaces between pieces 00:09:18.076 --> 00:09:20.078 and not sort of within a class or within a method— 00:09:20.078 --> 00:09:22.040 those are sort of the easy bugs to track down 00:09:22.040 --> 00:09:24.026 And at the other extreme 00:09:24.026 --> 00:09:26.081 the more you get towards the integration testing extreme 00:09:26.081 --> 00:09:29.072 you’re supposed to rely less and less on mocks 00:09:29.072 --> 00:09:30.090 for that exact reason 00:09:30.090 --> 00:09:32.066 Now we saw, if you’re testing something like 
00:09:32.066 --> 00:09:34.015 say, in a service-oriented architecture 00:09:34.015 --> 00:09:35.089 where you have to interact with the remote site 00:09:35.089 --> 00:09:37.028 you still end up 00:09:37.028 --> 00:09:38.094 having to do a fair amount of mocking and stubbing 00:09:38.094 --> 00:09:40.028 so that you don’t rely on the Internet 00:09:40.028 --> 00:09:41.067 in order for your tests to pass 00:09:41.067 --> 00:09:43.006 but, generally speaking 00:09:43.006 --> 00:09:47.014 you’re trying to remove as many of the mocks that you can 00:09:47.014 --> 00:09:48.095 and let the system run the way it would run in real life 00:09:48.095 --> 00:09:52.070 So, the good news is you are testing the interfaces 00:09:52.070 --> 00:09:54.074 but when something goes wrong in one of the interfaces 00:09:54.074 --> 00:09:57.053 because your resolution is not as good 00:09:57.053 --> 00:10:00.031 it may take longer to figure out what it is 00:10:00.031 --> 00:10:05.019 So, what’s sort of the high-order bit from this tradeoff 00:10:05.019 --> 00:10:07.024 is you don’t really want to rely 00:10:07.024 --> 00:10:08.076 too heavily on any one kind of test 00:10:08.076 --> 00:10:10.078 They serve different purposes and, depending on 00:10:10.078 --> 00:10:13.043 are you trying to exercise your interfaces more 00:10:13.043 --> 00:10:15.089 or are you trying to improve your fine-grain coverage 00:10:15.089 --> 00:10:18.003 that affects how you develop your test suite 00:10:18.003 --> 00:10:20.065 and you’ll evolve it along with your software 00:10:20.065 --> 00:10:24.014 So, we’ve used a certain set of terminology in testing 00:10:24.014 --> 00:10:26.028 It’s the terminology that, by and large 00:10:26.028 --> 00:10:29.001 is most commonly used in the Rails community 00:10:29.001 --> 00:10:30.060 but there’s some variation 00:10:30.060 --> 00:10:33.069 [and] some other terms that you might hear 00:10:33.069 --> 00:10:35.018 if you go get a job somewhere 00:10:35.018 --> 
00:10:36.093 and you hear about mutation testing 00:10:36.093 --> 00:10:38.072 which we haven’t done 00:10:38.072 --> 00:10:40.024 This is an interesting idea that was, I think, invented by 00:10:40.024 --> 00:10:43.037 Ammann and Offutt, who have, sort of 00:10:43.037 --> 00:10:44.093 the definitive book on software testing 00:10:44.093 --> 00:10:46.048 The idea is: 00:10:46.048 --> 00:10:48.000 Suppose I introduced a deliberate bug into my code 00:10:48.000 --> 00:10:49.051 does that force some test to fail? 00:10:49.051 --> 00:10:53.003 Because, if I changed, you know, “if x” to “if not x” 00:10:53.003 --> 00:10:56.010 and no tests fail, then either I’m missing some coverage 00:10:56.010 --> 00:10:59.019 or my app is very strange and somehow nondeterministic 00:10:59.019 --> 00:11:03.099 Fuzz testing, which Koushik Sen may talk more about 00:11:03.099 --> 00:11:07.085 basically, this is the “10,000 monkeys at typewriters 00:11:07.085 --> 00:11:09.024 throwing random input at your code” 00:11:09.024 --> 00:11:10.037 What’s interesting about it is that 00:11:10.037 --> 00:11:11.065 those tests we’ve been doing 00:11:11.065 --> 00:11:13.086 essentially are crafted to test the app 00:11:13.086 --> 00:11:15.058 the way it was designed 00:11:15.058 --> 00:11:16.088 and these, you know, fuzz testing 00:11:16.088 --> 00:11:19.064 is about testing the app in ways it wasn’t meant to be used 00:11:19.064 --> 00:11:22.098 So, what happens if you throw enormous form submissions 00:11:22.098 --> 00:11:25.036 What happens if you put control characters in your forms? 00:11:25.036 --> 00:11:27.062 What happens if you submit the same thing over and over? 
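NOTE
Editor's sketch of the fuzz-testing idea described above: throw random, badly
behaved input at a piece of code and check only that it stays within its
contract. The parse_quantity helper, the input generator, and the run count
are hypothetical and chosen for illustration; they are not from the lecture.
```ruby
# Hypothetical function under test (an assumption, not from the lecture):
# parses a quantity field from a form, returning an Integer or nil.
def parse_quantity(input)
  text = input.to_s.strip
  return nil unless text.match?(/\A\d{1,6}\z/)
  text.to_i
end
#
# Build one random input: ASCII characters 0..127, so control characters are
# included, and roughly one input in ten is very long.
def random_input(rng)
  length = rng.rand(10).zero? ? rng.rand(1_000..5_000) : rng.rand(0..20)
  Array.new(length) { rng.rand(0..127).chr }.join
end
#
# Run many random inputs and collect any that raise an exception or return
# something other than an Integer or nil.
def fuzz(runs: 1_000, seed: 42)
  rng = Random.new(seed) # fixed seed so a failing run can be reproduced
  failures = []
  runs.times do
    input = random_input(rng)
    begin
      result = parse_quantity(input)
      failures << input unless result.nil? || result.is_a?(Integer)
    rescue StandardError
      failures << input
    end
  end
  failures
end
#
failures = fuzz
puts "#{failures.size} failing inputs"
```
Any input that lands in failures is a candidate for a regular, targeted spec,
so the fuzzer only has to find the problem once.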
00:11:27.062 --> 00:11:29.093 And, Koushik has a statistic that 00:11:29.093 --> 00:11:32.033 Microsoft finds up to 20% of their bugs 00:11:32.033 --> 00:11:34.064 using some variation of fuzz testing 00:11:34.064 --> 00:11:36.029 and that about 25% 00:11:36.029 --> 00:11:39.021 of the common Unix command-line programs 00:11:39.021 --> 00:11:40.092 can be made to crash 00:11:40.092 --> 00:11:44.018 [when] put through aggressive fuzz testing 00:11:44.018 --> 00:11:46.089 Defining-use coverage is something that we haven’t done 00:11:46.089 --> 00:11:48.089 but it’s another interesting concept 00:11:48.089 --> 00:11:50.089 The idea is that at any point in my program 00:11:50.089 --> 00:11:52.062 there’s a place where I define— 00:11:52.062 --> 00:11:54.046 or I assign a value to some variable— 00:11:54.046 --> 00:11:56.000 and then there’s a place downstream 00:11:56.000 --> 00:11:57.075 where presumably I’m going to consume that value— 00:11:57.075 --> 00:11:59.058 someone’s going to use that value 00:11:59.058 --> 00:12:01.013 Have I covered every pair? 
00:12:01.013 --> 00:12:02.059 In other words, do I have tests where every pair 00:12:02.059 --> 00:12:04.054 of defining a variable and using it somewhere 00:12:04.054 --> 00:12:07.014 is executed at some part of my test suites 00:12:07.014 --> 00:12:10.071 It’s sometimes called DU-coverage 00:12:10.071 --> 00:12:14.011 And other terms that I think are not as widely used anymore 00:12:14.011 --> 00:12:17.071 blackbox versus whitebox, or blackbox versus glassbox 00:12:17.071 --> 00:12:20.025 Roughly, a blackbox test is one that is written from 00:12:20.025 --> 00:12:22.041 the point of view of the external specification of the thing 00:12:22.041 --> 00:12:24.022 [For example:] “This is a hash table 00:12:24.022 --> 00:12:26.015 When I put in a key I should get back a value 00:12:26.015 --> 00:12:28.011 If I delete the key the value shouldn’t be there” 00:12:28.011 --> 00:12:29.099 That’s a blackbox test because it doesn’t say 00:12:29.099 --> 00:12:32.028 anything about how the hash table is implemented 00:12:32.028 --> 00:12:34.072 and it doesn’t try to stress the implementation 00:12:34.072 --> 00:12:36.056 A corresponding whitebox test might be: 00:12:36.056 --> 00:12:38.008 “I know something about the hash function 00:12:38.008 --> 00:12:39.098 and I’m going to deliberately create 00:12:39.098 --> 00:12:41.088 hash keys in my test cases 00:12:41.088 --> 00:12:43.078 that cause a lot of hash collisions 00:12:43.078 --> 00:12:45.095 to make sure that I’m testing that part of the functionality” 00:12:45.095 --> 00:12:49.007 Now, a C0 test coverage tool, like SimpleCov 00:12:49.007 --> 00:12:52.001 would reveal that if all you had is blackbox tests 00:12:52.001 --> 00:12:53.028 you might find that 00:12:53.028 --> 00:12:55.056 the collision coverage code wasn’t being hit very often 00:12:55.056 --> 00:12:56.075 And that might tip you off and say: 00:12:56.075 --> 00:12:58.028 “Ok, if I really want to strengthen that— 00:12:58.028 --> 00:13:00.008 for one, if I want 
to boost coverage for those tests 00:13:00.008 --> 00:13:02.006 now I have to write a whitebox or a glassbox test 00:13:02.006 --> 00:13:04.057 I have to look inside, see what the implementation does 00:13:04.057 --> 00:13:05.061 and find specific ways 00:13:05.061 --> 00:13:10.060 to try to break the implementation in evil ways” 00:13:10.060 --> 00:13:13.075 So, I think, testing is a kind of a way of life, right? 00:13:13.075 --> 00:13:16.069 We’ve gotten away from the phase of 00:13:16.069 --> 00:13:18.033 “We’d build the whole thing and then we’d test it” 00:13:18.033 --> 00:13:19.092 and we’ve gotten into the phase of 00:13:19.092 --> 00:13:20.077 “We’re testing as we go” 00:13:20.077 --> 00:13:22.048 Testing is really more like a development tool 00:13:22.048 --> 00:13:24.022 and like so many development tools 00:13:24.022 --> 00:13:25.062 the effectiveness of it depends 00:13:25.062 --> 00:13:27.013 on whether you’re using it in a tasteful manner 00:13:27.013 --> 00:13:31.002 So, you could say: “Well, let’s see—I kicked the tires 00:13:31.002 --> 00:13:33.048 You know, I fired up the browser, I tried a couple of things 00:13:33.048 --> 00:13:35.097 (claps hand) Looks like it works! 
Deploy it!” 00:13:35.097 --> 00:13:38.045 That’s obviously a little more cavalier than you’d want to be 00:13:38.045 --> 00:13:41.024 And, by the way, one of the things that we discovered 00:13:41.024 --> 00:13:43.077 with this online course just starting up 00:13:43.077 --> 00:13:45.090 when 60,000 people are enrolled in the course 00:13:45.090 --> 00:13:48.099 and 0.1% of those people have a problem 00:13:48.099 --> 00:13:50.083 you’d get 60 emails 00:13:50.083 --> 00:13:53.078 The corollary is: when your site is used by a lot of people 00:13:53.078 --> 00:13:55.089 some stupid bug that you didn’t find 00:13:55.089 --> 00:13:57.018 but that could have been found by testing 00:13:57.018 --> 00:13:59.080 could very quickly generate *a lot* of pain 00:13:59.080 --> 00:14:02.023 On the other hand, you don’t want to be dogmatic and say 00:14:02.023 --> 00:14:04.056 “Uh, until we have 100% coverage and every test is green 00:14:04.056 --> 00:14:06.005 we absolutely will not ship” 00:14:06.005 --> 00:14:07.012 That’s not healthy either 00:14:07.012 --> 00:14:08.048 And the test quality 00:14:08.048 --> 00:14:10.057 doesn’t necessarily correlate with the statement [coverage] 00:14:10.057 --> 00:14:11.064 unless you can say something 00:14:11.064 --> 00:14:12.068 about the quality of your tests 00:14:12.068 --> 00:14:14.029 just because you’ve executed every line 00:14:14.029 --> 00:14:17.010 doesn’t mean that you’ve tested the interesting cases 00:14:17.010 --> 00:14:18.068 So, somewhere in between, you could say 00:14:18.068 --> 00:14:20.014 “Well, we’ll use coverage tools to identify 00:14:20.014 --> 00:14:23.004 undertested or poorly-tested parts of the code 00:14:23.004 --> 00:14:24.073 and we’ll use them as a guideline 00:14:24.073 --> 00:14:27.011 to sort of help improve our overall confidence level” 00:14:27.011 --> 00:14:29.024 But remember, Agile is about embracing change 00:14:29.024 --> 00:14:30.032 and dealing with it 00:14:30.032 --> 00:14:32.002 Part of change is things 
would change that will cause 00:14:32.002 --> 00:14:33.038 bugs that you didn’t foresee 00:14:33.038 --> 00:14:34.031 and the right reaction is: 00:14:34.031 --> 00:14:36.026 Be comfortable enough with the testing tools 00:14:36.026 --> 00:14:37.064 [so] that you can quickly find those bugs 00:14:37.064 --> 00:14:39.025 Write a test that reproduces that bug 00:14:39.025 --> 00:14:40.062 And then make the test green 00:14:40.062 --> 00:14:41.061 Then you’ll really fix it 00:14:41.061 --> 00:14:43.004 That means, the way that you really fix a bug is 00:14:43.004 --> 00:14:45.049 if you created a test that correctly failed 00:14:45.049 --> 00:14:46.088 to reproduce that bug 00:14:46.088 --> 00:14:48.055 and then you went back and fixed the code 00:14:48.055 --> 00:14:49.057 to make those tests pass 00:14:49.057 --> 00:14:51.073 Similarly, you don’t want to say 00:14:51.073 --> 00:14:53.036 “Well, unit tests give you better coverage 00:14:53.036 --> 00:14:54.073 They’re more thorough and detailed 00:14:54.073 --> 00:14:56.044 So let’s focus all our energy on that” 00:14:56.044 --> 00:14:57.062 as opposed to 00:14:57.062 --> 00:14:58.093 “Oh, focus on integration tests 00:14:58.093 --> 00:15:00.006 because they’re more realistic, right? 
00:15:00.006 --> 00:15:01.056 They reflect what the customer said they want 00:15:01.056 --> 00:15:03.034 So, if the integration tests are passing 00:15:03.034 --> 00:15:05.067 by definition we’re meeting a customer need” 00:15:05.067 --> 00:15:07.034 Again, both extremes are kind of unhealthy 00:15:07.034 --> 00:15:09.079 because each one of these can find problems 00:15:09.079 --> 00:15:11.031 that would be missed by the other 00:15:11.031 --> 00:15:12.060 So, having a good combination of them 00:15:12.060 --> 00:15:15.042 is kind of what it is all about 00:15:15.042 --> 00:15:18.072 The last thing I want to leave you with is, I think 00:15:18.072 --> 00:15:20.036 in terms of testing, is “TDD versus 00:15:20.036 --> 00:15:22.005 what I call conventional debugging— 00:15:22.005 --> 00:15:24.004 i.e., the way that we all kind of do it 00:15:24.004 --> 00:15:25.051 even though we say we don’t” 00:15:25.051 --> 00:15:26.064 and we’re all trying to get better, right? 00:15:26.064 --> 00:15:27.085 We’re all kind of in the gutter 00:15:27.085 --> 00:15:29.036 Some of us are looking up at the stars 00:15:29.036 --> 00:15:31.011 trying to improve our practices 00:15:31.011 --> 00:15:33.099 But, having now lived with this for 3 or 4 years myself 00:15:33.099 --> 00:15:35.091 and—I’ll be honest—3 years ago I didn’t do TDD 00:15:35.091 --> 00:15:37.079 I do it now, because I find that it’s better 00:15:37.079 --> 00:15:40.081 and here’s my distillation of why I think it works for me 00:15:40.081 --> 00:15:43.032 Sorry, the colours are a little weird 00:15:43.032 --> 00:15:45.000 but on the left column of the table 00:15:45.000 --> 00:15:46.034 [it] says “Conventional debugging” 00:15:46.034 --> 00:15:47.044 and the right side says “TDD” 00:15:47.044 --> 00:15:49.069 So what’s the way I used to write code? 
00:15:49.069 --> 00:15:51.056 Maybe some of you still do this 00:15:51.056 --> 00:15:53.013 I write a whole bunch of lines 00:15:53.013 --> 00:15:54.043 maybe a few tens of lines of code 00:15:54.043 --> 00:15:55.059 I’m sure they’re right— 00:15:55.059 --> 00:15:56.061 I mean, I am a good programmer, right? 00:15:56.061 --> 00:15:57.099 This is not that hard 00:15:57.099 --> 00:15:59.002 I run it – It doesn’t work 00:15:59.002 --> 00:16:01.098 Ok, fire up the debugger – Start putting in printf’s 00:16:01.098 --> 00:16:04.088 If I’d been using TDD what would I do instead? 00:16:04.088 --> 00:16:08.022 Well I’d write a few lines of code, having written a test first 00:16:08.022 --> 00:16:10.071 So as soon as the test goes from red to green 00:16:10.071 --> 00:16:12.064 I know I wrote code that works— 00:16:12.064 --> 00:16:15.013 or at least the parts of the behaviour that I had in mind 00:16:15.013 --> 00:16:16.096 Those parts of the behaviour work, because I had a test 00:16:16.096 --> 00:16:19.056 Ok, back to conventional debugging: 00:16:19.056 --> 00:16:21.073 I’m running my program, trying to find the bugs 00:16:21.073 --> 00:16:23.028 I start putting in printf’s everywhere 00:16:23.028 --> 00:16:24.062 to print out the values of things 00:16:24.062 --> 00:16:25.064 which by the way is a lot of fun 00:16:25.064 --> 00:16:26.073 when you’re trying to read them 00:16:26.073 --> 00:16:28.014 out of the 500 lines of log output 00:16:28.014 --> 00:16:29.035 that you’d get in a Rails app 00:16:29.035 --> 00:16:30.087 trying to find your printf’s 00:16:30.087 --> 00:16:32.035 you know, “I know what I’ll do— 00:16:32.035 --> 00:16:34.008 I’ll put in 75 asterisks before and after 00:16:34.008 --> 00:16:36.043 That will make it readable” (laughter) 00:16:36.043 --> 00:16:38.071 Who don’t—Ok, raise your hands if you don’t do this! 00:16:38.071 --> 00:16:40.090 Thank you for your honesty. (laughter) Ok.
00:16:40.090 --> 00:16:43.014 Or— Or I could do the other thing, I could say: 00:16:43.014 --> 00:16:45.030 Instead of printing the value of a variable 00:16:45.030 --> 00:16:47.039 why don’t I write a test that inspects it 00:16:47.039 --> 00:16:48.079 with an expectation, such as “should” 00:16:48.079 --> 00:16:50.090 and I’ll know immediately in bright red letters 00:16:50.090 --> 00:16:53.033 if that expectation wasn’t met 00:16:53.033 --> 00:16:56.005 Ok, I’m back on the conventional debugging side: 00:16:56.005 --> 00:16:58.090 I break out the big guns: I pull out the Ruby debugger 00:16:58.092 --> 00:17:02.044 I set a debug breakpoint, and I now start tweaking and say 00:17:02.044 --> 00:17:04.085 “Oh, let’s see, I have to get past that ‘if’ statement 00:17:04.085 --> 00:17:06.002 so I have to set that thing 00:17:06.002 --> 00:17:07.063 Oh, I have to call that method and so I need to…” 00:17:07.063 --> 00:17:08.065 No! 00:17:08.065 --> 00:17:10.087 I could instead—if I’m going to do that anyway— 00:17:10.087 --> 00:17:13.000 let’s just do it in a file, set up some mocks and stubs 00:17:13.000 --> 00:17:16.045 to control the code path, make it go the way I want 00:17:16.045 --> 00:17:19.013 And now, “Ok, for sure I’ve fixed it!
00:17:19.013 --> 00:17:22.012 I’ll get out of the debugger, run it all again!” 00:17:22.012 --> 00:17:24.022 And, of course, 9 times out of 10, you didn’t fix it 00:17:24.022 --> 00:17:26.072 or you kind of partly fixed it but you didn’t completely fix it 00:17:26.072 --> 00:17:30.040 and now I have to do all these manual things all over again 00:17:30.040 --> 00:17:32.086 or I already have a bunch of tests 00:17:32.086 --> 00:17:34.031 and I can just rerun them automatically 00:17:34.031 --> 00:17:35.056 and I could, if some of them fail 00:17:35.056 --> 00:17:36.087 “Oh, I didn’t fix the whole thing 00:17:36.087 --> 00:17:38.040 No problem, I’ll just go back!” 00:17:38.040 --> 00:17:39.096 So, the bottom line is that 00:17:39.096 --> 00:17:41.095 you know, you could do it on the left side 00:17:41.095 --> 00:17:45.004 but you’re using the same techniques in both cases 00:17:45.004 --> 00:17:48.062 The only difference is, in one case you’re doing it manually 00:17:48.062 --> 00:17:50.004 which is boring and error-prone 00:17:50.004 --> 00:17:51.078 In the other case you’re doing a little more work 00:17:51.078 --> 00:17:53.095 but you can make it automatic and repeatable 00:17:53.095 --> 00:17:55.071 and have, you know, some high confidence 00:17:55.071 --> 00:17:57.003 that as you change things in your code 00:17:57.003 --> 00:17:58.092 you are not breaking stuff that used to work 00:17:58.092 --> 00:18:00.091 and basically it’s more productive 00:18:00.091 --> 00:18:02.047 So you’re doing all the same things 00:18:02.047 --> 00:18:04.037 but with a, kind of, “delta” extra work 00:18:04.037 --> 00:18:07.086 you are using your effort at a much higher leverage 00:18:07.086 --> 00:18:10.036 So that’s kind of my view of why TDD is a good thing 00:18:10.036 --> 00:18:11.088 It’s really, it doesn’t require new skills 00:18:11.088 --> 00:18:15.011 It just requires [you] to refactor your existing skills 00:18:15.011 --> 00:18:18.014 I also tried when I—again, honest 
 confessions, right?— 00:18:18.014 --> 00:18:19.034 when I started doing this it was like 00:18:19.034 --> 00:18:21.049 “Ok, I’m gonna be teaching a course on Rails 00:18:21.049 --> 00:18:22.065 I should really focus on testing” 00:18:22.065 --> 00:18:24.032 So I went back to some code I had written 00:18:24.032 --> 00:18:26.087 that was working—you know, that was decent code— 00:18:26.087 --> 00:18:29.006 and I started trying to write tests for it 00:18:29.006 --> 00:18:31.019 and it was *so painful* 00:18:31.019 --> 00:18:33.033 because the code wasn’t written in a way that was testable 00:18:33.033 --> 00:18:34.097 There were all kinds of interactions 00:18:34.097 --> 00:18:36.038 There were, like, nested conditionals 00:18:36.038 --> 00:18:38.083 And if you wanted to isolate a particular statement 00:18:38.083 --> 00:18:41.070 and have it test—to trigger test—just that statement 00:18:41.070 --> 00:18:44.000 the amount of stuff you’d have to set up in your test 00:18:44.000 --> 00:18:45.009 to have it happen— 00:18:45.009 --> 00:18:46.040 remember when we talked about mock train wrecks— 00:18:46.040 --> 00:18:48.014 you have to set up all this infrastructure 00:18:48.014 --> 00:18:49.063 just to get one line of code 00:18:49.063 --> 00:18:51.015 and you do that and you go 00:18:51.015 --> 00:18:52.074 “Gawd, testing is really not worth it!
00:18:52.074 --> 00:18:54.034 I wrote 20 lines of setup 00:18:54.034 --> 00:18:56.059 so that I could test two lines in my function!” 00:18:56.059 --> 00:18:58.085 What that’s really telling you—as I now realize— 00:18:58.085 --> 00:19:00.042 is your function is bad 00:19:00.042 --> 00:19:01.049 It’s a badly written function 00:19:01.049 --> 00:19:02.052 It’s not a testable function 00:19:02.052 --> 00:19:03.088 It’s got too many moving parts 00:19:03.088 --> 00:19:06.026 whose dependencies can be broken 00:19:06.026 --> 00:19:07.070 There’s no seams in my function 00:19:07.070 --> 00:19:11.008 that allow me to individually test the different behaviours 00:19:11.008 --> 00:19:12.083 And once you start doing Test First Development 00:19:12.083 --> 00:19:15.043 because you have to write your tests in small chunks 00:19:15.043 --> 00:19:17.053 it kind of makes this problem go away 00:19:17.053 --> 99:59:59.999 So that’s been my epiphany
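The workflow described in this lecture (write a test that reproduces the bug, make it green, then keep rerunning the tests instead of reading printed values) could be sketched roughly as follows. This is only an illustration and not code from the lecture: `average` and `expect_equal` are hypothetical names, and plain Ruby stands in for RSpec so that the sketch runs without any gems.

```ruby
# Hypothetical example; none of these names come from the lecture.

# Tiny stand-in for an RSpec expectation: fail loudly on a mismatch
# instead of leaving a printed value to be hunted down in the log.
def expect_equal(expected, actual, label)
  return true if expected == actual
  raise "FAILED #{label}: expected #{expected.inspect}, got #{actual.inspect}"
end

# Code under test (hypothetical). The assumed bug: an empty list used to
# produce NaN because the guard on the next line was missing.
def average(values)
  return 0.0 if values.empty? # the fix that turned the failing test green
  values.sum.to_f / values.size
end

# Step 1: the expectation that reproduced the bug (red before the fix).
expect_equal(0.0, average([]), "average of an empty list")

# Step 2: earlier expectations are rerun automatically, so a partial fix
# or a regression shows up immediately.
expect_equal(2.5, average([2, 3]), "average of two values")

puts "all expectations met"
```

If the guard in `average` is removed, the first expectation raises, which corresponds to the "red" state; restoring it makes both expectations pass again without any manual stepping through a debugger.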