1 00:00:00,118 --> 00:00:02,898 So, now, let’s look at an example in [an] actual network, 2 00:00:02,898 --> 00:00:06,181 and try to see what the CPDs look like, 3 00:00:06,181 --> 00:00:07,738 what behavior we get, 4 00:00:07,738 --> 00:00:09,488 and how we might augment the network 5 00:00:09,488 --> 00:00:11,238 to include additional things. 6 00:00:11,238 --> 00:00:12,802 Now, let me warn you right upfront 7 00:00:12,802 --> 00:00:14,656 that this is a baby network; 8 00:00:14,656 --> 00:00:16,263 it’s not a real network, 9 00:00:16,263 --> 00:00:19,622 but it’s compact enough to look at, but 10 00:00:19,622 --> 00:00:22,918 still interesting enough to get some non-trivial behaviors. 11 00:00:24,579 --> 00:00:26,593 So, to explore the network, 12 00:00:26,593 --> 00:00:28,854 we’re going to use a system called SamIam. 13 00:00:28,854 --> 00:00:31,662 It was produced by Adnan Darwiche and his group at UCLA, 14 00:00:31,662 --> 00:00:32,640 and it’s nice 15 00:00:32,640 --> 00:00:36,151 because it actually works on all sorts of different platforms, 16 00:00:36,151 --> 00:00:38,911 so it’s usable by pretty much everyone. 17 00:00:38,911 --> 00:00:41,537 So let’s look at a particular problem: 18 00:00:41,537 --> 00:00:43,616 Imagine that we’re an insurance company 19 00:00:43,616 --> 00:00:44,888 and we’re trying to decide 20 00:00:44,888 --> 00:00:46,438 for a person who comes in the door 21 00:00:46,438 --> 00:00:49,002 whether to give them insurance or not. 22 00:00:49,002 --> 00:00:51,746 So the operative aspect of making that decision 23 00:00:51,746 --> 00:00:54,369 is how much the policy is going to cost us, 24 00:00:54,369 --> 00:00:55,892 that is, how much we’re going to have to pay 25 00:00:55,892 --> 00:00:58,915 over the course of a year to insure this person. 26 00:00:58,915 --> 00:01:02,287 So there is a variable called Cost. 27 00:01:02,287 --> 00:01:06,847 Let’s click on that to see what properties that variable has. 
28 00:01:06,847 --> 00:01:08,507 And we can see that in this case, 29 00:01:08,507 --> 00:01:11,952 we’ve decided to only give two values to the Cost variable, 30 00:01:11,952 --> 00:01:13,772 Low and High. 31 00:01:13,772 --> 00:01:16,790 This is clearly a very coarse-grained approximation 32 00:01:16,790 --> 00:01:18,472 and not one that we would use in practice. 33 00:01:18,472 --> 00:01:20,303 In reality we would probably 34 00:01:20,303 --> 00:01:22,135 have this be a continuous variable 35 00:01:22,135 --> 00:01:25,887 whose mean depends on various aspects of the model. 36 00:01:25,887 --> 00:01:27,783 But for the purposes of our illustration, 37 00:01:27,783 --> 00:01:29,559 we’re going to use this discrete distribution 38 00:01:29,559 --> 00:01:31,273 that only has values Low and High. 39 00:01:31,273 --> 00:01:32,647 Okay. 40 00:01:32,647 --> 00:01:36,855 So now, let’s build up this network using the technique of 41 00:01:36,855 --> 00:01:39,504 “expanding the conversation” that we’ve discussed before. 42 00:01:39,504 --> 00:01:44,224 And so what is the most important determining factor 43 00:01:44,224 --> 00:01:46,740 as to the cost that the insurance company has to pay? 44 00:01:46,740 --> 00:01:50,647 Well, probably whether the person has accidents 45 00:01:50,647 --> 00:01:51,977 and how severe they are. 46 00:01:51,977 --> 00:01:57,005 So here we have a network that has two variables: 47 00:01:57,005 --> 00:01:59,724 One is Accident and one is Cost. 48 00:01:59,724 --> 00:02:02,645 And in this case we decided to select 49 00:02:02,645 --> 00:02:05,701 three possible values for the Accident variable, 50 00:02:05,701 --> 00:02:09,031 None, Mild, and Severe, 51 00:02:09,031 --> 00:02:14,442 with the probabilities that you see listed. 52 00:02:14,442 --> 00:02:17,447 And what you see down below is the Cost variable. 53 00:02:17,447 --> 00:02:18,759 And let’s open the CPD 54 00:02:18,759 --> 00:02:24,845 of the Cost variable given the Accident variable. 
55 00:02:24,845 --> 00:02:26,946 And we can see that, in this case, 56 00:02:26,946 --> 00:02:29,169 we have a conditional probability table 57 00:02:29,169 --> 00:02:33,358 of Cost given Accident. 58 00:02:33,358 --> 00:02:35,485 Note that this is actually inverted 59 00:02:35,485 --> 00:02:38,542 from the notation that we have used in the class before, 60 00:02:38,542 --> 00:02:41,713 because here the conditioning cases are columns, 61 00:02:41,713 --> 00:02:44,545 whereas in the examples that we’ve given 62 00:02:44,545 --> 00:02:45,754 [they] have been rows. 63 00:02:45,754 --> 00:02:49,189 But that’s okay, it’s the same thing, just inverted. 64 00:02:49,650 --> 00:02:51,102 And so we see, for example, 65 00:02:51,102 --> 00:02:54,418 that if the person has no accidents, 66 00:02:54,418 --> 00:02:56,765 the costs are very likely to be very low; 67 00:02:56,765 --> 00:03:01,581 mild accidents incur a different distribution over cost; 68 00:03:01,581 --> 00:03:03,173 and severe accidents have 69 00:03:03,173 --> 00:03:05,789 a probability of 0.9 of having high cost 70 00:03:05,789 --> 00:03:08,414 and 0.1 of having low cost. 71 00:03:09,060 --> 00:03:11,749 So now, let’s continue extending the conversation 72 00:03:11,749 --> 00:03:14,088 and ask what Accident depends on. 73 00:03:14,088 --> 00:03:17,207 And it seems that one of the obvious factors 74 00:03:17,207 --> 00:03:20,409 is whether the person is a good driver or not. 75 00:03:20,409 --> 00:03:22,661 And so we would expect driver quality 76 00:03:22,661 --> 00:03:23,965 to be a parent of Accident. 77 00:03:23,965 --> 00:03:25,214 But there are other things 78 00:03:25,214 --> 00:03:27,853 that also affect not just the presence of an accident, 79 00:03:27,853 --> 00:03:29,911 but also the severity of the accident. 
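As a rough illustration of how the Cost CPD described above combines with the distribution over Accident, here is a minimal Python sketch that sums out Accident to get the marginal over Cost. Only the Severe column (0.9 High, 0.1 Low) comes from the lecture; the Accident prior and the None and Mild columns are made-up placeholder numbers, and the function name marginal_cost is hypothetical.

```python
# P(Accident): placeholder values, NOT the ones shown in the tool.
p_accident = {"None": 0.80, "Mild": 0.15, "Severe": 0.05}

# P(Cost | Accident), one entry per conditioning case (a column in the tool).
p_cost_given_accident = {
    "None":   {"Low": 0.95, "High": 0.05},  # assumed for illustration
    "Mild":   {"Low": 0.70, "High": 0.30},  # assumed for illustration
    "Severe": {"Low": 0.10, "High": 0.90},  # stated in the lecture
}


def marginal_cost(p_acc, cpd):
    """Sum out Accident: P(Cost=c) = sum over a of P(a) * P(c | a)."""
    result = {"Low": 0.0, "High": 0.0}
    for accident_value, p_a in p_acc.items():
        for cost_value, p_c in cpd[accident_value].items():
            result[cost_value] += p_a * p_c
    return result


p_cost = marginal_cost(p_accident, p_cost_given_accident)
print(p_cost)
```

Each column of the CPD is a distribution over Cost, so it sums to one, and the resulting marginal does as well.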
80 00:03:29,911 --> 00:03:33,909 So for example, vehicle size would affect 81 00:03:33,909 --> 00:03:37,117 both the severity of an accident, 82 00:03:37,117 --> 00:03:40,788 because if you are driving a large SUV, then chances are 83 00:03:40,788 --> 00:03:43,599 you are not likely to be in an accident as severe, 84 00:03:43,599 --> 00:03:45,685 but it might also perhaps increase 85 00:03:45,685 --> 00:03:47,374 the chance of having an accident overall 86 00:03:47,374 --> 00:03:51,830 because maybe a large car is harder to handle. 87 00:03:52,615 --> 00:03:56,381 And then vehicle year might affect the chances of an accident 88 00:03:56,381 --> 00:03:59,509 because of the presence or absence of certain safety features 89 00:03:59,509 --> 00:04:02,037 like anti-lock brakes and airbags. 90 00:04:02,037 --> 00:04:03,814 So let’s open the CPD of Accident 91 00:04:03,814 --> 00:04:05,173 and see what that looks like 92 00:04:05,173 --> 00:04:07,325 now that we have all these parents for it. 93 00:04:07,325 --> 00:04:10,213 And we can see here that we have, 94 00:04:10,213 --> 00:04:13,165 in this case, eight conditioning cases, 95 00:04:13,165 --> 00:04:18,317 correspond[ing] to three variables, two values each. 96 00:04:18,317 --> 00:04:22,757 And so here, let’s just look at one of these 97 00:04:22,757 --> 00:04:26,046 distributions as an example. 98 00:04:26,046 --> 00:04:30,749 So, if this is a fairly new vehicle (after 2000) 99 00:04:30,749 --> 00:04:32,398 and it’s an SUV, 100 00:04:32,398 --> 00:04:35,725 the probability of having a severe accident is quite low, 
101 00:04:35,725 --> 00:04:38,813 and the probability of having a mild accident is moderate 102 00:04:38,813 --> 00:04:44,774 and the probability of having no accidents is 0.85, 103 00:04:44,774 --> 00:04:48,525 whereas if you compare that to the corresponding entry 104 00:04:48,525 --> 00:04:52,053 when we keep everything fixed except that now it’s a compact car, 105 00:04:52,053 --> 00:05:00,509 we see that the probability of having a mild accident is lower, 106 00:05:00,509 --> 00:05:03,045 but the probability of having no accidents is higher, 107 00:05:03,045 --> 00:05:08,059 representing different driving patterns, for example. 108 00:05:08,506 --> 00:05:11,649 Okay, so with this network, 109 00:05:11,649 --> 00:05:13,649 we can now start asking simple questions. 110 00:05:14,695 --> 00:05:17,145 So to do an example of causal inference, 111 00:05:17,145 --> 00:05:20,559 let’s instantiate, for example, driver quality to be good. 112 00:05:21,574 --> 00:05:23,936 And bad. 113 00:05:23,936 --> 00:05:27,054 And we can see that for a bad driver 114 00:05:27,054 --> 00:05:31,397 the probability of low cost is 81%. 115 00:05:31,397 --> 00:05:36,325 And for a good driver the probability of low cost is 87%. 116 00:05:36,325 --> 00:05:38,381 If we look at the accidents 117 00:05:38,381 --> 00:05:41,278 we can see that for a good driver 118 00:05:41,278 --> 00:05:44,800 there is a probability of 87.5 percent of no accidents 119 00:05:44,800 --> 00:05:46,431 and ten percent of a mild accident. 120 00:05:46,431 --> 00:05:50,957 And the probability of no accident goes down for a bad driver, 121 00:05:50,957 --> 00:05:53,422 and mild accident goes up 122 00:05:53,422 --> 00:05:55,453 and severe accident also goes way up. 123 00:05:55,453 --> 00:05:59,077 Now note that many of these differences are quite subtle. 124 00:05:59,077 --> 00:06:02,054 There’s a difference of a couple percent one way or the other. 
125 00:06:02,054 --> 00:06:04,038 And you might think, 126 00:06:04,038 --> 00:06:05,326 if you were designing a network, 127 00:06:05,326 --> 00:06:09,245 that you’d like these really extreme probability changes 128 00:06:09,245 --> 00:06:11,485 when you instantiate values. 129 00:06:11,485 --> 00:06:13,790 But in many cases that’s not actually true, 130 00:06:13,790 --> 00:06:15,052 and these subtle differences 131 00:06:15,052 --> 00:06:17,643 are actually quite significant for an insurance company 132 00:06:17,643 --> 00:06:19,819 that insures hundreds of thousands of people: 133 00:06:19,819 --> 00:06:22,495 a couple of percentage points in the probability of an accident 134 00:06:22,495 --> 00:06:24,855 can make a very big difference to one’s profitability. 135 00:06:25,809 --> 00:06:26,972 So now let’s think about 136 00:06:26,972 --> 00:06:29,519 how we would expand this network even further. 137 00:06:30,196 --> 00:06:33,244 Vehicle size and vehicle year are things 138 00:06:33,244 --> 00:06:35,851 that we’re likely to observe in the insurance form. 139 00:06:35,851 --> 00:06:39,070 But driver quality is something that’s very difficult to observe. 140 00:06:39,070 --> 00:06:41,829 You can’t go ask somebody, “Oh, are you a good driver?” 141 00:06:41,829 --> 00:06:43,451 Because everyone’s going to say, 142 00:06:43,451 --> 00:06:45,123 “Sure, I’m the best driver ever!” 143 00:06:45,123 --> 00:06:49,272 And so that’s not going to be a very useful question. 144 00:06:49,272 --> 00:06:53,147 So what evidence do we have that we can observe 145 00:06:53,147 --> 00:06:57,148 that might indicate to us the value of the driver quality? 146 00:06:57,148 --> 00:07:01,491 One obvious one is the person’s driving record. 147 00:07:01,491 --> 00:07:03,556 That is, whether they’ve had previous accidents 148 00:07:03,556 --> 00:07:05,355 or previous moving violations. 
149 00:07:05,832 --> 00:07:08,412 So let’s think about adding a variable 150 00:07:08,412 --> 00:07:09,880 that represents driving history. 151 00:07:10,665 --> 00:07:13,507 And so let’s go ahead and introduce that variable. 152 00:07:13,507 --> 00:07:16,315 So we can click on this button 153 00:07:16,315 --> 00:07:17,697 that allows us to create a node. 154 00:07:17,697 --> 00:07:19,841 The node is now called variable1 155 00:07:19,841 --> 00:07:21,036 so we’d have to give it a name. 156 00:07:21,036 --> 00:07:24,787 So for example we’re going to call it DrivingHistory. 157 00:07:26,079 --> 00:07:28,074 And that’s its identifier, 158 00:07:28,074 --> 00:07:30,620 and we also have the name of the variable, 159 00:07:30,620 --> 00:07:32,246 which is usually the same. 160 00:07:32,246 --> 00:07:35,195 And let’s make that two values, 161 00:07:35,195 --> 00:07:37,725 say PreviousAccident and NoPreviousAccident. 162 00:07:41,586 --> 00:07:45,760 Now where will we place this variable in the network? 163 00:07:45,760 --> 00:07:48,828 One might initially think that the right thing to do 164 00:07:48,828 --> 00:07:53,228 is to place DrivingHistory as a parent of Driver_quality 165 00:07:53,228 --> 00:07:57,150 because driving history can influence 166 00:07:57,150 --> 00:07:59,044 our beliefs about driver quality. 167 00:07:59,044 --> 00:08:01,290 Now it’s true that observing driving history 168 00:08:01,290 --> 00:08:03,651 changes our probability within driver quality, 169 00:08:03,651 --> 00:08:07,443 but if you think about the actual causal structure of this scenario, 170 00:08:07,443 --> 00:08:11,878 what we actually have is that driver quality is a causal factor 171 00:08:11,878 --> 00:08:13,968 of both a previous accident 172 00:08:13,968 --> 00:08:16,765 as well as a subsequent accident. 
173 00:08:16,765 --> 00:08:18,327 And so if we want to maintain 174 00:08:18,327 --> 00:08:20,421 the intuitive causal structure of the domain, 175 00:08:20,421 --> 00:08:28,095 a more appropriate thing is to add DrivingHistory as a child 176 00:08:28,095 --> 00:08:29,972 rather than parent of Driver_quality. 177 00:08:29,972 --> 00:08:32,187 [You] might question why it matters 178 00:08:32,187 --> 00:08:33,864 and in this very simple example 179 00:08:33,864 --> 00:08:36,874 the two models are in some sense equivalent 180 00:08:36,874 --> 00:08:38,851 and we could have placed it either way 181 00:08:38,851 --> 00:08:44,076 except that the CPD for driver quality given driving history 182 00:08:44,076 --> 00:08:46,006 might be a little bit less intuitive. 183 00:08:46,006 --> 00:08:49,955 But if we had other indicators of driver quality, 184 00:08:49,955 --> 00:08:52,436 for example a previous moving violation, 185 00:08:52,436 --> 00:08:55,661 then it actually makes a lot more sense 186 00:08:55,661 --> 00:08:58,559 to have all of these be children of driver quality 187 00:08:58,559 --> 00:09:00,925 as opposed to parents of driver quality. 188 00:09:01,802 --> 00:09:02,680 Okay. 189 00:09:02,680 --> 00:09:07,481 So that shows us how we would add a variable into the network. 190 00:09:07,481 --> 00:09:09,764 And now let’s go and open up a much larger network 191 00:09:09,764 --> 00:09:13,007 that includes these variables as well as others. 192 00:09:13,007 --> 00:09:15,971 So let’s look now at this larger network. 193 00:09:15,971 --> 00:09:17,347 And we can see 194 00:09:17,347 --> 00:09:20,237 that we’ve added several different variables to the network. 
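To make concrete the earlier remark that, with just these two variables, the two edge directions are in some sense equivalent, here is a minimal Python sketch. It builds the joint distribution from the causal direction (driver quality as parent of driving history), derives the reversed CPD with Bayes' rule, and checks that both factorizations give the same joint. All numbers are assumptions chosen for illustration, not values from the lecture's network.

```python
# Causal direction: P(Quality) and P(History | Quality). Numbers are assumed.
p_quality = {"Good": 0.7, "Bad": 0.3}
p_history_given_quality = {
    "Good": {"PreviousAccident": 0.1, "NoPreviousAccident": 0.9},
    "Bad":  {"PreviousAccident": 0.4, "NoPreviousAccident": 0.6},
}

# Joint from the causal factorization: P(q, h) = P(q) * P(h | q).
joint = {
    (q, h): p_quality[q] * p_history_given_quality[q][h]
    for q in p_quality
    for h in p_history_given_quality[q]
}

# Reversed direction: first the marginal P(h), then P(q | h) by Bayes' rule.
p_history = {}
for (q, h), p in joint.items():
    p_history[h] = p_history.get(h, 0.0) + p

p_quality_given_history = {
    h: {q: joint[(q, h)] / p_history[h] for q in p_quality}
    for h in p_history
}

# Joint from the reversed factorization: P(q, h) = P(h) * P(q | h).
joint_reversed = {
    (q, h): p_history[h] * p_quality_given_history[h][q]
    for (q, h) in joint
}

for key in joint:
    print(key, round(joint[key], 4), round(joint_reversed[key], 4))
```

The two joints agree, but the reversed CPD has to be derived rather than written down directly, which is one way to see why it tends to be less intuitive to specify by hand.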
195 00:09:20,237 --> 00:09:23,211 We’ve added attributes of the vehicle, 196 00:09:23,211 --> 00:09:27,284 for example whether the vehicle had antilock brakes and an airbag, 197 00:09:27,284 --> 00:09:28,787 which is going to allow us to give 198 00:09:28,787 --> 00:09:31,483 more informative probabilities regarding the accident. 199 00:09:31,483 --> 00:09:35,227 We’ve also introduced aspects of the driver, 200 00:09:35,227 --> 00:09:38,243 for example, whether they’ve had extra-track training, 201 00:09:38,243 --> 00:09:40,084 which is going to increase driving quality, 202 00:09:40,084 --> 00:09:41,684 whether they’re young or old, 203 00:09:41,684 --> 00:09:42,952 where the presumption is 204 00:09:42,952 --> 00:09:45,883 that younger people tend to be more reckless drivers, 205 00:09:45,883 --> 00:09:50,373 and whether the driver is focused or more easily distracted, 206 00:09:50,373 --> 00:09:52,848 which again is going to affect driving quality. 207 00:09:53,648 --> 00:09:58,681 Now since personality type is hard to observe, 208 00:09:58,681 --> 00:10:03,155 we added another variable which is Good_student 209 00:10:03,155 --> 00:10:05,654 which might indicate one’s personality type. 210 00:10:05,654 --> 00:10:08,862 So let’s open [the] CPD for that one. 211 00:10:11,293 --> 00:10:14,071 And so we can see here that, for example, 212 00:10:14,071 --> 00:10:20,999 if you are a focused person who is young, 213 00:10:20,999 --> 00:10:23,720 you’re much more likely to be a good student, 214 00:10:23,720 --> 00:10:28,439 much more so than if you are not a focused person who is young. 215 00:10:28,439 --> 00:10:32,303 If you’re old, you’re just not very likely to be a student, 216 00:10:32,303 --> 00:10:38,021 and so this probability basically says that if you’re old, 217 00:10:38,021 --> 00:10:39,883 you’re just not very likely to be a student, 218 00:10:39,883 --> 00:10:41,322 and therefore not likely to be a good student. 
219 00:10:42,014 --> 00:10:47,608 So, now that we’ve added all these variables to the network, 220 00:10:47,608 --> 00:10:51,161 let’s go ahead and run a few queries to see what happens. 221 00:10:51,161 --> 00:10:56,918 And let’s start by looking at the prior probability of Accident 222 00:10:56,918 --> 00:10:59,942 before we observe anything. 223 00:10:59,942 --> 00:11:04,299 So we can see that the probability of no accident is about 79.5%. 224 00:11:04,299 --> 00:11:07,065 The probability of severe accident is about 3%. 225 00:11:07,065 --> 00:11:10,077 Now let’s go ahead and tell the system 226 00:11:10,077 --> 00:11:11,837 that we have a good student at hand. 227 00:11:11,837 --> 00:11:13,608 So we’re going to observe 228 00:11:13,608 --> 00:11:15,978 that the student is a good student, 229 00:11:15,978 --> 00:11:17,567 and let’s see what happens. 230 00:11:18,059 --> 00:11:19,695 We can see, surprisingly, 231 00:11:19,695 --> 00:11:20,807 that even though we observe 232 00:11:20,807 --> 00:11:21,887 that somebody is a good student, 233 00:11:21,887 --> 00:11:23,954 the probability of no accidents 234 00:11:23,954 --> 00:11:27,699 went down from 79.5% to 78%, 235 00:11:27,699 --> 00:11:29,582 and the probability of severe accidents 236 00:11:29,582 --> 00:11:32,819 went up from 3.5 to 3.67 percent. 237 00:11:32,819 --> 00:11:33,880 You might say, 238 00:11:33,880 --> 00:11:35,986 “Well, but I told you that it’s a good student. 239 00:11:35,986 --> 00:11:38,138 Shouldn’t the probability of accidents go down?” 240 00:11:38,307 --> 00:11:41,972 So let’s look at some active trails in this graph. 241 00:11:41,972 --> 00:11:46,461 One active trail goes from Good_student to Focused, 242 00:11:46,461 --> 00:11:48,928 to Driver_quality, 243 00:11:48,928 --> 00:11:49,939 to Accident. 
244 00:11:49,939 --> 00:11:53,602 And sure enough, if we consider that trail in isolation, 245 00:11:53,602 --> 00:11:58,378 it’s probably going to make the probability of no accident be higher. 246 00:11:58,378 --> 00:12:00,204 But, we have another active trail. 247 00:12:00,204 --> 00:12:04,058 We have the active trail that goes from good student up to age, 248 00:12:04,058 --> 00:12:07,085 and then back down, through [to] driver quality. 249 00:12:07,085 --> 00:12:09,921 So, to see that, let’s unclick on good student 250 00:12:09,921 --> 00:12:11,281 and see what happens. 251 00:12:11,281 --> 00:12:15,767 Note that the probability initially that the driver is young was 25%, 252 00:12:15,767 --> 00:12:17,710 but when I observed a good student, 253 00:12:17,710 --> 00:12:20,538 it went up to close to 95%. 254 00:12:20,538 --> 00:12:23,143 And that was enough to counteract the influence 255 00:12:23,143 --> 00:12:27,446 along this more obvious active trail. 256 00:12:27,831 --> 00:12:31,551 So, to demonstrate that this is indeed what’s going on, 257 00:12:31,551 --> 00:12:35,783 let’s go ahead 258 00:12:35,783 --> 00:12:38,415 and instantiate the fact that the student is young, 259 00:12:38,415 --> 00:12:43,446 and we can see that the probability of severe accident went up to 3.7% 260 00:12:43,446 --> 00:12:47,967 and no accident went down to a little bit shy of 77%. 261 00:12:47,967 --> 00:12:51,559 And now let’s observe good student and see what happens. 262 00:12:51,559 --> 00:12:53,174 So now we observed good student, 263 00:12:53,174 --> 00:13:01,656 and the probability of no accidents went up to 78%, 264 00:13:01,656 --> 00:13:07,036 as opposed to before when it was 77%. 
265 00:13:07,036 --> 00:13:10,558 And the reason for that 266 00:13:10,558 --> 00:13:12,773 is that we’ve now blocked this trail 267 00:13:12,773 --> 00:13:15,871 that goes from good student, through age, to driver quality 268 00:13:15,871 --> 00:13:17,917 by observing age, the variable in the middle of that trail. 269 00:13:17,917 --> 00:13:20,624 So we can see the reasoning patterns 270 00:13:20,624 --> 00:13:24,981 in a Bayesian network are sometimes subtle. 271 00:13:24,981 --> 00:13:28,640 And there are different trails that can affect things 272 00:13:28,640 --> 00:13:31,748 and interact with each other in different ways. 273 00:13:31,748 --> 00:13:34,698 And so it’s useful to take the model 274 00:13:34,698 --> 00:13:36,315 and play around with different queries 275 00:13:36,315 --> 00:13:37,734 and different combinations of evidence 276 00:13:37,734 --> 00:13:40,026 to understand the behavior of a network. 277 00:13:40,026 --> 00:13:41,341 And especially if you’re designing 278 00:13:41,341 --> 00:13:43,786 such a network for a particular application, 279 00:13:43,786 --> 00:13:46,290 it’s useful to try out these different queries 280 00:13:46,290 --> 00:13:48,053 and see if the behavior that you get 281 00:13:48,053 --> 00:13:49,706 is the behavior that you want to get. 282 00:13:49,706 --> 00:13:52,076 And if not, then you need to think about 283 00:13:52,076 --> 00:13:55,962 how do I modify this network to get behavior 284 00:13:55,962 --> 00:14:00,498 that’s more analogous to the desired behavior. 285 00:14:00,498 --> 00:14:03,755 This network is available for you to play with 286 00:14:03,755 --> 00:14:06,005 and you can try out different things 287 00:14:06,005 --> 00:14:08,961 and see what behaviors you get.
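The interaction between the two active trails discussed above can be reproduced on a toy version of the network. The following Python sketch is a simplified stand-in, not the lecture's model: it keeps only five binary variables (young, focused, good_student, good_driver, no_accident), all CPD numbers are assumptions except the 25% prior on young, and inference is done by brute-force enumeration of the joint rather than by whatever algorithm SamIam uses. The names bern, joint, and query are hypothetical helpers.

```python
from itertools import product

# Assumed CPDs for a five-variable toy network (only P_YOUNG is from the lecture).
P_YOUNG = 0.25
P_FOCUSED = 0.5
# P(good_student=True | young, focused), keyed by (young, focused).
P_GOOD_STUDENT = {(True, True): 0.80, (True, False): 0.20,
                  (False, True): 0.02, (False, False): 0.01}
# P(good_driver=True | young, focused), keyed by (young, focused).
P_GOOD_DRIVER = {(True, True): 0.60, (True, False): 0.20,
                 (False, True): 0.90, (False, False): 0.60}
# P(no_accident=True | good_driver), keyed by good_driver.
P_NO_ACCIDENT = {True: 0.90, False: 0.60}

NAMES = ("young", "focused", "good_student", "good_driver", "no_accident")


def bern(p_true, value):
    """Probability of a binary value given P(value is True)."""
    return p_true if value else 1.0 - p_true


def joint(young, focused, good_student, good_driver, no_accident):
    """Chain-rule product of the CPDs for one full assignment."""
    return (bern(P_YOUNG, young)
            * bern(P_FOCUSED, focused)
            * bern(P_GOOD_STUDENT[(young, focused)], good_student)
            * bern(P_GOOD_DRIVER[(young, focused)], good_driver)
            * bern(P_NO_ACCIDENT[good_driver], no_accident))


def query(target, evidence):
    """P(target=True | evidence), summing the joint over all assignments."""
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[name] != val for name, val in evidence.items()):
            continue  # assignment contradicts the evidence
        p = joint(**world)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator


prior = query("no_accident", {})
given_student = query("no_accident", {"good_student": True})
given_young = query("no_accident", {"young": True})
given_young_student = query("no_accident", {"young": True, "good_student": True})

print(f"P(no accident)                       = {prior:.4f}")
print(f"P(no accident | good student)        = {given_student:.4f}")
print(f"P(no accident | young)               = {given_young:.4f}")
print(f"P(no accident | young, good student) = {given_young_student:.4f}")
```

With these assumed numbers, observing good_student alone lowers the probability of no accident, because it makes young much more likely. Once young is already observed, that trail is blocked and observing good_student raises the probability instead, which is the same qualitative pattern seen in the tool.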