0:00:00.118,0:00:02.898 So, now, let’s look at an example in an actual network, 0:00:02.898,0:00:06.181 and try to see what the CPDs look like, 0:00:06.181,0:00:07.738 what behavior we get, 0:00:07.738,0:00:09.488 and how we might augment the network 0:00:09.488,0:00:11.238 to include additional things. 0:00:11.238,0:00:12.802 Now, let me warn you right up front 0:00:12.802,0:00:14.656 that this is a baby network; 0:00:14.656,0:00:16.263 it’s not a real network, 0:00:16.263,0:00:19.622 but it’s compact enough to look at, yet 0:00:19.622,0:00:22.918 still interesting enough to get some non-trivial behaviors. 0:00:24.579,0:00:26.593 So, to explore the network, 0:00:26.593,0:00:28.854 we’re going to use a system called SamIam. 0:00:28.854,0:00:31.662 It was produced by Adnan Darwiche and his group at UCLA, 0:00:31.662,0:00:32.640 and it’s nice 0:00:32.640,0:00:36.151 because it actually works on all sorts of different platforms, 0:00:36.151,0:00:38.911 so it’s usable by pretty much everyone. 0:00:38.911,0:00:41.537 So let’s look at a particular problem: 0:00:41.537,0:00:43.616 Imagine that we’re an insurance company 0:00:43.616,0:00:44.888 and we’re trying to decide 0:00:44.888,0:00:46.438 for a person who comes in the door 0:00:46.438,0:00:49.002 whether to give them insurance or not. 0:00:49.002,0:00:51.746 So the operative aspect of making that decision 0:00:51.746,0:00:54.369 is how much the policy is going to cost us, 0:00:54.369,0:00:55.892 that is, how much we’re going to have to pay 0:00:55.892,0:00:58.915 over the course of a year to insure this person. 0:00:58.915,0:01:02.287 So there is a variable called Cost. 0:01:02.287,0:01:06.847 Let’s click on that to see what properties that variable has. 0:01:06.847,0:01:08.507 And we can see that in this case, 0:01:08.507,0:01:11.952 we’ve decided to only give two values to the Cost variable, 0:01:11.952,0:01:13.772 Low and High.
0:01:13.772,0:01:16.790 This is clearly a very coarse-grained approximation 0:01:16.790,0:01:18.472 and not one that we would use in practice. 0:01:18.472,0:01:20.303 In reality we would probably 0:01:20.303,0:01:22.135 have this be a continuous variable 0:01:22.135,0:01:25.887 whose mean depends on various aspects of the model. 0:01:25.887,0:01:27.783 But for the purposes of our illustration, 0:01:27.783,0:01:29.559 we’re going to use this discrete distribution 0:01:29.559,0:01:31.273 that only has values Low and High. 0:01:31.273,0:01:32.647 Okay. 0:01:32.647,0:01:36.855 So now, let’s build up this network using the technique of 0:01:36.855,0:01:39.504 “expanding the conversation” that we’ve discussed before. 0:01:39.504,0:01:44.224 And so what is the most important determining factor 0:01:44.224,0:01:46.740 as to the cost the insurance company has to pay? 0:01:46.740,0:01:50.647 Well, probably whether the person has accidents 0:01:50.647,0:01:51.977 and how severe they are. 0:01:51.977,0:01:57.005 So here we have a network that has two variables: 0:01:57.005,0:01:59.724 One is Accident and one is Cost. 0:01:59.724,0:02:02.645 And in this case we decided to select 0:02:02.645,0:02:05.701 three possible values for the Accident variable, 0:02:05.701,0:02:09.031 None, Mild, and Severe, 0:02:09.031,0:02:14.442 with the probabilities that you see listed. 0:02:14.442,0:02:17.447 And what you see down below is the Cost variable. 0:02:17.447,0:02:18.759 And let’s open the CPD 0:02:18.759,0:02:24.845 of the Cost variable given the Accident variable. 0:02:24.845,0:02:26.946 And we can see that, in this case, 0:02:26.946,0:02:29.169 we have a conditional probability table 0:02:29.169,0:02:33.358 of Cost given Accident.
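As a rough sketch of what this two-variable network holds, the snippet below stores a prior over Accident and a table of Cost given Accident, checks that every conditioning case sums to 1, and computes the marginal over Cost. Apart from the Severe case (0.1 Low, 0.9 High), which is the one the lecture reads out, every number here is an assumption made up for illustration, and the function names are hypothetical rather than anything SamIam exposes.

```python
# Hypothetical prior P(Accident); the lecture shows values on screen but does not read them out.
p_accident = {"None": 0.80, "Mild": 0.15, "Severe": 0.05}

# P(Cost | Accident), one inner distribution per conditioning case.
# Only the "Severe" entries come from the lecture; the others are assumed.
p_cost_given_accident = {
    "None":   {"Low": 0.95, "High": 0.05},
    "Mild":   {"Low": 0.60, "High": 0.40},
    "Severe": {"Low": 0.10, "High": 0.90},
}


def is_valid_cpd(cpd, tol=1e-9):
    """Return True if the distribution for every conditioning case sums to 1."""
    return all(abs(sum(dist.values()) - 1.0) <= tol for dist in cpd.values())


def marginal_cost(prior, cpd):
    """P(Cost = c) = sum over a of P(Accident = a) * P(Cost = c | Accident = a)."""
    result = {}
    for accident_value, p_a in prior.items():
        for cost_value, p_c in cpd[accident_value].items():
            result[cost_value] = result.get(cost_value, 0.0) + p_a * p_c
    return result


if __name__ == "__main__":
    print(is_valid_cpd(p_cost_given_accident))
    print(marginal_cost(p_accident, p_cost_given_accident))
```

Whether the conditioning cases are drawn as columns or as rows is only a display choice; the dictionary above is the same table either way.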
0:02:33.358,0:02:35.485 Note that this is actually inverted 0:02:35.485,0:02:38.542 from the notation that we have used in the class before, 0:02:38.542,0:02:41.713 because here the conditioning cases are columns, 0:02:41.713,0:02:44.545 whereas in the examples that we’ve given 0:02:44.545,0:02:45.754 they have been rows. 0:02:45.754,0:02:49.189 But that’s okay, it’s the same thing, just inverted. 0:02:49.650,0:02:51.102 And so we see, for example, 0:02:51.102,0:02:54.418 that if the person has no accidents, 0:02:54.418,0:02:56.765 the cost is very likely to be low; 0:02:56.765,0:03:01.581 mild accidents incur a different distribution over cost; 0:03:01.581,0:03:03.173 and severe accidents have 0:03:03.173,0:03:05.789 a probability of 0.9 of having high cost 0:03:05.789,0:03:08.414 and 0.1 of having low cost. 0:03:09.060,0:03:11.749 So now, let’s continue extending the conversation 0:03:11.749,0:03:14.088 and ask what Accident depends on. 0:03:14.088,0:03:17.207 And it seems that one of the obvious factors 0:03:17.207,0:03:20.409 is whether the person is a good driver or not. 0:03:20.409,0:03:22.661 And so we would expect driver quality 0:03:22.661,0:03:23.965 to be a parent of Accident. 0:03:23.965,0:03:25.214 But there are other things 0:03:25.214,0:03:27.853 that also affect not just the presence of an accident, 0:03:27.853,0:03:29.911 but also the severity of the accident. 0:03:29.911,0:03:33.909 So for example, vehicle size would affect 0:03:33.909,0:03:37.117 the severity of an accident, 0:03:37.117,0:03:40.788 because if you are driving a large SUV, then chances are 0:03:40.788,0:03:43.599 you are not likely to be in an accident as severe, 0:03:43.599,0:03:45.685 but it might also perhaps increase 0:03:45.685,0:03:47.374 the chance of having an accident overall 0:03:47.374,0:03:51.830 because maybe a large car is harder to handle.
0:03:52.615,0:03:56.381 And then vehicle year might affect the chances of an accident 0:03:56.381,0:03:59.509 because of the presence or absence of certain safety features 0:03:59.509,0:04:02.037 like anti-lock brakes and airbags. 0:04:02.037,0:04:03.814 So let’s open the CPD of Accident 0:04:03.814,0:04:05.173 and see what that looks like 0:04:05.173,0:04:07.325 now that we have all these parents for it. 0:04:07.325,0:04:10.213 And we can see here that we have, 0:04:10.213,0:04:13.165 in this case, eight conditioning cases, 0:04:13.165,0:04:18.317 corresponding to three variables, two values each. 0:04:18.317,0:04:22.757 And so here, just to look at one of these 0:04:22.757,0:04:26.046 distributions as an example: 0:04:26.046,0:04:30.749 if this is a fairly new vehicle (after 2000) 0:04:30.749,0:04:32.398 and it’s an SUV, 0:04:32.398,0:04:35.725 the probability of having a severe accident is quite low, 0:04:35.725,0:04:38.813 the probability of having a mild accident is moderate, 0:04:38.813,0:04:44.774 and the probability of having no accidents is 0.85, 0:04:44.774,0:04:48.525 whereas if you compare that to the corresponding entry 0:04:48.525,0:04:52.053 when we keep everything fixed except that now it’s a compact car, 0:04:52.053,0:05:00.509 we see that the probability of having a mild accident is lower, 0:05:00.509,0:05:03.045 and the probability of having no accidents is higher, 0:05:03.045,0:05:08.059 representing different driving patterns, for example. 0:05:08.506,0:05:11.649 Okay, so with this network, 0:05:11.649,0:05:13.649 we can now start asking simple questions. 0:05:14.695,0:05:17.145 So to do an example of causal inference, 0:05:17.145,0:05:20.559 let’s instantiate, for example, driver quality to be good. 0:05:21.574,0:05:23.936 And bad. 0:05:23.936,0:05:27.054 And we can see that for a bad driver 0:05:27.054,0:05:31.397 the probability of low cost is 81%.
0:05:31.397,0:05:36.325 And for a good driver the probability of low cost is 87%. 0:05:36.325,0:05:38.381 If we look at the accidents 0:05:38.381,0:05:41.278 we can see that for a good driver 0:05:41.278,0:05:44.800 there is a probability of 87.5 percent of no accidents 0:05:44.800,0:05:46.431 and ten percent of a mild accident. 0:05:46.431,0:05:50.957 And the probability of no accident goes down for a bad driver, 0:05:50.957,0:05:53.422 and mild accident goes up 0:05:53.422,0:05:55.453 and severe accident also goes way up. 0:05:55.453,0:05:59.077 Now note that many of these differences are quite subtle. 0:05:59.077,0:06:02.054 There’s a difference of a couple percent one way or the other. 0:06:02.054,0:06:04.038 And you might think, 0:06:04.038,0:06:05.326 if you were designing a network, 0:06:05.326,0:06:09.245 that you’d like these really extreme probability changes 0:06:09.245,0:06:11.485 when you instantiate values. 0:06:11.485,0:06:13.790 But in many cases that’s not actually true, 0:06:13.790,0:06:15.052 and these subtle differences 0:06:15.052,0:06:17.643 are actually quite significant for an insurance company 0:06:17.643,0:06:19.819 that insures hundreds of thousands of people: 0:06:19.819,0:06:22.495 a couple of percentage points in the probability of an accident 0:06:22.495,0:06:24.855 can make a very big difference to one’s profitability. 0:06:25.809,0:06:26.972 So now let’s think about 0:06:26.972,0:06:29.519 how we would expand this network even further. 0:06:30.196,0:06:33.244 Vehicle size and vehicle year are things 0:06:33.244,0:06:35.851 that we’re likely to observe in the insurance form. 0:06:35.851,0:06:39.070 But driver quality is something that’s very difficult to observe. 0:06:39.070,0:06:41.829 You can’t go ask somebody, “Oh, are you a good driver?” 0:06:41.829,0:06:43.451 Because everyone’s going to say, 0:06:43.451,0:06:45.123 “Sure, I’m the best driver ever!” 0:06:45.123,0:06:49.272 And so that’s not going to be a very useful question.
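To put the remark about subtle differences into numbers, here is a small back-of-the-envelope sketch. The two probabilities of low cost (87% for a good driver, 81% for a bad driver) are the ones read out in the lecture; the portfolio size of 100,000 policies and the idea of comparing an all-good-driver portfolio against an all-bad-driver one are assumptions chosen only to make the arithmetic concrete.

```python
# Probabilities of low cost quoted in the lecture for each driver quality.
P_LOW_COST = {"good": 0.87, "bad": 0.81}

# Hypothetical portfolio size, standing in for "hundreds of thousands of people".
PORTFOLIO_SIZE = 100_000


def expected_high_cost_policies(p_low_cost, n_policies):
    """Expected number of policies that end up in the High cost category."""
    return round((1.0 - p_low_cost) * n_policies)


if __name__ == "__main__":
    good = expected_high_cost_policies(P_LOW_COST["good"], PORTFOLIO_SIZE)
    bad = expected_high_cost_policies(P_LOW_COST["bad"], PORTFOLIO_SIZE)
    print(good, bad, bad - good)
```

Under these assumptions a six-point gap in probability corresponds to thousands of additional high-cost policies, which is why differences that look small in the monitor window can still matter.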
0:06:49.272,0:06:53.147 So what evidence do we have that we can observe 0:06:53.147,0:06:57.148 that might indicate to us the value of the driver quality? 0:06:57.148,0:07:01.491 One obvious one is the person’s driving record. 0:07:01.491,0:07:03.556 That is, whether they’ve had previous accidents 0:07:03.556,0:07:05.355 or previous moving violations. 0:07:05.832,0:07:08.412 So let’s think about adding a variable 0:07:08.412,0:07:09.880 that represents driving history. 0:07:10.665,0:07:13.507 And so let’s go ahead and introduce that variable. 0:07:13.507,0:07:16.315 So we can click on this button 0:07:16.315,0:07:17.697 that allows us to create a node. 0:07:17.697,0:07:19.841 The node is now called variable1 0:07:19.841,0:07:21.036 so we’d have to give it a name. 0:07:21.036,0:07:24.787 So for example we’re going to call it DrivingHistory. 0:07:26.079,0:07:28.074 And that’s its identifier, 0:07:28.074,0:07:30.620 and we also have the name of the variable, 0:07:30.620,0:07:32.246 which is usually the same. 0:07:32.246,0:07:35.195 And let’s make that two values, 0:07:35.195,0:07:37.725 say PreviousAccident and NoPreviousAccident. 0:07:41.586,0:07:45.760 Now where will we place this variable in the network? 0:07:45.760,0:07:48.828 One might initially think that the right thing to do 0:07:48.828,0:07:53.228 is to place DrivingHistory as a parent of Driver_quality 0:07:53.228,0:07:57.150 because driving history can influence 0:07:57.150,0:07:59.044 our beliefs about driver quality. 0:07:59.044,0:08:01.290 Now it’s true that observing driving history 0:08:01.290,0:08:03.651 changes our probability within driver quality, 0:08:03.651,0:08:07.443 but if you think about the actual causal structure of this scenario, 0:08:07.443,0:08:11.878 what we actually have is that driver quality is a causal factor 0:08:11.878,0:08:13.968 of both a previous accident 0:08:13.968,0:08:16.765 as well as a subsequent accident. 
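The point that a child variable can still tell us about its parent can be sketched with Bayes' rule. In the snippet below, Driver_quality is the parent and DrivingHistory is the child, as the causal reading suggests; observing the child updates our belief about the parent. All probabilities and the function name are hypothetical, chosen only for illustration.

```python
# Hypothetical prior over driver quality.
p_quality = {"Good": 0.7, "Bad": 0.3}

# Hypothetical CPD P(DrivingHistory | Driver_quality): quality is the cause, history the effect.
p_history_given_quality = {
    "Good": {"PreviousAccident": 0.1, "NoPreviousAccident": 0.9},
    "Bad":  {"PreviousAccident": 0.4, "NoPreviousAccident": 0.6},
}


def posterior_quality(history):
    """P(Driver_quality | DrivingHistory = history) via Bayes' rule."""
    unnormalized = {
        quality: p_quality[quality] * p_history_given_quality[quality][history]
        for quality in p_quality
    }
    evidence_probability = sum(unnormalized.values())
    return {q: value / evidence_probability for q, value in unnormalized.items()}


if __name__ == "__main__":
    print(posterior_quality("PreviousAccident"))
    print(posterior_quality("NoPreviousAccident"))
```

With these made-up numbers, a previous accident raises the probability of a bad driver above its prior, even though the arrow in the graph points from quality to history.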
0:08:16.765,0:08:18.327 And so if we want to maintain 0:08:18.327,0:08:20.421 the intuitive causal structure of the domain, 0:08:20.421,0:08:28.095 a more appropriate thing is to add DrivingHistory as a child 0:08:28.095,0:08:29.972 rather than parent of Driver_quality. 0:08:29.972,0:08:32.187 [You] might question why it matters 0:08:32.187,0:08:33.864 and in this very simple example 0:08:33.864,0:08:36.874 the two models are in some sense equivalent 0:08:36.874,0:08:38.851 and we could have placed it either way 0:08:38.851,0:08:44.076 except that the CPD for driver quality given driving history 0:08:44.076,0:08:46.006 might be a little bit less intuitive. 0:08:46.006,0:08:49.955 But if we had other indicators of driver quality, 0:08:49.955,0:08:52.436 for example a previous moving violation, 0:08:52.436,0:08:55.661 then it actually makes a lot more sense 0:08:55.661,0:08:58.559 to have all of these be children of driver quality 0:08:58.559,0:09:00.925 as opposed to parents of driver quality. 0:09:01.802,0:09:02.680 Okay. 0:09:02.680,0:09:07.481 So that shows us how we would add a variable into the network. 0:09:07.481,0:09:09.764 And now let’s go and open up a much larger network 0:09:09.764,0:09:13.007 that includes these variables as well as others. 0:09:13.007,0:09:15.971 So let’s look now at this larger network. 0:09:15.971,0:09:17.347 And we can see 0:09:17.347,0:09:20.237 that we’ve added several different variables to the network. 0:09:20.237,0:09:23.211 We’ve added attributes of the vehicle, 0:09:23.211,0:09:27.284 for example whether the vehicle had antilock brakes and an airbag, 0:09:27.284,0:09:28.787 which is going to allow us to give 0:09:28.787,0:09:31.483 more informative probabilities regarding the accident. 
0:09:31.483,0:09:35.227 We’ve also introduced aspects of the driver, 0:09:35.227,0:09:38.243 for example, whether they’ve had extra-track training, 0:09:38.243,0:09:40.084 which is going to increase driving quality, 0:09:40.084,0:09:41.684 whether they’re young or old, 0:09:41.684,0:09:42.952 where the presumption is 0:09:42.952,0:09:45.883 that younger people tend to be more reckless drivers, 0:09:45.883,0:09:50.373 and whether the driver is focused or more easily distracted, 0:09:50.373,0:09:52.848 which again is going to affect driving quality. 0:09:53.648,0:09:58.681 Now since personality type is hard to observe, 0:09:58.681,0:10:03.155 we added another variable which is Good_student 0:10:03.155,0:10:05.654 which might indicate one’s personality type. 0:10:05.654,0:10:08.862 So let’s open [the] CPD for that one. 0:10:11.293,0:10:14.071 And so we can see here that, for example, 0:10:14.071,0:10:20.999 if you are a focused person who is young, 0:10:20.999,0:10:23.720 you’re much more likely to be a good student, 0:10:23.720,0:10:28.439 much more so than if you are not a focused person who is young. 0:10:28.439,0:10:32.303 If you’re old, you’re just not very likely to be a student, 0:10:32.303,0:10:38.021 and so this probability basically says that if you’re old, 0:10:38.021,0:10:39.883 you’re just not very likely to be a student, 0:10:39.883,0:10:41.322 and therefore not likely to be a good student. 0:10:42.014,0:10:47.608 So, now that we’ve added all these variables to the network, 0:10:47.608,0:10:51.161 let’s go ahead and run a few queries to see what happens. 0:10:51.161,0:10:56.918 And let’s start by looking at the prior probability of Accident 0:10:56.918,0:10:59.942 before we observe anything. 0:10:59.942,0:11:04.299 So we can see that the probability of no accident is about 79.5%. 0:11:04.299,0:11:07.065 The probability of severe accident is about 3%. 
0:11:07.065,0:11:10.077 Now let’s go ahead and tell the system 0:11:10.077,0:11:11.837 that we have a good student at hand. 0:11:11.837,0:11:13.608 So we’re going to observe 0:11:13.608,0:11:15.978 that the student is a good student, 0:11:15.978,0:11:17.567 and let’s see what happens. 0:11:18.059,0:11:19.695 We can see, surprisingly, 0:11:19.695,0:11:20.807 that even though we observe 0:11:20.807,0:11:21.887 that somebody is a good student, 0:11:21.887,0:11:23.954 the probability of no accidents 0:11:23.954,0:11:27.699 went down from 79.5% to 78%, 0:11:27.699,0:11:29.582 and the probability of severe accidents 0:11:29.582,0:11:32.819 went up from 3.5 to 3.67 percent. 0:11:32.819,0:11:33.880 You might say, 0:11:33.880,0:11:35.986 “Well, but I told you that it’s a good student. 0:11:35.986,0:11:38.138 Shouldn’t the probability of accidents go down?” 0:11:38.307,0:11:41.972 So let’s look at some active trails in this graph. 0:11:41.972,0:11:46.461 One active trail goes from Good_student to Focused, 0:11:46.461,0:11:48.928 to Driver_quality, 0:11:48.928,0:11:49.939 to Accident. 0:11:49.939,0:11:53.602 And sure enough, if we consider that trail in isolation, 0:11:53.602,0:11:58.378 it’s probably going to make the probability of no accident higher. 0:11:58.378,0:12:00.204 But we have another active trail. 0:12:00.204,0:12:04.058 We have the active trail that goes from good student up to age, 0:12:04.058,0:12:07.085 and then back down, through to driver quality. 0:12:07.085,0:12:09.921 So, to see that, let’s unclick on good student 0:12:09.921,0:12:11.281 and see what happens. 0:12:11.281,0:12:15.767 Note that the probability initially that the driver is young was 25%, 0:12:15.767,0:12:17.710 but when I observed a good student, 0:12:17.710,0:12:20.538 it went up to close to 95%. 0:12:20.538,0:12:23.143 And that was enough to counteract the influence 0:12:23.143,0:12:27.446 along this more obvious active trail.
0:12:27.831,0:12:31.551 So, to demonstrate that this is indeed what’s going on, 0:12:31.551,0:12:35.783 let’s click 0:12:35.783,0:12:38.415 and instantiate the fact that the student is young, 0:12:38.415,0:12:43.446 and we can see that the probability of severe accident went up to 3.7% 0:12:43.446,0:12:47.967 and no accident went down to a little bit shy of 77%. 0:12:47.967,0:12:51.559 And now let’s observe good student and see what happens. 0:12:51.559,0:12:53.174 So now we observed good student, 0:12:53.174,0:13:01.656 and the probability of no accidents went up to 78%, 0:13:01.656,0:13:07.036 as opposed to before when it was 77%. 0:13:07.036,0:13:10.558 And the reason for that 0:13:10.558,0:13:12.773 is that we’ve now blocked this trail 0:13:12.773,0:13:15.871 that goes from good student, through age, to driver quality 0:13:15.871,0:13:17.917 by observing this variable which blocks the trail. 0:13:17.917,0:13:20.624 So we can see the reasoning patterns 0:13:20.624,0:13:24.981 in a Bayesian network are sometimes subtle. 0:13:24.981,0:13:28.640 And there are different trails that can affect things 0:13:28.640,0:13:31.748 and interact with each other in different ways. 0:13:31.748,0:13:34.698 And so it’s useful to take the model 0:13:34.698,0:13:36.315 and play around with different queries 0:13:36.315,0:13:37.734 and different combinations of evidence 0:13:37.734,0:13:40.026 to understand the behavior of a network. 0:13:40.026,0:13:41.341 And especially if you’re designing 0:13:41.341,0:13:43.786 such a network for a particular application, 0:13:43.786,0:13:46.290 it’s useful to try out these different queries 0:13:46.290,0:13:48.053 and see if the behavior that you get 0:13:48.053,0:13:49.706 is the behavior that you want to get. 0:13:49.706,0:13:52.076 And if not, then you need to think about 0:13:52.076,0:13:55.962 how do I modify this network to get behavior 0:13:55.962,0:14:00.498 that’s more analogous to the desired behavior.
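The interaction between the two trails can be reproduced in a toy version of the network, using brute-force enumeration of the joint distribution. This is only a sketch under stated assumptions: all variables are made binary, the structure is reduced to Age and Focused as parents of both Good_student and Driver_quality, with Driver_quality as the parent of Accident, and every CPD entry is invented except the 25% prior on being young, which the lecture mentions. The names `joint` and `query` are hypothetical helpers, not part of SamIam.

```python
from itertools import product

NAMES = ("young", "focused", "good_student", "good_driver", "no_accident")

P_YOUNG = 0.25      # prior quoted in the lecture
P_FOCUSED = 0.50    # assumed
# Assumed CPDs, keyed by (young, focused): probability that the child is True.
P_GOOD_STUDENT = {(True, True): 0.80, (True, False): 0.30,
                  (False, True): 0.02, (False, False): 0.01}
P_GOOD_DRIVER = {(True, True): 0.60, (True, False): 0.30,
                 (False, True): 0.90, (False, False): 0.70}
# Assumed CPD keyed by good_driver: probability of no accident.
P_NO_ACCIDENT = {True: 0.90, False: 0.70}


def bernoulli(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(young, focused, good_student, good_driver, no_accident):
    """Joint probability via the chain rule for this network structure."""
    return (bernoulli(P_YOUNG, young)
            * bernoulli(P_FOCUSED, focused)
            * bernoulli(P_GOOD_STUDENT[(young, focused)], good_student)
            * bernoulli(P_GOOD_DRIVER[(young, focused)], good_driver)
            * bernoulli(P_NO_ACCIDENT[good_driver], no_accident))


def query(target, evidence=None):
    """P(target = True | evidence), summing the joint over all assignments."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue  # assignment is inconsistent with the evidence
        p = joint(**world)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator


if __name__ == "__main__":
    print(query("no_accident"))
    print(query("no_accident", {"good_student": True}))
    print(query("no_accident", {"young": True}))
    print(query("no_accident", {"young": True, "good_student": True}))
```

With these assumed numbers, observing only a good student lowers the probability of no accident, because it makes "young" far more likely; once age is observed, that trail is blocked and the good-student evidence raises the probability instead, mirroring the qualitative pattern in the demonstration.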
0:14:00.498,0:14:03.755 This network is available for you to play with 0:14:03.755,0:14:06.005 and you can try out different things 0:14:06.005,0:14:08.961 and see what behaviors you get.