So now let’s look at an example in an actual network, and try to see what the CPDs look like, what behavior we get, and how we might augment the network to include additional things. Now, let me warn you right up front that this is a baby network; it’s not a real network, but it’s compact enough to look at and still interesting enough to exhibit some non-trivial behaviors. To explore the network, we’re going to use a system called SAMIAM. It was produced by Adnan Darwiche and his group at UCLA, and it’s nice because it works on all sorts of different platforms, so it’s usable by pretty much everyone. So let’s look at a particular problem: imagine that we’re an insurance company, and we’re trying to decide, for a person who comes in the door, whether to give them insurance or not. The operative aspect of making that decision is how much the policy is going to cost us, that is, how much we’re going to have to pay over the course of a year to insure this person. So there is a variable called Cost. Let’s click on that to see what properties that variable has. We can see that, in this case, we’ve decided to give the Cost variable only two values, Low and High. This is clearly a very coarse-grained approximation and not one that we would use in practice. In reality, we would probably make this a continuous variable whose mean depends on various aspects of the model, but for the purposes of our illustration, we’re going to use this discrete distribution that only has the values Low and High. Okay. So now let’s build up this network using the technique of “extending the conversation” that we’ve discussed before. What is the most important factor determining the cost that the insurance company has to pay? Well, probably whether the person has accidents and how severe they are. So here we have a network that has two variables: one is Accident and one is Cost.
And in this case we’ve decided to give the Accident variable three possible values, None, Mild, and Severe, with the probabilities that you see listed. What you see down below is the Cost variable. Let’s open the CPD of Cost given Accident. We can see that, in this case, we have a conditional probability table of Cost given Accident. Note that this is actually transposed from the notation that we have used in the class before, because here the conditioning cases are columns, whereas in the examples that we’ve given they have been rows. But that’s okay; it’s the same thing, just transposed. And so we see, for example, that if the person has no accidents, the cost is very likely to be low; mild accidents induce a different distribution over cost; and severe accidents have a probability of 0.9 of incurring high cost and 0.1 of incurring low cost. So now let’s continue extending the conversation and ask what Accident depends on. One of the obvious factors is whether the person is a good driver or not, so we would expect driver quality to be a parent of Accident. But there are other factors that affect not just the presence of an accident but also its severity. For example, vehicle size would affect the severity of an accident, because if you are driving a large SUV, chances are an accident will not be as severe; but it might also perhaps increase the chance of having an accident in the first place, because a large car may be harder to handle. And vehicle year might affect the chances of an accident because of the presence or absence of certain safety features, like anti-lock brakes and airbags. So let’s open the CPD of Accident and see what it looks like now that we have all these parents for it. And we can see here that we have, in this case, eight conditioning cases, corresponding to three parent variables with two values each.
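To make the table concrete, here is a minimal sketch of the two-node network (Accident as parent of Cost) written as plain Python dictionaries, with each conditioning case of Accident playing the role of one "column" of the CPD. Only the 0.9/0.1 entries for Severe come from the lecture; all the other numbers are invented placeholders, not the values in the actual SAMIAM network.

```python
# Prior P(Accident) -- hypothetical values, not the ones shown in SAMIAM.
p_accident = {"None": 0.80, "Mild": 0.15, "Severe": 0.05}

# CPD P(Cost | Accident): one entry (a "column") per conditioning case.
p_cost_given_accident = {
    "None":   {"Low": 0.95, "High": 0.05},  # hypothetical
    "Mild":   {"Low": 0.60, "High": 0.40},  # hypothetical
    "Severe": {"Low": 0.10, "High": 0.90},  # from the lecture
}

# Marginalize out Accident: P(Cost) = sum_a P(a) * P(Cost | a).
p_cost = {
    c: sum(p_accident[a] * p_cost_given_accident[a][c]
           for a in p_accident)
    for c in ("Low", "High")
}
print(p_cost)
```

Whether you store conditioning cases as rows or columns is purely a presentation choice, as the lecture notes; the distribution is the same either way.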
And so here, just to look at one of these distributions as an example: if this is a fairly new vehicle (after 2000) and it’s an SUV, the probability of having a severe accident is quite low, the probability of having a mild accident is moderate, and the probability of having no accidents is 0.85. If you compare that to the corresponding entry where we keep everything fixed except that now it’s a compact car, we see that the probability of having a mild accident is lower, but the probability of having no accidents is higher, representing different driving patterns, for example. Okay, so with this network we can now start asking simple questions. To do some causal inference, let’s instantiate driving quality, first to good and then to bad. We can see that for a bad driver the probability of low cost is 81%, and for a good driver the probability of low cost is 87%. If we look at Accident, we can see that for a good driver there is a probability of 87.5 percent of no accidents and ten percent of a mild accident. For a bad driver, the probability of no accident goes down, mild accidents go up, and severe accidents also go way up. Now note that many of these differences are quite subtle, a difference of a couple of percent one way or the other. You might think, if you were designing a network, that you’d like really extreme probability changes when you instantiate values. But in many cases that’s not actually true, and these subtle differences are quite significant: for an insurance company that insures hundreds of thousands of people, a couple of percentage points in the probability of an accident can make a very big difference to one’s profitability. So now let’s think about how we would expand this network even further. Vehicle size and vehicle year are things that we’re likely to observe on the insurance form, but driver quality is something that’s very difficult to observe.
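The causal-inference query above can be sketched by hand: condition on Driver_quality and sum out Accident to get P(Cost). The 87.5%/10% row for a good driver echoes the lecture, but the rest of the numbers are invented stand-ins, so the answers only roughly match what SAMIAM reports.

```python
# Hypothetical CPDs for the chain Driver_quality -> Accident -> Cost.
p_accident_given_quality = {
    "good": {"None": 0.875, "Mild": 0.100, "Severe": 0.025},
    "bad":  {"None": 0.750, "Mild": 0.150, "Severe": 0.100},
}
p_cost_given_accident = {
    "None":   {"Low": 0.95, "High": 0.05},
    "Mild":   {"Low": 0.60, "High": 0.40},
    "Severe": {"Low": 0.10, "High": 0.90},
}

def p_low_cost(quality):
    """P(Cost=Low | Driver_quality=quality), summing out Accident."""
    return sum(p_accident_given_quality[quality][a]
               * p_cost_given_accident[a]["Low"]
               for a in ("None", "Mild", "Severe"))

# The gap between good and bad drivers is only a few percentage points,
# which is exactly the kind of subtle difference the lecture describes.
print(p_low_cost("good"), p_low_cost("bad"))
```

Even with these made-up numbers, the good/bad gap comes out at well under ten percentage points, illustrating why small shifts can still matter at the scale of an insurance portfolio.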
You can’t go ask somebody, “Are you a good driver?” because everyone’s going to say, “Sure, I’m the best driver ever!” So that’s not going to be a very useful question. What evidence can we observe that might indicate to us the value of driver quality? One obvious one is the person’s driving record: whether they’ve had previous accidents or previous moving violations. So let’s think about adding a variable that represents driving history, and let’s go ahead and introduce that variable. We can click on this button that allows us to create a node. The node is initially called variable1, so we have to give it a name; for example, we’re going to call it DrivingHistory. That’s its identifier, and we also have the name of the variable, which is usually the same. And let’s give it two values, say PreviousAccident and NoPreviousAccident. Now, where do we place this variable in the network? One might initially think that the right thing to do is to place DrivingHistory as a parent of Driver_quality, because driving history can influence our beliefs about driver quality. Now, it’s true that observing driving history changes our probability in driver quality, but if you think about the actual causal structure of this scenario, what we actually have is that driver quality is a causal factor of both a previous accident and a subsequent accident. And so, if we want to maintain the intuitive causal structure of the domain, the more appropriate thing is to add DrivingHistory as a child rather than a parent of Driver_quality. You might question why it matters, and in this very simple example the two models are in some sense equivalent, and we could have placed it either way, except that the CPD for driver quality given driving history might be a little less intuitive.
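To see why a child of Driver_quality still works as evidence, here is a short sketch of the Bayes-rule update: with DrivingHistory as a child, observing a previous accident revises our belief about the parent. All the numbers below are hypothetical, chosen only to show the direction of the update.

```python
# Hypothetical prior and likelihoods for the child DrivingHistory.
p_good = 0.75                       # prior P(Driver_quality = good)
p_prev_acc = {"good": 0.05,         # P(PreviousAccident | quality = good)
              "bad":  0.30}         # P(PreviousAccident | quality = bad)

# Posterior P(good | PreviousAccident) by Bayes' rule.
num = p_good * p_prev_acc["good"]
den = num + (1 - p_good) * p_prev_acc["bad"]
posterior_good = num / den

# Belief in a good driver drops sharply once we see a previous accident.
print(posterior_good)
```

This is exactly the evidential reasoning the lecture describes: information flows from child to parent even though the arrow points the other way, which is why the causally natural child placement loses nothing.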
But if we had other indicators of driver quality, for example a previous moving violation, then it makes a lot more sense to have all of these be children of driver quality rather than parents of driver quality. Okay. So that shows us how we would add a variable to the network. Now let’s go and open up a much larger network that includes these variables as well as others. Looking at this larger network, we can see that we’ve added several variables. We’ve added attributes of the vehicle, for example whether the vehicle has anti-lock brakes and an airbag, which is going to allow us to give more informative probabilities regarding the accident. We’ve also introduced aspects of the driver: for example, whether they’ve had extra-track training, which is going to increase driving quality; whether they’re young or old, where the presumption is that younger people tend to be more reckless drivers; and whether the driver is focused or more easily distracted, which again is going to affect driving quality. Now, since personality type is hard to observe, we added another variable, Good_student, which might indicate one’s personality type. So let’s open the CPD for that one. We can see, for example, that if you are a focused person who is young, you’re much more likely to be a good student than if you are a young person who is not focused. And if you’re old, you’re just not very likely to be a student at all, and therefore, regardless of personality, not likely to be a good student. So, now that we’ve added all these variables to the network, let’s run a few queries and see what happens. Let’s start by looking at the prior probability of Accident, before we observe anything. We can see that the probability of no accident is about 79.5%, and the probability of a severe accident is about 3%.
Now let’s go ahead and tell the system that we have a good student at hand. So we’re going to observe that the student is a good student, and let’s see what happens. We can see, surprisingly, that even though we observed that somebody is a good student, the probability of no accidents went down from 79.5% to 78%, and the probability of severe accidents went up from about 3.5 to 3.67 percent. You might say, “But I told you it’s a good student. Shouldn’t the probability of accidents go down?” So let’s look at some active trails in this graph. One active trail goes from Good_student to Focused, to Driver_quality, to Accident. And sure enough, if we consider that trail in isolation, it’s probably going to make the probability of no accident higher. But we have another active trail: the one that goes from Good_student up to Age, and then back down through to Driver_quality. To see that, let’s unclick good student and see what happens. Note that the probability that the driver is young was initially 25%, but when I observed good student, it went up to close to 95%. And that was enough to counteract the influence along the more obvious active trail. To demonstrate that this is indeed what’s going on, let’s instantiate the fact that the driver is young, and we can see that the probability of a severe accident went up to 3.7% and no accident went down to a little shy of 77%. Now let’s observe good student and see what happens. This time the probability of no accidents went up, to 78% as opposed to the 77% before. The reason is that, by observing Age, we’ve now blocked the trail that goes from Good_student through Age to Driver_quality. So we can see that the reasoning patterns in a Bayesian network are sometimes subtle, and there are different trails that can affect things and interact with each other in different ways.
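The competing-trails effect can be reproduced with a stripped-down version of the network and brute-force enumeration. The structure below mirrors the lecture (Age and Focused are parents of both Good_student and Driver_quality, which in turn drives a binarized Accident), but every CPD number is invented, chosen only so that the qualitative behavior matches the lecture, not to reproduce SAMIAM's exact percentages.

```python
from itertools import product

p_young = 0.25
p_focused = 0.50
p_gs = {  # P(GoodStudent = yes | Age, Focused) -- hypothetical
    ("young", True): 0.95, ("young", False): 0.60,
    ("old",   True): 0.05, ("old",   False): 0.01,
}
p_goodq = {  # P(Quality = good | Age, Focused) -- hypothetical
    ("young", True): 0.60, ("young", False): 0.20,
    ("old",   True): 0.95, ("old",   False): 0.70,
}
p_none = {"good": 0.90, "bad": 0.70}  # P(Accident = none | Quality)

def query(evidence):
    """P(Accident=none | evidence) by summing the full joint."""
    num = den = 0.0
    for age, foc, gs, q in product(("young", "old"), (True, False),
                                   ("yes", "no"), ("good", "bad")):
        w = p_young if age == "young" else 1 - p_young
        w *= p_focused if foc else 1 - p_focused
        w *= p_gs[age, foc] if gs == "yes" else 1 - p_gs[age, foc]
        w *= p_goodq[age, foc] if q == "good" else 1 - p_goodq[age, foc]
        # Discard assignments inconsistent with the evidence.
        if any(evidence.get(k, v) != v
               for k, v in (("age", age), ("gs", gs))):
            continue
        den += w
        num += w * p_none[q]
    return num / den

prior = query({})
given_gs = query({"gs": "yes"})     # drops: good student makes "young" likely
given_y = query({"age": "young"})
given_y_gs = query({"age": "young", "gs": "yes"})  # rises: Age blocks that trail
print(prior, given_gs, given_y, given_y_gs)
```

Running this shows the same pattern as in SAMIAM: observing good student alone lowers P(no accident), because it sharply raises the probability of being young, but once Age is observed, that trail is blocked and good student raises P(no accident) through the Focused trail.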
And so it’s useful to take the model and play around with different queries and different combinations of evidence to understand the behavior of the network. Especially if you’re designing such a network for a particular application, it’s useful to try out these different queries and see whether the behavior you get is the behavior you want. And if not, then you need to think about how to modify the network to get behavior that’s closer to the desired behavior. This network is available for you to play with, so you can try out different things and see what behaviors you get.