So, now, let’s look at an example of an actual network,
and try to see what the CPDs look like,
what behavior we get,
and how we might augment the network
to include additional things.
Now, let me warn you right upfront
that this is a baby network;
it’s not a real network,
but it’s compact enough to look at
and still interesting enough to get some non-trivial behaviors.
So, to explore the network,
we’re going to use a system called SamIam.
It was produced by Adnan Darwiche and his group at UCLA,
and it’s nice
because it actually works on all sorts of different platforms,
so it’s usable by pretty much everyone.
So let’s look at a particular problem:
Imagine that we’re an insurance company
and we’re trying to decide
for a person who comes in the door
whether to give them insurance or not.
So the operative aspect of making that decision
is how much the policy is going to cost us,
that is, how much we’re going to have to pay
over the course of a year to insure this person.
So there is a variable called Cost.
Let’s click on that to see what properties that variable has.
And we can see that in this case,
we’ve decided to only give two values to the Cost variable,
Low and High.
This is clearly a very coarse-grained approximation
and not one that we will use in practice.
In reality we would probably
have this be a continuous variable
whose mean depends on various aspects of the model.
But for the purposes of our illustration,
we’re going to use this discrete distribution
that only has values Low and High.
Okay.
So now, let’s build up this network using the technique of
“expanding the conversation” that we’ve discussed before.
So what is the most important factor determining
the cost that the insurance company has to pay?
Well, probably whether the person has accidents
and how severe they are.
So here we have a network that has two variables:
One is Accident and one is Cost.
And in this case we decided to select
three possible values for the accident variable,
None, Mild, and Severe,
with the probabilities that you see listed.
And what you see down below is the Cost variable.
And let’s open the CPD
of the Cost variable given the Accident variable.
And we can see that, in this case,
we have a conditional probability table
of Cost given Accident.
Note that this is actually transposed
from the notation that we have used in the class before,
because here the conditioning cases are columns,
whereas in the examples that we’ve given
they have been rows.
But that’s okay; it’s the same thing, just transposed.
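To make the row-versus-column point concrete, here is a minimal Python sketch. The numbers are illustrative placeholders (only the Severe column, 0.1 Low and 0.9 High, matches the network); the point is that transposing the column layout gives the row layout, and in either layout each conditioning case must sum to 1.

```python
# P(Cost | Accident) stored two ways. Illustrative numbers only;
# just the Severe case (0.1 Low, 0.9 High) is taken from the lecture.
accident_values = ["None", "Mild", "Severe"]
cost_values = ["Low", "High"]

# Column layout (as displayed in SamIam): one column per conditioning case.
by_columns = [
    [0.95, 0.60, 0.10],  # P(Cost=Low  | Accident=None, Mild, Severe)
    [0.05, 0.40, 0.90],  # P(Cost=High | Accident=None, Mild, Severe)
]

# Row layout (as used in class): transpose so each row is one conditioning case.
by_rows = [list(row) for row in zip(*by_columns)]

for case, row in zip(accident_values, by_rows):
    # Every conditioning case must define a proper distribution over Cost.
    assert abs(sum(row) - 1.0) < 1e-9, case
    print(case, dict(zip(cost_values, row)))
```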
And so we see, for example,
that if the person has no accidents,
the costs are very likely to be very low;
mild accidents incur different distributions over cost;
and severe accidents have
a probability of 0.9 of having high cost
and 0.1 of having low cost.
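“Expanding the conversation” to include Accident means the marginal over Cost is obtained by summing over the accident cases: P(Cost) = Σ_a P(Accident=a) P(Cost | Accident=a). A small sketch, assuming a hypothetical prior over Accident; only the Severe column of the CPD comes from the lecture.

```python
# Hypothetical prior over Accident (not the network's values).
p_accident = {"None": 0.80, "Mild": 0.15, "Severe": 0.05}

# P(Cost | Accident); only Severe (0.1 Low, 0.9 High) is from the lecture.
p_cost_given_accident = {
    "None":   {"Low": 0.95, "High": 0.05},
    "Mild":   {"Low": 0.60, "High": 0.40},
    "Severe": {"Low": 0.10, "High": 0.90},
}

def marginal_cost(prior, cpd):
    """P(Cost=c) = sum over a of P(Accident=a) * P(Cost=c | Accident=a)."""
    result = {}
    for a, p_a in prior.items():
        for c, p_c in cpd[a].items():
            result[c] = result.get(c, 0.0) + p_a * p_c
    return result

p_cost = marginal_cost(p_accident, p_cost_given_accident)
print(p_cost)  # roughly {'Low': 0.855, 'High': 0.145}
```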
So now, let’s continue extending the conversation
and ask what Accident depends on.
And it seems that one of the obvious factors
is whether the person is a good driver or not.
And so we would expect driver quality
to be a parent of Accident.
But there are other things
that also affect not just the presence of an accident,
but also the severity of the accident.
So, for example, vehicle size would affect
the severity of an accident,
because if you are driving a large SUV, then chances are
you are not likely to be in as severe an accident,
but it might also perhaps increase
the chance of having an accident overall,
because maybe a large car is harder to handle.
And then vehicle year might affect the chances of an accident
because of the presence or absence of certain safety features
like anti-lock brakes and airbags.
So let’s open the CPD of Accident
and see what that looks like
now that we have all these parents for it.
And we can see here that we have,
in this case, eight conditioning cases,
corresponding to three parent variables with two values each.
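The count of eight is just the number of joint assignments to the three binary parents, 2 × 2 × 2. A short sketch that enumerates them; the value labels (and the underscore variable names other than Driver_quality) are hypothetical stand-ins for what the network uses.

```python
from itertools import product

# Parents of Accident; value labels are hypothetical.
parents = {
    "Driver_quality": ["Good", "Bad"],
    "Vehicle_size": ["SUV", "Compact"],
    "Vehicle_year": ["After2000", "Before2000"],
}
accident_values = ["None", "Mild", "Severe"]

# One conditioning case per joint assignment of the parents.
cases = list(product(*parents.values()))
print(len(cases))  # 2 * 2 * 2 = 8

# Each case holds a distribution over 3 Accident values; the last entry
# is fixed by the others, so there are 8 * (3 - 1) free parameters.
free_params = len(cases) * (len(accident_values) - 1)
print(free_params)  # 16

for case in cases:
    print(dict(zip(parents, case)))
```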
So let’s look at one of these distributions as an example.
If this is a fairly new vehicle (after 2000)
and it’s an SUV,
the probability of having a severe accident is quite low,
the probability of having a mild accident is moderate,
and the probability of having no accidents is 0.85,
whereas if you compare that to the corresponding entry
where we keep everything fixed except that now it’s a compact car,
we see that the probability of having a mild accident is lower,
but the probability of having no accidents is higher,
representing different driving patterns, for example.
Okay, so with this network,
we can now start asking simple questions.
So, as an example of causal inference,
let’s instantiate Driver_quality, for example, to be good.
And then bad.
And we can see that for a bad driver
the probability of low cost is 81%.
And for a good driver the probability of low cost is 87%.
If we look at the accidents
we can see that for a good driver
there is an 87.5% probability of no accident
and a 10% probability of a mild accident.
The probability of no accident goes down for a bad driver,
the probability of a mild accident goes up,
and the probability of a severe accident goes way up.
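Behind a query like “instantiate Driver_quality and read off Cost” is a sum over everything not observed. A brute-force enumeration sketch of the small network, assuming Driver_quality, Vehicle_size, and Vehicle_year are independent root nodes and using made-up CPD values throughout (SamIam itself uses more efficient inference algorithms).

```python
# Small network: Driver_quality, Vehicle_size, Vehicle_year -> Accident -> Cost.
# All numbers are hypothetical, not taken from the SamIam model.
ACCIDENT = ["None", "Mild", "Severe"]
p_size = {"SUV": 0.3, "Compact": 0.7}
p_year = {"After2000": 0.6, "Before2000": 0.4}
p_accident = {  # (driver, size, year) -> (P(None), P(Mild), P(Severe))
    ("Good", "SUV", "After2000"): (0.90, 0.08, 0.02),
    ("Good", "SUV", "Before2000"): (0.88, 0.09, 0.03),
    ("Good", "Compact", "After2000"): (0.92, 0.06, 0.02),
    ("Good", "Compact", "Before2000"): (0.90, 0.07, 0.03),
    ("Bad", "SUV", "After2000"): (0.80, 0.14, 0.06),
    ("Bad", "SUV", "Before2000"): (0.78, 0.15, 0.07),
    ("Bad", "Compact", "After2000"): (0.82, 0.12, 0.06),
    ("Bad", "Compact", "Before2000"): (0.80, 0.13, 0.07),
}
p_cost = {
    "None": {"Low": 0.95, "High": 0.05},
    "Mild": {"Low": 0.60, "High": 0.40},
    "Severe": {"Low": 0.10, "High": 0.90},
}

def cost_given_driver(driver):
    """P(Cost | Driver_quality=driver), summing out size, year, and Accident.

    Since the three parents are assumed independent roots,
    P(size, year | driver) = P(size) * P(year).
    """
    result = {"Low": 0.0, "High": 0.0}
    for size, p_s in p_size.items():
        for year, p_y in p_year.items():
            for acc, p_a in zip(ACCIDENT, p_accident[(driver, size, year)]):
                for cost, p_c in p_cost[acc].items():
                    result[cost] += p_s * p_y * p_a * p_c
    return result

for driver in ("Good", "Bad"):
    print(driver, cost_given_driver(driver))
```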
Now note that many of these differences are quite subtle.
There’s a difference of a couple of percentage points one way or the other.
And you might think,
if you were designing a network,
that you’d like these really extreme probability changes
when you instantiate values.
But in many cases that’s not actually true,
and these subtle differences
are actually quite significant for an insurance company
that insures hundreds of thousands of people:
a couple of percentage points in the probability of an accident
can make a very big difference to its profitability.
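The scale argument is simple arithmetic: expected payout is roughly the number of policies times the accident probability times the average claim. A back-of-the-envelope sketch in which the portfolio size, the two accident rates, and the average claim are all hypothetical.

```python
def expected_claims_cost(num_policies, p_accident, avg_claim):
    """Expected yearly payout = N * P(accident) * average claim size."""
    return num_policies * p_accident * avg_claim

N = 200_000          # hypothetical number of insured drivers
AVG_CLAIM = 5_000.0  # hypothetical average payout per accident, in dollars

baseline = expected_claims_cost(N, 0.20, AVG_CLAIM)
shifted = expected_claims_cost(N, 0.22, AVG_CLAIM)  # two percentage points higher
print(f"extra expected payout: ${shifted - baseline:,.0f}")
```

Under these assumed figures, a two-point shift in accident probability amounts to roughly $20 million a year.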
So now let’s think about
how we would expand this network even further.
Vehicle size and vehicle year are things
that we’re likely to observe on the insurance form.
But driver quality is something that’s very difficult to observe.
You can’t go ask somebody, “Oh, are you a good driver?”
Because everyone’s going to say,
“Sure, I’m the best driver ever!”
And so that’s not going to be a very useful question.
So what evidence do we have that we can observe
that might indicate to us the value of the driver quality?
One obvious one is the person’s driving record.
That is, whether they’ve had previous accidents
or previous moving violations.
So let’s think about adding a variable
that represents driving history.
And so let’s go ahead and introduce that variable.
So we can click on this button
that allows us to create a node.
The node is initially called variable1,
so we have to give it a name.
So for example we’re going to call it DrivingHistory.
And that’s its identifier,
and we also have the name of the variable,
which is usually the same.
And let’s give it two values,
say PreviousAccident and NoPreviousAccident.
Now where will we place this variable in the network?
One might initially think that the right thing to do
is to place DrivingHistory as a parent of Driver_quality
because driving history can influence
our beliefs about driver quality.
Now, it’s true that observing driving history
changes our probability distribution over driver quality,
but if you think about the actual causal structure of this scenario,
what we actually have is that driver quality is a causal factor
of both a previous accident
as well as a subsequent accident.
And so if we want to maintain
the intuitive causal structure of the domain,
a more appropriate thing is to add DrivingHistory as a child
rather than parent of Driver_quality.
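Placing DrivingHistory as a child doesn’t stop it from informing driver quality: observing the child updates the parent via Bayes’ rule. A minimal sketch with hypothetical CPDs (the lecture gives no numbers for these).

```python
# Hypothetical numbers; the lecture does not specify these CPDs.
p_driver = {"Good": 0.7, "Bad": 0.3}  # P(Driver_quality)
p_history_given_driver = {            # P(DrivingHistory | Driver_quality)
    "Good": {"PreviousAccident": 0.1, "NoPreviousAccident": 0.9},
    "Bad":  {"PreviousAccident": 0.3, "NoPreviousAccident": 0.7},
}

def posterior_driver(history):
    """P(Driver_quality | DrivingHistory=history) via Bayes' rule."""
    joint = {d: p_driver[d] * p_history_given_driver[d][history] for d in p_driver}
    z = sum(joint.values())  # P(DrivingHistory=history)
    return {d: p / z for d, p in joint.items()}

print(posterior_driver("PreviousAccident"))  # Good drops from 0.7 to about 0.44
```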
You might ask why it matters,
and in this very simple example
the two models are in some sense equivalent
and we could have placed it either way
except that the CPD for driver quality given driving history
might be a little bit less intuitive.
But if we had other indicators of driver quality,
for example a previous moving violation,
then it actually makes a lot more sense
to have all of these be children of driver quality
as opposed to parents of driver quality.
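The sense in which the two orientations are “equivalent” for a single indicator can be checked directly. Starting from hypothetical CPDs for Driver_quality → DrivingHistory, we derive P(DrivingHistory) and the less intuitive P(Driver_quality | DrivingHistory) and confirm that both factorizations give the same joint distribution.

```python
from itertools import product

HISTORY = ["PreviousAccident", "NoPreviousAccident"]

# Child orientation, hypothetical numbers: Driver_quality -> DrivingHistory.
p_driver = {"Good": 0.7, "Bad": 0.3}
p_history_given_driver = {
    "Good": {"PreviousAccident": 0.1, "NoPreviousAccident": 0.9},
    "Bad":  {"PreviousAccident": 0.3, "NoPreviousAccident": 0.7},
}

# Joint distribution implied by the child orientation.
joint = {(d, h): p_driver[d] * p_history_given_driver[d][h]
         for d, h in product(p_driver, HISTORY)}

# Parent orientation: DrivingHistory -> Driver_quality, CPDs derived from the joint.
p_history = {h: sum(joint[(d, h)] for d in p_driver) for h in HISTORY}
p_driver_given_history = {h: {d: joint[(d, h)] / p_history[h] for d in p_driver}
                          for h in HISTORY}

# Both factorizations reproduce the same joint.
for (d, h), p in joint.items():
    assert abs(p - p_history[h] * p_driver_given_history[h][d]) < 1e-12
print(p_driver_given_history["PreviousAccident"])
```

With more than one indicator the orientations stop being interchangeable: as children of driver quality, the indicators are conditionally independent given driver quality, whereas as root parents they would be modeled as marginally independent of each other.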
Okay.
So that shows us how we would add a variable into the network.
And now let’s go and open up a much larger network
that includes these variables as well as others.
So let’s look now at this larger network.
And we can see
that we’ve added several different variables to the network.
We’ve added attributes of the vehicle,
for example whether the vehicle had antilock brakes and an airbag,
which is going to allow us to give
more informative probabilities regarding the accident.
We’ve also introduced aspects of the driver,
for example, whether they’ve had extra-track training,
which is going to increase driving quality,
whether they’re young or old,
where the presumption is
that younger people tend to be more reckless drivers,
and whether the driver is focused or more easily distracted,
which again is going to affect driving quality.
Now, since personality type is hard to observe,
we added another variable, Good_student,
which might indicate one’s personality type.
So let’s open the CPD for that one.
And so we can see here that, for example,
if you are a focused person who is young,
you’re much more likely to be a good student,
much more so than if you are young but not focused.
If you’re old, you’re just not very likely to be a student at all,
and therefore not likely to be a good student.
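The shape of this CPD can be sketched with hypothetical numbers. The entries for old drivers are low whatever the personality type, and summing out Focused gives P(Good_student | Age). The value labels “Focused”/“Distracted” and the assumption that Focused and Age are independent are illustrative choices.

```python
# Hypothetical CPD P(Good_student=True | Focused, Age); illustrative numbers only.
p_good_student = {
    ("Focused", "Young"): 0.50,
    ("Distracted", "Young"): 0.10,
    ("Focused", "Old"): 0.008,   # old: unlikely to be a student at all
    ("Distracted", "Old"): 0.002,
}
# Hypothetical prior on personality type, assumed independent of Age.
p_focused = {"Focused": 0.5, "Distracted": 0.5}

def p_good_student_given_age(age):
    """Sum out Focused: sum_f P(f) * P(Good_student=True | f, age)."""
    return sum(p_focused[f] * p_good_student[(f, age)] for f in p_focused)

for age in ("Young", "Old"):
    print(age, p_good_student_given_age(age))
```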
So, now that we’ve added all these variables to the network,
let’s go ahead and run a few queries to see what happens.
And let’s start by looking at the prior probability of Accident
before we observe anything.
So we can see that the probability of no accident is about 79.5%.
The probability of severe accident is about 3%.
Now let’s go ahead and tell the system
that we have a good student at hand.
So we’re going to observe
that the student is a good student,
and let’s see what happens.
We can see, surprisingly,
that even though we observe
that somebody is a good student,
the probability of no accidents
went down from 79.5% to 78%,
and the probability of severe accidents
went up from 3.5% to 3.67%.
You might say,
“Well, but I told you that it’s a good student.
Shouldn’t the probability of accidents go down?”
So let’s look at some active trails in this graph.
One active trail goes from Good_student to Focused,
to Driver_quality,
to Accident.
And sure enough, if we consider that trail in isolation,
it’s probably going to make the probability of no accident be higher.
But, we have another active trail.
We have the active trail that goes from good student up to age,
and then back down to driver quality.
So, to see that, let’s unclick good student
and see what happens.
Note that the probability that the driver is young was initially 25%,
but when I observed a good student,
it went up to close to 95%.
And that was enough to counteract the influence
along this more obvious active trail.
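The jump from 25% to roughly 95% is Bayes’ rule at work. A sketch using the 25% prior from the network and hypothetical likelihoods P(Good_student=True | Age) with Focused already summed out: a likelihood ratio of 60 turns prior odds of 1:3 into posterior odds of 20:1.

```python
# Prior from the lecture: P(Age=Young) = 0.25.
p_age = {"Young": 0.25, "Old": 0.75}
# Hypothetical likelihoods P(Good_student=True | Age), Focused summed out.
p_gs_given_age = {"Young": 0.30, "Old": 0.005}

def posterior_age_given_good_student():
    """P(Age | Good_student=True) is proportional to P(Age) * P(Good_student=True | Age)."""
    unnormalized = {a: p_age[a] * p_gs_given_age[a] for a in p_age}
    z = sum(unnormalized.values())
    return {a: u / z for a, u in unnormalized.items()}

post = posterior_age_given_good_student()
print(round(post["Young"], 3))  # 0.952
```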
So, to demonstrate that this is indeed what’s going on,
let’s instantiate the fact that the driver is young,
and we can see that the probability of severe accident went up to 3.7%
and no accident went down to a little bit shy of 77%.
And now let’s observe good student and see what happens.
So now we observed good student,
and the probability of no accidents went up to 78%,
as opposed to before, when it was 77%.
And the reason for that
is that we’ve now blocked the trail
that goes from good student, through age, to driver quality,
by observing Age.
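The trail argument can be checked mechanically. A sketch over a hypothetical fragment of the larger network (the edges, in particular Age → Driver_quality, are assumptions read off the trails described here): it enumerates every trail from Good_student to Accident and tests each triple along the trail. A non-collider blocks if observed, and a v-structure blocks unless it or one of its descendants is observed. Enumerating all trails is exponential, so for large graphs one would use a reachability-style algorithm instead.

```python
# Hypothetical fragment of the larger network; edges are assumptions.
EDGES = [
    ("Focused", "Good_student"), ("Age", "Good_student"),
    ("Focused", "Driver_quality"), ("Age", "Driver_quality"),
    ("Driver_quality", "Accident"),
]
parents, children = {}, {}
for u, v in EDGES:
    children.setdefault(u, set()).add(v)
    parents.setdefault(v, set()).add(u)

def descendants(node):
    """All nodes reachable from node along edge directions (excluding node)."""
    seen, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def trails(src, dst, path=None):
    """Yield every simple trail (ignoring edge direction) from src to dst."""
    path = path or [src]
    if path[-1] == dst:
        yield list(path)
        return
    nbrs = children.get(path[-1], set()) | parents.get(path[-1], set())
    for n in sorted(nbrs):
        if n not in path:
            yield from trails(src, dst, path + [n])

def is_active(trail, evidence):
    """Check each consecutive triple X - Y - Z along the trail against the evidence."""
    for x, y, z in zip(trail, trail[1:], trail[2:]):
        v_structure = y in children.get(x, set()) and y in children.get(z, set())
        if v_structure:
            if y not in evidence and not (descendants(y) & evidence):
                return False
        elif y in evidence:
            return False
    return True

for ev in (set(), {"Age"}):
    active = [t for t in trails("Good_student", "Accident") if is_active(t, ev)]
    print(sorted(ev), active)
```

With no evidence, both trails (through Age and through Focused) are active. Once Age is observed, only the trail through Focused remains, which matches the behavior in the network.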
So we can see the reasoning patterns
in a Bayesian network are sometimes subtle.
And there are different trails that can affect things
and interact with each other in different ways.
And so it’s useful to take the model
and play around with different queries
and different combinations of evidence
to understand the behavior of a network.
And especially if you’re designing
such a network for a particular application,
it’s useful to try out these different queries
and see if the behavior that you get
is the behavior that you want to get.
And if not, then you need to think about
how to modify the network to get behavior
that’s closer to the desired behavior.
This network is available for you to play with
and you can try out different things
and see what behaviors you get.