So now let’s look at an example in an actual network, and try to see what the CPDs look like, what behavior we get, and how we might augment the network to include additional things. Now, let me warn you right up front that this is a baby network; it’s not a real network, but it’s compact enough to look at and still interesting enough to exhibit some non-trivial behaviors. To explore the network, we’re going to use a system called SAMIAM. It was produced by Adnan Darwiche and his group at UCLA, and it’s nice because it works on all sorts of different platforms, so it’s usable by pretty much everyone. So let’s look at a particular problem: imagine that we’re an insurance company, and we’re trying to decide, for a person who comes in the door, whether to give them insurance or not. The operative aspect of making that decision is how much the policy is going to cost us, that is, how much we’re going to have to pay over the course of a year to insure this person. So there is a variable called Cost. Let’s click on that to see what properties that variable has. We can see that, in this case, we’ve decided to give the Cost variable only two values, Low and High. This is clearly a very coarse-grained approximation and not one that we would use in practice. In reality, we would probably make this a continuous variable whose mean depends on various aspects of the model, but for the purposes of our illustration, we’re going to use this discrete distribution that only has the values Low and High. Okay. So now let’s build up this network using the technique of “extending the conversation” that we’ve discussed before. What is the most important factor determining the cost that the insurance company has to pay? Well, probably whether the person has accidents and how severe they are. So here we have a network that has two variables: one is Accident and one is Cost.
And in this case we’ve decided to give the Accident variable three possible values, None, Mild, and Severe, with the probabilities that you see listed. What you see down below is the Cost variable. Let’s open the CPD of Cost given Accident. We can see that, in this case, we have a conditional probability table of Cost given Accident. Note that this is actually transposed from the notation that we have used in the class before, because here the conditioning cases are columns, whereas in the examples that we’ve given they have been rows. But that’s okay; it’s the same thing, just transposed. And so we see, for example, that if the person has no accidents, the cost is very likely to be low; mild accidents induce a different distribution over cost; and severe accidents have a probability of 0.9 of incurring high cost and 0.1 of incurring low cost. So now let’s continue extending the conversation and ask what Accident depends on. One of the obvious factors is whether the person is a good driver or not, so we would expect driver quality to be a parent of Accident. But there are other factors that affect not just the presence of an accident but also its severity. For example, vehicle size would affect the severity of an accident, because if you are driving a large SUV, chances are an accident will not be as severe; but it might also perhaps increase the chance of having an accident in the first place, because a large car may be harder to handle. And vehicle year might affect the chances of an accident because of the presence or absence of certain safety features, like anti-lock brakes and airbags. So let’s open the CPD of Accident and see what it looks like now that we have all these parents for it. And we can see here that we have, in this case, eight conditioning cases, corresponding to three parent variables with two values each.
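To make the table concrete, here is a minimal sketch of the two-node network (Accident as parent of Cost) written as plain Python dictionaries, with each conditioning case of Accident playing the role of one "column" of the CPD. Only the 0.9/0.1 entries for Severe come from the lecture; all the other numbers are invented placeholders, not the values in the actual SAMIAM network.

```python
# Prior P(Accident) -- hypothetical values, not the ones shown in SAMIAM.
p_accident = {"None": 0.80, "Mild": 0.15, "Severe": 0.05}

# CPD P(Cost | Accident): one entry (a "column") per conditioning case.
p_cost_given_accident = {
    "None":   {"Low": 0.95, "High": 0.05},  # hypothetical
    "Mild":   {"Low": 0.60, "High": 0.40},  # hypothetical
    "Severe": {"Low": 0.10, "High": 0.90},  # from the lecture
}

# Marginalize out Accident: P(Cost) = sum_a P(a) * P(Cost | a).
p_cost = {
    c: sum(p_accident[a] * p_cost_given_accident[a][c]
           for a in p_accident)
    for c in ("Low", "High")
}
print(p_cost)
```

Whether you store conditioning cases as rows or columns is purely a presentation choice, as the lecture notes; the distribution is the same either way.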
And so here, just to look at one of these distributions as an example: if this is a fairly new vehicle (after 2000) and it’s an SUV, the probability of having a severe accident is quite low, the probability of having a mild accident is moderate, and the probability of having no accidents is 0.85. If you compare that to the corresponding entry where we keep everything fixed except that now it’s a compact car, we see that the probability of having a mild accident is lower, but the probability of having no accidents is higher, representing different driving patterns, for example. Okay, so with this network we can now start asking simple questions. To do some causal inference, let’s instantiate driving quality, first to good and then to bad. We can see that for a bad driver the probability of low cost is 81%, and for a good driver the probability of low cost is 87%. If we look at Accident, we can see that for a good driver there is a probability of 87.5 percent of no accidents and ten percent of a mild accident. For a bad driver, the probability of no accident goes down, mild accidents go up, and severe accidents also go way up. Now note that many of these differences are quite subtle, a difference of a couple of percent one way or the other. You might think, if you were designing a network, that you’d like really extreme probability changes when you instantiate values. But in many cases that’s not actually true, and these subtle differences are quite significant: for an insurance company that insures hundreds of thousands of people, a couple of percentage points in the probability of an accident can make a very big difference to one’s profitability. So now let’s think about how we would expand this network even further. Vehicle size and vehicle year are things that we’re likely to observe on the insurance form, but driver quality is something that’s very difficult to observe.
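The causal-inference query above can be sketched by hand: condition on Driver_quality and sum out Accident to get P(Cost). The 87.5%/10% row for a good driver echoes the lecture, but the rest of the numbers are invented stand-ins, so the answers only roughly match what SAMIAM reports.

```python
# Hypothetical CPDs for the chain Driver_quality -> Accident -> Cost.
p_accident_given_quality = {
    "good": {"None": 0.875, "Mild": 0.100, "Severe": 0.025},
    "bad":  {"None": 0.750, "Mild": 0.150, "Severe": 0.100},
}
p_cost_given_accident = {
    "None":   {"Low": 0.95, "High": 0.05},
    "Mild":   {"Low": 0.60, "High": 0.40},
    "Severe": {"Low": 0.10, "High": 0.90},
}

def p_low_cost(quality):
    """P(Cost=Low | Driver_quality=quality), summing out Accident."""
    return sum(p_accident_given_quality[quality][a]
               * p_cost_given_accident[a]["Low"]
               for a in ("None", "Mild", "Severe"))

# The gap between good and bad drivers is only a few percentage points,
# which is exactly the kind of subtle difference the lecture describes.
print(p_low_cost("good"), p_low_cost("bad"))
```

Even with these made-up numbers, the good/bad gap comes out at well under ten percentage points, illustrating why small shifts can still matter at the scale of an insurance portfolio.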
You can’t go ask somebody, “Are you a good driver?” because everyone’s going to say, “Sure, I’m the best driver ever!” So that’s not going to be a very useful question. What evidence can we observe that might indicate to us the value of driver quality? One obvious one is the person’s driving record: whether they’ve had previous accidents or previous moving violations. So let’s think about adding a variable that represents driving history, and let’s go ahead and introduce that variable. We can click on this button that allows us to create a node. The node is initially called variable1, so we have to give it a name; for example, we’re going to call it DrivingHistory. That’s its identifier, and we also have the name of the variable, which is usually the same. And let’s give it two values, say PreviousAccident and NoPreviousAccident. Now, where do we place this variable in the network? One might initially think that the right thing to do is to place DrivingHistory as a parent of Driver_quality, because driving history can influence our beliefs about driver quality. Now, it’s true that observing driving history changes our probability in driver quality, but if you think about the actual causal structure of this scenario, what we actually have is that driver quality is a causal factor of both a previous accident and a subsequent accident. And so, if we want to maintain the intuitive causal structure of the domain, the more appropriate thing is to add DrivingHistory as a child rather than a parent of Driver_quality. You might question why it matters, and in this very simple example the two models are in some sense equivalent, and we could have placed it either way, except that the CPD for driver quality given driving history might be a little less intuitive.
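To see why a child of Driver_quality still works as evidence, here is a short sketch of the Bayes-rule update: with DrivingHistory as a child, observing a previous accident revises our belief about the parent. All the numbers below are hypothetical, chosen only to show the direction of the update.

```python
# Hypothetical prior and likelihoods for the child DrivingHistory.
p_good = 0.75                       # prior P(Driver_quality = good)
p_prev_acc = {"good": 0.05,         # P(PreviousAccident | quality = good)
              "bad":  0.30}         # P(PreviousAccident | quality = bad)

# Posterior P(good | PreviousAccident) by Bayes' rule.
num = p_good * p_prev_acc["good"]
den = num + (1 - p_good) * p_prev_acc["bad"]
posterior_good = num / den

# Belief in a good driver drops sharply once we see a previous accident.
print(posterior_good)
```

This is exactly the evidential reasoning the lecture describes: information flows from child to parent even though the arrow points the other way, which is why the causally natural child placement loses nothing.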
But if we had other indicators of driver quality, for example a previous moving violation, then it makes a lot more sense to have all of these be children of driver quality rather than parents of driver quality. Okay. So that shows us how we would add a variable to the network. Now let’s go and open up a much larger network that includes these variables as well as others. Looking at this larger network, we can see that we’ve added several variables. We’ve added attributes of the vehicle, for example whether the vehicle has anti-lock brakes and an airbag, which is going to allow us to give more informative probabilities regarding the accident. We’ve also introduced aspects of the driver: for example, whether they’ve had extra-track training, which is going to increase driving quality; whether they’re young or old, where the presumption is that younger people tend to be more reckless drivers; and whether the driver is focused or more easily distracted, which again is going to affect driving quality. Now, since personality type is hard to observe, we added another variable, Good_student, which might indicate one’s personality type. So let’s open the CPD for that one. We can see, for example, that if you are a focused person who is young, you’re much more likely to be a good student than if you are a young person who is not focused. And if you’re old, you’re just not very likely to be a student at all, and therefore, regardless of personality, not likely to be a good student. So, now that we’ve added all these variables to the network, let’s run a few queries and see what happens. Let’s start by looking at the prior probability of Accident, before we observe anything. We can see that the probability of no accident is about 79.5%, and the probability of a severe accident is about 3%.
Now let’s go ahead and tell the system that we have a good student at hand. So we’re going to observe that the student is a good student, and let’s see what happens. We can see, surprisingly, that even though we observed that somebody is a good student, the probability of no accidents went down from 79.5% to 78%, and the probability of severe accidents went up from about 3.5 to 3.67 percent. You might say, “But I told you it’s a good student. Shouldn’t the probability of accidents go down?” So let’s look at some active trails in this graph. One active trail goes from Good_student to Focused, to Driver_quality, to Accident. And sure enough, if we consider that trail in isolation, it’s probably going to make the probability of no accident higher. But we have another active trail: the one that goes from Good_student up to Age, and then back down through to Driver_quality. To see that, let’s unclick good student and see what happens. Note that the probability that the driver is young was initially 25%, but when I observed good student, it went up to close to 95%. And that was enough to counteract the influence along the more obvious active trail. To demonstrate that this is indeed what’s going on, let’s instantiate the fact that the driver is young, and we can see that the probability of a severe accident went up to 3.7% and no accident went down to a little shy of 77%. Now let’s observe good student and see what happens. This time the probability of no accidents went up, to 78% as opposed to the 77% before. The reason is that, by observing Age, we’ve now blocked the trail that goes from Good_student through Age to Driver_quality. So we can see that the reasoning patterns in a Bayesian network are sometimes subtle, and there are different trails that can affect things and interact with each other in different ways.
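The competing-trails effect can be reproduced with a stripped-down version of the network and brute-force enumeration. The structure below mirrors the lecture (Age and Focused are parents of both Good_student and Driver_quality, which in turn drives a binarized Accident), but every CPD number is invented, chosen only so that the qualitative behavior matches the lecture, not to reproduce SAMIAM's exact percentages.

```python
from itertools import product

p_young = 0.25
p_focused = 0.50
p_gs = {  # P(GoodStudent = yes | Age, Focused) -- hypothetical
    ("young", True): 0.95, ("young", False): 0.60,
    ("old",   True): 0.05, ("old",   False): 0.01,
}
p_goodq = {  # P(Quality = good | Age, Focused) -- hypothetical
    ("young", True): 0.60, ("young", False): 0.20,
    ("old",   True): 0.95, ("old",   False): 0.70,
}
p_none = {"good": 0.90, "bad": 0.70}  # P(Accident = none | Quality)

def query(evidence):
    """P(Accident=none | evidence) by summing the full joint."""
    num = den = 0.0
    for age, foc, gs, q in product(("young", "old"), (True, False),
                                   ("yes", "no"), ("good", "bad")):
        w = p_young if age == "young" else 1 - p_young
        w *= p_focused if foc else 1 - p_focused
        w *= p_gs[age, foc] if gs == "yes" else 1 - p_gs[age, foc]
        w *= p_goodq[age, foc] if q == "good" else 1 - p_goodq[age, foc]
        # Discard assignments inconsistent with the evidence.
        if any(evidence.get(k, v) != v
               for k, v in (("age", age), ("gs", gs))):
            continue
        den += w
        num += w * p_none[q]
    return num / den

prior = query({})
given_gs = query({"gs": "yes"})     # drops: good student makes "young" likely
given_y = query({"age": "young"})
given_y_gs = query({"age": "young", "gs": "yes"})  # rises: Age blocks that trail
print(prior, given_gs, given_y, given_y_gs)
```

Running this shows the same pattern as in SAMIAM: observing good student alone lowers P(no accident), because it sharply raises the probability of being young, but once Age is observed, that trail is blocked and good student raises P(no accident) through the Focused trail.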
And so it’s useful to take the model and play around with different queries and different combinations of evidence to understand the behavior of the network. Especially if you’re designing such a network for a particular application, it’s useful to try out these different queries and see whether the behavior you get is the behavior you want. And if not, then you need to think about how to modify the network to get behavior that’s closer to the desired behavior. This network is available for you to play with, so you can try out different things and see what behaviors you get.