0:00:00.118,0:00:02.898
So, now, let’s look at an example in an actual network,

0:00:02.898,0:00:06.181
and try to see what the CPDs look like,

0:00:06.181,0:00:07.738
what behavior we get,

0:00:07.738,0:00:09.488
and how we might augment the network

0:00:09.488,0:00:11.238
to include additional things.

0:00:11.238,0:00:12.802
Now, let me warn you right upfront

0:00:12.802,0:00:14.656
that this is a baby network;

0:00:14.656,0:00:16.263
it’s not a real network,

0:00:16.263,0:00:19.622
but it’s compact enough to look at, yet

0:00:19.622,0:00:22.918
still interesting enough to get some non-trivial behaviors.

0:00:24.579,0:00:26.593
So, to explore the network,

0:00:26.593,0:00:28.854
we’re going to use a system called SamIam.

0:00:28.854,0:00:31.662
It was produced by Adnan Darwiche and his group at UCLA,

0:00:31.662,0:00:32.640
and it’s nice

0:00:32.640,0:00:36.151
because it actually works on all sorts of different platforms,

0:00:36.151,0:00:38.911
so it’s usable by pretty much everyone.

0:00:38.911,0:00:41.537
So let’s look at a particular problem:

0:00:41.537,0:00:43.616
Imagine that we’re an insurance company

0:00:43.616,0:00:44.888
and we’re trying to decide

0:00:44.888,0:00:46.438
for a person who comes in the door

0:00:46.438,0:00:49.002
whether to give them insurance or not.

0:00:49.002,0:00:51.746
So the operative aspect of making that decision

0:00:51.746,0:00:54.369
is how much the policy is going to cost us,

0:00:54.369,0:00:55.892
that is, how much we’re going to have to pay

0:00:55.892,0:00:58.915
over the course of a year to insure this person.

0:00:58.915,0:01:02.287
So there is a variable called Cost.

0:01:02.287,0:01:06.847
Let’s click on that to see what properties that variable has.

0:01:06.847,0:01:08.507
And we can see that in this case,

0:01:08.507,0:01:11.952
we’ve decided to only give two values to the Cost variable,

0:01:11.952,0:01:13.772
Low and High.

0:01:13.772,0:01:16.790
This is clearly a very coarse-grained approximation

0:01:16.790,0:01:18.472
and not one that we would use in practice.

0:01:18.472,0:01:20.303
In reality we would probably

0:01:20.303,0:01:22.135
have this be a continuous variable

0:01:22.135,0:01:25.887
whose mean depends on various aspects of the model.

0:01:25.887,0:01:27.783
But for the purposes of our illustration,

0:01:27.783,0:01:29.559
we’re going to use this discrete distribution

0:01:29.559,0:01:31.273
that only has values Low and High.

0:01:31.273,0:01:32.647
Okay.

0:01:32.647,0:01:36.855
So now, let’s build up this network using the technique of

0:01:36.855,0:01:39.504
“expanding the conversation” that we’ve discussed before.

0:01:39.504,0:01:44.224
And so what is the most important determining factor

0:01:44.224,0:01:46.740
in the cost that the insurance company has to pay?

0:01:46.740,0:01:50.647
Well, probably whether the person has accidents

0:01:50.647,0:01:51.977
and how severe they are.

0:01:51.977,0:01:57.005
So here we have a network that has two variables:

0:01:57.005,0:01:59.724
One is Accident and one is Cost.

0:01:59.724,0:02:02.645
And in this case we decided to select

0:02:02.645,0:02:05.701
three possible values for the Accident variable,

0:02:05.701,0:02:09.031
None, Mild, and Severe,

0:02:09.031,0:02:14.442
and with the probabilities that you see listed.

0:02:14.442,0:02:17.447
And what you see down below is the Cost variable.

0:02:17.447,0:02:18.759
And let’s open the CPD

0:02:18.759,0:02:24.845
of the Cost variable given the Accident variable.

0:02:24.845,0:02:26.946
And we can see that, in this case,

0:02:26.946,0:02:29.169
we have a conditional probability table

0:02:29.169,0:02:33.358
of Cost given Accident.

0:02:33.358,0:02:35.485
Note that this is actually inverted

0:02:35.485,0:02:38.542
from the notation that we have used in the class before,

0:02:38.542,0:02:41.713
because here the conditioning cases are columns,

0:02:41.713,0:02:44.545
whereas in the examples that we’ve given

0:02:44.545,0:02:45.754
they have been rows.

0:02:45.754,0:02:49.189
But that’s okay, it’s the same thing, just inverted.

0:02:49.650,0:02:51.102
And so we see, for example,

0:02:51.102,0:02:54.418
that if the person has no accidents,

0:02:54.418,0:02:56.765
the costs are very likely to be very low;

0:02:56.765,0:03:01.581
mild accidents incur different distributions over cost;

0:03:01.581,0:03:03.173
and severe accidents have

0:03:03.173,0:03:05.789
a probability of 0.9 of having high cost

0:03:05.789,0:03:08.414
and 0.1 of having low cost.
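
To make the table just described concrete, here is a minimal Python sketch (not SamIam output) that stores the CPD of Cost given Accident with one column per conditioning case and then sums out Accident. Only the Severe column (0.9 High, 0.1 Low) comes from the lecture; the prior over Accident and the None and Mild columns are hypothetical placeholders.

```python
# Prior over Accident. These three numbers are hypothetical placeholders.
P_ACCIDENT = {"None": 0.80, "Mild": 0.15, "Severe": 0.05}

# CPD of Cost given Accident, one column per conditioning case.
P_COST_GIVEN_ACCIDENT = {
    "None":   {"Low": 0.95, "High": 0.05},  # hypothetical
    "Mild":   {"Low": 0.60, "High": 0.40},  # hypothetical
    "Severe": {"Low": 0.10, "High": 0.90},  # values quoted in the lecture
}

# Every column of a CPD must be a distribution over Cost.
for column in P_COST_GIVEN_ACCIDENT.values():
    assert abs(sum(column.values()) - 1.0) < 1e-9


def marginal_cost():
    """P(Cost) = sum over a of P(Accident = a) * P(Cost | Accident = a)."""
    cost = {"Low": 0.0, "High": 0.0}
    for accident, p_accident in P_ACCIDENT.items():
        for value, p_cost in P_COST_GIVEN_ACCIDENT[accident].items():
            cost[value] += p_accident * p_cost
    return cost


print(marginal_cost())
```

Storing one column per parent value mirrors the layout in the tool, where conditioning cases are columns rather than rows.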

0:03:09.060,0:03:11.749
So now, let’s continue extending the conversation

0:03:11.749,0:03:14.088
and ask what Accident depends on.

0:03:14.088,0:03:17.207
And it seems that one of the obvious factors

0:03:17.207,0:03:20.409
is whether the person is a good driver or not.

0:03:20.409,0:03:22.661
And so we would expect driver quality

0:03:22.661,0:03:23.965
to be a parent of Accident.

0:03:23.965,0:03:25.214
But there are other things

0:03:25.214,0:03:27.853
that also affect not just the presence of an accident,

0:03:27.853,0:03:29.911
but also the severity of the accident.

0:03:29.911,0:03:33.909
So for example, vehicle size would affect

0:03:33.909,0:03:37.117
both the severity of an accident

0:03:37.117,0:03:40.788
because if you are driving a large SUV, then chances are

0:03:40.788,0:03:43.599
your accident is likely to be less severe,

0:03:43.599,0:03:45.685
but it might also perhaps increase

0:03:45.685,0:03:47.374
the chance of having an accident overall

0:03:47.374,0:03:51.830
because maybe a large car is harder to handle.

0:03:52.615,0:03:56.381
And then vehicle year might affect the chances of an accident

0:03:56.381,0:03:59.509
because of the presence or absence of certain safety features

0:03:59.509,0:04:02.037
like anti-lock brakes and airbags.

0:04:02.037,0:04:03.814
So let’s open the CPD of Accident

0:04:03.814,0:04:05.173
and see what that looks like

0:04:05.173,0:04:07.325
now that we have all these parents for it.

0:04:07.325,0:04:10.213
And we can see here that we have these,

0:04:10.213,0:04:13.165
in this case, eight conditioning cases,

0:04:13.165,0:04:18.317
corresponding to three variables with two values each.

0:04:18.317,0:04:22.757
And so let’s just look at one of these,

0:04:22.757,0:04:26.046
just as an example distribution.

0:04:26.046,0:04:30.749
So, if this is a fairly new vehicle—after 2000—

0:04:30.749,0:04:32.398
and it’s an SUV,

0:04:32.398,0:04:35.725
the probability of having a severe accident is quite low,

0:04:35.725,0:04:38.813
and the probability of having a mild accident is moderate,

0:04:38.813,0:04:44.774
and the probability of having no accident is 0.85,

0:04:44.774,0:04:48.525
whereas if you compare that to the corresponding entry

0:04:48.525,0:04:52.053
when we keep everything fixed except that now it’s a compact car,

0:04:52.053,0:05:00.509
we see that the probability of having a mild accident is lower,

0:05:00.509,0:05:03.045
but the probability of having no accidents is higher,

0:05:03.045,0:05:08.059
representing different driving patterns, for example.
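
The eight conditioning cases mentioned above are just the Cartesian product of the parents' values. The following Python sketch enumerates them; the variable names and value labels are assumptions chosen to match the spoken description, not the identifiers used in the actual network file.

```python
from itertools import product

# Parents of Accident with two values each. Names and labels are assumed.
PARENTS = {
    "DriverQuality": ["Good", "Bad"],
    "VehicleSize": ["SUV", "Compact"],
    "VehicleYear": ["After2000", "Before2000"],
}
ACCIDENT_VALUES = ["None", "Mild", "Severe"]


def conditioning_cases(parents):
    """Return one dict per joint assignment to the parent variables."""
    names = list(parents)
    return [dict(zip(names, combo)) for combo in product(*parents.values())]


cases = conditioning_cases(PARENTS)

# Each column sums to one, so it has len(ACCIDENT_VALUES) - 1 free entries.
free_parameters = len(cases) * (len(ACCIDENT_VALUES) - 1)

print(len(cases), free_parameters)
```

The count doubles with every additional binary parent, which is one reason to keep the number of parents per node small.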

0:05:08.506,0:05:11.649
Okay, so with this network,

0:05:11.649,0:05:13.649
we can now start asking simple questions.

0:05:14.695,0:05:17.145
So, as an example of causal inference,

0:05:17.145,0:05:20.559
let’s instantiate, for example, driver quality to be good.

0:05:21.574,0:05:23.936
And then bad.

0:05:23.936,0:05:27.054
And we can see that for a bad driver

0:05:27.054,0:05:31.397
the probability of low cost is 81%.

0:05:31.397,0:05:36.325
And for a good driver the probability of low cost is 87%.

0:05:36.325,0:05:38.381
If we look at the accidents

0:05:38.381,0:05:41.278
we can see that for a good driver

0:05:41.278,0:05:44.800
there is a probability of 87.5 percent of no accident

0:05:44.800,0:05:46.431
and 10 percent of a mild accident.

0:05:46.431,0:05:50.957
And the probability of no accident goes down for a bad driver,

0:05:50.957,0:05:53.422
and mild accident goes up

0:05:53.422,0:05:55.453
and severe accident also goes way up.

0:05:55.453,0:05:59.077
Now note that many of these differences are quite subtle.

0:05:59.077,0:06:02.054
There’s a difference of a couple percent one way or the other.

0:06:02.054,0:06:04.038
And you might think,

0:06:04.038,0:06:05.326
if you were designing a network,

0:06:05.326,0:06:09.245
that you’d like these really extreme probability changes

0:06:09.245,0:06:11.485
when you instantiate values.

0:06:11.485,0:06:13.790
But in many cases that’s not actually true,

0:06:13.790,0:06:15.052
and these subtle differences

0:06:15.052,0:06:17.643
are actually quite significant for an insurance company

0:06:17.643,0:06:19.819
that insures hundreds of thousands of people—

0:06:19.819,0:06:22.495
a couple of percentage points in the probability of an accident

0:06:22.495,0:06:24.855
can make a very big difference to one’s profitability.

0:06:25.809,0:06:26.972
So now let’s think about

0:06:26.972,0:06:29.519
how we would expand this network even further.

0:06:30.196,0:06:33.244
Vehicle size and vehicle year are things

0:06:33.244,0:06:35.851
that we’re likely to observe in the insurance form.

0:06:35.851,0:06:39.070
But driver quality is something that’s very difficult to observe.

0:06:39.070,0:06:41.829
You can’t go ask somebody, “Oh, are you a good driver?”

0:06:41.829,0:06:43.451
Because everyone’s going to say,

0:06:43.451,0:06:45.123
“Sure, I’m the best driver ever!”

0:06:45.123,0:06:49.272
And so that’s not going to be a very useful question.

0:06:49.272,0:06:53.147
So what evidence do we have that we can observe

0:06:53.147,0:06:57.148
that might indicate to us the value of the driver quality?

0:06:57.148,0:07:01.491
One obvious one is the person’s driving record.

0:07:01.491,0:07:03.556
That is, whether they’ve had previous accidents

0:07:03.556,0:07:05.355
or previous moving violations.

0:07:05.832,0:07:08.412
So let’s think about adding a variable

0:07:08.412,0:07:09.880
that represents driving history.

0:07:10.665,0:07:13.507
And so let’s go ahead and introduce that variable.

0:07:13.507,0:07:16.315
So we can click on this button

0:07:16.315,0:07:17.697
that allows us to create a node.

0:07:17.697,0:07:19.841
The node is now called variable1

0:07:19.841,0:07:21.036
so we’d have to give it a name.

0:07:21.036,0:07:24.787
So for example we’re going to call it DrivingHistory.

0:07:26.079,0:07:28.074
And that’s its identifier,

0:07:28.074,0:07:30.620
and we also have the name of the variable,

0:07:30.620,0:07:32.246
which is usually the same.

0:07:32.246,0:07:35.195
And let’s make that two values,

0:07:35.195,0:07:37.725
say PreviousAccident and NoPreviousAccident.

0:07:41.586,0:07:45.760
Now where will we place this variable in the network?

0:07:45.760,0:07:48.828
One might initially think that the right thing to do

0:07:48.828,0:07:53.228
is to place DrivingHistory as a parent of Driver_quality

0:07:53.228,0:07:57.150
because driving history can influence

0:07:57.150,0:07:59.044
our beliefs about driver quality.

0:07:59.044,0:08:01.290
Now it’s true that observing driving history

0:08:01.290,0:08:03.651
changes our probability distribution over driver quality,

0:08:03.651,0:08:07.443
but if you think about the actual causal structure of this scenario,

0:08:07.443,0:08:11.878
what we actually have is that driver quality is a causal factor

0:08:11.878,0:08:13.968
of both a previous accident

0:08:13.968,0:08:16.765
and a subsequent accident.

0:08:16.765,0:08:18.327
And so if we want to maintain

0:08:18.327,0:08:20.421
the intuitive causal structure of the domain,

0:08:20.421,0:08:28.095
a more appropriate thing is to add DrivingHistory as a child

0:08:28.095,0:08:29.972
rather than a parent of Driver_quality.

0:08:29.972,0:08:32.187
You might question why it matters,

0:08:32.187,0:08:33.864
and in this very simple example

0:08:33.864,0:08:36.874
the two models are in some sense equivalent

0:08:36.874,0:08:38.851
and we could have placed it either way

0:08:38.851,0:08:44.076
except that the CPD for driver quality given driving history

0:08:44.076,0:08:46.006
might be a little bit less intuitive.

0:08:46.006,0:08:49.955
But if we had other indicators of driver quality,

0:08:49.955,0:08:52.436
for example a previous moving violation,

0:08:52.436,0:08:55.661
then it actually makes a lot more sense

0:08:55.661,0:08:58.559
to have all of these be children of driver quality

0:08:58.559,0:09:00.925
as opposed to parents of driver quality.
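
One way to see that placing DrivingHistory as a child loses nothing is that Bayes' rule recovers the belief about driver quality from the child's CPD. The Python sketch below does this for the two-variable case; all numbers are hypothetical and chosen only for illustration.

```python
# Hypothetical prior: probability that a driver is good.
P_GOOD = 0.7

# Hypothetical CPD of the child: P(PreviousAccident | DriverQuality).
P_PREV_ACCIDENT_GIVEN_QUALITY = {"Good": 0.1, "Bad": 0.4}


def posterior_good(previous_accident):
    """P(DriverQuality = Good | DrivingHistory) computed with Bayes' rule."""
    p_good = P_PREV_ACCIDENT_GIVEN_QUALITY["Good"]
    p_bad = P_PREV_ACCIDENT_GIVEN_QUALITY["Bad"]
    if not previous_accident:
        # Likelihood of observing NoPreviousAccident instead.
        p_good, p_bad = 1.0 - p_good, 1.0 - p_bad
    joint_good = P_GOOD * p_good
    joint_bad = (1.0 - P_GOOD) * p_bad
    return joint_good / (joint_good + joint_bad)


print(posterior_good(True), posterior_good(False))
```

With these numbers a previous accident lowers the belief that the driver is good and a clean record raises it, even though the arrow points from driver quality to driving history.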

0:09:01.802,0:09:02.680
Okay.

0:09:02.680,0:09:07.481
So that shows us how we would add a variable into the network.

0:09:07.481,0:09:09.764
And now let’s go and open up a much larger network

0:09:09.764,0:09:13.007
that includes these variables as well as others.

0:09:13.007,0:09:15.971
So let’s look now at this larger network.

0:09:15.971,0:09:17.347
And we can see

0:09:17.347,0:09:20.237
that we’ve added several different variables to the network.

0:09:20.237,0:09:23.211
We’ve added attributes of the vehicle,

0:09:23.211,0:09:27.284
for example whether the vehicle had antilock brakes and an airbag,

0:09:27.284,0:09:28.787
which is going to allow us to give

0:09:28.787,0:09:31.483
more informative probabilities regarding the accident.

0:09:31.483,0:09:35.227
We’ve also introduced aspects of the driver,

0:09:35.227,0:09:38.243
for example, whether they’ve had extra-track training,

0:09:38.243,0:09:40.084
which is going to increase driving quality,

0:09:40.084,0:09:41.684
whether they’re young or old,

0:09:41.684,0:09:42.952
where the presumption is

0:09:42.952,0:09:45.883
that younger people tend to be more reckless drivers,

0:09:45.883,0:09:50.373
and whether the driver is focused or more easily distracted,

0:09:50.373,0:09:52.848
which again is going to affect driving quality.

0:09:53.648,0:09:58.681
Now since personality type is hard to observe,

0:09:58.681,0:10:03.155
we added another variable which is Good_student

0:10:03.155,0:10:05.654
which might indicate one’s personality type.

0:10:05.654,0:10:08.862
So let’s open the CPD for that one.

0:10:11.293,0:10:14.071
And so we can see here that, for example,

0:10:14.071,0:10:20.999
if you are a focused person who is young,

0:10:20.999,0:10:23.720
you’re much more likely to be a good student,

0:10:23.720,0:10:28.439
much more so than if you are not a focused person who is young.

0:10:28.439,0:10:32.303
If you’re old, you’re just not very likely to be a student,

0:10:32.303,0:10:38.021
and so this probability basically says that if you’re old,

0:10:38.021,0:10:39.883
you’re just not very likely to be a student,

0:10:39.883,0:10:41.322
and therefore not likely to be a good student.

0:10:42.014,0:10:47.608
So, now that we’ve added all these variables to the network,

0:10:47.608,0:10:51.161
let’s go ahead and run a few queries to see what happens.

0:10:51.161,0:10:56.918
And let’s start by looking at the prior probability of Accident

0:10:56.918,0:10:59.942
before we observe anything.

0:10:59.942,0:11:04.299
So we can see that the probability of no accident is about 79.5%.

0:11:04.299,0:11:07.065
The probability of severe accident is about 3%.

0:11:07.065,0:11:10.077
Now let’s go ahead and tell the system

0:11:10.077,0:11:11.837
that we have a good student at hand.

0:11:11.837,0:11:13.608
So we’re going to observe

0:11:13.608,0:11:15.978
that the student is a good student,

0:11:15.978,0:11:17.567
and let’s see what happens.

0:11:18.059,0:11:19.695
We can see, surprisingly,

0:11:19.695,0:11:20.807
that even though we observe

0:11:20.807,0:11:21.887
that somebody is a good student,

0:11:21.887,0:11:23.954
the probability of no accidents

0:11:23.954,0:11:27.699
went down from 79.5% to 78%,

0:11:27.699,0:11:29.582
and the probability of severe accidents

0:11:29.582,0:11:32.819
went up from 3.5 to 3.67 percent.

0:11:32.819,0:11:33.880
You might say,

0:11:33.880,0:11:35.986
“Well, but I <i>told</i> you that it’s a good student.

0:11:35.986,0:11:38.138
Shouldn’t the probability of accidents go down?”

0:11:38.307,0:11:41.972
So let’s look at some active trails in this graph.

0:11:41.972,0:11:46.461
One active trail goes from Good_student to Focused,

0:11:46.461,0:11:48.928
to Driver_quality,

0:11:48.928,0:11:49.939
to Accident.

0:11:49.939,0:11:53.602
And sure enough, if we consider that trail in isolation,

0:11:53.602,0:11:58.378
it’s probably going to make the probability of no accident be higher.

0:11:58.378,0:12:00.204
But, we have another active trail.

0:12:00.204,0:12:04.058
We have the active trail that goes from good student up to age,

0:12:04.058,0:12:07.085
and then back down to driver quality.

0:12:07.085,0:12:09.921
So, to see that, let’s unclick on good student

0:12:09.921,0:12:11.281
and see what happens.

0:12:11.281,0:12:15.767
Note that initially the probability that the driver is young was 25%,

0:12:15.767,0:12:17.710
but when I observed a good student,

0:12:17.710,0:12:20.538
it went up to close to 95%.

0:12:20.538,0:12:23.143
And that was enough to counteract the influence

0:12:23.143,0:12:27.446
along this more obvious active trail.

0:12:27.831,0:12:31.551
So, to demonstrate that this is indeed what’s going on,

0:12:31.551,0:12:35.783
let’s click on the fact

0:12:35.783,0:12:38.415
and instantiate the fact that the student is young,

0:12:38.415,0:12:43.446
and we can see that the probability of severe accident went up to 3.7%

0:12:43.446,0:12:47.967
and no accident went down to a little bit shy of 77%.

0:12:47.967,0:12:51.559
And now let’s observe good student and see what happens.

0:12:51.559,0:12:53.174
So now we observed good student,

0:12:53.174,0:13:01.656
and the probability of no accidents went up to 78%,

0:13:01.656,0:13:07.036
as opposed to before when it was 77%.

0:13:07.036,0:13:10.558
And the reason for that

0:13:10.558,0:13:12.773
is that we’ve now blocked this trail

0:13:12.773,0:13:15.871
that goes from good student, through age, to driver quality

0:13:15.871,0:13:17.917
by observing this variable which blocks the trail.
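
The interaction of the two trails can be reproduced with brute-force enumeration on a simplified version of the network. In the Python sketch below, the structure follows the description in the lecture (Age and Focused are parents of both Good_student and Driver_quality, and Driver_quality is the parent of Accident), but every variable is reduced to two values and all CPD entries except the 25% prior on young are hypothetical, so the outputs will not match the numbers shown in the tool.

```python
from itertools import product

NAMES = ("young", "focused", "good_student", "good_driver", "no_accident")

P_YOUNG = 0.25    # prior quoted in the lecture
P_FOCUSED = 0.5   # hypothetical
# The tables below are hypothetical and keyed by (young, focused).
P_GOOD_STUDENT = {(True, True): 0.8, (True, False): 0.2,
                  (False, True): 0.02, (False, False): 0.01}
P_GOOD_DRIVER = {(True, True): 0.6, (True, False): 0.3,
                 (False, True): 0.9, (False, False): 0.7}
# Hypothetical, keyed by good_driver.
P_NO_ACCIDENT = {True: 0.875, False: 0.6}


def bern(p_true, value):
    """Probability of a binary value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(young, focused, good_student, good_driver, no_accident):
    """Chain-rule factorization of the simplified network."""
    return (bern(P_YOUNG, young)
            * bern(P_FOCUSED, focused)
            * bern(P_GOOD_STUDENT[(young, focused)], good_student)
            * bern(P_GOOD_DRIVER[(young, focused)], good_driver)
            * bern(P_NO_ACCIDENT[good_driver], no_accident))


def prob_no_accident(**evidence):
    """P(no_accident | evidence) by summing the joint over all worlds."""
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(**world)
        denominator += p
        if world["no_accident"]:
            numerator += p
    return numerator / denominator


print(prob_no_accident())
print(prob_no_accident(good_student=True))
print(prob_no_accident(young=True))
print(prob_no_accident(young=True, good_student=True))
```

With these assumed numbers, observing a good student alone lowers the probability of no accident, because it mostly signals that the driver is young. Once young is already observed, the same evidence raises it, since the trail through age is blocked.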

0:13:17.917,0:13:20.624
So we can see that the reasoning patterns

0:13:20.624,0:13:24.981
in a Bayesian network are sometimes subtle.

0:13:24.981,0:13:28.640
And there are different trails that can affect things

0:13:28.640,0:13:31.748
and interact with each other in different ways.

0:13:31.748,0:13:34.698
And so it’s useful to take the model

0:13:34.698,0:13:36.315
and play around with different queries

0:13:36.315,0:13:37.734
and different combinations of evidence

0:13:37.734,0:13:40.026
to understand the behavior of a network.

0:13:40.026,0:13:41.341
And especially if you’re designing

0:13:41.341,0:13:43.786
such a network for a particular application,

0:13:43.786,0:13:46.290
it’s useful to try out these different queries

0:13:46.290,0:13:48.053
and see if the behavior that you get

0:13:48.053,0:13:49.706
is the behavior that you want to get.

0:13:49.706,0:13:52.076
And if not, then you need to think about

0:13:52.076,0:13:55.962
how to modify this network to get behavior

0:13:55.962,0:14:00.498
that’s closer to the desired behavior.

0:14:00.498,0:14:03.755
This network is available for you to play with

0:14:03.755,0:14:06.005
and you can try out different things

0:14:06.005,0:14:08.961
and see what behaviors you get.