1
00:00:00,118 --> 00:00:02,898
So, now, let’s look at an example in [an] actual network,

2
00:00:02,898 --> 00:00:06,181
and try to see what the CPDs look like,

3
00:00:06,181 --> 00:00:07,738
what behavior we get,

4
00:00:07,738 --> 00:00:09,488
and how we might augment the network

5
00:00:09,488 --> 00:00:11,238
to include additional things.

6
00:00:11,238 --> 00:00:12,802
Now, let me warn you right upfront

7
00:00:12,802 --> 00:00:14,656
that this is a baby network;

8
00:00:14,656 --> 00:00:16,263
it’s not a real network,

9
00:00:16,263 --> 00:00:19,622
but it’s compact enough to look at and

10
00:00:19,622 --> 00:00:22,918
still interesting enough to get some non-trivial behaviors.

11
00:00:24,579 --> 00:00:26,593
So, to explore the network,

12
00:00:26,593 --> 00:00:28,854
we’re going to use a system called SamIam.

13
00:00:28,854 --> 00:00:31,662
It was produced by Adnan Darwiche and his group at UCLA,

14
00:00:31,662 --> 00:00:32,640
and it’s nice

15
00:00:32,640 --> 00:00:36,151
because it actually works on all sorts of different platforms,

16
00:00:36,151 --> 00:00:38,911
so it’s usable by pretty much everyone.

17
00:00:38,911 --> 00:00:41,537
So let’s look at a particular problem:

18
00:00:41,537 --> 00:00:43,616
Imagine that we’re an insurance company

19
00:00:43,616 --> 00:00:44,888
and we’re trying to decide

20
00:00:44,888 --> 00:00:46,438
for a person who comes in the door

21
00:00:46,438 --> 00:00:49,002
whether to give them insurance or not.

22
00:00:49,002 --> 00:00:51,746
So the operative aspect of making that decision

23
00:00:51,746 --> 00:00:54,369
is how much the policy is going to cost us,

24
00:00:54,369 --> 00:00:55,892
that is, how much we’re going to have to pay

25
00:00:55,892 --> 00:00:58,915
over the course of a year to insure this person.

26
00:00:58,915 --> 00:01:02,287
So there is a variable called Cost.

27
00:01:02,287 --> 00:01:06,847
Let’s click on that to see what properties that variable has.

28
00:01:06,847 --> 00:01:08,507
And we can see that in this case,

29
00:01:08,507 --> 00:01:11,952
we’ve decided to only give two values to the Cost variable,

30
00:01:11,952 --> 00:01:13,772
Low and High.

31
00:01:13,772 --> 00:01:16,790
This is clearly a very coarse-grained approximation

32
00:01:16,790 --> 00:01:18,472
and not one that we would use in practice.

33
00:01:18,472 --> 00:01:20,303
In reality we would probably

34
00:01:20,303 --> 00:01:22,135
have this be a continuous variable

35
00:01:22,135 --> 00:01:25,887
whose mean depends on various aspects of the model.

36
00:01:25,887 --> 00:01:27,783
But for the purposes of our illustration,

37
00:01:27,783 --> 00:01:29,559
we’re going to use this discrete distribution

38
00:01:29,559 --> 00:01:31,273
that only has values Low and High.

39
00:01:31,273 --> 00:01:32,647
Okay.

40
00:01:32,647 --> 00:01:36,855
So now, let’s build up this network using the technique of

41
00:01:36,855 --> 00:01:39,504
“expanding the conversation” that we’ve discussed before.

42
00:01:39,504 --> 00:01:44,224
And so what is the most important determining factor

43
00:01:44,224 --> 00:01:46,740
in the cost that the insurance company has to pay?

44
00:01:46,740 --> 00:01:50,647
Well, probably whether the person has accidents

45
00:01:50,647 --> 00:01:51,977
and how severe they are.

46
00:01:51,977 --> 00:01:57,005
So here we have a network that has two variables:

47
00:01:57,005 --> 00:01:59,724
One is Accident and one is Cost.

48
00:01:59,724 --> 00:02:02,645
And in this case we decided to select

49
00:02:02,645 --> 00:02:05,701
three possible values for the accident variable,

50
00:02:05,701 --> 00:02:09,031
None, Mild, and Severe,

51
00:02:09,031 --> 00:02:14,442
and with the probabilities that you see listed.

52
00:02:14,442 --> 00:02:17,447
And what you see down below is the Cost variable.

53
00:02:17,447 --> 00:02:18,759
And let’s open the CPD

54
00:02:18,759 --> 00:02:24,845
of the Cost variable given the Accident variable.

55
00:02:24,845 --> 00:02:26,946
And we can see that, in this case,

56
00:02:26,946 --> 00:02:29,169
we have a conditional probability table

57
00:02:29,169 --> 00:02:33,358
of Cost given Accident.

58
00:02:33,358 --> 00:02:35,485
Note that this is actually inverted

59
00:02:35,485 --> 00:02:38,542
from the notation that we have used in the class before,

60
00:02:38,542 --> 00:02:41,713
because here the conditioning cases are columns,

61
00:02:41,713 --> 00:02:44,545
whereas in the examples that we’ve given

62
00:02:44,545 --> 00:02:45,754
[they] have been rows.

63
00:02:45,754 --> 00:02:49,189
But that’s okay, it’s the same thing, just inverted.

64
00:02:49,650 --> 00:02:51,102
And so we see, for example,

65
00:02:51,102 --> 00:02:54,418
that if the person has no accidents,

66
00:02:54,418 --> 00:02:56,765
the costs are very likely to be very low;

67
00:02:56,765 --> 00:03:01,581
mild accidents incur different distributions over cost;

68
00:03:01,581 --> 00:03:03,173
and severe accidents have

69
00:03:03,173 --> 00:03:05,789
a probability of 0.9 of having high cost

70
00:03:05,789 --> 00:03:08,414
and 0.1 of having low cost.
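The Cost CPD described above can be sketched in Python as a table whose columns are the conditioning cases (the values of Accident), matching the column layout shown in SamIam. Only the Severe column (0.9 High, 0.1 Low) comes from the lecture; the prior over Accident and the None and Mild columns are hypothetical placeholders. The sketch checks that every column is a distribution, builds the transposed (row-per-case) view used elsewhere in the class, and computes P(Cost) by summing out Accident.

```python
ACCIDENT_VALUES = ("None", "Mild", "Severe")
COST_VALUES = ("Low", "High")

# Prior over Accident: hypothetical placeholder values.
p_accident = {"None": 0.80, "Mild": 0.17, "Severe": 0.03}

# P(Cost | Accident), stored column-wise: one column per conditioning case.
cpd_cost = {
    "None":   {"Low": 0.95, "High": 0.05},  # hypothetical
    "Mild":   {"Low": 0.60, "High": 0.40},  # hypothetical
    "Severe": {"Low": 0.10, "High": 0.90},  # values stated in the lecture
}

# The same table transposed: one row per Cost value, indexed by conditioning case.
cost_rows = {c: {a: cpd_cost[a][c] for a in ACCIDENT_VALUES} for c in COST_VALUES}

def check_columns_normalized(cpd, tol=1e-9):
    """Every conditioning case must define a probability distribution over Cost."""
    return all(abs(sum(column.values()) - 1.0) < tol for column in cpd.values())

def marginal_cost(prior, cpd):
    """P(Cost = c) = sum over a of P(Accident = a) * P(Cost = c | Accident = a)."""
    return {c: sum(prior[a] * cpd[a][c] for a in prior) for c in COST_VALUES}

print(check_columns_normalized(cpd_cost))
print(marginal_cost(p_accident, cpd_cost))
```

Storing columns or rows is purely a layout choice; the numbers, and any query computed from them, are identical.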

71
00:03:09,060 --> 00:03:11,749
So now, let’s continue extending the conversation

72
00:03:11,749 --> 00:03:14,088
and ask what Accident depends on.

73
00:03:14,088 --> 00:03:17,207
And it seems that one of the obvious factors

74
00:03:17,207 --> 00:03:20,409
is whether the person is a good driver or not.

75
00:03:20,409 --> 00:03:22,661
And so we would expect driver quality

76
00:03:22,661 --> 00:03:23,965
to be a parent of Accident.

77
00:03:23,965 --> 00:03:25,214
But there are other things

78
00:03:25,214 --> 00:03:27,853
that also affect not just the presence of an accident,

79
00:03:27,853 --> 00:03:29,911
but also the severity of the accident.

80
00:03:29,911 --> 00:03:33,909
So for example, vehicle size would affect

81
00:03:33,909 --> 00:03:37,117
both the severity of an accident

82
00:03:37,117 --> 00:03:40,788
because if you are driving a large SUV, then chances are

83
00:03:40,788 --> 00:03:43,599
any accident you have will be less severe,

84
00:03:43,599 --> 00:03:45,685
but it might also perhaps increase

85
00:03:45,685 --> 00:03:47,374
the chance of having an accident overall

86
00:03:47,374 --> 00:03:51,830
because maybe a large car is harder to handle.

87
00:03:52,615 --> 00:03:56,381
And then vehicle year might affect the chances of an accident

88
00:03:56,381 --> 00:03:59,509
because of the presence or absence of certain safety features

89
00:03:59,509 --> 00:04:02,037
like anti-lock brakes and airbags.

90
00:04:02,037 --> 00:04:03,814
So let’s open the CPD of Accident

91
00:04:03,814 --> 00:04:05,173
and see what that looks like

92
00:04:05,173 --> 00:04:07,325
now that we have all these parents for it.

93
00:04:07,325 --> 00:04:10,213
And we can see here that we have these,

94
00:04:10,213 --> 00:04:13,165
in this case, eight conditioning cases,

95
00:04:13,165 --> 00:04:18,317
correspond[ing] to three variables, two values each.
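The count of eight conditioning cases follows from enumerating every joint assignment to the three binary parents, 2 × 2 × 2 = 8. In the sketch below the value labels are hypothetical (the lecture only says each parent has two values). Because each column must sum to 1, a three-valued Accident needs 2 free numbers per case, so 16 in total.

```python
from itertools import product

# Hypothetical value labels; the lecture only says each parent is binary.
PARENTS = {
    "Driver_quality": ("Good", "Bad"),
    "Vehicle_size": ("SUV", "Compact"),
    "Vehicle_year": ("Before2000", "After2000"),
}
ACCIDENT_VALUES = ("None", "Mild", "Severe")

# One conditioning case per joint assignment to the parents.
conditioning_cases = [dict(zip(PARENTS, values)) for values in product(*PARENTS.values())]

# Each case needs a full distribution over Accident; the last entry of each
# column is determined by the others because the column must sum to 1.
n_free_params = len(conditioning_cases) * (len(ACCIDENT_VALUES) - 1)

for case in conditioning_cases:
    print(case)
print(len(conditioning_cases), n_free_params)  # 8 16
```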

96
00:04:18,317 --> 00:04:22,757
And so here, just to look at one of the distributions

97
00:04:22,757 --> 00:04:26,046
as an example:

98
00:04:26,046 --> 00:04:30,749
So, if this is a fairly new vehicle—after 2000—

99
00:04:30,749 --> 00:04:32,398
and it’s an SUV,

100
00:04:32,398 --> 00:04:35,725
the probability of having a severe accident is quite low,

101
00:04:35,725 --> 00:04:38,813
and the probability of having a mild accident is moderate

102
00:04:38,813 --> 00:04:44,774
and the probability of having no accidents is 0.85,

103
00:04:44,774 --> 00:04:48,525
whereas if you compare that to the corresponding entry

104
00:04:48,525 --> 00:04:52,053
when we keep everything fixed except that now it’s a compact car,

105
00:04:52,053 --> 00:05:00,509
we see that the probability of having a mild accident is lower,

106
00:05:00,509 --> 00:05:03,045
but the probability of having no accidents is higher,

107
00:05:03,045 --> 00:05:08,059
representing different driving patterns, for example.

108
00:05:08,506 --> 00:05:11,649
Okay, so with this network,

109
00:05:11,649 --> 00:05:13,649
we can now start asking simple questions.

110
00:05:14,695 --> 00:05:17,145
So, as an example of causal inference,

111
00:05:17,145 --> 00:05:20,559
let’s instantiate driver quality to be good.

112
00:05:21,574 --> 00:05:23,936
And bad.

113
00:05:23,936 --> 00:05:27,054
And we can see that for a bad driver

114
00:05:27,054 --> 00:05:31,397
the probability of low cost is 81%.

115
00:05:31,397 --> 00:05:36,325
And for a good driver the probability of low cost is 87%.

116
00:05:36,325 --> 00:05:38,381
If we look at the accidents

117
00:05:38,381 --> 00:05:41,278
we can see that for a good driver

118
00:05:41,278 --> 00:05:44,800
there is a probability of 87.5 percent of no accidents

119
00:05:44,800 --> 00:05:46,431
and ten percent of mild accidents.

120
00:05:46,431 --> 00:05:50,957
And the probability of no accident goes down for a bad driver,

121
00:05:50,957 --> 00:05:53,422
and mild accident goes up

122
00:05:53,422 --> 00:05:55,453
and severe accidents also go way up.
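The causal query above, P(Accident | Driver_quality), is obtained by summing out the parents that were not observed. The sketch below assumes Vehicle_size and Vehicle_year are root nodes independent of Driver_quality (as in the small network), and every number in it is a hypothetical placeholder, not the lecture's CPD. One more sum over Accident gives P(Cost = Low | Driver_quality).

```python
# All numbers are hypothetical placeholders, not the CPDs used in the lecture.
P_SIZE = {"SUV": 0.3, "Compact": 0.7}
P_YEAR = {"Before2000": 0.4, "After2000": 0.6}
ACCIDENT = ("None", "Mild", "Severe")

# P(Accident | Driver_quality, Vehicle_size, Vehicle_year) as (None, Mild, Severe).
P_ACC = {
    ("Good", "SUV", "After2000"):     (0.85, 0.13, 0.02),
    ("Good", "Compact", "After2000"): (0.90, 0.07, 0.03),
    ("Good", "SUV", "Before2000"):    (0.82, 0.15, 0.03),
    ("Good", "Compact", "Before2000"): (0.86, 0.09, 0.05),
    ("Bad", "SUV", "After2000"):      (0.75, 0.20, 0.05),
    ("Bad", "Compact", "After2000"):  (0.80, 0.13, 0.07),
    ("Bad", "SUV", "Before2000"):     (0.70, 0.23, 0.07),
    ("Bad", "Compact", "Before2000"): (0.74, 0.16, 0.10),
}
# P(Cost = Low | Accident), hypothetical.
P_LOW_COST = {"None": 0.95, "Mild": 0.60, "Severe": 0.10}

def accident_given_quality(dq):
    """P(Accident | Driver_quality = dq), summing out the unobserved root parents."""
    dist = dict.fromkeys(ACCIDENT, 0.0)
    for size, p_s in P_SIZE.items():
        for year, p_y in P_YEAR.items():
            for value, p_a in zip(ACCIDENT, P_ACC[(dq, size, year)]):
                dist[value] += p_s * p_y * p_a
    return dist

def low_cost_given_quality(dq):
    """P(Cost = Low | Driver_quality = dq) = sum over a of P(a | dq) * P(Low | a)."""
    acc = accident_given_quality(dq)
    return sum(acc[a] * P_LOW_COST[a] for a in ACCIDENT)

for dq in ("Good", "Bad"):
    print(dq, accident_given_quality(dq), round(low_cost_given_quality(dq), 4))
```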

123
00:05:55,453 --> 00:05:59,077
Now note that many of these differences are quite subtle.

124
00:05:59,077 --> 00:06:02,054
There’s a difference of a couple percent one way or the other.

125
00:06:02,054 --> 00:06:04,038
And you might think,

126
00:06:04,038 --> 00:06:05,326
if you were designing a network,

127
00:06:05,326 --> 00:06:09,245
that you’d like these really extreme probability changes

128
00:06:09,245 --> 00:06:11,485
when you instantiate values.

129
00:06:11,485 --> 00:06:13,790
But in many cases that’s not actually true,

130
00:06:13,790 --> 00:06:15,052
and these subtle differences

131
00:06:15,052 --> 00:06:17,643
are actually quite significant for an insurance company

132
00:06:17,643 --> 00:06:19,819
that insures hundreds of thousands of people—

133
00:06:19,819 --> 00:06:22,495
a couple of percentage points in the probability of an accident

134
00:06:22,495 --> 00:06:24,855
can make a very big difference to one’s profitability.
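To make the scale argument concrete, here is a back-of-the-envelope calculation. The portfolio size, the size of the probability shift, and the average claim are all hypothetical inputs, and the probability is treated as a per-year accident probability.

```python
# All three inputs are hypothetical; the lecture gives no portfolio figures.
policyholders = 200_000   # size of the insured population
delta_p = 0.02            # "a couple of percentage points" in yearly accident probability
avg_claim = 5_000         # average payout per accident, in dollars

extra_accidents = policyholders * delta_p
extra_payout = extra_accidents * avg_claim
print(f"{extra_accidents:,.0f} additional accidents per year")
print(f"${extra_payout:,.0f} additional expected payout per year")
```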

135
00:06:25,809 --> 00:06:26,972
So now let’s think about

136
00:06:26,972 --> 00:06:29,519
how we would expand this network even further.

137
00:06:30,196 --> 00:06:33,244
Vehicle size and vehicle year are things

138
00:06:33,244 --> 00:06:35,851
that we’re likely to observe in the insurance form.

139
00:06:35,851 --> 00:06:39,070
But driver quality is something that’s very difficult to observe.

140
00:06:39,070 --> 00:06:41,829
You can’t go ask somebody, “Oh, are you a good driver?”

141
00:06:41,829 --> 00:06:43,451
Because everyone’s going to say,

142
00:06:43,451 --> 00:06:45,123
“Sure, I’m the best driver ever!”

143
00:06:45,123 --> 00:06:49,272
And so that’s not going to be a very useful question.

144
00:06:49,272 --> 00:06:53,147
So what evidence do we have that we can observe

145
00:06:53,147 --> 00:06:57,148
that might indicate to us the value of driver quality?

146
00:06:57,148 --> 00:07:01,491
One obvious one is the person’s driving record.

147
00:07:01,491 --> 00:07:03,556
That is, whether they’ve had previous accidents

148
00:07:03,556 --> 00:07:05,355
or previous moving violations.

149
00:07:05,832 --> 00:07:08,412
So let’s think about adding a variable

150
00:07:08,412 --> 00:07:09,880
that represents driving history.

151
00:07:10,665 --> 00:07:13,507
And so let’s go ahead and introduce that variable.

152
00:07:13,507 --> 00:07:16,315
So we can click on this button

153
00:07:16,315 --> 00:07:17,697
that allows us to create a node.

154
00:07:17,697 --> 00:07:19,841
The node is now called variable1

155
00:07:19,841 --> 00:07:21,036
so we have to give it a name.

156
00:07:21,036 --> 00:07:24,787
So for example we’re going to call it DrivingHistory.

157
00:07:26,079 --> 00:07:28,074
And that’s its identifier,

158
00:07:28,074 --> 00:07:30,620
and we also have the name of the variable,

159
00:07:30,620 --> 00:07:32,246
which is usually the same.

160
00:07:32,246 --> 00:07:35,195
And let’s make that two values,

161
00:07:35,195 --> 00:07:37,725
say PreviousAccident and NoPreviousAccident.

162
00:07:41,586 --> 00:07:45,760
Now where will we place this variable in the network?

163
00:07:45,760 --> 00:07:48,828
One might initially think that the right thing to do

164
00:07:48,828 --> 00:07:53,228
is to place DrivingHistory as a parent of Driver_quality

165
00:07:53,228 --> 00:07:57,150
because driving history can influence

166
00:07:57,150 --> 00:07:59,044
our beliefs about driver quality.

167
00:07:59,044 --> 00:08:01,290
Now it’s true that observing driving history

168
00:08:01,290 --> 00:08:03,651
changes our probability distribution over driver quality,

169
00:08:03,651 --> 00:08:07,443
but if you think about the actual causal structure of this scenario,

170
00:08:07,443 --> 00:08:11,878
what we actually have is that driver quality is a causal factor

171
00:08:11,878 --> 00:08:13,968
of both a previous accident

172
00:08:13,968 --> 00:08:16,765
as well as a subsequent accident.

173
00:08:16,765 --> 00:08:18,327
And so if we want to maintain

174
00:08:18,327 --> 00:08:20,421
the intuitive causal structure of the domain,

175
00:08:20,421 --> 00:08:28,095
a more appropriate thing is to add DrivingHistory as a child

176
00:08:28,095 --> 00:08:29,972
rather than a parent of Driver_quality.

177
00:08:29,972 --> 00:08:32,187
[You] might question why it matters

178
00:08:32,187 --> 00:08:33,864
and in this very simple example

179
00:08:33,864 --> 00:08:36,874
the two models are in some sense equivalent

180
00:08:36,874 --> 00:08:38,851
and we could have placed it either way

181
00:08:38,851 --> 00:08:44,076
except that the CPD for driver quality given driving history

182
00:08:44,076 --> 00:08:46,006
might be a little bit less intuitive.
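The claim that the two orderings are equivalent for a single pair of variables can be checked numerically: starting from a causal model Driver_quality → DrivingHistory, Bayes' rule yields a reverse model DrivingHistory → Driver_quality with exactly the same joint distribution. All numbers below are hypothetical. The reverse CPD entries are the "less intuitive" quantities an expert would otherwise have to specify directly.

```python
# Hypothetical numbers for illustration only.
QUALITY = ("Good", "Bad")
HISTORY = ("PreviousAccident", "NoPreviousAccident")
P_QUALITY = {"Good": 0.7, "Bad": 0.3}
# Causal direction: P(DrivingHistory | Driver_quality)
P_HIST_GIVEN_Q = {
    "Good": {"PreviousAccident": 0.1, "NoPreviousAccident": 0.9},
    "Bad":  {"PreviousAccident": 0.3, "NoPreviousAccident": 0.7},
}

def joint_causal(q, h):
    return P_QUALITY[q] * P_HIST_GIVEN_Q[q][h]

# Reverse direction: P(DrivingHistory) and P(Driver_quality | DrivingHistory),
# both derived from the causal model (the second via Bayes' rule).
P_HIST = {h: sum(joint_causal(q, h) for q in QUALITY) for h in HISTORY}
P_Q_GIVEN_HIST = {h: {q: joint_causal(q, h) / P_HIST[h] for q in QUALITY} for h in HISTORY}

def joint_reverse(q, h):
    return P_HIST[h] * P_Q_GIVEN_HIST[h][q]

for h in HISTORY:
    print(h, P_Q_GIVEN_HIST[h])
```

With these made-up inputs, P(Good | PreviousAccident) comes out to 0.4375 and P(Good | NoPreviousAccident) to 0.75: easy to derive, but hard to elicit directly.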

183
00:08:46,006 --> 00:08:49,955
But if we had other indicators of driver quality,

184
00:08:49,955 --> 00:08:52,436
for example a previous moving violation,

185
00:08:52,436 --> 00:08:55,661
then it actually makes a lot more sense

186
00:08:55,661 --> 00:08:58,559
to have all of these be children of driver quality

187
00:08:58,559 --> 00:09:00,925
as opposed to parents of driver quality.
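One concrete reason the children placement scales better: with both indicators as children of Driver_quality, they are correlated through it, so observing a previous accident raises the probability of a moving violation. Making them parents would form a v-structure in which the two indicators are independent a priori unless an extra edge is added. The sketch below uses hypothetical numbers and also combines both indicators into a posterior over driver quality.

```python
# Hypothetical numbers; both indicators are modeled as children of Driver_quality.
P_GOOD = 0.7
P_PREV_ACC = {"Good": 0.10, "Bad": 0.30}   # P(previous accident | quality)
P_VIOLATION = {"Good": 0.05, "Bad": 0.25}  # P(previous moving violation | quality)
QUALITY = ("Good", "Bad")

def bern(p, observed):
    """Probability that a binary indicator with P(True) = p takes the given value."""
    return p if observed else 1.0 - p

def joint(q, prev, viol):
    prior = P_GOOD if q == "Good" else 1.0 - P_GOOD
    return prior * bern(P_PREV_ACC[q], prev) * bern(P_VIOLATION[q], viol)

def posterior_good(prev, viol):
    """P(Driver_quality = Good | both indicators), by Bayes' rule."""
    return joint("Good", prev, viol) / sum(joint(q, prev, viol) for q in QUALITY)

def p_violation(prev=None):
    """P(violation), optionally conditioned on the previous-accident indicator."""
    prevs = (True, False) if prev is None else (prev,)
    num = sum(joint(q, p, True) for q in QUALITY for p in prevs)
    den = sum(joint(q, p, v) for q in QUALITY for p in prevs for v in (True, False))
    return num / den

print(posterior_good(True, True), posterior_good(False, False))
print(p_violation(), p_violation(prev=True))
```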

188
00:09:01,802 --> 00:09:02,680
Okay.

189
00:09:02,680 --> 00:09:07,481
So that shows us how we would add a variable into the network.

190
00:09:07,481 --> 00:09:09,764
And now let’s go and open up a much larger network

191
00:09:09,764 --> 00:09:13,007
that includes these variables as well as others.

192
00:09:13,007 --> 00:09:15,971
So let’s look now at this larger network.

193
00:09:15,971 --> 00:09:17,347
And we can see

194
00:09:17,347 --> 00:09:20,237
that we’ve added several different variables to the network.

195
00:09:20,237 --> 00:09:23,211
We’ve added attributes of the vehicle,

196
00:09:23,211 --> 00:09:27,284
for example whether the vehicle had antilock brakes and an airbag,

197
00:09:27,284 --> 00:09:28,787
which is going to allow us to give

198
00:09:28,787 --> 00:09:31,483
more informative probabilities regarding the accident.

199
00:09:31,483 --> 00:09:35,227
We’ve also introduced aspects of the driver,

200
00:09:35,227 --> 00:09:38,243
for example, whether they’ve had extra-track training,

201
00:09:38,243 --> 00:09:40,084
which is going to increase driving quality,

202
00:09:40,084 --> 00:09:41,684
whether they’re young or old,

203
00:09:41,684 --> 00:09:42,952
where the presumption is

204
00:09:42,952 --> 00:09:45,883
that younger people tend to be more reckless drivers,

205
00:09:45,883 --> 00:09:50,373
and whether the driver is focused or more easily distracted,

206
00:09:50,373 --> 00:09:52,848
which again is going to affect driving quality.

207
00:09:53,648 --> 00:09:58,681
Now since personality type is hard to observe,

208
00:09:58,681 --> 00:10:03,155
we added another variable which is Good_student

209
00:10:03,155 --> 00:10:05,654
which might indicate one’s personality type.

210
00:10:05,654 --> 00:10:08,862
So let’s open [the] CPD for that one.

211
00:10:11,293 --> 00:10:14,071
And so we can see here that, for example,

212
00:10:14,071 --> 00:10:20,999
if you are a focused person who is young,

213
00:10:20,999 --> 00:10:23,720
you’re much more likely to be a good student,

214
00:10:23,720 --> 00:10:28,439
much more so than if you are young but not focused.

215
00:10:28,439 --> 00:10:32,303
If you’re old, you’re just not very likely to be a student,

216
00:10:32,303 --> 00:10:38,021
and so this probability basically says that if you’re old,

217
00:10:38,021 --> 00:10:39,883
you’re just not very likely to be a student,

218
00:10:39,883 --> 00:10:41,322
and therefore not likely to be a good student.

219
00:10:42,014 --> 00:10:47,608
So, now that we’ve added all these variables to the network,

220
00:10:47,608 --> 00:10:51,161
let’s go ahead and run a few queries to see what happens.

221
00:10:51,161 --> 00:10:56,918
And let’s start by looking at the prior probability of Accident

222
00:10:56,918 --> 00:10:59,942
before we observe anything.

223
00:10:59,942 --> 00:11:04,299
So we can see that the probability of no accident is about 79.5%.

224
00:11:04,299 --> 00:11:07,065
The probability of severe accident is about 3%.

225
00:11:07,065 --> 00:11:10,077
Now let’s go ahead and tell the system

226
00:11:10,077 --> 00:11:11,837
that we have a good student at hand.

227
00:11:11,837 --> 00:11:13,608
So we’re going to observe

228
00:11:13,608 --> 00:11:15,978
that the student is a good student,

229
00:11:15,978 --> 00:11:17,567
and let’s see what happens.

230
00:11:18,059 --> 00:11:19,695
We can see, surprisingly,

231
00:11:19,695 --> 00:11:20,807
that even though we observe

232
00:11:20,807 --> 00:11:21,887
that somebody is a good student,

233
00:11:21,887 --> 00:11:23,954
the probability of no accidents

234
00:11:23,954 --> 00:11:27,699
went down from 79.5% to 78%,

235
00:11:27,699 --> 00:11:29,582
and the probability of severe accidents

236
00:11:29,582 --> 00:11:32,819
went up from 3.5 to 3.67 percent.

237
00:11:32,819 --> 00:11:33,880
You might say,

238
00:11:33,880 --> 00:11:35,986
“Well, but I <i>told</i> you that it’s a good student.

239
00:11:35,986 --> 00:11:38,138
Shouldn’t the probability of accidents go down?”

240
00:11:38,307 --> 00:11:41,972
So let’s look at some active trails in this graph.

241
00:11:41,972 --> 00:11:46,461
One active trail goes from Good_student to Focused,

242
00:11:46,461 --> 00:11:48,928
to Driver_quality,

243
00:11:48,928 --> 00:11:49,939
to Accident.

244
00:11:49,939 --> 00:11:53,602
And sure enough, if we consider that trail in isolation,

245
00:11:53,602 --> 00:11:58,378
it’s probably going to make the probability of no accident be higher.

246
00:11:58,378 --> 00:12:00,204
But, we have another active trail.

247
00:12:00,204 --> 00:12:04,058
We have the active trail that goes from good student up to age,

248
00:12:04,058 --> 00:12:07,085
and then back down, through [to] driver quality.
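The two trails can be checked mechanically. This sketch encodes only the edges this part of the lecture mentions (Age and Focused as parents of both Good_student and Driver_quality, plus Driver_quality → Accident); the full network has more nodes, so the edge set is an assumption. A trail is blocked if an interior node that is not a v-structure is observed, or if a v-structure node has neither itself nor any descendant observed.

```python
# Assumed edge set, reduced to the nodes discussed here.
EDGES = {
    ("Age", "Good_student"), ("Focused", "Good_student"),
    ("Age", "Driver_quality"), ("Focused", "Driver_quality"),
    ("Driver_quality", "Accident"),
}

def children(node):
    return {c for (p, c) in EDGES if p == node}

def descendants(node):
    """All nodes reachable from node by following edges forward."""
    seen, stack = set(), [node]
    while stack:
        for c in children(stack.pop()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def is_active(trail, observed):
    """True if influence can flow along trail given the observed set."""
    for prev, mid, nxt in zip(trail, trail[1:], trail[2:]):
        v_structure = (prev, mid) in EDGES and (nxt, mid) in EDGES
        if v_structure:
            if mid not in observed and not (descendants(mid) & observed):
                return False
        elif mid in observed:
            return False
    return True

via_age = ["Good_student", "Age", "Driver_quality", "Accident"]
via_focused = ["Good_student", "Focused", "Driver_quality", "Accident"]
print(is_active(via_age, set()), is_active(via_age, {"Age"}))
print(is_active(via_focused, {"Age"}))
```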

249
00:12:07,085 --> 00:12:09,921
So, to see that, let’s unclick good student

250
00:12:09,921 --> 00:12:11,281
and see what happens.

251
00:12:11,281 --> 00:12:15,767
Note that initially the probability that the driver is young was 25%,

252
00:12:15,767 --> 00:12:17,710
but when I observed a good student,

253
00:12:17,710 --> 00:12:20,538
it went up to close to 95%.

254
00:12:20,538 --> 00:12:23,143
And that was enough to counteract the influence

255
00:12:23,143 --> 00:12:27,446
along this more obvious active trail.

256
00:12:27,831 --> 00:12:31,551
So, to demonstrate that this is indeed what’s going on,

257
00:12:31,551 --> 00:12:35,783
let’s go ahead

258
00:10:35,783 --> 00:10:38,415
and instantiate the fact that the driver is young,

259
00:12:38,415 --> 00:12:43,446
and we can see that the probability of severe accident went up to 3.7%

260
00:12:43,446 --> 00:12:47,967
and no accident went down to a little bit shy of 77%.

261
00:12:47,967 --> 00:12:51,559
And now let’s observe good student and see what happens.

262
00:12:51,559 --> 00:12:53,174
So now we observed good student,

263
00:12:53,174 --> 00:13:01,656
and the probability of no accidents went up to 78%,

264
00:13:01,656 --> 00:13:07,036
as opposed to before when it was 77%.

265
00:13:07,036 --> 00:13:10,558
And the reason for that

266
00:13:10,558 --> 00:13:12,773
is that we’ve now blocked this trail

267
00:13:12,773 --> 00:13:15,871
that goes from good student, through age, to driver quality

268
00:13:15,871 --> 00:13:17,917
by observing Age, which blocks the trail.
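The reversal in this demo can be reproduced with a toy network and brute-force enumeration over the joint distribution. Only P(Age = Young) = 0.25 comes from the lecture; the remaining CPDs are hypothetical, Accident is simplified to a binary "no accident" variable, and the structure is the reduced one assumed above (Age and Focused as parents of Good_student and Driver_quality).

```python
from itertools import product

# Hypothetical CPDs; only P(Age = Young) = 0.25 is taken from the lecture.
P_AGE = {"Young": 0.25, "Old": 0.75}
P_FOCUSED = {True: 0.5, False: 0.5}
P_GS = {("Young", True): 0.8, ("Young", False): 0.2,     # P(Good_student | Age, Focused)
        ("Old", True): 0.01, ("Old", False): 0.005}
P_DQ = {("Young", True): 0.5, ("Young", False): 0.2,     # P(Driver_quality = Good | Age, Focused)
        ("Old", True): 0.9, ("Old", False): 0.6}
P_NO_ACC = {"Good": 0.9, "Bad": 0.7}                     # P(No_accident | Driver_quality)

def bern(p, value):
    return p if value else 1.0 - p

def joint(age, focused, gs, dq, no_acc):
    """Chain-rule factorization of the toy network."""
    return (P_AGE[age] * P_FOCUSED[focused]
            * bern(P_GS[(age, focused)], gs)
            * bern(P_DQ[(age, focused)], dq == "Good")
            * bern(P_NO_ACC[dq], no_acc))

def query(target, evidence):
    """P(target(world) | evidence), summing the joint over all consistent worlds."""
    num = den = 0.0
    for age, focused, gs, dq, no_acc in product(
            P_AGE, P_FOCUSED, (True, False), ("Good", "Bad"), (True, False)):
        world = {"Age": age, "Focused": focused, "Good_student": gs,
                 "Driver_quality": dq, "No_accident": no_acc}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(age, focused, gs, dq, no_acc)
        den += p
        if target(world):
            num += p
    return num / den

def no_accident(world):
    return world["No_accident"]

print("prior:", query(no_accident, {}))
print("good student:", query(no_accident, {"Good_student": True}))
print("young:", query(no_accident, {"Age": "Young"}))
print("young + good student:", query(no_accident, {"Age": "Young", "Good_student": True}))
```

With these made-up numbers, observing Good_student alone lowers P(no accident) from 0.83 to about 0.79 (P(Young) jumps to about 0.96), while once Age = Young is observed, adding Good_student raises it from 0.77 to 0.788, the same qualitative pattern as in SamIam.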

269
00:13:17,917 --> 00:13:20,624
So we can see the reasoning patterns

270
00:13:20,624 --> 00:13:24,981
in a Bayesian network are sometimes subtle.

271
00:13:24,981 --> 00:13:28,640
And there are different trails that can affect things

272
00:13:28,640 --> 00:13:31,748
and interact with each other in different ways.

273
00:13:31,748 --> 00:13:34,698
And so it’s useful to take the model

274
00:13:34,698 --> 00:13:36,315
and play around with different queries

275
00:13:36,315 --> 00:13:37,734
and different combinations of evidence

276
00:13:37,734 --> 00:13:40,026
to understand the behavior of a network.

277
00:13:40,026 --> 00:13:41,341
And especially if you’re designing

278
00:13:41,341 --> 00:13:43,786
such a network for a particular application,

279
00:13:43,786 --> 00:13:46,290
it’s useful to try out these different queries

280
00:13:46,290 --> 00:13:48,053
and see if the behavior that you get

281
00:13:48,053 --> 00:13:49,706
is the behavior that you want to get.

282
00:13:49,706 --> 00:13:52,076
And if not, then you need to think about

283
00:13:52,076 --> 00:13:55,962
how to modify this network to get behavior

284
00:13:55,962 --> 00:14:00,498
that’s closer to the desired behavior.

285
00:14:00,498 --> 00:14:03,755
This network is available for you to play with

286
00:14:03,755 --> 00:14:06,005
and you can try out different things

287
00:14:06,005 --> 00:14:08,961
and see what behaviors you get.