Knowledge Engineering Example - SAMIAM (14:14)

  • 0:00 - 0:03
    So, now, let’s look at an example in [an] actual network,
  • 0:03 - 0:06
    and try to see what the CPDs look like,
  • 0:06 - 0:08
    what behavior we get,
  • 0:08 - 0:09
    and how we might augment the network
  • 0:09 - 0:11
    to include additional things.
  • 0:11 - 0:13
    Now, let me warn you right upfront
  • 0:13 - 0:15
    that this is a baby network;
  • 0:15 - 0:16
    it’s not a real network,
  • 0:16 - 0:20
    but it’s compact enough to look at, yet
  • 0:20 - 0:23
    still interesting enough to get some non-trivial behaviors.
  • 0:25 - 0:27
    So, to explore the network,
  • 0:27 - 0:29
    we’re going to use a system called SAMIAM.
  • 0:29 - 0:32
    It was produced by Adnan Darwiche and his group at UCLA,
  • 0:32 - 0:33
    and it’s nice
  • 0:33 - 0:36
    because it actually works on all sorts of different platforms,
  • 0:36 - 0:39
    so it’s usable by pretty much everyone.
  • 0:39 - 0:42
    So let’s look at a particular problem:
  • 0:42 - 0:44
    Imagine that we’re an insurance company
  • 0:44 - 0:45
    and we’re trying to decide
  • 0:45 - 0:46
    for a person who comes into the door
  • 0:46 - 0:49
    whether to give them insurance or not.
  • 0:49 - 0:52
    So the operative aspect of making that decision
  • 0:52 - 0:54
    is how much the policy is going to cost us,
  • 0:54 - 0:56
    that is, how much we’re going to have to pay
  • 0:56 - 0:59
    over the course of a year to insure this person.
  • 0:59 - 1:02
    So there is a variable called Cost.
  • 1:02 - 1:07
    Let’s click on that to see what properties that variable has.
  • 1:07 - 1:09
    And we can see that in this case,
  • 1:09 - 1:12
    we’ve decided to only give two values to the Cost variable,
  • 1:12 - 1:14
    Low and High.
  • 1:14 - 1:17
    This is clearly a very coarse-grained approximation
  • 1:17 - 1:18
    and not one that we will use in practice.
  • 1:18 - 1:20
    In reality we would probably
  • 1:20 - 1:22
    have this be a continuous variable
  • 1:22 - 1:26
    whose mean depends on various aspects of the model.
  • 1:26 - 1:28
    But for the purposes of our illustration,
  • 1:28 - 1:30
    we’re going to use this discrete distribution
  • 1:30 - 1:31
    that only has values Low and High.
  • 1:31 - 1:33
    Okay.
  • 1:33 - 1:37
    So now, let’s build up this network using the technique of
  • 1:37 - 1:40
    “expanding the conversation” that we’ve discussed before.
  • 1:40 - 1:44
    And so what is the most important determining factor
  • 1:44 - 1:47
    in the cost that the insurance company has to pay?
  • 1:47 - 1:51
    Well, probably whether the person has accidents
  • 1:51 - 1:52
    and how severe they are.
  • 1:52 - 1:57
    So here we have a network that has two variables:
  • 1:57 - 2:00
    One is Accident and one is Cost.
  • 2:00 - 2:03
    And in this case we decided to select
  • 2:03 - 2:06
    three possible values for the accident variable,
  • 2:06 - 2:09
    None, Mild, and Severe,
  • 2:09 - 2:14
    and with the probabilities that you see listed.
  • 2:14 - 2:17
    And what you see down below is the Cost variable.
  • 2:17 - 2:19
    And let’s open the CPD
  • 2:19 - 2:25
    of the Cost variable given the Accident variable.
  • 2:25 - 2:27
    And we can see that, in this case,
  • 2:27 - 2:29
    we have a conditional probability table
  • 2:29 - 2:33
    of Cost given Accident.
  • 2:33 - 2:35
    Note that this is actually inverted
  • 2:35 - 2:39
    from the notation that we have used in the class before,
  • 2:39 - 2:42
    because here the conditioning cases are columns,
  • 2:42 - 2:45
    whereas in the examples that we’ve given
  • 2:45 - 2:46
    [they] have been rows.
  • 2:46 - 2:49
    But that’s okay, it’s the same thing, just inverted.
  • 2:50 - 2:51
    And so we see, for example,
  • 2:51 - 2:54
    that if the person has no accidents,
  • 2:54 - 2:57
    the costs are very likely to be very low;
  • 2:57 - 3:02
    mild accidents incur different distributions over cost;
  • 3:02 - 3:03
    and severe accidents have
  • 3:03 - 3:06
    a probability of 0.9 of having high cost
  • 3:06 - 3:08
    and 0.1 of having low cost.
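
SAMIAM is a GUI tool, so there is nothing to type along with here, but the same two-node network can be sketched in code. Below is a minimal, hypothetical version in Python using the pgmpy library: the Severe column (0.1 low, 0.9 high) matches the numbers quoted above, while the Accident prior and the other Cost columns are illustrative assumptions. Note that pgmpy, like SAMIAM, lays a CPD out with one column per conditioning case.

    from pgmpy.factors.discrete import TabularCPD
    from pgmpy.inference import VariableElimination
    from pgmpy.models import BayesianNetwork

    # Two-node network: Accident -> Cost.
    model = BayesianNetwork([("Accident", "Cost")])

    # Prior over Accident (illustrative numbers, not from the video).
    cpd_accident = TabularCPD(
        variable="Accident", variable_card=3,
        values=[[0.80], [0.15], [0.05]],
        state_names={"Accident": ["None", "Mild", "Severe"]},
    )

    # P(Cost | Accident): one column per conditioning case.
    # Only the Severe column (0.1 / 0.9) is stated in the video.
    cpd_cost = TabularCPD(
        variable="Cost", variable_card=2,
        values=[[0.95, 0.60, 0.10],   # P(Cost = Low  | Accident)
                [0.05, 0.40, 0.90]],  # P(Cost = High | Accident)
        evidence=["Accident"], evidence_card=[3],
        state_names={"Cost": ["Low", "High"],
                     "Accident": ["None", "Mild", "Severe"]},
    )

    model.add_cpds(cpd_accident, cpd_cost)
    assert model.check_model()

    # Marginal over Cost before any evidence is observed.
    print(VariableElimination(model).query(["Cost"]))
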
  • 3:09 - 3:12
    So now, let’s continue extending the conversation
  • 3:12 - 3:14
    and ask what Accident depends on.
  • 3:14 - 3:17
    And it seems that one of the obvious factors
  • 3:17 - 3:20
    is whether the person is a good driver or not.
  • 3:20 - 3:23
    And so we would expect driver quality
  • 3:23 - 3:24
    to be a parent of Accident.
  • 3:24 - 3:25
    But there are other things
  • 3:25 - 3:28
    that also affect not just the presence of an accident,
  • 3:28 - 3:30
    but also the severity of the accident.
  • 3:30 - 3:34
    So for example, vehicle size would affect
  • 3:34 - 3:37
    both the severity of an accident
  • 3:37 - 3:41
    because if you are driving a large SUV,
  • 3:41 - 3:44
    chances are any accident you are in will be less severe,
  • 3:44 - 3:46
    but it might also perhaps increase
  • 3:46 - 3:47
    the chance of having an accident overall
  • 3:47 - 3:52
    because maybe driving a large car is harder to handle.
  • 3:53 - 3:56
    And then vehicle year might affect the chances of an accident
  • 3:56 - 4:00
    because of the presence or absence of certain safety features
  • 4:00 - 4:02
    like anti-lock brakes and airbags.
  • 4:02 - 4:04
    So let’s open the CPD of Accident
  • 4:04 - 4:05
    and see what that looks like
  • 4:05 - 4:07
    now that we have all these parents for it.
  • 4:07 - 4:10
    And we can see here that we have these,
  • 4:10 - 4:13
    in this case, eight conditioning cases,
  • 4:13 - 4:18
    correspond[ing] to three variables, two values each.
  • 4:18 - 4:23
    And so here, let’s just look at one of these
  • 4:23 - 4:26
    conditional distributions, just as an example.
  • 4:26 - 4:31
    So, if this is a fairly new vehicle—after 2000—
  • 4:31 - 4:32
    and it’s an SUV,
  • 4:32 - 4:36
    the probability of having a severe accident is quite low,
  • 4:36 - 4:39
    and the probability of having a mild accident is moderate
  • 4:39 - 4:45
    and the probability of having no accidents is 0.85,
  • 4:45 - 4:49
    whereas if you compare that to the corresponding entry
  • 4:49 - 4:52
    when we keep everything fixed except that now it’s a compact car,
  • 4:52 - 5:01
    we see that the probability of having a mild accident is lower,
  • 5:01 - 5:03
    but the probability of having no accidents is higher,
  • 5:03 - 5:08
    representing different driving patterns, for example.
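
In code, those eight conditioning cases fall out of the arithmetic directly: three binary parents give 2 x 2 x 2 = 8 columns, and each column must sum to one. A hedged sketch with pgmpy, where all of the numbers are made up and only the qualitative pattern described above is preserved:

    # Accident with three binary parents: 2*2*2 = 8 conditioning columns.
    # All probabilities below are illustrative, not taken from the video.
    cpd_accident = TabularCPD(
        variable="Accident", variable_card=3,
        values=[
            # Columns enumerate (DriverQuality, VehicleSize, VehicleYear)
            # states, with the last parent varying fastest.
            [0.90, 0.85, 0.92, 0.88, 0.75, 0.70, 0.80, 0.76],  # None
            [0.08, 0.12, 0.06, 0.09, 0.18, 0.20, 0.14, 0.17],  # Mild
            [0.02, 0.03, 0.02, 0.03, 0.07, 0.10, 0.06, 0.07],  # Severe
        ],
        evidence=["DriverQuality", "VehicleSize", "VehicleYear"],
        evidence_card=[2, 2, 2],
        state_names={"Accident": ["None", "Mild", "Severe"],
                     "DriverQuality": ["Good", "Bad"],
                     "VehicleSize": ["Compact", "SUV"],
                     "VehicleYear": ["Pre2000", "Post2000"]},
    )
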
  • 5:09 - 5:12
    Okay, so with this network,
  • 5:12 - 5:14
    we can now start asking simple questions.
  • 5:15 - 5:17
    So to do an example of causal inference,
  • 5:17 - 5:21
    let’s instantiate, for example, driving quality to be good.
  • 5:22 - 5:24
    And bad.
  • 5:24 - 5:27
    And we can see that for a bad driver
  • 5:27 - 5:31
    the probability of low cost is 81%.
  • 5:31 - 5:36
    And for a good driver the probability of low cost is 87%.
  • 5:36 - 5:38
    If we look at the accidents
  • 5:38 - 5:41
    we can see that for a good driver
  • 5:41 - 5:45
    there is a probability of 87.5 percent of no accidents
  • 5:45 - 5:46
    and ten percent of a mild accident.
  • 5:46 - 5:51
    And the probability of no accident goes down for a bad driver,
  • 5:51 - 5:53
    and mild accident goes up
  • 5:53 - 5:55
    and severe accidents also go way up.
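
Queries like these are one-liners once a model is assembled. A sketch, assuming a pgmpy model containing these nodes and state names has been built along the lines of the CPDs above (the printed numbers will reflect those assumed CPDs, not the 81% and 87% figures quoted here):

    infer = VariableElimination(model)

    # Causal (top-down) reasoning: instantiate driver quality,
    # then read off the downstream marginals.
    print(infer.query(["Cost"], evidence={"DriverQuality": "Bad"}))
    print(infer.query(["Cost"], evidence={"DriverQuality": "Good"}))
    print(infer.query(["Accident"], evidence={"DriverQuality": "Good"}))
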
  • 5:55 - 5:59
    Now note that many of these differences are quite subtle.
  • 5:59 - 6:02
    There’s a difference of a couple percent one way or the other.
  • 6:02 - 6:04
    And you might think,
  • 6:04 - 6:05
    if you were designing a network,
  • 6:05 - 6:09
    that you’d like these really extreme probability changes
  • 6:09 - 6:11
    when you instantiate values.
  • 6:11 - 6:14
    But in many cases that’s not actually true,
  • 6:14 - 6:15
    and these subtle differences
  • 6:15 - 6:18
    are actually quite significant for an insurance company
  • 6:18 - 6:20
    that insures hundreds of thousands of people—
  • 6:20 - 6:22
    a couple of percentage points in the probability of an accident
  • 6:22 - 6:25
    can make a very big difference to one’s profitability.
  • 6:26 - 6:27
    So now let’s think about
  • 6:27 - 6:30
    how we would expand this network even further.
  • 6:30 - 6:33
    Vehicle size and vehicle year are things
  • 6:33 - 6:36
    that we’re likely to observe on the insurance form.
  • 6:36 - 6:39
    But driver quality is something that’s very difficult to observe.
  • 6:39 - 6:42
    You can’t go ask somebody, “Oh, are you a good driver?”
  • 6:42 - 6:43
    Because everyone’s going to say,
  • 6:43 - 6:45
    “Sure, I’m the best driver ever!”
  • 6:45 - 6:49
    And so that’s not going to be a very useful question.
  • 6:49 - 6:53
    So what evidence do we have that we can observe
  • 6:53 - 6:57
    that might indicate to us the value of the driver quality?
  • 6:57 - 7:01
    One obvious one is the person’s driving record.
  • 7:01 - 7:04
    That is, whether they’ve had previous accidents
  • 7:04 - 7:05
    or previous moving violations.
  • 7:06 - 7:08
    So let’s think about adding a variable
  • 7:08 - 7:10
    that represents driving history.
  • 7:11 - 7:14
    And so let’s go ahead and introduce that variable.
  • 7:14 - 7:16
    So we can click on this button
  • 7:16 - 7:18
    that allows us to create a node.
  • 7:18 - 7:20
    The node is now called variable1
  • 7:20 - 7:21
    so we’d have to give it a name.
  • 7:21 - 7:25
    So for example we’re going to call it DrivingHistory.
  • 7:26 - 7:28
    And that’s its identifier,
  • 7:28 - 7:31
    and we also have the name of the variable,
  • 7:31 - 7:32
    which is usually the same.
  • 7:32 - 7:35
    And let’s make that two values,
  • 7:35 - 7:38
    say PreviousAccident and NoPreviousAccident.
  • 7:42 - 7:46
    Now where will we place this variable in the network?
  • 7:46 - 7:49
    One might initially think that the right thing to do
  • 7:49 - 7:53
    is to place DrivingHistory as a parent of Driver_quality
  • 7:53 - 7:57
    because driving history can influence
  • 7:57 - 7:59
    our beliefs about driver quality.
  • 7:59 - 8:01
    Now it’s true that observing driving history
  • 8:01 - 8:04
    changes our probability distribution over driver quality,
  • 8:04 - 8:07
    but if you think about the actual causal structure of this scenario,
  • 8:07 - 8:12
    what we actually have is that driver quality is a causal factor
  • 8:12 - 8:14
    of both a previous accident
  • 8:14 - 8:17
    as well as a subsequent accident.
  • 8:17 - 8:18
    And so if we want to maintain
  • 8:18 - 8:20
    the intuitive causal structure of the domain,
  • 8:20 - 8:28
    a more appropriate thing is to add DrivingHistory as a child
  • 8:28 - 8:30
    rather than a parent of Driver_quality.
  • 8:30 - 8:32
    [You] might question why it matters
  • 8:32 - 8:34
    and in this very simple example
  • 8:34 - 8:37
    the two models are in some sense equivalent
  • 8:37 - 8:39
    and we could have placed it either way
  • 8:39 - 8:44
    except that the CPD for driver quality given driving history
  • 8:44 - 8:46
    might be a little bit less intuitive.
  • 8:46 - 8:50
    But if we had other indicators of driver quality,
  • 8:50 - 8:52
    for example a previous moving violation,
  • 8:52 - 8:56
    then it actually makes a lot more sense
  • 8:56 - 8:59
    to have all of these be children of driver quality
  • 8:59 - 9:01
    as opposed to parents of driver quality.
  • 9:02 - 9:03
    Okay.
  • 9:03 - 9:07
    So that shows us how we would add a variable into the network.
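
The same edit can be expressed in code: add the node, add the edge in the causal direction with driver quality as the parent, and give the new child its own CPD. A hypothetical sketch with assumed numbers; only the direction of the edge reflects the modeling point made above.

    # DrivingHistory is added as a *child* of DriverQuality,
    # preserving the causal direction discussed above.
    model.add_node("DrivingHistory")
    model.add_edge("DriverQuality", "DrivingHistory")

    cpd_history = TabularCPD(
        variable="DrivingHistory", variable_card=2,
        values=[[0.10, 0.40],   # P(PreviousAccident   | DriverQuality)
                [0.90, 0.60]],  # P(NoPreviousAccident | DriverQuality)
        evidence=["DriverQuality"], evidence_card=[2],
        state_names={"DrivingHistory": ["PreviousAccident",
                                        "NoPreviousAccident"],
                     "DriverQuality": ["Good", "Bad"]},
    )
    model.add_cpds(cpd_history)
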
  • 9:07 - 9:10
    And now let’s go and open up a much larger network
  • 9:10 - 9:13
    that includes these variables as well as others.
  • 9:13 - 9:16
    So let’s look now at this larger network.
  • 9:16 - 9:17
    And we can see
  • 9:17 - 9:20
    that we’ve added several different variables to the network.
  • 9:20 - 9:23
    We’ve added attributes of the vehicle,
  • 9:23 - 9:27
    for example whether the vehicle had antilock brakes and an airbag,
  • 9:27 - 9:29
    which is going to allow us to give
  • 9:29 - 9:31
    more informative probabilities regarding the accident.
  • 9:31 - 9:35
    We’ve also introduced aspects of the driver,
  • 9:35 - 9:38
    for example, whether they’ve had extra-track training,
  • 9:38 - 9:40
    which is going to increase driving quality,
  • 9:40 - 9:42
    whether they’re young or old,
  • 9:42 - 9:43
    where the presumption is
  • 9:43 - 9:46
    that younger people tend to be more reckless drivers,
  • 9:46 - 9:50
    and whether the driver is focused or more easily distracted,
  • 9:50 - 9:53
    which again is going to affect driving quality.
  • 9:54 - 9:59
    Now since personality type is hard to observe,
  • 9:59 - 10:03
    we added another variable which is Good_student
  • 10:03 - 10:06
    which might indicate one’s personality type.
  • 10:06 - 10:09
    So let’s open [the] CPD for that one.
  • 10:11 - 10:14
    And so we can see here that, for example,
  • 10:14 - 10:21
    if you are a focused person who is young,
  • 10:21 - 10:24
    you’re much more likely to be a good student,
  • 10:24 - 10:28
    much more so than if you are not a focused person who is young.
  • 10:28 - 10:32
    If you’re old, you’re just not very likely to be a student,
  • 10:32 - 10:38
    and so this probability basically says
  • 10:38 - 10:40
    that if you’re old,
  • 10:40 - 10:41
    you’re not likely to be a good student.
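
That qualitative pattern is straightforward to encode. A sketch of such a CPD with assumed numbers, where the columns enumerate the (Age, Focused) combinations:

    # Hypothetical CPD for Good_student given Age and Focused.
    # If old, P(Good_student = True) is near zero regardless of focus,
    # since an old driver is unlikely to be a student at all.
    cpd_good_student = TabularCPD(
        variable="Good_student", variable_card=2,
        values=[[0.80, 0.20, 0.01, 0.01],   # True
                [0.20, 0.80, 0.99, 0.99]],  # False
        evidence=["Age", "Focused"], evidence_card=[2, 2],
        state_names={"Good_student": ["True", "False"],
                     "Age": ["Young", "Old"],
                     "Focused": ["Yes", "No"]},
    )
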
  • 10:42 - 10:48
    So, now that we’ve added all these variables to the network,
  • 10:48 - 10:51
    let’s go ahead and run a few queries to see what happens.
  • 10:51 - 10:57
    And let’s start by looking at the prior probability of Accident
  • 10:57 - 11:00
    before we observe anything.
  • 11:00 - 11:04
    So we can see that the probability of no accident is about 79.5%.
  • 11:04 - 11:07
    The probability of severe accident is about 3%.
  • 11:07 - 11:10
    Now let’s go ahead and tell the system
  • 11:10 - 11:12
    that we have a good student at hand.
  • 11:12 - 11:14
    So we’re going to observe
  • 11:14 - 11:16
    that the student is a good student,
  • 11:16 - 11:18
    and let’s see what happens.
  • 11:18 - 11:20
    We can see, surprisingly,
  • 11:20 - 11:21
    that even though we observe
  • 11:21 - 11:22
    that somebody is a good student,
  • 11:22 - 11:24
    the probability of no accidents
  • 11:24 - 11:28
    went down from 79.5% to 78%,
  • 11:28 - 11:30
    and the probability of severe accidents
  • 11:30 - 11:33
    went up from 3.5 to 3.67 percent.
  • 11:33 - 11:34
    You might say,
  • 11:34 - 11:36
    “Well, but I told you that it’s a good student.
  • 11:36 - 11:38
    Shouldn’t the probability of accidents go down?”
  • 11:38 - 11:42
    So let’s look at some active trails in this graph.
  • 11:42 - 11:46
    One active trail goes from Good_student to Focused,
  • 11:46 - 11:49
    to Driver_quality,
  • 11:49 - 11:50
    to Accident.
  • 11:50 - 11:54
    And sure enough, if we consider that trail in isolation,
  • 11:54 - 11:58
    it’s probably going to make the probability of no accident higher.
  • 11:58 - 12:00
    But, we have another active trail.
  • 12:00 - 12:04
    We have the active trail that goes from good student up to age,
  • 12:04 - 12:07
    and then back down to driver quality.
  • 12:07 - 12:10
    So, to see that, let’s unclick on good student
  • 12:10 - 12:11
    and see what happens.
  • 12:11 - 12:16
    Note that initially the probability that the driver is young was 25%,
  • 12:16 - 12:18
    but when I observed a good student,
  • 12:18 - 12:21
    it went up to close to 95%.
  • 12:21 - 12:23
    And that was enough to counteract the influence
  • 12:23 - 12:27
    along this more obvious active trail.
  • 12:28 - 12:32
    So, to demonstrate that this is indeed what’s going on,
  • 12:32 - 12:36
    let’s go ahead
  • 12:36 - 12:38
    and instantiate the fact that the student is young,
  • 12:38 - 12:43
    and we can see that the probability of severe accident went up to 3.7%
  • 12:43 - 12:48
    and no accident went down to a little bit shy of 77%.
  • 12:48 - 12:52
    And now let’s observe good student and see what happens.
  • 12:52 - 12:53
    So now we observed good student,
  • 12:53 - 13:02
    and the probability of no accidents went up to 78%,
  • 13:02 - 13:07
    as opposed to before when it was 77%.
  • 13:07 - 13:11
    And the reason for that
  • 13:11 - 13:13
    is that we’ve now blocked this trail
  • 13:13 - 13:16
    that goes from good student, through age, to driver quality
  • 13:16 - 13:18
    by observing the age variable, which blocks the trail.
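
This experiment can be reproduced programmatically. A sketch, assuming the larger network has been built in pgmpy with the node and state names used here: the first query leaves both trails active, while adding Age as evidence blocks the trail through age so that Good_student influences Accident only through Focused.

    infer = VariableElimination(model)

    # Two active trails from Good_student to Accident:
    #   Good_student <- Focused -> DriverQuality -> Accident
    #   Good_student <- Age     -> DriverQuality -> Accident
    print(infer.query(["Accident"], evidence={"Good_student": "True"}))

    # Observing Age blocks the second trail, so Good_student now
    # influences Accident only through Focused.
    print(infer.query(["Accident"],
                      evidence={"Good_student": "True", "Age": "Young"}))

    # d-separation view: nodes still reachable from Good_student
    # along active trails once Age is observed.
    print(model.active_trail_nodes("Good_student", observed=["Age"]))
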
  • 13:18 - 13:21
    So we can see that the reasoning patterns
  • 13:21 - 13:25
    in a Bayesian network are sometimes subtle.
  • 13:25 - 13:29
    And there are different trails that can affect things
  • 13:29 - 13:32
    and interact with each other in different ways.
  • 13:32 - 13:35
    And so it’s useful to take the model
  • 13:35 - 13:36
    and play around with different queries
  • 13:36 - 13:38
    and different combinations of evidence
  • 13:38 - 13:40
    to understand the behavior of a network.
  • 13:40 - 13:41
    And especially if you’re designing
  • 13:41 - 13:44
    such a network for a particular application,
  • 13:44 - 13:46
    it’s useful to try out these different queries
  • 13:46 - 13:48
    and see if the behavior that you get
  • 13:48 - 13:50
    is the behavior that you want to get.
  • 13:50 - 13:52
    And if not, then you need to think about
  • 13:52 - 13:56
    how to modify this network to get behavior
  • 13:56 - 14:00
    that’s closer to the desired behavior.
  • 14:00 - 14:04
    This network is available for you to play with
  • 14:04 - 14:06
    and you can try out different things
  • 14:06 - 14:09
    and see what behaviors you get.