Local structure that doesn't require full table representations is important in both directed and undirected models. How do we incorporate local structure into undirected models? The framework for that is called "log-linear models," for reasons that will become clear in just a moment.

In the original representation of the unnormalized density, we defined P̃ as the product of factors φ_i(D_i), each of which is potentially a full table. Now we're going to shift that representation to something that uses a linear form that is subsequently exponentiated; that's why it's called log-linear, because the logarithm is a linear function. So what is this form? It's a linear function that has these things that are called "coefficients" and these things that are called "features." Features, like factors, each have a scope, which is the set of variables on which the feature depends. But different features can have the same scope: you can have multiple features, all of which are over the same set of variables. Notice that each feature f_j has just a single parameter w_j that multiplies it.

So what does this give rise to? If we have a log-linear model, we can push the exponent in through the summation, and that gives us a product of exponential functions. You can think of each of these as effectively a little factor, but it's a factor that has only a single parameter w_j. Since this is a little bit abstract, let's look at an example. Specifically, let's look at how we might represent a simple table factor as a log-linear model.
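For reference, here is the general form just described, reconstructed from the narration (the minus sign in the exponent matches the w_kℓ = −log a_kℓ convention used in the example that follows):

```latex
\tilde{P}(X_1,\dots,X_n) \;=\; \prod_i \phi_i(D_i)
\qquad\text{(table-factor form)}
```

```latex
\tilde{P}(X_1,\dots,X_n)
\;=\; \exp\Bigl(-\sum_j w_j\, f_j(D_j)\Bigr)
\;=\; \prod_j \exp\bigl(-w_j\, f_j(D_j)\bigr)
\qquad\text{(log-linear form)}
```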
So here's a factor φ over two binary random variables X1 and X2. A full table factor would have four parameters: a00, a01, a10, and a11. We can capture this factor using a log-linear model with a set of features that are indicator functions. So this is an indicator function: it takes the value one if X1 is zero and X2 is zero, and it takes the value zero otherwise. That's the general notion of an indicator function: it looks at the event, or constraint, inside the curly braces, and it returns a value of zero or one depending on whether that event is true or not.

And so, if we want to represent this factor as a log-linear model, we can simply sum over all four combinations of k and ℓ, each of which is either 0 or 1, so we're summing over all four entries here, and we have a parameter, or coefficient, w_kℓ that multiplies the corresponding feature. So we would have a summation of w_kℓ terms: w00 contributes only in the case where X1 is zero and X2 is zero. So we would have exp of negative w00 when X1 = 0 and X2 = 0, exp of negative w01 when X1 = 0 and X2 = 1, and so on and so forth. And it's not difficult to convince ourselves that if we define w_kℓ to be the negative log of the corresponding entry a_kℓ in the table, then that gives us back exactly the factor that we defined to begin with.

So this shows that this is a general representation, in the sense that we can take any factor and represent it as a log-linear model simply by including all of the appropriate features. But we don't generally want to do that; generally we want a much finer-grained set of features. So let's look at some of the examples of features that people use in practice.
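Before moving on to those richer features, here is a minimal sketch of the construction just described; the table entries are illustrative, and the check confirms that setting w_kℓ = −log a_kℓ recovers the original factor:

```python
import math

# A full table factor phi(X1, X2) over two binary variables (illustrative entries a_kl).
phi = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 4.0}

# One weight per indicator feature f_kl = 1{X1 = k, X2 = l}, with w_kl = -log a_kl.
w = {kl: -math.log(a) for kl, a in phi.items()}

def phi_loglinear(x1, x2):
    # exp(-sum_kl w_kl * f_kl(x1, x2)): exactly one indicator fires per assignment.
    return math.exp(-sum(w_kl * ((x1, x2) == kl) for kl, w_kl in w.items()))

for x1 in (0, 1):
    for x2 in (0, 1):
        assert abs(phi_loglinear(x1, x2) - phi[(x1, x2)]) < 1e-9
```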
So here are the features used in a language model; this is a language model that we discussed previously. First of all, let's remind ourselves that we have two sets of variables. We have the variables Y, which represent the annotations for each word in the sequence, corresponding to what category that word belongs to. So this is a person; this is the beginning of a person name; this is the continuation of a person name; the beginning of a location; the continuation of a location; and so on. As well as a bunch of words that are none of person, location, or organization, and they're all labeled "other." And so the value of Y tells us, for each word, what category it belongs to, so that we're trying to identify people, locations, and organizations in the sentence. We have another set of variables X, which are the actual words in the sentence.

Now, we could use a full table representation that relates each Y to every possible word in the English language with a full factor, but that would be very expensive and would require a very large number of parameters. And so instead we're going to define a feature that looks, for example, at a particular Y_i, which is the label for the i'th word in the sentence, and X_i, the i'th word itself. That feature might be the indicator function for "Y_i = person and X_i is capitalized." So that feature doesn't look at the identity of the individual word; it just looks at whether the word is capitalized.
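A sketch of such a feature (the label name and the capitalization test are illustrative, not taken from the slides):

```python
def f_person_capitalized(y_i, x_i):
    # Indicator feature: 1 if the i'th label is "person" and the i'th word is capitalized.
    return 1.0 if y_i == "person" and x_i[:1].isupper() else 0.0
```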
Now we have just a single parameter that looks only at capitalization, and it parameterizes how important capitalization is for recognizing that something is a person.

We could also have another feature. This is a different feature that could be part of the same model, which says: Y_i is equal to location (or actually, I was a little bit imprecise here: this might be the beginning of a person name, and this might be the beginning of a location) and X_i appears in some atlas. Now, there are things other than locations that appear in an atlas, but if a word appears in the atlas, there is presumably a much higher probability that it's actually a location, and so we might have, again, a weight for this feature that increases the probability of Y_i being labeled in this way. And so you can imagine constructing a very rich set of features, all of which look at certain aspects of the word, rather than enumerating all the possible words and giving a parameter to each and every one of them.

Let's look at some other examples of feature-based models. This is an example from statistical physics; it's called the Ising model. The Ising model is something that looks at pairs of variables; it's a pairwise Markov network. It looks at pairs of adjacent variables and basically gives us a coefficient for their product. So this is a case where the variables are binary, but not in the space {0, 1}; rather, they take the values negative one and positive one. And so now we have a model that's parameterized with features that are just the products of the values of adjacent variables. Where might this come up? It comes up, for example, in the context of modeling the spins of electrons in a grid.
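A minimal sketch of that parameterization (the grid and coupling values are illustrative; with the exp(−·) convention used above, a negative w_ij favors adjacent spins taking the same value):

```python
import math
import itertools

def ising_unnormalized(spins, w):
    # Unnormalized measure exp(-sum_{(i,j)} w_ij * x_i * x_j) for spins in {-1, +1}.
    # `spins` maps node -> +1 or -1; `w` maps an edge (i, j) -> its coupling weight.
    energy = sum(w_ij * spins[i] * spins[j] for (i, j), w_ij in w.items())
    return math.exp(-energy)

# A tiny 2x2 grid with illustrative couplings.
w = {((0, 0), (0, 1)): -1.0, ((0, 0), (1, 0)): -1.0,
     ((0, 1), (1, 1)): -1.0, ((1, 0), (1, 1)): -1.0}
nodes = sorted({n for edge in w for n in edge})

# Partition function by brute force over the 2^4 spin configurations.
Z = sum(ising_unnormalized(dict(zip(nodes, s)), w)
        for s in itertools.product((-1, +1), repeat=len(nodes)))
```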
So here you have a case where the electrons can spin in one direction or in the other: here are a bunch of atoms marked with blue arrows, which spin around one axis, and the red arrows are spinning in the opposite direction. And this model defines a probability distribution over the joint set of spins, and it depends on whether adjacent atoms have the same spin or opposite spins. Notice that one times one is the same as negative one times negative one, so this feature really just looks at whether the two spins are the same or different, and there is a parameter that weighs same versus different; that's what this feature represents.

Depending on the value of this parameter, if it goes one way, we're going to favor configurations where adjacent atoms spin in the same direction; and if it goes the other way, we're going to favor configurations where adjacent atoms spin in different directions. Those are called ferromagnetic and antiferromagnetic, respectively.

Furthermore, you can define in these systems the notion of a temperature. The temperature here says how strong this connection is. Notice that as T grows (as the temperature grows), the w_ij's get divided by T, and they all go towards zero, which means that the strength of the connection between adjacent atoms effectively becomes almost moot, and they become almost decoupled from each other. On the other hand, as the temperature decreases, the effect of the interaction between the atoms becomes much more significant, and they impose much stronger constraints on each other. And this is actually a model of a real physical system: real temperature, real atoms, and so on.
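Building on the sketch above, the temperature scaling just described might look like this (again illustrative):

```python
def ising_at_temperature(spins, w, T):
    # Same model with every coupling divided by the temperature T:
    # large T weakens the coupling between adjacent atoms, small T strengthens it.
    return ising_unnormalized(spins, {edge: w_ij / T for edge, w_ij in w.items()})
```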
And sure enough, if you look at what happens to these models as a function of temperature: what we see over here is high temperature, and you can see that there is a lot of mixing between the two types of spin; and this is low temperature, and you can see that there are much stronger constraints in this configuration on the spins of adjacent atoms.

Another kind of feature that's used a great deal in practical applications is the notion of a metric feature in an MRF. So what's a metric feature? This is something that comes up mostly in cases where you have a bunch of random variables X_i that all take values in some joint label space V. For example, they might all be binary, or they might all take the values one, two, three, four. And what we'd like is that when X_i and X_j are connected to each other by an edge, we want X_i and X_j to take "similar" values. In order to enforce the fact that X_i and X_j should take similar values, we need a notion of similarity, and we're going to encode that using a distance function µ that takes two values, one for X_i and one for X_j, and says how close they are to each other.

So what does the distance function need to be? Well, it needs to satisfy the standard conditions on a distance function, or metric. (First of all, it needs to be a non-negative function.) The first condition is reflexivity, which means that if the two variables take on the same value, then the distance must be zero. Symmetry means that the distances are symmetric: the distance between two values v1 and v2 is the same as the distance between v2 and v1.
And finally there is the triangle inequality, which says that the distance between v1 and v2 is at most the distance between v1 and v3 plus the distance from v3 to v2; the standard triangle inequality. If a distance satisfies just the first two conditions, it's called a semi-metric; if it satisfies all three, it's called a metric. And both are actually used in practical applications.

But how do we take this distance and put it in the context of an MRF? We have a feature that looks at two variables, X_i and X_j, and that feature is the distance between X_i and X_j. And now we put it together by multiplying it with a coefficient w_ij, where w_ij has to be greater than zero. We want the metric MRF to have the effect that the lower the distance, the higher the value of this factor (because of the negative coefficient in the exponent), and therefore the higher the probability. So the more pairs you have that are close to each other, and the closer they are to each other, the higher the probability of the overall configuration, which is exactly what we wanted to have happen. Conversely, if you have values that are far from each other under the distance metric, the lower the probability in the model.

So here are some examples of metric MRFs. The simplest possible metric MRF is one that gives a distance of zero when the two classes are equal to each other and a distance of one everywhere else, just like a step function. And this gives rise to a potential that has 0's on the diagonal.
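Here is a small sketch of both pieces: a check of the distance conditions above over a finite label space, and the pairwise factor exp(−w_ij · µ(x_i, x_j)) with the zero/one distance just described (the label space and weight are illustrative):

```python
import itertools
import math

def classify_distance(labels, mu):
    # Classify mu over a finite label space: 'metric', 'semi-metric', or 'neither'.
    pairs = list(itertools.product(labels, repeat=2))
    nonneg = all(mu(a, b) >= 0 for a, b in pairs)
    reflexive = all(mu(v, v) == 0 for v in labels)
    symmetric = all(mu(a, b) == mu(b, a) for a, b in pairs)
    triangle = all(mu(a, b) <= mu(a, c) + mu(c, b)
                   for a, b, c in itertools.product(labels, repeat=3))
    if not (nonneg and reflexive and symmetric):
        return "neither"
    return "metric" if triangle else "semi-metric"

def zero_one_mu(v_k, v_l):
    # Simplest distance from the lecture: 0 when the labels agree, 1 otherwise.
    return 0.0 if v_k == v_l else 1.0

def metric_mrf_factor(x_i, x_j, w_ij, mu=zero_one_mu):
    # Pairwise factor exp(-w_ij * mu(x_i, x_j)) with w_ij > 0:
    # closer labels give a larger factor, hence a higher unnormalized probability.
    assert w_ij > 0
    return math.exp(-w_ij * mu(x_i, x_j))

print(classify_distance([1, 2, 3, 4], zero_one_mu))   # -> 'metric'
print(metric_mrf_factor(2, 2, w_ij=1.5), metric_mrf_factor(2, 3, w_ij=1.5))
```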
So we get a bump in the probability when the two adjacent variables take on the same label, and otherwise we get a reduction in the probability, but it doesn't matter what particular values they take. That's one example of a simple metric.

A somewhat more expressive example comes up when the values V are actually numerical values, in which case you can look at the difference between the numerical values, v_k minus v_l. When v_k is equal to v_l, the distance is zero, and then you have a linear function that increases the distance as the gap between v_k and v_l grows; so this is the absolute value of v_k minus v_l.

A more interesting notion that comes up a lot in practice is that we don't want to penalize arbitrarily values that are far away from each other in label space. This is what's called a truncated linear penalty: you can see that beyond a certain threshold the penalty just becomes constant, so it plateaus. There is a penalty, but it doesn't keep increasing as the labels get further from each other.

One example where metric MRFs are used is image segmentation. Here we tend to favor segmentations where adjacent superpixels (these are adjacent superpixels) take the same class. So we have no penalty when the superpixels take the same class, and we have some penalty when they take different classes. And this is actually a very common, albeit simple, model for image segmentation.
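A sketch of the linear and truncated-linear distances just described (the truncation threshold is an illustrative choice):

```python
def linear_mu(v_k, v_l):
    # Linear distance over numerical labels: |v_k - v_l|.
    return abs(v_k - v_l)

def truncated_linear_mu(v_k, v_l, dist_max=3.0):
    # Truncated linear distance: grows like |v_k - v_l| but plateaus at dist_max,
    # so labels far apart in label space are not penalized without bound.
    return min(abs(v_k - v_l), dist_max)
```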
Let's look at a different MRF, also in the context of computer vision. This is an MRF that's used for image denoising. So here we have a noisy version of a real image; you can see white noise overlaid on top of the image, and what we'd like to do is get a cleaned-up version of the image.

So here we have a set of variables X that correspond to the noisy pixels, and we have a set of variables Y that correspond to the cleaned pixels, and we'd like a probabilistic model that relates X and Y. Intuitively, you'd like to have two effects on the pixels Y. First, you'd like Y_i to be close to X_i; but if you do only that, then you just stick with the original image. So the main constraint that we can exploit in order to clean up the image is the fact that adjacent pixels tend to have the same value. So what we're going to do is constrain the Y_i's so that each Y_i is close to its neighbors, and the further away it is, the bigger the penalty. And that's a metric MRF.

Now, we could use just a linear penalty, but that would be a very fragile model, because the right answer obviously isn't the one where all pixels are equal to each other in their actual intensity value; that would just be a single grayish-looking image. What you'd like is to let one pixel depart from its adjacent pixel if it's being pulled in a different direction, either by its own observation or by its other adjacent pixels. And so the right model to use here is actually the truncated linear model, and that is the one that's commonly used and is very successful for image denoising.
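A minimal sketch of the energy (the negative exponent) such a model assigns to a single row of pixels; the lecture doesn't spell out the exact form of the Y_i-to-X_i term, so the absolute-difference term and the weights here are illustrative assumptions:

```python
def denoising_energy(Y, X, w_obs=1.0, w_pair=1.0, dist_max=3.0):
    # Term keeping each cleaned pixel Y_i close to its noisy observation X_i
    # (assumed absolute-difference form), plus a truncated-linear term keeping
    # each Y_i close to its right-hand neighbor. Lower energy = higher probability.
    unary = sum(w_obs * abs(y - x) for y, x in zip(Y, X))
    pairwise = sum(w_pair * min(abs(Y[i] - Y[i + 1]), dist_max)
                   for i in range(len(Y) - 1))
    return unary + pairwise
```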
Interestingly, almost exactly the same idea is used in the context of stereo reconstruction. There, the values that you'd like to infer, the Y_i's, are the depth disparity for a given pixel in the image: how deep it is. And here also we have spatial continuity: we'd like the depth of one pixel to be close to the depth of an adjacent pixel. But once again we don't want to enforce this too strongly, because you do have depth disparities in the image, and so eventually you'd like things to be allowed to break away from each other. So once again, one typically uses some kind of truncated linear model for doing stereo reconstruction, often augmented by other little tricks. For example, here we have the actual pixel appearance, such as the color and texture; and if the color and texture of adjacent pixels are very similar to each other, you might want to impose a stronger similarity constraint, versus if the color and texture of the adjacent pixels are very different from each other, they may be more likely to belong to different objects, and you don't want to enforce quite as strong a similarity constraint.
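A sketch of that last trick; the exponential form, the use of a scalar intensity as the appearance, and the sigma parameter are all illustrative assumptions:

```python
import math

def contrast_sensitive_weight(appearance_i, appearance_j, w_base=1.0, sigma=10.0):
    # Scale the pairwise smoothness weight by appearance similarity: adjacent pixels
    # that look alike get a stronger constraint, while very different-looking pixels
    # (more likely to belong to different objects) get a weaker one.
    return w_base * math.exp(-abs(appearance_i - appearance_j) / sigma)
```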