Local structure that doesn't require full table representations is important in both directed and undirected models. How do we incorporate local structure into undirected models? The framework for that is called "log-linear models," for reasons that will become clear in just a moment.

In the original representation of the unnormalized density, we defined P̃ as the product of factors φ_i(D_i), each of which is potentially a full table. Now we're going to shift that representation to something that uses a linear form that is subsequently exponentiated; that's why it's called log-linear, because the logarithm is a linear function. So what is this form? It's a linear function that has these things that are called "coefficients" and these things that are called "features." Features, like factors, each have a scope, which is the set of variables on which the feature depends. But different features can have the same scope: you can have multiple features, all of which are over the same set of variables. Notice that each feature f_j has just a single parameter w_j that multiplies it.

So what does this give rise to? If we have a log-linear model, we can push the exponent in through the summation, and that gives us a product of exponential functions. You can think of each of these as effectively a little factor, but it's a factor that has only a single parameter w_j. Since this is a little bit abstract, let's look at an example. Specifically, let's look at how we might represent a simple table factor as a log-linear model.
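For reference, here is the general form just described, reconstructed from the narration (the minus sign in the exponent matches the w_kℓ = −log a_kℓ convention used in the example that follows):

```latex
\tilde{P}(X_1,\dots,X_n) \;=\; \prod_i \phi_i(D_i)
\qquad\text{(table-factor form)}
```

```latex
\tilde{P}(X_1,\dots,X_n)
\;=\; \exp\Bigl(-\sum_j w_j\, f_j(D_j)\Bigr)
\;=\; \prod_j \exp\bigl(-w_j\, f_j(D_j)\bigr)
\qquad\text{(log-linear form)}
```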
So here's a factor φ over two binary random variables X1 and X2. A full table factor would have four parameters: a00, a01, a10, and a11. We can capture this factor using a log-linear model with a set of features that are indicator functions. So this is an indicator function: it takes the value one if X1 is zero and X2 is zero, and it takes the value zero otherwise. That's the general notion of an indicator function: it looks at the event, or constraint, inside the curly braces, and it returns a value of zero or one depending on whether that event is true or not.

And so, if we want to represent this factor as a log-linear model, we can simply sum over all four combinations of k and ℓ, each of which is either 0 or 1, so we're summing over all four entries here, and we have a parameter, or coefficient, w_kℓ that multiplies the corresponding feature. So we would have a summation of w_kℓ terms: w00 contributes only in the case where X1 is zero and X2 is zero. So we would have exp of negative w00 when X1 = 0 and X2 = 0, exp of negative w01 when X1 = 0 and X2 = 1, and so on and so forth. And it's not difficult to convince ourselves that if we define w_kℓ to be the negative log of the corresponding entry a_kℓ in the table, then that gives us back exactly the factor that we defined to begin with.

So this shows that this is a general representation, in the sense that we can take any factor and represent it as a log-linear model simply by including all of the appropriate features. But we don't generally want to do that; generally we want a much finer-grained set of features. So let's look at some of the examples of features that people use in practice.
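Before moving on to those richer features, here is a minimal sketch of the construction just described; the table entries are illustrative, and the check confirms that setting w_kℓ = −log a_kℓ recovers the original factor:

```python
import math

# A full table factor phi(X1, X2) over two binary variables (illustrative entries a_kl).
phi = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 4.0}

# One weight per indicator feature f_kl = 1{X1 = k, X2 = l}, with w_kl = -log a_kl.
w = {kl: -math.log(a) for kl, a in phi.items()}

def phi_loglinear(x1, x2):
    # exp(-sum_kl w_kl * f_kl(x1, x2)): exactly one indicator fires per assignment.
    return math.exp(-sum(w_kl * ((x1, x2) == kl) for kl, w_kl in w.items()))

for x1 in (0, 1):
    for x2 in (0, 1):
        assert abs(phi_loglinear(x1, x2) - phi[(x1, x2)]) < 1e-9
```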
So here are the features used in a language model; this is a language model that we discussed previously. First of all, let's remind ourselves that we have two sets of variables. We have the variables Y, which represent the annotations for each word in the sequence, corresponding to what category that word belongs to. So this is a person; this is the beginning of a person name; this is the continuation of a person name; the beginning of a location; the continuation of a location; and so on. As well as a bunch of words that are none of person, location, or organization, and they're all labeled "other." And so the value of Y tells us, for each word, what category it belongs to, so that we're trying to identify people, locations, and organizations in the sentence. We have another set of variables X, which are the actual words in the sentence.

Now, we could use a full table representation that relates each Y to every possible word in the English language with a full factor, but that would be very expensive and would require a very large number of parameters. And so instead we're going to define a feature that looks, for example, at a particular Y_i, which is the label for the i'th word in the sentence, and X_i, the i'th word itself. That feature might be the indicator function for "Y_i = person and X_i is capitalized." So that feature doesn't look at the identity of the individual word; it just looks at whether the word is capitalized.
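A sketch of such a feature (the label name and the capitalization test are illustrative, not taken from the slides):

```python
def f_person_capitalized(y_i, x_i):
    # Indicator feature: 1 if the i'th label is "person" and the i'th word is capitalized.
    return 1.0 if y_i == "person" and x_i[:1].isupper() else 0.0
```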
Now we have just a single parameter that looks only at capitalization, and it parameterizes how important capitalization is for recognizing that something is a person.

We could also have another feature. This is a different feature that could be part of the same model, which says: Y_i is equal to location (or actually, I was a little bit imprecise here: this might be the beginning of a person name, and this might be the beginning of a location) and X_i appears in some atlas. Now, there are things other than locations that appear in an atlas, but if a word appears in the atlas, there is presumably a much higher probability that it's actually a location, and so we might have, again, a weight for this feature that increases the probability of Y_i being labeled in this way. And so you can imagine constructing a very rich set of features, all of which look at certain aspects of the word, rather than enumerating all the possible words and giving a parameter to each and every one of them.

Let's look at some other examples of feature-based models. This is an example from statistical physics; it's called the Ising model. The Ising model is something that looks at pairs of variables; it's a pairwise Markov network. It looks at pairs of adjacent variables and basically gives us a coefficient for their product. So this is a case where the variables are binary, but not in the space {0, 1}; rather, they take the values negative one and positive one. And so now we have a model that's parameterized with features that are just the products of the values of adjacent variables. Where might this come up? It comes up, for example, in the context of modeling the spins of electrons in a grid.
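A minimal sketch of that parameterization (the grid and coupling values are illustrative; with the exp(−·) convention used above, a negative w_ij favors adjacent spins taking the same value):

```python
import math
import itertools

def ising_unnormalized(spins, w):
    # Unnormalized measure exp(-sum_{(i,j)} w_ij * x_i * x_j) for spins in {-1, +1}.
    # `spins` maps node -> +1 or -1; `w` maps an edge (i, j) -> its coupling weight.
    energy = sum(w_ij * spins[i] * spins[j] for (i, j), w_ij in w.items())
    return math.exp(-energy)

# A tiny 2x2 grid with illustrative couplings.
w = {((0, 0), (0, 1)): -1.0, ((0, 0), (1, 0)): -1.0,
     ((0, 1), (1, 1)): -1.0, ((1, 0), (1, 1)): -1.0}
nodes = sorted({n for edge in w for n in edge})

# Partition function by brute force over the 2^4 spin configurations.
Z = sum(ising_unnormalized(dict(zip(nodes, s)), w)
        for s in itertools.product((-1, +1), repeat=len(nodes)))
```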
So here you have a case where the electrons can spin in one direction or in the other: here are a bunch of atoms marked with blue arrows, which spin around one axis, and the red arrows are spinning in the opposite direction. And this model defines a probability distribution over the joint set of spins, and it depends on whether adjacent atoms have the same spin or opposite spins. Notice that one times one is the same as negative one times negative one, so this feature really just looks at whether the two spins are the same or different, and there is a parameter that weighs same versus different; that's what this feature represents.

Depending on the value of this parameter, if it goes one way, we're going to favor configurations where adjacent atoms spin in the same direction; and if it goes the other way, we're going to favor configurations where adjacent atoms spin in different directions. Those are called ferromagnetic and antiferromagnetic, respectively.

Furthermore, you can define in these systems the notion of a temperature. The temperature here says how strong this connection is. Notice that as T grows (as the temperature grows), the w_ij's get divided by T, and they all go towards zero, which means that the strength of the connection between adjacent atoms effectively becomes almost moot, and they become almost decoupled from each other. On the other hand, as the temperature decreases, the effect of the interaction between the atoms becomes much more significant, and they impose much stronger constraints on each other. And this is actually a model of a real physical system: real temperature, real atoms, and so on.
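Building on the sketch above, the temperature scaling just described might look like this (again illustrative):

```python
def ising_at_temperature(spins, w, T):
    # Same model with every coupling divided by the temperature T:
    # large T weakens the coupling between adjacent atoms, small T strengthens it.
    return ising_unnormalized(spins, {edge: w_ij / T for edge, w_ij in w.items()})
```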
And sure enough, if you look at what happens to these models as a function of temperature: what we see over here is high temperature, and you can see that there is a lot of mixing between the two types of spin; and this is low temperature, and you can see that there are much stronger constraints in this configuration on the spins of adjacent atoms.

Another kind of feature that's used a great deal in practical applications is the notion of a metric feature in an MRF. So what's a metric feature? This is something that comes up mostly in cases where you have a bunch of random variables X_i that all take values in some joint label space V. For example, they might all be binary, or they might all take the values one, two, three, four. And what we'd like is that when X_i and X_j are connected to each other by an edge, we want X_i and X_j to take "similar" values. In order to enforce the fact that X_i and X_j should take similar values, we need a notion of similarity, and we're going to encode that using a distance function µ that takes two values, one for X_i and one for X_j, and says how close they are to each other.

So what does the distance function need to be? Well, it needs to satisfy the standard conditions on a distance function, or metric. (First of all, it needs to be a non-negative function.) The first condition is reflexivity, which means that if the two variables take on the same value, then the distance must be zero. Symmetry means that the distances are symmetric: the distance between two values v1 and v2 is the same as the distance between v2 and v1.
And finally there is the triangle inequality, which says that the distance between v1 and v2 is at most the distance between v1 and v3 plus the distance from v3 to v2; the standard triangle inequality. If a distance satisfies just the first two conditions, it's called a semi-metric; if it satisfies all three, it's called a metric. And both are actually used in practical applications.

But how do we take this distance and put it in the context of an MRF? We have a feature that looks at two variables, X_i and X_j, and that feature is the distance between X_i and X_j. And now we put it together by multiplying it with a coefficient w_ij, where w_ij has to be greater than zero. We want the metric MRF to have the effect that the lower the distance, the higher the value of this factor (because of the negative coefficient in the exponent), and therefore the higher the probability. So the more pairs you have that are close to each other, and the closer they are to each other, the higher the probability of the overall configuration, which is exactly what we wanted to have happen. Conversely, if you have values that are far from each other under the distance metric, the lower the probability in the model.

So here are some examples of metric MRFs. The simplest possible metric MRF is one that gives a distance of zero when the two classes are equal to each other and a distance of one everywhere else, just like a step function. And this gives rise to a potential that has 0's on the diagonal.
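Here is a small sketch of both pieces: a check of the distance conditions above over a finite label space, and the pairwise factor exp(−w_ij · µ(x_i, x_j)) with the zero/one distance just described (the label space and weight are illustrative):

```python
import itertools
import math

def classify_distance(labels, mu):
    # Classify mu over a finite label space: 'metric', 'semi-metric', or 'neither'.
    pairs = list(itertools.product(labels, repeat=2))
    nonneg = all(mu(a, b) >= 0 for a, b in pairs)
    reflexive = all(mu(v, v) == 0 for v in labels)
    symmetric = all(mu(a, b) == mu(b, a) for a, b in pairs)
    triangle = all(mu(a, b) <= mu(a, c) + mu(c, b)
                   for a, b, c in itertools.product(labels, repeat=3))
    if not (nonneg and reflexive and symmetric):
        return "neither"
    return "metric" if triangle else "semi-metric"

def zero_one_mu(v_k, v_l):
    # Simplest distance from the lecture: 0 when the labels agree, 1 otherwise.
    return 0.0 if v_k == v_l else 1.0

def metric_mrf_factor(x_i, x_j, w_ij, mu=zero_one_mu):
    # Pairwise factor exp(-w_ij * mu(x_i, x_j)) with w_ij > 0:
    # closer labels give a larger factor, hence a higher unnormalized probability.
    assert w_ij > 0
    return math.exp(-w_ij * mu(x_i, x_j))

print(classify_distance([1, 2, 3, 4], zero_one_mu))   # -> 'metric'
print(metric_mrf_factor(2, 2, w_ij=1.5), metric_mrf_factor(2, 3, w_ij=1.5))
```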
So we get a bump in the probability when the two adjacent variables take on the same label, and otherwise we get a reduction in the probability, but it doesn't matter what particular values they take. That's one example of a simple metric.

A somewhat more expressive example comes up when the values V are actually numerical values, in which case you can look at the difference between the numerical values, v_k minus v_l. When v_k is equal to v_l, the distance is zero, and then you have a linear function that increases the distance as the gap between v_k and v_l grows; so this is the absolute value of v_k minus v_l.

A more interesting notion that comes up a lot in practice is that we don't want to penalize arbitrarily values that are far away from each other in label space. This is what's called a truncated linear penalty: you can see that beyond a certain threshold the penalty just becomes constant, so it plateaus. There is a penalty, but it doesn't keep increasing as the labels get further from each other.

One example where metric MRFs are used is image segmentation. Here we tend to favor segmentations where adjacent superpixels (these are adjacent superpixels) take the same class. So we have no penalty when the superpixels take the same class, and we have some penalty when they take different classes. And this is actually a very common, albeit simple, model for image segmentation.
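A sketch of the linear and truncated-linear distances just described (the truncation threshold is an illustrative choice):

```python
def linear_mu(v_k, v_l):
    # Linear distance over numerical labels: |v_k - v_l|.
    return abs(v_k - v_l)

def truncated_linear_mu(v_k, v_l, dist_max=3.0):
    # Truncated linear distance: grows like |v_k - v_l| but plateaus at dist_max,
    # so labels far apart in label space are not penalized without bound.
    return min(abs(v_k - v_l), dist_max)
```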
Let's look at a different MRF, also in the context of computer vision. This is an MRF that's used for image denoising. So here we have a noisy version of a real image; you can see white noise overlaid on top of the image, and what we'd like to do is get a cleaned-up version of the image.

So here we have a set of variables X that correspond to the noisy pixels, and we have a set of variables Y that correspond to the cleaned pixels, and we'd like a probabilistic model that relates X and Y. Intuitively, you'd like to have two effects on the pixels Y. First, you'd like Y_i to be close to X_i; but if you do only that, then you just stick with the original image. So the main constraint that we can exploit in order to clean up the image is the fact that adjacent pixels tend to have the same value. So what we're going to do is constrain the Y_i's so that each Y_i is close to its neighbors, and the further away it is, the bigger the penalty. And that's a metric MRF.

Now, we could use just a linear penalty, but that would be a very fragile model, because the right answer obviously isn't the one where all pixels are equal to each other in their actual intensity value; that would just be a single grayish-looking image. What you'd like is to let one pixel depart from its adjacent pixel if it's being pulled in a different direction, either by its own observation or by its other adjacent pixels. And so the right model to use here is actually the truncated linear model, and that is the one that's commonly used and is very successful for image denoising.
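A minimal sketch of the energy (the negative exponent) such a model assigns to a single row of pixels; the lecture doesn't spell out the exact form of the Y_i-to-X_i term, so the absolute-difference term and the weights here are illustrative assumptions:

```python
def denoising_energy(Y, X, w_obs=1.0, w_pair=1.0, dist_max=3.0):
    # Term keeping each cleaned pixel Y_i close to its noisy observation X_i
    # (assumed absolute-difference form), plus a truncated-linear term keeping
    # each Y_i close to its right-hand neighbor. Lower energy = higher probability.
    unary = sum(w_obs * abs(y - x) for y, x in zip(Y, X))
    pairwise = sum(w_pair * min(abs(Y[i] - Y[i + 1]), dist_max)
                   for i in range(len(Y) - 1))
    return unary + pairwise
```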
Interestingly, almost exactly the same idea is used in the context of stereo reconstruction. There, the values that you'd like to infer, the Y_i's, are the depth disparity for a given pixel in the image: how deep it is. And here also we have spatial continuity: we'd like the depth of one pixel to be close to the depth of an adjacent pixel. But once again we don't want to enforce this too strongly, because you do have depth disparities in the image, and so eventually you'd like things to be allowed to break away from each other. So once again, one typically uses some kind of truncated linear model for doing stereo reconstruction, often augmented by other little tricks. For example, here we have the actual pixel appearance, such as the color and texture; and if the color and texture of adjacent pixels are very similar to each other, you might want to impose a stronger similarity constraint, versus if the color and texture of the adjacent pixels are very different from each other, they may be more likely to belong to different objects, and you don't want to enforce quite as strong a similarity constraint.
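A sketch of that last trick; the exponential form, the use of a scalar intensity as the appearance, and the sigma parameter are all illustrative assumptions:

```python
import math

def contrast_sensitive_weight(appearance_i, appearance_j, w_base=1.0, sigma=10.0):
    # Scale the pairwise smoothness weight by appearance similarity: adjacent pixels
    # that look alike get a stronger constraint, while very different-looking pixels
    # (more likely to belong to different objects) get a weaker one.
    return w_base * math.exp(-abs(appearance_i - appearance_j) / sigma)
```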