Local structure that doesn't require full table representations is important in both directed and undirected models. How do we incorporate local structure into undirected models? The framework for that is called "log-linear models", for reasons that will be clear in just a moment.

Whereas in the original representation of the unnormalized density we defined P tilde as a product of factors φ_i(D_i), each of which is potentially a full table, we're now going to shift to a representation that uses a linear form which is subsequently exponentiated. That's why it's called log-linear: because the logarithm is a linear function. So what is this form? It's a linear function built from things called "coefficients" and things called "features". Features, like factors, each have a scope, which is the set of variables on which the feature depends. But different features can have the same scope: you can have multiple features, all of which are over the same set of variables. Notice that each feature f_j has just a single parameter w_j that multiplies it.

So what does this give rise to? If we have a log-linear model, we can push the exponent in through the summation, and that gives us a product of exponential functions. You can think of each of these as effectively a little factor, but it's a factor that has only a single parameter w_j.

Since this is a little bit abstract, let's look at an example. Specifically, let's look at how we might represent a simple table factor as a log-linear model. So here's a factor φ over two binary random variables X1 and X2, and a full table factor would have four parameters: a00, a01, a10, and a11. We can capture this factor with a log-linear model using a set of features that are indicator functions. This indicator function takes the value one if X1 is zero and X2 is zero, and takes zero otherwise. That's the general notion of an indicator function: it looks at the event, or constraint, inside the curly braces and returns a value of 0 or 1, depending on whether that event is true or not.
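To make the linear form concrete, here is one way to write it out in LaTeX. This is a sketch using the negative-exponent convention, which matches the choice w_kl = -ln(a_kl) made in the table-factor example that follows; the scopes D_j and weights w_j are as described above:

    \tilde{P}(X_1,\dots,X_n) \;=\; \exp\!\Big(-\sum_{j=1}^{k} w_j\, f_j(D_j)\Big) \;=\; \prod_{j=1}^{k} \exp\!\big(-w_j\, f_j(D_j)\big)

Pushing the exponential through the sum in this way is exactly what turns each weighted feature into its own little single-parameter factor.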
And so, if we want to represent this factor as a log-linear model, we can simply sum over all four combinations of k and l, each of which is either 0 or 1, so we're summing over all four entries here, and we have a parameter, or coefficient, w_kl that multiplies the corresponding feature. So we would have exp of negative w00 only in the case where X1 = 0 and X2 = 0, exp of negative w01 when X1 = 0 and X2 = 1, and so on and so forth. And it's not difficult to convince ourselves that if we define w_kl to be the negative log of the corresponding entry in the table, then we get right back the factor that we defined to begin with. So this shows that the representation is fully general, in the sense that we can take any factor and represent it as a log-linear model simply by including all of the appropriate indicator features.

But we don't generally want to do that. Generally we want a much finer-grained set of features, so let's look at some examples of features that people use in practice. Here are the features used in a language model that we discussed previously. First, let's remind ourselves that we have two sets of variables. We have the variables Y, which represent the annotations for each word in the sequence, corresponding to the category that word belongs to: this is the beginning of a person name, this is the continuation of a person name, this is the beginning of a location, the continuation of a location, and so on, as well as a bunch of words that are none of person, location, or organization, and those are all labeled "other". So the value of Y tells us, for each word, what category it belongs to, because we're trying to identify people, locations, and organizations in the sentence. And we have another set of variables X, which are the actual words in the sentence.
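As an aside, here is a minimal Python sketch (the table entries are made up, not from the lecture) that checks the claim that setting w_kl = -ln(a_kl) recovers the original table factor:

    import math

    # Minimal sketch, not from the lecture: the 2x2 table factor phi(X1, X2) with
    # entries a_kl, rewritten as a log-linear model with indicator features
    # f_kl(X1, X2) = 1{X1 = k, X2 = l} and weights w_kl = -ln(a_kl).
    a = {(0, 0): 10.0, (0, 1): 0.1, (1, 0): 2.0, (1, 1): 5.0}   # made-up entries

    w = {kl: -math.log(val) for kl, val in a.items()}           # w_kl = -ln(a_kl)

    def phi_loglinear(x1, x2):
        # exp(-sum_kl w_kl * 1{x1 = k, x2 = l}); exactly one indicator fires.
        total = sum(w_kl * (1 if (x1, x2) == kl else 0) for kl, w_kl in w.items())
        return math.exp(-total)

    for (k, l), entry in a.items():
        assert abs(phi_loglinear(k, l) - entry) < 1e-9          # recovers the table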
Now, we could go ahead and use a full table representation, one that gives each Y a full factor looking at every possible word in the English language, but that would be very, very expensive and would involve a very large number of parameters. So instead we're going to define features. For example, a feature f that looks at a particular Y_i, the label of the i'th word in the sentence, and X_i, the i'th word itself. That feature might be the indicator function for "Y_i = person and X_i is capitalized". Such a feature doesn't look at the identity of the individual word; it only looks at whether the word is capitalized. Now we have just a single parameter that looks only at capitalization and captures how important capitalization is for recognizing that something is a person. We could also have another feature, a different feature that could be part of the same model, which says: Y_i equals location (actually, I was a little imprecise here: this one might be the beginning of a person name, and this one the beginning of a location) and X_i appears in some atlas. Now, things other than locations appear in an atlas, but if a word appears in an atlas there is presumably a much higher probability that it's actually a location, and so we might again have a weight for this feature that increases the probability of Y_i being labeled this way. So you can imagine constructing a very rich set of features, all of which look at particular aspects of the word, rather than enumerating all the possible words and giving a separate parameter to each and every one of them.
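Here is a hedged Python sketch of indicator features of this kind. The label strings ("B-PER", "B-LOC") and the tiny atlas are illustrative assumptions, not part of the lecture:

    # Hedged sketch of the kind of indicator features described above. The label
    # strings ("B-PER", "B-LOC") and the tiny atlas are illustrative assumptions.
    ATLAS_WORDS = {"paris", "berlin", "cairo"}          # hypothetical gazetteer

    def f_person_capitalized(y_i, x_i):
        # 1{Y_i = beginning-of-person  AND  X_i is capitalized}
        return 1 if y_i == "B-PER" and x_i[:1].isupper() else 0

    def f_location_in_atlas(y_i, x_i):
        # 1{Y_i = beginning-of-location  AND  X_i appears in the atlas}
        return 1 if y_i == "B-LOC" and x_i.lower() in ATLAS_WORDS else 0

    # Each feature gets one scalar weight in the log-linear model, no matter how
    # many words the vocabulary contains; the sign and magnitude would normally
    # be set by learning.
    print(f_person_capitalized("B-PER", "Daphne"))   # 1
    print(f_location_in_atlas("B-LOC", "Paris"))     # 1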
Let's look at some other examples of feature-based models. This is an example from statistical physics called the Ising model. The Ising model is a pairwise Markov network: it looks at pairs of adjacent variables and gives us a coefficient for their product. This is a case where the variables are binary, but they take the values negative one and positive one rather than zero and one. So now we have a model that's parameterized by features that are just the products of the values of adjacent variables.

Where might this come up? It comes up, for example, when modeling the spins of electrons in a grid. Here each electron can spin in one direction or in the other: the atoms marked with a blue arrow spin around one axis, and the ones marked with a red arrow spin in the opposite direction. The model defines a probability distribution over the joint set of spins, and that distribution depends on whether adjacent atoms have the same spin or opposite spins. Notice that one times one is the same as negative one times negative one, so this really just looks at whether the two spins are the same or different, and there is a parameter associated with same versus different; that's what this feature represents. Depending on the value of that parameter, if it goes one way we favor configurations where adjacent atoms spin in the same direction, and if it goes the other way we favor configurations where adjacent atoms spin in opposite directions. Those two cases are called ferromagnetic and antiferromagnetic.

Furthermore, you can define in these systems the notion of a temperature. The temperature here says how strong these connections are. Notice that as T grows, that is, as the temperature grows, the w_ij's get divided by T and all go toward zero, which means that the strength of the connection between adjacent atoms becomes almost moot and the atoms become almost decoupled from each other. On the other hand, as the temperature decreases, the effect of the interaction between the atoms becomes much more significant, and they impose much stronger constraints on each other. And this is actually a model of a real physical system: real temperature, real atoms, and so on. Sure enough, you can see what happens in these models as a function of temperature.
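Here is a minimal Python sketch of this parameterization on a tiny 2x2 grid, with hand-picked couplings that are assumptions rather than lecture values. Sign conventions vary between presentations; in this sketch a positive w_ij favors aligned spins:

    import itertools
    import math

    # Features are the products s_i * s_j of adjacent +/-1 spins, each weighted
    # by w_ij / T, as described above.
    sites = [(0, 0), (0, 1), (1, 0), (1, 1)]
    edges = [((0, 0), (0, 1)), ((0, 0), (1, 0)), ((0, 1), (1, 1)), ((1, 0), (1, 1))]
    w = {e: 1.0 for e in edges}

    def unnormalized_prob(spins, T):
        # spins maps each grid site to +1 or -1
        return math.exp(sum((w[e] / T) * spins[e[0]] * spins[e[1]] for e in edges))

    # As T grows, w_ij / T shrinks toward zero and the spins decouple: the most
    # likely configuration carries less and less of the probability mass.
    for T in (0.5, 5.0):
        scores = [unnormalized_prob(dict(zip(sites, s)), T)
                  for s in itertools.product((-1, +1), repeat=4)]
        print(T, max(scores) / sum(scores))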
Over here is high temperature, and you can see that there's a lot of mixing between the two types of spins. This is low temperature, and you can see that this configuration imposes much stronger constraints on the spins of adjacent atoms.

Another kind of feature that's used a great deal in practical applications is the notion of a metric feature in an MRF. So what's a metric feature? This comes up mostly in cases where you have a bunch of random variables X_i that all take values in some joint label space V. For example, they might all be binary, or they might all take the values one, two, three, four. We have X_i and X_j connected to each other by an edge, and we want X_i and X_j to take "similar" values. In order to enforce the fact that X_i and X_j should take similar values, we need a notion of similarity, and we're going to encode that using a distance function µ that takes two values, one for X_i and one for X_j, and says how close they are to each other.

So what does the distance function need to be? It needs to satisfy the standard conditions on a distance function, or metric. First of all, it needs to be a non-negative function. Next is reflexivity, which means that if the two variables take on the same value, then the distance had better be zero. Symmetry means that the distances are symmetric: the distance between two values v1 and v2 is the same as the distance between v2 and v1. And finally there is the triangle inequality, which says that the distance between v1 and v2 is no more than the distance from v1 to v3 plus the distance from v3 to v2, the standard triangle inequality. If a distance function satisfies only the first two of these conditions, reflexivity and symmetry, it's called a semi-metric; if it satisfies all three, it's called a metric. Both are actually used in practical applications.

But how do we take this distance and put it in the context of an MRF? We have a feature that looks at two variables, X_i and X_j, and that feature is the distance between X_i and X_j.
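As a small Python sketch (the label space and example distances are illustrative), here is a checker for the conditions just listed, classifying a candidate distance over a finite label space V:

    import itertools

    def classify(V, mu):
        nonneg    = all(mu(a, b) >= 0 for a, b in itertools.product(V, V))
        reflexive = all(mu(a, a) == 0 for a in V)
        symmetric = all(mu(a, b) == mu(b, a) for a, b in itertools.product(V, V))
        triangle  = all(mu(a, b) <= mu(a, c) + mu(c, b)
                        for a, b, c in itertools.product(V, V, V))
        if nonneg and reflexive and symmetric:
            return "metric" if triangle else "semi-metric"
        return "neither"

    V = [0, 1, 2, 3]
    print(classify(V, lambda a, b: 0 if a == b else 1))      # metric (0/1 step)
    print(classify(V, lambda a, b: abs(a - b)))              # metric (linear)
    print(classify(V, lambda a, b: min(abs(a - b), 2)))      # metric (truncated linear)
    print(classify(V, lambda a, b: (a - b) ** 2))            # semi-metric: triangle fails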
And now we put it together by multiplying that feature by a coefficient w_ij, where w_ij has to be greater than zero. The effect we want in a metric MRF is that the lower the distance, the higher this term becomes, because of the negative coefficient in the exponent, and therefore the higher the probability. So the more pairs you have that are close to each other, and the closer they are to each other, the higher the probability of the overall configuration, which is exactly what we wanted to have happen. Conversely, if you have values that are far from each other under the distance metric, the lower the probability in the model.

So here are some examples of metric MRFs. The simplest possible metric is one that gives a distance of zero when the two labels are equal to each other and a distance of one everywhere else; it's just a step function. This gives rise to a potential with zeros on the diagonal, so we get a bump in the probability when two adjacent variables take on the same label and otherwise a reduction in the probability, but it doesn't matter which particular different values they take. That's one example of a simple metric. A somewhat more expressive example comes up when the values in V are actually numerical, in which case you can look at the difference between the numerical values, v_k minus v_l. When v_k is equal to v_l the distance is zero, and then you have a linear function that increases as the gap between v_k and v_l grows; this is the absolute value of v_k minus v_l. A more interesting notion that comes up a lot in practice is that we don't want to penalize arbitrarily things that are far away from each other in label space. This is what's called a truncated linear penalty: you can see that beyond a certain threshold the penalty just becomes constant, so it plateaus. There is a penalty, but it doesn't keep increasing as the labels get further from each other.
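Here is a Python sketch of how these three distances turn into pairwise potentials through exp(-w_ij * mu(v_k, v_l)); the label space, weight, and truncation threshold are illustrative assumptions:

    import math

    V = [0, 1, 2, 3]
    w_ij = 1.0

    mu_step      = lambda k, l: 0.0 if k == l else 1.0   # 0/1 metric
    mu_linear    = lambda k, l: abs(k - l)               # |v_k - v_l|
    mu_truncated = lambda k, l: min(abs(k - l), 2.0)     # plateaus beyond 2

    def potential_table(mu):
        # phi(v_k, v_l) = exp(-w_ij * mu(v_k, v_l)); largest entries on the diagonal
        return [[round(math.exp(-w_ij * mu(k, l)), 3) for l in V] for k in V]

    for name, mu in [("step", mu_step), ("linear", mu_linear),
                     ("truncated", mu_truncated)]:
        print(name, potential_table(mu))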
One example where metric MRFs are used is image segmentation. Here we tend to favor segmentations where adjacent superpixels (these here are adjacent superpixels) take the same class. So we have no penalty when two adjacent superpixels take the same class, and some penalty when they take different classes. And this is actually a very common, albeit simple, model for image segmentation.

Let's look at a different MRF, also in the context of computer vision: an MRF that's used for image denoising. Here we have a noisy version of a real image, and you can see a kind of white noise overlaid on top of it. What we'd like to do is get a cleaned-up version of the image. So we have a set of variables X that correspond to the noisy pixels, and a set of variables Y that correspond to the cleaned pixels, and we'd like a probabilistic model that relates X and Y. Intuitively, we'd like two effects on the pixels Y. First, we'd like Y_i to be close to X_i. But if you do only that, then you just stick with the original image. The main constraint we can employ in order to clean up the image is the fact that adjacent pixels tend to have the same value. So in this case we're going to constrain the Y_i's so that each Y_i is close to its neighbors, and the further away it is, the bigger the penalty. And that's a metric MRF.

Now, we could use just a linear penalty, but that would be a very fragile model, because the right answer obviously isn't the image where all the pixels are equal to each other in their actual intensity values; that would just be a single grayish-looking image. What we'd like is to let one pixel depart from its adjacent pixel if it's being pulled in a different direction, either by its own observation or by its other adjacent pixels. And so the right model to use here is actually the truncated linear model; that's the one that's commonly used and is very successful for image denoising.

Interestingly, almost exactly the same idea is used in the context of stereo reconstruction. There, the values you'd like to infer, the Y_i's, are the depth disparity of each pixel in the image: how deep it is.
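As a hedged sketch of this denoising objective, here is a 1-D "image" with made-up weights and truncation threshold: keep each Y_i close to its noisy observation X_i, and keep neighboring Y_i's similar under a truncated linear penalty.

    def denoising_energy(Y, X, w_obs=1.0, w_smooth=2.0, trunc=10.0):
        data_term = sum(w_obs * abs(y - x) for y, x in zip(Y, X))
        smooth_term = sum(w_smooth * min(abs(Y[i] - Y[i + 1]), trunc)
                          for i in range(len(Y) - 1))
        return data_term + smooth_term    # lower energy <=> higher probability

    X = [100, 96, 104, 98, 103]           # noisy pixel intensities
    print(denoising_energy(X, X))         # copying the noise: high smoothness cost
    print(denoising_energy([100, 100, 100, 100, 100], X))  # flatter reconstruction wins here

The truncation is what keeps the model from being fragile: once neighboring values differ by more than the threshold, the penalty stops growing, so a pixel is allowed to break away when its own observation pulls it elsewhere.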
And here too we have spatial continuity: we'd like the depth of one pixel to be close to the depth of an adjacent pixel. But once again we don't want to enforce this too strongly, because you do have depth disparities in the image, and eventually we'd like things to be allowed to break away from each other. So once again, one typically uses some kind of truncated linear model for stereo reconstruction, often augmented by other little tricks. For example, here we have the actual pixel appearance, say the color and texture. If the color and texture of adjacent pixels are very similar to each other, you might want to impose a stronger similarity constraint; whereas if the color and texture of adjacent pixels are very different from each other, they're more likely to belong to different objects, and you don't want to enforce quite as strong a similarity constraint.
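One way such an appearance-based trick might look, as a Python sketch under illustrative assumptions (the contrast-sensitive weighting formula here is not from the lecture): the smoothness weight between adjacent pixels is scaled down when their colors differ a lot, so the inferred disparity is allowed to break at likely object boundaries.

    import math

    def smoothness_weight(color_i, color_j, w_base=2.0, sigma=30.0):
        # hypothetical contrast-sensitive weight: similar colors -> stronger tie
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(color_i, color_j)))
        return w_base * math.exp(-diff / sigma)

    def pairwise_penalty(d_i, d_j, color_i, color_j, trunc=3.0):
        # truncated linear penalty on the disparity difference, scaled by appearance
        return smoothness_weight(color_i, color_j) * min(abs(d_i - d_j), trunc)

    print(pairwise_penalty(5, 9, (200, 10, 10), (10, 10, 200)))      # weak tie across a likely edge
    print(pairwise_penalty(5, 6, (120, 120, 120), (118, 121, 119)))  # strong tie inside an object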