Local structure that doesn't require full table representations is important in both directed and undirected models. How do we incorporate local structure into undirected models? The framework for that is called "log-linear models", for reasons that will be clear in just a moment.

Whereas in the original representation of the unnormalized density we defined P tilde as a product of factors φ_i(D_i), each of which is potentially a full table, we're now going to shift to a representation that uses a linear form which is subsequently exponentiated. That's why it's called log-linear: because the logarithm is a linear function. So what is this form? It's a linear function built from things called "coefficients" and things called "features". Features, like factors, each have a scope, which is the set of variables on which the feature depends. But different features can have the same scope: you can have multiple features, all of which are over the same set of variables. Notice that each feature f_j has just a single parameter w_j that multiplies it.

So what does this give rise to? If we have a log-linear model, we can push the exponent in through the summation, and that gives us a product of exponential functions. You can think of each of these as effectively a little factor, but it's a factor that has only a single parameter w_j.

Since this is a little bit abstract, let's look at an example. Specifically, let's look at how we might represent a simple table factor as a log-linear model. So here's a factor φ over two binary random variables X1 and X2, and a full table factor would have four parameters: a00, a01, a10, and a11. We can capture this factor with a log-linear model using a set of features that are indicator functions. This indicator function takes the value one if X1 is zero and X2 is zero, and takes zero otherwise. That's the general notion of an indicator function: it looks at the event, or constraint, inside the curly braces and returns a value of 0 or 1, depending on whether that event is true or not.
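To make the linear form concrete, here is one way to write it out in LaTeX. This is a sketch using the negative-exponent convention, which matches the choice w_kl = -ln(a_kl) made in the table-factor example that follows; the scopes D_j and weights w_j are as described above:

    \tilde{P}(X_1,\dots,X_n) \;=\; \exp\!\Big(-\sum_{j=1}^{k} w_j\, f_j(D_j)\Big) \;=\; \prod_{j=1}^{k} \exp\!\big(-w_j\, f_j(D_j)\big)

Pushing the exponential through the sum in this way is exactly what turns each weighted feature into its own little single-parameter factor.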
And so, if we want to represent this factor as a log-linear model, we can simply sum over all four combinations of k and l, each of which is either 0 or 1, so we're summing over all four entries here, and we have a parameter, or coefficient, w_kl that multiplies the corresponding feature. So we would have exp of negative w00 only in the case where X1 = 0 and X2 = 0, exp of negative w01 when X1 = 0 and X2 = 1, and so on and so forth. And it's not difficult to convince ourselves that if we define w_kl to be the negative log of the corresponding entry in the table, then we get right back the factor that we defined to begin with. So this shows that the representation is fully general, in the sense that we can take any factor and represent it as a log-linear model simply by including all of the appropriate indicator features.

But we don't generally want to do that. Generally we want a much finer-grained set of features, so let's look at some examples of features that people use in practice. Here are the features used in a language model that we discussed previously. First, let's remind ourselves that we have two sets of variables. We have the variables Y, which represent the annotations for each word in the sequence, corresponding to the category that word belongs to: this is the beginning of a person name, this is the continuation of a person name, this is the beginning of a location, the continuation of a location, and so on, as well as a bunch of words that are none of person, location, or organization, and those are all labeled "other". So the value of Y tells us, for each word, what category it belongs to, because we're trying to identify people, locations, and organizations in the sentence. And we have another set of variables X, which are the actual words in the sentence.
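As an aside, here is a minimal Python sketch (the table entries are made up, not from the lecture) that checks the claim that setting w_kl = -ln(a_kl) recovers the original table factor:

    import math

    # Minimal sketch, not from the lecture: the 2x2 table factor phi(X1, X2) with
    # entries a_kl, rewritten as a log-linear model with indicator features
    # f_kl(X1, X2) = 1{X1 = k, X2 = l} and weights w_kl = -ln(a_kl).
    a = {(0, 0): 10.0, (0, 1): 0.1, (1, 0): 2.0, (1, 1): 5.0}   # made-up entries

    w = {kl: -math.log(val) for kl, val in a.items()}           # w_kl = -ln(a_kl)

    def phi_loglinear(x1, x2):
        # exp(-sum_kl w_kl * 1{x1 = k, x2 = l}); exactly one indicator fires.
        total = sum(w_kl * (1 if (x1, x2) == kl else 0) for kl, w_kl in w.items())
        return math.exp(-total)

    for (k, l), entry in a.items():
        assert abs(phi_loglinear(k, l) - entry) < 1e-9          # recovers the table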
Now, we could go ahead and use a full table representation, one that gives each Y a full factor looking at every possible word in the English language, but that would be very, very expensive and would involve a very large number of parameters. So instead we're going to define features. For example, a feature f that looks at a particular Y_i, the label of the i'th word in the sentence, and X_i, the i'th word itself. That feature might be the indicator function for "Y_i = person and X_i is capitalized". Such a feature doesn't look at the identity of the individual word; it only looks at whether the word is capitalized. Now we have just a single parameter that looks only at capitalization and captures how important capitalization is for recognizing that something is a person. We could also have another feature, a different feature that could be part of the same model, which says: Y_i equals location (actually, I was a little imprecise here: this one might be the beginning of a person name, and this one the beginning of a location) and X_i appears in some atlas. Now, things other than locations appear in an atlas, but if a word appears in an atlas there is presumably a much higher probability that it's actually a location, and so we might again have a weight for this feature that increases the probability of Y_i being labeled this way. So you can imagine constructing a very rich set of features, all of which look at particular aspects of the word, rather than enumerating all the possible words and giving a separate parameter to each and every one of them.
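Here is a hedged Python sketch of indicator features of this kind. The label strings ("B-PER", "B-LOC") and the tiny atlas are illustrative assumptions, not part of the lecture:

    # Hedged sketch of the kind of indicator features described above. The label
    # strings ("B-PER", "B-LOC") and the tiny atlas are illustrative assumptions.
    ATLAS_WORDS = {"paris", "berlin", "cairo"}          # hypothetical gazetteer

    def f_person_capitalized(y_i, x_i):
        # 1{Y_i = beginning-of-person  AND  X_i is capitalized}
        return 1 if y_i == "B-PER" and x_i[:1].isupper() else 0

    def f_location_in_atlas(y_i, x_i):
        # 1{Y_i = beginning-of-location  AND  X_i appears in the atlas}
        return 1 if y_i == "B-LOC" and x_i.lower() in ATLAS_WORDS else 0

    # Each feature gets one scalar weight in the log-linear model, no matter how
    # many words the vocabulary contains; the sign and magnitude would normally
    # be set by learning.
    print(f_person_capitalized("B-PER", "Daphne"))   # 1
    print(f_location_in_atlas("B-LOC", "Paris"))     # 1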
Let's look at some other examples of feature-based models. This is an example from statistical physics called the Ising model. The Ising model is a pairwise Markov network: it looks at pairs of adjacent variables and gives us a coefficient for their product. This is a case where the variables are binary, but they take the values negative one and positive one rather than zero and one. So now we have a model that's parameterized by features that are just the products of the values of adjacent variables.

Where might this come up? It comes up, for example, when modeling the spins of electrons in a grid. Here each electron can spin in one direction or in the other: the atoms marked with a blue arrow spin around one axis, and the ones marked with a red arrow spin in the opposite direction. The model defines a probability distribution over the joint set of spins, and that distribution depends on whether adjacent atoms have the same spin or opposite spins. Notice that one times one is the same as negative one times negative one, so this really just looks at whether the two spins are the same or different, and there is a parameter associated with same versus different; that's what this feature represents. Depending on the value of that parameter, if it goes one way we favor configurations where adjacent atoms spin in the same direction, and if it goes the other way we favor configurations where adjacent atoms spin in opposite directions. Those two cases are called ferromagnetic and antiferromagnetic.

Furthermore, you can define in these systems the notion of a temperature. The temperature here says how strong these connections are. Notice that as T grows, that is, as the temperature grows, the w_ij's get divided by T and all go toward zero, which means that the strength of the connection between adjacent atoms becomes almost moot and the atoms become almost decoupled from each other. On the other hand, as the temperature decreases, the effect of the interaction between the atoms becomes much more significant, and they impose much stronger constraints on each other. And this is actually a model of a real physical system: real temperature, real atoms, and so on. Sure enough, you can see what happens in these models as a function of temperature.
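Here is a minimal Python sketch of this parameterization on a tiny 2x2 grid, with hand-picked couplings that are assumptions rather than lecture values. Sign conventions vary between presentations; in this sketch a positive w_ij favors aligned spins:

    import itertools
    import math

    # Features are the products s_i * s_j of adjacent +/-1 spins, each weighted
    # by w_ij / T, as described above.
    sites = [(0, 0), (0, 1), (1, 0), (1, 1)]
    edges = [((0, 0), (0, 1)), ((0, 0), (1, 0)), ((0, 1), (1, 1)), ((1, 0), (1, 1))]
    w = {e: 1.0 for e in edges}

    def unnormalized_prob(spins, T):
        # spins maps each grid site to +1 or -1
        return math.exp(sum((w[e] / T) * spins[e[0]] * spins[e[1]] for e in edges))

    # As T grows, w_ij / T shrinks toward zero and the spins decouple: the most
    # likely configuration carries less and less of the probability mass.
    for T in (0.5, 5.0):
        scores = [unnormalized_prob(dict(zip(sites, s)), T)
                  for s in itertools.product((-1, +1), repeat=4)]
        print(T, max(scores) / sum(scores))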
Over here is high temperature, and you can see that there's a lot of mixing between the two types of spins. This is low temperature, and you can see that this configuration imposes much stronger constraints on the spins of adjacent atoms.

Another kind of feature that's used a great deal in practical applications is the notion of a metric feature in an MRF. So what's a metric feature? This comes up mostly in cases where you have a bunch of random variables X_i that all take values in some joint label space V. For example, they might all be binary, or they might all take the values one, two, three, four. We have X_i and X_j connected to each other by an edge, and we want X_i and X_j to take "similar" values. In order to enforce the fact that X_i and X_j should take similar values, we need a notion of similarity, and we're going to encode that using a distance function µ that takes two values, one for X_i and one for X_j, and says how close they are to each other.

So what does the distance function need to be? It needs to satisfy the standard conditions on a distance function, or metric. First of all, it needs to be a non-negative function. Next is reflexivity, which means that if the two variables take on the same value, then the distance had better be zero. Symmetry means that the distances are symmetric: the distance between two values v1 and v2 is the same as the distance between v2 and v1. And finally there is the triangle inequality, which says that the distance between v1 and v2 is no more than the distance from v1 to v3 plus the distance from v3 to v2, the standard triangle inequality. If a distance function satisfies only the first two of these conditions, reflexivity and symmetry, it's called a semi-metric; if it satisfies all three, it's called a metric. Both are actually used in practical applications.

But how do we take this distance and put it in the context of an MRF? We have a feature that looks at two variables, X_i and X_j, and that feature is the distance between X_i and X_j.
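As a small Python sketch (the label space and example distances are illustrative), here is a checker for the conditions just listed, classifying a candidate distance over a finite label space V:

    import itertools

    def classify(V, mu):
        nonneg    = all(mu(a, b) >= 0 for a, b in itertools.product(V, V))
        reflexive = all(mu(a, a) == 0 for a in V)
        symmetric = all(mu(a, b) == mu(b, a) for a, b in itertools.product(V, V))
        triangle  = all(mu(a, b) <= mu(a, c) + mu(c, b)
                        for a, b, c in itertools.product(V, V, V))
        if nonneg and reflexive and symmetric:
            return "metric" if triangle else "semi-metric"
        return "neither"

    V = [0, 1, 2, 3]
    print(classify(V, lambda a, b: 0 if a == b else 1))      # metric (0/1 step)
    print(classify(V, lambda a, b: abs(a - b)))              # metric (linear)
    print(classify(V, lambda a, b: min(abs(a - b), 2)))      # metric (truncated linear)
    print(classify(V, lambda a, b: (a - b) ** 2))            # semi-metric: triangle fails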
And now we put it together by multiplying that feature by a coefficient w_ij, where w_ij has to be greater than zero. The effect we want in a metric MRF is that the lower the distance, the higher this term becomes, because of the negative coefficient in the exponent, and therefore the higher the probability. So the more pairs you have that are close to each other, and the closer they are to each other, the higher the probability of the overall configuration, which is exactly what we wanted to have happen. Conversely, if you have values that are far from each other under the distance metric, the lower the probability in the model.

So here are some examples of metric MRFs. The simplest possible metric is one that gives a distance of zero when the two labels are equal to each other and a distance of one everywhere else; it's just a step function. This gives rise to a potential with zeros on the diagonal, so we get a bump in the probability when two adjacent variables take on the same label and otherwise a reduction in the probability, but it doesn't matter which particular different values they take. That's one example of a simple metric. A somewhat more expressive example comes up when the values in V are actually numerical, in which case you can look at the difference between the numerical values, v_k minus v_l. When v_k is equal to v_l the distance is zero, and then you have a linear function that increases as the gap between v_k and v_l grows; this is the absolute value of v_k minus v_l. A more interesting notion that comes up a lot in practice is that we don't want to penalize arbitrarily things that are far away from each other in label space. This is what's called a truncated linear penalty: you can see that beyond a certain threshold the penalty just becomes constant, so it plateaus. There is a penalty, but it doesn't keep increasing as the labels get further from each other.
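Here is a Python sketch of how these three distances turn into pairwise potentials through exp(-w_ij * mu(v_k, v_l)); the label space, weight, and truncation threshold are illustrative assumptions:

    import math

    V = [0, 1, 2, 3]
    w_ij = 1.0

    mu_step      = lambda k, l: 0.0 if k == l else 1.0   # 0/1 metric
    mu_linear    = lambda k, l: abs(k - l)               # |v_k - v_l|
    mu_truncated = lambda k, l: min(abs(k - l), 2.0)     # plateaus beyond 2

    def potential_table(mu):
        # phi(v_k, v_l) = exp(-w_ij * mu(v_k, v_l)); largest entries on the diagonal
        return [[round(math.exp(-w_ij * mu(k, l)), 3) for l in V] for k in V]

    for name, mu in [("step", mu_step), ("linear", mu_linear),
                     ("truncated", mu_truncated)]:
        print(name, potential_table(mu))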
One example where metric MRFs are used is image segmentation. Here we tend to favor segmentations where adjacent superpixels (these here are adjacent superpixels) take the same class. So we have no penalty when two adjacent superpixels take the same class, and some penalty when they take different classes. And this is actually a very common, albeit simple, model for image segmentation.

Let's look at a different MRF, also in the context of computer vision: an MRF that's used for image denoising. Here we have a noisy version of a real image, and you can see a kind of white noise overlaid on top of it. What we'd like to do is get a cleaned-up version of the image. So we have a set of variables X that correspond to the noisy pixels, and a set of variables Y that correspond to the cleaned pixels, and we'd like a probabilistic model that relates X and Y. Intuitively, we'd like two effects on the pixels Y. First, we'd like Y_i to be close to X_i. But if you do only that, then you just stick with the original image. The main constraint we can employ in order to clean up the image is the fact that adjacent pixels tend to have the same value. So in this case we're going to constrain the Y_i's so that each Y_i is close to its neighbors, and the further away it is, the bigger the penalty. And that's a metric MRF.

Now, we could use just a linear penalty, but that would be a very fragile model, because the right answer obviously isn't the image where all the pixels are equal to each other in their actual intensity values; that would just be a single grayish-looking image. What we'd like is to let one pixel depart from its adjacent pixel if it's being pulled in a different direction, either by its own observation or by its other adjacent pixels. And so the right model to use here is actually the truncated linear model; that's the one that's commonly used and is very successful for image denoising.

Interestingly, almost exactly the same idea is used in the context of stereo reconstruction. There, the values you'd like to infer, the Y_i's, are the depth disparity of each pixel in the image: how deep it is.
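As a hedged sketch of this denoising objective, here is a 1-D "image" with made-up weights and truncation threshold: keep each Y_i close to its noisy observation X_i, and keep neighboring Y_i's similar under a truncated linear penalty.

    def denoising_energy(Y, X, w_obs=1.0, w_smooth=2.0, trunc=10.0):
        data_term = sum(w_obs * abs(y - x) for y, x in zip(Y, X))
        smooth_term = sum(w_smooth * min(abs(Y[i] - Y[i + 1]), trunc)
                          for i in range(len(Y) - 1))
        return data_term + smooth_term    # lower energy <=> higher probability

    X = [100, 96, 104, 98, 103]           # noisy pixel intensities
    print(denoising_energy(X, X))         # copying the noise: high smoothness cost
    print(denoising_energy([100, 100, 100, 100, 100], X))  # flatter reconstruction wins here

The truncation is what keeps the model from being fragile: once neighboring values differ by more than the threshold, the penalty stops growing, so a pixel is allowed to break away when its own observation pulls it elsewhere.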
And here too we have spatial continuity: we'd like the depth of one pixel to be close to the depth of an adjacent pixel. But once again we don't want to enforce this too strongly, because you do have depth disparities in the image, and eventually we'd like things to be allowed to break away from each other. So once again, one typically uses some kind of truncated linear model for stereo reconstruction, often augmented by other little tricks. For example, here we have the actual pixel appearance, say the color and texture. If the color and texture of adjacent pixels are very similar to each other, you might want to impose a stronger similarity constraint; whereas if the color and texture of adjacent pixels are very different from each other, they're more likely to belong to different objects, and you don't want to enforce quite as strong a similarity constraint.
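One way such an appearance-based trick might look, as a Python sketch under illustrative assumptions (the contrast-sensitive weighting formula here is not from the lecture): the smoothness weight between adjacent pixels is scaled down when their colors differ a lot, so the inferred disparity is allowed to break at likely object boundaries.

    import math

    def smoothness_weight(color_i, color_j, w_base=2.0, sigma=30.0):
        # hypothetical contrast-sensitive weight: similar colors -> stronger tie
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(color_i, color_j)))
        return w_base * math.exp(-diff / sigma)

    def pairwise_penalty(d_i, d_j, color_i, color_j, trunc=3.0):
        # truncated linear penalty on the disparity difference, scaled by appearance
        return smoothness_weight(color_i, color_j) * min(abs(d_i - d_j), trunc)

    print(pairwise_penalty(5, 9, (200, 10, 10), (10, 10, 200)))      # weak tie across a likely edge
    print(pairwise_penalty(5, 6, (120, 120, 120), (118, 121, 119)))  # strong tie inside an object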