WEBVTT

00:00:00.000 --> 00:00:05.436
Local structure that doesn’t require full table representations

00:00:05.452 --> 00:00:09.073
is important in both directed and undirected models.

00:00:09.073 --> 00:00:14.868
How do we incorporate local structure into undirected models?

00:00:14.868 --> 00:00:22.084
The framework for that is called “log-linear models” for reasons that will be clear in just a moment.

00:00:23.084 --> 00:00:24.227
So

00:00:26.566 --> 00:00:33.140
Whereas, in the original representation of the unnormalized density

00:00:33.140 --> 00:00:39.187
we defined P tilde as the product of factors φi(Di),

00:00:39.218 --> 00:00:42.305
each [of] which is potentially a full table.

00:00:42.889 --> 00:00:46.032
Now we're going to shift that representation

NOTE Paragraph

00:00:46.032 --> 00:00:49.975
to something that uses a linear form

00:00:50.000 --> 00:00:52.446
(So here's a linear form)

00:00:52.446 --> 00:00:54.892
that is subsequently exponentiated,

00:00:54.922 --> 00:00:56.970
and that's why it's called log-linear—

00:00:56.970 --> 00:00:59.632
because the logarithm is a linear function.

00:00:59.632 --> 00:01:02.742
So what is this form over here?

00:01:02.742 --> 00:01:06.986
It's a linear function that has these things that are called “coefficients”

00:01:10.171 --> 00:01:12.631
and these things that are called “features”.

00:01:14.200 --> 00:01:22.108
Features, like factors, each have a scope which is a set of variables on which the feature depends.

00:01:24.153 --> 00:01:27.026
But different features can have the same scope.

00:01:27.026 --> 00:01:31.176
You can have multiple features all of which are over the same set of variables.

00:01:31.176 --> 00:01:36.472
Notice that each feature has just a single parameter wj that multiplies it.

00:01:37.657 --> 00:01:40.310
So, what does this give rise to?

00:01:40.310 --> 00:01:43.456
I mean if we have a log-linear model,

00:01:43.487 --> 00:01:49.465
we can push in the exponent through the summation,

00:01:49.465 --> 00:01:58.764
and that gives us something that is a product of exponential functions.

00:01:58.795 --> 00:02:03.530
You can think of each of these as effectively a little factor,

00:02:03.530 --> 00:02:07.689
but it’s a factor that only has a single parameter wj.

00:02:07.689 --> 00:02:11.201
Since this is a little bit abstract, so let’s look at an example.

00:02:11.201 --> 00:02:17.694
Specifically lets look at how we might represent a simple table factor as a log linear model.

00:02:17.709 --> 00:02:24.290
So here’s a param, here’s a factor φ, over two binary random variables X1 and X2.

00:02:24.290 --> 00:02:30.788
And so a full table factor would have four parameters: a00, a01, a10, and a11.

00:02:30.788 --> 00:02:34.552
So we can capture this model using a log linear model,

00:02:34.552 --> 00:02:37.958
using a set of such of features,

00:02:37.958 --> 00:02:41.365
using a set of these guys, which are indicator functions.

00:02:41.365 --> 00:02:43.225
So this is an indicator function.

00:02:43.225 --> 00:02:48.963
It takes one if X1 is zero and X2 is zero,

00:02:48.978 --> 00:02:50.881
and it takes zero otherwise.

00:02:51.035 --> 00:02:54.092
So this the general notion of an indicator function.

00:02:54.123 --> 00:03:00.315
It looks at the event—or constraint—inside the curly braces,

00:03:00.407 --> 00:03:05.230
and it returns a value of 0 or 1, depending on
whether that event is true or not.

00:03:05.722 --> 00:03:13.595
And so, if we wanted to represent this factor as a log-linear model,

00:03:13.626 --> 00:03:19.920
We can see that we can simply sum up over all the four values of k and ℓ,

00:03:19.920 --> 00:03:22.691
which are either 0 or 1, each of them.

00:03:22.707 --> 00:03:25.407
So were summing up over all four entries here.

00:03:25.407 --> 00:03:32.480
And we have a parameter—or coefficient—w_kℓ which multiplies this feature.

00:03:33.295 --> 00:03:44.124
And so, we would have a summation of w_kℓ:

00:03:44.139 --> 00:03:49.224
of w00 only in the case where X1 is zero and X2 is zero.

00:03:49.255 --> 00:04:00.483
So we would have exp of negative w00 when X1=0 and X2=0,

00:04:01.114 --> 00:04:12.037
and we would have exp of negative w01 when
X1=0 and X2=1, and so on and so forth.

00:04:12.390 --> 00:04:15.509
And it’s not difficult to convince ourselves that

00:04:15.525 --> 00:04:24.771
if we define w_kℓ to be the negative log of the corresponding entries in this table,

00:04:24.801 --> 00:04:28.650
then that gives us right back the factor that
we defined to begin with.

00:04:28.681 --> 00:04:33.144
So this shows that this is a general representation,

00:04:34.005 --> 00:04:37.267
in the sense that we can take any factor

00:04:37.898 --> 00:04:40.529
and represent it as a log-linear model

00:04:40.544 --> 00:04:47.175
simply by including all of the appropriate features.

00:04:48.605 --> 00:04:52.359
But we don’t generally want to do that.

00:04:52.359 --> 00:04:55.800
Generally we want a much finer grain set of features.

00:04:55.800 --> 00:05:01.197
So let’s look at some of the examples of features that people use in practice.

00:05:01.197 --> 00:05:03.838
So here are the features used in a language model.

00:05:03.853 --> 00:05:07.934
This is a language model that we that we discussed previously.

00:05:07.950 --> 00:05:12.276
And here we have features that relate:

00:05:12.292 --> 00:05:15.839
First of all, let’s just remind ourselves [that] we have two sets of variables.

00:05:15.839 --> 00:05:23.683
We have the variables Y which represent the annotations for each word

00:05:23.698 --> 00:05:29.956
in the sequence corresponding to what category that corresponds to.

00:05:29.956 --> 00:05:32.429
So this is a person.

00:05:32.429 --> 00:05:34.903
This is the beginning of a person name.

00:05:34.936 --> 00:05:37.847
This is the continuation of a person name.

00:05:37.847 --> 00:05:39.302
The beginning of a location.

00:05:39.302 --> 00:05:42.235
The continuation of a location, and so on.

00:05:42.235 --> 00:05:45.694
As well as a bunch of words that are not:

00:05:45.694 --> 00:05:48.444
[i.e.,] none of person, location, organization.

00:05:48.444 --> 00:05:50.768
And they’re all labeled “other”.

00:05:50.783 --> 00:05:54.938
And so the value Y tells us for each word what
category it belongs to,

00:05:54.938 --> 00:05:59.375
so that we’re trying to identify people, locations, and
organizations in the sentence.

00:05:59.375 --> 00:06:02.146
We have another set of variables X,

00:06:03.315 --> 00:06:08.687
which are the actual words in the sentence.

00:06:09.225 --> 00:06:12.806
Now we can go ahead and define…

00:06:12.806 --> 00:06:15.849
We can use a full table representation that

00:06:15.849 --> 00:06:22.729
basically tries to relate each and every Y that has a feature,

00:06:22.729 --> 00:06:27.678
that has a full factor that looks at every possible word in the English language;

00:06:27.678 --> 00:06:31.674
but those are going to be very, very, expensive,

00:06:31.674 --> 00:06:34.070
and a very large number of parameters.

00:06:34.117 --> 00:06:40.047
And so we're going to define a feature that looks, for example, at f of

00:06:40.047 --> 00:06:45.203
say a particular Y_i, which is the label for the i’th word in the sentence,

00:06:45.203 --> 00:06:47.652
and X_i, being that i’th word.

00:06:47.652 --> 00:06:54.839
And that feature says, for example: Y_i equals person.

00:06:55.839 --> 00:07:03.750
It’s the indicator function for “Y_i = person and X_i is capitalized”.

00:07:05.827 --> 00:07:09.694
And so that feature doesn’t look at the individual words.

00:07:09.694 --> 00:07:13.192
It just looks at whether that word is capitalized.

00:07:13.192 --> 00:07:17.430
Now we have just the single parameter that looks just at capitalization,

00:07:17.430 --> 00:07:23.140
and parameterizes how important is capitalization for recognizing that something's a person.

00:07:23.153 --> 00:07:26.949
We could also have another feature.

00:07:26.949 --> 00:07:28.862
This is an alternative:

00:07:28.862 --> 00:07:32.590
This a different feature that can and could be part of the same model

00:07:32.590 --> 00:07:38.381
that says: Y_i is equal to location,

00:07:38.381 --> 00:07:41.061
Or, actually, I was little bit imprecise here—

00:07:41.061 --> 00:07:45.464
This might be beginning of person. This might be beginning of location.

00:07:45.480 --> 00:07:51.414
And X_i appears in some atlas.

00:07:52.137 --> 00:07:55.370
Now there is other things that appear in the atlas than locations,

00:07:55.370 --> 00:07:58.648
but if a word appears in the atlas,

00:07:58.663 --> 00:08:01.972
there is a much higher probability presumably that it’s actually a location

00:08:02.003 --> 00:08:06.199
and so we might have, again, [a] weight for this feature

00:08:06.199 --> 00:08:13.611
that indicates that maybe increases the probability in Y_i being labeled in this way.

00:08:13.611 --> 00:08:19.438
And so you can imagine that constructing a very rich set of features,

00:08:19.438 --> 00:08:24.163
all of which look at certain aspects of the word,

00:08:24.163 --> 00:08:31.900
and rather than enumerating all the possible words
and giving a parameter to each and one of them.

00:08:33.562 --> 00:08:38.073
Let’s look at some other examples of feature-based models.

00:08:38.120 --> 00:08:40.913
So this is an example from statistical physics.

00:08:40.913 --> 00:08:43.445
It’s called the Ising model.

00:08:43.445 --> 00:08:48.778
And the Ising model is something that looks at pairs
of variables.

00:08:48.778 --> 00:08:51.773
It’s a pairwise Markov network.

00:08:53.127 --> 00:08:55.672
And [it] looks the pairs of adjacent variables,

00:08:55.672 --> 00:08:59.495
and basically gives us a coefficient for their products.

00:08:59.495 --> 00:09:03.509
So now, this is a case where variables are in the end are binary,

00:09:03.509 --> 00:09:07.836
but not in the space {0, 1} but rather
negative one and positive one.

00:09:07.836 --> 00:09:10.311
And so now, we have a model that's parametrized

00:09:10.311 --> 00:09:13.970
as features that are just the product of the values of the adjacent variables.

00:09:14.339 --> 00:09:15.606
Where might this come up?

00:09:15.606 --> 00:09:23.165
It comes up in the context, for example, of modeling the spins of electrons in a grid.

00:09:23.704 --> 00:09:28.742
So here you have a case where the electrons can rotate

00:09:28.742 --> 00:09:31.586
either along one direction or in the other direction

00:09:31.586 --> 00:09:40.542
so here is a bunch of the atoms that are marked with a blue arrow.

00:09:40.542 --> 00:09:43.059
You have one rotational axis,

00:09:43.059 --> 00:09:45.576
and the red arrow[s] are rotating in the opposite direction.

00:09:46.084 --> 00:09:52.545
And this basically says we have a term that

00:09:52.545 --> 00:09:57.063
[whose] probability distribution over the joint set of spins.

00:09:57.063 --> 00:10:01.519
(So this is the joint spins.)

00:10:01.950 --> 00:10:08.142
And the model, depends on whether adjacent
atoms have the same spin or opposite spin.

00:10:08.142 --> 00:10:12.445
So notice that one times one is the same as negative one times negative one.

00:10:12.445 --> 00:10:16.543
So this really just looks at whether they have the same spin
or different spins.

00:10:16.543 --> 00:10:22.888
And there is a parameter that looks at, you know, same or
different.

00:10:24.088 --> 00:10:26.709
That's what this feature represents.

00:10:27.170 --> 00:10:32.514
And, depending on the value of this parameter over here,

00:10:32.514 --> 00:10:35.010
if the parameter goes one way,

00:10:35.010 --> 00:10:39.599
we're going to favor systems

00:10:39.599 --> 00:10:42.509
where the atoms spin in the same direction.

00:10:42.509 --> 00:10:48.049
And if it’s going in the opposite direction, you’re going to favor atoms that spin in the different direction.

00:10:48.049 --> 00:10:51.268
And those are called ferromagnetic and anti-ferromagnetic.

00:10:52.514 --> 00:10:58.536
Furthermore, you can define in these systems the notion of a temperature.

00:10:58.813 --> 00:11:04.720
So the temperature here says how strong is this connection.

00:11:04.720 --> 00:11:13.126
So notice that as T grows—as the temperature grows—the w_ij’s get divided by T.

00:11:15.695 --> 00:11:19.269
And they all kind of go towards zero,

00:11:20.992 --> 00:11:24.812
which means that the strength of the connection between

00:11:24.812 --> 00:11:27.896
adjacent atoms, effectively becomes almost moot,

00:11:27.896 --> 00:11:30.980
and they become almost decoupled from each other.

00:11:30.980 --> 00:11:36.216
On the other hand, as the temperature decreases,

00:11:38.201 --> 00:11:43.461
Then the effect of the interaction between the atoms becomes much more significant

00:11:43.461 --> 00:11:46.915
and they’re going to impose much stronger constraints on each other.

00:11:46.915 --> 00:11:49.428
And this is actually a model of a real physical system.

00:11:49.428 --> 00:11:51.941
I mean, this is real temperature, and real atoms, and so on.

00:11:51.941 --> 00:11:57.075
And sure enough, if you look at what happens to these models as a function of temperature,

00:11:57.075 --> 00:12:00.640
what we see over here is high temperature.

00:12:02.280 --> 00:12:04.218
This is high temperature

00:12:04.218 --> 00:12:10.033
and you can see that there is a lot of mixing between the two types of spin

00:12:10.033 --> 00:12:11.925
and this is low temperature

00:12:12.587 --> 00:12:20.486
and you can see that there is much stronger
constraints in this configuration

00:12:20.486 --> 00:12:23.002
about the spins of adjacent atoms.

00:12:26.955 --> 00:12:33.844
Another kind of feature that's used very much in lots of practical applications

00:12:33.844 --> 00:12:37.682
is the notion of a metric, of a metric feature, an M.R.F.

00:12:37.682 --> 00:12:39.695
So what's a metric feature?

00:12:39.695 --> 00:12:41.800
This is something that comes up, mostly in cases

00:12:41.800 --> 00:12:48.432
where you have a bunch of random variables X_i that all take values in some joint label space of V.

00:12:48.432 --> 00:12:50.269
So, for example, they might all be binary.

00:12:50.269 --> 00:12:52.852
They all might take values one, two, three, four.

00:12:52.852 --> 00:12:57.250
And what we'd like to do is

00:12:57.250 --> 00:13:03.466
we have X_i and X_j that are connected to each other by an edge.

00:13:03.466 --> 00:13:08.213
We want X_ and X_j to take “similar” values.

00:13:08.213 --> 00:13:11.968
So in order to enforce the fact that X_i and X_j should take similar values

00:13:11.968 --> 00:13:13.939
we need a notion of similarity.

00:13:13.939 --> 00:13:24.110
And we're going to encode that using the distance function µ that takes two values, one for X_i and one for X_j’s,

00:13:24.110 --> 00:13:26.265
[that] says how close are they to each other.

00:13:26.265 --> 00:13:28.939
So what does the distance function need to be?

00:13:28.939 --> 00:13:35.552
Well, the distance function needs to satisfy the standard condition on a distance function or a metric.

00:13:35.552 --> 00:13:39.375
So first is reflexivity,

00:13:39.375 --> 00:13:43.459
which means that if the two variables take on the same value,

00:13:43.459 --> 00:13:45.675
then that distance better be zero.

00:13:48.598 --> 00:13:53.736
Oh I forgot to say that this. Sorry, this needs to be a non-negative function.

00:13:53.736 --> 00:14:00.733
Symmetry means that the distances are symetrical.

00:14:00.733 --> 00:14:06.276
So the distance between two values v1 and v2 are the same as the distance between v2 and v1.

00:14:06.276 --> 00:14:12.325
And finally is the triangle inequality, which says that the distance between v1 and v2

00:14:12.325 --> 00:14:13.505
(So here is v1)

00:14:13.505 --> 00:14:14.685
(Here is v2)

00:14:15.485 --> 00:14:21.091
and the distance between v1 and v2 is less than the distance between v1 and v3

00:14:21.891 --> 00:14:25.128
and then going to v2. So the standard triangle inequality.

00:14:26.513 --> 00:14:32.741
if a distance just satisfies these two conditions, it's called a semi metric.

00:14:33.848 --> 00:14:37.415
Otherwise, if it satisfies all three, it's called a metric.

00:14:37.846 --> 00:14:41.378
And both are actually used in practical applications.

00:14:43.932 --> 00:14:48.556
But how do we take this distance feature and put it in the context of an MRF?

00:14:48.556 --> 00:14:55.670
We have a feature that looks at two variables, X_i and X_j.

00:14:55.670 --> 00:14:59.476
And that feature is the distance between X_i and X_j.

00:14:59.476 --> 00:15:08.675
And now, we put it together by multiplying that with a coefficient, w_ij,

00:15:08.675 --> 00:15:13.015
such that w_ij has to be greater than zero.

00:15:13.015 --> 00:15:16.433
So that we want the metric MRF

00:15:17.279 --> 00:15:21.189
[to have] the effect that

00:15:21.189 --> 00:15:29.346
the lower the distance, the higher this is,

00:15:30.269 --> 00:15:36.438
because of the negative coefficient, which means that higher the probability. Okay?

00:15:37.007 --> 00:15:43.156
So, the more pairs you have that are close to each other

00:15:43.156 --> 00:15:47.280
and the closer they are to each other the higher
the probability of the overall configuration.

00:15:47.280 --> 00:15:49.652
Which is exactly what we wanted to have happen.

00:15:53.101 --> 00:15:58.418
So, conversely, if you have values that are far from
each other in the distance metric

00:15:58.418 --> 00:16:00.596
the lower the probability in the model.

00:16:01.796 --> 00:16:04.555
So, here are some examples of metric MRF’s.

00:16:04.832 --> 00:16:07.816
So one: The simplest possible metric MRF

00:16:07.816 --> 00:16:11.985
is one that gives [a] distance of zero when the two classes are equal to each other

00:16:11.985 --> 00:16:13.877
and [a] distance of one everywhere else.

00:16:13.877 --> 00:16:16.468
So now, you know, this is just like a step function.

00:16:16.468 --> 00:16:22.583
And, this gives rise to a potential that looks like this.

00:16:22.583 --> 00:16:26.579
So we have 0’s on the diagonal.

00:16:26.579 --> 00:16:33.621
So we get a bump in the probability when the two adjacent variables take on the same label

00:16:33.621 --> 00:16:37.122
and otherwise we get a reduction in the probability.

00:16:37.137 --> 00:16:40.692
But it doesn’t matter what particular value they take.

00:16:40.692 --> 00:16:44.090
That’s one example of a simple metric.

00:16:44.090 --> 00:16:51.995
A somewhat more expressive example might come up when the values V are actually numerical values.

00:16:52.026 --> 00:16:58.099
In which case you can look at maybe the difference between the miracle values.

00:16:58.099 --> 00:17:00.794
So, v_k minus v_l.

00:17:00.794 --> 00:17:05.103
And you want, and when v_k is equal to v_l, the distance is zero,

00:17:05.103 --> 00:17:14.297
and then you have a linear function that increases the
distance as the distance between v_k and v_l grows.

00:17:14.297 --> 00:17:18.733
So, this is the absolute value of v_k minus v_l.

00:17:20.353 --> 00:17:26.010
A more interesting notion that comes up a lot in
practice is:

00:17:26.010 --> 00:17:32.374
we don’t want to penalize arbitrarily things that are far way from each other in label space.

00:17:32.374 --> 00:17:37.437
So this is what is called a truncated linear penalty.

00:17:38.160 --> 00:17:41.852
And you can see that beyond a certain threshold,

00:17:44.067 --> 00:17:48.651
the penalty just becomes constant, so it plateaus.

00:17:48.667 --> 00:17:54.046
So that there is a penalty, but it doesn’t keep increasing over as the labels get further from each other

00:17:54.046 --> 00:17:59.768
One example where metric MRF’s are used is when we’re doing image segmentation.

00:17:59.768 --> 00:18:04.136
And here we tend to favor segmentations where adjacent superpixels…

00:18:06.551 --> 00:18:09.064
(These are adjacent superpixels.)

00:18:10.402 --> 00:18:12.700
And we want them to take the same class.

00:18:18.146 --> 00:18:22.279
And so here we have no penalty when the superpixels take the same class

00:18:22.279 --> 00:18:25.199
and we have some penalty when they take different classes.

00:18:25.199 --> 00:18:30.949
And this is actually a very common, albeit simple, model for
image segmentation.

00:18:32.165 --> 00:18:37.141
Let’s look at a different MRF, also in the context of
computer vision.

00:18:37.141 --> 00:18:40.440
This is an MRF that’s used for image denoising.

00:18:40.440 --> 00:18:45.192
So here we have a noisy version of a real image that looks like this.

00:18:45.208 --> 00:18:50.824
So this is, you can see this kind of, white noise overlayed on top of the image.

00:18:50.824 --> 00:18:53.923
And what we’d like to do, is we’d like to get a cleaned-up version of the image.

00:18:53.923 --> 00:19:00.799
So here we have, a set of variables, X, that correspond to the noisy pixels.

00:19:02.230 --> 00:19:07.736
And we have a set of variables, Y, that corresponds to the cleaned pixels.

00:19:07.736 --> 00:19:15.427
And we'd like to have a probabilistic model that relates X and Y.

00:19:15.427 --> 00:19:20.525
And what we’re going to do is we’d like, so, intuiti—, I mean,

00:19:20.525 --> 00:19:25.488
so you’d like to have two effects on the pixels Y:

00:19:25.488 --> 00:19:33.105
First, you'd like Y_i to be close to X_i.

00:19:33.105 --> 00:19:36.372
But if you just do that, then you're just going to stick with
the original image.

00:19:36.372 --> 00:19:41.348
So what is the main constraint that we can employ on the image in order to clean it up

00:19:41.348 --> 00:19:46.916
is the fact that adjacent pixels tend to have the same value.

00:19:46.916 --> 00:19:52.907
So in this case what we’re going to do is we’re going to model, we’re going to constrain the image

00:19:52.907 --> 00:20:00.212
so that we’re going to constrain the Y_i’s to try and make Y_i close to its neighbors.

00:20:03.012 --> 00:20:05.831
And the further away it is, the bigger the penalty.

00:20:05.831 --> 00:20:08.219
And that's a metric MRF.

00:20:11.526 --> 00:20:17.085
Now we could use just a linear penalty,

00:20:17.085 --> 00:20:21.892
but that’s going to be a very fragile model,

00:20:21.892 --> 00:20:28.208
because, now obviously the right answer isn't the model
where all pixels are equal to each other

00:20:28.208 --> 00:20:33.874
in their actual intensity value because that would be just a single, you know, grayish-looking image.

00:20:33.874 --> 00:20:39.690
So what you like is that you would like to let one pixel depart from its adjacent pixel

00:20:39.690 --> 00:20:44.098
if it’s getting close in a different direction either by its own observation or by other adjacent pixels.

00:20:44.098 --> 00:20:49.482
And so the right model to use here is actually the truncated linear model

00:20:49.482 --> 00:20:52.174
and that one is [the] one that’s commonly used

00:20:52.174 --> 00:20:54.970
and is very successful for doing image denoising.

00:20:55.432 --> 00:21:02.063
Interesting, almost exactly the same idea is used in the context of stereo reconstruction.

00:21:02.063 --> 00:21:06.264
There, the values that you’d like to infer, the Y_i’s,

00:21:06.264 --> 00:21:11.557
are the depth disparity for a given pixel in the image—how deep it is.

00:21:11.557 --> 00:21:13.957
And here also we have spacial continuity.

00:21:13.957 --> 00:21:19.388
We like the depth of one pixel to be close to the depth of an adjacent pixel.

00:21:19.388 --> 00:21:22.497
But once again we don’t want to enforce this too strongly

00:21:22.497 --> 00:21:25.159
because you do have depth disparity in the image

00:21:25.159 --> 00:21:27.820
and so eventually you'd like things to be allowed to break away from each other.

00:21:27.820 --> 00:21:33.320
And so once again, one typically uses some kind of truncated linear model

00:21:33.320 --> 00:21:36.239
for doing this stereo construction,

00:21:36.239 --> 00:21:38.091
often augmented by other little tricks.

00:21:38.091 --> 00:21:45.143
So, for example, here we have the actual pixel appearance,

00:21:45.143 --> 00:21:47.867
for example, the color and texture.

00:21:47.867 --> 00:21:50.392
And if the color and texture are very similar to each other,

00:21:50.407 --> 00:21:54.861
you might want to have the stronger constraint on similarity.

00:21:54.861 --> 00:22:00.257
Versus: if the color and texture of the adjacent pixels are
very different from each other,

00:22:00.257 --> 00:22:02.868
they may be more likely to belong to different objects

00:22:02.868 --> 00:22:07.580
and you don’t want to enforce quite as strong of a similarity constraint.