MAC padding (9 min)

0:00 - 0:04

In the last segment we talked about the
CBC-MAC and the NMAC, but throughout
0:04 - 0:08

the segment we always assumed that the
message length was a multiple of the block
0:08 - 0:12

size. In this segment, we're going to see
what to do when the message length is not
0:12 - 0:17

a multiple of the block size. So recall
that the encrypted CBC mac or ECBC-MAC for
0:17 - 0:21

short uses pseudorandom permutation F to
actually compute the CBC function as we
0:21 - 0:26

discussed in the last segment. But in the
last segment, we always assumed that the
0:26 - 0:30

message itself could be broken into an
integer number of blocks for the block
0:30 - 0:35

cipher. And the question is what to do
when the message length is not a multiple
0:35 - 0:39

of the block size. So here we have a
message where the last block actually is
0:39 - 0:43

shorter than the full block and the
question is how to compute the ECBC-MAC in
0:43 - 0:48

that case. So the solution of course is to
pad the message and the first pad that
0:48 - 0:52

comes to mind is to simply pad the message
with all zeros. In other words we take the
0:52 - 0:57

last block and just add zeros to it until
the last block becomes as long as one full
0:57 - 1:02

block size. And so my question to you is
whether the resulting MAC is secure. So
1:02 - 1:07

the answer is no, the MAC is not secure.
And let me explain why, basically the
1:07 - 1:12

problem is that it's possible now to come
up with messages so that message m and the
1:12 - 1:18

message m concatenated zero happen to have
exactly the same Pad. And as a result once
1:18 - 1:23

we plug in both m and m||0 into ECBC we'll
get the same tag out, which means that
1:23 - 1:28

both m and m||0 have the
same tag and therefore the attacker can
1:28 - 1:33

mount an existential forgery. He would ask
for the tag on the message m. And then he
1:33 - 1:38

would output as its forgery the tag and the
message m||0. And it's
1:38 - 1:43

easy to say why that's the case. Basically, to be absolutely clear here, we have our
1:43 - 1:48

message m. Which after padding becomes m000. So we had to add three
1:48 - 1:53

0's to it. And here we have the message m
zero, an m that ends with zero. And after
1:53 - 1:58

padding, we basically now have to add two
0's to it. And lo and behold, they become
1:58 - 2:03

the same pad, so that they're gonna have
exactly the same tag which allows the
2:03 - 2:08

adversary to mount the existential forgery
attack. So this is not a good idea. In
2:08 - 2:13

fact appending all 0s is a terrible idea.
And if you think about concrete case where
2:13 - 2:17

this comes up imagine the automatic
clearing house system used for clearing
2:17 - 2:22

checks. I might have a check for a $100
and the tag on that check. Well, now, the
2:22 - 2:26

attacker basically could append a zero to
my check and make it a check for $1000,
2:26 - 2:30

and that wouldn't actually change the tag.
So this ability to extend the message
2:30 - 2:34

without changing the tag actually could
have pretty disastrous consequences. So I
2:34 - 2:39

hope this example convinces you that the
padding function itself must be a one to
2:39 - 2:43

one function. In other words, it should be
the case that two distinct messages always
2:43 - 2:47

map to two distinct padded messages. We
shouldn't actually have a collision on the
2:47 - 2:51

padding function. Another way of saying it
is that the padding function must be
2:51 - 2:55

invertible. That guarantees that the
padding function is one to one. So a
2:55 - 3:00

standard way to do this was proposed by
the International Standards Organization
3:00 - 3:05

ISO. What they suggested is basically,
let's append the string 100000 to the end
3:05 - 3:10

of the message to make the message be a
multiple of the block length. Now to see
3:10 - 3:14

that this padding is invertible, all we do
is describe the inversion algorithm
3:14 - 3:19

which simply is gonna scan the message
from right to left, until it hits the
3:19 - 3:24

first one and then it's gonna remove all
the bits to the right of this one,
3:24 - 3:28

including the one. And you see that once
we've removed the pattern this way, we
3:28 - 3:32

obtain the original message. So here's an
example, so here we have a message where
3:32 - 3:37

the last block happens to be shorter than
the block length, and then we
3:37 - 3:41

append the 1,0,0 string to it.
It's very easy to see what the pad is,
3:41 - 3:45

simply look for the first one from the
right, we can remove this pad and recover
3:45 - 3:50

the original message back. Now there's one
corner case that's actually quite
3:50 - 3:54

important, and that is what do we do if
the original message length is already the
3:54 - 3:59

multiple of a block size? In that case
it's really very, very important that we
3:59 - 4:04

add an extra dummy block. That contains
the pad 1000. And again, I can't tell you
4:04 - 4:08

how many products and standards have
actually made this mistake where they
4:08 - 4:13

didn't add a dummy block and as a result,
the MAC is insecure because there's an
4:13 - 4:17

easy existential forgery attack. And let
me show you why. So suppose in case the
4:17 - 4:22

message is a multiple of a block length,
suppose we didn't add a dummy block and we
4:22 - 4:26

literally MAC-ed this message here. Well,
the result now is that if you look at
4:26 - 4:31

the message which is a multiple of the
block size and a message which is not a
4:31 - 4:36

multiple of the block size but is padded
to the block size, and imagine it so
4:36 - 4:41

happens that this message m prime one
happens to end with 1-0-0. At this point,
4:41 - 4:45

you realize that here, the original
message. Here, let me draw it this way.
4:45 - 4:50

You realize that the original message
after padding. Would become identical to
4:50 - 4:55

the second message that was not padded at
all. And as a result, if I ask for the tag
4:55 - 5:00

on this message over here, I would obtain
also the tag on the second message that
5:00 - 5:04

happened to end in 1-0-0. Okay, so if we
didn't add the dummy block, basically,
5:04 - 5:09

again, the pad would be not invertible,
because two different messages, two
5:09 - 5:13

distinct messages, happen to map to the
same padded result. Again, as a result,
5:13 - 5:18

the MAC becomes insecure. So to summarize,
this ISO standard is a perfectly fine way
5:18 - 5:23

to pad, except you have to remember also
to add a dummy block in case message is a
5:23 - 5:27

multiple of the block length to begin
with. Now some of you might be wondering
5:27 - 5:31

if there is a padding scheme that never
needs to add a dummy block, and the answer
5:31 - 5:35

is that if you look at a deterministic
padding function, then it's pretty easy to
5:35 - 5:39

argue that there will always be cases
where we need to pad, and the reason is
5:39 - 5:44

just literally the number of messages that
are multiples of the block length is much
5:44 - 5:49

smaller than the total number of messages
that need not be a multiple of the block
5:49 - 5:53

length. And as a result we can't have a
one to one function from this bigger
5:53 - 5:57

set of all messages to the smaller set of
messages which are a multiple of
5:57 - 6:01

the block length. There will always be cases
where we have to extend the original
6:01 - 6:05

message and in this case that would
correspond to adding this dummy padding
6:05 - 6:09

block. However, there is a very clever
idea called CMAC which shows that using a
6:09 - 6:14

randomized padding function we can avoid
having to ever add a dummy block. And so
6:14 - 6:18

let me explain how CMAC works. So CMAC
actually uses three keys. And, in fact,
6:18 - 6:23

sometimes this is called a three key
construction. So this first key, K, is
6:23 - 6:28

used in the CBC, the standard CBC MAC
algorithm. And then the keys, K1 and K2,
6:28 - 6:33

are used just for the padding scheme at
the very, very last block. And in fact in
6:33 - 6:38

the CMAC standard, the keys K1, K2 are
derived from the key K by some sort of a
6:38 - 6:44

pseudo random generator. So the way CMAC
works is as follows. Well, if the message
6:44 - 6:49

happens to not be a multiple of a block
length, then we append the ISO padding to
6:49 - 6:54

it. But then we also XOR this last
block with a secret key, K1, that the
6:54 - 6:59

adversary doesn't know. However, if the
message is a multiple of the block length,
6:59 - 7:03

then of course, we don't append anything
to it. But we XOR it with a different
7:03 - 7:07

key, K2, that, again, the adversary
doesn't actually know. So it turns out,
7:07 - 7:11

just by doing that, it's now impossible to
apply the extension attacks that we could
7:11 - 7:15

do on the cascade function, and on
raw CBC. Because the poor
7:15 - 7:19

adversary actually doesn't know what is
the last block that went into the
7:19 - 7:23

function. He doesn't know K1, and therefore,
he doesn't know the value at this
7:23 - 7:27

particular point and as a result, he can't do
an extension attack. In fact, this is a
7:27 - 7:32

provable statement. And so that this
construction here is simply by XOR-ing
7:32 - 7:36

K1 or XOR-ing K2 is really a PRF.
Despite not having to do a final
7:36 - 7:40

encryption step after the raw CBC
function is computed. So, that's one
7:40 - 7:45

benefit, that there's no final encryption
step. And the second benefit is that we resolve
7:45 - 7:49

this ambiguity between whether padding did
happen or padding didn't happen by using
7:49 - 7:54

two different keys to distinguish the case
that the message is a multiple of the block
7:54 - 7:59

length versus the case where it's not but
we have a pad appended to the message. So
7:59 - 8:03

the two distinct keys resolve this
ambiguity between the two cases, and as a
8:03 - 8:07

result, this padding actually is
sufficiently secure. And as I said,
8:07 - 8:12

there's actually a nice security theorem
that goes with CMAC that shows that the
8:12 - 8:16

CMAC construction really is a pseudo-random
function, with the same security
8:16 - 8:20

properties as CBC-MAC. So I wanted to
mention that CMAC is a federal standard
8:20 - 8:25

standardized by NIST and if you now, these
days, wanted to use a CBC-MAC for
8:25 - 8:29

anything, you would actually be using CMAC
as the standard way to do it. But
8:29 - 8:34

particularly in CMAC, the underlying block
cypher is AES and that gives us a secure
8:34 - 8:38

CBC-MAC derived from AES. So that's the end
on the segment and in the next segment
8:38 - 8:40

we'll talk about a parallel MAC.

Title:: MAC padding (9 min)
Video Language:: English

	mjomble edited English subtitles for MAC padding (9 min)
	mjomble edited English subtitles for MAC padding (9 min)
	mjomble edited English subtitles for MAC padding (9 min)
	stanford-bot edited English subtitles for MAC padding (9 min)
	stanford-bot edited English subtitles for MAC padding (9 min)
	stanford-bot added a translation

English subtitles

Revisions

Revision 2

mjomble

MAC padding (9 min)

Revisions

Our website uses cookies

Operating cookies (Required)