Return to Video

MAC padding (9 min)

  • 0:00 - 0:04
    In the last segment we talked about the
    CBC-MAC and the NMAC, but throughout
  • 0:04 - 0:08
    the segment we always assumed that the
    message length was a multiple of the block
  • 0:08 - 0:12
    size. In this segment, we're going to see
    what to do when the message length is not
  • 0:12 - 0:17
    a multiple of the block size. So recall
    that the encrypted CBC mac or ECBC-MAC for
  • 0:17 - 0:21
    short uses pseudorandom permutation F to
    actually compute the CBC function as we
  • 0:21 - 0:26
    discussed in the last segment. But in the
    last segment, we always assumed that the
  • 0:26 - 0:30
    message itself could be broken into an
    integer number of blocks for the block
  • 0:30 - 0:35
    cipher. And the question is what to do
    when the message length is not a multiple
  • 0:35 - 0:39
    of the block size. So here we have a
    message where the last block actually is
  • 0:39 - 0:43
    shorter than the full block and the
    question is how to compute the ECBC-MAC in
  • 0:43 - 0:48
    that case. So the solution of course is to
    pad the message and the first pad that
  • 0:48 - 0:52
    comes to mind is to simply pad the message
    with all zeros. In other words we take the
  • 0:52 - 0:57
    last block and just add zeros to it until
    the last block becomes as long as one full
  • 0:57 - 1:02
    block size. And so my question to you is
    whether the resulting MAC is secure. So
  • 1:02 - 1:07
    the answer is no, the MAC is not secure.
    And let me explain why, basically the
  • 1:07 - 1:12
    problem is that it's possible now to come
    up with messages so that message m and the
  • 1:12 - 1:18
    message m concatenated zero happen to have
    exactly the same Pad. And as a result once
  • 1:18 - 1:23
    we plug in both m and m||0 into ECBC we'll
    get the same tag out, which means that
  • 1:23 - 1:28
    both m and m||0 have the
    same tag and therefore the attacker can
  • 1:28 - 1:33
    mount an existential forgery. He would ask
    for the tag on the message m. And then he
  • 1:33 - 1:38
    would output as its forgery the tag and the
    message m||0. And it's
  • 1:38 - 1:43
    easy to say why that's the case. Basically, to be absolutely clear here, we have our
  • 1:43 - 1:48
    message m. Which after padding becomes m000. So we had to add three
  • 1:48 - 1:53
    0's to it. And here we have the message m
    zero, an m that ends with zero. And after
  • 1:53 - 1:58
    padding, we basically now have to add two
    0's to it. And lo and behold, they become
  • 1:58 - 2:03
    the same pad, so that they're gonna have
    exactly the same tag which allows the
  • 2:03 - 2:08
    adversary to mount the existential forgery
    attack. So this is not a good idea. In
  • 2:08 - 2:13
    fact appending all 0s is a terrible idea.
    And if you think about concrete case where
  • 2:13 - 2:17
    this comes up imagine the automatic
    clearing house system used for clearing
  • 2:17 - 2:22
    checks. I might have a check for a $100
    and the tag on that check. Well, now, the
  • 2:22 - 2:26
    attacker basically could append a zero to
    my check and make it a check for $1000,
  • 2:26 - 2:30
    and that wouldn't actually change the tag.
    So this ability to extend the message
  • 2:30 - 2:34
    without changing the tag actually could
    have pretty disastrous consequences. So I
  • 2:34 - 2:39
    hope this example convinces you that the
    padding function itself must be a one to
  • 2:39 - 2:43
    one function. In other words, it should be
    the case that two distinct messages always
  • 2:43 - 2:47
    map to two distinct padded messages. We
    shouldn't actually have a collision on the
  • 2:47 - 2:51
    padding function. Another way of saying it
    is that the padding function must be
  • 2:51 - 2:55
    invertible. That guarantees that the
    padding function is one to one. So a
  • 2:55 - 3:00
    standard way to do this was proposed by
    the International Standards Organization
  • 3:00 - 3:05
    ISO. What they suggested is basically,
    let's append the string 100000 to the end
  • 3:05 - 3:10
    of the message to make the message be a
    multiple of the block length. Now to see
  • 3:10 - 3:14
    that this padding is invertible, all we do
    is describe the inversion algorithm
  • 3:14 - 3:19
    which simply is gonna scan the message
    from right to left, until it hits the
  • 3:19 - 3:24
    first one and then it's gonna remove all
    the bits to the right of this one,
  • 3:24 - 3:28
    including the one. And you see that once
    we've removed the pattern this way, we
  • 3:28 - 3:32
    obtain the original message. So here's an
    example, so here we have a message where
  • 3:32 - 3:37
    the last block happens to be shorter than
    the block length, and then we
  • 3:37 - 3:41
    append the 1,0,0 string to it.
    It's very easy to see what the pad is,
  • 3:41 - 3:45
    simply look for the first one from the
    right, we can remove this pad and recover
  • 3:45 - 3:50
    the original message back. Now there's one
    corner case that's actually quite
  • 3:50 - 3:54
    important, and that is what do we do if
    the original message length is already the
  • 3:54 - 3:59
    multiple of a block size? In that case
    it's really very, very important that we
  • 3:59 - 4:04
    add an extra dummy block. That contains
    the pad 1000. And again, I can't tell you
  • 4:04 - 4:08
    how many products and standards have
    actually made this mistake where they
  • 4:08 - 4:13
    didn't add a dummy block and as a result,
    the MAC is insecure because there's an
  • 4:13 - 4:17
    easy existential forgery attack. And let
    me show you why. So suppose in case the
  • 4:17 - 4:22
    message is a multiple of a block length,
    suppose we didn't add a dummy block and we
  • 4:22 - 4:26
    literally MAC-ed this message here. Well,
    the result now is that if you look at
  • 4:26 - 4:31
    the message which is a multiple of the
    block size and a message which is not a
  • 4:31 - 4:36
    multiple of the block size but is padded
    to the block size, and imagine it so
  • 4:36 - 4:41
    happens that this message m prime one
    happens to end with 1-0-0. At this point,
  • 4:41 - 4:45
    you realize that here, the original
    message. Here, let me draw it this way.
  • 4:45 - 4:50
    You realize that the original message
    after padding. Would become identical to
  • 4:50 - 4:55
    the second message that was not padded at
    all. And as a result, if I ask for the tag
  • 4:55 - 5:00
    on this message over here, I would obtain
    also the tag on the second message that
  • 5:00 - 5:04
    happened to end in 1-0-0. Okay, so if we
    didn't add the dummy block, basically,
  • 5:04 - 5:09
    again, the pad would be not invertible,
    because two different messages, two
  • 5:09 - 5:13
    distinct messages, happen to map to the
    same padded result. Again, as a result,
  • 5:13 - 5:18
    the MAC becomes insecure. So to summarize,
    this ISO standard is a perfectly fine way
  • 5:18 - 5:23
    to pad, except you have to remember also
    to add a dummy block in case message is a
  • 5:23 - 5:27
    multiple of the block length to begin
    with. Now some of you might be wondering
  • 5:27 - 5:31
    if there is a padding scheme that never
    needs to add a dummy block, and the answer
  • 5:31 - 5:35
    is that if you look at a deterministic
    padding function, then it's pretty easy to
  • 5:35 - 5:39
    argue that there will always be cases
    where we need to pad, and the reason is
  • 5:39 - 5:44
    just literally the number of messages that
    are multiples of the block length is much
  • 5:44 - 5:49
    smaller than the total number of messages
    that need not be a multiple of the block
  • 5:49 - 5:53
    length. And as a result we can't have a
    one to one function from this bigger
  • 5:53 - 5:57
    set of all messages to the smaller set of
    messages which are a multiple of
  • 5:57 - 6:01
    the block length. There will always be cases
    where we have to extend the original
  • 6:01 - 6:05
    message and in this case that would
    correspond to adding this dummy padding
  • 6:05 - 6:09
    block. However, there is a very clever
    idea called CMAC which shows that using a
  • 6:09 - 6:14
    randomized padding function we can avoid
    having to ever add a dummy block. And so
  • 6:14 - 6:18
    let me explain how CMAC works. So CMAC
    actually uses three keys. And, in fact,
  • 6:18 - 6:23
    sometimes this is called a three key
    construction. So this first key, K, is
  • 6:23 - 6:28
    used in the CBC, the standard CBC MAC
    algorithm. And then the keys, K1 and K2,
  • 6:28 - 6:33
    are used just for the padding scheme at
    the very, very last block. And in fact in
  • 6:33 - 6:38
    the CMAC standard, the keys K1, K2 are
    derived from the key K by some sort of a
  • 6:38 - 6:44
    pseudo random generator. So the way CMAC
    works is as follows. Well, if the message
  • 6:44 - 6:49
    happens to not be a multiple of a block
    length, then we append the ISO padding to
  • 6:49 - 6:54
    it. But then we also XOR this last
    block with a secret key, K1, that the
  • 6:54 - 6:59
    adversary doesn't know. However, if the
    message is a multiple of the block length,
  • 6:59 - 7:03
    then of course, we don't append anything
    to it. But we XOR it with a different
  • 7:03 - 7:07
    key, K2, that, again, the adversary
    doesn't actually know. So it turns out,
  • 7:07 - 7:11
    just by doing that, it's now impossible to
    apply the extension attacks that we could
  • 7:11 - 7:15
    do on the cascade function, and on
    raw CBC. Because the poor
  • 7:15 - 7:19
    adversary actually doesn't know what is
    the last block that went into the
  • 7:19 - 7:23
    function. He doesn't know K1, and therefore,
    he doesn't know the value at this
  • 7:23 - 7:27
    particular point and as a result, he can't do
    an extension attack. In fact, this is a
  • 7:27 - 7:32
    provable statement. And so that this
    construction here is simply by XOR-ing
  • 7:32 - 7:36
    K1 or XOR-ing K2 is really a PRF.
    Despite not having to do a final
  • 7:36 - 7:40
    encryption step after the raw CBC
    function is computed. So, that's one
  • 7:40 - 7:45
    benefit, that there's no final encryption
    step. And the second benefit is that we resolve
  • 7:45 - 7:49
    this ambiguity between whether padding did
    happen or padding didn't happen by using
  • 7:49 - 7:54
    two different keys to distinguish the case
    that the message is a multiple of the block
  • 7:54 - 7:59
    length versus the case where it's not but
    we have a pad appended to the message. So
  • 7:59 - 8:03
    the two distinct keys resolve this
    ambiguity between the two cases, and as a
  • 8:03 - 8:07
    result, this padding actually is
    sufficiently secure. And as I said,
  • 8:07 - 8:12
    there's actually a nice security theorem
    that goes with CMAC that shows that the
  • 8:12 - 8:16
    CMAC construction really is a pseudo-random
    function, with the same security
  • 8:16 - 8:20
    properties as CBC-MAC. So I wanted to
    mention that CMAC is a federal standard
  • 8:20 - 8:25
    standardized by NIST and if you now, these
    days, wanted to use a CBC-MAC for
  • 8:25 - 8:29
    anything, you would actually be using CMAC
    as the standard way to do it. But
  • 8:29 - 8:34
    particularly in CMAC, the underlying block
    cypher is AES and that gives us a secure
  • 8:34 - 8:38
    CBC-MAC derived from AES. So that's the end
    on the segment and in the next segment
  • 8:38 - 8:40
    we'll talk about a parallel MAC.
Title:
MAC padding (9 min)
Video Language:
English
mjomble edited English subtitles for MAC padding (9 min)
mjomble edited English subtitles for MAC padding (9 min)
mjomble edited English subtitles for MAC padding (9 min)
stanford-bot edited English subtitles for MAC padding (9 min)
stanford-bot edited English subtitles for MAC padding (9 min)
stanford-bot added a translation

English subtitles

Revisions