0:00:00.000,0:00:03.084 So, in this video, we'll graduate beyond[br]the domain of just vanilla binary search 0:00:03.084,0:00:07.035 trees, like we've been talking about[br]before, and we'll start talking about 0:00:07.035,0:00:10.067 balanced binary search trees.[br]These are the search trees you'd really 0:00:10.067,0:00:13.852 want to use when you want to have real[br]time guarantees on your operation time. 0:00:13.852,0:00:17.627 Cuz they're search trees which are[br]guaranteed to stay balanced, which means 0:00:17.627,0:00:21.015 the height is guaranteed to stay[br]logarithmic, which means all of the 0:00:21.015,0:00:25.005 operations search trees support that we[br]know and love, will also be a logarithmic 0:00:25.005,0:00:27.012 in the number of keys that they're[br]storing. 0:00:27.012,0:00:30.056 So, let's just quickly recap.[br]What is the basic structure tree property? 0:00:30.056,0:00:34.075 It should be the case that at every single[br]node of your search tree, if you go to the 0:00:34.075,0:00:38.060 left, you'll only see keys that are[br]smaller than where you started and if you 0:00:38.060,0:00:42.025 go to the right you only see keys that[br]are bigger than where you started. 0:00:42.025,0:00:45.614 And a really important observation, which[br]is that, given a set of keys, there's 0:00:45.614,0:00:48.875 going to be lot and lots of legitimate,[br]valid, binary search trees with those 0:00:48.875,0:00:50.986 keys.[br]So, we've been having these running 0:00:50.986,0:00:53.532 examples where the keys one, two, three,[br]four, five. 0:00:53.532,0:00:57.589 On the one hand, you can have a nice and[br]balanced search tree that has height only 0:00:57.589,0:01:01.618 two, with the keys one through five.[br]On the other hand, you can also have these 0:01:01.618,0:01:05.957 crazy chains, basically devolved to link[br]lists where the heights for, and elements 0:01:05.957,0:01:08.936 could be as high as N - 1.[br]So, in general, you could have an 0:01:08.936,0:01:13.479 exponential difference in the height.[br]It can be as small, in the best case, as 0:01:13.479,0:01:16.417 logarithmic and as big, in the worst case,[br]as linear. 0:01:16.417,0:01:20.148 So, this obviously motivates search trees[br]that have the additional property that you 0:01:20.148,0:01:23.621 never have to worry about their height.[br]You know they're going to be well 0:01:23.621,0:01:25.589 balanced.[br]You know they're going to have height 0:01:25.589,0:01:28.236 logarithmic.[br]You're never worried about them having 0:01:28.236,0:01:32.309 this really lousy linear height.[br]Remember, why it's so important to have a 0:01:32.309,0:01:35.481 small height?[br]It's because the running time of all of 0:01:35.481,0:01:38.773 the operations of search trees depends on[br]the height. 0:01:38.773,0:01:43.455 You want to do search, you want to[br]insertions, you want to find predecessors 0:01:43.455,0:01:48.334 or whatever, the height is going to be[br]what governs the running time of all those 0:01:48.334,0:01:50.013 properties.[br]So, the high level idea behind balanced search 0:01:50.013,0:01:54.776 trees is really exactly what you think,[br]which is that, you know, because the 0:01:54.776,0:01:58.463 height can't be any better than[br]logarithmic in the number of things you're 0:01:58.463,0:02:02.186 storing, that's because the trees are[br]binary so the number of nodes can only 0:02:02.186,0:02:05.151 double each level so you need a[br]logarithmic number of levels to 0:02:05.151,0:02:07.040 accommodate everything that you are[br]storing. 0:02:07.040,0:02:11.007 But it's got to be logarithmic, lets make[br]sure it stays logarithmic all the time, 0:02:11.007,0:02:16.399 even as we do insertions and deletions.[br]If we can do that, then we get a very rich 0:02:16.399,0:02:21.012 collection of supported operations all[br]running in logarithmic time. 0:02:21.056,0:02:25.048 As usual, n denotes, the number of[br]keys being stored in the tree. 0:02:25.048,0:02:29.006 There are many, many, many different[br]balanced search trees. 0:02:29.006,0:02:33.022 They're not super, most of them are not[br]super different from each other. 0:02:33.022,0:02:37.092 I'm going to talk about one of the more[br]popular ones which are called Red Black 0:02:37.092,0:02:40.062 Trees.[br]So, these were invented back in the '70s. 0:02:40.096,0:02:44.674 These were not the first balanced binary[br]search tree data structures, that honor 0:02:44.674,0:02:49.000 belongs to AVL trees, which again are not[br]very different from red black trees, 0:02:49.000,0:02:51.617 though the invariants are slightly[br]different. 0:02:51.617,0:02:55.997 Another thing you might want to look up[br]and read about is a very cool data 0:02:55.997,0:02:58.454 structure called splay trees, due to[br]Sleator and Tarjan, 0:02:58.454,0:03:01.797 These, unlike red black trees and AVL[br]trees, which only are modified on 0:03:01.797,0:03:05.725 insertions and deletions, which, if you[br]think about it, is sort of what you'd 0:03:05.725,0:03:08.583 expect.[br]Splay trees modify themselves, even when 0:03:08.583,0:03:11.788 you're doing look ups, even when you're[br]doing searches. 0:03:11.788,0:03:15.738 So, they're sometimes called[br]self-adjusting trees for that reason. 0:03:15.738,0:03:20.653 And it's super simple, but they still have[br]kind of amazing guarantees. 0:03:20.653,0:03:25.213 And then finally, going beyond the, just[br]the binary tree paradigm many of you might 0:03:25.213,0:03:28.281 want to look up examples of B trees or[br]also B+ trees. 0:03:28.281,0:03:31.443 These are very relevant for implementing[br]databases. 0:03:31.443,0:03:35.499 Here what the idea is, in a given node[br]you're going to have not just one key but 0:03:35.499,0:03:39.567 many keys and from a node, you have[br]multiple branches that you can take 0:03:39.567,0:03:44.306 depending where you're searching for falls[br]with respect to the multiple keys that are 0:03:44.306,0:03:47.138 at that node.[br]The motivation in a database context for 0:03:47.138,0:03:50.668 going beyond the binary paradigm, is to[br]have a better match up with the memory 0:03:50.668,0:03:53.295 hierarchy.[br]So, that's also very important, although a 0:03:53.295,0:03:57.316 little bit out of the scope here.[br]That said, what we discuss about red-black 0:03:57.316,0:04:01.133 trees, much of the intuition will[br]translate to all of these other balance 0:04:01.133,0:04:05.641 tree data structures, if you ever find[br]yourself in a position where you need to 0:04:05.641,0:04:09.169 learn more about them.[br]So, red black trees are just the same as 0:04:09.169,0:04:13.414 binary search trees, except they also[br]always maintain a number of additional 0:04:13.414,0:04:16.356 invariants.[br]And so, what I'm going to focus on in this 0:04:16.356,0:04:19.775 video is, first of all, what the[br]invariants are, and then how the 0:04:19.775,0:04:22.849 invariants guarantee that the height will[br]be logarithmic. 0:04:22.849,0:04:27.080 Time permitting, at some point, there will[br]be optional videos more about the guts, 0:04:27.080,0:04:31.612 more about the implementations of red[br]black trees namely how do you maintain 0:04:31.612,0:04:34.098 these invariants under insertions and[br]deletions. 0:04:34.098,0:04:37.676 That's quite a bit more complicated, so[br]that's appropriate for, for optional 0:04:37.676,0:04:40.530 material.[br]But understanding what the invariants are 0:04:40.530,0:04:44.794 and what role they play in controlling the[br]height is very accessible, and it's 0:04:44.794,0:04:47.291 something I think every programmer should[br]know. 0:04:47.291,0:04:51.152 So, there, I'm going to write down four[br]invariants and really, the bite comes from 0:04:51.152,0:04:54.420 the second two, okay, from the third and[br]the fourth invariant. 0:04:54.420,0:04:57.986 The first two invariants you know, are[br]just really cosmetic. 0:04:57.986,0:05:01.980 So, the first one we're going to store one[br]bit of information additionally at each 0:05:01.980,0:05:06.783 node, beyond just the key and we're going[br]call this bit as indicating whether it's a 0:05:06.783,0:05:09.827 red or a black node.[br]You might be wondering, you know, why red 0:05:09.827,0:05:12.464 black?[br]Well, I asked my colleague, Leo Guibas 0:05:12.464,0:05:16.076 about that a few years ago.[br]And he told me that when he and Professor 0:05:16.076,0:05:20.990 Sedgewick were writing up this article the[br]journals were, just had access to a 0:05:20.990,0:05:24.793 certain kind of new printing technology[br]that allowed very limited color in the 0:05:24.793,0:05:28.097 printed copies of the journals.[br]And so, they were eager to use it, and so 0:05:28.097,0:05:32.085 they named the data structure red black,[br]so they could have these nice red and 0:05:32.085,0:05:36.028 black pictures in the journal article.[br]Unfortunately, there was then some snafu, 0:05:36.028,0:05:40.172 and at the end of the day, that technology[br]wasn't actually available, so it wasn't 0:05:40.172,0:05:44.038 actually printed the way they were[br]envisioning it but the name has stuck. 0:05:44.038,0:05:47.897 So, that's the rather idiosyncratic reason[br]why these data structures got the name 0:05:48.055,0:05:51.942 that they did, red black trees.[br]So, secondly we're going to maintain the 0:05:51.942,0:05:56.060 invariant that the roots of the search[br]tree is always black, it can never be red. 0:05:56.060,0:05:58.092 Okay.[br]So, with the superficial pair of 0:05:58.092,0:06:02.045 invariants out of the way, let's go to the[br]two main ones. 0:06:02.045,0:06:06.023 So, first of all, we're never going to[br]allow two reds in a row. 0:06:06.023,0:06:12.820 By which, I mean, if you have a red node[br]in the search tree, then its children must 0:06:12.820,0:06:16.011 be black.[br]If you think about for a second, you 0:06:16.011,0:06:20.019 realize this also implies that if a notice[br]red, and it has a parent, then that 0:06:20.019,0:06:24.076 parent has to be a black node.[br]So, in that sense, there are no two red 0:06:24.076,0:06:30.079 nodes in a row anywhere in the tree.[br]And the final invariant which is also 0:06:30.079,0:06:37.046 rather severe is that every path you might[br]take from a root to a null pointer, passes 0:06:37.046,0:06:41.019 through exactly the same number of black[br]nodes. 0:06:41.019,0:06:45.956 So, to be clear on what I mean by a root[br]null path, what you should think about is an 0:06:45.956,0:06:49.371 unsuccessful search, right?[br]So, what happens in an unsuccessful 0:06:49.371,0:06:53.225 search, you start at the root depending on[br]whether you need to go smaller or bigger, 0:06:53.225,0:06:57.154 you go left or right respectably.[br]You keep going left right as appropriate 0:06:57.154,0:07:01.053 until eventually you hit a null pointer.[br]So, I want you to think about the process 0:07:01.053,0:07:04.938 that which you start at the root and then,[br]eventually, fall off the end of the tree. 0:07:04.938,0:07:07.098 In doing so, you traverse some number of[br]nodes. 0:07:07.098,0:07:11.030 Some of those nodes will be black some of[br]those nodes will be red. 0:07:11.030,0:07:15.364 And I want you to keep track of the number[br]of black nodes and the constraints that a 0:07:15.364,0:07:19.529 red black tree, by definition, must[br]satisfy, is that no matter what path you 0:07:19.529,0:07:24.096 take through the tree starting from the[br]root terminating at a null pointer, the 0:07:24.096,0:07:27.278 number of black nodes traversed, has to be[br]exactly the same. 0:07:27.278,0:07:31.788 It cannot depend on the path, it has to be[br]exactly the same on every single root null path. 0:07:31.788,0:07:34.559 [br]Let's move on to some examples. 0:07:34.559,0:07:38.408 So, here's a claim.[br]And this is meant to, kind of, whet your 0:07:38.408,0:07:42.667 appetite for the idea that red black trees[br]must be pretty balanced. 0:07:42.667,0:07:45.745 They have to have height, basically[br]logarithmic. 0:07:45.745,0:07:49.019 So, remember, what's the most unbalanced[br]search tree? 0:07:49.019,0:07:54.123 Well, that's these chains.[br]So, the claim is, even a chain with three 0:07:54.123,0:07:58.393 nodes can not be a red black tree.[br]So, what's the proof? 0:07:58.393,0:08:04.965 Well, consider such a search tree.[br]So, maybe, with the key values one, two and three. 0:08:04.965,0:08:08.465 So, the question that we're asking is, is 0:08:08.465,0:08:13.382 there a way to color the node, these three[br]nodes, red and black so that all four of 0:08:13.382,0:08:17.060 the invariants are satisfied.[br]So, we need to color each red or black. 0:08:17.060,0:08:20.379 Remember, variant two says, the root, the[br]one has to be black. 0:08:20.379,0:08:24.412 So, we have four possibilities for how to[br]use the color two and three. 0:08:24.412,0:08:27.446 But really, because of the third[br]invariant, we only have three possibilities. 0:08:27.446,0:08:30.246 We can't color two and three both red, cuz 0:08:30.246,0:08:34.018 then we'd have two reds in a row.[br]So, we can either make two red, three 0:08:34.018,0:08:37.007 black, two black, three red, or both two[br]and three black. 0:08:37.007,0:08:41.069 And all of the cases are the same.[br]Just to give one example, suppose that we 0:08:41.069,0:08:44.083 colored the node two, red, and one and[br]three are black. 0:08:44.083,0:08:48.491 The claim is invariant four has been[br]broken and invariant four is going to be 0:08:48.491,0:08:53.001 broken no matter how we try to color two[br]and three red and black. 0:08:53.001,0:08:56.619 What is invariant four says?[br]It says, really on any unsuccessful 0:08:56.619,0:09:00.006 search, you pass through the same number[br]of black nodes. 0:09:00.006,0:09:03.062 And so, one unsuccessful search would be,[br]you search for zero. 0:09:03.062,0:09:07.083 And if you search for a zero, you go to[br]the root, you immediately go left to hit a 0:09:07.083,0:09:10.027 null pointer.[br]So, you see exactly one black node. 0:09:10.027,0:09:12.063 Namely one.[br]On the other hand, suppose you searched 0:09:12.063,0:09:15.285 for four, then you'd start at the root,[br]and you'd go right, and you go to two, 0:09:15.285,0:09:19.396 you'd go right, and you go to three, you'd[br]go right again, and only then will you get 0:09:19.396,0:09:22.018 a null pointer.[br]And on that, unsuccessful search, you'd 0:09:22.018,0:09:24.060 encounter two black nodes, both the one[br]and the three. 0:09:24.060,0:09:28.008 So, it's a violation of the fourth[br]invariant, therefore, this would not be a 0:09:28.008,0:09:30.063 red black tree.[br]I'll leave that for you to check, that no 0:09:30.063,0:09:34.043 matter how you try to code two and three[br]red or black, you're going to break one of 0:09:34.043,0:09:37.003 the invariants.[br]If they're both red, you'd break the third 0:09:37.003,0:09:39.023 invariant.[br]If at most one is red, you'd break the 0:09:39.023,0:09:42.040 fourth invariant.[br]So, that's a non-example of a red-black tree. 0:09:42.040,0:09:44.410 So, let's look at an example of a red-black tree. 0:09:44.410,0:09:48.014 One, a search tree where you can actually 0:09:48.014,0:09:52.016 color the nodes red or black so that all[br]four invariants are maintained. 0:09:52.016,0:09:56.057 So, one search tree which is very easy to[br]make red black is a perfectly balanced one. 0:09:56.057,0:09:59.008 So, for example, let's consider this three 0:09:59.008,0:10:03.049 nodes search tree has the keys three,[br]five, and seven and let's suppose the five 0:10:03.049,0:10:06.022 is the root.[br]So, it has one child on each side, the 0:10:06.022,0:10:09.040 three and the seven.[br]So, can this be made a red black tree? 0:10:09.040,0:10:11.097 So, remember what that question really[br]means. 0:10:11.097,0:10:16.049 It's asking can we color theses three[br]nodes some combination of red and black so 0:10:16.049,0:10:19.005 that all four of the invariants are[br]satisfied? 0:10:19.075,0:10:24.010 If you think about it a little bit, you[br]realize, yeah, you can definitely color 0:10:24.010,0:10:27.061 these nodes red or black to make and[br]satisfy for the invariants. 0:10:27.061,0:10:31.000 In particular, suppose we color all three[br]of the nodes, black. 0:10:31.036,0:10:34.048 We've satisfied variant number one, we've[br]colored all the nodes. 0:10:34.048,0:10:37.095 We've satisfied variant number two, and[br]particularly, the root is black. 0:10:37.095,0:10:41.076 We've satisfied invariant number three.[br]There's no reds at all, so there's 0:10:41.076,0:10:45.019 certainly no two reds in a row.[br]And, if you think about it, we've 0:10:45.019,0:10:48.073 satisfied invariant four because this tree[br]is perfectly balanced. 0:10:48.073,0:10:52.098 No matter what you unsuccessfully search[br]for, you're going to encounter two black 0:10:52.098,0:10:55.043 nodes.[br]If you search for, say, one, you're going 0:10:55.043,0:10:59.002 to encounter three and five.[br]If you search for, say, six, you're going 0:10:59.002,0:11:01.909 to encounter five and seven.[br]So, all root null paths have exactly 0:11:01.909,0:11:05.077 two black nodes and variant number four is[br]also satisfied. 0:11:05.077,0:11:08.046 So, that's great.[br]But, of course, the whole point of having 0:11:08.046,0:11:11.033 a binary search tree data structure is you[br]want to be dynamic. 0:11:11.033,0:11:13.068 You want to accommodate insertions and[br]deletions. 0:11:13.068,0:11:17.047 Every time you have an insertion or a[br]deletion into a red black tree, you get a 0:11:17.047,0:11:19.290 new node.[br]Let's say, an insertion, you get a new 0:11:19.290,0:11:23.279 node, you have to color it something.[br]And now, all of a sudden, you got to worry 0:11:23.279,0:11:25.485 about breaking one of these four[br]invariants. 0:11:25.485,0:11:28.952 So, let me just show you some easy cases[br]where you can accommodate insertions 0:11:28.952,0:11:32.056 without too much work.[br]Time permitting we will include some 0:11:32.056,0:11:35.998 optional videos with the notion of[br]rotations which do more fundamental 0:11:35.998,0:11:40.540 restructuring of search trees so that they[br]can maintain the four invariants, and stay 0:11:40.699,0:11:44.164 nearly perfectly balanced.[br]So, if we have this red black tree where 0:11:44.164,0:11:48.367 everything's black, and we insert, say,[br]six, that's going to get inserted down 0:11:48.367,0:11:51.054 here.[br]Now, if we try to color it black, it's no 0:11:51.054,0:11:55.033 longer going to be a red black tree.[br]And that's because, if we do an 0:11:55.033,0:11:59.622 unsuccessful search now for, say, 5.5,[br]we're going to encounter three black 0:11:59.622,0:12:03.462 nodes, where if we do an unsuccessful[br]search for one, we only encounter two 0:12:03.462,0:12:05.706 black nodes.[br]So, that's not going to work. 0:12:05.706,0:12:09.969 But the way we can fix it is instead of[br]coloring the six black, we color it red. 0:12:09.969,0:12:13.544 And now, this six is basically invisible[br]to invariant number four. 0:12:13.544,0:12:16.406 It doesn't show up in any root null[br]paths. 0:12:16.406,0:12:20.858 So, because you have two black nodes in[br]all roots in all paths before, before the 0:12:20.858,0:12:24.275 six was there, that's still true now that[br]you have this red six. 0:12:24.275,0:12:28.993 So, all four invariants are satisfied once[br]you insert the six and color it red. 0:12:28.993,0:12:33.209 If we then insert, say, an eight, we can[br]pull exactly the same trick, we can call 0:12:33.209,0:12:37.142 it an eight red.[br]Again, it doesn't participate in invariant 0:12:37.142,0:12:42.021 four at all so we haven't broken it.[br]Moreover, we still don't have two reds in 0:12:42.021,0:12:45.073 a row, so we haven't broken invariant[br]number three either. 0:12:45.073,0:12:50.042 So, this is yet another red black tree.[br]In fact, this is not the unique way to 0:12:50.042,0:12:54.092 color the nodes of this search tree, so[br]that it satisfies all four of the 0:12:54.092,0:12:57.082 invariants.[br]If we, instead, recolor six and eight 0:12:57.082,0:13:02.001 black, but at the same time, recolor the[br]node seven, red, we're also golden. 0:13:02.001,0:13:05.000 Clearly, the first three invariants are[br]all satisfied. 0:13:05.000,0:13:09.009 But also, in pushing the red upward,[br]consolidating the red at six and eight, 0:13:09.009,0:13:13.035 and putting it at seven instead, we[br]haven't changed the number of black nodes 0:13:13.035,0:13:16.044 on any given path.[br]Any black, any path that previously went 0:13:16.044,0:13:20.070 through six, went through seven, anything[br]that went through eight, went through 0:13:20.070,0:13:24.923 seven so there's exactly the same number[br]of red and black nodes on each such path 0:13:24.923,0:13:28.077 as there was before.[br]So, all paths still have equal number of 0:13:28.077,0:13:30.835 black nodes and invariant four remains[br]satisfied. 0:13:30.835,0:13:36.595 As I said, I've shown you here only simple[br]examples, where you don't have to do much 0:13:36.595,0:13:39.895 work on an insertion to retain the red[br]black properties. 0:13:39.895,0:13:44.459 In general, if you keep inserting more and[br]more stuff and certainly if you do the 0:13:44.459,0:13:48.771 deletions, you have to work much harder to[br]maintain those four invariants. 0:13:48.771,0:13:52.807 Time permitting, we'll cover just a taste[br]of it in some optional videos. 0:13:52.807,0:13:57.748 So, what's the point of these seemingly[br]arbitrary four invariants of a red black 0:13:57.748,0:14:00.478 tree?[br]Well, the whole point is that if you 0:14:00.478,0:14:05.475 satisfy these four invariants in your[br]search tree, then your height is going to 0:14:05.475,0:14:07.839 be small.[br]And because your height's going to be 0:14:07.839,0:14:10.098 small, all your operations are going to be[br]fast. 0:14:10.098,0:14:15.695 So, let me give you a proof that if a[br]search tree satisfies the four invariants, 0:14:15.695,0:14:20.030 then it has super small height.[br]In fact, no more than double the absolute 0:14:20.030,0:14:24.040 minimum that we conceivably have, almost[br]two times log base two of N. 0:14:24.040,0:14:31.003 So, the formal claim, is that every[br]red-black tree with N nodes, has height O 0:14:31.003,0:14:36.884 of log N, were precisely in those two[br]times log base two of N + 1. 0:14:36.886,0:14:41.723 So, here's the proof.[br]And what's clear about this proof is it's 0:14:41.723,0:14:46.089 very obvious the role played by this[br]invariants three and four. 0:14:46.089,0:14:51.914 Essentially, what the invariants guarantee[br]is that, a red black tree has to look like 0:14:51.914,0:14:57.010 a perfectly balanced tree with at most a[br]sort of factor two inflation. 0:14:57.010,0:15:01.010 So, let's see exactly what I mean.[br]So, let's begin with an observation. 0:15:01.010,0:15:03.083 And this, this has nothing to do with red[br]black trees. 0:15:03.083,0:15:08.006 Forget about the colors for a moment, and[br]just think about the structure of binary 0:15:08.006,0:15:10.059 trees.[br]And let's suppose we have a lower bound on 0:15:10.059,0:15:14.082 how long root null paths are in the tree.[br]So, for some parameter k, and go ahead and 0:15:14.082,0:15:18.079 think of k as, like, ten if you want.[br]Suppose we have a tree where if you start 0:15:18.079,0:15:22.097 from the root, and no matter how it is you[br]navigate left and right, child pointers 0:15:22.097,0:15:26.079 until you terminate in a null pointer.[br]No matter how you do it, you have no 0:15:26.079,0:15:29.032 choice but to see at least k nodes along[br]the way. 0:15:29.032,0:15:34.069 If that hypothesis is satisfied, then if[br]you think about it, the top of this tree 0:15:34.069,0:15:39.032 has to be totally filled in.[br]So, the top of this tree has to include a 0:15:39.032,0:15:43.060 perfectly balanced search tree, binary[br]tree of depth k - 1. 0:15:43.062,0:15:46.028 So, let me draw a picture here of the case[br]of k = three. 0:15:46.033,0:15:50.083 So, if no matter how you go from the root[br]to a null pointer, you have to see at 0:15:50.083,0:15:54.098 least three nodes along the way.[br]That means the top three levels of this 0:15:54.098,0:15:57.087 tree have to be full.[br]So, you have to have the root. 0:15:57.087,0:16:01.074 It has to have both of its children.[br]It has to have all four of its 0:16:01.074,0:16:04.083 grandchildren.[br]The proof of this observation is by 0:16:04.083,0:16:08.019 contradiction.[br]If, in fact, you were missing some nodes 0:16:08.019,0:16:13.016 in any of these top k levels.[br]We'll that would give you a way of hitting 0:16:13.016,0:16:18.045 a null pointer seeing less then k nodes.[br]So, what's the point is, the point is this 0:16:18.045,0:16:23.086 gives us a lower bound on the population[br]of a search tree as a function of the 0:16:23.086,0:16:28.086 lengths of its root null paths.[br]So, the size N of the tree must include at 0:16:28.086,0:16:34.315 least the number of nodes in a perfectly[br]balanced tree of depth k - 1 which is 0:16:34.315,0:16:40.094 2^k - 1, So, for example,[br]when k = 3, it's 2^3 (two cubed) - 1, 0:16:40.094,0:16:45.346 or 7 that's just a basic fact about trees, 0:16:45.346,0:16:49.553 nothing about red black trees.[br]So, let's now combine that with a red 0:16:49.553,0:16:54.740 black tree invariant to see why red black[br]trees have to have small height. 0:16:54.740,0:16:58.828 So again, to recap where we got to on the[br]previous slide. 0:16:58.828,0:17:04.407 The size N, the number of nodes in a tree,[br]is at least 2^k - 1, where k is 0:17:04.407,0:17:08.932 the fewest number of nodes you will ever[br]see on a root null path. 0:17:08.932,0:17:13.503 So, let's rewrite this a little bit and[br]let's actually say, instead of having a 0:17:13.503,0:17:18.366 lower bound on N in terms of k, let's have[br]an upper bound on k in terms of N. 0:17:18.366,0:17:23.481 So, the length of every root null path,[br]the minimum length of every root null 0:17:23.481,0:17:26.803 path is bounded above by log base two of[br]quantity N + 1. 0:17:26.804,0:17:31.243 This is just adding one to both sides and[br]taking the logarithm base two. 0:17:31.243,0:17:35.355 So, what does this buy us?[br]Well, now, let's start thinking about red 0:17:35.355,0:17:38.022 black trees.[br]So now, red black tree with N nodes. 0:17:38.022,0:17:42.208 What does this say?[br]This says that the number of nodes, forget 0:17:42.208,0:17:48.156 about red or black, just the number of[br]nodes on some root null path has to be the 0:17:48.156,0:17:52.622 most log base two of N + 1.[br]In the best case, all of those are black. 0:17:52.622,0:17:57.236 Maybe some of them are red, but in the,[br]in, the maximum case, all of them are 0:17:57.236,0:18:00.065 black.[br]So, we can write in a red black tree with 0:18:00.065,0:18:06.176 N nodes, there is a root null path with at[br]most log base two of N + 1, black nodes. 0:18:06.176,0:18:10.747 This is an even weaker statement than what[br]we just proved. 0:18:10.747,0:18:15.500 We proved that it have some, somehow must[br]have at most log based two, n + 1 total 0:18:15.500,0:18:18.339 nodes.[br]So, certainly, that path has the most log 0:18:18.339,0:18:21.878 base two of N + 1 black nodes.[br]Now, let's, now let's apply the two 0:18:21.878,0:18:25.742 knockout punches of our two invariants.[br]Alright, so fundamentally, what is the 0:18:25.742,0:18:29.167 fourth invariant telling us?[br]It's telling us that if we look at a path 0:18:29.167,0:18:33.014 in our red black tree, we go from the[br]root, we think about, let's say, that's an 0:18:33.014,0:18:35.380 unsuccessful search, we go down to a null[br]pointer. 0:18:35.380,0:18:39.093 It says, if we think of the red nodes as[br]invisible, if we don't count them in our 0:18:39.093,0:18:43.033 tally, then we're only going to see log,[br]basically a logarithmic number of nodes. 0:18:43.033,0:18:47.054 But when we care about the height of the[br]red black tree, of course, we care about 0:18:47.054,0:18:49.730 all of the nodes, the red nodes and the[br]black nodes. 0:18:49.730,0:18:54.169 So, so far we know, that if we only count[br]black nodes then we're good, We only have 0:18:54.169,0:18:56.734 log base two of N + 1 nodes that we need[br]to count. 0:18:56.734,0:18:59.048 So, here's where the third invariant comes[br]in. 0:18:59.048,0:19:04.021 It says, well actually, black nodes are a[br]majority of nodes in the tree. 0:19:04.021,0:19:08.033 In a strong sense, there are no two reds[br]in a row, on any path. 0:19:08.033,0:19:12.096 So, if we know the number of black nodes[br]is small, then because you can't have two 0:19:12.096,0:19:17.042 reds in a row, the number of total nodes[br]on the path is at most twice as large. 0:19:17.042,0:19:21.070 In the worst case, you have a black route,[br]then red, then black, then red, then 0:19:21.070,0:19:26.015 black, then red, then black, et cetera.[br]At the worst case, the number of red nodes 0:19:26.015,0:19:30.089 is equal to the number of black nodes,[br]which doubles the length of the path once 0:19:30.089,0:19:35.035 you start counting the red nodes as well.[br]And this is exactly what it means for a 0:19:35.035,0:19:39.001 tree to have a logarithmic depth.[br]So, this, in fact, proves the claim, if 0:19:39.001,0:19:43.046 the search trees satisfies the invariants[br]one through four, in particular if there's 0:19:43.046,0:19:47.085 no two reds in a row and all root null[br]paths have an equal number of black nodes, 0:19:47.085,0:19:51.056 then, knowing nothing else about this[br]search tree, it's got to be almost 0:19:51.056,0:19:54.026 balanced.[br]It's perfectly balanced up to a factor of 0:19:54.026,0:19:56.022 two.[br]And again, the point then is that 0:19:56.022,0:19:59.827 operations in a search tree and the search[br]trees are going to run in logarithmic 0:19:59.827,0:20:04.043 time, because the height is what governs[br]the running time of those operations. 0:20:04.085,0:20:08.633 Now, in some sense, I've only told you the[br]easy part which is if it just so happens 0:20:08.633,0:20:12.192 that your search tree satisfies these[br]four invariants, then you're good. 0:20:12.192,0:20:15.992 The height is guaranteed to be small so[br]the operations are guaranteed to be fast. 0:20:15.992,0:20:18.992 Clearly that's exactly what you want from[br]this data structure. 0:20:18.992,0:20:22.980 But for the poor soul who has to actually[br]implement this data structure, the hard 0:20:22.980,0:20:26.751 work is maintaining these invariants even[br]as the data structure changes. 0:20:26.751,0:20:31.084 Remember, the point here is to be dynamic,[br]to accommodate insertions and deletions. 0:20:31.084,0:20:35.661 And searches and deletions can disrupt[br]these four invariants and then one has to 0:20:35.661,0:20:40.025 actually change the code to make sure[br]they're satisfied again, so that the tree 0:20:40.025,0:20:44.384 stays balanced, has low height, even under[br]arbitrary sequences of insertions and 0:20:44.384,0:20:46.774 deletions.[br]So, we're not going to cover that in this 0:20:46.774,0:20:49.023 video.[br]It can be done, without significantly 0:20:49.023,0:20:53.048 slowing down any of the operations.[br]It's pretty tricky, takes some nice ideas. 0:20:53.048,0:20:57.050 There's a couple well-known algorithms[br]textbooks that cover those details. 0:20:57.050,0:21:00.068 Or if you look at open source and[br]limitations of balanced search trees, you 0:21:00.068,0:21:02.608 can look at code that does that[br]implementations. 0:21:02.608,0:21:07.147 But, because it can be done in a practical[br]way and because Red Black Tree supports 0:21:07.147,0:21:11.512 such an original array of operations,[br]that's why you will find them used in a 0:21:11.512,0:21:15.276 number practical applications.[br]That's why balanced search trees should be 0:21:15.276,0:21:17.051 part of your programmer tool box.