Red-Black Trees Part 2 - Insert
By John Lenz. December 27, 2015.
This page is part two of the article series You could have invented red-black trees!.
Recall from the part 1 the goal of red-black trees: store a balanced binary search tree such that each node is colored red or black to satisfy the following properties. A root-to-nil path is a sequence of \(L\) or \(R\) such that starting from the root and descending left or right according to the sequence, you end at nil. That is, the last step in the sequence descends in a direction for which there does not exist a child.
- (black-height property) Every root-to-nil path has the same number of black nodes.
- (no-red-edge property) For each node \(v\), it is not the case that both \(v\) and \(v\)'s parent are red.
We want to insert a new element by storing it on a new leaf, and then adding the leaf into the tree so that the properties are maintained. We know this is possible because we could mimic the actions that would be performed on a balanced multiway tree. Instead of doing that, we will directly discover an insert procedure which just maintains the properties. The hope is that we find some optimizations and shortcuts when operating directly on the binary red-black tree.
Our insert procedure will start just like normal binary search tree insert to create a new leaf to hold the new element, and then find the location this new leaf should be added. Recall from the previous discussion that the new leaf was added as a red node, so that the black-height property is not violated because each root-to-nil path has the same number of black nodes as before. But, adding this red leaf might violate the no-red-edge property if it is added below a red node. Therefore, we will add some code to detect and fix a red-edge.
Unbalanced Binary Search Tree Insert
At the moment, I am only concerned with correctness and big-O execution time. Therefore, I will use pseudocode and not care about minor efficiency optimizations. Trees and recursion are BFFs so I will use recursion to describe the insert and remove methods. Recursion is a great fit for trees because it allows us to focus only on a few nodes in the tree at a time, and exactly matches the reasoning for correctness. Later, in part 4 we will think about converting to a real programming language and at that time can perform efficiency optimizations such as converting the recursion to a loop.
The unbalanced binary search tree recursive insert procedure is as follows. It takes as parameters a tree node g
and the new key and value. The return value is the root of the subtree with the key and value added (most of the time this will be g
itself but sometimes the root of the subtree will change).
insert_helper(g, newKey, newVal):
if g == nil:
# Inserting to an empty tree creates a new node
g = new node
g.key = newKey
g.val = newVal
g.color = red
elif g.key == newKey:
# Replace the existing value
g.val = newVal
else:
if newKey < g.key:
# Insert into the left subtree. The new root of the left
# subtree is returned from the recursive call and then becomes
# g's new left child
g.left = insert_helper(g.left, newKey, newVal)
else:
# Similarly, recursively insert to the right
g.right = insert_helper(g.right, newKey, newVal)
# TODO: check for and fix property violations
return g
Insert and Red-Black Properties
The main question is what needs to replace the TODO
in the above code to make sure the red-black properties are satisfied? To make our life easier, we will try and always keep the black-height property satisfied. Since we are inserting a new red node, the initial insert does not violate the black-height property but could produce a red-edge. As long as our tree manipulation to fix the red-edge does not cause the black heights to change, we can concentrate on fixing the red-edge and not have to worry about fixing a black-height problem. Also to make our life even easier, we are going to make sure our tree manipulation does not produce a second red-edge, so that we only have to concentrate on a single red-edge. We don't know if this is actually possible, but our first attempt should be as simple as we can get away with.
To fix the red-edge, our strategy will be to move the red-edge closer to the root or remove it completely. By moving the red-edge closer to the root (and preserving the binary search tree and black-height properties), we can eventually guarantee to fix the red-edge. To see this, consider that the red-edge has moved all the way to the root, so that for example the root and its left child are red. At this point I can just change the root to black. This certainly fixes the red-edge, and since the root is on every single root-to-nil path the black-height property is preserved. (Note that the root is the only node which I can change the color like this and preserve the black-height property.)
There is a subtlety here in how multiple red-edges might be created. Consider that before starting insert, g
is red and g
's children are black. Also, assume that we recurse to the left (so recurse to 3 in the following figure).
g
, it's children, and grandchildrenConsider the situation that while processing the subtree rooted at 3, we create a red edge and can't remove it. According to our plan, we move the red-edge as high in the subtree as possible so the red-edge is between 3 and a child of 3. Since this turns 3 red and 7 is red, we have created two red-edges, one between 7 and 3 and one between 3 and a child of 3. As mentioned above, we would like to avoid creating more than one red-edge. Therefore, we will impose an additional rule that if a node is black (such as 3), it can change to red but there cannot be a red-edge below the node.
Fixing a red-edge
Our goal is to add some code to replace the TODO
in the insert pseudocode above. More precisely, the pseudocode should make sure that when the function returns,
- The black-heights are the same as before the insert started.
- If
g
is red before starting the insert, then the node returned byinsert_helper
could have a red-edge between itself and a child, but this is the only red-edge. - If
g
is black before starting the insert, then the node returned byinsert_helper
could be red but there is no red-edge at all in the subtree.
Definition: To ease our discussion, let's give a name to this. We will say that the subtree rooted at g
is obedient if the above three bullet points hold when inserting something below g
.
Let's start considering the code to replace TODO
to make sure g
is obedient. If g
starts out red, then nothing needs to be done! Indeed, if g
is red then before insert the tree looks like the above figure with g
's children black. While recursing to a child, the child could change to red but there are no red-edges below the child since the child is obedient (every parent's dream). If the child changes to red, then we have a red-edge between g
and a child of g
, but we are perfectly safe returning with the tree in this configuration.
Now consider when g
starts out black and we recurse to a child. After the recursion returns, the only possible red-edge is between a child of g
and a grandchild of g
. If there isn't a red edge, we don't need to do anything and can just return. If there is a red-edge between a child and a grandchild of g
, we need to perform some processing to remove the red-edge. It must be removed because g
started black and we don't want to get g
in trouble with the law.
So consider that g
is black and there is a red edge between a child and a grandchild. There are four possibilities since there are four edges; we will consider them one by one. In the figures below, I will draw a small fragment of the tree starting from g
. Therefore, you should consider that above g
there are more nodes (g
is not the root) and the nodes at the bottom of the figures are not leaves.
g.left and g.left.left are red
Remember this is only a fragment of the tree, there are nodes above 7 and nodes below each of 1, 5, 9, and 11, I just haven't drawn them. If 3 and 1 are red, then 5 must be black since there is only a single red-edge. There are now two possible subcases, either 11 is red or 11 is black. Let's consider the case when 11 is red. Since there is only a single red-edge violation, the tree looks like the following.
To fix this, we could try turning 3 black. This certainly fixes the red-edge violation, but then the black-height property fails, because each root-to-nil path that passes through 3 has increased the number of black nodes by one (remember that we have only drawn a fragment of the tree). We can fix this by converting 7 to red to decrease the number of black nodes on such paths, but turning 7 to red also removes a black node from all root-to-nil paths passing through 11. We can fix those paths by turning 11 to black and we have now fixed the black-height property and have no red-edge at all in the subtree rooted at g
.
push_black
operationdef push_black(g):
g.color = red
g.left.color = black
g.right.color = black
We can now call push_black
from the insert pseudocode above by replacing the TODO
with some code which looks like the following (see below for the final version):
if g.left.color = red and g.right.color = red:
push_black(g)
return g
# TODO: check for and fix other possible red-edge violations
That was only one case. So what about we still have a red violation between 1 and 3 but 11 is black. We can't call push_black anymore since it won't preserve the black-height property.
I'd still like to turn 3 black and 7 red since that fixes the red-edge violation and keeps the same number of black nodes on root-to-nil paths that pass through 3. The problem is that changing 7 to red decreases the number of black nodes on paths through 11 which causes the black-height property to be violated.
To restore the black-heights, we need another black node on root-to-nil paths that go through 11. The only black node around is 3, so we will move 3 up to become the new root of the subtree. Since we must preserve the binary search tree property, 7 will move down to become 3's right child, and 5 will move over to become 7's left child. This operation is called rotating the tree right. Combining the color swap with the rotation, we obtain an operation called flip_right
.
def rotate_right(g):
u = g.left
g.left = u.right
u.right = g
return u
def flip_right(g):
swap colors of g and g.left
return rotate_right(g)
In pictures,
Note how the flip_right
function returns the new root u
. We can therefore return this u
from insert_helper
, as it is the new root of the subtree. Thus we can add pseudocode such as the following in the TODO
section of the insert procedure:
if g.left.color = red and g.left.left.color = red and g.right.color = black:
return flip_right(g)
# TODO: check for and fix other red-edge violations
The flip removes the red-edge, but what about the black-heights? Recall that the figure is just a fragment of the larger tree, so we must consider all root-to-nil paths. Such paths could either avoid the subtree rooted at 7 completely, enter through 7 and exit through 1, enter through 7 and exit through 5, exit through 9, or exit through 13. For each such path, we check that it has the same number of black nodes as before. Paths exiting through 1 had one black node before (7) and still have one black node after (3). Paths exiting through 5 had two before (7, 5) and still have two after (3, 5). Paths exiting through 9 had two before (7, 11) not including 9 itself, and still have two after (3, 11) after. Paths exiting through 13 before had two black nodes (7, 11) not including 13 itself, and after still have two (3, 11).
g.left and g.left.right are red
Consider 3 and 5 are red, which forces 1 to be black since there is only a single red-edge.
We could start thinking about color changes and rotations, but before doing so there is a general trick we have in coding (actually in all of mathematics). Can we modify the tree to transform it into something we have already solved? In particular, can we transform the tree to make g.left
and g.left.left
red? Of course, we need to do so in a way that the black-heights are maintained. Indeed, we can rotate left at 3 which will push 5 up to be the left child of 7 and make 3 the left child of 5.
In pseudocode:
def rotate_left(w):
u = w.right
w.right = u.left
u.left = w
return u
We can then add the following pseudocode to the TODO
section, making sure to place it above the code which fixes g.left
and g.left.left
. Note how we set g.left
to be the return value from rotate_left
, effectively changing the root of the subtree to 5.
if g.left.color = red and g.left.right.color = red:
g.left = rotate_left(g.left)
g.right and a child of g.right are red
Consider when there is a red-edge between g.right
and a child of g.right
. Before digging in and trying various rotations and color changes, we ask ourselves if it can be reduced to a previous case. While not reducing directly, we have essentially already solved all the possibilities here when we considered the red-edge to the left of g
. For the color properties, left and right are symmetric. So let's go back to the above code and just swap every left and right.
First if g.right
and g.right.left
are red this is essentially the same as g.left
and g.left.right
are red above, which we fixed by rotating left. Since we are changing left and right, now we rotate right. In pictures,
if g.right.color = red and g.right.left.color = red:
g.right = rotate_right(g.right)
Next, we now fix when g.right
and g.right.right
are both red, which is essentially the same as g.left
and g.left.left
above. If both children of g
are red, push_black
fixes it. If g.right
is red and g.left
is black, we flip left at g
. This swaps the color of g
and g.right
and then rotates left.
if g.left.color = red and g.right.color = red:
push_black(g)
return g
if g.right.color = red and g.right.right.color = red:
return flip_left(g)
Making g obedient
We have now covered every case and so can combine the above code fragments into the following procedure fix_red_edge
(along with its helper rotate and flip functions). Note that there is some optimization that can be performed since some of the if checks can be merged and others are redundant depending on which child we recurse to, but at the moment we are focused on correctness.
def fix_red_edge(g):
if g.left.color = red and g.right.color = red:
push_black(g)
return g
# Red-edge to the left of g
if g.left.color = red and g.left.right.color = red:
g.left = rotate_left(g.left)
if g.left.color = red and g.left.left.color = red:
return flip_right(g)
# Red-edge to the right of g
if g.right.color = red and g.right.left.color = red:
g.right = rotate_right(g.right)
if g.right.color = red and g.right.right.color = red:
return flip_left(g)
# No fixes needed
return g
Up until now, we have been wearing our explorer's hat, thinking of cases and trying out various rotations to maintain the red-black properties. Now, we want to put on our detective hat and make sure that we have actually covered all cases and each case is solved correctly. We will do so by proving a lemma that states that fix_red_edge
works properly. This is typical in data structures: we first explore the operations and cases and after that we write up a lemma which summarizes our work. In a textbook presentation, the lemma usually comes first to more concisely explain the procedure.
Proposed Lemma: Assume that before calling insert g
is the root of a red-black tree, and assume that the child we recurse to is obedient. Then g
is obedient.
To attempt to prove this, we would copy the above discussion with the various cases, re-organize it slightly, and check that it fits together properly. But while doing so, we would detect a minor issue that we have not yet thought about: what happens when children don't exist? The above discussion assumed a full tree with children and grandchildren of g
, but what happens if some children don't exist? We will get null pointer exceptions when executing statements such as if g.right.right.color == red
if those nodes don't exist. But if we look back through the cases, we will see that if nodes don't exist essentially the same logic can be used. For example, consider fixing g.left
and g.left.left
are red which we did by flipping right.
Now consider that 5 did not exist, that is u.right
is nil (the NULL pointer or however it is represented in our programming language). The flip code sets g.left = u.right
so will set g.left
to nil, correctly making 7
have no left child. Also, considering the black-height property, the same logic as before still holds. Instead of a root-to-nil path that exits through 5, we have a root-to-nil path which ends going right from 3 before the flip and ends going left from 7 after the flip. I won't go into detail, but if you go back through the above discussion considering nodes that don't exist, you will see that everything is safe. The only issue is the possible null pointer exceptions when checking the colors of nodes, but notice that we only check for red nodes and if a red node doesn't exist we don't have to fix anything since there isn't a red-edge. Therefore, the if statements can just add nil checks.
Insert Pseudocode
Combining everything above together, we get the following psuedocode for insert:
def push_black(g):
g.color = red
g.left.color = black
g.right.color = black
def rotate_right(g):
u = g.left
g.left = u.right
u.right = g
return u
def flip_right(g):
swap colors of g and g.left
return rotate_right(g)
def rotate_left(g):
u = g.right
g.right = u.left
u.left = g
return u
def flip_left(g):
swap colors on g and g.right
return rotate_left(g)
def is_red(n):
return n != nil and n.color = red
def fix_red_edge(g):
if is_red(g.left) and is_red(g.right):
push_black(g)
return g
# Red-edge to the left of g
if is_red(g.left) and is_red(g.left.right):
g.left = rotate_left(g.left)
if is_red(g.left) and is_red(g.left.left):
return flip_right(g)
# Red-edge to the right of g
if is_red(g.right) and is_red(g.right.left):
g.right = rotate_right(g.right)
if is_red(g.right) and is_red(g.right.right):
return flip_left(g)
# No fixes needed
return g
def insert_helper(g, newKey, newVal):
if g == nil:
# Inserting to an empty tree creates a new node
g = new node
g.key = newKey
g.val = newVal
g.color = red
return g
elif g.key == newKey:
# Replace the existing value
g.val = newVal
return g
else:
if newKey < g.key:
# Insert into the left subtree. The new root of the left
# subtree is returned from the recursive call and then becomes
# g's new left child
g.left = insert_helper(g.left, newKey, newVal)
else:
# Similarly, recursively insert to the right
g.right = insert_helper(g.right, newKey, newVal)
return fix_red_edge(g)
def insert(newKey, newVal):
this.root = insert_helper(this.root, newKey, newVal)
if is_red(this.root):
this.root.color = black
Again, we want a lemma which states that the above code works correctly. The purpose of the lemma is to verify that everything we discussed above actually combines together and works properly.
Lemma Let \(g\) be the root of a red-black tree and let \(b\) be the number of black nodes along a \(g\)-to-nil path (which is the same for any such path). Let \(g' = insert\_helper(g, newKey, newVal)\). Then \(g'\) is the root of a binary search tree consisting of the elements of \(g\) plus \((newKey, newVal)\). In addition, the black-height property holds in the tree rooted at \(g'\). Each \(g'\)-to-nil path has exactly \(b\) black nodes (so the number of black nodes along paths did not change). Finally,
- If \(g\) was black, then the tree rooted at \(g'\) has no red-edges and is actually a red-black tree. But \(g'\) could be red.
- If \(g\) was red, then there could be a red-edge between \(g'\) and \(g'.left\) but no other red edges. Also, \(g'\) is still red and not black.
Proof The proof is by induction on the size of the tree rooted at \(g\).
Base case when \(g\) is nil. Returning a singleton red node correctly satisfies the properties.
Base case when \(g.key = newKey\). In this case, replacing the value and returning \(g' = g\) is correct.
Inductive case when \(newKey < g.key\). We insert to the left to correctly preserve the BST property. Also, all operations performed preserve the black-height property. To finish this case, we need to check the existence of a red-edge.
If \(g.left\) did not originally exist, then \(g.left\) becomes a new red node storing \((newKey, newVal)\). This might produce a red-edge between \(g\) and \(g.left\), but we can return with \(g' = g\) without any processing.
If \(g.left\) was originally red, then \(g\) was originally black. By induction, it could be the case that \(g.left\) and \(g.left.left\) are red after the recursion to the left. The potential red-edge between \(g.left\) and \(g.left.left\) is fixed by pushing black or flipping right, depending on the color of \(g.right\). (Here you would copy down the above discussion and figures on why this removes the red-edge.)
If \(g.left\) was originally black, then by induction \(g.left\) could change to red but there is no red-edge anywhere in the tree rooted at \(g.left\). The only possible red-edge is between \(g\) itself and \(g.left\) if \(g\) was originally red, but in this case we can just return without modifying the tree since we are allowed to return with a red-edge between \(g' = g\) and \(g.left\).
Inductive case when \(newKey > g.key\). We insert to the right to correctly preserve the BST property. This is very similar to the previous case when inserting to the left, just switching left and right.
C++11
As I mentioned before, we were focused on correctness and will think about optimization later. But, we can convert the above unoptimized psuedocode to C++ just to get a feeling for the red-black tree. Also, we can get some additional testing. As Knuth wrote, "Beware of bugs in the above code; I have only proved it correct, not tried it." Here is my C++11 code which directly translates the above pseudocode to C++11. It also includes a method to print the tree and check the red-black properties.