Understanding the DIFF Algorithm in React

Time： 2024-10-26 Column：Mobile & Frontend views：365

In previous articles, we introduced the initialization process, rendering process, and commit process of React, which provided a basic understanding of React's workflow. Now, we need to delve into the finer details of how React operates. Understanding these core details will not only enhance our comprehension of React but also help us learn about its design philosophy. More importantly, it will prepare us to impress interviewers with our knowledge!

Two years ago, I wrote an article titled "An Analysis of the Diff Strategies in React and Vue." I recommend checking it out in your spare time. In that article, I discussed the differences in the diff implementations of various mainstream frameworks that are based on the virtual DOM. Essentially, any front-end framework based on the virtual DOM needs to use a diff algorithm to minimize updates. In this article, I will detail how React identifies differences between two virtual DOMs in the beginWork process previously mentioned in this column, including how it tags nodes and how it moves them during the commit process. By the end of this article, you will have a deeper understanding of the following questions:

What objects participate in React's diff?
If the real DOM needs to be updated, does the Fiber need to be updated?
What is the complete flow of the diff process?
And more...

Source Code

Participating Roles

From previous articles, we know that during the construction of the workInProgress tree 🌲, we need to go through the beginWork process. Today’s focus, the diff algorithm, occurs during this process. The beginWork function generates the workInProgress tree based on the latest ReactElement and the current Fiber tree. Therefore, the objects involved in the diff algorithm are the new ReactElement and the current tree 🌲. The entire diff process can be described with the following diagram:

Understanding the DIFF Algorithm in React

In the beginWork process, we know that ReactElement is structured as an array, while the Fiber tree at a specific level is structured as a linked list. Thus, we cannot use the traditional two-pointer strategy. The core function of the diff algorithm is to tag nodes in the newly generated workInProgress tree for use in the subsequent commit process.

Core Code

For a single node, there’s no need to perform a diff; a direct comparison will suffice. Thus, the diff only occurs when there are multiple nodes at a particular level. The core of this is the reconcileChildrenArray function, which we will break down step by step.

function reconcileChildrenArray(
  returnFiber,
  currentFirstChild,
  newChildren,
  lanes
) {
  var resultingFirstChild = null; // New first child Fiber
  var previousNewFiber = null; // Previous new Fiber
  var oldFiber = currentFirstChild; // First old Fiber
  var lastPlacedIndex = 0; // Marker
  var newIdx = 0; // Index
  var nextOldFiber = null; 

  // First round of comparisons
  for (; oldFiber !== null && newIdx < newChildren.length; newIdx++) {
    if (oldFiber.index > newIdx) {
      nextOldFiber = oldFiber;
      oldFiber = null;
    } else {
      nextOldFiber = oldFiber.sibling;
    }

    var newFiber = updateSlot( // Start comparing the first old fiber with the first newChild
      returnFiber,
      oldFiber,
      newChildren[newIdx],
      lanes
    );

    if (newFiber === null) { // Indicates the beginning of differences; exit the loop
      if (oldFiber === null) {
        oldFiber = nextOldFiber;
      }
      break;
    }

    lastPlacedIndex = placeChild(newFiber, lastPlacedIndex, newIdx);

    if (previousNewFiber === null) {
      resultingFirstChild = newFiber;
    } else {
      previousNewFiber.sibling = newFiber;
    }

    previousNewFiber = newFiber;
    oldFiber = nextOldFiber;
  }

  // New ReactElements have been exhausted; simply delete all old nodes
  if (newIdx === newChildren.length) {
    deleteRemainingChildren(returnFiber, oldFiber); // Mark nodes for deletion
    return resultingFirstChild;
  }

  // Old fibers have been exhausted; create new nodes
  if (oldFiber === null) {
    for (; newIdx < newChildren.length; newIdx++) {
      var _newFiber = createChild(returnFiber, newChildren[newIdx], lanes);

      if (_newFiber === null) {
        continue;
      }

      lastPlacedIndex = placeChild(_newFiber, lastPlacedIndex, newIdx);

      if (previousNewFiber === null) {
        resultingFirstChild = _newFiber;
      } else {
        previousNewFiber.sibling = _newFiber;
      }

      previousNewFiber = _newFiber;
    }
    return resultingFirstChild;
  } 

  // Both sets of fibers still exist; start the real diff logic
  var existingChildren = mapRemainingChildren(returnFiber, oldFiber); // Keep scanning and use the map to restore deleted items as moves.

  for (; newIdx < newChildren.length; newIdx++) {
    var _newFiber2 = updateFromMap( // Find existing fiber from existingChildren
      existingChildren,
      returnFiber,
      newIdx,
      newChildren[newIdx],
      lanes
    );

    if (_newFiber2 !== null) {
      if (_newFiber2.alternate !== null) {
        existingChildren.delete(
          _newFiber2.key === null ? newIdx : _newFiber2.key
        );
      }

      lastPlacedIndex = placeChild(_newFiber2, lastPlacedIndex, newIdx);

      if (previousNewFiber === null) {
        resultingFirstChild = _newFiber2;
      } else {
        previousNewFiber.sibling = _newFiber2;
      }

      previousNewFiber = _newFiber2;
    }
  }

  existingChildren.forEach(function (child) {
    return deleteChild(returnFiber, child);
  });

  return resultingFirstChild;
}

The code may seem lengthy, but don’t worry; we will analyze it step by step. This segment can be broken down into three parts:

The first round of comparisons to find differences
Assertions
The real diff logic

The First Round

To make it easier to understand, let's refer to the current tree (linked list) as "old" and the new ReactElements array as "new." We start with several global variables: oldFiber representing the first node of "old," and newIdx for the incrementing index. Now let's look at the first round of traversal.

The first round will traverse each node in "new," starting from the first node and comparing it to the first node in "old." If the two nodes are the same, we reuse the old fiber, moving oldFiber to the next node in the linked list, incrementing the index, and continuing the first round of the loop. If they differ, we exit the first round. During this exit, we need to reset the reference relationships of the new fiber nodes.

Assuming we have the following example:

Understanding the DIFF Algorithm in React

In this case, we will enter the first round of comparisons. After finding that a is the same, we attempt to reuse this fiber. Note that reusing doesn't mean we simply take it; instead, we selectively clone its properties, resulting in a newFiber, and we continue the loop.

Understanding the DIFF Algorithm in React

When we reach the current node, we find that it differs and cannot be reused, causing us to exit the current loop.

We can summarize three situations in which we might exit the first round:

All old nodes have been used up.
All new nodes have been used up.
A non-reusable node has been found.

Assertions

In the assertions section, we handle the first and second cases. If the old nodes are exhausted, it indicates that there are many new nodes in this update that need to be added to the page. Therefore, we can simply create new fiber nodes for them and tag them as additions.

JavaScript Code Explanation

// Indicating that oldFiber has been exhausted, we can directly create new nodes.
if (oldFiber === null) {
  for (; newIdx < newChildren.length; newIdx++) {
    var _newFiber = createChild(returnFiber, newChildren[newIdx], lanes); // Create directly
    if (_newFiber === null) {
      continue;
    }
    lastPlacedIndex = placeChild(_newFiber, lastPlacedIndex, newIdx); // Tag with Placement
    if (previousNewFiber === null) {
      resultingFirstChild = _newFiber;
    } else {
      previousNewFiber.sibling = _newFiber;
    }
    previousNewFiber = _newFiber;
  }
  return resultingFirstChild;
}

If the new nodes are exhausted, it means that in the new update, we need to remove the remaining nodes. Therefore, we directly tag the child nodes that need to be deleted in the parent fiber node.

JavaScript Code Explanation

// Indicating that the new ReactElements have been exhausted, we can simply delete all the old nodes.
if (newIdx === newChildren.length) {
  deleteRemainingChildren(returnFiber, oldFiber); // Mark nodes for deletion
  return resultingFirstChild;
}

Non-reusable Nodes

When the first non-reusable node is encountered, we begin to execute the actual diff strategy. The core of this strategy is to use a variable called lastPlacedIndex to record the criteria for determining whether a right shift is necessary. Its initial value is 0, and it will only increase, never decrease. The idea in React is to continuously move the old nodes to the right, gradually transforming them to resemble the new nodes.

Returning to our previous example, we will first create a map to store the key relationships of the fibers.

After creating this map, we iterate through each node in new, checking if it exists in the map. If it does, it means it can be reused, and we can decide whether to tag it based on the following criteria:

Understanding the DIFF Algorithm in React

The fiber itself also retains its index in the current list. We retrieve the old index of the reusable fiber.

If this index is less than lastPlacedIndex, we tag it; otherwise, we assign lastPlacedIndex to this index.

In the previous example, we see that "c" is in position 2; therefore, we do not tag it and continue iterating.

Understanding the DIFF Algorithm in React

We then repeat the process. We find that "d" can be reused, and its old index is 3, which does not meet the condition, so we execute logic 2 and continue the loop.

Understanding the DIFF Algorithm in React

Upon reaching the last node, which can be reused with an old index of 1 that meets the condition, we execute logic 1. Since this is the last node, we exit the loop.

Understanding the DIFF Algorithm in React

We can see that after executing the diff algorithm, a new workInProgress tree is created that requires no movement because it was generated based on the latest ReactElement. The purpose of tagging is to move the actual DOM. In the commit process, when we reach the "d" fiber node, we only need to move the corresponding real DOM of "b" to the far right of this list. This can be visually verified, as the actual DOM transforms into the desired configuration:

// Original DOM
a b c d

Move b -> far right

a c d b

At this point, the diff algorithm has completed its mission.

Revisiting the Right Move Strategy

In fact, the right move strategy mentioned above is not rigorous. Here, we will use an example to correct this statement. Suppose we have the following changes.

Understanding the DIFF Algorithm in React

After applying the diff strategy we mentioned, the result of our tagging looks like this:

Understanding the DIFF Algorithm in React

If we move all tagged elements to the far right, the resulting order may not be what we desire:

a b c d

Move a to the right

b c d a

Move c to the right

b d a c

As we can see, after these moves, we get "bdac" instead of the expected "badc." Therefore, we need to adjust our strategy. We will modify the move strategy to move the elements to the first untagged element on the right rather than to the far right. In the example above, the first untagged element to the right of "a" is "d," and the first untagged element for "c" is the far right boundary. After this adjustment, let’s re-evaluate:

a b c d 

Move a before d

b c a d

Move c to the far right

b a d c

This achieves the expected effect, and not only does the previous example reach the desired result as well.

In the React source code, we can find the current tagged element's position for insertion through var before = getHostSibling(finishedWork);. Its strategy is to locate the first untagged element on the far right.

Conclusion: Therefore, we can consider React's diff strategy as a right-shifting strategy that tags the nodes that need to move, then traverses through these tagged nodes in the commit phase to move them before the first untagged element on the right.

Principles

From the above content, we understand the specific steps of React's diff algorithm. However, if we only know how it works, there is plenty of online content discussing this process, but simply knowing this is not enough. After a while, we may forget it. So, let's talk about why the above approach works and why it is designed this way.

Understanding the DIFF Algorithm in React

If you closely observe these nodes, the purpose of the diff is to transform "old" into "new." We can think naturally: if we want something to transform into something new through certain modifications, the natural way is to observe the new closely, fix the old parts little by little until it becomes the new version. In the diff process, our goal is to observe "new" and figure out how to move to make "old" look like "new." To simplify the problem, we define a principle: movement can only be towards the right, as any list can be adjusted to any configuration through multiple rightward moves.

Observing the first node of "new," we find it is in the second position of "old."

Since it is the first position in the "new" list, we need to find a way to move the second position of "old" to the first position.

Following the principle of only moving nodes to the right, we can move all nodes before the second position in "old" to the right.

While moving, if you shift a node to the right, it effectively equates to moving the nodes after it forward.

Let's temporarily move it to the far right.

Thus, we can use a variable to track the current node's position in the old nodes, represented by lastPlacedIndex.

This variable signifies that all preceding nodes need to move because I am positioned earlier in "new," needing to shift the "old" nodes ahead of me to the right to position myself closer to the front.

The React diff algorithm embodies these principles. In reality, different frameworks have varying diff strategies based on different scenarios, so this is not the only approach. We can also design our strategies to achieve the same purpose of diffing.

💰 Support Us