Tutorials
Algorithms
Basic
Prefix Sum and Difference Arrays

Prefix Sum

Definition

A prefix sum is simply the sum of the first a1,a2,,ana_1, a_2, \dots, a_n terms of a sequence. This technique is widely used as a precomputation step to answer range sum queries in constant time. For example, once you have computed the prefix sum array, finding the sum of numbers in any subarray can be done very efficiently.

Example Problem

Suppose you have an array A of N positive integers. Your task is to create a new array B where each element B[i] is the sum of the elements from A[0] to A[i]. For instance:

Input:

5
1 2 3 4 5

Output:

1 3 6 10 15

How to Solve

The simplest way to build the prefix sum array is by using a simple recurrence:

B[0]=A[0],and for i1,B[i]=B[i1]+A[i].B[0] = A[0], \qquad \text{and for } i \ge 1,\quad B[i] = B[i-1] + A[i].

Below is a sample implementation in Python:

def compute_prefix_sum(A):
    N = len(A)
    B = [0] * N
    B[0] = A[0]
    for i in range(1, N):
        B[i] = B[i - 1] + A[i]
    return B
 
# Example usage:
A = [1, 2, 3, 4, 5]
print(compute_prefix_sum(A))  # Output: [1, 3, 6, 10, 15]

Multidimensional Prefix Sum

For two-dimensional or even higher-dimensional arrays, the concept of prefix sum is extended to sum over multiple indices.

2D Prefix Sum with Inclusion-Exclusion

Given a matrix A of size m×nm \times n, its 2D prefix sum array S is defined as

Si,j=iijjAi,j.S_{i,j} = \sum_{i' \le i} \sum_{j' \le j} A_{i',j'}.

A direct way to compute each entry is by using the following recurrence relation:

Si,j=Ai,j+Si1,j+Si,j1Si1,j1.S_{i,j} = A_{i,j} + S_{i-1,j} + S_{i,j-1} - S_{i-1,j-1}.

The subtraction of Si1,j1S_{i-1,j-1} avoids double counting the overlapping submatrix. Once S is computed, you can answer any submatrix sum query defined by the top-left corner (i₁, j₁) and the bottom-right corner (i₂, j₂) in constant time using:

Sum=Si2,j2Si1,j2Si2,j1+Si1,j1.\text{Sum} = S_{i_2,j_2} - S_{i_1,j_2} - S_{i_2,j_1} + S_{i_1,j_1}.

Implementation Note: When processing higher dimensions (say, k dimensions), a similar inclusion-exclusion approach is applied. However, note that the number of terms grows exponentially with the dimension (approximately 2k2^k), potentially making the algorithm slower for very high values of k.

Dimensional-by-Dimensional Prefix Sum

Another approach for handling a k-dimensional array is to compute the prefix sum along one dimension at a time. Fix all dimensions except one, compute the 1D prefix sum in that dimension, and then repeat for all dimensions. This method runs in basically O(kN)O(kN) time for an array with NN total entries, which is usually acceptable.

Special Case: Sum Over Subsets (SOS)

A common high-dimensional application is in problems known as Sum Over Subsets. Here, you are given a function f defined on all subsets of a set with n elements, and you want to compute another function g such that

g(S)=TSf(T).g(S) = \sum_{T \subseteq S} f(T).

By representing each subset as an n-bit binary string, the subset relation corresponds to the coordinate-wise relation. In this view, the SOS problem is equivalent to computing an n-dimensional prefix sum where each index can only be 0 or 1. The algorithm runs in O(n2n)O(n \cdot 2^n) time.

Tree-Based Prefix Sum

When working with tree data structures, you can also use a prefix sum-like technique. Assume that for each node i, you know the cumulative weight (or cost) from that node up to the root, which we denote as sumi\text{sum}_i.

  • For problems involving node weights, the sum along the path between nodes x and y can be computed using:

    Path Sum=sumx+sumysumlcasumparent(lca),\text{Path Sum} = \text{sum}_x + \text{sum}_y - \text{sum}_{\text{lca}} - \text{sum}_{\text{parent(lca)}},

    where lca stands for the lowest common ancestor of x and y.

  • For edge weighted trees, the formula simplifies to:

    Path Sum=sumx+sumy2sumlca.\text{Path Sum} = \text{sum}_x + \text{sum}_y - 2 \cdot \text{sum}_{\text{lca}}.

You can compute the lowest common ancestor (LCA) using standard algorithms available in many textbooks.

Difference Arrays

Concept

Difference arrays can be thought of as the inverse operation of prefix sums. For a given sequence a, we define a difference array b by:

b1=a1,bi=aiai1 for i2.b_1 = a_1,\quad b_i = a_i - a_{i-1} \text{ for } i \ge 2.

Key Properties

  • You can recover the original array a by computing the prefix sum of b:

    an=i=1nbi.a_n = \sum_{i=1}^n b_i.

  • The total sum of array a can be expressed as a weighted sum over b:

    Total Sum=i=1n(ni+1)bi.\text{Total Sum} = \sum_{i=1}^n (n-i+1)\, b_i.

The beauty of difference arrays lies in their simplicity when performing range updates. For example, if you want to add a constant k to every element in the interval [l, r] in array a, you can update b as follows:

bl+=k,br+1=k.b_l \mathrel{+}= k,\quad b_{r+1} \mathrel{-}= k.

After processing all such updates, a single pass of prefix sum computation on b will yield the final state of a.

Tree-Based Difference

The idea of difference arrays extends naturally to trees and can be very useful in situations where there are frequent updates along paths in a tree, along with subsequent queries.

Point Difference on Trees

Suppose you have multiple operations that add a unit value to every node along a path between two nodes (s, t). Instead of updating all nodes on each path explicitly (which would be too slow for many operations), you can use a tree difference strategy. If you let d[i] represent the difference value for node i, then for a single path query from s to t:

  1. Increase d[s] by 1.
  2. Increase d[t] by 1.
  3. Decrease d[lca(s, t)] by 1.
  4. Decrease the difference value at the parent of the lca by 1.

This approach ensures that when you later process the tree (typically via a DFS), the accumulated difference at each node gives you the number of times it was visited as part of the query paths.

Edge Difference on Trees

If your updates are meant for the edges of the tree rather than the nodes, you can adjust the strategy. A common approach is:

  1. Increase the difference at both endpoints s and t by 1.
  2. Decrease the difference at the lca(s, t) by 2.

Since it is trickier to work with edges directly, one strategy is to “push” the edge contributions to the adjacent nodes. In either case, the method translates the update operation into a format where a DFS can combine the differences appropriately.

Example Problem Scenario

Imagine you have a tree of N nodes with N − 1 connecting edges. There are several path queries (like transporting milk along a path), and each query adds a unit of "pressure" to every node along the path. By applying the tree difference technique described above, you can efficiently compute how many times each node was included in these paths. After accumulating the differences with a DFS, identifying the node with the maximum pressure becomes straightforward.

Below is a simplified sketch in Python for applying point difference updates on a tree. (Note: In an actual implementation, you would also incorporate a method for computing the LCA and performing the DFS.)

# Suppose we have a tree represented as an adjacency list and an array 'd' for differences.
# The function 'update_difference(s, t, lca, parent_lca)' represents the update for a single path.
 
def update_difference(d, s, t, lca, parent_lca):
    d[s] += 1
    d[t] += 1
    d[lca] -= 1
    if parent_lca is not None:
        d[parent_lca] -= 1
 
# Later, a DFS traversal would be used to propagate these difference values to compute
# the final number of times each node is visited along all paths.

Exercises

Try solving the following problems to deepen your understanding of prefix sums and difference arrays:

Prefix Sum Practice:

  • Compute the sum of elements over various ranges quickly using a precomputed prefix sum array.
  • Reverse-engineer an array from its prefix sum representation.

Multidimensional Prefix Sum:

  • Given a 2D grid, answer multiple submatrix sum queries efficiently.
  • Extend your solution to handle higher dimensions where appropriate.

Tree-Based Techniques:

  • Given a tree with weighted nodes or edges, compute path sums using tree-based prefix sums.
  • Use tree difference methods to process a series of path update queries and determine the final values at each node.

Difference Arrays:

  • Implement range updates on a sequence (where each update adds a constant value to an interval) using difference arrays.
  • Apply tree difference concepts to handle update queries on tree structures.

By practicing these problems, you will strengthen your ability to efficiently process queries and updates in various data structures and algorithmic contexts.

Happy coding!