Pairing Functions: Cantor & Szudzik

Overview
Cantor Pairing
- Limitations of Cantor
Szudzik Pairing
Signed Pairing
Performance Comparison
References

Overview

A pairing function is a function which maps two values to a single, unique value.

$$pair(x, y) = z$$

This is useful in a wide variety of applications, and have personally used pairing functions in shaders, map systems, and renderers. Essentially any time you want to compose a unique identifier from a pair of values. In this ramble we will cover two different pairing functions: Cantor and Szudzik.

Try them out in the demo below:

cantor(,) = 0 szudzik(,) = 0

Cantor Pairing

$$index = {(x + y)(x + y + 1) \over 2} + y$$

This can be easily implemented in any language. An example in JavaScript:

1
2
3
4


function cantorPair(x, y) 
{
    return (0.5 * (x + y) * (x + y + 1)) + y;
}

How Cantor pairing works is that you can imagine traversing a 2D field, where each real number point is given a value based on the order it which it was visited. For the Cantor function, this graph is traversed in a diagonal function is illustrated in the graphic below.

Limitations of Cantor

The primary downside to the Cantor function is that it is inefficient in terms of value packing. In a perfectly efficient function we would expect the value of pair(9, 9) to be 99. This means that all one hundred possible variations of ([0-9], [0-9]) would be covered (keeping in mind our values are 0-indexed).

However, cantor(9, 9) = 200. So we use 200 pair values for the first 100 combinations, an efficiency of 50%.

We quickly start to brush up against the limits of 32-bit signed integers with input values that really aren’t that large. For example, cantor(33000, 33000) = 2,178,066,000 which would result in an overflow.

Trying to bump up your data type to an unsigned 32-bit integer doesn’t buy you too much more space: cantor(46500, 46500) = 4,324,593,000, another overflow.

Szudzik Pairing

Like Cantor, the Szudzik function can be easily implemented anywhere. Another JavaScript example:

1
2
3
4


function szudzikPair(x, y)
{
    return (x >= y ? (x * x) + x + y : (y * y) + x);
}

Szudzik can also be visualized as traversing a 2D field, but it covers it in a box-like pattern. This graphics demonstrates the path that Szudzik takes over the field:

The primary benefit of the Szudzik function is that it has more efficient value packing. Comparing against Cantor we see:

    cantor(9, 9)  = 200
    szudzik(9, 9) = 99

Yes, the Szudzik function has 100% packing efficiency. As such, we can calculate the max input pair to Szudzik to be the square root of the maximum integer value. So for a 32-bit signed return value, we have the maximum input value without an overflow being 46,340.

For a 32-bit unsigned return value the maximum input value for Szudzik is 65,535.

Signed Pairing

Neither Cantor nor Szudzik pairing functions work natively with negative input values.

However, a simple transformation can be applied so that negative input can be used. It should be noted though that all returned pair values are still positive, as such the packing efficiency for both functions will degrade.

$$b = \left{\begin{array}{ll} -2y - 1 & : y < 0\
2y & : y \ge 0 \end{array} \right.$$

Signed Cantor

In JavaScript:

1
2
3
4
5
6


function cantorPairSigned(x, y) 
{
    const a = (x >= 0.0 ? 2.0 * x : (-2.0 * x) - 1.0);
    const b = (y >= 0.0 ? 2.0 * y : (-2.0 * y) - 1.0);
    return cantorPair(a, b);
}

Signed Szudzik

In JavaScript:

1
2
3
4
5
6


function szudzikPairSigned(x, y) 
{
    const a = (x >= 0.0 ? 2.0 * x : (-2.0 * x) - 1.0);
    const b = (y >= 0.0 ? 2.0 * y : (-2.0 * y) - 1.0);
    return szudzikPair(a, b) * 0.5;
}

Signed Szudzik, Improved Packing

Additional space can be saved, giving improved packing efficiency, by transferring half to the negative axis.

$$index = \left{\begin{array}{ll} c & : (a < 0 \cap b < 0) \cup (a \ge 0 \cap b \ge 0)\
-c - 1 & : (a < 0 \cap b \ge 0) \cup (a \ge 0 \cap b < 0) \end{array} \right.$$

In JavaScript:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


function szudzikPairSignedB(x, y) 
{
    const a = (x >= 0.0 ? 2.0 * x : (-2.0 * x) - 1.0);
    const b = (y >= 0.0 ? 2.0 * y : (-2.0 * y) - 1.0);
    const c = szudzikPair(a, b) * 0.5;

    if(((a >= 0.0) && (b < 0.0)) || ((a < 0.0) && (b >= 0.0))) {
        return -c - 1;
    }
    
    return c;
}

Performance Comparison

The performance between Cantor and Szudzik is virtually identical, with Szudzik having a slight advantage.

Implementation	Ops/sec	%
Szudzik	894,767,289	100.00
Cantor	888,441,837	99.29
Szudzik Signed	885,070,104	98.92
Cantor Signed	884,386,294	98.84
Szudzik Signed (Alt)	863,042,799	96.45

The full results of the performance comparison can be found on jsperf.

References

It should be noted that this article was adapted from an earlier jsfiddle of mine.

That fiddle makes note of the following references:

Szudzik, Matthew. An Elegant Pairing Function. http://szudzik.com/ElegantPairing.pdf.
Pairing Functions. https://en.wikipedia.org/wiki/Pairing_function