Xor Trick to Solve Odd Numbered Integers
Johann “Myrkraverk” Oskarsson
September 19, 2022
Problem description. A friend online, George, gave this coding exercise problem the other day.
Create a function f that when given an array of numbers, returns the number that appears an odd amount of times.
Only one number will ever appear an odd number of times, and there will always be one number that appears an odd amount of times.
e.g.
f([4]) //=> 4
f([0]) //=> 0
f([1,1,2,3,3]) //=> 2
f([1,7,1,7,3,7,3]) //=> 7
Now, there is a neat Xor trick to solve this problem. What follows is the proof. To keep things simple, yet interesting, we will only concern ourselves with two-bit integers.
I did not think of this trick myself, my own solution was the naïve implementation described below, coded in C. I am merely explaining how and why it works.
Rather than a truly rigorous proof like we’d see in a math journal this one is rather informal, and can probably be looked at as a sketch of a proof, more than a mathematical proof on its own.
A naïve way to solve the problem. Let’s look at the last example from George adapted to two-bit integers with a 7→2 substitution.
f([1,2,1,2,3,2,3]) //=> 2
The way a human would probably do this, is to sort first,
f([1,1,2,2,2,3,3]) //=> 2
and then count, we have 2×1, which is even, so we continue, and count 3×2 which is odd, so we stop. We have found our answer, 2, and don’t have to count the rest.
Introduction to Xor. People intimately familiar with the Xor function can skip this section.
Xor is a bitwise function, and is one when both bits are different, and zero otherwise. The long form name of it is exclusive or, and is almost always referred to as xor in comman parlance. Let’s look at a truth table for two-bit integers.
a |
b |
a ⊕ b |
00 |
00 |
00 |
00 |
01 |
01 |
00 |
10 |
10 |
00 |
11 |
11 |
01 |
00 |
01 |
01 |
01 |
00 |
01 |
10 |
11 |
01 |
11 |
10 |
10 |
00 |
10 |
10 |
01 |
11 |
10 |
10 |
00 |
10 |
11 |
01 |
11 |
00 |
11 |
11 |
01 |
10 |
11 |
10 |
01 |
11 |
11 |
00 |
The above table is in binary. It covers the numbers in the set {0, 1, 2, 3}. Hand checking this table, we can confirm that xor is commutative. That means it does not matter which number we start with, the result is the same left to right, or right to left. We will use this property later.
Another important property of xor is that a number xored with itself is always zero. We will make use of this property too.
The proof. Now, let’s take a look at the example problem above,
f([1,1,2,2,2,3,3]) //=> 2
and if we then go ahead and xor together all the numbers separately, we get
1⊕1⊕2⊕2⊕2⊕3⊕3
which we can simplify, by noting that every two numbers that are the same, are zero,
0⊕0⊕2⊕0 where 1⊕1=0, 2⊕2=0, and 3⊕3=0.
Now, we also know that 0⊕0=0, so this further simplifies to
0⊕2⊕0
and since a number xored with zero is the number itself, it further simplifies to
2.
All of this sounds rather pointless until we realize that xor is commutative, and we don’t have to sort first. We can simply xor all the numbers together, and the last one standing is the number we want,
1⊕2⊕1⊕2⊕3⊕2⊕3=2. □
Optimality. While at the surface, counting the individual numbers means we can stop as soon as we hit the odd numbers, and therefore do not have to look at the rest of the numbers, bringing the average case down to something less than O(n) 1, the fact remains that a computer must sort first, and therefore look at all the numbers anyway.
That means the xor method is the fastest we have.