Fun with roots
The sqrt
function from
math.h
that is used in C and C++ standard libraries doesn’t have an
“integer-only” version. In other words, there is no function with this
signature:
int sqrt(int arg);
So a few weeks ago I thought: let’s try to implement one and see if it’s any
faster than using the floating-point sqrt
function and casting the result to
an int
. Actually, since we don’t need negative numbers in this case, I’ll be
using uint32_t
instead of int
.
(While writing this article, I came across this Wikipedia page which contains a lot of useful information, I’m adding it here for reference).
The infamous “naive implementation”
The most naive implementation I can think of is to simply test every integer starting from zero. This would have a runtime of , since we can stop once we find the answer.
To spice it up a little, the code below does 2 interesting things.
The first is that it treats any number greater than
(0x0000FFFF
) as an edge case, since its square would overflow a uint32_t
.
The second is that it avoids multiplications
by using the fact that integer squares are sums of consecutive odd numbers:
.
uint32_t naive_sqrt(uint32_t n) {
constexpr uint32_t MAX_ROOT = 0x0000FFFF;
if (n >= MAX_ROOT * MAX_ROOT) return MAX_ROOT;
uint32_t base = 0;
uint32_t next_square = 1;
uint32_t diff = 3;
while (next_square <= n) {
next_square += diff;
diff += 2;
++base;
}
return base;
}
Binary search
The naive case above is basically a linear search. Given that numbers are sorted by nature, we can use binary search to bring the runtime down to . Here’s my implementation.
// Implementation of integer sqrt with binary search
uint32_t binary_search_sqrt(uint32_t n) {
if (n == 0) return 0;
if (n < 4) return 1;
uint32_t lo = 1;
uint32_t hi = n;
uint32_t result = 0;
while(lo <= hi) {
uint32_t mid = (lo + hi) / 2;
uint32_t inverse = n / mid;
if (inverse == mid) {
return mid;
}
if (mid < inverse) {
lo = mid + 1;
result = mid;
} else {
hi = mid - 1;
}
}
return result;
}
This is your typical binary search algorithm: split the number line between 0
and n
into 2 parts and recurse on the left if the middle number is too large,
or on the right if it is too small.
Newton’s Method
Newton’s method is a successive approximation method that is widely used in simulation software to solve equations. It works by making an initial guess and iteratively refining it with first-order approximations, until a close enough answer is obtained.
(On a sidenote, this video is very interesting and links Newton’s method to fractals).
With Newton’s method, we can approximately solve a generic equation of the form . We start with an initial guess for the solution, , and iteratively refine it. is our approximation after k iterations. The formula for Newton’s method states:
In our case, computing the square root of n
is equivalent to solving the equation
If we plug this into the definition above:
And here’s the code.
// Integer square root using Newton's method
uint32_t newton_sqrt(uint32_t n) {
if (n <= 1) return n;
uint32_t x0 = 1 << 16; // initial guess
uint32_t x1;
while (true) {
x1 = (x0 + n/x0) / 2;
if (x1 >= x0) { // stopping condition
return x0;
}
x0 = x1;
}
}
The stopping condition is usually based on the difference between two
consecutive iterations. With floating point numbers, it can be any arbitrarily
small number, but with integers we can stop when x1 >= x0
. Intuitively, this
works because x0
is definitely larger than the correct value and at each
iteration our result gets a little bit closer.
Benchmarks
Which of these do you think is faster?
- Casting to a
double
, using the in-builtsqrt
function and casting back to an integer. - Linear Search
- Binary Search
- Newton’s Method
I ran a quick benchmark by computing 1 << 20
(about a million) random roots with
each algorithm and averaging out the time. Here are the results on my machine.
Method | Time |
---|---|
Built-in | 35 ns |
Linear Search | 73 ms |
Binary search | 308 ns |
Newton's Method | 96 ns |
Well… the built-in method wins by far. I guess these simple implementations can’t compete with the fine-tuned C library functions!
Bonus
What about cube roots, or higher?
You can of course use the pow
function to compute any power (or root) that you
like, but can we use Newton’s Method to find a simple integer implementation?
We’re trying to solve this equation:
If we plug that into Newton’s Method, we get:
It should be simple to substitute this in the code for the square root, but I haven’t tried this yet. I’ll update the article if I get around to doing it.
If you want to read more on these topics, I would recommend starting from this Wikipedia page and this StackOverflow answer.