Fun with roots

2022-11-13

The sqrt function from math.h in the C and C++ standard libraries doesn’t have an “integer-only” version. In other words, there is no function with this signature:

int sqrt(int arg);

So a few weeks ago I thought: let’s try to implement one and see if it’s any faster than using the floating-point sqrt function and casting the result to an int. Actually, since we don’t need negative numbers in this case, I’ll be using uint32_t instead of int.
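For reference, the floating-point version we’re competing against is just a cast around std::sqrt. Something like this (builtin_sqrt is my name for it, not a standard function):

#include <cmath>
#include <cstdint>

// Floating-point baseline: compute sqrt as a double and truncate.
// A double represents any uint32_t exactly, so nothing is lost on the way in.
uint32_t builtin_sqrt(uint32_t n) {
    return static_cast<uint32_t>(std::sqrt(static_cast<double>(n)));
}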

(While writing this article, I came across this Wikipedia page, which contains a lot of useful information; I’m adding it here for reference.)

The infamous “naive implementation”

The most naive implementation I can think of is to simply test every integer starting from zero. This would have a runtime of $\mathcal{O}(\sqrt{n})$, since we can stop once we find the answer.

To spice it up a little, the code below does two interesting things. The first is that it treats any number greater than $2^{16} - 1$ (0x0000FFFF) as an edge case, since its square would overflow a uint32_t. The second is that it avoids multiplications by using the fact that integer squares are sums of consecutive odd numbers: $m^2 = 1 + 3 + 5 + \cdots + (2m - 1)$ (for example, $4^2 = 1 + 3 + 5 + 7 = 16$).

// Integer sqrt by linear search over consecutive odd numbers
uint32_t naive_sqrt(uint32_t n) {
    constexpr uint32_t MAX_ROOT = 0x0000FFFF;

    // For n >= MAX_ROOT^2 the answer is MAX_ROOT: anything larger
    // would overflow when squared.
    if (n >= MAX_ROOT * MAX_ROOT) return MAX_ROOT;

    uint32_t base = 0;
    uint32_t next_square = 1; // invariant: next_square == (base + 1)^2
    uint32_t diff = 3;        // invariant: diff == 2 * (base + 1) + 1

    while (next_square <= n) {
        next_square += diff;
        diff += 2;
        ++base;
    }

    return base;
}

The naive case above is basically a linear search. Since the candidate roots are naturally ordered, we can use binary search to bring the runtime down to $\mathcal{O}(\log n)$. Here’s my implementation.

// Implementation of integer sqrt with binary search
uint32_t binary_search_sqrt(uint32_t n) {
    if (n == 0) return 0;
    if (n < 4) return 1;

    uint32_t lo = 1;
    uint32_t hi = n;
    uint32_t result = 0;

    while (lo <= hi) {
        // lo + (hi - lo) / 2 instead of (lo + hi) / 2, which can overflow
        uint32_t mid = lo + (hi - lo) / 2;
        // comparing mid to n / mid instead of mid * mid to n also avoids overflow
        uint32_t inverse = n / mid;
        if (inverse == mid) {
            return mid;
        }

        if (mid < inverse) {
            lo = mid + 1;
            result = mid;
        } else {
            hi = mid - 1;
        }
    }
    return result;
}

This is your typical binary search algorithm: split the interval between 1 and n into two halves, and keep the left half if the middle number is too large, or the right half if it is too small. Since n is rarely a perfect square, result keeps track of the largest mid whose square doesn’t exceed n, and that is what we return when the interval becomes empty.

Newton’s Method

Newton’s method is a successive approximation method that is widely used in simulation software to solve equations. It works by making an initial guess and iteratively refining it with first-order approximations, until a close enough answer is obtained.

(As a side note, this video is very interesting and links Newton’s method to fractals.)

With Newton’s method, we can approximately solve a generic equation of the form $f(x) = 0$. We start with an initial guess for the solution, $x_0$, and iteratively refine it; $x_k$ is our approximation after $k$ iterations. The formula for Newton’s method states:

$$x_{k+1} = x_k - \frac{f(x_k)}{f^{\prime}(x_k)}$$

In our case, computing the square root of n is equivalent to solving the equation

$$x^2 - n = 0$$

If we plug this into the definition above:

$$x_{k+1} = x_k - \frac{x_k^2 - n}{2x_k} \quad\Rightarrow\quad x_{k+1} = \frac{1}{2}\left(x_k + \frac{n}{x_k}\right)$$

And here’s the code.

// Integer square root using Newton's method
uint32_t newton_sqrt(uint32_t n) {
    if (n <= 1) return n;

    uint32_t x0 = 1 << 16; // initial guess: 2^16 is above the root of any uint32_t
    uint32_t x1;

    while (true) {
        x1 = (x0 + n/x0) / 2;
        if (x1 >= x0) { // stopping condition: the sequence stopped decreasing
            return x0;
        }
        x0 = x1;
    }
}

The stopping condition is usually based on the difference between two consecutive iterations. With floating-point numbers, that difference can be any arbitrarily small threshold, but with integers we can stop as soon as x1 >= x0. Intuitively, this works because x0 starts out larger than the correct value and each iteration gets a little closer, so the first time the sequence stops decreasing, x0 is the answer.

Benchmarks

Which of these do you think is faster?

I ran a quick benchmark by computing 1 << 20 (about a million) random roots with each algorithm and averaging out the time. Here are the results on my machine.

Method            Time
Built-in          35 ns
Linear search     73 ms
Binary search     308 ns
Newton's method   96 ns

Well… the built-in method wins by far. I guess these simple implementations can’t compete with the fine-tuned C library functions!
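In case you want to try this at home, a minimal harness looks something like the sketch below. It isn’t my exact benchmark code (and it assumes the functions above are in scope), but it captures the idea: pre-generate the inputs, time only the calls, and report the average.

#include <chrono>
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    constexpr uint32_t COUNT = 1 << 20;

    // Pre-generate the random inputs so the RNG isn't part of the measurement
    std::mt19937 rng(42);
    std::uniform_int_distribution<uint32_t> dist; // defaults to [0, UINT32_MAX]
    std::vector<uint32_t> inputs(COUNT);
    for (auto& x : inputs) x = dist(rng);

    volatile uint32_t sink = 0; // keeps the calls from being optimized away
    auto start = std::chrono::steady_clock::now();
    for (uint32_t x : inputs) sink = newton_sqrt(x); // or any of the others
    auto stop = std::chrono::steady_clock::now();

    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    std::printf("%.1f ns per call\n", static_cast<double>(ns) / COUNT);
}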

Bonus

What about cube roots, or higher?

You can of course use the pow function to compute any power (or root) that you like, but can we use Newton’s Method to find a simple integer implementation?

We’re trying to solve this equation:

$$x^r - n = 0$$

If we plug that into Newton’s Method, we get:

$$x_{k+1} = x_k - \frac{x_k^r - n}{r x_k^{r-1}} \quad\Rightarrow\quad x_{k+1} = \frac{1}{r}\left((r - 1) x_k + \frac{n}{x_k^{r-1}}\right)$$

It should be simple to substitute this into the code for the square root, but I haven’t tried this yet. I’ll update the article if I get around to doing it.
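In the meantime, here is a rough, untested sketch of what that substitution might look like. The newton_root and ipow names are mine, and the extra guards (for small and large r) are my own assumptions to keep the arithmetic from overflowing:

// Untested sketch: integer r-th root of n via Newton's Method
uint64_t ipow(uint64_t x, uint32_t p) { // helper: x^p with 64-bit intermediates
    uint64_t result = 1;
    while (p-- > 0) result *= x;
    return result;
}

uint32_t newton_root(uint32_t n, uint32_t r) {
    if (n <= 1) return n;
    if (r <= 1) return n;  // r == 0 isn't meaningful; the 1st root is n itself
    if (r >= 32) return 1; // n < 2^32 <= 2^r, so the floor root must be 1

    uint32_t x0 = 1u << ((32 + r - 1) / r); // initial guess: 2^ceil(32/r) is above the true root
    uint32_t x1;

    while (true) {
        x1 = ((r - 1) * x0 + n / ipow(x0, r - 1)) / r;
        if (x1 >= x0) { // same stopping condition as the square root version
            return x0;
        }
        x0 = x1;
    }
}

For r = 2 this should reduce exactly to newton_sqrt above: the initial guess becomes 1 << 16 and the update becomes (x0 + n/x0) / 2.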

If you want to read more on these topics, I would recommend starting from this Wikipedia page and this StackOverflow answer.