[Math] Division by multiplication: simplest possible explanation

Visit Godbolt's Compiler Explorer and try:

int div_by_3(int x) {
    return x/3;
}

For clang 18.1.0 x64 -O3 you'll see:

div_by_3:
        movsxd  rax, edi
        imul    rax, rax, 1431655766
        mov     rcx, rax
        shr     rcx, 63
        shr     rax, 32
        add     eax, ecx
        ret

1431655766=0x55555556

Why IMUL (multiplication) is used instead of IDIV (division)?!

This is a classic optimization trick.

I'll try to explain it again.

Let's start with the problem from many programming textbooks. How to convert temperature in Fahrenheit scale to Celsius scale?

\[ c=(f-32) \cdot (\frac{5}{9}) \]

(I would say that Fahrenheit scale is first 'shifted' (by subtracting 32) and then 'shrinked/compressed' (by multiplication by \( \frac{5}{9} \)). Other conversion formulas between (other) temperature scales has these 'shifting' and '(de)compression' steps.)

But working on integer values, you can't represent \( \frac{5}{9} \) number as an integer number. You can, however, do this:

\[ c=\frac{((f-32) \cdot 5)}{9} \]

In terms of code: c=((f-32)*5)/9.

The precision of the result (temperature in Celsius) would be as good as if using floating point operations.

In other words, this is multiplication by \( \frac{5}{9} \) ratio using integer arithmetic without loss of precision (given the fact that there will be no overflow during multiplication of 5).

Now back to division by 3. You may also know that multiplication by \( \frac{1}{3} \) is the same as division by 3. But we can't represent \( \frac{1}{3} \) as an integer number.

We also know that the division by \( 2^x \) comes almost with no cost -- just by bit shifting. Also, keep in mind that MUL/IMUL x64 instructions stores high half of the product to RDX and low half to RAX, because the product is as twice large as multiplier and multiplicand.

Now rewrite

\[ x \cdot \frac{1}{3} \]

... by multiplicating both numerator and denominator by \( \frac{c}{3} \)), where \( c=2^{32} \) (this constant is to be used below for the sake of brevity, this is also UINT32_MAX+1) ...

\[ x \cdot (\frac{1 \cdot \frac{c}{3}}{ 3 \cdot \frac{c}{3}}) = x \cdot (\frac{\frac{c}{3}}{c}) \]

... and of course, the resulting ratio is the same.

It can be rewritten further as

\[ \frac{x \cdot (\frac{c}{3})}{c} \]

\( \left \lfloor{(\frac{c}{3})}\right \rfloor \) is 0x55555555 and \( \left \lceil{(\frac{c}{3})}\right \rceil \) is 0x55555556.

Division by \( 2^{32} \) (in the code by GCC) is done by the 'shr rax, 32' instruction.

This is why you see the 0x55555556 constant -- this is \( \left \lceil{(\frac{c}{3})}\right \rceil \).

Now let's rewrite this to pure C:

#include <stdint.h>
#include <assert.h>

uint32_t div_by_3(uint32_t x) {
        return x/3;
}

uint32_t my_div_by_3(uint32_t x) {
        // cast the constant to 64-bit unsigned
        return x*0x55555556UL>>32;
}

void main() {
        // self-test
        // for positive signed numbers (yet)
        for (uint32_t i=0; i<0x80000000; i++) {
                uint32_t x=div_by_3(i);
                uint32_t y=my_div_by_3(i);
                assert (x==y);
        }
}

In fact things are a bit more complicated, and explained further in 1, and 2.

But this short blog post is the first step in understanding.


The problem of understanding all this is usually because that last bit shifting/division step is 'hidden' from eyes while the IMUL instruction manifesting in code.

Also try this on Godbolt's Compiler Explorer:

#include <stdint.h>
uint64_t f(uint64_t x) {
    return x/117;
}

GCC 14.2 -O3:

f(unsigned long):
        movabs  rax, -8356217400911164407 ; ignore this constant so far
        mul     rdi
        mov     rax, rdx ; actually, this is shift by 64 bits
        shr     rax, 6   ; ignore this so far
        ret

(the post first published at 20241004.)


List of my other blog posts.

Subscribe to my news feed,

Yes, I know about these lousy Disqus ads. Please use adblocker. I would consider to subscribe to 'pro' version of Disqus if the signal/noise ratio in comments would be good enough.