[Math] Hexdump and binary logarithm

This is one down-to-programmer's-earth application of binary algorithms.

I wanted to print a hexdump of a buffer. Each line is preceded by an address:

0x0 00 01 02 03 04 05 06 07-08 09 0a 0b 0c 0d 0e 0f "................"
0x10 10 11 12 13 14 15 16 17-18 19 1a 1b 1c 1d 1e 1f "................"
0xe0 e0 e1 e2 e3 e4 e5 e6 e7-e8 e9 ea eb ec ed ee ef "................"
...
0xf0 f0 f1 f2 f3 f4 f5 f6 f7-f8 f9 fa fb fc fd fe ff "................"
0x100 00 01 02 03 04 05 06 07-08 09 0a 0b 0c 0d 0e 0f "................"
0x110 10 11 12 13 14 15 16 17-18 19 1a 1b 1c 1d 1e 1f "................"


No good. But we can print addresses as 32-bit values:

0x00000000 00 01 02 03 04 05 06 07-08 09 0a 0b 0c 0d 0e 0f "................"
0x00000010 10 11 12 13 14 15 16 17-18 19 1a 1b 1c 1d 1e 1f "................"
0x000000e0 e0 e1 e2 e3 e4 e5 e6 e7-e8 e9 ea eb ec ed ee ef "................"
...
0x000000f0 f0 f1 f2 f3 f4 f5 f6 f7-f8 f9 fa fb fc fd fe ff "................"
0x00000100 00 01 02 03 04 05 06 07-08 09 0a 0b 0c 0d 0e 0f "................"
0x00000110 10 11 12 13 14 15 16 17-18 19 1a 1b 1c 1d 1e 1f "................"


Fine, but what would you do if you have 64-bit addresses? This is too bulky:

0x0000000000000000 00 01 02 03 04 05 06 07-08 09 0a 0b 0c 0d 0e 0f "................"
0x0000000000000010 10 11 12 13 14 15 16 17-18 19 1a 1b 1c 1d 1e 1f "................"
0x00000000000000e0 e0 e1 e2 e3 e4 e5 e6 e7-e8 e9 ea eb ec ed ee ef "................"
...
0x00000000000000f0 f0 f1 f2 f3 f4 f5 f6 f7-f8 f9 fa fb fc fd fe ff "................"
0x0000000000000100 00 01 02 03 04 05 06 07-08 09 0a 0b 0c 0d 0e 0f "................"
0x0000000000000110 10 11 12 13 14 15 16 17-18 19 1a 1b 1c 1d 1e 1f "................"


One solution is to align all values somehow according to the last value (or address). How many hex digits we need to print the last (0x110) address? 3. So let's all addresses will be printed as 3-digit numbers.

To get exact number of hex digits, we just use binary logarithm.

$\lceil \frac{log_2(0x110)}{4} \rceil$

Why ceiling? Say, the last address has 11 bits, and 11/4=2, but to print 11 binary digits, we want 3 hexadecimal digits. So the result of division must be "ceiled" to 3.

Now it's neat:

0x000 00 01 02 03 04 05 06 07-08 09 0a 0b 0c 0d 0e 0f "................"
0x010 10 11 12 13 14 15 16 17-18 19 1a 1b 1c 1d 1e 1f "................"
0x0e0 e0 e1 e2 e3 e4 e5 e6 e7-e8 e9 ea eb ec ed ee ef "................"
...
0x0f0 f0 f1 f2 f3 f4 f5 f6 f7-f8 f9 fa fb fc fd fe ff "................"
0x100 00 01 02 03 04 05 06 07-08 09 0a 0b 0c 0d 0e 0f "................"
0x110 10 11 12 13 14 15 16 17-18 19 1a 1b 1c 1d 1e 1f "................"


The code:

...
// http://stackoverflow.com/questions/11376288/fast-computing-of-log2-for-64-bit-integers
static const int tab64[64]=
{
63,  0, 58,  1, 59, 47, 53,  2,
60, 39, 48, 27, 54, 33, 42,  3,
61, 51, 37, 40, 49, 18, 28, 20,
55, 30, 34, 11, 43, 14, 22,  4,
62, 57, 46, 52, 38, 26, 32, 41,
50, 36, 17, 19, 29, 10, 13, 21,
56, 45, 25, 31, 35, 16,  9, 12,
44, 24, 15,  8, 23,  7,  6,  5
};

uint64_t uint64_log2 (uint64_t value)
{
value |= value >> 1;
value |= value >> 2;
value |= value >> 4;
value |= value >> 8;
value |= value >> 16;
value |= value >> 32;
return tab64[((uint64_t)((value - (value >> 1))*0x07EDD5E59A4E28C2)) >> 58];
}

unsigned division_ceil (unsigned x, unsigned y)
{
// https://stackoverflow.com/questions/2745074/fast-ceiling-of-an-integer-division-in-c-c
return (x + y - 1) / y;
};

void Log::hexdump(BYTE *buf, size_t size, size_t ofs)
{
unsigned fill_zeroes;
{
fill_zeroes=0;
}
else
{
fill_zeroes=division_ceil(binary_digits, 4);
};

...

out << "0x" << std::hex << std::setw(fill_zeroes) << (starting_offset + pos + ofs) << " ";
...


Now the code of my small logger library

Also, "Math-for-programmers" has more examples of logarithms.

Naive approach

I've found this is in a real code:

ostream & operator << (ostream & os, const FmtNat & fn) {
int64_t n = fn.num;
assert (n >= 0);                                              // 123456
if (n < 10)                    os << "     " << n;            // .....n
else if (n < 100)              os << "    " << n;             // ....nn
else if (n < 1000)             os << "   " << n;              // ...nnn
else if (n < 10000)            os << "  " << n;               // ..nnnn
else if (n < 100000)           os << " " << n;                // .nnnnn
else if (n < 1000000)          os << n;                       // nnnnnn
else if (n < 10000000)         os << n/1000          << "e3"; // nnnne3
else if (n < 100000000)        os << n/10000         << "e4"; // nnnne4
else if (n < 1000000000)       os << n/100000        << "e5"; // nnnne5
...


This is like reinvention of decimal logarithm.