TAOCP 7.1.3 Exercise 198, UTF-8 encoding and program synthesis by sketching

Found this exercise in TAOCP 7.1.3 (Bitwise Tricks and Techniques):

This is like program synthesis by sketching: you give a sketch with several "holes" missing and ask some automated software to fill the "holes". In our case, a,b and c are "holes".

Let's find them using Z3:

from z3 import *

a, b, c=BitVecs('a b c', 22)


def bytes_in_UTF8_seq(x):
    if (x>>7)==0:
        return 1
    if (x>>5)==0b110:
        return 2
    if (x>>4)==0b1110:
        return 3
    if (x>>3)==0b11110:
        return 4
    # invalid 1st byte
    return None

for x in range(256):
    if t!=None:
        s.add(((a >> ((x>>b) & c)) & 3) == (t-1))

# enumerate all solutions:
while s.check() == sat:
    m = s.model()

    print "a,b,c = 0x%x 0x%x 0x%x" % (m[a].as_long(), m[b].as_long(), m[c].as_long())

    block = []
    for d in m:
        block.append(t != m[d])
print "results total=", len(results)

I tried various bit widths for a, b and c and found that 22 bits are enough. I've lots of results like:


a,b,c = 0x250100 0x3 0x381416
a,b,c = 0x258100 0x3 0x381416
a,b,c = 0x258900 0x3 0x381416
a,b,c = 0x250900 0x3 0x381416
a,b,c = 0x251100 0x3 0x381416
a,b,c = 0x259100 0x3 0x381416
a,b,c = 0x259100 0x3 0x389416
a,b,c = 0x251100 0x3 0x389416
a,b,c = 0x251100 0x3 0x189416
a,b,c = 0x259100 0x3 0x189416
a,b,c = 0x259100 0x3 0x189016


It seems that several least significant bits of a and c are not used. After little experimentation, I've come to this:


# make a,c more aesthetically appealing:


And the results:

a,b,c = 0x250000 0x3 0x36
a,b,c = 0x250000 0x3 0x16
a,b,c = 0x250000 0x3 0x96
a,b,c = 0x250000 0x3 0xd6
a,b,c = 0x250000 0x3 0xf6
a,b,c = 0x250000 0x3 0x76
a,b,c = 0x250000 0x3 0xb6
a,b,c = 0x250000 0x3 0x56
results total= 8

Pick any.

But how it works? Its operation is very similar to the bitwise trick related to leading/trailing zero bits counting based on De Bruijn sequences. Read more about it: https://yurichev.com/blog/de_bruijn/, https://yurichev.com/blog/de_bruijn_Z3/.

→ [list of blog posts]

Please drop me email about any bug(s) and suggestion(s): dennis(@)yurichev.com.