## TAOCP 7.1.3 Exercise 198, UTF-8 encoding and program synthesis by sketching

Found this exercise in TAOCP 7.1.3 (Bitwise Tricks and Techniques):

This is like program synthesis by sketching: you give a sketch with several "holes" missing and ask some automated software to fill the "holes". In our case, a,b and c are "holes".

Let's find them using Z3:

from z3 import *

a, b, c=BitVecs('a b c', 22)

s=Solver()

def bytes_in_UTF8_seq(x):
if (x>>7)==0:
return 1
if (x>>5)==0b110:
return 2
if (x>>4)==0b1110:
return 3
if (x>>3)==0b11110:
return 4
# invalid 1st byte
return None

for x in range(256):
t=bytes_in_UTF8_seq(x)
if t!=None:
s.add(((a >> ((x>>b) & c)) & 3) == (t-1))

# enumerate all solutions:
results=[]
while s.check() == sat:
m = s.model()

print "a,b,c = 0x%x 0x%x 0x%x" % (m[a].as_long(), m[b].as_long(), m[c].as_long())

results.append(m)
block = []
for d in m:
t=d()
block.append(t != m[d])

print "results total=", len(results)



I tried various bit widths for a, b and c and found that 22 bits are enough. I've lots of results like:

...

a,b,c = 0x250100 0x3 0x381416
a,b,c = 0x258100 0x3 0x381416
a,b,c = 0x258900 0x3 0x381416
a,b,c = 0x250900 0x3 0x381416
a,b,c = 0x251100 0x3 0x381416
a,b,c = 0x259100 0x3 0x381416
a,b,c = 0x259100 0x3 0x389416
a,b,c = 0x251100 0x3 0x389416
a,b,c = 0x251100 0x3 0x189416
a,b,c = 0x259100 0x3 0x189416
a,b,c = 0x259100 0x3 0x189016

...


It seems that several least significant bits of a and c are not used. After little experimentation, I've come to this:

...

# make a,c more aesthetically appealing:

...


And the results:

a,b,c = 0x250000 0x3 0x36
a,b,c = 0x250000 0x3 0x16
a,b,c = 0x250000 0x3 0x96
a,b,c = 0x250000 0x3 0xd6
a,b,c = 0x250000 0x3 0xf6
a,b,c = 0x250000 0x3 0x76
a,b,c = 0x250000 0x3 0xb6
a,b,c = 0x250000 0x3 0x56
results total= 8


Pick any.

But how it works? Its operation is very similar to the bitwise trick related to leading/trailing zero bits counting based on De Bruijn sequences. Read more about it: https://yurichev.com/blog/de_bruijn/, https://yurichev.com/blog/de_bruijn_Z3/.