[CBMC][SMT] Formal verification of population count functions

(Previously.)

This time we'll prove equivalence of several 64-bit functions, a case when bruteforce isn't feasible.

The population count function is the function that counts number of 1 bits in the input word, AKA Hamming weight.

Many of them I've copypasted from the wikipedia article about this function and from the chessprogramming.org website. See also John Regehr's blog post about it.

These hackish functions are very arcane, hard to understand and therefore are highly prone to (unnoticed) typos.

I run CBMC and SMT solvers on my ancient Intel Core 2 Duo T9600 clocked at 2.13GHz.

CBMC

So far, CBMC can prove this easily:

// time cbmc --trace --function check 1.c

int popcount64a(uint64_t x)
{
	x = (x & m1 ) + ((x >>  1) & m1 );
	x = (x & m2 ) + ((x >>  2) & m2 );
	x = (x & m4 ) + ((x >>  4) & m4 );
	x = (x & m8 ) + ((x >>  8) & m8 );
	x = (x & m16) + ((x >> 16) & m16);
	x = (x & m32) + ((x >> 32) & m32);
	return x;
}

int popcount64b(uint64_t x)
{
	x -= (x >> 1) & m1;             
	x = (x & m2) + ((x >> 2) & m2); 
	x = (x + (x >> 4)) & m4;        
	x += x >>  8;
	x += x >> 16;
	x += x >> 32;
	return x & 0x7f;
}

int popcount64c(uint64_t x)
{
	x -= (x >> 1) & m1;             
	x = (x & m2) + ((x >> 2) & m2); 
	x = (x + (x >> 4)) & m4;        
	return (x * h01) >> 56;
}

int popcount64_naive(uint64_t x)
{
	uint64_t rt=0, i;
	for (i=0; i<64; i++)
		rt=rt+((x>>i)&1);
	return rt;
}

void check(uint64_t c)
{
	assert (popcount64a(c)==popcount64b(c));
	// etc
};

These functions maybe quite naive, but I've been interested how CBMC handles arrays:

int popcount64_table(uint64_t x)
{
	uint64_t tbl[16]={0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4};

	uint64_t rt=0;

	rt=rt+tbl[(x>>(0*4))&0xf];
	rt=rt+tbl[(x>>(1*4))&0xf];
	...
	rt=rt+tbl[(x>>(14*4))&0xf];
	rt=rt+tbl[(x>>(15*4))&0xf];
	
	return rt;
}

int popcount64_table2(uint64_t x)
{
	uint64_t tbl[256]={
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
...
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8};

	uint64_t rt=0;

	rt=rt+tbl[(x>>(0*8))&0xff];
	rt=rt+tbl[(x>>(1*8))&0xff];
	...
	rt=rt+tbl[(x>>(6*8))&0xff];
	rt=rt+tbl[(x>>(7*8))&0xff];
	
	return rt;
}

Things gets a bit harder with a function copypasted from a well-known HAKMEM. It takes ~160 seconds to get job done, despite the somewhat hard (for SAT/SMT solvers) division/remainder function with the odd divisor (63):

int hakmem169_32(uint32_t x)
{
   x = x  - ((x >> 1)  & 033333333333)
          - ((x >> 2)  & 011111111111);
   x = (x +  (x >> 3)) & 030707070707 ;
   return x % 63; /* casting out 63 */
} 

int hakmem169_64(uint64_t x)
{
	return hakmem169_32(x>>32) + hakmem169_32(x&0xffffffff);
};

Things gets harder on another arcane function attributed to B.Kernighan:

int popcount64d(uint64_t x)
{
	int count;
	for (count=0; x; count++)
		x &= x - 1; // x = x & (x-1)
	return count;
}

CBMC stuck in seemingly infinite "unwinding loop":

CBMC version 5.10 (cbmc-5.10) 64-bit x86_64 linux
Parsing 1.c
Converting
Type-checking 1
Generating GOTO Program
Adding CPROVER library (x86_64)
Removal of function pointers and virtual functions
Generic Property Instrumentation
Running with 8 object bits, 56 offset bits (default)
Starting Bounded Model Checking
Unwinding loop popcount64d.0 iteration 1 file 1.c line 50 function popcount64d thread 0
Unwinding loop popcount64d.0 iteration 2 file 1.c line 50 function popcount64d thread 0
Unwinding loop popcount64d.0 iteration 3 file 1.c line 50 function popcount64d thread 0
Unwinding loop popcount64d.0 iteration 4 file 1.c line 50 function popcount64d thread 0
Unwinding loop popcount64d.0 iteration 5 file 1.c line 50 function popcount64d thread 0
Unwinding loop popcount64d.0 iteration 6 file 1.c line 50 function popcount64d thread 0
Unwinding loop popcount64d.0 iteration 7 file 1.c line 50 function popcount64d thread 0
Unwinding loop popcount64d.0 iteration 8 file 1.c line 50 function popcount64d thread 0
Unwinding loop popcount64d.0 iteration 9 file 1.c line 50 function popcount64d thread 0
Unwinding loop popcount64d.0 iteration 10 file 1.c line 50 function popcount64d thread 0

...

Unwinding loop popcount64d.0 iteration 1584 file 1.c line 50 function popcount64d thread 0
Unwinding loop popcount64d.0 iteration 1585 file 1.c line 50 function popcount64d thread 0
Unwinding loop popcount64d.0 iteration 1586 file 1.c line 50 function popcount64d thread 0
Unwinding loop popcount64d.0 iteration 1587 file 1.c line 50 function popcount64d thread 0
Unwinding loop popcount64d.0 iteration 1588 file 1.c line 50 function popcount64d thread 0

...

etc

I couldn't wait it for finish, so I've interrupted.

So I tried to help CBMC by manually limiting the upper bound of the loop:

int popcount64d_my(uint64_t x)
{
	int count;
	for (count=0; count<64 && x; count++)
		x &= x - 1;
	return count;
}
CBMC version 5.10 (cbmc-5.10) 64-bit x86_64 linux
Parsing 1.c
Converting
Type-checking 1
Generating GOTO Program
Adding CPROVER library (x86_64)
Removal of function pointers and virtual functions
Generic Property Instrumentation
Running with 8 object bits, 56 offset bits (default)
Starting Bounded Model Checking
Unwinding loop popcount64d_my.0 iteration 1 file 1.c line 58 function popcount64d_my thread 0
Unwinding loop popcount64d_my.0 iteration 2 file 1.c line 58 function popcount64d_my thread 0

...

Unwinding loop popcount64d_my.0 iteration 63 file 1.c line 58 function popcount64d_my thread 0
Unwinding loop popcount64d_my.0 iteration 64 file 1.c line 58 function popcount64d_my thread 0
size of program expression: 517 steps
simple slicing removed 2 assignments
Generated 1 VCC(s), 1 remaining after simplification
Passing problem to propositional reduction
converting SSA
Running propositional reduction
Post-processing
Solving with MiniSAT 2.2.1 with simplifier
17496 variables, 66127 clauses

And it takes ~42 minutes on my venerable notebook to verify this function.

I also tried to unroll the loop manually:

int popcount64d_unrolled(uint64_t x)
{
	int count=0;
	if (x) {x &= x-1; count++;};if (x) {x &= x-1; count++;};if (x) {x &= x-1; count++;};if (x) {x &= x-1; count++;};
	if (x) {x &= x-1; count++;};if (x) {x &= x-1; count++;};if (x) {x &= x-1; count++;};if (x) {x &= x-1; count++;};
	
	...

	if (x) {x &= x-1; count++;};if (x) {x &= x-1; count++;};if (x) {x &= x-1; count++;};if (x) {x &= x-1; count++;};
	if (x) {x &= x-1; count++;};if (x) {x &= x-1; count++;};if (x) {x &= x-1; count++;};if (x) {x &= x-1; count++;};

	// be sure x==0 upon the function exit...
	assert(x==0);
	return count;
}

Now it takes only 16 minutes. Please note the assert() upon the function exit. Yes, "x" must be "zeroed" upon the exit for obvious reasons. However, CBMC would prove that the assert() will never throw.

What if I comment one "if (x)..." line with 4 if's?

SAT checker: instance is SATISFIABLE
Runtime decision procedure: 73.0928s

** Results:
[popcount64d_unrolled.assertion.1] assertion x==0: FAILURE
[check.assertion.1] assertion popcount64d_unrolled(c)==popcount64a(c): FAILURE

Trace for popcount64d_unrolled.assertion.1:

State 24 file 1.c line 164 thread 0
----------------------------------------------------
  INPUT c: 18446743244780863487ul (11111111 11111111 11111111 00111110 11111111 11111111 11111111 11111111)

State 27 file 1.c line 164 thread 0
----------------------------------------------------
  c=18446743244780863487ul (11111111 11111111 11111111 00111110 11111111 11111111 11111111 11111111)

State 31 file 1.c line 166 function check thread 0
----------------------------------------------------
  x=18446743244780863487ul (11111111 11111111 11111111 00111110 11111111 11111111 11111111 11111111)

...

You see -- the "x" variable is almost "filled" with 1's. Yes, our modified function (with 4 of if's removed) will run incorrectly when "x" has 0-3 zero bits.

Also, SAT solver gave "SATISFIABLE", meaning, it found a counterexample. Otherwise, it would say "UNSATISFIABLE".

SMT solver

Can we do the same using SMT solver?

Here I am encoding popcount64a() function and also its "naive" counterpart in SMT-LIB language:

(declare-fun c0 () (_ BitVec 64)) (assert (= c0 (_ bv0 64))) ; always 0
(declare-fun c1 () (_ BitVec 64)) (assert (= c1 (_ bv1 64))) ; always 1

(declare-fun m1  () (_ BitVec 64)) (assert (= m1 #x5555555555555555))
(declare-fun m2  () (_ BitVec 64)) (assert (= m2 #x3333333333333333))
(declare-fun m4  () (_ BitVec 64)) (assert (= m4 #x0f0f0f0f0f0f0f0f))
(declare-fun m8  () (_ BitVec 64)) (assert (= m8 #x00ff00ff00ff00ff))
(declare-fun m16 () (_ BitVec 64)) (assert (= m16 #x0000ffff0000ffff))
(declare-fun m32 () (_ BitVec 64)) (assert (= m32 #x00000000ffffffff))

...

(assert (= a_x1  (bvadd (bvand a_x m1)   (bvand (bvlshr a_x  (_ bv1 64)) m1))))
(assert (= a_x2  (bvadd (bvand a_x1 m2)  (bvand (bvlshr a_x1 (_ bv2 64)) m2))))
(assert (= a_x3  (bvadd (bvand a_x2 m4)  (bvand (bvlshr a_x2 (_ bv4 64)) m4))))
(assert (= a_x4  (bvadd (bvand a_x3 m8)  (bvand (bvlshr a_x3 (_ bv8 64)) m8))))
(assert (= a_x5  (bvadd (bvand a_x4 m16) (bvand (bvlshr a_x4 (_ bv16 64)) m16))))
(assert (= a_out (bvadd (bvand a_x5 m32) (bvand (bvlshr a_x5 (_ bv32 64)) m32))))

; popcount64(): the naive way:
; (x>>0)&1 + (x>>1)&1 + (x>>2)&1 + ...

(assert (= naive_out
(bvadd 
       (bvand (bvlshr naive_x (_ bv0 64)) (_ bv1 64))
       (bvand (bvlshr naive_x (_ bv1 64)) (_ bv1 64))

       ...
       
       (bvand (bvlshr naive_x (_ bv62 64)) (_ bv1 64))
       (bvand (bvlshr naive_x (_ bv63 64)) (_ bv1 64))
)))

...

(assert (= naive_x a_x)) (assert (not (= naive_out a_out)))

Kernighan's unrolled function a bit tougher:

(declare-fun kern_x0 () (_ BitVec 64))
(declare-fun kern_x1 () (_ BitVec 64))
(assert (= kern_x1 (ite (= kern_x0 c0) kern_x0 (bvand kern_x0 (bvsub kern_x0 c1)))))
(declare-fun kern_x2 () (_ BitVec 64))
(assert (= kern_x2 (ite (= kern_x1 c0) kern_x1 (bvand kern_x1 (bvsub kern_x1 c1)))))

...

(declare-fun kern_x63 () (_ BitVec 64))
(assert (= kern_x63 (ite (= kern_x62 c0) kern_x62 (bvand kern_x62 (bvsub kern_x62 c1)))))
(declare-fun kern_x64 () (_ BitVec 64))
(assert (= kern_x64 (ite (= kern_x63 c0) kern_x63 (bvand kern_x63 (bvsub kern_x63 c1)))))

; be sure x==0 upon the function exit...
;(assert (not (= kern_x64 c0)))

(declare-fun kern_out () (_ BitVec 64))

; add 1 to kern_out each time if kern_xX!=0:
(assert (= kern_out (bvadd
	(ite (= kern_x0 c0) c0 c1) (ite (= kern_x1 c0) c0 c1) (ite (= kern_x2 c0) c0 c1) (ite (= kern_x3 c0) c0 c1)
	(ite (= kern_x4 c0) c0 c1) (ite (= kern_x5 c0) c0 c1) (ite (= kern_x6 c0) c0 c1) (ite (= kern_x7 c0) c0 c1)

	...
	
	(ite (= kern_x56 c0) c0 c1) (ite (= kern_x57 c0) c0 c1) (ite (= kern_x58 c0) c0 c1) (ite (= kern_x59 c0) c0 c1)
	(ite (= kern_x60 c0) c0 c1) (ite (= kern_x61 c0) c0 c1) (ite (= kern_x62 c0) c0 c1) (ite (= kern_x63 c0) c0 c1)
)))

...

(assert (= naive_x kern_x0)) (assert (not (= naive_out kern_out)))

Again, we can prove here that the last state of "x" would always be zero.

Maybe I encoded the problem in a wrong way, but this time, both CVC4 and Z3 stuck and couldn't solve anything withing ~15 minutes timeout. However, Boolector can prove it for ~1 minute.

Of course, I could misunderstood something.

SMT arrays

Now what about array encoding?

;int popcount64_table(uint64_t x)
;{
;	uint64_t tbl[16]={0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4};
;	uint64_t rt=0;
;	rt=rt+tbl[(x>>(0*4))&0xf];
;	rt=rt+tbl[(x>>(1*4))&0xf];
;...

; fill table of constants:
(declare-const tbl (Array (_ BitVec 64) (_ BitVec 64)))
(assert (= tbl (store tbl (_ bv0 64) (_ bv0 64))))
(assert (= tbl (store tbl (_ bv1 64) (_ bv1 64))))
...
(assert (= tbl (store tbl (_ bv14 64) (_ bv3 64))))
(assert (= tbl (store tbl (_ bv15 64) (_ bv4 64))))

(declare-fun tbl_x () (_ BitVec 64))
(declare-fun tbl_out () (_ BitVec 64))

;	rt=rt+tbl[(x>>(0*4))&0xf];
;	rt=rt+tbl[(x>>(1*4))&0xf];
;	...
(assert (= tbl_out
(bvadd
       (select tbl (bvand (bvlshr tbl_x (_ bv0 64)) (_ bv15 64)))
       (select tbl (bvand (bvlshr tbl_x (_ bv4 64)) (_ bv15 64)))
       ...
       (select tbl (bvand (bvlshr tbl_x (_ bv56 64)) (_ bv15 64)))
       (select tbl (bvand (bvlshr tbl_x (_ bv60 64)) (_ bv15 64)))
)))

Here we use "store" function to populate array. At each line, tbl is reassigned. We then use "select" function to address elements of the array.

Arrays are implemented in SMT solvers using UF's (uninterpreted functions), but logically, you can think about them as about chains of ITE's (if-then-else). Or like a switch() construct found in many popular PLs.

Even more, Boolector during model generation ("get-model"), shows arrays as ITE chains:

  (define-fun f (
   (f_x0 (_ BitVec 64))) (_ BitVec 64)
    (ite (= f_x0 #b0000000000000000000000000000000000000000000000000000000000000000) #b0000000000000000000000000000000000000000000000000000000000000000
    (ite (= f_x0 #b0000000000000000000000000000000000000000000000000000000000000001) #b0000000000000000000000000000000000000000000000000000000000000001

...
    
    (ite (= f_x0 #b0000000000000000000000000000000000000000000000000000000000001111) #b0000000000000000000000000000000000000000000000000000000000000100
      #b0000000000000000000000000000000000000000000000000000000000000000)))))))))))))))))

(We see here that the default value is 0, it's like "default" in "switch()" statement.)

As well as Z3:

  (define-fun f ((x!0 (_ BitVec 64))) (_ BitVec 64)
    (ite (= x!0 #x0000000000000003) #x0000000000000002
    (ite (= x!0 #x0000000000000004) #x0000000000000001
    ...
    (ite (= x!0 #x0000000000000002) #x0000000000000001
    (ite (= x!0 #x0000000000000009) #x0000000000000002
      #x0000000000000000))))))))))))))))

You can think about "select" function as if it simply evaluates internal ITE chain. And "store" function simply prepends new ITE in front of an array (or ITE chain) returning a new array (or chain).

Anyway, this version Boolector can solve for ~15 seconds, Z3 for ~7 minutes and CVC4 is stuck.

SMT uninterpreted functions

Since my array is a constant one, I can try to implement it using UF. (UF is more sophisticated concept. I'll write about them in near future.)

Here f() acts like the "select" function. And it's populated like:

"assert f(0) is 0, f(1) is 1 ... f(15) is (4)"

(declare-fun f ((_ BitVec 64)) (_ BitVec 64))
(assert (= (f (_ bv0 64)) (_ bv0 64)))
(assert (= (f (_ bv1 64)) (_ bv1 64)))
...
(assert (= (f (_ bv14 64)) (_ bv3 64)))
(assert (= (f (_ bv15 64)) (_ bv4 64)))

(declare-fun f_x () (_ BitVec 64))
(declare-fun f_out () (_ BitVec 64))

;	rt=rt+f((x>>(0*4))&0xf);
;	rt=rt+f((x>>(1*4))&0xf);
;	...
(assert (= f_out
(bvadd
       (f (bvand (bvlshr f_x (_ bv0 64)) (_ bv15 64)))
       (f (bvand (bvlshr f_x (_ bv4 64)) (_ bv15 64)))
       ...
       (f (bvand (bvlshr f_x (_ bv56 64)) (_ bv15 64)))
       (f (bvand (bvlshr f_x (_ bv60 64)) (_ bv15 64)))
)))

Boolector and Z3 can solve this for ~10-15 seconds, CVC4 is stuck.

Moral of the story

SAT/SMT solvers are highly capricious for various types of input. So it's good idea to try several of them. And of course, I probably misunderstood something, don't know yet, what exactly.

Sources of information that has been used

Claire Wolf's collection of CBMC examples (do grep cbmc).

Also, I've found SMT benchmark collection as a invaluable source of information. Benchmarks are typically generated by a SMT applications, so you can get into better understanding, how SMT is used.

clc-gitlab.cs.uiowa.edu, smt-lib.loria.fr, starexec.org (click on "spaces", etc).

These benchmarks are used in SMT-COMP competition (among SMT solvers).

The files used

Here