[SAT] UNSAT proofs

Why would you use an UNSAT proof checker?

Unfortunately, SAT solvers are written by humans and prone to bugs.

When SAT solvers find a solution, it can be checked easily. Even visually. Imagine a solved/filled Sudoku puzzle. It may be hard to solve it (even with a computer), but you can check it visually: scan and check each row/column/3*3 block.

As of arbitrary CNF -- just plug your solution into it and evaluate it.

However, you may be unsatisfied with the 'UNSAT' response (pun intended). A SAT solver can report 'UNSAT' due to a bug -- unfortunately, there have been such bugs in solvers.

In other words, for us, SAT solvers' end users and programmers, proof checkers provide an additional test of a SAT solver. Being in paranoid mode, you can assure yourself with proof checker(s).

Also, bit-flips in RAM are sometimes possible. (To mitigate this, ECC RAM can be used.)

Sometimes CPU/RAM bugs are very real. This is a true story from my practice: an old AMD Athlon running with stopped fan (due to dust) and overheating, glitched severely without stopping or hanging.

bzip2 reported a rare error while running on it. I was so astonished, so that I even saved it:

bzip2: Caught a SIGSEGV or SIGBUS whilst compressing.

   Possible causes are (most likely first):
   (1) This computer has unreliable memory or cache hardware
       (a surprisingly common problem; try a different machine.)
   (2) A bug in the compiler used to create this executable
       (unlikely, if you didn't compile bzip2 yourself.)
   (3) A real bug in bzip2 -- I hope this should never be the case.
   The user's manual, Section 4.3, has more info on (1) and (2).

   If you suspect this is a bug in bzip2, or are unsure about (1)
   or (2), feel free to report it to me at: jseward@bzip.org.
   Section 4.3 of the user's manual describes the info a useful
   bug report should have.  If the manual is available on your
   system, please try and read it before mailing me.  If you don't
   have the manual or can't be bothered to read it, mail me anyway.

Also, proofs can be used for SAT solvers testing.

What is an UNSAT proof?

Put simply, a UNSAT proof is a trace/log-file of a SAT-solver. Which says, in other words: "You see, I got around all possible (dark) corners, but I didn't find anything".

Interestingly, Marijn Heule patched 'minisat' and 'glucose' SAT solvers to produce proofs and these patches are very small.

But obviously, a SAT solver with proof output works slower. And proof files can be huge. So you can only turn this on being in paranoid/testing mode.

Why would you trust proof checkers?

Because proof checkers are simpler and smaller, if compared to highly optimized SAT solvers that may exploit all sorts of low-level hacks/tricks and esoteric data structures.

And (sometimes) slower. (This is debated -- some proof checkers may require the same time that was used for SAT solvers to produce proof.)

And implemented differently.

It's generally considered that several proof checkers can assure you with almost 100% confidence. Like drat-trim + ACL2 (see below).

Kissat

This is a SAT solver by Armin Biere that can produce DRAT proofs in binary format. Binary -- so to be processed faster. But other proof formats may be in text form, like LRAT.

Run it:

% kissat fname.cnf fname.proof

satch

Also by Armin Biere.

Run it:

% satch fname.cnf fname.proof

drat-trim

drat-trim is a proof checker by Marijn Heule.

To run it:

% drat-trim fname.cnf fname.proof

Sample output:

...

c WARNING: backward mode ignores deletion of (pseudo) unit clause [0] -38 0
c finished parsing, read 16370253 bytes from proof file
c detected empty clause; start verification via backward checking
c 8499 of 32270 clauses in core
c 156638 of 244611 lemmas in core using 6289126 resolution steps
c 0 RAT lemmas in core; 140808 redundant literals in core lemmas
s VERIFIED
c verification time: 18.282 seconds

drat-trim can also convert your DRAT proof to LRAT format:

% drat-trim fname.cnf fname.proof -L fname.lrat

An LRAT proof can be checked using a checker that is written in the ACL2 programming language and proved sound by the ACL2 theorem-prover.

ACL2 LRAT proof checker

There is a module in ACL2 theorem prover that can check LRAT proofs.

This checker processes CNF file + LRAT proof (which can be prepared by drat-trim).

ACL2 can be tricky to install and run.

Installation (as checked on Ubuntu 20.04.3 LTS)

Official installation instruction

My recipe:

% sudo apt install sbcl

% git clone https://github.com/acl2/acl2.git

% cd acl2

% make LISP=$(command -v sbcl)

% acl2=$PWD/saved_acl2

% cd books/projects/sat/lrat/stobj-based

% ../../../../build/cert.pl -j`nproc` --acl2 "$acl2" *.lisp

% $acl2

Then type:

(include-book "run")

And then:

:q

And then:

(save-exec
  "lrat-check"
  "Large executable including run.lisp, saved with --dynamic-space-size 240000"
  :host-lisp-args "--dynamic-space-size 240000"
)

While in the ".../acl2/books/projects/sat/lrat/stobj-based" directory:

% ./run.sh fname.cnf fname.lrat

Sample output:

s VERIFIED
  (see /home/i/Downloads/t2.out)

Sample output file (*.out):

...

ACL2 !>; (LRAT::PARSE-CNF-FILE LRAT::CNF-FILE ...) took
; 0.05 seconds realtime, 0.05 seconds runtime
; (12,298,368 bytes allocated).
; (LRAT::PARSE-LRAT-FILE LRAT::LRAT-FILE ...) took
; 9.19 seconds realtime, 8.72 seconds runtime
; (2,385,596,992 bytes allocated).
; (VALID-PROOFP$-TOP FORMULA ...) took
; 8.81 seconds realtime, 8.71 seconds runtime
; (79,608,704 bytes allocated).
; (LRAT::VERIFY-LRAT-PROOF-FN LRAT::CNF-FILE ...) took
; 18.05 seconds realtime, 17.48 seconds runtime
; (2,477,504,064 bytes allocated).
s VERIFIED

Or run this to get output dumped to stdout:

% ./run.sh fname.cnf fname.lrat nil t

ACL2's LRAT checker peculiarities

It doesn't accept CNF files with empty line at the end (or with two '\n' bytes at the end).

It doesn't accept CNF files with comments (lines started with "c ").

It doesn't accept CNF files with tautological clauses. (They can be filtered out by Lingeling SAT solver/preprocessor. See the end of this.)

(These are: "... X ... -X ...". Such a clause can be satisfied with either X=false or X=true, so such a clause is to be ignored or removed.)

Sample CNF files conforming to these rules you can find in "acl2/books/projects/sat/lrat/tests" directory.

I modified my SAT_lib.py library (that generates CNF files) to align with those rules. Tautological clauses are now filtered out. CNF comments can be turned off.

'drabt' proof checker

By Armin Biere

Works as drat-trim, checks (binary) DRAT proofs. But slower than drat-trim.

'rate' proof checker

Newer proof checker written by Johannes Altmanninger, in Rust.

Checks (binary) DRAT proofs.

Sample output:

c mode: RAT
c formula-parsing time (s):                     0.028
c binary proof mode
c proof-parsing time (s):                       0.649
c preprocessing time (s):                       0.116
c verification time (s):                        7.636
c premise clauses:                              32270
c proof steps:                                 461674
c RUP introductions:                           144773
c RAT introductions:                                0
c deletions:                                   238520
c reason deletions:                               624
c reason deletions shrinking trail:                 0
c total time (s):                               8.535
c memory (MB):                                     31
s VERIFIED

My tests

I used 3 test files (from my book) that are UNSAT:

1. Thirty-six officers problem by Leonard Euler AKA Mutually orthogonal Latin squares of order 6. No solution is possible.

2. Conway's Life: Garden of Eden, a generation that doesn't have a previous one. My test is the latest known GoE (9*11), found by Steven Eker in 2017. No previous state is possible.

3. Various population count functions. In form: expr1 != expr2. No counterexample can be found.

Some of my handcrafted SMT-LIB 2.x file (t2.smt) has been converted to CNF using Boolector:

boolector --smt2 --dump-dimacs t2.smt > t2.cnf

All the generated files

Marijn Heule's mathematical proofs

Marijn Heule attacks various mathematical problems with SAT solvers. But since you can't sell a simple "UNSAT" as a mathematical proof, his (patched/tweaked) SAT solvers produce huge DRAT proofs.

Some are quite big: 200TB.

In 2017, he produced another proof: 2 petabytes.

As of 2022, these two proofs of his are known as the biggest mathematical proofs ever.

M.Heule is known for extensive work on proofs, so you can follow his work for more information.

What is omitted here

Other proof formats, SAT solvers and proof checkers. Sorry if I didn't mention other tools.

But as far as I know, the (binary) DRAT proof format is the latest development.

Also, many details are omitted that are too technical and are interesting/important for SAT solvers' developers, not solvers' users.

Formally verified SAT solver by Mathias Fleury

See: 1, 2, 3, 4.

Based on Isabelle and Isabelle-LLVM.

It is proved by Isabelle theorem prover that this SAT solver will always works correctly. Quite a milestone for both worlds: SAT and theorem proving.

But it's slow in comparison with SAT solvers.

Thanks to those who helped me:

Armin Biere, Marijn Heule, Johannes Altmanninger, Matt Kaufmann...


List of my other blog posts.

Yes, I know about these lousy Disqus ads. Please use adblocker. I would consider to subscribe to 'pro' version of Disqus if the signal/noise ratio in comments would be good enough.