Unfortunately, SAT solvers are written by humans and prone to bugs.
When SAT solvers find a solution, it can be checked easily. Even visually. Imagine a solved/filled Sudoku puzzle. It may be hard to solve it (even with a computer), but you can check it visually: scan and check each row/column/3*3 block.
As of arbitrary CNF -- just plug your solution into it and evaluate it.
However, you may be unsatisfied with the 'UNSAT' response (pun intended). A SAT solver can report 'UNSAT' due to a bug -- unfortunately, there have been such bugs in solvers.
In other words, for us, SAT solvers' end users and programmers, proof checkers provide an additional test of a SAT solver. Being in paranoid mode, you can assure yourself with proof checker(s).
Also, bit-flips in RAM are sometimes possible. (To mitigate this, ECC RAM can be used.)
Sometimes CPU/RAM bugs are very real. This is a true story from my practice: an old AMD Athlon running with stopped fan (due to dust) and overheating, glitched severely without stopping or hanging.
bzip2 reported a rare error while running on it. I was so astonished, so that I even saved it:
bzip2: Caught a SIGSEGV or SIGBUS whilst compressing. Possible causes are (most likely first): (1) This computer has unreliable memory or cache hardware (a surprisingly common problem; try a different machine.) (2) A bug in the compiler used to create this executable (unlikely, if you didn't compile bzip2 yourself.) (3) A real bug in bzip2 -- I hope this should never be the case. The user's manual, Section 4.3, has more info on (1) and (2). If you suspect this is a bug in bzip2, or are unsure about (1) or (2), feel free to report it to me at: email@example.com. Section 4.3 of the user's manual describes the info a useful bug report should have. If the manual is available on your system, please try and read it before mailing me. If you don't have the manual or can't be bothered to read it, mail me anyway.
Also, proofs can be used for SAT solvers testing.
Put simply, a UNSAT proof is a trace/log-file of a SAT-solver. Which says, in other words: "You see, I got around all possible (dark) corners, but I didn't find anything".
Interestingly, Marijn Heule patched 'minisat' and 'glucose' SAT solvers to produce proofs and these patches are very small.
But obviously, a SAT solver with proof output works slower. And proof files can be huge. So you can only turn this on being in paranoid/testing mode.
Because proof checkers are simpler and smaller, if compared to highly optimized SAT solvers that may exploit all sorts of low-level hacks/tricks and esoteric data structures.
And (sometimes) slower. (This is debated -- some proof checkers may require the same time that was used for SAT solvers to produce proof.)
And implemented differently.
It's generally considered that several proof checkers can assure you with almost 100% confidence. Like drat-trim + ACL2 (see below).
This is a SAT solver by Armin Biere that can produce DRAT proofs in binary format. Binary -- so to be processed faster. But other proof formats may be in text form, like LRAT.
% kissat fname.cnf fname.proof
% satch fname.cnf fname.proof
drat-trim is a proof checker by Marijn Heule.
To run it:
% drat-trim fname.cnf fname.proof
... c WARNING: backward mode ignores deletion of (pseudo) unit clause  -38 0 c finished parsing, read 16370253 bytes from proof file c detected empty clause; start verification via backward checking c 8499 of 32270 clauses in core c 156638 of 244611 lemmas in core using 6289126 resolution steps c 0 RAT lemmas in core; 140808 redundant literals in core lemmas s VERIFIED c verification time: 18.282 seconds
drat-trim can also convert your DRAT proof to LRAT format:
% drat-trim fname.cnf fname.proof -L fname.lrat
An LRAT proof can be checked using a checker that is written in the ACL2 programming language and proved sound by the ACL2 theorem-prover.
There is a module in ACL2 theorem prover that can check LRAT proofs.
This checker processes CNF file + LRAT proof (which can be prepared by drat-trim).
ACL2 can be tricky to install and run.
Official installation instruction
% sudo apt install sbcl % git clone https://github.com/acl2/acl2.git % cd acl2 % make LISP=$(command -v sbcl) % acl2=$PWD/saved_acl2 % cd books/projects/sat/lrat/stobj-based % ../../../../build/cert.pl -j`nproc` --acl2 "$acl2" *.lisp % $acl2
(save-exec "lrat-check" "Large executable including run.lisp, saved with --dynamic-space-size 240000" :host-lisp-args "--dynamic-space-size 240000" )
While in the ".../acl2/books/projects/sat/lrat/stobj-based" directory:
% ./run.sh fname.cnf fname.lrat
s VERIFIED (see /home/i/Downloads/t2.out)
Sample output file (*.out):
... ACL2 !>; (LRAT::PARSE-CNF-FILE LRAT::CNF-FILE ...) took ; 0.05 seconds realtime, 0.05 seconds runtime ; (12,298,368 bytes allocated). ; (LRAT::PARSE-LRAT-FILE LRAT::LRAT-FILE ...) took ; 9.19 seconds realtime, 8.72 seconds runtime ; (2,385,596,992 bytes allocated). ; (VALID-PROOFP$-TOP FORMULA ...) took ; 8.81 seconds realtime, 8.71 seconds runtime ; (79,608,704 bytes allocated). ; (LRAT::VERIFY-LRAT-PROOF-FN LRAT::CNF-FILE ...) took ; 18.05 seconds realtime, 17.48 seconds runtime ; (2,477,504,064 bytes allocated). s VERIFIED
Or run this to get output dumped to stdout:
% ./run.sh fname.cnf fname.lrat nil t
It doesn't accept CNF files with empty line at the end (or with two '\n' bytes at the end).
It doesn't accept CNF files with comments (lines started with "c ").
It doesn't accept CNF files with tautological clauses. (They can be filtered out by Lingeling SAT solver/preprocessor. See the end of this.)
(These are: "... X ... -X ...". Such a clause can be satisfied with either X=false or X=true, so such a clause is to be ignored or removed.)
Sample CNF files conforming to these rules you can find in "acl2/books/projects/sat/lrat/tests" directory.
I modified my SAT_lib.py library (that generates CNF files) to align with those rules. Tautological clauses are now filtered out. CNF comments can be turned off.
By Armin Biere
Works as drat-trim, checks (binary) DRAT proofs. But slower than drat-trim.
Newer proof checker written by Johannes Altmanninger, in Rust.
Checks (binary) DRAT proofs.
c mode: RAT c formula-parsing time (s): 0.028 c binary proof mode c proof-parsing time (s): 0.649 c preprocessing time (s): 0.116 c verification time (s): 7.636 c premise clauses: 32270 c proof steps: 461674 c RUP introductions: 144773 c RAT introductions: 0 c deletions: 238520 c reason deletions: 624 c reason deletions shrinking trail: 0 c total time (s): 8.535 c memory (MB): 31 s VERIFIED
I used 3 test files (from my book) that are UNSAT:
1. Thirty-six officers problem by Leonard Euler AKA Mutually orthogonal Latin squares of order 6. No solution is possible.
2. Conway's Life: Garden of Eden, a generation that doesn't have a previous one. My test is the latest known GoE (9*11), found by Steven Eker in 2017. No previous state is possible.
3. Various population count functions. In form: expr1 != expr2. No counterexample can be found.
Some of my handcrafted SMT-LIB 2.x file (t2.smt) has been converted to CNF using Boolector:
boolector --smt2 --dump-dimacs t2.smt > t2.cnf
All the generated files
Marijn Heule attacks various mathematical problems with SAT solvers. But since you can't sell a simple "UNSAT" as a mathematical proof, his (patched/tweaked) SAT solvers produce huge DRAT proofs.
Some are quite big: 200TB.
In 2017, he produced another proof: 2 petabytes.
As of 2022, these two proofs of his are known as the biggest mathematical proofs ever.
M.Heule is known for extensive work on proofs, so you can follow his work for more information.
Other proof formats, SAT solvers and proof checkers. Sorry if I didn't mention other tools.
But as far as I know, the (binary) DRAT proof format is the latest development.
Also, many details are omitted that are too technical and are interesting/important for SAT solvers' developers, not solvers' users.
See: 1, 2, 3, 4.
Based on Isabelle and Isabelle-LLVM.
It is proved by Isabelle theorem prover that this SAT solver will always works correctly. Quite a milestone for both worlds: SAT and theorem proving.
But it's slow in comparison with SAT solvers.
Armin Biere, Marijn Heule, Johannes Altmanninger, Matt Kaufmann...
List of my other blog posts.
Yes, I know about these lousy Disqus ads. Please use adblocker. I would consider to subscribe to 'pro' version of Disqus if the signal/noise ratio in comments would be good enough.