[CBMC] Recovering a plain text using only CRC64 hash

Previously, I've shown how to recover a short (printable) text by CRC32, using KLEE. This time the task is harder -- is it possible to recover a 12-char string by its CRC64 hash? (Bruteforce is not the option.)

Common sense says no, but if the input string is constrained in some way (say, it can consist only of a..z symbols and space), then it's possible:

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <inttypes.h>

uint64_t CRC64(uint64_t crc, uint8_t *buf, size_t len)
{
        int k;

        crc = ~crc;
        while (len--)
        {
                crc ^= *buf++;
                for (k = 0; k < 8; k++)
                        crc = crc & 1UL ? (crc >> 1) ^ 0x42f0e1eba9ea3693UL : crc >> 1;
        }
        return crc;
}

//#define STR "lorem ipsum "
#define STRLEN 12
#define HASH 0x791b385d86c37ffc

void check()
{
	char buf[STRLEN+1];
	buf[STRLEN]=0;
	int string_correct=1;
	for (int i=0; i<STRLEN; i++)
	{
		uint8_t t=buf[i];
		int char_correct=(t==' ' || (t>='a' && t<='z'));
		if (!char_correct)
			string_correct=0;
	};
	if (string_correct)
	{
		assert (CRC64(0, buf, STRLEN)!=HASH);
	};
};

int main()
{
};   

CBMC do the job very fast:

cbmc --trace --function check 1.c

...

CBMC version 5.10 (cbmc-5.10) 64-bit x86_64 linux
Parsing crc64.c
Converting
Type-checking crc64
Generating GOTO Program
Adding CPROVER library (x86_64)
Removal of function pointers and virtual functions
Generic Property Instrumentation
Running with 8 object bits, 56 offset bits (default)
Starting Bounded Model Checking
Unwinding loop check.0 iteration 1 file crc64.c line 30 function check thread 0
Unwinding loop check.0 iteration 2 file crc64.c line 30 function check thread 0

...

Unwinding loop CRC64.0 iteration 8 file crc64.c line 15 function CRC64 thread 0
Unwinding loop CRC64.1 iteration 12 file crc64.c line 12 function CRC64 thread 0
size of program expression: 705 steps
simple slicing removed 4 assignments
Generated 1 VCC(s), 1 remaining after simplification
Passing problem to propositional reduction
converting SSA
Running propositional reduction
Post-processing
Solving with MiniSAT 2.2.1 with simplifier
5883 variables, 13598 clauses
SAT checker: instance is SATISFIABLE
Runtime decision procedure: 0.118034s

** Results:
[check.assertion.1] assertion CRC64(0, buf, STRLEN)!=HASH: FAILURE

Trace for check.assertion.1:

State 17 file crc64.c line 27 function check thread 0
----------------------------------------------------
  buf={ 'l', 'o', 'r', 'e', 'm', ' ', 'i', 'p', 's', 'u', 'm', ' ', 0 } ({ 01101100, 01101111, 01110010, 01100101, 01101101, 00100000, 01101001, 01110000, 01110011, 01110101, 01101101, 00100000, 00000000 })

...

This is it! Indeed, a CRC64 hash has 64 bits. But how many bits has a 12-character string, where each symbol can be one of 27?

$log_2(27^{12}) \approx 57$ bits.

I've failed when trying 13-char string: $log_2(27^{13}) \approx 61$ bits (closer to 64). CBMC can easily find a 13-char string satisfying our 64-bit CRC64 hash, but the result is different from "lorem ipsum". Perhaps, we could enumerate all possible strings using SMT solver...

Thanks to Martin Nyx Brain again, for help.