[Python][SAT] Toy package manager under 200 SLOC on top of SAT solver

(The following text has been copypasted to the SAT/SMT by example book.)

OK, I should admit, the title is a clickbait. The truth is: ~200 SLOC of main python code + ~1000 SLOC of low-level SAT utils + picosat SAT solver.

Ubuntu Linux

Many Ubuntu Linux users see this often:

% sudo apt install katomic
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  a2jmidid libsamplerate0:i386 pulseaudio-module-jack python3-cffi python3-jack-client python3-ply python3-pycparser qasmixer
  qastools-common zita-ajbridge
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  kdoctools5 khelpcenter libgrantlee-templates5 libkf5kdegames-data libkf5kdegames7 libkf5newstuff-data libkf5newstuff5
  libkf5newstuffcore5 qml-module-org-kde-newstuff
The following NEW packages will be installed:
  katomic kdoctools5 khelpcenter libgrantlee-templates5 libkf5kdegames-data libkf5kdegames7 libkf5newstuff-data libkf5newstuff5
  libkf5newstuffcore5 qml-module-org-kde-newstuff
0 upgraded, 10 newly installed, 0 to remove and 20 not upgraded.
Need to get 7,175 kB of archives.
After this operation, 28.3 MB of additional disk space will be used.
Do you want to continue? [Y/n]

During installation of a package, other packages are to be installed - libraries, optional packages, etc. Also, 'apt' utility tries to find the most fresh packages.

How an Ubuntu package is defined?

Format: 3.0 (quilt)
Source: katomic
Binary: katomic
Architecture: any
Version: 4:20.08.0-1
...
Build-Depends: cmake (>= 3.5~), debhelper-compat (= 13), extra-cmake-modules (>= 5.30.0~), gettext, libkf5config-dev (>= 5.30.0~), libkf5coreaddons-dev (>= 5.30.0~), libkf5crash-dev (>= 5.30.0~), libkf5dbusaddons-dev (>= 5.30.0~), libkf5doctools-dev (>= 5.30.0~), libkf5i18n-dev (>= 5.30.0~), libkf5kdegames-dev (>= 4.9.0~), libkf5newstuff-dev (>= 5.30.0~), libkf5widgetsaddons-dev (>= 5.30.0~), libkf5xmlgui-dev (>= 5.30.0~), pkg-kde-tools (>> 0.15.15), qtbase5-dev (>= 5.7.0~)
Package-List:
 katomic deb games optional arch=any
...

( src )

Here you see that the 'katomic' package requires the 'debhelper-compat' package of exact version 13, 'pkg-kde-tools' newer than version 0.15.15, 'gettext' package of unspecific version, etc...

More on .dsc files file format: 1, 2, 3, 4.

Almost all Linux distributions has some kind of this system. Haskell has 'Cabal' system (example package).

This task is a good specimen of constraint satisfaction problem (CSP), which is NP-problem, of course.

My previous example using Z3

Ctrl-F "package manager", here.

Or, see this: 1 2.

Unlike my previous example, the present one can be used in practice for up to 1000 packages, maybe even more...

My JSON file format

My toy package manager uses config in JSON format.

This is example of test100/vers.json (available versions for each package). (I package numbers in [0..) limit and version numbers in [2000...2020] limit.)

...
        "8" : [2005, 2006, 2008, 2010, 2011, 2014, 2019],
        "9" : [2000, 2002, 2003, 2004, 2007, 2011, 2015, 2017],
...

This is test100/deps.json file (dependencies):

...
"8":
{
    "2005" : [[1, 0, 2005], [3, 0, 9999], [7, 0, 2005]],
    "2006" : [[1, 0, 9999], [3, 2005, 9999], [7, 0, 2006]],
    "2008" : [[1, 0, 2008], [3, 0, 2008], [7, 0, 9999]],
    "2010" : [[1, 0, 9999], [3, 0, 9999], [7, 2010, 9999]],
    "2011" : [[1, 0, 9999], [3, 0, 2011], [7, 0, 9999]],
    "2014" : [[1, 2008, 9999], [3, 2012, 9999], [7, 2010, 9999]],
    "2019" : [[1, 2019, 9999], [3, 0, 9999], [7, 0, 2019]]
},
"9":
{
    "2000" : [[5, 2000, 9999], [6, 2000, 9999], [7, 0, 9999], [8, 2000, 9999]],
    "2002" : [[5, 0, 9999], [6, 2002, 9999], [7, 0, 9999], [8, 2001, 9999]],
    "2003" : [[5, 2001, 9999], [6, 0, 2003], [8, 2003, 9999]],
    "2004" : [[5, 0, 2004], [6, 2004, 9999], [8, 0, 9999]],
    "2007" : [[5, 2007, 2007], [6, 0, 2007], [8, 0, 9999]],
    "2011" : [[4, 0, 9999], [5, 2002, 9999], [6, 0, 9999], [8, 0, 2011]],
    "2015" : [[4, 0, 9999], [5, 2007, 9999], [6, 0, 9999], [8, 0, 2015]],
    "2017" : [[4, 0, 9999], [5, 2010, 9999], [6, 0, 9999], [8, 0, 9999]]
},
...

It defines a list of dependencies for each package/version. Each dependency is a list of triplets. Each triple is: package number, minimal version allowed, maximal version allowed. 0 and 9999 are allowed for min/max versions.

And this a test100/conflicts.json file, that lists of conflicting packages:

[
[[40, 2013, 2017], [68, 2015, 2020]],
[[76, 2000, 2010], [32, 2000, 2012]],
[[40, 2013, 2017], [71, 2002, 2002]],
[[72, 2004, 2007], [15, 2001, 2016]],
[[36, 2001, 2007], [21, 2000, 2010]],
[[98, 2007, 2010], [14, 2004, 2012]],

...

This means, package number 80 in versions [2011..2012] cannot be installed with package 45 in versions [2005..2017]. However, other versions may be allowed.

Why JSON? To make hand editing possible.

In my previous implementation of my toy package manager in Racket, I used S-expressions, but they are less familiar to a general audience. Another option is XML, but it's too verbose...

Version 1: just get a solution

I'm assigning a boolean variable to each package/version.

Making only one version per package possible

First, only one version per package is allowed, so I'm using one-hot encoding and 'AtMost1' constraint for each package. However, a bit may be absent at all. It will leave version value in undefined state.

IMPLY constraint

Second, for each dependency, I'm adding an IMPLY constraint. If a package 1 version 2004 requires package 2 versions 2000..2002, the following constraint is to be added:

pkg-1-ver-2004 => pkg-2-ver-2000 OR pkg-2-ver-2001 OR pkg-2-ver-2002

('=>' is a sign for logical implication.)

That means that if pkg-1-ver-2004=True, one of pkg-2-... must be also true. However, converse is not required. It's possible that pkg-2-ver-2002=True, but pkg-1-ver-2004=False.

See the wikipedia page about it.

For a common computer programmer, this can be explained in terms of algol-like programming language:

if pkg-1-ver-2004:
    set True to one of these: pkg-2-ver-2000 OR pkg-2-ver-2001 OR pkg-2-ver-2002
if another package/version:
    set True to ...

... but these things wouldn't work in reverse.

Handling conflicts

For example, package 1 versions [2000..2001] cannot be installed simultaneously with package 2 versions [2005..2007]. I'll add this constraint:

AND(pkg-1-ver-2000 OR pkg-1-ver-2001, pkg-2-ver-2005 OR pkg-2-ver-2006 OR pkg-2-ver-2007)=False

...or...

NOT(AND(pkg-1-ver-2000 OR pkg-1-ver-2001, pkg-2-ver-2005 OR pkg-2-ver-2006 OR pkg-2-ver-2007))==True

NAND gate can also be called "not both". In my example, in plain English language, that means "pkg-1-ver-2000 OR pkg-1-ver-2001 cannot be true if pkg-2-ver-2005 OR pkg-2-ver-2006 OR pkg-2-ver-2007 is also true and vice versa".

In other words, these two OR expressions cannot be both true, but can be both false.

Getting solution

You'll get a correct solution, but other package/versions may also be turned on. Because you can't 'ground' them to false by default. 'Unconnected packages' are 'dangling' chaotically each time you run SAT solver.

My idea is to collect information from solution recursively, starting at the initial list of desired packages supplied by an end-user. See the collect_from_solution_recursively() function.

Test

Let's take a test data from 'test100' folder and run it:

% python3 v1.py test100 0 1 5 10 50
get_first_solution
initial_pkgs: [0, 1, 5, 10, 50]
going to run solver
SAT
first solution:
0:2018; 1:2020; 2:2019; 3:2019; 4:2020; 5:2015; 6:2016; 7:2020; 8:2014; 10:2015; 11:2018; 12:2019; 13:2020; 14:2012; 15:2020; 16:2017; 17:2019; 18:2020; 19:2018; 21:2020; 22:2000; 23:2001; 28:2001; 29:2020; 34:2020; 35:2014; 36:2015; 50:2014;

The solution is correct, but versions are 'random' (in allowed limits), not maximized.

The source code.

Circular dependencies

As a by-product, my toy package manager can support circular dependencies.

Open this example (deps.json) in the 'test_circ' folder:

{
"0":
{
    "2016" : [[2,2005,2015]]
},
"1":
{
    "2018" : [[0,2001,2016]]
},
"2":
{
    "2015" : [[1,2006,2018]]
}
}
% python3 v1.py test_circ 0
get_first_solution
initial_pkgs: [0]
going to run solver
SAT
first solution:
0:2016; 1:2018; 2:2015;

Indeed, there are 3 implications: 0 => 1; 1 => 2; 2 => 0. By the rules of boolean logic, that means, all 3 boolean variables would be simultaneously false or true.

I'm not sure if they are handled in a real package manager like Ubuntu's 'apt'. But this feature is possible.

Version 2: maximize versions

The package manager in this form is somewhat useless - an end-user would want to get newest version for each package. So we would maximize them.

This problem seems to be natural to be solved using MaxSAT solver, and I tried, but stuck with a problem: MaxSAT solver tries to raise all package's versions, including 'invisible' ones, not limiting to the ones that has been picked during solving.

This is the algorithm I've conceived:

Get an initial solution (like in version 1);

Pick a package/version from solution, fix it at highest version, check
if SAT:
    Leave it as is, pick another package/version
if UNSAT:
    Decrement current version until SAT

Example trace:

% python3 v2.py test100 5
get_first_solution
initial_pkgs: [5]
going to run solver
SAT
first solution:
0:2018; 5:2006;
trying version 2015 for package 5
SAT
solution: 0:2018; 1:2020; 4:2020; 5:2015;
trying version 2020 for package 4
fixed_versions: {5: 2015}
SAT
solution: 0:2018; 1:2020; 4:2020; 5:2015;
trying version 2020 for package 1
fixed_versions: {5: 2015, 4: 2020}
SAT
solution: 0:2018; 1:2020; 4:2020; 5:2015;
trying version 2018 for package 0
fixed_versions: {5: 2015, 4: 2020, 1: 2020}
SAT
solution: 0:2018; 1:2020; 4:2020; 5:2015;
final solution: 0:2018; 1:2020; 4:2020; 5:2015;

Now here I'm trying to 'install' package 4 and 11. My toy package manager tries to raise versions of all packages, including dependent ones:

% python3 v2.py test100 4 11
get_first_solution
initial_pkgs: [4, 11]
going to run solver
SAT
first solution:
4:2020; 11:2007;
trying version 2018 for package 11
SAT
solution: 0:2018; 1:2006; 4:2020; 5:2015; 11:2018;
trying version 2015 for package 5
fixed_versions: {11: 2018}
SAT
solution: 0:2018; 1:2006; 4:2020; 5:2015; 11:2018;
trying version 2020 for package 4
fixed_versions: {11: 2018, 5: 2015}
SAT
solution: 0:2018; 1:2006; 4:2020; 5:2015; 11:2018;
trying version 2020 for package 1
fixed_versions: {11: 2018, 5: 2015, 4: 2020}
SAT
solution: 0:2018; 1:2020; 4:2020; 5:2015; 11:2018;
trying version 2018 for package 0
fixed_versions: {11: 2018, 5: 2015, 4: 2020, 1: 2020}
SAT
solution: 0:2018; 1:2020; 4:2020; 5:2015; 11:2018;
final solution: 0:2018; 1:2020; 4:2020; 5:2015; 11:2018;

By the way, it's important to note that the first solution can coincide with the 'best' solution. But may not.

The source code.

SAT assumptions

I'm using SAT assumptions. What is this?

Imagine you play Doom/Quake/another shooter videogame. You gone far and you stumbled before the fork in the road. You know you may get killed by a monster. So you save the state of the game, you take a first path, you get killed by a monster, you restore a state, you take a second path, etc, etc. You want to save the state because you don't want to lose what you've achieved up to this point.

It takes a time for SAT solver to process input clauses, to 'warm up', like an engine.

Here I'm using SAT assumption after all the constraints processed, and I 'assume' that a package X has version Y. I execute sat() call of picosat, it returns SAT or UNSAT, and all 'assumptions' are reset after each sat() call, but the picosat SAT solver is still 'warmed up', and all constraints are still there. And then I 'assume' that a package X has version Y-1, and execute sat() call again, etc, etc.

It's impossible to add assumptions running picosat executable, so I have to load the libpicosat.so shared library and call all these functions one after another.

It's important to note that I could run picosat executable instead, without any assumptions, but then the whole process would be longer: because you'll have to generate CNF file each time, picosat SAT solver have to process all the clauses from the beginning, 'warm its engine', etc.

Version 3: MUS: get info about conflicts

In case of conflict, my toy package manager just gives "UNSAT", which is probably not a very useful response. Can we get more information, which packages conflicts?

I use PicoMUS utility here. (I wrote about it previously.).

The PicoMUS utility from picosat SAT solver suite will tell, which CNF clauses are to be removed to make UNSAT instance SAT. But for that we need to track, which clauses may be related to which packages.

For example, when IMPLY constraints are added, they clause numbers (from c.CNF_next_idx) are saved into a dictionary:

    for p in range(0, packages_total):
        for v in deps[p].keys():
            tmp=[s.OR_list(version_range_to_list_of_vars(dep[0], dep[1], dep[2])) for dep in deps[p][v]]
            if len(tmp)>0:
                if get_MUS:
                    clause_start=s.CNF_next_idx+1
                s.IMPLY_always(vars[p][v], s.AND_list(tmp))
                if get_MUS:
                    clause_stop=s.CNF_next_idx-1+1
                    #print ("IMPLY", p, v, "clauses=[", clause_start, clause_stop, "]")
                    [IMPLY_clauses_packages[c].add(p) for c in range(clause_start, clause_stop+1)]

And when PicoMUS utility returns list of clauses, we get a list of packages from our records, that are connected to these clauses:

        print ("running picomus")
        MUS_clauses, MUS_vars=s.get_MUS_vars()
...
        IMPLY_clauses_packages_out=set()
...
        for c in MUS_clauses:
...
            if c in IMPLY_clauses_packages.keys():
                IMPLY_clauses_packages_out.update(IMPLY_clauses_packages[c])
...
        print ("conflicted packages (IMPLY):", sorted(list(IMPLY_clauses_packages_out)))

We also keep tabs on clauses added when fixing packages user wants to install and clauses generated during conflict.json file processing.

Now a small example.

vers.json:

{
        "0" : [2011, 2016],
        "1" : [2013, 2018],
        "2" : [2010, 2015]
}

deps.json:

{
"0":
{
    "2011" : [],
    "2016" : []
},
"1":
{
    "2013" : [],
    "2018" : []
},
"2":
{
    "2010" : [],
    "2015" : []
}
}

conflicts.json:

[
[[1, 2013, 2018], [2, 2010, 2015]]
]

And I'm trying to 'install' packages 0, 1 and 2, which is impossible:

% python3 v3.py test2 0 1 2
get_first_solution, get_MUS= False
initial_pkgs: [0, 1, 2]
going to run solver
UNSAT
get_first_solution, get_MUS= True
initial_pkgs: [0, 1, 2]
running picomus
conflicted packages (set by user): [1, 2]
conflicted packages (IMPLY): []
conflicted packages (from conflicts.json): [1, 2]

Now you see that packages 1 and 2 cannot be installed with each other.

Another example (test6). Package 1 requires package 0 of version 2014 (only this version) and package 2 requires package 0 version 2011 only:

{
"0":
{
    "2011" : [],
    "2014" : []
},
"1":
{
    "2018" : [[0,2014,2014]]
},
"2":
{
    "2015" : [[0,2011,2011]]
}
}

My toy package manager find the conflict:

% python3 v3.py test6 1 2
get_first_solution, get_MUS= False
initial_pkgs: [1, 2]
going to run solver
UNSAT
get_first_solution, get_MUS= True
initial_pkgs: [1, 2]
running picomus
conflicted packages (set by user): [1, 2]
conflicted packages (IMPLY): [1, 2]
conflicted packages (from conflicts.json): []

Now example from my randomly generated test data (test100):

% python3 v3.py test200 138
get_first_solution, get_MUS= False
initial_pkgs: [138]
going to run solver
UNSAT
get_first_solution, get_MUS= True
initial_pkgs: [138]
running picomus
conflicted packages (set by user): [138]
conflicted packages (IMPLY): [1, 2, 3, 4, 5, 6, 8, 9, 11, 12, 13, 14, 15, 16, 21, 22, 28, 30, 31, 32, 34, 35, 37, 40, 44, 45, 50, 51, 54, 56, 58, 59, 61, 63, 70, 78, 81, 83, 85, 97, 98, 126, 128, 137, 138]
conflicted packages (from conflicts.json): [12, 30, 52, 61]

Unfortunately, PicoMUS is slow and usable only for debugging. It may take minutes to find a MUS.

On my Intel Core 2 Duo CPU clocked at 2.13GHz:

python3 v3.py test200 187  102.24s user 0.15s system 99% cpu 1:43.05 total
python3 v3.py test500 492  101.44s user 0.17s system 99% cpu 1:41.92 total
python3 v3.py test1000 980  38.11s user 0.17s system 99% cpu 38.602 total

It's unlikely, an end user would wait this long.

The source code.

All the files

Download.

All executables are for Linux x64.

Future work

'Remove' operation.

Further reading

Russ Cox: 1, 2.

Several attempts exists on adding SAT solvers to package managers in major *NIX distributions:

Fedora-specific: 1, 2.

Debian-specific: 1, 2.

SUSE-specific: 1, 2, 3, 4, 5.

FreeBSD: 1. Has picosat SAT solver: 1.

More comments:

Fedora, OpenSUSE, Windows 7, PubGrub and Dart, PHP, Anaconda.

A paper: OPIUM: Optimal Package Install/Uninstall Manager.


List of my other blog posts.

Yes, I know about these lousy Disqus ads. Please use adblocker. I would consider to subscribe to 'pro' version of Disqus if the signal/noise ratio in comments would be good enough.