[CS] Deduplicate your music collection using z3 SMT-solver

Not that I'm very proud of it, but I've downloaded a discography of jazz saxophonist Kenny G, at some pirate resource (I won't say which).

As it happens, the discography was packed divided by albums, live records, bootlegs and also compilations disks (18 of them). (One compilation is probably from Russia or ex-USSR, has Russian title.)

I don't know, how real these compact discs was which were sources for the mp3s. But I used it as source data.

There are 118 tracks/songs in total.

There are more popular tracks/songs and less popular (a first number is a number of compilation disks that include that track).

Now the problem --- what if I want to have all these 118 tracks in my (physical) collection of CDs, but I want to minimize number of CDs to buy? This is a good example of the (minimum) set cover problem. (Previously I used it to minimize number of test files, Ctrl-F "Making smallest possible test suite using Z3".)

Each track must be included. We enumerate all tracks and list all disks that include a track and join them by OR operation.

Some tracks are popular and shared among many compilation CDs:

track 'Don't Make Me Wait For Love':
Or(disk_0 == 1,
   disk_3 == 1,
   disk_4 == 1,
   disk_5 == 1,
   disk_6 == 1,
   disk_7 == 1,
   disk_8 == 1,
   disk_9 == 1,
   disk_11 == 1,
   disk_11 == 1,
   disk_12 == 1,
   disk_14 == 1)

Some are less popular and present only on a single compilation CD:

track 'Everytime I Close My Eyes [Version II]':
Or(disk_14 == 1)
track 'Follow Me All My Time':
Or(disk_14 == 1)

All these OR constraints must be satisfied. And number of disks is to be minimized.

The final result may not be very impressive, but --- 14 compilation CDs if carefully picked, has all 118 tracks: stdout.txt. That is, I saved 4 CDs. This is still a good/useful result.

How many ways are there to pick 14 items out of 18-item collection? This is C(18,14)=3060 (or 18 choose 14) --- too tedious to solve this problem manually.

Of course, many people listen music today using streaming services, youtube, social networks, by one-track granularity.

But hipsters are still into vinyl LP records today, maybe they will like my method. Also, the new Wim Wender's "Perfect Days" movie tells us about revived (nostalgic?) interest in audiocassettes in Japan --- I don't know how close this to reality, but it may be so at some point of (near) future --- I wouldn't be surprised.

All the files, including source code. Python code has only ~85 SLOC.

Input file format is very simple --- replace compilation CD titles and tracks/songs with whatever you want and you can run this code for anything else. It's universal.

(the post first published at 20250324, updated 20250325.)


List of my other blog posts.

Subscribe to my news feed,

Some time ago (before 24-Mar-2025) there was Disqus JS script for comments. I dropped it --- it was so motley, distracting, animated, with too much ads. I never liked it. Also, comments didn't appeared correctly (Disqus was buggy). Also, my blog is too chamberlike --- not many people write comments here. So I decided to switch to the model I once had at least in 2020 --- send me your comments by email to blog at yurichev dot com (don't forget to include URL to this blog post) and I'll copy&paste it here manually and maybe reply to it.

Let's party like it's ~1993-1996, in this ultimate, radical and uncompromisingly primitive pre-web1.0-style blog and website.