[Unix] Determining file versions using the 'join' command

I was once paid to fix a huge software package that was already installed, but had been maintained and patched incorrectly. Some files inside were from version 1, others from version 2; some were patched, some were not. Presumably, the user(s) just copied files from newer versions, then from older ones (in an attempt to roll back), etc.

What a mess. It was a hodge-podge of different versions and patches.

I needed to quickly determine the version of each file in the installed software.

Here I'm using coreutils to demonstrate my idea. I took random files from coreutils versions 6.4, 7.4, 8.10 and 9.0 and put them into the coreutils-hodgepodge folder.

Now I'm running sha1sum to get the SHA1 sums of all files in every version:

find coreutils-6.4 -type f -exec sha1sum {} \; | sort > coreutils-6.4.sha1sum
find coreutils-7.4 -type f -exec sha1sum {} \; | sort > coreutils-7.4.sha1sum
find coreutils-8.10 -type f -exec sha1sum {} \; | sort > coreutils-8.10.sha1sum
find coreutils-9.0 -type f -exec sha1sum {} \; | sort > coreutils-9.0.sha1sum
find coreutils-hodgepodge -type f -exec sha1sum {} \; | sort > coreutils-hodgepodge.sha1sum

The resulting *.sha1sum files have a SHA1 sum for each file, like:

000148c2c406912c0ffb0cee963c16d712163131  coreutils-6.4/tests/sort/05a.X
000148c2c406912c0ffb0cee963c16d712163131  coreutils-6.4/tests/sort/06a.X
0018e0bc16561535b7c32146bee437921006aef6  coreutils-6.4/man/id.1
00661d2ab274ca379dd3e885be273cb115d299c4  coreutils-6.4/man/help2man
0081b627c43a01f7194d6fc07bde53c0b830c0c4  coreutils-6.4/tests/sort/22b.X
...

Now let's use 'join' to find identical files in two folders. This will print the files in coreutils-hodgepodge and coreutils-6.4 that have the same contents. Essentially, this command picks all pairs of lines from the two files that have the same first column (the SHA1 hash). If a hash occurs more than once in either file, every combination is printed:

% join coreutils-hodgepodge.sha1sum coreutils-6.4.sha1sum

...

000148c2c406912c0ffb0cee963c16d712163131 coreutils-hodgepodge/tests/sort/05a.X coreutils-6.4/tests/sort/05a.X
000148c2c406912c0ffb0cee963c16d712163131 coreutils-hodgepodge/tests/sort/05a.X coreutils-6.4/tests/sort/06a.X
000148c2c406912c0ffb0cee963c16d712163131 coreutils-hodgepodge/tests/sort/06a.X coreutils-6.4/tests/sort/05a.X
000148c2c406912c0ffb0cee963c16d712163131 coreutils-hodgepodge/tests/sort/06a.X coreutils-6.4/tests/sort/06a.X
0018e0bc16561535b7c32146bee437921006aef6 coreutils-hodgepodge/man/id.1 coreutils-6.4/man/id.1
00661d2ab274ca379dd3e885be273cb115d299c4 coreutils-hodgepodge/man/help2man coreutils-6.4/man/help2man
0081b627c43a01f7194d6fc07bde53c0b830c0c4 coreutils-hodgepodge/tests/sort/22b.X coreutils-6.4/tests/sort/22b.X
008719b6e097d3224b55723f2f58f51b6463ff8f coreutils-hodgepodge/tests/uniq/35.I coreutils-6.4/tests/uniq/35.I
0093ee2b4cfd28cbbc22a15bdcd7ef86ab4e84e6 coreutils-hodgepodge/old/textutils/ChangeLog coreutils-6.4/old/textutils/ChangeLog
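
To get from one pairwise join to a version report for every file, the same command can be looped over all versions. Below is a minimal sketch, not the exact script I used back then. The tiny v1/v2/hodgepodge trees, the report.txt, known.sha1sum and unknown.txt names, and the LC_ALL=C setting are all made up for illustration so that the snippet runs on its own. It also assumes paths without spaces, since awk splits on whitespace.

```shell
#!/bin/sh
set -e
# My own addition: a fixed locale keeps sort and join agreeing on ordering.
LC_ALL=C; export LC_ALL

# Build a tiny stand-in for the real trees (hypothetical demo data).
demo=$(mktemp -d)
cd "$demo"
mkdir -p v1 v2 hodgepodge
printf 'one\n' > v1/a.txt
printf 'two\n' > v1/b.txt
printf 'one\n' > v2/a.txt                  # unchanged between versions
printf 'two, revised\n' > v2/b.txt
cp v1/a.txt hodgepodge/a.txt               # matches v1 and v2
cp v2/b.txt hodgepodge/b.txt               # matches v2 only
printf 'local patch\n' > hodgepodge/c.txt  # matches no known version

# Same hashing step as above, one sorted list per tree.
for d in v1 v2 hodgepodge; do
    find "$d" -type f -exec sha1sum {} \; | sort > "$d.sha1sum"
done

# Join the hodgepodge list against every version; print "file version".
for v in v1 v2; do
    join hodgepodge.sha1sum "$v.sha1sum" | awk -v ver="$v" '{ print $2, ver }'
done | sort > report.txt

# -v 1 prints only the lines of the first file that found no partner,
# i.e. files whose contents appear in none of the known versions.
cat v1.sha1sum v2.sha1sum | sort > known.sha1sum
join -v 1 hodgepodge.sha1sum known.sha1sum | awk '{ print $2 }' > unknown.txt

cat report.txt
echo "unknown:"
cat unknown.txt
```

On the demo data this lists a.txt under both v1 and v2, b.txt under v2 only, and c.txt as unknown. A file reported under several versions simply did not change between them, so it is the files with a single match (or none) that narrow things down.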

Also, I couldn't resist the urge and compared coreutils-5.0 from 2003 with coreutils-9.0 from 2021. How many files are still there? 129:

% join coreutils-5.0.sha1sum coreutils-9.0.sha1sum | wc -l
129

% join coreutils-5.0.sha1sum coreutils-9.0.sha1sum | head
02816e1a717f48117844d1636f1d32d9fdf26bae coreutils-5.0/tests/pr/3b3l15-t coreutils-9.0/tests/pr/3b3l15-t
055cf4df93ddb8e073129ce5b98e5ea30a578c47 coreutils-5.0/src/ls-vdir.c coreutils-9.0/src/ls-vdir.c
060df5d68ed7fbd4b1eba577a7e39f2a11212039 coreutils-5.0/man/fold.x coreutils-9.0/man/fold.x
062e8aaba51d25b7fd72c99a9a40e57a64133696 coreutils-5.0/po/boldquot.sed coreutils-9.0/po/boldquot.sed
079d6300eb1959a1f5c9e8bf5d2cc3588f5f3046 coreutils-5.0/man/fmt.x coreutils-9.0/man/fmt.x
08c3d950fc01eed68cb50bfca295a44e2f86da9b coreutils-5.0/src/ls-ls.c coreutils-9.0/src/ls-ls.c
09f378320068b4e88904a70c7ce3dea7b452fe00 coreutils-5.0/tests/pr/n+8l20-FF coreutils-9.0/tests/pr/n+8l20-FF
0a07cd2e895df4bf88942cbc6bcd18bce7777f1c coreutils-5.0/tests/pr/a3l15-t coreutils-9.0/tests/pr/a3l15-t
0b8fae6d285ad97454d2f732b63323d572fcf192 coreutils-5.0/tests/pr/t_tab_ coreutils-9.0/tests/pr/t_tab_
0dbb1a43546d5fd0a01cac86203aca42730b073a coreutils-5.0/man/ptx.x coreutils-9.0/man/ptx.x
...

N.B.: 'join' requires both input files to be sorted on the join field (here, the first column).
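
If you are not sure whether a list is sorted, 'sort -c' can check it before joining. A small sketch with invented three-letter keys instead of real hashes; the file names new.unsorted, new.sorted and old.sorted are hypothetical:

```shell
#!/bin/sh
# Hypothetical two-line lists; the first one is deliberately unsorted.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'bbb  new/x\naaa  new/y\n' > new.unsorted
printf 'aaa  old/y\nbbb  old/x\n' > old.sorted

# sort -c exits 0 only when the input is already in order.
if sort -c -k1,1 new.unsorted 2>/dev/null; then
    sorted=yes
else
    sorted=no
fi
echo "new.unsorted sorted: $sorted"

# Sorting on the first field is enough for join's default key.
sort -k1,1 new.unsorted > new.sorted
matches=$(join new.sorted old.sorted | wc -l)
echo "matches after sorting: $matches"
```

With unsorted input, join may quietly miss matching lines, so a check like this is cheap insurance when the lists come from somewhere else.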

Also, for finding similar files and directories, see my DDFF utility.


