I've been struck by Unix psychosis again! ... trying to find some mp3s for my media player:
% find . -name "*.mp3"
...
./Bluey - Leap Of Faith (2013)/01 - Stronger.mp3
./Bluey - Leap Of Faith (2013)/07 - Keep Myself Together.mp3
./Bluey - Leap Of Faith (2013)/06 - Live Like A Millionaire.mp3
./Bluey - Leap Of Faith (2013)/04 - Take A Chance On Me.mp3
./Bluey - Leap Of Faith (2013)/05 - If You Really Wanna.mp3
./All For Love (2002)/02. Do You.mp3
./All For Love (2002)/09. Why Not.mp3
./All For Love (2002)/08. Just Say It.mp3
...
But the output is too verbose. Let's rephrase: which directories have at least one .mp3 file?
% find . -name "*.mp3" | awk -F/ -vOFS=/ '{NF=NF-1; print $0}' | uniq
...
./George Howard - Collection/1985 - Dancing In The Sun
./George Howard - Collection/1992 - Do I Ever Cross Your Mind
./Angel Eyes (2003)
./Candy Dulfer Saxuality
./Blue Note - Blue Berlin- Blue Note Plays The Music of Irving Berlin
./Charles Johnson Jr.-2017-True Worship
./David Sanborn - 1976-1981 (Japanese Remasters)/1978 - Heart To Heart
./David Sanborn - 1976-1981 (Japanese Remasters)/1979 - Hideaway
...
There are other solutions, but mine is more readable, less cryptic, and easier to understand and memorize.
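One hedged caveat: uniq only collapses adjacent duplicate lines. If find prints some files of a directory, then descends into a subdirectory, then prints the rest, that directory can appear twice; piping through sort -u instead of uniq avoids this. Here is the same question as a minimal Python sketch (the name dirs_with_ext is made up for illustration). It collects directories in a set, so traversal order doesn't matter, and it matches case-sensitively, like find -name:

```python
import os


def dirs_with_ext(root: str, ext: str = ".mp3") -> list[str]:
    """Sorted list of directories under root that directly contain a file ending in ext."""
    found = set()  # a set, so each directory is reported once regardless of walk order
    for dirpath, _dirnames, filenames in os.walk(root):
        if any(name.endswith(ext) for name in filenames):
            found.add(dirpath)
    return sorted(found)


if __name__ == "__main__":
    for d in dirs_with_ext("."):
        print(d)
```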
Unix psychosis again!
What are the most popular artists in my mp3 collection? (This relies on MP3 ID3 v2.3 tags.)
% find -name "*.mp3" -exec id3v2 -R {} \; | grep TPE1 | sort | uniq -c | sort -n -r
434 TPE1: Shakatak
411 TPE1: George Benson
295 TPE1: Candy Dulfer
267 TPE1: Acoustic Alchemy
263 TPE1: Special EFX
161 TPE1: Eric Marienthal
160 TPE1: Christophe Goze
154 TPE1: Courtney Pine
150 TPE1: Fourplay
132 TPE1: Jeffrey Osborne
127 TPE1: Soul Ballet
97 TPE1: Jeff Golub
89 TPE1: Chieli Minucci
86 TPE1: Melvin Sparks
86 TPE1: David Sanborn
68 TPE1: Blue Knights
65 TPE1: Cindy Bradley
59 TPE1: George Howard
54 TPE1: Oli Silk
52 TPE1: Dave Koz
...
For FLACs:
% find -name "*.flac" -exec metaflac --show-tag=Artist {} \; | cut -d '=' -f2 | sort | uniq -c | sort -n -r | head -20
291 Chillhop Guitar
231 Diana Krall
139 Herb Alpert
125 The Crusaders
113 The Cinematic Orchestra
102 Al Jarreau
96 Secret Chiefs 3
90 Brian Culbertson
89 Mogwai
85 Grover Washington, Jr.
81 Henry Cow
68 Bill LaBounty
65 Eric Marienthal
62 George Benson
52 George Duke
52 Brian Simpson
52 Bobby Caldwell
51 Skeleton Crew
50 Herb Alpert & The Tijuana Brass
49 Hypnosonics
A tickler file or 43 Folders System is a collection of date-labeled file folders organized in a way that allows time-sensitive documents to be filed according to the future date on which each document needs action. Documents within the folders of a tickler file can be to-do lists, pending bills, unpaid invoices, travel tickets, hotel reservations, meeting information, birthday reminders, coupons, claim tickets, call-back notes, follow-up reminders, maintenance reminders, or any other papers that require future action. Each day, the folder having the current date is retrieved from the tickler file so that any documents within it may be acted on. Essentially, a tickler file provides a way to send a reminder to oneself in the future—"tickling" one's memory.
...
43 divisions
In larger institutional uses, a tickler file would be chronological, with one section for each year or day, sometimes encompassing more than a century in as much detail as appropriate (especially for dates far in the future). A more common technique is to have index cards with forty-three dividers or a system with forty-three folders or two accordion files. The forty-three divisions come from the sum of two numbers, thirty-one and twelve, corresponding to the maximum thirty-one days in a Gregorian or Julian month and the twelve months in a year.
Using folders, items scheduled for the current month are placed within the appropriate daily folder. Items which need to be done in a future month are placed in the corresponding monthly folder. Every day, the current daily folder is emptied and placed at the back of the set. At the start of a new month, the items for that month are removed from the month folder and placed in the corresponding daily folders.
( src )
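The filing rule of the 43-folder variant fits in a few lines of Python. This is a minimal illustration under my own assumptions: folders are labeled 'day 1'..'day 31' and 'month 1'..'month 12', and documents due a year or more ahead get no special handling (in practice they would have to be put back when their month folder is opened):

```python
import datetime as dt


def tickler_folder(today: dt.date, due: dt.date) -> str:
    """Pick one of the 43 folders (31 daily + 12 monthly) for a document due on `due`."""
    if due < today:
        raise ValueError("due date is already past")
    if (due.year, due.month) == (today.year, today.month):
        # items for the current month go straight into a daily folder
        return f"day {due.day}"
    # items for a future month go into that month's folder; it is emptied
    # into the daily folders when the month starts
    return f"month {due.month}"


print(tickler_folder(dt.date(2024, 6, 10), dt.date(2024, 6, 25)))  # day 25
print(tickler_folder(dt.date(2024, 6, 10), dt.date(2024, 9, 1)))   # month 9
```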
A blog post updated:
I'm a heavy, long-time user of Vim and the *NIX shell, but I still find unexpected features.
While working on my LaTeX book over an SSH shell, I wanted, for some reason, to find all places containing the "\ref{fig..." string and edit them a bit.
% grep -nR "\\\\ref{fig"
latin/MOLS_naive/main_EN.tex:165:[See the problem 15: \ref{fig:knuth_MOLS3_10}.]
puzzles/pipe/main_EN.tex:75:The C program generates ANSI-colored output like it has been showed above (\ref{fig:pipe_shuffled}, \ref{fig:pipe_solved}) plus
solvers/MK85/adder/main.tex:32:Take a look at a full adder: \ref{fig:full_adder}.
solvers/MK85/adder/main.tex:36:\ref{fig:four_bit_adder}.
(The -n option makes grep print line numbers.)
Now redirect grep's output to a text file, open it in Vim, put the cursor on a 'filename:line_number' string and press gF (open the file under the cursor and jump to the line number mentioned). Edit the "\ref{fig..." string, type :bd (close the current buffer), move the cursor down to the next file name, and repeat...
Yes, your <favorite IDE> can do this easily as well. And the Vim GUI can also do this easily (vimgrep). But can you do this over a remote SSH connection, in a console?
Remember the Windows serial number (or 'product key')? 5 groups of 5 characters.
I forgot to mention this as well: how many bits/bytes can be stored in such a 'product key'?
Python 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0] on linux
...
>>> possible_characters=26+10
>>> characters_total=25
>>> possible_characters**characters_total
808281277464764060643139600456536293376
>>> import math
>>> math.log(possible_characters**characters_total,2)
129.24812503605781
>>> (math.log(possible_characters**characters_total,2))/8
16.156015629507227
~129 bits or ~16 bytes.
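A small generalization of the session above, as a sketch. Given an alphabet size and a key length, it returns two numbers: the information content in bits, and the whole number of bits needed to store any key index (the largest index is combinations-1). The function name is mine. The 24-symbol case in the usage line reflects the reduced alphabet commonly reported for Windows keys (no vowels or easily confused characters), which I treat as an assumption:

```python
import math


def key_capacity(alphabet_size: int, length: int) -> tuple[float, int]:
    """Return (information in bits, whole bits needed to store any key)."""
    combinations = alphabet_size ** length
    bits = length * math.log2(alphabet_size)  # equals log2(combinations)
    # keys can be numbered 0 .. combinations-1; this many bits always suffice
    storage_bits = (combinations - 1).bit_length()
    return bits, storage_bits


print(key_capacity(36, 25))  # ~129.25 bits of information, 130 bits to store
print(key_capacity(24, 25))  # assumed 24-symbol alphabet: ~114.6 bits, 115 to store
```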
Is this for real?
# bash VMware-Workstation-17.6.1-24319023.x86_64.bundle
Extracting VMware Installer...done.
Installing VMware Installer 3.1.0
Copying files...
All this is obvious to me and probably to many people, but maybe someone here would like to hear it from me as well.
I have never really used an IDE in my life, even when I used MS Windows heavily.
Briefly, I used the Eclipse IDE when coding for the Altera Nios soft CPU. I also tried MSVS, just to see how it feels, but never liked it.
I have spent most of my computing time in:
Learn (VIM|Emacs) and all the important *NIX utilities, and you'll stop using (any) IDE. In the future, you'll see that this was a very important investment in your professional life.
Also, the author mentioned RCS (Revision Control System), which is very old-school.
It's so old-school that it only works at the file level. But I once found this to be an advantage. If you want to have only two files (the source code plus the repository of its changes), you're the sole maintainer/author, and you don't need to publish your code on fancy syntax-highlighting GitHub, then RCS does the job better than git, svn, etc., simply because of its compactness.
(Lifehack, as they say.)
While staying at random cheap hotels, I've developed one habit: carrying a minijack<->RCA cable to connect the hotel room TV to my mp3 player or Android phone.
[Math] How to encode two values in one.
I once had to do this to encode a coordinate as a single number. (A graph vertex mapped to a coordinate...)
There are several ways to do that.
For example, take a display resolution of 1280x800, so x is in [0, 1280) and y is in [0, 800). The easiest way: encode as y*1280+x. To decode: y=floor(value / 1280), x=value % 1280. In other words, the encoded value is just the pixel's number on the display, in the range [0, 1280*800).
Another way is to encode it as x*1000+y. For example, x=990, y=123:
>>> 990*1000+123
990123
Decoding: x=floor(value/1000), y=value % 1000. (Of course, 1000 must be greater than any possible y, i.e. the display height (800).)
You can clearly see both values in the result. Change the factor 1000 to 0x1000 and both values will be visible in hexadecimal form, which can be handy sometimes.
Not that you should use this instead of a pair/tuple/structure/class. But this could serve as yet another demonstration of modular arithmetic.
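Here is a minimal Python sketch of both schemes, assuming the 1280x800 display from the text (x in [0, 1280), y in [0, 800)). The row-major variant stores the pixel number y*1280+x, and the 'readable' variant stores x*factor+y. divmod() does the floor division and the modulo in one call. The function names are illustrative:

```python
WIDTH, HEIGHT = 1280, 800


def encode_pixel(x: int, y: int) -> int:
    """Row-major pixel number, in the range [0, WIDTH*HEIGHT)."""
    assert 0 <= x < WIDTH and 0 <= y < HEIGHT
    return y * WIDTH + x


def decode_pixel(value: int) -> tuple[int, int]:
    y, x = divmod(value, WIDTH)  # y = value // WIDTH, x = value % WIDTH
    return x, y


def encode_readable(x: int, y: int, factor: int = 1000) -> int:
    """x*factor+y; factor must exceed every possible y (here, HEIGHT)."""
    assert 0 <= y < factor
    return x * factor + y


def decode_readable(value: int, factor: int = 1000) -> tuple[int, int]:
    return divmod(value, factor)  # (x, y)


print(encode_readable(990, 123))              # 990123
print(hex(encode_readable(990, 123, 0x1000)))  # 0x3de07b
```

In the hexadecimal variant, 0x3de is 990 and 0x07b is 123, so both values can be read directly from the output.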
Nice use of boolean algebra in old-school Windows GUI functions.
I cut 3 pages from Charles Petzold's book Programming Windows (1998).
The GNU GMP library (for bignums) has no logarithm functions. What can be used instead is a function that measures the length of a bignum in a specific base.
Let's see:
#include <stdio.h>
#include <gmp.h>

int main(void)
{
    mpz_t x;
    mpz_init(x);
    mpz_set_ui(x, 1);
    /* x = 23456789 * 876543210 * 1234 * 8765 */
    mpz_mul_ui(x, x, 23456789);
    mpz_mul_ui(x, x, 876543210);
    mpz_mul_ui(x, x, 1234);
    mpz_mul_ui(x, x, 8765);
    gmp_printf("x=%Zd\n", x);
    /* mpz_sizeinbase() returns size_t, hence %zu */
    printf("mpz_sizeinbase(x, 2)=%zu\n", mpz_sizeinbase(x, 2));
    printf("mpz_sizeinbase(x, 10)=%zu\n", mpz_sizeinbase(x, 10));
    mpz_clear(x);
    return 0;
}
x=222386782399521958566900
mpz_sizeinbase(x, 2)=78
mpz_sizeinbase(x, 10)=24
We can check this in Python:
>>> import math
>>> x=222386782399521958566900
>>> math.log(x,2)
77.5574172261662
>>> math.log(x,10)
23.347108971302386
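So the output of mpz_sizeinbase() is like ceil(log_b(x)), or (int)log_b(x)+1 if you want (the latter is the exact digit count, which also holds for exact powers of the base). The GMP manual notes that for bases which aren't powers of 2, the result may be 1 too big: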
>>> math.ceil(math.log(x,2))
78
>>> math.ceil(math.log(x,10))
24
Yes, sometimes you don't need a very precise value of the logarithm; the integer part is enough.
Internally, GMP counts the bits of a number and then multiplies that count by a precomputed log(2)/log(b) value (the logb2 field below), where b is the 'destination' base:
/* Compute the number of digits in base for nbits bits, making sure the result
   is never too small. The two variants of the macro implement the same
   function; the GT2 variant below works just for bases > 2. */
#define DIGITS_IN_BASE_FROM_BITS(res, nbits, b)                 \
  do {                                                          \
    mp_limb_t _ph, _dummy;                                      \
    size_t _nbits = (nbits);                                    \
    umul_ppmm (_ph, _dummy, mp_bases[b].logb2, _nbits);         \
    _ph += (_dummy + _nbits < _dummy);                          \
    res = _ph + 1;                                              \
  } while (0)
#define DIGITS_IN_BASEGT2_FROM_BITS(res, nbits, b)              \
  do {                                                          \
    mp_limb_t _ph, _dummy;                                      \
    size_t _nbits = (nbits);                                    \
    umul_ppmm (_ph, _dummy, mp_bases[b].logb2 + 1, _nbits);     \
    res = _ph + 1;                                              \
  } while (0)
( gmp-6.3.0/gmp-impl.h )
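For comparison, here is a Python sketch of the same measurement done exactly, with no floating point: int.bit_length() for base 2 and repeated integer division for other bases. size_in_base() is a name made up here. Unlike mpz_sizeinbase(), it is never 1 too big. It also shows why ceil(log) is only an approximation of the length: for an exact power such as 8 or 1000, ceil(log) gives 3, while the number actually has 4 digits in base 2 or 10 (the digit count is floor(log_b(x))+1):

```python
def size_in_base(x: int, base: int) -> int:
    """Exact number of digits of x > 0 in the given base."""
    if x <= 0 or base < 2:
        raise ValueError("need x > 0 and base >= 2")
    if base == 2:
        return x.bit_length()  # exact, no logarithms involved
    digits = 0
    while x:
        x //= base  # strip one digit per iteration
        digits += 1
    return digits


x = 23456789 * 876543210 * 1234 * 8765
print(x, size_in_base(x, 2), size_in_base(x, 10))  # 222386782399521958566900 78 24
print(size_in_base(1000, 10))                      # 4, while ceil(log10(1000)) is 3
```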
Can a number be reverse engineered? Disassembled? One way to say something about a number's structure is (integer) factoring.
Also, try Wolfram Alpha.
And of course, search it on OEIS.
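The simplest form of factoring is trial division. Below is a minimal Python sketch that works for moderately sized numbers and is hopeless for large semiprimes. For serious work, a library routine such as SymPy's sympy.factorint is the more practical choice:

```python
def factorize(n: int) -> list[int]:
    """Prime factors of n (n >= 2) in ascending order, by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:  # divide out each prime factor completely
            factors.append(d)
            n //= d
        d += 1 if d == 2 else 2  # 2, then odd candidates only
    if n > 1:  # whatever is left has no divisor <= sqrt(n), so it is prime
        factors.append(n)
    return factors


print(factorize(1234 * 8765))  # [2, 5, 617, 1753]
```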
Sometimes you generate a random password and it has two identical characters next to each other. But that password is still random, even though it looks 'less random'.
Mathematically, a '12345' password is a perfectly random pick from the list of all words matching the regexp '[0-9a-zA-Z]{5}'.
A number like 777777 can be a random pick from the numbers in the [0..999999] range.
On the other hand, passwords like '12345' and '777777' are very likely present in every wordlist of every password/hash cracker.
IOW, it's not enough to pick a password randomly.
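Here is a minimal sketch of the 'random, and also not on a list' idea, using Python's secrets module for the random choice. The COMMON set is a hypothetical stand-in for a real cracker wordlist, which would be far larger and loaded from a file. For 12 random characters from 62 symbols, the chance of hitting a listed password is negligible, so the check matters mostly for short passwords:

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # [0-9a-zA-Z], 62 symbols

# Hypothetical stand-in for a real wordlist used by password crackers.
COMMON = {"12345", "123456", "777777", "password", "qwerty"}


def random_password(length: int = 12) -> str:
    """Uniformly random password, re-drawn if it happens to be a known weak one."""
    while True:
        pw = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if pw not in COMMON:
            return pw


print(random_password())
```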
Of course, we ourselves were hardly free of this sort of self-referentiality: WikiLeaks was WL, Julian was J, I was S (for my alias last name, Schmitt), and others on the team were also referred to by individual letters. There was an internal logic to the abbreviations. The more important someone was within WL, the shorter his nickname. If you came across someone with a single initial in the WL chat room, you could be almost certain it was one of the project’s official representatives.
( Daniel Domscheit-Berg -- Inside WikiLeaks )
That reminds me of Huffman coding.
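To make the analogy concrete: Huffman coding gives frequently used symbols shorter codes, much like the most important people got the shortest nicknames. Below is a minimal Python sketch that computes only the code lengths (not the bit strings), using heapq. The frequencies in the usage line are the classic textbook example, not data from the quote:

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs: dict) -> dict:
    """Return {symbol: code length in bits} for a Huffman code built from freqs."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}  # a lone symbol still needs one bit
    tie = count()  # tie-breaker, so heapq never has to compare the dicts
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)  # the two least frequent subtrees...
        f2, _, b = heapq.heappop(heap)
        # ...are merged; every symbol inside them moves one level deeper
        merged = {sym: depth + 1 for sym, depth in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


print(huffman_code_lengths({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))
```

The most frequent symbol 'a' gets a 1-bit code, and the rarest ones, 'e' and 'f', get 4 bits.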
Under heavy attack of Unix psychosis again. A small script to show file extension statistics for a file tree:
find . -print | awk -F . '{print $NF}' | sort | uniq -c | sort -n
In awk -F . '{print $NF}', the dot is the field separator, and print $NF prints the last field.
As of Linux kernel 6.9.7:
201 py
340 gitignore
805 json
879 sh
1332 S
1696 txt
2212 dtsi
2925 dts
3540 rst
4010 yaml
24804 h
33843 c
Very useful for quickly determining a project's contents. It's like the colorful statistics bar on GitHub that shows which programming languages a project uses.
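Here is a Python sketch of the same statistics, with one difference worth noting. The awk version splits the whole path on dots and also sees directories, so a file without an extension, such as ./kernel/Makefile, is reported as '/kernel/Makefile' (everything after the dot of './'). The sketch below (its function name is illustrative) looks only at file names and counts extension-less files under the empty string:

```python
import os
from collections import Counter


def extension_stats(root: str) -> Counter:
    """Count files under root by the text after the last dot of the file name."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            _, dot, ext = name.rpartition(".")
            counts[ext if dot else ""] += 1  # no dot at all -> ""
    return counts


if __name__ == "__main__":
    # ascending by count, like `sort -n`
    for ext, n in sorted(extension_stats(".").items(), key=lambda kv: kv[1]):
        print(n, ext)
```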
Anniversary: the beginners.re domain was registered 10 years ago, on 2014-06-28.
Earliest version from archive.org.
(Actually, this was copy-pasted from the previous home of the RE4B book: yurichev.com/RE-book.html.)
Funny thing: that HTML file has never been rewritten from scratch in the last ~10 years. It was just modified from time to time.
Thank you to everyone who helped, translated, found bugs, and donated!
Most popular file name in a file tree?
In the Linux kernel (as of 6.9.2)?
~/tmp/linux-6.9.2 % find . -type f | sed "s/.*\///" | sort | uniq -c | sort -n | tail -25
51 file.c
52 irq.h
57 README
57 pci.c
57 pipeline.json
60 inode.c
61 trace.c
65 cache.json
65 debugfs.c
66 common.h
66 init.c
68 memory.json
71 core.h
72 config
74 trace.h
75 irq.c
82 setup.c
87 Build
100 main.c
134 Kbuild
134 core.c
284 index.rst
340 .gitignore
1698 Kconfig
2959 Makefile
sed option as recommended here.
Probably a nice interview question: a very popular (and actively used today) programming language with only one data type -- string.
Which is?..
This is a real problem I encountered where I used binary search.
These days, SSH servers use RSA, ECDSA and ED25519 host keys (see the '/etc/ssh' path on your Unix).
But when did Ubuntu Linux start shipping an SSH server with this support? Installing every Ubuntu version is tiresome. How can I find the version I need?
OK, Ubuntu first appeared in ~2004 (ECDSA wasn't used in SSH in the first Ubuntu releases). Today is the year 2024 (ECDSA keys are used in SSH).
N.B. The Ubuntu version number reflects the release year and month (the latest Ubuntu version is 24.04).
Let's pick something in between: version 10.10. Downloaded. Installed. No ECDSA key.
I split the period between 2010 and 2024 in half. Roughly, this is version 18.04. Has an ECDSA key.
Now I split the period between 2010 and 2018. Picking version 14.10. Has an ECDSA key.
I split the period between 2010 and 2014. Picking version 12.04. Has an ECDSA key.
The only version left is 11.x. I pick 11.10; it has an ECDSA key.
So it appeared in Ubuntu version 11.x.
This is manual binary search.
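Here is the same manual procedure written as code: a generic binary search for the first element where a yes/no test flips from False to True. It assumes the test is monotone (once ECDSA keys appear, later versions keep them). For the Ubuntu case, items would be a sorted list of releases and pred a hypothetical check you'd perform by hand (install, look in /etc/ssh). Each probe is one install, so about log2(N) installs are needed:

```python
def first_true(items, pred):
    """Index of the first item for which pred is True.

    Assumes pred is monotone over items (False ... False True ... True).
    Returns len(items) if pred is never True.
    """
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(items[mid]):
            hi = mid      # the answer is at mid or to its left
        else:
            lo = mid + 1  # the answer is strictly to the right
    return lo


print(first_true(list(range(100)), lambda n: n * n >= 2024))  # 45
```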
Previously: Simplest possible intro to finite state machines
Now even simpler: any GUI turns your program/code into an FSM, which switches its state according to user actions (mouse clicks, etc.).
See also in Wikipedia: Inversion of control, Event-driven programming.
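Here is a toy illustration with made-up state and event names: a table-driven FSM that reacts to mouse events the way a GUI widget might. In a real GUI toolkit, the event loop calls your handlers. Here, a plain loop feeds the events in:

```python
# (state, event) -> next state; pairs not in the table leave the state unchanged
TRANSITIONS = {
    ("idle", "mouse_down"): "pressed",
    ("pressed", "mouse_move"): "dragging",
    ("pressed", "mouse_up"): "idle",       # a click
    ("dragging", "mouse_move"): "dragging",
    ("dragging", "mouse_up"): "idle",      # a drop
}


def run(events, state="idle"):
    """Feed events to the FSM and return the list of states it went through."""
    history = [state]
    for event in events:
        state = TRANSITIONS.get((state, event), state)
        history.append(state)
    return history


print(run(["mouse_down", "mouse_move", "mouse_up"]))
```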
Let’s start with the most important part: All Cognac is brandy, but not all brandy is Cognac.
This is a simple layman's explanation of the difference between a superset and a subset: brandy is the superset, and Cognac is a subset of brandy.
Fancy video recorded for the Cracking simple XOR cipher with simulated annealing blog post.
The simplest possible way to find duplicate files. This is a real example: I scanned my JPEGs:
% sha1sum * | sort
10d362df18146931eb19d3439cb87c721c8cf04d picturepicture_150629094012183577199990_43688.jpg
1204fd7db1ddf7731b45f3f619acf6d84ddc8829 picturepicture_150629094554853115199991_23543.jpg
32d1ffcd1d29633169c13439855d8258e9d2777f photo_2024-04-19_07-10-01.jpg
4304f417a725128fae72b03015df32a10376b8c4 picturepicture_15741979933266112278035_32336.jpg
46da40dfa492e39f1abdc513683ba96bbf67eba3 172657250_1839087716258578_3589359660610295879_n.jpg
5458ed63bf77b4c9fd460a1fb3e2a46649b7ce7b picturepicture_150629093292145452199988_65085.jpg
574fc3ec5d6d110a992dd5ca28a3bb2a8201b34e picturepicture_150629095521251013199994_78127.jpg
6d191ccd8059492603295ee4ca01c7e69fe8b626 picturepicture_15062909525317464199993_12081.jpg
a175cdb8d33bbb11669131b6eac8a6b69fa39c24 picturepicture_150629094841634976199992_60003.jpg
a8e9a3b607910749c8e27ef7023dc4920198706d picturepicture_150629093743938254199989_55578.jpg
b94765d3166bcae81840c0ce705d716f37d43242 1008725326.jpg
d2822f3f9a8e42438fa65c2b7208e471edcf7367 1008257482.jpg
d2822f3f9a8e42438fa65c2b7208e471edcf7367 2.jpg
dab04e1dc144caa8c400da470c5b9be138805147 asd.jpg
dab04e1dc144caa8c400da470c5b9be138805147 picturepicture_150629095998351164199995_76371.jpg
dabe5ef01ddf9866a8820ea5e9c965b37899b736 photo_2024-04-01_09-43-56.jpg
e3442a01abe7ba3baf9b0c49cc005fd29d244ff8 picturepicture_158161769749639660285488_79164.jpg
Again, using 'uniq -c' to find duplicates and sort by number of occurrences:
% sha1sum * | cut -d ' ' -f1 | sort | uniq -c | sort -n
1 10d362df18146931eb19d3439cb87c721c8cf04d
1 1204fd7db1ddf7731b45f3f619acf6d84ddc8829
1 32d1ffcd1d29633169c13439855d8258e9d2777f
1 4304f417a725128fae72b03015df32a10376b8c4
1 46da40dfa492e39f1abdc513683ba96bbf67eba3
1 5458ed63bf77b4c9fd460a1fb3e2a46649b7ce7b
1 574fc3ec5d6d110a992dd5ca28a3bb2a8201b34e
1 6d191ccd8059492603295ee4ca01c7e69fe8b626
1 a175cdb8d33bbb11669131b6eac8a6b69fa39c24
1 a8e9a3b607910749c8e27ef7023dc4920198706d
1 b94765d3166bcae81840c0ce705d716f37d43242
1 dabe5ef01ddf9866a8820ea5e9c965b37899b736
1 e3442a01abe7ba3baf9b0c49cc005fd29d244ff8
2 d2822f3f9a8e42438fa65c2b7208e471edcf7367
2 dab04e1dc144caa8c400da470c5b9be138805147
Or awk:
% sha1sum * | sort | awk '{a[$1]++;}END{for(key in a){if (a[key]>=2) {print key, a[key]}}}'
d2822f3f9a8e42438fa65c2b7208e471edcf7367 2
dab04e1dc144caa8c400da470c5b9be138805147 2
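Here is the same idea in Python, as a minimal sketch. It hashes every file with SHA-1 (reading in chunks, so large files don't have to fit in memory) and keeps only the digests shared by two or more files. As with sha1sum, equal hashes mean 'almost certainly identical'; a byte-by-byte comparison would make it certain:

```python
import hashlib
from collections import defaultdict


def find_duplicates(paths):
    """Return {sha1_hexdigest: [paths...]} for digests shared by 2+ files."""
    by_hash = defaultdict(list)
    for path in paths:
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):  # 64 KiB at a time
                h.update(chunk)
        by_hash[h.hexdigest()].append(path)
    return {digest: files for digest, files in by_hash.items() if len(files) >= 2}


if __name__ == "__main__":
    import glob
    for digest, files in find_duplicates(glob.glob("*.jpg")).items():
        print(digest, len(files), *files)
```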
BTW, another prominent example of floating point number encoding is the color code for resistors.
2-3 color bands encode the decimal digits of the fraction/mantissa, and one color band encodes the multiplier, which acts as a (decimal) exponent, i.e. a power of ten.
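Here is a minimal decoder sketch for the digit and multiplier bands (the tolerance band is ignored). It uses Decimal so that the gold and silver multipliers (10^-1 and 10^-2) give exact values like 4.7 instead of binary floating point approximations:

```python
from decimal import Decimal

DIGITS = {"black": 0, "brown": 1, "red": 2, "orange": 3, "yellow": 4,
          "green": 5, "blue": 6, "violet": 7, "grey": 8, "white": 9}
# the multiplier band is a power of ten; gold and silver are negative exponents
EXPONENT = {**DIGITS, "gold": -1, "silver": -2}


def resistance(bands):
    """Ohms from 2-3 digit bands followed by one multiplier band."""
    *digit_bands, multiplier = bands
    significand = 0
    for color in digit_bands:
        significand = significand * 10 + DIGITS[color]
    # significand * 10**exponent, kept exact
    return Decimal(significand).scaleb(EXPONENT[multiplier])


print(resistance(["yellow", "violet", "red"]))   # 4.7E+3, i.e. 4700 ohms
print(resistance(["yellow", "violet", "gold"]))  # 4.7
```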
A couple of blog posts updated:
[RevEng] Learn reverse engineering: but where to start?
[Crypto][Python] D.Bleichenbacher attack on RSA PKCS#1, part IV
[Unix] Logging your work using suckless dwm
Yet again, about using the 'uniq -c' command.
Finding the web pages that are most often requested but not found (404):
$ cat access.log.* | grep " 404 " | cut -d ' ' -f 7 | sort | uniq -c | sort -n
...
709 /.well-known/security.txt
722 /.well-known/traffic-advice
864 /.env
997 /sitemap.xml
1311 //robots.txt
1622 /apple-touch-icon-precomposed.png
2603 /xmlrpc.php
5105 /wp-login.php
16092 /ads.txt
55807 /robots.txt
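One caveat: grep " 404 " matches a 404 anywhere on the line, including a response size of exactly 404 bytes. The following Python sketch checks the status field itself. It assumes the Apache/nginx common or combined log format, where splitting on spaces puts the path in field 7 (index 6) and the status in field 9 (index 8), the same field numbering the cut command relies on:

```python
from collections import Counter


def not_found_counts(lines):
    """Count request paths whose status field is exactly 404."""
    counts = Counter()
    for line in lines:
        fields = line.split(" ")
        # 'IP - - [date zone] "GET /path HTTP/1.1" status size ...'
        if len(fields) > 8 and fields[8] == "404":
            counts[fields[6]] += 1
    return counts


if __name__ == "__main__":
    import glob
    total = Counter()
    for name in glob.glob("access.log.*"):
        with open(name, errors="replace") as f:
            total += not_found_counts(f)
    for path, n in sorted(total.items(), key=lambda kv: kv[1]):
        print(n, path)
```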
Maybe not very funny, but a meta-diff is diffing two diff files, using the 'diff' *nix utility again.
(I had to do this today.)
Formally speaking, this is a transitive relation:
Introducer status is transferable; that is, an introducers’ introducer will become your introducer as well. (syncthing manual)
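Working out all the consequences of such a relation means computing its transitive closure. The small Python sketch below uses made-up device names. Given each device's direct introducers, it finds everyone reachable through introducer-of-introducer chains, with one depth-first search per device, which also copes with cycles:

```python
def transitive_closure(direct: dict) -> dict:
    """direct: {device: set of its direct introducers}.

    Returns {device: all introducers reachable via chains of introducers}.
    """
    closure = {}
    for start in direct:
        seen = set()
        stack = list(direct.get(start, ()))
        while stack:
            dev = stack.pop()
            if dev not in seen:  # the seen set guards against cycles
                seen.add(dev)
                stack.extend(direct.get(dev, ()))
        closure[start] = seen
    return closure


print(transitive_closure({"me": {"A"}, "A": {"B"}, "B": {"C"}})["me"])
```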
And again, I've been attacked by heavy Unix psychosis.
Getting all mp3 files from the In Our Time podcast:
curl http://podcasts.files.bbci.co.uk/b006qykl.rss | xmlstarlet format --indent-tab | grep -oE '(http|https)://(.*).mp3' | sort | uniq | wget -nc -i -
xmlstarlet to tidy XML output (must be installed).
grep -o to extract only URLs with the mp3 extension
wget -nc -i - to download URLs from stdin, but skip existing files (XML output may contain duplicates)
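Without the external tools, the URL extraction can be sketched in Python using only the standard library. This assumes the feed lists episodes as standard RSS 2.0 <enclosure url=...> elements (I haven't checked the BBC feed's exact structure). Fetching the feed is left out; it could be done with urllib.request.urlopen(...).read(). Duplicates are dropped while keeping feed order:

```python
import xml.etree.ElementTree as ET


def mp3_urls(rss_text: str) -> list[str]:
    """Unique enclosure URLs ending in .mp3, in the order they appear in the feed."""
    root = ET.fromstring(rss_text)
    urls = (enc.get("url", "") for enc in root.iter("enclosure"))
    # dict.fromkeys() removes duplicates and preserves insertion order
    return list(dict.fromkeys(u for u in urls if u.endswith(".mp3")))
```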
In the same manner, get ~5 recent BBC news podcasts (only head -20 added and RSS URL changed):
curl https://podcasts.files.bbci.co.uk/p02nq0gn.rss | xmlstarlet format --indent-tab | grep -oE '(http|https)://(.*).mp3' | head -20 | wget -nc -i -
head -20, again, because of MP3 URL duplicates.
Getting Luke’s ENGLISH Podcast, but only for the year 2024.
wget -m https://teacherluke.co.uk/
cd teacherluke.co.uk/2024
grep -R https | grep mp3 | grep -oE '(http|https)://(.*).mp3' | grep -v paypal | grep "^https://open.acast.com" > URLs
The URLs file needs a bit of cleaning, and then:
wget -i URLs