Short notes


20241205 23:07:43 CET

I've been struck by Unix psychosis again! ... trying to find some mp3s for my media player:

% find . -name "*.mp3"
...
./Bluey - Leap Of Faith (2013)/01 - Stronger.mp3
./Bluey - Leap Of Faith (2013)/07 - Keep Myself Together.mp3
./Bluey - Leap Of Faith (2013)/06 - Live Like A Millionaire.mp3
./Bluey - Leap Of Faith (2013)/04 - Take A Chance On Me.mp3
./Bluey - Leap Of Faith (2013)/05 - If You Really Wanna.mp3
./All For Love (2002)/02.  Do You.mp3
./All For Love (2002)/09.  Why Not.mp3
./All For Love (2002)/08.  Just Say It.mp3
...

But the output is too verbose. Let's rephrase: which directories have at least one .mp3 file?

% find . -name "*.mp3" | awk -F/ -vOFS=/ '{NF=NF-1; print $0}' | uniq
...
./George Howard - Collection/1985 - Dancing In The Sun
./George Howard - Collection/1992 - Do I Ever Cross Your Mind
./Angel Eyes (2003)
./Candy Dulfer Saxuality
./Blue Note - Blue Berlin- Blue Note Plays The Music of Irving Berlin
./Charles Johnson Jr.-2017-True Worship
./David Sanborn - 1976-1981 (Japanese Remasters)/1978 - Heart To Heart
./David Sanborn - 1976-1981 (Japanese Remasters)/1979 - Hideaway
...

Other solutions. But mine is more readable and less cryptic. Easier to understand and to memorize.


20241203 17:32:25 CET

Unix psychosis again!

What are most popular artists in my mp3 collection? (Relies on MP3 ID3 v2.3 tags.)

 % find -name "*.mp3" -exec id3v2 -R {} \; | grep TPE1 | sort | uniq -c | sort -n -r
    434 TPE1: Shakatak
    411 TPE1: George Benson
    295 TPE1: Candy Dulfer
    267 TPE1: Acoustic Alchemy
    263 TPE1: Special EFX
    161 TPE1: Eric Marienthal
    160 TPE1: Christophe Goze
    154 TPE1: Courtney Pine
    150 TPE1: Fourplay
    132 TPE1: Jeffrey Osborne
    127 TPE1: Soul Ballet
     97 TPE1: Jeff Golub
     89 TPE1: Chieli Minucci
     86 TPE1: Melvin Sparks
     86 TPE1: David Sanborn
     68 TPE1: Blue Knights
     65 TPE1: Cindy Bradley
     59 TPE1: George Howard
     54 TPE1: Oli Silk
     52 TPE1: Dave Koz
...

For FLACs:

 % find -name "*.flac" -exec metaflac --show-tag=Artist {} \;  | cut -d '=' -f2 | sort | uniq -c | sort -n -r | head -20
    291 Chillhop Guitar
    231 Diana Krall
    139 Herb Alpert
    125 The Crusaders
    113 The Cinematic Orchestra
    102 Al Jarreau
     96 Secret Chiefs 3
     90 Brian Culbertson
     89 Mogwai
     85 Grover Washington, Jr.
     81 Henry Cow
     68 Bill LaBounty
     65 Eric Marienthal
     62 George Benson
     52 George Duke
     52 Brian Simpson
     52 Bobby Caldwell
     51 Skeleton Crew
     50 Herb Alpert & The Tijuana Brass
     49 Hypnosonics

20241114 14:14:23 CET

This is also logarithmic scale:
A tickler file or 43 Folders System is a collection of date-labeled file folders organized in a way that allows time-sensitive documents to be filed according to the future date on which each document needs action. Documents within the folders of a tickler file can be to-do lists, pending bills, unpaid invoices, travel tickets, hotel reservations, meeting information, birthday reminders, coupons, claim tickets, call-back notes, follow-up reminders, maintenance reminders, or any other papers that require future action. Each day, the folder having the current date is retrieved from the tickler file so that any documents within it may be acted on. Essentially, a tickler file provides a way to send a reminder to oneself in the future—"tickling" one's memory.

...

43 divisions

In larger institutional uses, a tickler file would be chronological, with one section for each year or day, sometimes encompassing more than a century in as much detail as appropriate (especially
for dates far in the future). A more common technique is to have index cards with forty-three dividers or a system with forty-three folders or two accordion files. The forty-three divisions come from the sum of two numbers, thirty-one and twelve, corresponding to the maximum thirty-one days in a Gregorian or Julian month and the twelve months in a year.

Using folders, items scheduled for the current month are placed within the appropriate daily folder. Items which need to be done in a future month are placed in the corresponding monthly folder. Every day, the current daily folder is emptied and placed at the back of the set. At the start of a new month, the items for that month are removed from the month folder and placed in the corresponding daily folders.

( src )


20241028 17:46:29 CET

A blog post updated:

[Unix] Maildir, mutt and vim


20241024 15:21:35 CEST

I'm a heavy long time user of Vim and *NIX shell, but still find unexpected features.

Working via SSH shell on my book in LaTeX form, for some reason, I wanted to find all places with the "\ref{fig..." string in it and edit a bit.

% grep -nR "\\\\ref{fig"
latin/MOLS_naive/main_EN.tex:165:[See the problem 15: \ref{fig:knuth_MOLS3_10}.]
puzzles/pipe/main_EN.tex:75:The C program generates ANSI-colored output like it has been showed above (\ref{fig:pipe_shuffled}, \ref{fig:pipe_solved}) plus
solvers/MK85/adder/main.tex:32:Take a look at a full adder: \ref{fig:full_adder}.
solvers/MK85/adder/main.tex:36:\ref{fig:four_bit_adder}.

(-n grep option is for adding line number.)

Now pipe grep's output to a text file, open it in Vim, point the cursor to the 'filename:line_number' string and press gF (open the file under cursor and point to line number mentioned). Edit the "\ref{fig..." string, press bd (close current buffer), press cursor down to point at the next file name, repeat...

Yes, your <favorite IDE> can do this as well, easily. And Vim GUI also can do this easily (vimgrep). But can you do this via remote SSH connection, in console?


20241021 15:26:27 CEST

Remember serial number for Windows (or 'product key'). 5 groups of 5 characters.

I've forgot to mention also this. How many bits/bytes can be stored in such a 'product key'?

Python 3.10.12 (main, Sep 11 2024, 15:47:36) [GCC 11.4.0] on linux
...
>>> possible_characters=26+10
>>> characters_total=25
>>> possible_characters**characters_total
808281277464764060643139600456536293376
>>> import math
>>> math.log(possible_characters**characters_total,2)
129.24812503605781
>>> (math.log(possible_characters**characters_total,2))/8
16.156015629507227

~129 bits or ~16 bytes.


20241013 18:44:55 CEST

Is this for real?

 # bash VMware-Workstation-17.6.1-24319023.x86_64.bundle
Extracting VMware Installer...done.
Installing VMware Installer 3.1.0
    Copying files...

20240927 16:10:02 CEST

Unix as IDE.

All this is so obvious for me and probably for many people, but maybe someone here will want to hear that from me as well.

I never used an IDE in all my life. Even when I used MS Windows heavily.

Briefly, I used Eclipse IDE for coding for Altera Nios soft CPU. And also used MSVS, just to see how it feels, but never liked it.

Most of my computing time I spent in:

Learn (VIM|Emacs) and all important *NIX utils and you'll stop using (any) IDE. You'll see in future that this was a very important investment in your professional life.

Also, the author mentioned RCS (Revision Control System), which is very old-school.

It's so old-school that it only work on file-level. But this is an advantage I once found. If you want to have only two files (source code + repository of changes) and you're the sole maintainer/author and you don't need to publish your code on fancy syntax-highlighting github -- RCS do the job better than git, svn, etc, just because of the compactness.


20240806 21:33:00 CEST

(Lifehack, as they say.)

While staying at random and cheap hotels, one habit I develop -- carry a minijack<->RCA cable to connect TV in hotel room to my mp3-player or android.


20240722 04:06:57 CEST

[Math] How to encode two values in one.

I had to do this once, to encode a coordinate in just a number. (Graph vertex mapped to coordinate...)

There are several ways to do that.

For example, display's resolution: 1280x800. Easiest way to do this: x*1280+y to encode. To decode: x=floor(value / 1280), y=value % 1280. In other words, encoded value is just a number of pixel on display in range [0, 1280*800).

Other way is encoding it as: x*1000+y. For example, x=990, y=123:

>>> 990*1000+123
990123

Decoding: x=floor(value/1000), y=value % 1000. (Of course, 1000 must be greater than width of display (800).)

In that value you clearly see 2 values in it. Change 1000 factor to 0x1000 and these values will be seen in hexadecimal form, that can be handy sometimes.

Not that you should use this instead of pair/tuple/structure/class. But this could server as yet another demonstration of modulo arithmetics.


20240720 02:16:52 CEST

Nice use of boolean algebra in old-school Windows GUI functions.

I cut 3 pages from the Charles Petzold's Programming Windows book (1998).


20240718 02:39:39 CEST

GNU GMP library (for bignums) doesn't have logarithmic functions. But what can be used instead is a function that measures a length of a bignum for a specific base.

Let's see:

#include <stdio.h>
#include <gmp.h>

void main()
{
        mpz_t x;
        mpz_init(x);
        mpz_set_ui(x, 1);
        mpz_mul_ui(x, x, 23456789);
        mpz_mul_ui(x, x, 876543210);
        mpz_mul_ui(x, x, 1234);
        mpz_mul_ui(x, x, 8765);

        gmp_printf("x=%Zd\n", x);
        gmp_printf("mpz_sizeinbase(x, 2)=%d\n", mpz_sizeinbase(x, 2));
        gmp_printf("mpz_sizeinbase(x, 10)=%d\n", mpz_sizeinbase(x, 10));

        mpz_clear(x);
};
x=222386782399521958566900
mpz_sizeinbase(x, 2)=78
mpz_sizeinbase(x, 10)=24

We can check this in Python:

>>> import math

>>> x=222386782399521958566900

>>> math.log(x,2)
77.5574172261662

>>> math.log(x,10)
23.347108971302386

So the output of the mpz_sizeinbase() is like ceil(log(x)) or (int)log(x) if you want:

>>> math.ceil(math.log(x,2))
78

>>> math.ceil(math.log(x,10))
24

Yes, sometimes you don't need very precise value of logarithmic function. Integer part would be enough.

Internally, GMP calculates bits of a number and then multiplies that value by log(2)/log(b) precomputed value, where b is 'destination' base:

/* Compute the number of digits in base for nbits bits, making sure the result
   is never too small.  The two variants of the macro implement the same
   function; the GT2 variant below works just for bases > 2.  */
#define DIGITS_IN_BASE_FROM_BITS(res, nbits, b)                         \
  do {                                                                  \
    mp_limb_t _ph, _dummy;                                              \
    size_t _nbits = (nbits);                                            \
    umul_ppmm (_ph, _dummy, mp_bases[b].logb2, _nbits);                 \
    _ph += (_dummy + _nbits < _dummy);                                  \
    res = _ph + 1;                                                      \
  } while (0)
#define DIGITS_IN_BASEGT2_FROM_BITS(res, nbits, b)                      \
  do {                                                                  \
    mp_limb_t _ph, _dummy;                                              \
    size_t _nbits = (nbits);                                            \
    umul_ppmm (_ph, _dummy, mp_bases[b].logb2 + 1, _nbits);             \
    res = _ph + 1;                                                      \
  } while (0)

( gmp-6.3.0/gmp-impl.h )


20240717 11:38:41 CEST

Can a number be reverse engineered? Disassembled? One method to say something about number's structure is (integer) factoring.

Also, try Wolfram Alpha.

And of course, search it on OEIS.


20240709 00:02:06 CEST

Sometimes you generate a random password, and it has two same characters in it, next to each other. But that password is still random, despite the fact it looks 'less random'.

Mathematically, a '12345' password is a perfectly random pick from a list of words under regexp '[0-9][a-z][A-Z]{5}'.

A number like 777777 can be a random pick from a list of numbers in [0..999999] range.

On the other hand, passwords like '12345' and '777777' may be in all password lists of all password/hash crackers, yes. Highly probable.

IOW, it's not enough to pick a password randomly.


20240707 17:43:29 CEST

Of course, we ourselves were hardly free of this sort of self-referentiality:
WikiLeaks was WL, Julian was J, I was S (for my alias last name, Schmitt), and
others on the team were also referred to by individual letters. There was an
internal logic to the abbreviations. The more important someone was within WL,
the shorter his nickname. If you came across someone with a single initial in
the WL chat room, you could be almost certain it was one of the project’s
official representatives.

( Daniel Domscheit-Berg -- Inside WikiLeaks )

That reminds us Huffman coding.


20240704 23:40:39 CEST

Under heavy attack of Unix psychosis again. A small script to show file extension statistics in file tree.

find . -print | awk -F . '{print $NF}' | sort | uniq -c | sort -n

awk -F . '{print $NF}' - dot is a separator and print $NF prints last field.

As of Linux kernel 6.9.7:

    201 py
    340 gitignore
    805 json
    879 sh
   1332 S
   1696 txt
   2212 dtsi
   2925 dts
   3540 rst
   4010 yaml
  24804 h
  33843 c

Very useful to quickly determine project's contents. It's like a colorful statistics bar at github that shows which programming languages were used in project.


20240627 13:51:28 CEST

Anniversary: the beginners.re domain has been registered 10 years ago: 2014-06-28.

Earliest version from archive.org.

(Actually, this was copypasted from the previous home of the RE4B book: yurichev.com/RE-book.html.)

Funny thing, that HTML file was never rewritten from scratch for the last ~10 years. It was just modified from time to time.

Thank you everyone who helped, translated, found bugs, donated!


20240529 22:42:47 CEST

Most popular filename in filetree?

In Linux kernel? (As of 6.9.2?)

~/tmp/linux-6.9.2 % find . -type f | sed "s/.*\///" | sort | uniq -c | sort -n | tail -25
  51 file.c
  52 irq.h
  57 README
  57 pci.c
  57 pipeline.json
  60 inode.c
  61 trace.c
  65 cache.json
  65 debugfs.c
  66 common.h
  66 init.c
  68 memory.json
  71 core.h
  72 config
  74 trace.h
  75 irq.c
  82 setup.c
  87 Build
 100 main.c
 134 Kbuild
 134 core.c
 284 index.rst
 340 .gitignore
1698 Kconfig
2959 Makefile

sed option as recommended here.


20240527 12:46:49 CEST

Probably a nice interview question: a very popular (and actively used today) programming language with only one data type -- string.

Which is?..


20240523 23:07:01 EEST

This is a real problem I encountered where I used binary search.

These days, SSH servers use RSA, ECDSA and ED25519 host keys (see the '/etc/ssh' path on your Unix).

But when Ubuntu Linux included a SSH server with this support? Installing each Ubuntu version is tiresome. How can I find the version I need?

OK, Ubuntu first appeared in ~2004 (ECDSA wasn't used in SSH in Ubuntu releases). Today is the year 2024 (ECDSA keys are used in SSH).

N.B. Ubuntu release version is usually reflect release year (latest Ubuntu version is 24.04).

Let's find something in between - version 10.10. Downloaded. Installed. No ECDSA key.

I divide the period between year 2010 and 2024. Roughly, this is version 18.04. Has ECDSA key.

Now I divide the period between year 2010 and 2018. Picking version 14.10. Has ECDSA key.

I divide the period between 2010 and 2014. Picking version 12.04. Has ECDSA key.

The version left is 11.x. I pick 11.10, it has ECDSA key.

So it appeard in Ubuntu version 11.x.

This is manual binary search.


20240521 17:01:04 EEST

Previously: Simplest possible intro to finite state machines

Now even simpler. Any GUI makes your program/code FSM. Which switches its state according to user actions -- mouse clicks, etc.

See also in Wikipedia: Inversion of control, Event-driven programming.


20240514 20:22:12 EEST

Let’s start with the most important part: All Cognac is brandy, but not all brandy is Cognac.

This is a simple layman's explanation about difference between superset and subset. Brandy is superset, Cognac is subset of brandy.


20240510 17:24:04 EEST

Fancy video recorded for the Cracking simple XOR cipher with simulated annealing blog post.


20240510 16:17:35 EEST

Simplest possible way to find file duplicates. This is a real example: I scanned my JPEGs:

 % sha1sum * | sort
10d362df18146931eb19d3439cb87c721c8cf04d  picturepicture_150629094012183577199990_43688.jpg
1204fd7db1ddf7731b45f3f619acf6d84ddc8829  picturepicture_150629094554853115199991_23543.jpg
32d1ffcd1d29633169c13439855d8258e9d2777f  photo_2024-04-19_07-10-01.jpg
4304f417a725128fae72b03015df32a10376b8c4  picturepicture_15741979933266112278035_32336.jpg
46da40dfa492e39f1abdc513683ba96bbf67eba3  172657250_1839087716258578_3589359660610295879_n.jpg
5458ed63bf77b4c9fd460a1fb3e2a46649b7ce7b  picturepicture_150629093292145452199988_65085.jpg
574fc3ec5d6d110a992dd5ca28a3bb2a8201b34e  picturepicture_150629095521251013199994_78127.jpg
6d191ccd8059492603295ee4ca01c7e69fe8b626  picturepicture_15062909525317464199993_12081.jpg
a175cdb8d33bbb11669131b6eac8a6b69fa39c24  picturepicture_150629094841634976199992_60003.jpg
a8e9a3b607910749c8e27ef7023dc4920198706d  picturepicture_150629093743938254199989_55578.jpg
b94765d3166bcae81840c0ce705d716f37d43242  1008725326.jpg
d2822f3f9a8e42438fa65c2b7208e471edcf7367  1008257482.jpg
d2822f3f9a8e42438fa65c2b7208e471edcf7367  2.jpg
dab04e1dc144caa8c400da470c5b9be138805147  asd.jpg
dab04e1dc144caa8c400da470c5b9be138805147  picturepicture_150629095998351164199995_76371.jpg
dabe5ef01ddf9866a8820ea5e9c965b37899b736  photo_2024-04-01_09-43-56.jpg
e3442a01abe7ba3baf9b0c49cc005fd29d244ff8  picturepicture_158161769749639660285488_79164.jpg

Again, using 'uniq -c' to find duplicates and sort by number of occurrences:

 % sha1sum * | cut -d ' ' -f1 | sort | uniq -c | sort -n
      1 10d362df18146931eb19d3439cb87c721c8cf04d
      1 1204fd7db1ddf7731b45f3f619acf6d84ddc8829
      1 32d1ffcd1d29633169c13439855d8258e9d2777f
      1 4304f417a725128fae72b03015df32a10376b8c4
      1 46da40dfa492e39f1abdc513683ba96bbf67eba3
      1 5458ed63bf77b4c9fd460a1fb3e2a46649b7ce7b
      1 574fc3ec5d6d110a992dd5ca28a3bb2a8201b34e
      1 6d191ccd8059492603295ee4ca01c7e69fe8b626
      1 a175cdb8d33bbb11669131b6eac8a6b69fa39c24
      1 a8e9a3b607910749c8e27ef7023dc4920198706d
      1 b94765d3166bcae81840c0ce705d716f37d43242
      1 dabe5ef01ddf9866a8820ea5e9c965b37899b736
      1 e3442a01abe7ba3baf9b0c49cc005fd29d244ff8
      2 d2822f3f9a8e42438fa65c2b7208e471edcf7367
      2 dab04e1dc144caa8c400da470c5b9be138805147

Or awk:

 % sha1sum * | sort | awk '{a[$1]++;}END{for(key in a){if (a[key]>=2) {print key, a[key]}}}'
d2822f3f9a8e42438fa65c2b7208e471edcf7367 2
dab04e1dc144caa8c400da470c5b9be138805147 2

More advanced examples: 1, 2.


20240510 16:06:43 EEST

BTW, another prominent example of floating point number encoding is color encoding for resistors.

1, 2.

2-3 color bands or decimal numbers of fraction/mantissa and one color band for multiplier (acts as (binary) exponent).


20240510 15:43:48 EEST

Couple of blog posts updated:

[RevEng] Learn reverse engineering: but where to start?

[Crypto][Python] D.Bleichenbacher attack on RSA PKCS#1, part IV

[Unix] Logging your work using suckless dwm

[Unix] Maildir, mutt and vim


20240510 15:28:56 EEST

Yet again, about using the 'uniq -c' command.

(Previously: 1, 2, 3.)

Finding most web pages that are requested, but not found (404):

$ cat access.log.* | grep " 404 " | cut -d ' ' -f 7 | sort | uniq -c | sort -n
...
    709 /.well-known/security.txt
    722 /.well-known/traffic-advice
    864 /.env
    997 /sitemap.xml
   1311 //robots.txt
   1622 /apple-touch-icon-precomposed.png
   2603 /xmlrpc.php
   5105 /wp-login.php
  16092 /ads.txt
  55807 /robots.txt

20240507 15:42:31 EEST

Maybe not very funny, but meta-diff is diffing two diff files using 'diff' *nix utility again.

(I had to do this today.)


20240430 12:14:23 EEST

Formally speaking, this is transitive relation:

Introducer status is transferable; that is, an introducers’ introducer will become your introducer as well. (syncthing manual)


20240422 07:15:31 EEST

And again, I've been attacked by heavy Unix psychosis.

In Our Time podcast

Getting all mp3 files from In Our Time podcast

curl http://podcasts.files.bbci.co.uk/b006qykl.rss | xmlstarlet format --indent-tab | grep -oE '(http|https)://(.*).mp3' | sort | uniq | wget -nc -i -

xmlstarlet to tidy XML output (must be installed).

grep -o to extract only URL with mp3 extension

wget -nc -i - to download URLs from stdin, but skip existing files (XML output may contain duplicates)

BBC news podcast

In the same manner, get ~5 recent BBC news podcasts (only head -20 added and RSS URL changed):

curl https://podcasts.files.bbci.co.uk/p02nq0gn.rss | xmlstarlet format --indent-tab | grep -oE '(http|https)://(.*).mp3' | head -20 | wget -nc -i -

head -20, again, because of MP3 URL duplicates.

Luke’s ENGLISH Podcast

Getting Luke’s ENGLISH Podcast, but only for the year 2024.

wget -m https://teacherluke.co.uk/
cd teacherluke.co.uk/2024
grep -R https | grep mp3 | grep -oE '(http|https)://(.*).mp3' | grep -v paypal | grep "^https://open.acast.com" > URLs

URLs file needs a bit cleaning, and then:

wget -i URLs

→ [back to the main page]