## Short notes

### 20240806 21:33:00 CEST

(Lifehack, as they say.)

While staying at random and cheap hotels, one habit I develop -- carry a minijack<->RCA cable to connect TV in hotel room to my mp3-player or android.

### 20240722 04:06:57 CEST

[Math] How to encode two values in one.

I had to do this once, to encode a coordinate in just a number. (Graph vertex mapped to coordinate...)

There are several ways to do that.

For example, display's resolution: 1280x800. Easiest way to do this: x*1280+y to encode. To decode: x=floor(value / 1280), y=value % 1280. In other words, encoded value is just a number of pixel on display in range [0, 1280*800).

Other way is encoding it as: x*1000+y. For example, x=990, y=123:

>>> 990*1000+123
990123


Decoding: x=floor(value/1000), y=value % 1000. (Of course, 1000 must be greater than width of display (800).)

In that value you clearly see 2 values in it. Change 1000 factor to 0x1000 and these values will be seen in hexadecimal form, that can be handy sometimes.

Not that you should use this instead of pair/tuple/structure/class. But this could server as yet another demonstration of modulo arithmetics.

### 20240720 02:16:52 CEST

Nice use of boolean algebra in old-school Windows GUI functions.

I cut 3 pages from the Charles Petzold's Programming Windows book (1998).

### 20240718 02:39:39 CEST

GNU GMP library (for bignums) doesn't have logarithmic functions. But what can be used instead is a function that measures a length of a bignum for a specific base.

Let's see:

#include <stdio.h>
#include <gmp.h>

void main()
{
mpz_t x;
mpz_init(x);
mpz_set_ui(x, 1);
mpz_mul_ui(x, x, 23456789);
mpz_mul_ui(x, x, 876543210);
mpz_mul_ui(x, x, 1234);
mpz_mul_ui(x, x, 8765);

gmp_printf("x=%Zd\n", x);
gmp_printf("mpz_sizeinbase(x, 2)=%d\n", mpz_sizeinbase(x, 2));
gmp_printf("mpz_sizeinbase(x, 10)=%d\n", mpz_sizeinbase(x, 10));

mpz_clear(x);
};

x=222386782399521958566900
mpz_sizeinbase(x, 2)=78
mpz_sizeinbase(x, 10)=24


We can check this in Python:

>>> import math

>>> x=222386782399521958566900

>>> math.log(x,2)
77.5574172261662

>>> math.log(x,10)
23.347108971302386


So the output of the mpz_sizeinbase() is like ceil(log(x)) or (int)log(x) if you want:

>>> math.ceil(math.log(x,2))
78

>>> math.ceil(math.log(x,10))
24


Yes, sometimes you don't need very precise value of logarithmic function. Integer part would be enough.

Internally, GMP calculates bits of a number and then multiplies that value by log(2)/log(b) precomputed value, where b is 'destination' base:

/* Compute the number of digits in base for nbits bits, making sure the result
is never too small.  The two variants of the macro implement the same
function; the GT2 variant below works just for bases > 2.  */
#define DIGITS_IN_BASE_FROM_BITS(res, nbits, b)                         \
do {                                                                  \
mp_limb_t _ph, _dummy;                                              \
size_t _nbits = (nbits);                                            \
umul_ppmm (_ph, _dummy, mp_bases[b].logb2, _nbits);                 \
_ph += (_dummy + _nbits < _dummy);                                  \
res = _ph + 1;                                                      \
} while (0)
#define DIGITS_IN_BASEGT2_FROM_BITS(res, nbits, b)                      \
do {                                                                  \
mp_limb_t _ph, _dummy;                                              \
size_t _nbits = (nbits);                                            \
umul_ppmm (_ph, _dummy, mp_bases[b].logb2 + 1, _nbits);             \
res = _ph + 1;                                                      \
} while (0)


( gmp-6.3.0/gmp-impl.h )

### 20240717 11:38:41 CEST

Can a number be reverse engineered? Disassembled? One method to say something about number's structure is (integer) factoring.

Also, try Wolfram Alpha.

And of course, search it on OEIS.

### 20240709 00:02:06 CEST

Sometimes you generate a random password, and it has two same characters in it, next to each other. But that password is still random, despite the fact it looks 'less random'.

Mathematically, a '12345' password is a perfectly random pick from a list of words under regexp '[0-9][a-z][A-Z]{5}'.

A number like 777777 can be a random pick from a list of numbers in [0..999999] range.

On the other hand, passwords like '12345' and '777777' may be in all password lists of all password/hash crackers, yes. Highly probable.

IOW, it's not enough to pick a password randomly.

### 20240707 17:43:29 CEST

Of course, we ourselves were hardly free of this sort of self-referentiality:
WikiLeaks was WL, Julian was J, I was S (for my alias last name, Schmitt), and
others on the team were also referred to by individual letters. There was an
internal logic to the abbreviations. The more important someone was within WL,
the shorter his nickname. If you came across someone with a single initial in
the WL chat room, you could be almost certain it was one of the project’s
official representatives.


( Daniel Domscheit-Berg -- Inside WikiLeaks )

That reminds us Huffman coding.

### 20240704 23:40:39 CEST

Under heavy attack of Unix psychosis again. A small script to show file extension statistics in file tree.

find . -print | awk -F . '{print $NF}' | sort | uniq -c | sort -n awk -F . '{print$NF}' - dot is a separator and print $NF prints last field. As of Linux kernel 6.9.7:  201 py 340 gitignore 805 json 879 sh 1332 S 1696 txt 2212 dtsi 2925 dts 3540 rst 4010 yaml 24804 h 33843 c  Very useful to quickly determine project's contents. It's like a colorful statistics bar at github that shows which programming languages were used in project. ### 20240627 13:51:28 CEST Anniversary: the beginners.re domain has been registered 10 years ago: 2014-06-28. Earliest version from archive.org. (Actually, this was copypasted from the previous home of the RE4B book: yurichev.com/RE-book.html.) Funny thing, that HTML file was never rewritten from scratch for the last ~10 years. It was just modified from time to time. Thank you everyone who helped, translated, found bugs, donated! ### 20240529 22:42:47 CEST Most popular filename in filetree? In Linux kernel? (As of 6.9.2?) ~/tmp/linux-6.9.2 % find . -type f | sed "s/.*\///" | sort | uniq -c | sort -n | tail -25 51 file.c 52 irq.h 57 README 57 pci.c 57 pipeline.json 60 inode.c 61 trace.c 65 cache.json 65 debugfs.c 66 common.h 66 init.c 68 memory.json 71 core.h 72 config 74 trace.h 75 irq.c 82 setup.c 87 Build 100 main.c 134 Kbuild 134 core.c 284 index.rst 340 .gitignore 1698 Kconfig 2959 Makefile  sed option as recommended here. ### 20240527 12:46:49 CEST Probably a nice interview question: a very popular (and actively used today) programming language with only one data type -- string. Which is?.. ### 20240523 23:07:01 EEST This is a real problem I encountered where I used binary search. These days, SSH servers use RSA, ECDSA and ED25519 host keys (see the '/etc/ssh' path on your Unix). But when Ubuntu Linux included a SSH server with this support? Installing each Ubuntu version is tiresome. How can I find the version I need? OK, Ubuntu first appeared in ~2004 (ECDSA wasn't used in SSH in Ubuntu releases). Today is the year 2024 (ECDSA keys are used in SSH). N.B. Ubuntu release version is usually reflect release year (latest Ubuntu version is 24.04). Let's find something in between - version 10.10. Downloaded. Installed. No ECDSA key. I divide the period between year 2010 and 2024. Roughly, this is version 18.04. Has ECDSA key. Now I divide the period between year 2010 and 2018. Picking version 14.10. Has ECDSA key. I divide the period between 2010 and 2014. Picking version 12.04. Has ECDSA key. The version left is 11.x. I pick 11.10, it has ECDSA key. So it appeard in Ubuntu version 11.x. This is manual binary search. ### 20240521 17:01:04 EEST Now even simpler. Any GUI makes your program/code FSM. Which switches its state according to user actions -- mouse clicks, etc. See also in Wikipedia: Inversion of control, Event-driven programming. ### 20240514 20:22:12 EEST This is a simple layman's explanation about difference between superset and subset. Brandy is superset, Cognac is subset of brandy. ### 20240510 17:24:04 EEST Fancy video recorded for the Cracking simple XOR cipher with simulated annealing blog post. ### 20240510 16:17:35 EEST Simplest possible way to find file duplicates. This is a real example: I scanned my JPEGs:  % sha1sum * | sort 10d362df18146931eb19d3439cb87c721c8cf04d picturepicture_150629094012183577199990_43688.jpg 1204fd7db1ddf7731b45f3f619acf6d84ddc8829 picturepicture_150629094554853115199991_23543.jpg 32d1ffcd1d29633169c13439855d8258e9d2777f photo_2024-04-19_07-10-01.jpg 4304f417a725128fae72b03015df32a10376b8c4 picturepicture_15741979933266112278035_32336.jpg 46da40dfa492e39f1abdc513683ba96bbf67eba3 172657250_1839087716258578_3589359660610295879_n.jpg 5458ed63bf77b4c9fd460a1fb3e2a46649b7ce7b picturepicture_150629093292145452199988_65085.jpg 574fc3ec5d6d110a992dd5ca28a3bb2a8201b34e picturepicture_150629095521251013199994_78127.jpg 6d191ccd8059492603295ee4ca01c7e69fe8b626 picturepicture_15062909525317464199993_12081.jpg a175cdb8d33bbb11669131b6eac8a6b69fa39c24 picturepicture_150629094841634976199992_60003.jpg a8e9a3b607910749c8e27ef7023dc4920198706d picturepicture_150629093743938254199989_55578.jpg b94765d3166bcae81840c0ce705d716f37d43242 1008725326.jpg d2822f3f9a8e42438fa65c2b7208e471edcf7367 1008257482.jpg d2822f3f9a8e42438fa65c2b7208e471edcf7367 2.jpg dab04e1dc144caa8c400da470c5b9be138805147 asd.jpg dab04e1dc144caa8c400da470c5b9be138805147 picturepicture_150629095998351164199995_76371.jpg dabe5ef01ddf9866a8820ea5e9c965b37899b736 photo_2024-04-01_09-43-56.jpg e3442a01abe7ba3baf9b0c49cc005fd29d244ff8 picturepicture_158161769749639660285488_79164.jpg  Again, using 'uniq -c' to find duplicates and sort by number of occurrences:  % sha1sum * | cut -d ' ' -f1 | sort | uniq -c | sort -n 1 10d362df18146931eb19d3439cb87c721c8cf04d 1 1204fd7db1ddf7731b45f3f619acf6d84ddc8829 1 32d1ffcd1d29633169c13439855d8258e9d2777f 1 4304f417a725128fae72b03015df32a10376b8c4 1 46da40dfa492e39f1abdc513683ba96bbf67eba3 1 5458ed63bf77b4c9fd460a1fb3e2a46649b7ce7b 1 574fc3ec5d6d110a992dd5ca28a3bb2a8201b34e 1 6d191ccd8059492603295ee4ca01c7e69fe8b626 1 a175cdb8d33bbb11669131b6eac8a6b69fa39c24 1 a8e9a3b607910749c8e27ef7023dc4920198706d 1 b94765d3166bcae81840c0ce705d716f37d43242 1 dabe5ef01ddf9866a8820ea5e9c965b37899b736 1 e3442a01abe7ba3baf9b0c49cc005fd29d244ff8 2 d2822f3f9a8e42438fa65c2b7208e471edcf7367 2 dab04e1dc144caa8c400da470c5b9be138805147  Or awk:  % sha1sum * | sort | awk '{a[$1]++;}END{for(key in a){if (a[key]>=2) {print key, a[key]}}}'
d2822f3f9a8e42438fa65c2b7208e471edcf7367 2
dab04e1dc144caa8c400da470c5b9be138805147 2


### 20240510 16:06:43 EEST

BTW, another prominent example of floating point number encoding is color encoding for resistors.

1, 2.

2-3 color bands or decimal numbers of fraction/mantissa and one color band for multiplier (acts as (binary) exponent).

### 20240510 15:43:48 EEST

Couple of blog posts updated:

### 20240510 15:28:56 EEST

Yet again, about using the 'uniq -c' command.

(Previously: 1, 2, 3.)

\$ cat access.log.* | grep " 404 " | cut -d ' ' -f 7 | sort | uniq -c | sort -n
...
709 /.well-known/security.txt
864 /.env
997 /sitemap.xml
1311 //robots.txt
1622 /apple-touch-icon-precomposed.png
2603 /xmlrpc.php
55807 /robots.txt


### 20240507 15:42:31 EEST

Maybe not very funny, but meta-diff is diffing two diff files using 'diff' *nix utility again.

(I had to do this today.)

### 20240430 12:14:23 EEST

Formally speaking, this is transitive relation:

### 20240422 07:15:31 EEST

And again, I've been attacked by heavy Unix psychosis.

#### In Our Time podcast

Getting all mp3 files from In Our Time podcast

curl http://podcasts.files.bbci.co.uk/b006qykl.rss | xmlstarlet format --indent-tab | grep -oE '(http|https)://(.*).mp3' | sort | uniq | wget -nc -i -

xmlstarlet to tidy XML output (must be installed).

grep -o to extract only URL with mp3 extension

wget -nc -i - to download URLs from stdin, but skip existing files (XML output may contain duplicates)

#### BBC news podcast

In the same manner, get ~5 recent BBC news podcasts (only head -20 added and RSS URL changed):

curl https://podcasts.files.bbci.co.uk/p02nq0gn.rss | xmlstarlet format --indent-tab | grep -oE '(http|https)://(.*).mp3' | head -20 | wget -nc -i -

head -20, again, because of MP3 URL duplicates.

#### Luke’s ENGLISH Podcast

Getting Luke’s ENGLISH Podcast, but only for the year 2024.

wget -m https://teacherluke.co.uk/
cd teacherluke.co.uk/2024
grep -R https | grep mp3 | grep -oE '(http|https)://(.*).mp3' | grep -v paypal | grep "^https://open.acast.com" > URLs


URLs file needs a bit cleaning, and then:

wget -i URLs