Short notes


20240722 04:06:57 CEST

[Math] How to encode two values in one.

I had to do this once, to encode a coordinate in just a number. (Graph vertex mapped to coordinate...)

There are several ways to do that.

For example, display's resolution: 1280x800. Easiest way to do this: x*1280+y to encode. To decode: x=floor(value / 1280), y=value % 1280. In other words, encoded value is just a number of pixel on display in range [0, 1280*800).

Other way is encoding it as: x*1000+y. For example, x=990, y=123:

>>> 990*1000+123
990123

Decoding: x=floor(value/1000), y=value % 1000. (Of course, 1000 must be greater than width of display (800).)

In that value you clearly see 2 values in it. Change 1000 factor to 0x1000 and these values will be seen in hexadecimal form, that can be handy sometimes.

Not that you should use this instead of pair/tuple/structure/class. But this could server as yet another demonstration of modulo arithmetics.


20240720 02:16:52 CEST

Nice use of boolean algebra in old-school Windows GUI functions.

I cut 3 pages from the Charles Petzold's Programming Windows book (1998).


20240718 02:39:39 CEST

GNU GMP library (for bignums) doesn't have logarithmic functions. But what can be used instead is a function that measures a length of a bignum for a specific base.

Let's see:

#include <stdio.h>
#include <gmp.h>

void main()
{
        mpz_t x;
        mpz_init(x);
        mpz_set_ui(x, 1);
        mpz_mul_ui(x, x, 23456789);
        mpz_mul_ui(x, x, 876543210);
        mpz_mul_ui(x, x, 1234);
        mpz_mul_ui(x, x, 8765);

        gmp_printf("x=%Zd\n", x);
        gmp_printf("mpz_sizeinbase(x, 2)=%d\n", mpz_sizeinbase(x, 2));
        gmp_printf("mpz_sizeinbase(x, 10)=%d\n", mpz_sizeinbase(x, 10));

        mpz_clear(x);
};
x=222386782399521958566900
mpz_sizeinbase(x, 2)=78
mpz_sizeinbase(x, 10)=24

We can check this in Python:

>>> import math

>>> x=222386782399521958566900

>>> math.log(x,2)
77.5574172261662

>>> math.log(x,10)
23.347108971302386

So the output of the mpz_sizeinbase() is like ceil(log(x)) or (int)log(x) if you want:

>>> math.ceil(math.log(x,2))
78

>>> math.ceil(math.log(x,10))
24

Yes, sometimes you don't need very precise value of logarithmic function. Integer part would be enough.

Internally, GMP calculates bits of a number and then multiplies that value by log(2)/log(b) precomputed value, where b is 'destination' base:

/* Compute the number of digits in base for nbits bits, making sure the result
   is never too small.  The two variants of the macro implement the same
   function; the GT2 variant below works just for bases > 2.  */
#define DIGITS_IN_BASE_FROM_BITS(res, nbits, b)                         \
  do {                                                                  \
    mp_limb_t _ph, _dummy;                                              \
    size_t _nbits = (nbits);                                            \
    umul_ppmm (_ph, _dummy, mp_bases[b].logb2, _nbits);                 \
    _ph += (_dummy + _nbits < _dummy);                                  \
    res = _ph + 1;                                                      \
  } while (0)
#define DIGITS_IN_BASEGT2_FROM_BITS(res, nbits, b)                      \
  do {                                                                  \
    mp_limb_t _ph, _dummy;                                              \
    size_t _nbits = (nbits);                                            \
    umul_ppmm (_ph, _dummy, mp_bases[b].logb2 + 1, _nbits);             \
    res = _ph + 1;                                                      \
  } while (0)

( gmp-6.3.0/gmp-impl.h )


20240717 11:38:41 CEST

Can a number be reverse engineered? Disassembled? One method to say something about number's structure is (integer) factoring.

Also, try Wolfram Alpha.

And of course, search it on OEIS.


20240709 00:02:06 CEST

Sometimes you generate a random password, and it has two same characters in it, next to each other. But that password is still random, despite the fact it looks 'less random'.

Mathematically, a '12345' password is a perfectly random pick from a list of words under regexp '[0-9][a-z][A-Z]{5}'.

A number like 777777 can be a random pick from a list of numbers in [0..999999] range.

On the other hand, passwords like '12345' and '777777' may be in all password lists of all password/hash crackers, yes. Highly probable.

IOW, it's not enough to pick a password randomly.


20240707 17:43:29 CEST

Of course, we ourselves were hardly free of this sort of self-referentiality:
WikiLeaks was WL, Julian was J, I was S (for my alias last name, Schmitt), and
others on the team were also referred to by individual letters. There was an
internal logic to the abbreviations. The more important someone was within WL,
the shorter his nickname. If you came across someone with a single initial in
the WL chat room, you could be almost certain it was one of the project’s
official representatives.

( Daniel Domscheit-Berg -- Inside WikiLeaks )

That reminds us Huffman coding.


20240704 23:40:39 CEST

Under heavy attack of Unix psychosis again. A small script to show file extension statistics in file tree.

find . -print | awk -F . '{print $NF}' | sort | uniq -c | sort -n

awk -F . '{print $NF}' - dot is a separator and print $NF prints last field.

As of Linux kernel 6.9.7:

    201 py
    340 gitignore
    805 json
    879 sh
   1332 S
   1696 txt
   2212 dtsi
   2925 dts
   3540 rst
   4010 yaml
  24804 h
  33843 c

Very useful to quickly determine project's contents. It's like a colorful statistics bar at github that shows which programming languages were used in project.


20240627 13:51:28 CEST

Anniversary: the beginners.re domain has been registered 10 years ago: 2014-06-28.

Earliest version from archive.org.

(Actually, this was copypasted from the previous home of the RE4B book: yurichev.com/RE-book.html.)

Funny thing, that HTML file was never rewritten from scratch for the last ~10 years. It was just modified from time to time.

Thank you everyone who helped, translated, found bugs, donated!


20240529 22:42:47 CEST

Most popular filename in filetree?

In Linux kernel? (As of 6.9.2?)

~/tmp/linux-6.9.2 % find . -type f | sed "s/.*\///" | sort | uniq -c | sort -n | tail -25
  51 file.c
  52 irq.h
  57 README
  57 pci.c
  57 pipeline.json
  60 inode.c
  61 trace.c
  65 cache.json
  65 debugfs.c
  66 common.h
  66 init.c
  68 memory.json
  71 core.h
  72 config
  74 trace.h
  75 irq.c
  82 setup.c
  87 Build
 100 main.c
 134 Kbuild
 134 core.c
 284 index.rst
 340 .gitignore
1698 Kconfig
2959 Makefile

sed option as recommended here.


20240527 12:46:49 CEST

Probably a nice interview question: a very popular (and actively used today) programming language with only one data type -- string.

Which is?..


20240523 23:07:01 EEST

This is a real problem I encountered where I used binary search.

These days, SSH servers use RSA, ECDSA and ED25519 host keys (see the '/etc/ssh' path on your Unix).

But when Ubuntu Linux included a SSH server with this support? Installing each Ubuntu version is tiresome. How can I find the version I need?

OK, Ubuntu first appeared in ~2004 (ECDSA wasn't used in SSH in Ubuntu releases). Today is the year 2024 (ECDSA keys are used in SSH).

N.B. Ubuntu release version is usually reflect release year (latest Ubuntu version is 24.04).

Let's find something in between - version 10.10. Downloaded. Installed. No ECDSA key.

I divide the period between year 2010 and 2024. Roughly, this is version 18.04. Has ECDSA key.

Now I divide the period between year 2010 and 2018. Picking version 14.10. Has ECDSA key.

I divide the period between 2010 and 2014. Picking version 12.04. Has ECDSA key.

The version left is 11.x. I pick 11.10, it has ECDSA key.

So it appeard in Ubuntu version 11.x.

This is manual binary search.


20240521 17:01:04 EEST

Previously: Simplest possible intro to finite state machines

Now even simpler. Any GUI makes your program/code FSM. Which switches its state according to user actions -- mouse clicks, etc.

See also in Wikipedia: Inversion of control, Event-driven programming.


20240514 20:22:12 EEST

Let’s start with the most important part: All Cognac is brandy, but not all brandy is Cognac.

This is a simple layman's explanation about difference between superset and subset. Brandy is superset, Cognac is subset of brandy.


20240510 17:24:04 EEST

Fancy video recorded for the Cracking simple XOR cipher with simulated annealing blog post.


20240510 16:17:35 EEST

Simplest possible way to find file duplicates. This is a real example: I scanned my JPEGs:

 % sha1sum * | sort
10d362df18146931eb19d3439cb87c721c8cf04d  picturepicture_150629094012183577199990_43688.jpg
1204fd7db1ddf7731b45f3f619acf6d84ddc8829  picturepicture_150629094554853115199991_23543.jpg
32d1ffcd1d29633169c13439855d8258e9d2777f  photo_2024-04-19_07-10-01.jpg
4304f417a725128fae72b03015df32a10376b8c4  picturepicture_15741979933266112278035_32336.jpg
46da40dfa492e39f1abdc513683ba96bbf67eba3  172657250_1839087716258578_3589359660610295879_n.jpg
5458ed63bf77b4c9fd460a1fb3e2a46649b7ce7b  picturepicture_150629093292145452199988_65085.jpg
574fc3ec5d6d110a992dd5ca28a3bb2a8201b34e  picturepicture_150629095521251013199994_78127.jpg
6d191ccd8059492603295ee4ca01c7e69fe8b626  picturepicture_15062909525317464199993_12081.jpg
a175cdb8d33bbb11669131b6eac8a6b69fa39c24  picturepicture_150629094841634976199992_60003.jpg
a8e9a3b607910749c8e27ef7023dc4920198706d  picturepicture_150629093743938254199989_55578.jpg
b94765d3166bcae81840c0ce705d716f37d43242  1008725326.jpg
d2822f3f9a8e42438fa65c2b7208e471edcf7367  1008257482.jpg
d2822f3f9a8e42438fa65c2b7208e471edcf7367  2.jpg
dab04e1dc144caa8c400da470c5b9be138805147  asd.jpg
dab04e1dc144caa8c400da470c5b9be138805147  picturepicture_150629095998351164199995_76371.jpg
dabe5ef01ddf9866a8820ea5e9c965b37899b736  photo_2024-04-01_09-43-56.jpg
e3442a01abe7ba3baf9b0c49cc005fd29d244ff8  picturepicture_158161769749639660285488_79164.jpg

Again, using 'uniq -c' to find duplicates and sort by number of occurrences:

 % sha1sum * | cut -d ' ' -f1 | sort | uniq -c | sort -n
      1 10d362df18146931eb19d3439cb87c721c8cf04d
      1 1204fd7db1ddf7731b45f3f619acf6d84ddc8829
      1 32d1ffcd1d29633169c13439855d8258e9d2777f
      1 4304f417a725128fae72b03015df32a10376b8c4
      1 46da40dfa492e39f1abdc513683ba96bbf67eba3
      1 5458ed63bf77b4c9fd460a1fb3e2a46649b7ce7b
      1 574fc3ec5d6d110a992dd5ca28a3bb2a8201b34e
      1 6d191ccd8059492603295ee4ca01c7e69fe8b626
      1 a175cdb8d33bbb11669131b6eac8a6b69fa39c24
      1 a8e9a3b607910749c8e27ef7023dc4920198706d
      1 b94765d3166bcae81840c0ce705d716f37d43242
      1 dabe5ef01ddf9866a8820ea5e9c965b37899b736
      1 e3442a01abe7ba3baf9b0c49cc005fd29d244ff8
      2 d2822f3f9a8e42438fa65c2b7208e471edcf7367
      2 dab04e1dc144caa8c400da470c5b9be138805147

Or awk:

 % sha1sum * | sort | awk '{a[$1]++;}END{for(key in a){if (a[key]>=2) {print key, a[key]}}}'
d2822f3f9a8e42438fa65c2b7208e471edcf7367 2
dab04e1dc144caa8c400da470c5b9be138805147 2

More advanced examples: 1, 2.


20240510 16:06:43 EEST

BTW, another prominent example of floating point number encoding is color encoding for resistors.

1, 2.

2-3 color bands or decimal numbers of fraction/mantissa and one color band for multiplier (acts as (binary) exponent).


20240510 15:43:48 EEST

Couple of blog posts updated:

[RevEng] Learn reverse engineering: but where to start?

[Crypto][Python] D.Bleichenbacher attack on RSA PKCS#1, part IV

[Unix] Logging your work using suckless dwm

[Unix] Maildir, mutt and vim


20240510 15:28:56 EEST

Yet again, about using the 'uniq -c' command.

(Previously: 1, 2, 3.)

Finding most web pages that are requested, but not found (404):

$ cat access.log.* | grep " 404 " | cut -d ' ' -f 7 | sort | uniq -c | sort -n
...
    709 /.well-known/security.txt
    722 /.well-known/traffic-advice
    864 /.env
    997 /sitemap.xml
   1311 //robots.txt
   1622 /apple-touch-icon-precomposed.png
   2603 /xmlrpc.php
   5105 /wp-login.php
  16092 /ads.txt
  55807 /robots.txt

20240507 15:42:31 EEST

Maybe not very funny, but meta-diff is diffing two diff files using 'diff' *nix utility again.

(I had to do this today.)


20240430 12:14:23 EEST

Formally speaking, this is transitive relation:

Introducer status is transferable; that is, an introducers’ introducer will become your introducer as well. (syncthing manual)


20240422 07:15:31 EEST

And again, I've been attacked by heavy Unix psychosis.

In Our Time podcast

Getting all mp3 files from In Our Time podcast

curl http://podcasts.files.bbci.co.uk/b006qykl.rss | xmlstarlet format --indent-tab | grep -oE '(http|https)://(.*).mp3' | sort | uniq | wget -nc -i -

xmlstarlet to tidy XML output (must be installed).

grep -o to extract only URL with mp3 extension

wget -nc -i - to download URLs from stdin, but skip existing files (XML output may contain duplicates)

BBC news podcast

In the same manner, get ~5 recent BBC news podcasts (only head -20 added and RSS URL changed):

curl https://podcasts.files.bbci.co.uk/p02nq0gn.rss | xmlstarlet format --indent-tab | grep -oE '(http|https)://(.*).mp3' | head -20 | wget -nc -i -

head -20, again, because of MP3 URL duplicates.

Luke’s ENGLISH Podcast

Getting Luke’s ENGLISH Podcast, but only for the year 2024.

wget -m https://teacherluke.co.uk/
cd teacherluke.co.uk/2024
grep -R https | grep mp3 | grep -oE '(http|https)://(.*).mp3' | grep -v paypal | grep "^https://open.acast.com" > URLs

URLs file needs a bit cleaning, and then:

wget -i URLs

→ [back to the main page]