(Lifehack, as they say.)
While staying at random and cheap hotels, one habit I develop -- carry a minijack<->RCA cable to connect TV in hotel room to my mp3-player or android.
[Math] How to encode two values in one.
I had to do this once, to encode a coordinate in just a number. (Graph vertex mapped to coordinate...)
There are several ways to do that.
For example, display's resolution: 1280x800. Easiest way to do this: x*1280+y to encode. To decode: x=floor(value / 1280), y=value % 1280. In other words, encoded value is just a number of pixel on display in range [0, 1280*800).
Other way is encoding it as: x*1000+y. For example, x=990, y=123:
>>> 990*1000+123 990123
Decoding: x=floor(value/1000), y=value % 1000. (Of course, 1000 must be greater than width of display (800).)
In that value you clearly see 2 values in it. Change 1000 factor to 0x1000 and these values will be seen in hexadecimal form, that can be handy sometimes.
Not that you should use this instead of pair/tuple/structure/class. But this could server as yet another demonstration of modulo arithmetics.
Nice use of boolean algebra in old-school Windows GUI functions.
I cut 3 pages from the Charles Petzold's Programming Windows book (1998).
GNU GMP library (for bignums) doesn't have logarithmic functions. But what can be used instead is a function that measures a length of a bignum for a specific base.
Let's see:
#include <stdio.h> #include <gmp.h> void main() { mpz_t x; mpz_init(x); mpz_set_ui(x, 1); mpz_mul_ui(x, x, 23456789); mpz_mul_ui(x, x, 876543210); mpz_mul_ui(x, x, 1234); mpz_mul_ui(x, x, 8765); gmp_printf("x=%Zd\n", x); gmp_printf("mpz_sizeinbase(x, 2)=%d\n", mpz_sizeinbase(x, 2)); gmp_printf("mpz_sizeinbase(x, 10)=%d\n", mpz_sizeinbase(x, 10)); mpz_clear(x); };
x=222386782399521958566900 mpz_sizeinbase(x, 2)=78 mpz_sizeinbase(x, 10)=24
We can check this in Python:
>>> import math >>> x=222386782399521958566900 >>> math.log(x,2) 77.5574172261662 >>> math.log(x,10) 23.347108971302386
So the output of the mpz_sizeinbase() is like ceil(log(x)) or (int)log(x) if you want:
>>> math.ceil(math.log(x,2)) 78 >>> math.ceil(math.log(x,10)) 24
Yes, sometimes you don't need very precise value of logarithmic function. Integer part would be enough.
Internally, GMP calculates bits of a number and then multiplies that value by log(2)/log(b) precomputed value, where b is 'destination' base:
/* Compute the number of digits in base for nbits bits, making sure the result is never too small. The two variants of the macro implement the same function; the GT2 variant below works just for bases > 2. */ #define DIGITS_IN_BASE_FROM_BITS(res, nbits, b) \ do { \ mp_limb_t _ph, _dummy; \ size_t _nbits = (nbits); \ umul_ppmm (_ph, _dummy, mp_bases[b].logb2, _nbits); \ _ph += (_dummy + _nbits < _dummy); \ res = _ph + 1; \ } while (0) #define DIGITS_IN_BASEGT2_FROM_BITS(res, nbits, b) \ do { \ mp_limb_t _ph, _dummy; \ size_t _nbits = (nbits); \ umul_ppmm (_ph, _dummy, mp_bases[b].logb2 + 1, _nbits); \ res = _ph + 1; \ } while (0)
( gmp-6.3.0/gmp-impl.h )
Can a number be reverse engineered? Disassembled? One method to say something about number's structure is (integer) factoring.
Also, try Wolfram Alpha.
And of course, search it on OEIS.
Sometimes you generate a random password, and it has two same characters in it, next to each other. But that password is still random, despite the fact it looks 'less random'.
Mathematically, a '12345' password is a perfectly random pick from a list of words under regexp '[0-9][a-z][A-Z]{5}'.
A number like 777777 can be a random pick from a list of numbers in [0..999999] range.
On the other hand, passwords like '12345' and '777777' may be in all password lists of all password/hash crackers, yes. Highly probable.
IOW, it's not enough to pick a password randomly.
Of course, we ourselves were hardly free of this sort of self-referentiality: WikiLeaks was WL, Julian was J, I was S (for my alias last name, Schmitt), and others on the team were also referred to by individual letters. There was an internal logic to the abbreviations. The more important someone was within WL, the shorter his nickname. If you came across someone with a single initial in the WL chat room, you could be almost certain it was one of the project’s official representatives.
( Daniel Domscheit-Berg -- Inside WikiLeaks )
That reminds us Huffman coding.
Under heavy attack of Unix psychosis again. A small script to show file extension statistics in file tree.
find . -print | awk -F . '{print $NF}' | sort | uniq -c | sort -n
awk -F . '{print $NF}' - dot is a separator and print $NF prints last field.
As of Linux kernel 6.9.7:
201 py 340 gitignore 805 json 879 sh 1332 S 1696 txt 2212 dtsi 2925 dts 3540 rst 4010 yaml 24804 h 33843 c
Very useful to quickly determine project's contents. It's like a colorful statistics bar at github that shows which programming languages were used in project.
Anniversary: the beginners.re domain has been registered 10 years ago: 2014-06-28.
Earliest version from archive.org.
(Actually, this was copypasted from the previous home of the RE4B book: yurichev.com/RE-book.html.)
Funny thing, that HTML file was never rewritten from scratch for the last ~10 years. It was just modified from time to time.
Thank you everyone who helped, translated, found bugs, donated!
Most popular filename in filetree?
In Linux kernel? (As of 6.9.2?)
~/tmp/linux-6.9.2 % find . -type f | sed "s/.*\///" | sort | uniq -c | sort -n | tail -25 51 file.c 52 irq.h 57 README 57 pci.c 57 pipeline.json 60 inode.c 61 trace.c 65 cache.json 65 debugfs.c 66 common.h 66 init.c 68 memory.json 71 core.h 72 config 74 trace.h 75 irq.c 82 setup.c 87 Build 100 main.c 134 Kbuild 134 core.c 284 index.rst 340 .gitignore 1698 Kconfig 2959 Makefile
sed option as recommended here.
Probably a nice interview question: a very popular (and actively used today) programming language with only one data type -- string.
Which is?..
This is a real problem I encountered where I used binary search.
These days, SSH servers use RSA, ECDSA and ED25519 host keys (see the '/etc/ssh' path on your Unix).
But when Ubuntu Linux included a SSH server with this support? Installing each Ubuntu version is tiresome. How can I find the version I need?
OK, Ubuntu first appeared in ~2004 (ECDSA wasn't used in SSH in Ubuntu releases). Today is the year 2024 (ECDSA keys are used in SSH).
N.B. Ubuntu release version is usually reflect release year (latest Ubuntu version is 24.04).
Let's find something in between - version 10.10. Downloaded. Installed. No ECDSA key.
I divide the period between year 2010 and 2024. Roughly, this is version 18.04. Has ECDSA key.
Now I divide the period between year 2010 and 2018. Picking version 14.10. Has ECDSA key.
I divide the period between 2010 and 2014. Picking version 12.04. Has ECDSA key.
The version left is 11.x. I pick 11.10, it has ECDSA key.
So it appeard in Ubuntu version 11.x.
This is manual binary search.
Previously: Simplest possible intro to finite state machines
Now even simpler. Any GUI makes your program/code FSM. Which switches its state according to user actions -- mouse clicks, etc.
See also in Wikipedia: Inversion of control, Event-driven programming.
Let’s start with the most important part: All Cognac is brandy, but not all brandy is Cognac.
This is a simple layman's explanation about difference between superset and subset. Brandy is superset, Cognac is subset of brandy.
Fancy video recorded for the Cracking simple XOR cipher with simulated annealing blog post.
Simplest possible way to find file duplicates. This is a real example: I scanned my JPEGs:
% sha1sum * | sort 10d362df18146931eb19d3439cb87c721c8cf04d picturepicture_150629094012183577199990_43688.jpg 1204fd7db1ddf7731b45f3f619acf6d84ddc8829 picturepicture_150629094554853115199991_23543.jpg 32d1ffcd1d29633169c13439855d8258e9d2777f photo_2024-04-19_07-10-01.jpg 4304f417a725128fae72b03015df32a10376b8c4 picturepicture_15741979933266112278035_32336.jpg 46da40dfa492e39f1abdc513683ba96bbf67eba3 172657250_1839087716258578_3589359660610295879_n.jpg 5458ed63bf77b4c9fd460a1fb3e2a46649b7ce7b picturepicture_150629093292145452199988_65085.jpg 574fc3ec5d6d110a992dd5ca28a3bb2a8201b34e picturepicture_150629095521251013199994_78127.jpg 6d191ccd8059492603295ee4ca01c7e69fe8b626 picturepicture_15062909525317464199993_12081.jpg a175cdb8d33bbb11669131b6eac8a6b69fa39c24 picturepicture_150629094841634976199992_60003.jpg a8e9a3b607910749c8e27ef7023dc4920198706d picturepicture_150629093743938254199989_55578.jpg b94765d3166bcae81840c0ce705d716f37d43242 1008725326.jpg d2822f3f9a8e42438fa65c2b7208e471edcf7367 1008257482.jpg d2822f3f9a8e42438fa65c2b7208e471edcf7367 2.jpg dab04e1dc144caa8c400da470c5b9be138805147 asd.jpg dab04e1dc144caa8c400da470c5b9be138805147 picturepicture_150629095998351164199995_76371.jpg dabe5ef01ddf9866a8820ea5e9c965b37899b736 photo_2024-04-01_09-43-56.jpg e3442a01abe7ba3baf9b0c49cc005fd29d244ff8 picturepicture_158161769749639660285488_79164.jpg
Again, using 'uniq -c' to find duplicates and sort by number of occurrences:
% sha1sum * | cut -d ' ' -f1 | sort | uniq -c | sort -n 1 10d362df18146931eb19d3439cb87c721c8cf04d 1 1204fd7db1ddf7731b45f3f619acf6d84ddc8829 1 32d1ffcd1d29633169c13439855d8258e9d2777f 1 4304f417a725128fae72b03015df32a10376b8c4 1 46da40dfa492e39f1abdc513683ba96bbf67eba3 1 5458ed63bf77b4c9fd460a1fb3e2a46649b7ce7b 1 574fc3ec5d6d110a992dd5ca28a3bb2a8201b34e 1 6d191ccd8059492603295ee4ca01c7e69fe8b626 1 a175cdb8d33bbb11669131b6eac8a6b69fa39c24 1 a8e9a3b607910749c8e27ef7023dc4920198706d 1 b94765d3166bcae81840c0ce705d716f37d43242 1 dabe5ef01ddf9866a8820ea5e9c965b37899b736 1 e3442a01abe7ba3baf9b0c49cc005fd29d244ff8 2 d2822f3f9a8e42438fa65c2b7208e471edcf7367 2 dab04e1dc144caa8c400da470c5b9be138805147
Or awk:
% sha1sum * | sort | awk '{a[$1]++;}END{for(key in a){if (a[key]>=2) {print key, a[key]}}}' d2822f3f9a8e42438fa65c2b7208e471edcf7367 2 dab04e1dc144caa8c400da470c5b9be138805147 2
BTW, another prominent example of floating point number encoding is color encoding for resistors.
2-3 color bands or decimal numbers of fraction/mantissa and one color band for multiplier (acts as (binary) exponent).
Couple of blog posts updated:
[RevEng] Learn reverse engineering: but where to start?
[Crypto][Python] D.Bleichenbacher attack on RSA PKCS#1, part IV
[Unix] Logging your work using suckless dwm
Yet again, about using the 'uniq -c' command.
Finding most web pages that are requested, but not found (404):
$ cat access.log.* | grep " 404 " | cut -d ' ' -f 7 | sort | uniq -c | sort -n ... 709 /.well-known/security.txt 722 /.well-known/traffic-advice 864 /.env 997 /sitemap.xml 1311 //robots.txt 1622 /apple-touch-icon-precomposed.png 2603 /xmlrpc.php 5105 /wp-login.php 16092 /ads.txt 55807 /robots.txt
Maybe not very funny, but meta-diff is diffing two diff files using 'diff' *nix utility again.
(I had to do this today.)
Formally speaking, this is transitive relation:
Introducer status is transferable; that is, an introducers’ introducer will become your introducer as well. (syncthing manual)
And again, I've been attacked by heavy Unix psychosis.
Getting all mp3 files from In Our Time podcast
curl http://podcasts.files.bbci.co.uk/b006qykl.rss | xmlstarlet format --indent-tab | grep -oE '(http|https)://(.*).mp3' | sort | uniq | wget -nc -i -
xmlstarlet to tidy XML output (must be installed).
grep -o to extract only URL with mp3 extension
wget -nc -i - to download URLs from stdin, but skip existing files (XML output may contain duplicates)
In the same manner, get ~5 recent BBC news podcasts (only head -20 added and RSS URL changed):
curl https://podcasts.files.bbci.co.uk/p02nq0gn.rss | xmlstarlet format --indent-tab | grep -oE '(http|https)://(.*).mp3' | head -20 | wget -nc -i -
head -20, again, because of MP3 URL duplicates.
Getting Luke’s ENGLISH Podcast, but only for the year 2024.
wget -m https://teacherluke.co.uk/ cd teacherluke.co.uk/2024 grep -R https | grep mp3 | grep -oE '(http|https)://(.*).mp3' | grep -v paypal | grep "^https://open.acast.com" > URLs
URLs file needs a bit cleaning, and then:
wget -i URLs