Pentagonal numbers are generated by the formula, Pn=n(3n−1)/2. The first ten pentagonal numbers are: 1, 5, 12, 22, 35, 51, 70, 92, 117, 145, ... It can be seen that P4 + P7 = 22 + 70 = 92 = P8. However, their difference, 70 − 22 = 48, is not pentagonal. Find the pair of pentagonal numbers, Pj and Pk, for which their sum and difference are pentagonal and D = |Pk − Pj| is minimised; what is the value of D?

#!/usr/bin/python3

# pip install ortools
# https://pypi.org/project/ortools/

from ortools.constraint_solver import pywrapcp

def main():
    solver = pywrapcp.Solver("PE_44")

    max_limit=10**7 # may be raised gradually

    P1=solver.IntVar(0, max_limit, "P1")
    P2=solver.IntVar(0, max_limit, "P2")
    P_diff=solver.IntVar(0, max_limit, "P_diff")
    P_sum=solver.IntVar(0, max_limit, "P_sum")

    n1=solver.IntVar(0, max_limit, "n1")
    n2=solver.IntVar(0, max_limit, "n2")
    n_diff=solver.IntVar(0, max_limit, "n_diff")
    n_sum=solver.IntVar(0, max_limit, "n_sum")

    solver.Add(P1>1)
    solver.Add(P2>1)
    solver.Add(P_diff>1)
    solver.Add(P_sum>1)
    solver.Add(P1!=P2)

    solver.Add(n1>1)
    solver.Add(n2>1)
    solver.Add(n_diff>1)
    solver.Add(n_sum>1)

    solver.Add(2*P1==n1*(3*n1-1))
    solver.Add(2*P2==n2*(3*n2-1))
    solver.Add(2*P_diff==n_diff*(3*n_diff-1))
    solver.Add(2*P_sum==n_sum*(3*n_sum-1))

    solver.Add(P1-P2==P_diff)
    solver.Add(P1+P2==P_sum)

    # objective
    objective = solver.Minimize(P_diff, 1)

    solution = solver.Assignment()

    db = solver.Phase([P1, P2, P_diff, P_sum, n1, n2, n_diff, n_sum], solver.CHOOSE_MIN_SIZE_LOWEST_MAX, solver.ASSIGN_MIN_VALUE)

    solver.NewSearch(db, [objective])

    assert solver.NextSolution()==True
    print(P1)
    print(P2)
    print(P_diff)
    print(P_sum)
    print(n1)
    print(n2)
    print(n_diff)
    print(n_sum)
    solver.EndSearch()

main()

P1(7042750)
P2(1560090)
P_diff(5482660)
P_sum(8602840)
n1(2167)
n2(1020)
n_diff(1912)
n_sum(2395)
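The printed values can be double-checked without the solver. This is only a sketch, not part of the original solution: the helper name `is_pentagonal` is mine, and it relies on inverting the formula above (P is pentagonal exactly when 24P+1 is a perfect square whose root r satisfies (r+1) mod 6 = 0).

```python
import math

def is_pentagonal(p):
    """Return True if p == n*(3*n-1)/2 for some integer n >= 1."""
    if p < 1:
        return False
    root = math.isqrt(24 * p + 1)
    # n = (1 + sqrt(24p + 1)) / 6 must be an integer
    return root * root == 24 * p + 1 and (root + 1) % 6 == 0

# Values reported by the OR-tools run above.
p1, p2 = 7042750, 1560090
print(is_pentagonal(p1), is_pentagonal(p2))
print(is_pentagonal(p1 - p2), is_pentagonal(p1 + p2))
# The counter-example from the problem statement: 70 - 22 = 48.
print(is_pentagonal(70 - 22))
```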

And again, my solution is not fast: it runs for 1-2 hours. But I spent maybe 10-15 minutes on coding, and I have no corresponding math experience.

I also now have pages with short notes there. (Without comments.) (Although hardly anyone writes me comments anyway.)

In summary:

Main blog in English: https://yurichev.com/n/ RSS: https://yurichev.com/n/rss.xml

Blog in Russian: https://yurichev.com/nr/ RSS: https://yurichev.com/nr/rss.xml

Short notes in English: https://yurichev.com/notes-EN.html In Russian: https://yurichev.com/notes-RU.html

Planned evolution:

- Some short notes gradually mutate into blog posts.
- Some blog posts gradually coagulate and mutate into PDFs.
- Some PDFs gradually mutate into books.
- Some books gradually mutate into *thick* books.
- Some *thick* books gradually mutate into ...

This problem could be solved with a SAT/SMT solver, but alas -- OR-tools counts models much faster.

There are ~7M solutions, and OR-tools can count them quickly, in several minutes. (I won't write the exact count of solutions, because this is a Project Euler problem after all, supposed to be solved as a task.)

#!/usr/bin/python3

# See also: https://oeis.org/A093199

from ortools.constraint_solver import pywrapcp

def main():
    solver = pywrapcp.Solver("...")

    cells=[[solver.IntVar(0, 9, "%d_%d" % (r, c)) for r in range(4)] for c in range(4)]
    sum_=solver.IntVar(0, 9*4, "sum_")
    #print (cells)

    for r in range(4):
        solver.Add(cells[r][0]+cells[r][1]+cells[r][2]+cells[r][3]==sum_)

    for c in range(4):
        solver.Add(cells[0][c]+cells[1][c]+cells[2][c]+cells[3][c]==sum_)

    solver.Add(cells[0][0]+cells[1][1]+cells[2][2]+cells[3][3]==sum_)
    solver.Add(cells[3][0]+cells[2][1]+cells[1][2]+cells[0][3]==sum_)

    solution = solver.Assignment()

    # flatten
    x=[]
    for r in range(4):
        for c in range(4):
            x.append(cells[r][c])

    db = solver.Phase(x, solver.CHOOSE_MIN_SIZE_LOWEST_MAX, solver.ASSIGN_MIN_VALUE)

    solver.NewSearch(db)
    solutions = 0
    while solver.NextSolution():
        #for r in range(4):
        #    for c in range(4):
        #        print ("%d " % cells[r][c].Value(), end="")
        #    print ("")
        #print ("")
        solutions+=1
    solver.EndSearch()
    return solutions

solutions = main()
print (f"{solutions=}")

The problem above could be solved using an SMT solver like z3, but there are too many solutions to enumerate. SAT/SMT solvers are not good at model/solution enumeration/counting (yet?). So I tried Google OR-tools.

#!/usr/bin/python3

import sys, time
from ortools.constraint_solver import pywrapcp

def main(digits_total):
    start_t=time.time()
    solver = pywrapcp.Solver("...")

    # declare variables
    x = [solver.IntVar(0, 9, "x%i" % i) for i in range(digits_total)]

    for i in range(digits_total-2):
        solver.Add(x[i+0]+x[i+1]+x[i+2]<=9)

    solution = solver.Assignment()

    db = solver.Phase(x, solver.CHOOSE_MIN_SIZE_LOWEST_MAX, solver.ASSIGN_MIN_VALUE)

    solver.NewSearch(db)
    solutions = 0
    while solver.NextSolution():
        #print("x: ", [x[i].Value() for i in range(digits_total)])
        solutions+=1
    solver.EndSearch()
    finish_t=time.time()
    print(f"{digits_total=} {solutions=} seconds spent: {int(finish_t-start_t)}")

for i in range(3, 20+1):
    main(i)

It enumerates solutions quickly, faster than SAT/SMT solvers. But it slows down gradually:

digits_total=3 solutions=220 seconds spent: 0
digits_total=4 solutions=1210 seconds spent: 0
digits_total=5 solutions=6655 seconds spent: 0
digits_total=6 solutions=34243 seconds spent: 0
digits_total=7 solutions=180829 seconds spent: 0
digits_total=8 solutions=963886 seconds spent: 0
digits_total=9 solutions=5093737 seconds spent: 5
digits_total=10 solutions=26932543 seconds spent: 27
digits_total=11 solutions=142701909 seconds spent: 149
digits_total=12 solutions=755538278 seconds spent: 824
digits_total=13 solutions=3999038946 seconds spent: 4104

I cheated and tried to find these numbers on OEIS and found this:

A241615 Number of length n+2 0..9 arrays with no consecutive three elements summing to more than 9

220, 1210, 6655, 34243, 180829, 963886, 5093737, 26932543, 142701909, 755538278, 3999038946, 21172904049, 112098384491, 593455432350, 3141868198978, 16633824615067, 88062718713584, 466221475528171, 2468274573927916

I tried several numbers on projecteuler.net website with no luck. Ouch. I forgot about this clause in problem definition: "(without any leading zero)".

So I added that constraint:

solver.Add(x[0]!=0)

Now results are different:

digits_total=3 solutions=165 seconds spent: 0
digits_total=4 solutions=990 seconds spent: 0
digits_total=5 solutions=5445 seconds spent: 0
digits_total=6 solutions=27588 seconds spent: 0
digits_total=7 solutions=146586 seconds spent: 0
digits_total=8 solutions=783057 seconds spent: 0
digits_total=9 solutions=4129851 seconds spent: 4
digits_total=10 solutions=21838806 seconds spent: 23
digits_total=11 solutions=115769366 seconds spent: 118
digits_total=12 solutions=612836369 seconds spent: 627

... but it still works too slowly to count up to digits_total=20.

But I started to suspect something.

The first two numbers of OEIS A241615 are 220 and 1210, and 1210-220=990. The second and third numbers are 1210 and 6655, and 6655-1210=5445. These are exactly the solution counts printed by my second program.

Let's try other pairs:

#!/usr/bin/python3

A241615=[ 220, 1210, 6655, 34243, 180829, 963886, 5093737, 26932543, 142701909, 755538278, 3999038946, 21172904049, 112098384491, 593455432350, 3141868198978, 16633824615067, 88062718713584, 466221475528171, 2468274573927916, 13067553701179851]

for i in range(1, len(A241615)-1):
    print (i, A241615[i+1]-A241615[i])

1 5445
2 27588
3 146586
4 783057
5 4129851
6 21838806
7 115769366
8 612836369
9 3243500668
10 17173865103
11 90925480442
12 481357047859
13 2548412766628
14 13491956416089
15 71428894098517
16 378158756814587
17 2002053098399745
18 10599279127251935

And the winner is (picked from this list): 378158756814587.

It's indeed so: we 'remove' the solutions starting with zero. And the number of solutions that start with zero is just the previous entry of the sequence, because dropping the leading zero leaves a valid, one digit shorter solution. A leading zero doesn't break the 'sum of any 3 consecutive digits <= 9' constraint, because adding zero doesn't change anything.
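The subtraction trick can be cross-checked by counting directly instead of enumerating. The sketch below is not the OR-tools program from this post but a small dynamic-programming counter (a different technique); the function name `count_numbers` and its parameters are my own. It tracks only the last two digits of each prefix, so length 20 is instant.

```python
def count_numbers(n, allow_leading_zero=True):
    """Count digit strings of length n >= 3 in which
    every three consecutive digits sum to at most 9."""
    # ways[(a, b)] = number of valid prefixes ending with digits a, b
    ways = {}
    for a in range(0 if allow_leading_zero else 1, 10):
        for b in range(10 - a):
            ways[(a, b)] = 1
    # append the remaining n-2 digits one at a time
    for _ in range(n - 2):
        nxt = {}
        for (a, b), cnt in ways.items():
            for c in range(10 - a - b):
                nxt[(b, c)] = nxt.get((b, c), 0) + cnt
        ways = nxt
    return sum(ways.values())

for n in (3, 4, 5, 20):
    with_zero = count_numbers(n)
    without_zero = count_numbers(n, allow_leading_zero=False)
    print(n, with_zero, without_zero)
```

For n >= 4 the 'without leading zero' column should equal the difference between the counts for n and n-1, which is the observation made above.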

Suddenly, there is a mention on some cryptocurrency website: Mallob is used in a new cryptocurrency. How unexpected.

That reminds us of a famous quote: "The street finds its own uses for things" (William Gibson).

In context:

I missed her. Missing her reminded me of my one night in the House of Blue Lights, because I'd gone there out of missing someone else. I'd gotten drunk to begin with, then I'd started hitting Vasopressin inhalers. If your main squeeze has just decided to walk out on you, booze and Vasopressin are the ultimate in masochistic pharmacology; the juice makes you maudlin and the Vasopressin makes you remember, I mean really remember. Clinically they use the stuff to counter senile amnesia, but the street finds its own uses for things.

( William Gibson -- BURNING CHROME (1986) )

Of course, this is GF(2) math. XOR is addition and 'XOR-product' is multiplication. So it's in fact \( a^2 + 2ab + b^2 \) over GF(2). I tried to rewrite this equation as \( (a+b)^2 \), but my knowledge of GF(2) is so poor that I couldn't.

I once implemented GF(2) multiplication, where it was used to factor numbers over GF(2). I can use the very same code here.
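For readers who have not seen such a multiplication, here is a minimal sketch. It is not the code from the earlier post; the names `clmul` and `f` are mine. It implements the carry-less ('XOR') product by shift-and-XOR and evaluates the expression \( (a \otimes a) \oplus (2 \otimes a \otimes b) \oplus (b \otimes b) \) for a few small pairs.

```python
def clmul(a, b):
    """Carry-less ('XOR') product of two non-negative integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # 'add' (XOR) the shifted copy of a
        a <<= 1
        b >>= 1
    return result

def f(a, b):
    """(a*a) xor (2*a*b) xor (b*b), where * is the carry-less product."""
    return clmul(a, a) ^ clmul(2, clmul(a, b)) ^ clmul(b, b)

for a, b in [(0x0, 0x3), (0x3, 0x6), (0x6, 0xf)]:
    print(hex(a), hex(b), hex(f(a, b)))
```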

Now the equation is seemingly easy to solve:

\[ (a \otimes a) \oplus (2 \otimes a \otimes b) \oplus (b \otimes b) = 5 \]

And so I did: PE_877_ideal_world.py. But it would only be possible in an ideal world. Unfortunately, that works only for a under \( \sim 10^{6} \). Since \( \log_2(10^{18}) \approx 60 \) bits, this search space is too big.

% ./PE_877_ideal_world.py
f() k=0x5
a_n=0x0, b_n=0x3, result_n=0x5
a_n=0x6, b_n=0xf, result_n=0x5
a_n=0x3, b_n=0x6, result_n=0x5
a_n=0x18000, b_n=0x3f303, result_n=0x5
a_n=0xf303, b_n=0x18000, result_n=0x5
a_n=0x6606, b_n=0xf303, result_n=0x5
a_n=0x3f0f, b_n=0x6606, result_n=0x5
a_n=0x1818, b_n=0x3f0f, result_n=0x5
a_n=0xf3f, b_n=0x1818, result_n=0x5
a_n=0x666, b_n=0xf3f, result_n=0x5
a_n=0x3f3, b_n=0x666, result_n=0x5
a_n=0x180, b_n=0x3f3, result_n=0x5
a_n=0xf3, b_n=0x180, result_n=0x5
a_n=0x66, b_n=0xf3, result_n=0x5
a_n=0x3f, b_n=0x66, result_n=0x5
a_n=0x18, b_n=0x3f, result_n=0x5
a_n=0xf, b_n=0x18, result_n=0x5
a_n=0x3f0f3f, b_n=0x660666, result_n=0x5
a_n=0x181818, b_n=0x3f0f3f, result_n=0x5
a_n=0xf3f0f, b_n=0x181818, result_n=0x5
a_n=0x66606, b_n=0xf3f0f, result_n=0x5
a_n=0x3f303, b_n=0x66606, result_n=0x5
solutions for k=0x5 = 22

However, the output exhibits some clearly visible patterns.

We see some symmetry. A value that appears as b in one pair then appears as a in another pair. Maybe like a ladder.

First variables are 0 and 3 for a and b.

a_n=0x0, b_n=0x3, result_n=0x5
a_n=0x3, b_n=0x6, result_n=0x5
a_n=0x6, b_n=0xf, result_n=0x5
a_n=0x18, b_n=0x3f, result_n=0x5
a_n=0x3f, b_n=0x66, result_n=0x5
a_n=0x66, b_n=0xf3, result_n=0x5
a_n=0xf3, b_n=0x180, result_n=0x5
a_n=0x180, b_n=0x3f3, result_n=0x5
a_n=0x3f3, b_n=0x666, result_n=0x5
a_n=0x666, b_n=0xf3f, result_n=0x5
a_n=0xf3f, b_n=0x1818, result_n=0x5
a_n=0x1818, b_n=0x3f0f, result_n=0x5
a_n=0x3f0f, b_n=0x6606, result_n=0x5
a_n=0x6606, b_n=0xf303, result_n=0x5
a_n=0xf303, b_n=0x18000, result_n=0x5
a_n=0x18000, b_n=0x3f303, result_n=0x5
...

We see that the following algorithm manifests itself: find an initial/seed pair of values, then try a value as a and find the corresponding b. Then use that value of b as a, and find another b. And so on.

Step 1.

Find initial/seed pairs under a<0x1000 and b<0x1000.

Keep in mind, it's still possible to have solutions that share a value, like:

x y
x z

Add the a,b and b,a pairs as blocking clauses, and collect all values.

Step 2.

Infinite loop:

- Set a to one of the collected values (via assume()).
- Find all b's, add them to the collected values.
- Add a to the blocking clause.

List all solutions, also swap a and b. If a<=b, count that pair as a solution.

(See also: OEIS A000032: Lucas numbers.)

This algorithm is implemented in: PE_877_878.py.

And of course, there are some patterns in these values, but I couldn't crack them.

Now the new version: PE_877.py. Result: PE_877_result.txt.

The result is correct.

Now PE 878. As k grows, the patterns become less visible, if visible at all: k_0xf41fd.txt.

Enumerating all k under \( 10^6 \), this code ran for several days on 13 ARM CPU cores. Yes, painfully slow. But the result is correct. I managed to solve this problem after all, without any specific math knowledge about GF(2).

This is the coolness of SAT/SMT -- it's very helpful for prototyping work, experimenting.

It was easy, I solved it using Wolfram Mathematica:

countSolutions[n_] := Length[Solve[1/x + 1/y == 1/n && x > 0 && y > 0 && x >= y, {x, y}, Integers]]

Do[cnt = countSolutions[n]; If[cnt > 1000, Print[{n, cnt}]; Abort[]], {n, 1, 10^10}]

Slow (several hours), but doable.

The PE 110 problem is harder -- the bar is raised to \( 4 \cdot 10^6 \) solutions. No bruteforce is possible.

In the thread on PE about problem 108, I found a mention of OEIS A018892. I latched onto this clue and tried not to read other solutions and hints. OK, what is this?

A018892 Number of ways to write 1/n as a sum of exactly 2 unit fractions

a(n) = (tau(n^2)+1)/2.

Number of elements in the set {(x,y): x|n, y|n, x<=y, gcd(x,y)=1}

In Wolfram Mathematica, it can be expressed as:

A018892[n_] := (Length[Divisors[n^2]] + 1)/2

The problem is now different. Find the minimal number n for which the A018892 function exceeds \( 4 \cdot 10^6 \). In other words, find the minimal n for which the number of divisors of \( n^2 \), that is Length[Divisors[n^2]], exceeds \( 2 \cdot 4 \cdot 10^6 - 1 \).

Let's ask ourselves, how do we find the number of divisors? Oh, it's easy, just factorize that number. For example: 12348. Run msieve with it. It will say:

Wed Jul 17 00:56:20 2024 factoring 12348 (5 digits)
Wed Jul 17 00:56:20 2024 p1 factor: 2
Wed Jul 17 00:56:20 2024 p1 factor: 2
Wed Jul 17 00:56:20 2024 p1 factor: 3
Wed Jul 17 00:56:20 2024 p1 factor: 3
Wed Jul 17 00:56:20 2024 p1 factor: 7
Wed Jul 17 00:56:20 2024 p1 factor: 7
Wed Jul 17 00:56:20 2024 p1 factor: 7

Indeed, \( 2\cdot 2\cdot 3\cdot 3\cdot 7\cdot 7\cdot 7 = 12348 \). Another way of representing it is: \( 2^2\cdot 3^2 \cdot 7^3 \).

This expression can be viewed as: \( p1^{c1} \cdot p2^{c2} \cdot ...\), where p1, p2 are prime numbers and c1, c2 are exponents (powers).

The number of divisors is: \( (c1+1)(c2+1)(c3+1)... \) For our number, the exponents are 2, 2 and 3. \( (2+1)(2+1)(3+1)=36 \).

Yes, 36 divisors, let's check in Wolfram Mathematica:

In[]:= Length@Divisors[12348]
Out[]= 36
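The same check can be sketched in Python. The helper names below (`factorize`, `num_divisors`, `a018892`) are mine, not from msieve or Mathematica; the factorization is plain trial division, which is fine for small inputs only. The last function uses the OEIS formula quoted above together with the fact that squaring n doubles every exponent.

```python
from collections import Counter

def factorize(n):
    """Trial-division factorization, returns {prime: exponent}."""
    exps = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            exps[d] += 1
            n //= d
        d += 1
    if n > 1:
        exps[n] += 1
    return exps

def num_divisors(n):
    """tau(n) = (c1+1)(c2+1)... over the exponents of n."""
    result = 1
    for c in factorize(n).values():
        result *= c + 1
    return result

def a018892(n):
    """(tau(n^2)+1)/2; each exponent c of n becomes 2c in n^2."""
    tau_sq = 1
    for c in factorize(n).values():
        tau_sq *= 2 * c + 1
    return (tau_sq + 1) // 2

print(dict(factorize(12348)), num_divisors(12348))
print(a018892(4))  # 1/4 = 1/5+1/20 = 1/6+1/12 = 1/8+1/8
```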

Now another question: how can you find a number that has 4099948 solutions? That is, a number whose square has \( 2 \cdot 4099948-1=8199895 \) divisors? Ah, it's easy. Factor this divisor count:

In[]:= FactorInteger[2*4099948-1]
Out[]= {{5, 1}, {11, 1}, {29, 1}, {53, 1}, {97, 1}}

There are 5 primes: 5, 11, 29, 53 and 97. But in fact, these (each minus one) are only exponents for another expression.

We only have to find p1, p2, p3, p4 and p5 for the expression: \( p1^{5-1} \cdot p2^{11-1} \cdot p3^{29-1} \cdot p4^{53-1} \cdot p5^{97-1} \). Let's plug in the first 5 primes:

In[]:= p1=2; p2=3; p3=5; p4=7; p5=11;

In[]:= x=p1^(5-1) * p2^(11-1) * p3^(29-1) * p4^(53-1) * p5^(97-1)
Out[]= 291936574071356139797183542300802080273263700534750232097183894934260994188740440655397\
62795433862023628730959481433237126302432907576845964140240483880043029785156250000

In[]:= Length@Divisors[x]
Out[]= 8199895

In other words, number of solutions of equation \( \frac{1}{x} + \frac{1}{y} = \frac{1}{n} \) is 4099948, where n is:

In[]:= Sqrt[x]
Out[]= 5403115527835363060896154958947115524913225008014850942090010084438627775073242187500

So far so good. But this number is too big.

Now the problem is to find a group of 5 (distinct) primes for which that number is minimized.

The simplest idea -- just sort that list, and let the first (smallest) prime have the biggest exponent and the last (biggest) prime have the smallest exponent.

#!/usr/bin/python3

import math

# https://math.stackexchange.com/questions/1039519/finding-prime-factors-by-taking-the-square-root
# DIY factoring functions, but for small numbers it is OK

def factor_helper(n):
    for i in range(2, int(math.ceil(math.sqrt(n)))+1):
        if n % i == 0:
            return i
    # this is prime
    return n

def factor(n):
    rt=[]
    while True:
        p=factor_helper(n)
        if p==1:
            break
        rt.append (p)
        n=n//p
    return rt

first_primes=[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

smallest_result=-1
smallest_divisors=-1
smallest_solutions=-1

def try_(solutions_we_want, divisors_we_want):
    global smallest_result
    global smallest_divisors
    global smallest_solutions

    factors=factor(divisors_we_want)
    if max(factors)>100:
        # skip it, because we need small results anyway
        return
    factors=sorted(factors, reverse=True)

    rt=1
    for p, f in zip(first_primes, factors):
        rt*=p**(f-1)

    try:
        result=int(math.sqrt(rt))
    except OverflowError:
        # skip it, because we need small results anyway
        return

    print (divisors_we_want, result)
    if (smallest_result==-1) or (result<smallest_result):
        smallest_result=result
        smallest_divisors=divisors_we_want
        smallest_solutions=solutions_we_want

#for solutions_we_want in range(1000, 2000): # for PE 108
for solutions_we_want in range(4*10**6, 4*10**6+100000): # for PE 110
    divisors_we_want=(solutions_we_want*2)-1
    try_ (solutions_we_want, divisors_we_want)

print (f"{smallest_result=}")
print (f"{smallest_solutions=}")
print (f"{smallest_divisors=}")

And that worked. For PE 108:

smallest_result=180180
smallest_solutions=1013
smallest_divisors=2025

For PE 110:

smallest_result=9350130049860600
smallest_solutions=4018613
smallest_divisors=8037225

And both numbers, 180180 and 9350130049860600, happen to be there:

A126098 Where records occur in A018892.

As if the Project Euler author took the idea right from here. Maybe it was so. But the code at OEIS is simpler than mine.

See also chapter 5 here.

And so, this code generates two similar tables, but the second part is slightly faster:

#!/usr/bin/env python3

for x in range(15):
    print (x, x*x)

print ("")

a=0
for x in range(15):
    print (x, a)
    a = a + 2*x + 1

Let's see why increasing square by 2x+1 is like recalculating square at each iteration.

Imagine a square with 4 'dots' side:

****
****
****
****

How would you grow it by 1 'dot'? Add a line at top and at right:

++++.
****+
****+
****+
****+

There are two lines of length 4 '+' and also one corner '.'. The resulting square has side of 5 dots.

So, in the 2x+1 expression, 2x is the number of '+' to be added in two 'lines', and +1 is that corner '.'.

This is how you 'grow' a square without recalculating x^2. And that may be faster sometimes, as I stated before.

Now about x^3:

#!/usr/bin/env python3

for x in range(0, 15):
    print (x, x*x*x)

print ("")

a=0
for x in range(0, 15):
    print (x, a)
    a = a + 3*x*x + x*3 + 1

To grow a cube, you add this number of 'dots': 3x^2 + 3x + 1.

I'm going to demonstrate all this visually. Here is a cube with side of 4 cubelets. 4*4*4 cube is gray:

How do you 'grow' it?

- You add 3 square plates of size 4*4 to 3 sides of cube: blue cubelets: 3x^2 of them.
- There are also 3 red 'lines' to cover newly formed 'trenches': 3x of them.
- And there is also one single green 'corner' cubelet to be added: +1.

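Thus you can 'grow' a cube without recalculating x^3 each time. Strictly speaking, these expressions are finite differences: \( (x+1)^2 - x^2 = 2x+1 \) and \( (x+1)^3 - x^3 = 3x^2+3x+1 \). They are the discrete cousins of the derivatives from calculus, 2x and 3x^2, which are their leading terms.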

Also, Wolfram Mathematica notebook I used is here.

Further work -- tesseract.
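As a first step towards the tesseract, here is a sketch of the same trick for x^4. It assumes only the binomial expansion \( (x+1)^4 - x^4 = 4x^3 + 6x^2 + 4x + 1 \), which would correspond to 4 cubes, 6 square plates, 4 lines and 1 corner. The function name `fourth_powers` is mine.

```python
def fourth_powers(count):
    """Return [0**4, 1**4, ...] computed by 'growing' instead of recomputing."""
    result = []
    a = 0
    for x in range(count):
        result.append(a)
        # (x+1)^4 - x^4 = 4x^3 + 6x^2 + 4x + 1
        a = a + 4*x**3 + 6*x**2 + 4*x + 1
    return result

for x, a in enumerate(fourth_powers(15)):
    print(x, a)
```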

The solution using Google OR-tools:

#!/usr/bin/python3

# pip install ortools
# https://pypi.org/project/ortools/

from ortools.constraint_solver import pywrapcp

def main():
    solver = pywrapcp.Solver("PE_142")

    max_limit=1000000
    max_limit_sqr=(max_limit*2)**2

    x=solver.IntVar(0, max_limit, "x")
    y=solver.IntVar(0, max_limit, "y")
    z=solver.IntVar(0, max_limit, "z")

    a1=solver.IntVar(0, max_limit_sqr, "a1")
    a2=solver.IntVar(0, max_limit_sqr, "a2")
    a3=solver.IntVar(0, max_limit_sqr, "a3")
    a4=solver.IntVar(0, max_limit_sqr, "a4")
    a5=solver.IntVar(0, max_limit_sqr, "a5")
    a6=solver.IntVar(0, max_limit_sqr, "a6")

    solver.Add(x>y)
    solver.Add(y>z)
    solver.Add(z>0)

    solver.Add(a1>0)
    solver.Add(a2>0)
    solver.Add(a3>0)
    solver.Add(a4>0)
    solver.Add(a5>0)
    solver.Add(a6>0)

    solver.Add(x + y == a1*a1)
    solver.Add(x - y == a2*a2)
    solver.Add(x + z == a3*a3)
    solver.Add(x - z == a4*a4)
    solver.Add(y + z == a5*a5)
    solver.Add(y - z == a6*a6)

    # objective
    objective = solver.Minimize(x+z+y, 1)

    solution = solver.Assignment()

    db = solver.Phase([x,y,z], solver.CHOOSE_MIN_SIZE_LOWEST_MAX, solver.ASSIGN_MIN_VALUE)

    solver.NewSearch(db, [objective])

    assert solver.NextSolution()==True
    print("x: ", x.Value())
    print("y: ", y.Value())
    print("z: ", z.Value())
    print()
    solver.EndSearch()

    print("failures:", solver.Failures())
    print("branches:", solver.Branches())
    print("WallTime:", solver.WallTime())

main()

% my_time.py ./1.py
['./1.py']
x: 434657
y: 420968
z: 150568

failures: 215193
branches: 430387
WallTime: 457060
['/home/i/dotfiles/bin/my_time.py', './1.py'] seconds: 457 or: 7m37s
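The reported x, y, z can be verified in a fraction of a second. This is just a sanity-check sketch (the helper `is_square` is my own name), not part of the solver run:

```python
import math

def is_square(n):
    """True if n is a positive perfect square."""
    return n > 0 and math.isqrt(n) ** 2 == n

# Values printed by the OR-tools run above.
x, y, z = 434657, 420968, 150568
values = [x + y, x - y, x + z, x - z, y + z, y - z]

print([math.isqrt(v) for v in values])
print(all(is_square(v) for v in values), x + y + z)
```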

Thanks to Håkan Kjellerstrand for his snippets, including those for OR-tools, which are so easy to study and modify...

The Z3 SMT solver should work as well, but it doesn't.

Here is real Golang code I just wrote. It finds a valid email in an input string, using a regular expression. Like anyone would do.

The task is to collect emails from (very) big text file(s). (Not that I'm very proud of what I do.)

But when I first check for the "@" and "." characters (each must be present at least once) in the input string, that gives some speed-up:

if (strings.Contains(text, "@")) {
    if (strings.Contains(text, ".")) {
        isEmailValid(text)
    }
}

(My regexp is: `[a-z0-9][a-z0-9._\-]{1,25}[a-z0-9]@[a-z0-9][a-z0-9.\-]+[a-z0-9]\.[a-z]{2,25}`. And of course I compile it once, at start.)

For example, I tried to search for emails in a bunch of random RFC text files (~9300 files, ~513 MB). AMD Ryzen 5 3600, one thread. Non-optimized regexp matcher (email_extract_test_v1.go) -- ~43 seconds, my fancy version with two additional strings.Contains() calls (email_extract_test_v2.go) -- only ~2.3 seconds!

Summarizing, you may want first to check if a character(s) or a substring(s) is present in input string, using a standard string function(s) of your programming language, before running regexp matcher with a complex expression. This may be a speed improvement.
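The same idea, sketched in Python rather than Go (so the timings above do not carry over). The function name `find_emails` is hypothetical; the pattern is the one quoted above, and the cheap `in` checks run before the regexp matcher is touched.

```python
import re

# The regexp from this post, compiled once at start.
EMAIL_RE = re.compile(
    r"[a-z0-9][a-z0-9._\-]{1,25}[a-z0-9]@[a-z0-9][a-z0-9.\-]+[a-z0-9]\.[a-z]{2,25}"
)

def find_emails(lines):
    """Collect emails, running the regexp only on lines that can match."""
    found = []
    for line in lines:
        # cheap substring checks first; most lines are rejected here
        if "@" in line and "." in line:
            found.extend(EMAIL_RE.findall(line))
    return found

sample = [
    "no email here",
    "write to john.doe@example.com please",
    "a@b",
]
print(find_emails(sample))
```

Whether this gives a speed-up in Python depends on the regexp engine and the input, so it should be measured rather than assumed.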

Also, your string is already in L(0|1)d? (regexp pun intended) cache after these checks, so a regexp matcher has a fast (enough) access to it.

Also, characters can be counted. For example, "@" must occur only once in an email. Counting it using your PL's library function may improve speed as well. (But beware: a text string that contains several emails has several "@" characters, so it fails this check and its emails are not extracted.)

if (strings.Count(text, "@")==1) {
    if (strings.Contains(text, ".")) {
        isEmailValid(text)
    }
}

But in my case (email_extract_test_v3.go), that was again ~2.3 seconds, almost no improvement. (But an output result is slightly shorter due to a bug described above.)

Also, please note, all measurements should be done after the *cache warm-up*.
That is, run your code several times before measurement(s),
so that your filesystem's driver will populate its cache with your test files
(at least partially).
Of course, the computer I used for these experiments has more RAM
than ~513 MB of test text files.

And of course, other regexp libraries (in your favorite PL(s)) may exhibit very different results from mine. YMMV, as they say.

(Updated 20240707 17:44:54 CEST.)

This may also be faster, if you want to find a "CAD" substring that is not a part of another word:

if strings.Contains(str, "CAD") {
    b1, _ = regexp.MatchString("\\bCAD\\b", str)
}

Calculate \( \sqrt[3]{65} \)

#!/usr/bin/env python3

volume=65 # litres

# as cube
print (volume**(1./3.))

We got ~4. This is a cube with side of 4 decimetres. Or, if you prefer, 40cm * 40cm * 40cm.

But my backpack is not cubical. It's rather a short bar, maybe. Find the dimensions of two equal cubes that together hold 65 litres:

\( \sqrt[3]{\frac{65}{2}} \)

#!/usr/bin/env python3

volume=65 # litres

# as two cubes
print ((volume/2)**(1./3.))

We got ~3.19 decimeters. That is two cubes with a side of ~32 cm. Or a (short) bar with dimensions: 32cm * 32cm * 32*2 cm. Or 32cm * 32cm * 64 cm.

What if my backpack is longer? Like, 3 cubes?

\( \sqrt[3]{\frac{65}{3}} \)

#!/usr/bin/env python3

volume=65 # litres

# as three cubes
print ((volume/3)**(1./3.))

We got ~2.8 decimeters. That is, my backpack may have dimensions 28cm * 28cm * 28*3cm. Or, 28cm * 28cm * 84cm -- this is close to reality. (Of course, 28*28*84 = 65856.)

Older Python versions have no cube root function (math.cbrt() appeared only in Python 3.11), but \( \sqrt[x]{y} = y^{\frac{1}{x}} \) by definition.
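The three snippets above can be folded into one small helper. This is only a sketch; the function name `cube_side_dm` and its parameters are mine. It treats the backpack as `cubes` equal cubes stacked in a row, and uses the fact that 1 litre is 1 cubic decimetre.

```python
def cube_side_dm(volume_litres, cubes=1):
    """Side (in decimetres) of each of `cubes` equal cubes
    that together hold `volume_litres`."""
    return (volume_litres / cubes) ** (1 / 3)

for k in (1, 2, 3):
    side_cm = cube_side_dm(65, k) * 10
    print(f"{k} cube(s): {side_cm:.1f}cm * {side_cm:.1f}cm * {side_cm * k:.1f}cm")
```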

In fact, it's possible to encrypt using SSH keys. Why not?

My toy SSH client can download public key from arbitrary SSH server, I added an option for it. Here I access my live webserver actually:

% ./toyssh_v5.py -h vps7.yurichev.org -save_serv_pub_key -server_host_algo rsa-sha2-256
Server's public key saved to vps7.yurichev.org.ssh_host_rsa_key.pub
Fatal error: can't go further without username/passsword/pubkey/prikey. exiting.

(*-server_host_algo* option is important,
otherwise ECDSA or ED25519 key would be saved instead.)

The resulting file is almost the same as on my webserver, excluding the last user@host part:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCxWpYVDk6hDsgwSODF0jHx1SmGnhuGzswlkqyua5NN1t9u4k5zQJi0WI/iiZH8wE51u2htujxf7ie/DZ26fh0uuHfz6EA7qPdtCB+q4kT8tlXVAWzUw3TslnAKotk36N4euwt9vGSslBHE3GHP7uIOULajtSrfdFU7GpqQRogu0zs2jr3S5Sgmx0CVX9rS3ZurwZ816UJCCOmkoanmM1CSwVfL18iLcs+asxuPAZVu1ZvqWwv6MrM3vdmU92DANxIF4wrrwYiJhrIKVhDfNHxNIn2O5akad73JVzCgSzZil9GIagdIo0LO/dyH2r6BoaGjuqbK2vEBdS8tMrhL+BbD5EUUUN/sHopbHwl1Gska4fs7zIoh1clO+1bHfze/8U8lxezxPka+ctl5vzOYTUFc+BS8t8jCRqBCC5iN/YtAPmPWbfxA2VzXCh1Ny4a+e9HB6ZchNvUq5N8sfchCdIeasEtmQ2EVfaLbObjnMRVGDRiH5myiEQWzxO80ti1cf5k= root@vps7.yurichev.org

It's no secret. It's the same as on my webserver:

root@vps7:~# cat /etc/ssh/ssh_host_rsa_key.pub ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCxWpYVDk6hDsgwSODF0jHx1SmGnhuGzswlkqyua5NN1t9u4k5zQJi0WI/iiZH8wE51u2htujxf7ie/DZ26fh0uuHfz6EA7qPdtCB+q4kT8tlXVAWzUw3TslnAKotk36N4euwt9vGSslBHE3GHP7uIOULajtSrfdFU7GpqQRogu0zs2jr3S5Sgmx0CVX9rS3ZurwZ816UJCCOmkoanmM1CSwVfL18iLcs+asxuPAZVu1ZvqWwv6MrM3vdmU92DANxIF4wrrwYiJhrIKVhDfNHxNIn2O5akad73JVzCgSzZil9GIagdIo0LO/dyH2r6BoaGjuqbK2vEBdS8tMrhL+BbD5EUUUN/sHopbHwl1Gska4fs7zIoh1clO+1bHfze/8U8lxezxPka+ctl5vzOYTUFc+BS8t8jCRqBCC5iN/YtAPmPWbfxA2VzXCh1Ny4a+e9HB6ZchNvUq5N8sfchCdIeasEtmQ2EVfaLbObjnMRVGDRiH5myiEQWzxO80ti1cf5k= root@ubuntu-4gb-hel1-1

Now anyone can get my RSA public key. And I have corresponding RSA private key. Can anyone send me an encrypted message? Yes. But I don't know any standard tool for that. I wrote mine.

#!/usr/bin/env python3

# May need to be upgraded:
# python3 -m pip install --upgrade cryptography

import cryptography.hazmat.primitives.serialization
import cryptography.hazmat.primitives.asymmetric.padding
import cryptography.hazmat.primitives.hashes
import sys

# load pub key from server:
f=open(sys.argv[1], "rb")
contents=f.read()
f.close()

pub_key=cryptography.hazmat.primitives.serialization.load_ssh_public_key(contents)

plaintext=b"Hi, Dennis. The Secret Meetup is today, at The Venue 123. 1800 UTC."

# https://cryptography.io/en/latest/hazmat/primitives/asymmetric/rsa/#encryption
algorithm=cryptography.hazmat.primitives.hashes.SHA256()
mgf=cryptography.hazmat.primitives.asymmetric.padding.MGF1(algorithm=algorithm)
ciphertext=pub_key.encrypt(plaintext, cryptography.hazmat.primitives.asymmetric.padding.OAEP(mgf, algorithm, label=None))

f=open("cipher.bin", "wb")
f.write(ciphertext)
f.close()

% ./ssh_encrypt.py vps7.yurichev.org.ssh_host_rsa_key.pub

The resulting cipher.bin file is of 384 bytes. I can decrypt it on my webserver with the following code:

#!/usr/bin/env python3

# May need to be upgraded:
# python3 -m pip install --upgrade cryptography

import cryptography.hazmat.primitives.serialization
import cryptography.hazmat.primitives.asymmetric.padding
import cryptography.hazmat.primitives.hashes
import sys

# load ciphertext:
f=open(sys.argv[1], "rb")
ciphertext=f.read()
f.close()

# load the server's private key:
f=open("/etc/ssh/ssh_host_rsa_key", "rb")
contents=f.read()
f.close()

pri_key=cryptography.hazmat.primitives.serialization.load_ssh_private_key(contents, password=b"")

# https://cryptography.io/en/latest/hazmat/primitives/asymmetric/rsa/#decryption
algorithm=cryptography.hazmat.primitives.hashes.SHA256()
mgf=cryptography.hazmat.primitives.asymmetric.padding.MGF1(algorithm=algorithm)
plaintext=pri_key.decrypt(ciphertext, cryptography.hazmat.primitives.asymmetric.padding.OAEP(mgf, algorithm, label=None))

print (plaintext)

root@vps7:~/1# ./ssh_decrypt.py cipher.bin
b'Hi, Dennis. The Secret Meetup is today, at The Venue 123. 1800 UTC.'

Neat. But be warned. I did this as a coding exercise, as a proof-of-concept. As often happens in cryptography, this solution may be insecure if implemented incorrectly.

Why can ssh-keygen sign/verify but not encrypt/decrypt? I don't know, but maybe because all this is possible only with RSA keys. Today, some sysadmins disable RSA keys in favor of ECDSA and ED25519 keys, which only allow signing/verifying. An encryption feature in ssh-keygen, if implemented, would not be universal. (This is only my speculation.)

This is a toy SSH client that supports ECDH and ECDSA.

New KEX algos: diffie-hellman-group18-sha512, ecdh-sha2-nistp256, ecdh-sha2-nistp384, ecdh-sha2-nistp521.

New server/host algo: ecdsa-sha2-nistp256.

ECDH is used in place of DH and called 'ecdh-sha2-nistp256' - SHA2-256 will be used with it and NIST P-256 curve. So if KEX algorithm 'ecdh-sha2-nistp256' is picked, it will be used instead of DH.

I also added ECDSA support, which is used as a digital signature for DH or ECDH reply from server.

Thanks to the cryptography.io module/library, these additions to my toyssh client were easy, as easy as for the last version of my toytls client.

Needless to say, TLS uses ASN1 encoding heavily, as does the cryptography.io module. But in the SSH protocol, an ECDSA signature is transmitted as two bignums, r and s (as in DSA). (See calls to decode_dss_signature() and encode_dss_signature() functions -- these convert between ASN1 encoding and a pair of bignums.)
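To make the 'two bignums' point concrete, here is a standard-library-only sketch of how r and s would be packed on the SSH side. It is not taken from toyssh; the function names are mine. It assumes the 'mpint' and 'string' encodings of RFC 4251 and the ECDSA signature layout of RFC 5656 (a string with the algorithm name, then a string holding mpint r followed by mpint s).

```python
import struct

def encode_string(data):
    """SSH 'string': 4-byte big-endian length, then the bytes."""
    return struct.pack(">I", len(data)) + data

def encode_mpint(n):
    """SSH 'mpint' for a non-negative integer (RFC 4251)."""
    if n == 0:
        return struct.pack(">I", 0)
    # one spare bit keeps the top bit 0, so the value reads as non-negative
    raw = n.to_bytes(n.bit_length() // 8 + 1, "big")
    return encode_string(raw)

def decode_mpint(buf, offset=0):
    """Return (value, offset just past the mpint)."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    value = int.from_bytes(buf[start:start + length], "big", signed=True)
    return value, start + length

def encode_ecdsa_signature(r, s, name=b"ecdsa-sha2-nistp256"):
    """Signature as sent over SSH: string name, string (mpint r, mpint s)."""
    return encode_string(name) + encode_string(encode_mpint(r) + encode_mpint(s))

# r and s here are made-up toy values, not a real signature.
print(encode_ecdsa_signature(5, 0x80).hex())
```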

ECDSA is like an upgraded DSA, and although DSA is outdated, I added it to my SSH client nevertheless, just as a stepping stone towards ECDSA. It helped me. So I advise learning DSA first, before ECDSA.

Also, you can login to server using ECDSA key. Generate it as: 'ssh-keygen -t ecdsa'.

Then copy-paste contents of id_ecdsa.pub file to $HOME/.ssh/authorized_keys file.

Then login: 'ssh user@host -i ~/tmp/id_ecdsa'. (Or add this info to $HOME/.ssh/config file.)

Toyssh can also login with such a key:

./toyssh_v5.py -h host -u user -pubkey id_ecdsa.pub -prikey id_ecdsa -server_host_algo ecdsa-sha2-nistp256

Here it is used instead of RSA.

Now a very important thing to understand: ECC does not have to be used for both steps (KEX and signature).

Here I use ECDH instead of DH, but with an (old) RSA signature:

toyssh_v5.py -h host -u user -pass pw -kex ecdh-sha2-nistp256 -server_host_algo rsa-sha2-256

And here I use (old) DH and ECDSA:

toyssh_v5.py -h host -u user -pass pw -kex diffie-hellman-group14-sha256 -server_host_algo ecdsa-sha2-nistp256

Usual openssh commands are:

ssh -v -oKexAlgorithms=diffie-hellman-group-exchange-sha256 -oHostkeyAlgorithms=ecdsa-sha2-nistp256 i@localhost

... and:

ssh -v -oKexAlgorithms=ecdh-sha2-nistp256 -oHostkeyAlgorithms=rsa-sha2-256 user@host

(But (open)SSH client will grumble about fingerprints saved/cached in '$HOME/.ssh/known_hosts' when switching between host key algorithms.)

Both RSA and ECDSA can sign DH or ECDH reply from server.

For that, almost all modern SSH servers support at least three signature algorithms: RSA, ECDSA, ED25519. Run 'ls /etc/ssh' and you'll see private/public keys for all three algorithms -- they were generated during your Unix installation.

Bottom line: this toy SSH can connect to almost any modern SSH server, including those with disabled (outdated and/or old) RSA and DH algorithms.

Issues: 'keyboard-interactive' auth is not supported. (Required by SSH-2.0-OpenSSH_9.5 FreeBSD-20231004, for example.)

]]>Function: logb x
This function returns the binary exponent of x. More precisely, if x is finite and nonzero, the value is the logarithm base 2 of |x|, rounded down to an integer. If x is zero or infinite, the value is infinity; if x is a NaN, the value is a NaN.
(logb 10) ⇒ 3
(logb 10.0e20) ⇒ 69
(logb 0) ⇒ -1.0e+INF

( https://www.gnu.org/software/emacs/manual/html_node/elisp/Float-Basics.html )

It's indeed so. IEEE 754 numbers are usually encoded in binary, including the exponent.

Let's grind out the exponent from the single-precision IEEE 754 number:

// copypasted from https://stackoverflow.com/questions/15685181/how-to-get-the-sign-mantissa-and-exponent-of-a-floating-point-number
#include <stdio.h>
#include <stdlib.h>

// bit-field layout is implementation-defined; this order works with GCC on little-endian x86
typedef union {
    float f;
    struct {
        unsigned int mantisa : 23;
        unsigned int exponent : 8;
        unsigned int sign : 1;
    } parts;
} float_cast;

int main(int argc, char *argv[])
{
    if (argc < 2)
        return 1;
    // parse as float (not int), so that non-integer inputs are not truncated
    float x = strtof(argv[1], NULL);
    float_cast d1 = { .f = x };
    printf("sign = %d\n", d1.parts.sign);
    printf("exponent = %d\n", d1.parts.exponent);
    // https://en.wikipedia.org/wiki/Exponent_bias
    printf("exponent-exponent_bias(127) = %d\n", d1.parts.exponent-127);
    printf("mantisa = %d\n", d1.parts.mantisa);
}

% ./a.out 123456
sign = 0
exponent = 143
exponent-exponent_bias(127) = 16
mantisa = 7413760

Yes:

% python3
>>> import math
>>> math.log(123456, 2)
16.913637428049103

And this is nice, zero mantissa:

% ./a.out 1024
sign = 0
exponent = 137
exponent-exponent_bias(127) = 10
mantisa = 0
% ./a.out 65536
sign = 0
exponent = 143
exponent-exponent_bias(127) = 16
mantisa = 0
...

Calculate: \( 2^{10}=1024 \), \( 2^{16}=65536 \).

Also, all this can be checked with the online IEEE 754 calculator.
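The same fields can be pulled out without a C union. Below is a minimal Python sketch using the standard struct module; the helper name float32_parts is made up for this example. It packs the value as a single-precision float, reinterprets the same 4 bytes as an integer and slices out the bit fields.

```python
import math
import struct

def float32_parts(x):
    """Split x, stored as IEEE 754 single precision, into (sign, biased exponent, mantissa)."""
    # Pack as big-endian float32, then reinterpret the same 4 bytes as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa

for x in (123456.0, 1024.0, 65536.0):
    sign, exponent, mantissa = float32_parts(x)
    # For normal numbers, the exponent minus the bias (127) equals floor(log2(|x|)).
    print(x, sign, exponent, exponent - 127, mantissa, math.floor(math.log2(x)))
```

For these three inputs the unbiased exponent and the floored base-2 logarithm should agree, which is what logb reports.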

]]>It's indeed a good idea.

- There is a huge difference between 1G and 2G.
- There is a (maybe significant) difference between 5G and 6G.
- There is a (slight) difference between 10G and 11G.
- The difference between 100G and 101G is hardly noticeable.
- The difference between 1000G and 1001G is negligible.

Of course, no one really needs to allocate precisely 1001G to a VM. 1000G would be OK, as well as 1024G and maybe even 1050G, 1100G...
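One way to read the list above is that the relative, not the absolute, step is what matters. A tiny Python sketch of that arithmetic (the function name is made up for this example):

```python
def relative_increase(size_g):
    """Relative growth when going from size_g to size_g + 1 (same units)."""
    return 1 / size_g

for size_g in (1, 5, 10, 100, 1000):
    print(f"{size_g}G -> {size_g + 1}G: +{relative_increase(size_g):.1%}")
```

The same +1G step is a doubling at 1G and a tenth of a percent at 1000G.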

]]>In Spring 2023, I scanned ~65k random SSH hosts.

Most popular SSH server banners:

16467 serv_banner: SSH-2.0-OpenSSH_7.4
7070 serv_banner: SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.11
5465 serv_banner: SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.6
3941 serv_banner: SSH-2.0-OpenSSH_8.0
2824 serv_banner: SSH-2.0-OpenSSH_7.9p1 Debian-10+deb10u4
2438 serv_banner: SSH-2.0-OpenSSH_7.6p1 Ubuntu-4ubuntu0.5
2271 serv_banner: SSH-2.0-OpenSSH_8.4p1 Debian-5+deb11u3
1837 serv_banner: SSH-2.0-OpenSSH_7.6p1 Ubuntu-4ubuntu0.7
1668 serv_banner: SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.7
1612 serv_banner: SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.8
1376 serv_banner: SSH-2.0-OpenSSH_9.2p1 Debian-2+deb12u2
1042 serv_banner: SSH-2.0-OpenSSH_7.9p1 Debian-10+deb10u2
1002 serv_banner: SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.10
932 serv_banner: SSH-2.0-OpenSSH_7.6p1 Ubuntu-4ubuntu0.3
855 serv_banner: SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u7
830 serv_banner: SSH-2.0-OpenSSH_8.7
671 serv_banner: SSH-2.0-OpenSSH_9.6
646 serv_banner: SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
588 serv_banner: SSH-2.0-OpenSSH_8.4p1 Debian-5+deb11u1
520 serv_banner: SSH-2.0-OpenSSH_9.0
365 serv_banner: SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.9
362 serv_banner: SSH-2.0-OpenSSH_9.3p1 Ubuntu-1ubuntu3.2
313 serv_banner: SSH-2.0-OpenSSH_8.4p1 Debian-5
311 serv_banner: SSH-2.0-OpenSSH_9.6p1 Ubuntu-3ubuntu13
302 serv_banner: SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.1
287 serv_banner: SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.3
280 serv_banner: SSH-2.0-OpenSSH_8.9p1
263 serv_banner: SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.2
254 serv_banner: SSH-2.0-OpenSSH_7.9 FreeBSD-20200214
232 serv_banner: SSH-2.0-OpenSSH_8.4p1
...

KEX algorithms offered:

64328 kex_algorithms: diffie-hellman-group-exchange-sha256
62699 kex_algorithms: curve25519-sha256@libssh.org
61952 kex_algorithms: ecdh-sha2-nistp256
61950 kex_algorithms: ecdh-sha2-nistp384
61944 kex_algorithms: ecdh-sha2-nistp521
57380 kex_algorithms: curve25519-sha256
56943 kex_algorithms: diffie-hellman-group16-sha512
56864 kex_algorithms: diffie-hellman-group18-sha512
56256 kex_algorithms: diffie-hellman-group14-sha256
34628 kex_algorithms: diffie-hellman-group14-sha1
26616 kex_algorithms: kex-strict-s-v00@openssh.com
18861 kex_algorithms: diffie-hellman-group-exchange-sha1
15732 kex_algorithms: diffie-hellman-group1-sha1
11834 kex_algorithms: sntrup761x25519-sha512@openssh.com
...

Server host algorithms offered:

63704 server_host_algorithms: rsa-sha2-256
63703 server_host_algorithms: rsa-sha2-512
60837 server_host_algorithms: ssh-ed25519
60339 server_host_algorithms: ecdsa-sha2-nistp256
52337 server_host_algorithms: ssh-rsa
2015 server_host_algorithms: ssh-dss
847 server_host_algorithms: rsa-sha2-512-cert-v01@openssh.com
847 server_host_algorithms: rsa-sha2-256-cert-v01@openssh.com
832 server_host_algorithms: ssh-rsa-cert-v01@openssh.com
...

MAC algorithms offered:

64134 mac_algorithms: hmac-sha2-256
64120 mac_algorithms: hmac-sha2-512
62094 mac_algorithms: hmac-sha2-256-etm@openssh.com
62066 mac_algorithms: hmac-sha2-512-etm@openssh.com
61446 mac_algorithms: umac-128-etm@openssh.com
61035 mac_algorithms: umac-128@openssh.com
60644 mac_algorithms: hmac-sha1
59133 mac_algorithms: hmac-sha1-etm@openssh.com
54943 mac_algorithms: umac-64@openssh.com
54675 mac_algorithms: umac-64-etm@openssh.com
...

Encryption algorithms offered:

64588 encryption_algorithms: aes256-ctr
64024 encryption_algorithms: aes128-ctr
60953 encryption_algorithms: aes256-gcm@openssh.com
60401 encryption_algorithms: aes128-gcm@openssh.com
60109 encryption_algorithms: chacha20-poly1305@openssh.com
59000 encryption_algorithms: aes192-ctr
18820 encryption_algorithms: aes128-cbc
18790 encryption_algorithms: aes256-cbc
14933 encryption_algorithms: 3des-cbc
14679 encryption_algorithms: aes192-cbc
13638 encryption_algorithms: blowfish-cbc
13459 encryption_algorithms: cast128-cbc
...

RSA modulus in case of RSA negotiation:

36512 binlog(RSA_modulus_n): 2048
25764 binlog(RSA_modulus_n): 3072
1134 binlog(RSA_modulus_n): 4096
233 binlog(RSA_modulus_n): 1024
...

My other blog posts about SSH protocol dissected: 1, 2, 3, 4.

]]>HTTPS certificates (including those obtained from Let's Encrypt) can be used for that as well.

This is real data from my https://yurichev.com webserver.

root@vps7:/etc/letsencrypt/live/yurichev.com# ls -la
...
lrwxrwxrwx 1 root root 42 Apr 4 23:05 cert.pem -> ../../archive/yurichev.com-0001/cert22.pem
lrwxrwxrwx 1 root root 43 Apr 4 23:05 chain.pem -> ../../archive/yurichev.com-0001/chain22.pem
lrwxrwxrwx 1 root root 47 Apr 4 23:05 fullchain.pem -> ../../archive/yurichev.com-0001/fullchain22.pem
lrwxrwxrwx 1 root root 45 Apr 4 23:05 privkey.pem -> ../../archive/yurichev.com-0001/privkey22.pem
...

Here I create a message:

# cat msg.txt
Hello world. Yes, this is me, D.Y.

Here I sign it with my webserver's private key (created by certbot):

# openssl dgst -sha256 -sign privkey.pem -out msg.txt.sig msg.txt

The signature file is binary, but you can convert it to base64 using openssl:

# openssl base64 -in msg.txt.sig -out msg.txt.sig.asc
# cat msg.txt.sig.asc
k8XS4R3Urpa3G8dqlFCKkPsg76LpwQ7F+jRogLy0dkyJeChbgn+erAmqZLj4SQ1I
tNYPjeJjIRwgCWHTPGowvCz2kNgofRkSuSvvH3N+ju1tv7O3GJAQaJiXyvsZRb9V
PVZnfb/fz5m3mC3P67QO0ePc22miVbP5LuMceeSj5Sfl/JDwO/JdYvSCs1PkSms2
i22HoYOSo2s//Kb5k+ggJlioo82K5j9a/QCCTOix5VyEtBiS7jGMRap1uMd7ia2b
jgRDf6M2LytZJFqhfcM4a8VEAPTI2i82+JqsBnet3QvCjj9FlwaQqbgGW5UHptxJ
+em2Vy2FNW2GwmeT9PiAPw==

I publish msg.txt and msg.txt.sig.asc.

Now anyone can download my public key from my webserver:

$ openssl s_client -connect yurichev.com:443 -showcerts 1.pem
$ openssl x509 -pubkey -noout -in 1.pem > pubkey.pem

(sed is used here to get only certificate... There are two of them, actually, but OK.)

Convert base64-encoded signature back to binary form:

$ openssl base64 -d -in msg.txt.sig.asc -out msg.txt.sig

Verify it all:

$ openssl dgst -keyform pem -verify pubkey.pem -signature msg.txt.sig msg.txt
Verified OK

The second half of this recipe can be done by anyone -- my webserver's public key is easily obtainable.

The only problem to keep in mind is that Let's Encrypt certs are short-lived (3 months). Also, my private key may be stolen.

And now something SSH can't do. (Yet?)

Given that you can get my public key so easily, you can encrypt a message for me.

% cat to_dennis.txt
Hi, Dennis. The Secret Meetup is today, at The Venue 123. 1800 UTC.
% openssl pkeyutl -in to_dennis.txt -out to_dennis.enc -pubin -inkey pubkey.pem -encrypt

The resulting to_dennis.enc is in binary form; you can convert it to base64 or not. Send me that file somehow.

Now I can decrypt it using my webserver's private key:

root@vps7:/etc/letsencrypt/live/yurichev.com# openssl pkeyutl -in to_dennis.enc -out to_dennis.txt -inkey privkey.pem -decrypt
root@vps7:/etc/letsencrypt/live/yurichev.com# cat to_dennis.txt
Hi, Dennis. The Secret Meetup is today, at The Venue 123. 1800 UTC.

Anyone can encrypt a message to me, so easily. I didn't even distribute my public key. It's only on my webserver. And only I can decrypt the message -- well, if my private key wasn't stolen.

This may be handy if you need to say something to someone urgently, in secure form, and you have no time to set up PGP or another secure messenger. And/or you send a cold/unsolicited email, out of the blue. This is possible if your recipient has a running HTTPS server, nothing else.

You can even publish this encrypted message on your website, soc.network profile, etc. Only the recipient can decrypt it. (Well, again, if his/her private key is not stolen in some (distant) future.)

IOW, anyone with HTTPS server can receive encrypted messages from strangers. Without any additional key setup/exchange.
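Both halves of the recipe rest on the same RSA key pair. Here is a toy Python sketch of the idea, assuming textbook RSA with a tiny key built from two Mersenne primes. It has no padding (unlike what openssl does) and is far too weak for real use; all names in it are made up for illustration.

```python
import hashlib

# Toy RSA key built from two Mersenne primes: illustration only, NOT secure.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)

def digest_as_int(message: bytes) -> int:
    # SHA-256 digest reduced modulo N, so that it fits the toy modulus.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    # Only the private-key holder can compute this.
    return pow(digest_as_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    # Anyone with the public key (N, E) can check it.
    return pow(signature, E, N) == digest_as_int(message)

def encrypt(message: bytes) -> int:
    # Anyone with the public key can encrypt a (short) message.
    m = int.from_bytes(message, "big")
    assert m < N, "message too long for this toy modulus"
    return pow(m, E, N)

def decrypt(ciphertext: int) -> bytes:
    # Only the private-key holder can decrypt.
    m = pow(ciphertext, D, N)
    return m.to_bytes((m.bit_length() + 7) // 8, "big")

sig = sign(b"Hello world.")
print(verify(b"Hello world.", sig))
print(decrypt(encrypt(b"Hi, Dennis.")))
```

Signing uses the private exponent and verification the public one; encryption is the mirror image. That is why a single published certificate is enough for both directions.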

UPD: 20240526 16:00:45 UTC:

You can post your encrypted and base64-encoded message to, say, a pastebin-like service. To get the webserver owner's attention, you can run curl or something like that many times per second against his/her webserver, so he/she will notice bloated log files.

It's like a small DoS attack.

The URL of the paste can be passed in the Referer header: curl --referer.

UPD: 20240528 11:48:10 UTC: Further work -- check server's certificate validity, expiration...

UPD: 20240529 13:33:09 EEST: Next part: Encrypt/decrypt with SSH keys

]]>Now with ECC (elliptic curve cryptography) support.

First, the TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 ciphersuite: ECDHE instead of DHE. This is the Diffie-Hellman algorithm somewhat 'ported' to EC. Thanks to the cryptography.io Python module, ECDH(E) support is simple. See: generate_premaster_and_client_key_exchange_ECDHE(). So far, we don't want to delve into ECC math, because it's a bit complicated. At this point we can just think about ECDHE as an 'upgraded' version of DHE. 'Points' are exchanged between server and client, and a shared secret is generated.
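To keep the 'exchange public values, derive the same secret' idea in mind without any curve math, here is a minimal sketch of classic (non-EC) Diffie-Hellman in Python. The parameters are toy ones chosen for this example (a Mersenne prime and a small base), not anything used by ToyTLS. In ECDHE the exchanged values are curve points rather than integers.

```python
import secrets

# Toy parameters: illustration only. Real DHE uses standardized groups of 2048 bits or more.
P = 2**127 - 1      # a Mersenne prime
G = 3

def keypair():
    private = secrets.randbelow(P - 3) + 2      # random value in [2, P-2]
    public = pow(G, private, P)                 # this is what goes over the wire
    return private, public

def shared_secret(my_private, their_public):
    return pow(their_public, my_private, P)

client_private, client_public = keypair()
server_private, server_public = keypair()

# Both sides arrive at the same value without ever sending their private parts.
print(shared_secret(client_private, server_public) ==
      shared_secret(server_private, client_public))
```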

Only one curve is supported here: SECP256R1, but others can be added as well.

Also, these ciphersuites are added: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA, TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256.

The ECDHE parameters will be signed with the ECDSA algorithm instead of RSA. That implies that the server has an ECC public key instead of an RSA public key.

Interestingly, some servers support two certificates. Let's look at google.com, in a report made by testssl.sh:

% ./testssl.sh --color 0 google.com
...
Server Certificate #1
Signature Algorithm SHA256 with RSA
Server key size RSA 2048 bits (exponent is 65537)
Server key usage Digital Signature, Key Encipherment
...
Server Certificate #2
Signature Algorithm SHA256 with RSA
Server key size EC 256 bits (curve P-256)
Server key usage Digital Signature

I added a feature to my ToyTLS so that it saves received certificates as is, in binary form. Which is, in fact, DER.

Let's connect as if client supports only RSA signatures:

% ./toytls_v4.py -save_certs -host google.com -cipher TLS_RSA_WITH_AES_128_CBC_SHA
...
Connecting to google.com:443
google.com_TLS_RSA_WITH_AES_128_CBC_SHA_cert_1.der written
Warning: mask in wildcard certs is not handled yet: *.google.com, but cert is checked anyway
google.com_TLS_RSA_WITH_AES_128_CBC_SHA_cert_2.der written
google.com_TLS_RSA_WITH_AES_128_CBC_SHA_cert_3.der written
...
% openssl x509 -inform der -noout -text -in google.com_TLS_RSA_WITH_AES_128_CBC_SHA_cert_1.der
Certificate:
Data:
...
Subject: CN = *.google.com
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
Modulus:
00:cc:4d:22:f1:14:c2:12:e6:1c:2d:97:93:c7:3f:
0c:4c:44:b3:b5:86:e8:d3:64:d7:b5:1b:87:b5:53:
f5:81:98:21:72:03:15:d4:35:1a:21:e1:63:9f:bb:
5b:c7:4e:3f:3f:d6:d6:27:df:cf:bd:0e:b2:b5:78:
...

Now what if my client pretends it only supports ECC?

% ./toytls_v4.py -save_certs -host google.com -cipher TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA
% openssl x509 -inform der -noout -text -in google.com_TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA_cert_1.der
Certificate:
...
Subject: CN = *.google.com
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit)
pub:
04:e0:26:2a:c5:37:13:59:81:1c:e4:4b:eb:97:03:
0d:59:d2:51:ea:a5:09:0f:50:91:d1:66:0f:4b:a0:
43:1f:92:3e:9f:89:1f:15:38:c2:4a:57:30:ea:bb:
52:3d:21:a2:9a:a8:bb:fe:0d:c9:7b:2b:d7:a1:4d:
9c:7e:dc:f5:1e
ASN1 OID: prime256v1
NIST CURVE: P-256
...

Indeed, the client sends the ciphersuites it supports in the ClientHello message, and the server decides what to do. In our case, google.com has two certificates, and which one is sent depends on what the client supports.
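The server-side decision can be imagined roughly like this. It is only a hypothetical Python sketch: real servers have their own preference rules, and the function and file names below are invented for this example. It relies only on the TLS 1.2 naming convention, where the part before _WITH_ ends with the authentication algorithm.

```python
def auth_algorithm(ciphersuite):
    """Return the authentication algorithm implied by a TLS 1.2 ciphersuite name."""
    key_exchange = ciphersuite.split("_WITH_")[0]    # e.g. "TLS_ECDHE_ECDSA"
    return key_exchange.split("_")[-1]               # e.g. "ECDSA" or "RSA"

def choose_certificate(client_ciphersuites, server_certs):
    """Pick the first client ciphersuite for which the server holds a matching certificate."""
    for suite in client_ciphersuites:
        algo = auth_algorithm(suite)
        if algo in server_certs:
            return suite, server_certs[algo]
    return None     # no common ground: the handshake would fail

# Hypothetical certificate files held by the server.
certs = {"RSA": "rsa_cert.der", "ECDSA": "ecdsa_cert.der"}

print(choose_certificate(["TLS_RSA_WITH_AES_128_CBC_SHA"], certs))
print(choose_certificate(["TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA"], certs))
```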

By the way, the second and third certificates in the chain are RSA certificates, always the same. The second is used to sign the first certificate, either RSA or ECC. (The third is used to sign the second.)

Thanks to the cryptography.io module, the certificate chain can be verified using EC functions, see chk_cert_2().

And the signature of the ECDHE information is verified in unpack_SSL3_MT_SERVER_KEY_EXCHANGE_chk_signature().

But again, a client may skip this. In fact, early versions of ToyTLS didn't check any signatures, yet they could download an HTML page successfully. This is bad.

In other words, a significant part of the security burden lies on the client side. The client must check the signatures in the certificate chain and the signature of the ECDHE info.

Bottom line: this toy(-ish) TLS client can connect successfully to all ~1k top/popular/busiest websites, as of May 2024. It's under ~2k SLOC of Python, but relies heavily on the cryptography.io module.

The files and the patched libressl-3.8.1 I used (with additional debugging/dumping printf statements, AKA "debugging breadcrumbs").

]]>Writing x=x+1 in code is too oldschool and verbose, but Python doesn't support C-style increment/decrement. It however supports statements like x+=1.

The following code can be used to find such statements and maybe even replace them:

#!/usr/bin/env python3

import re, sys

for line in sys.stdin:
    line=line.rstrip()
    # \b: the name must start at a word boundary, so "xy=y+1" is not reported
    m=re.match(r"(.*)\b([a-z0-9_]+)=\2\+(.*)", line)
    if m!=None:
        print ("replace", m[0])
        print ("to ", m[1]+m[2]+"+="+m[3])

Real data from my ToyTLS code:

replace s=s+": "+alert_type_s[err]
to s+=": "+alert_type_s[err]
replace cur_seq_n_to_serv=cur_seq_n_to_serv+1
to cur_seq_n_to_serv+=1
replace buf=buf+got
to buf+=got
replace all_certs=all_certs+certs
to all_certs+=certs
replace tmp=tmp+struct.pack("<BB", H, L)
to tmp+=struct.pack("<BB", H, L)
replace tmp=tmp+b"\x00\x0b\x00\x04\x03\x00\x01\x02"
to tmp+=b"\x00\x0b\x00\x04\x03\x00\x01\x02"

Vim/Emacs fans can easily create a script for that, I suppose.
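For the 'maybe even replace them' part, a substitution with a backreference may do. This is a minimal Python sketch, not the script used for the output above; PATTERN and to_augmented are names invented here. The lookbehind is an extra guard so that the match cannot start in the middle of a longer name or right after 'obj.'.

```python
import re

# (?<![\w.]) : the name must not be preceded by a word character or a dot.
# \1         : the same name must appear again right after "=".
PATTERN = re.compile(r"(?<![\w.])([A-Za-z_][A-Za-z0-9_]*)=\1\+")

def to_augmented(line):
    """Rewrite 'x=x+...' as 'x+=...'; leave everything else untouched."""
    return PATTERN.sub(r"\1+=", line)

for line in ["idx=idx+1", "buf=buf+cookie", "xy=y+1", "a.x=x+1"]:
    print(line, "->", to_augmented(line))
```

It is still purely textual: something like a keyword argument f(x=x+1) would be rewritten too, so the result needs a review by eye.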

Update 20240519 18:50:10 EEST:

Vim search string:

/\([a-z0-9_]\+\)=\1+

Using grep against my ToySSH v4:

% cat toyssh_v4.py | grep '\([a-zA-Z0-9_]\+\)=\1+'
...
KEX_ALGOS=KEX_ALGOS+"diffie-hellman-group1-sha1"
CIPHER_ALGOS=CIPHER_ALGOS+"none"
MAC_ALGOS=MAC_ALGOS+"hmac-sha2-256,"
SERVER_HOST_ALGOS=SERVER_HOST_ALGOS+"rsa-sha2-256,"
SERVER_HOST_ALGOS=SERVER_HOST_ALGOS+"ssh-dss"
idx=idx+1
idx=idx+2
pkt_len=pkt_len+MAC_SIZE
recv_seqno=recv_seqno+1
serv_to_client_ctr=serv_to_client_ctr+blocks_total
recv_seqno=recv_seqno+1
send_seqno=send_seqno+1
idx=idx+0x10
idx=idx+1
padlen=padlen+16
buf=buf+cookie
buf=buf+pack_str(encryption_algorithms_server_to_client)
...

(UPD: 20240617 15:25:12 CEST: As seen on reddit.)

]]>Funny thing, this is what I've found in libressl:

/* AEAD fixed nonce length. */
if (aead == EVP_aead_aes_128_gcm() || aead == EVP_aead_aes_256_gcm())
        iv_len = 4;
else if (aead == EVP_aead_chacha20_poly1305())
        iv_len = 12;
else
        goto err;

( libressl-3.8.1/ssl/tls12_key_schedule.c )

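Usually you don't do pointer arithmetic on pointers to functions (senseless?), but you can compare them and this may be quite practical.

Hence, only the equality operation is used on such pointers. (Strictly speaking, EVP_aead_aes_128_gcm() and friends are calls here, each returning a pointer to a constant EVP_AEAD structure, so it is those returned pointers that are compared.)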

]]>This is the update of my ToyTLS client in Python that now supports GCM mode. Ciphersuites now supported are: TLS_RSA_WITH_AES_256_GCM_SHA384, TLS_DHE_RSA_WITH_AES_128_GCM_SHA256, TLS_DHE_RSA_WITH_AES_256_GCM_SHA384.

A couple of words about GCM mode (Galois/Counter Mode).

Take a look at the diagram on Wikipedia. The 'mult' operation is a GF2 operation, yes. But it can be replaced with a regular multiplication operation without loss of security. GF2 multiplication is a close cousin of the CRC function (CRC is division with remainder). The CRC function could use regular division, but GF2 operations are just faster and more efficient. (I wrote about GF2 and CRC in my math notes.)

In short -- you can think that CRC is actually a division with remainder. And the 'mult' operation in GCM is actually a regular multiplication. This simplification makes things easier to understand.
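To see how close the two multiplications are, here is a small Python sketch of carry-less (GF2 polynomial) multiplication next to the regular one. The function name clmul is made up for this example. GCM additionally reduces the product modulo a fixed polynomial of degree 128, which this sketch leaves out.

```python
def clmul(a, b):
    """Carry-less multiplication: the same shift-and-add scheme as schoolbook
    binary multiplication, but partial products are combined with XOR, not +."""
    result = 0
    while b:
        if b & 1:
            result ^= a     # XOR instead of addition: no carries propagate
        a <<= 1
        b >>= 1
    return result

for a, b in [(3, 3), (5, 3), (7, 7)]:
    print(a, b, "regular:", a * b, "carry-less:", clmul(a, b))
```

When no carries occur (5 and 3, for example), both give the same result; otherwise they differ only in how the partial products are summed.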

Another property of CTR (counter mode) and GCM is involution. It's a property of a function: if applied twice, it returns the initial value. Like, -(-x)=x. Or boolean: NOT(NOT(x))=x. If the AES-CTR operation is applied twice (with the same key and counter), the plain text will be returned.

During my work on ToyTLS, I mistakenly used the GCM encryption operation instead of decryption and got the plain text, but with an attached tag/MAC (GCM encryption adds the tag/MAC to the end of the ciphertext). (GCM is like CTR but with simultaneous tag/MAC generation.)

The files and the patched libressl-3.8.1 I used (with additional debugging/dumping printf statements, AKA "debugging breadcrumbs").
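The involution comes from the XOR with a keystream, not from AES itself. Below is a minimal Python sketch of a counter mode in which SHA-256 stands in for the block cipher. That substitution is an assumption made to keep the example within the standard library; it is not AES-CTR, and the function names are invented.

```python
import hashlib

def keystream(key, nonce, length):
    """Generate `length` bytes by hashing key + nonce + block counter (a stand-in for AES)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def ctr_apply(key, nonce, data):
    """'Encrypt' and 'decrypt' are the same operation: XOR the data with the keystream."""
    ks = keystream(key, nonce, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

key, nonce = b"k" * 16, b"n" * 8
ciphertext = ctr_apply(key, nonce, b"attack at dawn")
print(ciphertext.hex())
print(ctr_apply(key, nonce, ciphertext))    # applying it again restores the plain text
```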

]]>