[RevEng][CS] Fuzzy topological sorting

(It's assumed that the reader is familiar with topological sorting. (Re-)read the "Dependency graphs and topological sorting" in my book. Also with graphs, DAGs, callgraphs, simulated annealing.)

It's a problem with topological sorting --- no cycles in DAG allowed.

I can fix this. I'll find an order for some vertices so that the number of backward references would be minimized and forward references --- maximized. That would be the fitness function. I use simulated annealing, but many other similar algorithms would also work, I believe.

Now I take an old Linux 0.99.15 (modern Linux is too big for that). I find callgraph for it and find a fuzzy topological order.

That results in such list:

unzip
isofs_read_super
seagate_st0x_detect
ext_link
leak_check_malloc
sys_gettimeofday
sysv_file_write
main
opl3_close
con_write

...

OUTB
inb
put_fs_long
outw
NCR5380_reselect
EXT2_INODES_PER_GROUP
sound_getconf
; fitness at end:
; fitness_fn() fwd= 11494
; fitness_fn() back=50

Low-level functions on bottom, high-level functions on top. Funny thing - inb() and outw() (direct access to I/O ports) are low-level functions and con_write() is a high-level function. Indeed, Linus started kernel as a terminal emulator.

From a reverse engineer's point of view, we can quickly see the level of a function. How high or low it is. Level of abstraction. That maybe very useful. Because many reverse engineers often try to understand, what this or that function does --- is it on top of things, or small low-level utility?

Now also Boolector SMT solver:

translate_formula
parse_smt2_parser
btor_proputils_select_move_prop
apply_add_add_4_eq
btor_node_create_bv_const
set_up_dual_and_collect

...

getc
abs
BTOR_COUNT_STACK
ccadical_melt
BTOR_SIZE_STACK
ccadical_add
mpz_get_str
; fitness at end:
; fitness_fn() fwd= 17448
; fitness_fn() back=101

Here, parse_smt2_parser() is a very high-level function and ccadical_add() is one of at the very low - indeed, this is the function that add term/literal to Cadical SAT solver.

All files, including the utility to find order, in C++.

Further work: use this to sort basic blocks in function, to map better visual layout to execution timeline.

(the post first published at 20260109.)


List of my other blog posts. Subscribe to my news feed,
If you enjoy my work, you can support it on patreon.
Some time ago (before 24-Mar-2025) there was Disqus JS script for comments. I dropped it --- it was so motley, distracting, animated, with too much ads. I never liked it. Also, comments din't appeared correctly (Disqus was buggy). Also, my blog is too chamberlike --- not many people write comments here. So I decided to switch to the model I once had at least in 2020 --- send me your comments by email (don't forget to include URL to this blog post) and I will copy&paste it here manually.
Let's party like it's ~1993-1996, in this ultimate, radical and uncompromisingly primitive pre-web1.0-style blog and website. This website is best viewed under lynx/links/elinks/w3m.
Or use my zulip for feedback.