[Pentesting] Passwords from Wikipedia

Just a quick idea: extract all possible words from English Wikipedia (including 'talk' pages).

The list will include the names of obscure provincial handball players, etc.

The resulting word list has 13,029,545 entries, which is, of course, orders of magnitude smaller than the set of all possible letter combinations. That would be easy work for the hashcat password cracker.
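A minimal sketch of the word-extraction step, assuming the Wikipedia dump has already been converted from XML/wikitext to plain text (that conversion is not shown, and this is not necessarily how the list above was built). Here a "word" is, roughly, a run of Unicode letters; the function name `extract_words` is hypothetical.

```python
#!/usr/bin/env python3
# Sketch: build a deduplicated, lowercased word list from plain text.
# Assumption: input is plain text already extracted from a Wikipedia dump.
import re
import sys

# Word characters minus digits and underscores: roughly, runs of letters.
WORD_RE = re.compile(r"[^\W\d_]+")

def extract_words(lines):
    """Return the set of lowercased words found in an iterable of text lines."""
    words = set()
    for line in lines:
        words.update(w.lower() for w in WORD_RE.findall(line))
    return words

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 extract_words.py wiki_plaintext.txt > wordlist.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        for w in sorted(extract_words(f)):
            print(w)
```

Using a set keeps memory proportional to the number of distinct words, not to the size of the dump.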


The same can be done for Russian Wikipedia, but transliteration rules have to be applied.

This can help with some hard Russian names like 'щербицкий':

shcherbickij shcherbickiy shcherbiczkij shcherbiczkiy shcherbitskij
shcherbitskiy shcherbitckij shcherbitckiy shchyerbickij shchyerbickiy
shchyerbiczkij shchyerbiczkiy shchyerbitskij shchyerbitskiy shchyerbitckij
shchyerbitckiy shherbickij shherbickiy shherbiczkij shherbiczkiy shherbitskij
shherbitskiy shherbitckij shherbitckiy shhyerbickij shhyerbickiy shhyerbiczkij
shhyerbiczkiy shhyerbitskij shhyerbitskiy shhyerbitckij shhyerbitckiy
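Why exactly 32? The number of spellings is the product of the number of options for each letter. A small sketch, using only the letters of 'щербицкий' taken from the Russian table in the script later in this post (`count_variants` is a hypothetical helper name):

```python
import math

# Subset of the Russian transliteration table: just the letters of 'щербицкий'.
translit_subset = {
    'щ': ['shch', 'shh'],
    'е': ['e', 'ye'],
    'р': ['r'],
    'б': ['b'],
    'и': ['i'],
    'ц': ['c', 'cz', 'ts', 'tc'],
    'к': ['k'],
    'й': ['j', 'y'],
}

def count_variants(word, table):
    """Number of spelling combinations: product of per-letter option counts."""
    return math.prod(len(table[c]) for c in word)

print(count_variants('щербицкий', translit_subset))  # 2*2*4*2 = 32
```

This counts combinations, so for other words it can overcount distinct strings if two different choices happen to produce the same spelling. Still, it's a cheap way to estimate how much a word list will grow before generating it.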

Also, it's not uncommon for people to type a Russian word as a password with the keyboard layout switched to Latin, e.g. 'пароль' -> 'gfhjkm'.
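Such "wrong layout" variants are easy to generate with a character translation table. A minimal sketch, assuming the standard Russian ЙЦУКЕН layout and US QWERTY, lowercase only; `wrong_layout` is a hypothetical name:

```python
# Russian ЙЦУКЕН keys and the US QWERTY keys in the same physical positions.
RU_KEYS = "йцукенгшщзхъфывапролджэячсмитьбюё"
EN_KEYS = "qwertyuiop[]asdfghjkl;'zxcvbnm,.`"

# str.maketrans() requires both strings to have the same length.
RU_TO_EN = str.maketrans(RU_KEYS, EN_KEYS)

def wrong_layout(word):
    """What ends up typed if `word` is entered with the Latin layout active."""
    return word.lower().translate(RU_TO_EN)

print(wrong_layout("пароль"))  # gfhjkm
print(wrong_layout("привет"))  # ghbdtn
```

Characters not in the table (digits, Latin letters) pass through unchanged, so mixed strings are handled too.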


Also, a list of Ukrainian words from Ukrainian Wikipedia, transliterated.

Let's test it with a hard word, 'запоріжжя':

zaporizhzhya zaporizhzhia zaporizhzhja zaporizhzha zaporizhshya zaporizhshia
zaporizhshja zaporizhsha zaporizhjya zaporizhjia zaporizhjja zaporizhja
zaporizhzjya zaporizhzjia zaporizhzjja zaporizhzja zaporishzhya zaporishzhia
zaporishzhja zaporishzha zaporishshya zaporishshia zaporishshja zaporishsha
zaporishjya zaporishjia zaporishjja zaporishja zaporishzjya zaporishzjia
zaporishzjja zaporishzja zaporijzhya zaporijzhia zaporijzhja zaporijzha
zaporijshya zaporijshia zaporijshja zaporijsha zaporijjya zaporijjia zaporijjja
zaporijja zaporijzjya zaporijzjia zaporijzjja zaporijzja zaporizjzhya
zaporizjzhia zaporizjzhja zaporizjzha zaporizjshya zaporizjshia zaporizjshja
zaporizjsha zaporizjjya zaporizjjia zaporizjjja zaporizjja zaporizjzjya
zaporizjzjia zaporizjzjja zaporizjzja saporizhzhya saporizhzhia saporizhzhja
saporizhzha saporizhshya saporizhshia saporizhshja saporizhsha saporizhjya
saporizhjia saporizhjja saporizhja saporizhzjya saporizhzjia saporizhzjja
saporizhzja saporishzhya saporishzhia saporishzhja saporishzha saporishshya
saporishshia saporishshja saporishsha saporishjya saporishjia saporishjja
saporishja saporishzjya saporishzjia saporishzjja saporishzja saporijzhya
saporijzhia saporijzhja saporijzha saporijshya saporijshia saporijshja
saporijsha saporijjya saporijjia saporijjja saporijja saporijzjya saporijzjia
saporijzjja saporijzja saporizjzhya saporizjzhia saporizjzhja saporizjzha
saporizjshya saporizjshia saporizjshja saporizjsha saporizjjya saporizjjia
saporizjjja saporizjja saporizjzjya saporizjzjia saporizjzjja saporizjzja
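The Ukrainian table itself isn't shown in this post. A minimal sketch, using a partial table that I reconstructed from the output above (an assumption, not necessarily the actual table), with itertools.product doing the enumeration; `variants` and `translit_UK_partial` are hypothetical names:

```python
import itertools

# Partial Ukrainian table, reconstructed from the 'запоріжжя' output above
# (assumption: the real table may contain more options per letter).
translit_UK_partial = {
    'з': ['z', 's'],
    'а': ['a'],
    'п': ['p'],
    'о': ['o'],
    'р': ['r'],
    'і': ['i'],
    'ж': ['zh', 'sh', 'j', 'zj'],
    'я': ['ya', 'ia', 'ja', 'a'],
}

def variants(word, table):
    """Yield every spelling: one option per letter, in every combination."""
    options = [table[c] for c in word]
    for combo in itertools.product(*options):
        yield "".join(combo)

result = list(variants("запоріжжя", translit_UK_partial))
print(len(result))  # 2 * 4 * 4 * 4 = 128
print(result[:4])   # ['zaporizhzhya', 'zaporizhzhia', 'zaporizhzhja', 'zaporizhzha']
```

With this table the order matches the list above: the last letter varies fastest, and the 'z...' spellings come before the 's...' ones.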

Also, while coding the transliteration utility, I came across an interesting use case for Python's itertools.

This is how I solve the transliteration problem:

#!/usr/bin/env python3

import itertools

# Each Cyrillic letter maps to a list of its possible Latin spellings, based on:
# https://en.wikipedia.org/wiki/Romanization_of_Russian
translit_RU={'а': ['a'],
'б': ['b'],
'в': ['v'],
'г': ['g'],
'д': ['d'],
'е': ['e','ye'],
'ё': ['jo','yo','ye'],
'ж': ['zh'],
'з': ['z'],
'и': ['i'],
'й': ['j','y'],
'к': ['k'],
'л': ['l'],
'м': ['m'],
'н': ['n'],
'о': ['o'],
'п': ['p'],
'р': ['r'],
'с': ['s'],
'т': ['t'],
'у': ['u'],
'ф': ['f'],
'х': ['x','h','kh','ch'],
'ц': ['c','cz','ts','tc'],
'ч': ['ch'],
'ш': ['sh'],
'щ': ['shch','shh'],
'ъ': ['\'','ie'],
'ы': ['y','ui'],
'ь': ['\'', ''],
'э': ['eh','e'],
'ю': ['yu','ju','iu'],
'я': ['ya','ja','ia']}

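# the word to transliterate
word="закоренелый"

# for each letter, the list of its possible Latin spellings
tmp=[translit_RU[c] for c in word]

print(tmp)

# itertools.product() picks one spelling per letter, in every combination
# (the rightmost letter varies fastest)
for q in itertools.product(*tmp):
    print("".join(q))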

And the result:

[['z'], ['a'], ['k'], ['o'], ['r'], ['e', 'ye'], ['n'], ['e', 'ye'], ['l'], ['y', 'ui'], ['j', 'y']]
zakorenelyj
zakorenelyy
zakoreneluij
zakoreneluiy
zakorenyelyj
zakorenyelyy
zakorenyeluij
zakorenyeluiy
zakoryenelyj
zakoryenelyy
zakoryeneluij
zakoryeneluiy
zakoryenyelyj
zakoryenyelyy
zakoryenyeluij
zakoryenyeluiy

Can you make it shorter? Maybe, but eventually you'll end up reinventing itertools.product().
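For comparison, here is roughly what reinventing it looks like: a minimal recursive sketch (`my_product` is a hypothetical name, not a library function). It assumes each pool is re-iterable, like a list; the real itertools.product() copies its inputs into tuples first, so it also works with one-shot iterators.

```python
import itertools

def my_product(*pools):
    """Hand-rolled Cartesian product: same order as itertools.product,
    i.e. the rightmost pool varies fastest. Pools must be re-iterable."""
    if not pools:
        yield ()  # the product of zero pools is a single empty tuple
        return
    first, rest = pools[0], pools[1:]
    for item in first:
        for tail in my_product(*rest):
            yield (item,) + tail

tmp = [['z'], ['a'], ['e', 'ye'], ['y', 'ui']]
assert list(my_product(*tmp)) == list(itertools.product(*tmp))
for q in my_product(*tmp):
    print("".join(q))  # zaey, zaeui, zayey, zayeui (one per line)
```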


All the files.

(The post was first published on 20251121.)


List of my other blog posts.

Subscribe to my news feed,

Some time ago (before 24-Mar-2025) there was a Disqus JS script for comments. I dropped it: it was so motley, distracting and animated, with too many ads. I never liked it. Also, comments didn't appear correctly (Disqus was buggy). Also, my blog is too chamber-like: not many people write comments here. So I decided to switch to the model I already had at least back in 2020: send me your comments by email (don't forget to include the URL of this blog post) and I'll copy and paste them here manually.

Let's party like it's ~1993-1996, in this ultimate, radical and uncompromisingly primitive pre-web1.0-style blog and website.