SEO farms usually have random-looking domain names (indeed, randomly generated ones).
This is how they can be identified: such a domain name should have high entropy, as opposed to 'legal' domain names written in some natural language, which usually have lower entropy.
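Entropy of a randomly generated string of a-z characters, as measured by the ent utility (note that the second pipeline reads /dev/urandom again, so ent measures a new random string, not the one printed by the first command):
% cat /dev/urandom | tr -dc 'a-z' | fold -w 40 | head -n 1
rzatstkzmnbpsnqiwhruzkbvegljbrcnenxgineq
% cat /dev/urandom | tr -dc 'a-z' | fold -w 40 | head -n 1 | ent
Entropy = 4.192322 bits per byte.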
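The "bits per byte" figure ent prints is the Shannon entropy of the byte-value distribution, H = -Σ p_i log2 p_i, where p_i is the fraction of bytes equal to value i. Here is a minimal Python sketch of that calculation (the name byte_entropy is my own, not part of ent). One detail matters when reproducing the numbers in this note: echo and head -n 1 both leave a trailing newline, and ent counts it as one more byte.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte
    (the quantity ent reports as 'Entropy = ... bits per byte')."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # Keep the newline that `echo` appends, since ent sees it too.
    word = b"supercalifragilisticexpialidocious\n"
    print(f"Entropy = {byte_entropy(word):.6f} bits per byte.")
```

With the newline included this prints 3.738682 for supercalifragilisticexpialidocious, the same value ent gives; without the newline the result is about 3.656.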
And some weird words in English (or Latin):
% echo pneumonoultramicroscopicsilicovolcanoconiosis | ent
Entropy = 3.560437 bits per byte.
% echo supercalifragilisticexpialidocious | ent
Entropy = 3.738682 bits per byte.
And place names in languages I don't know (Māori and Welsh), written in Latin letters:
% echo Taumatawhakatangihangakoauauotamateaturipukakapikimaungahoronukupokaiwhenuakitanatahu | ent
Entropy = 3.696873 bits per byte.
% echo Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch | ent
Entropy = 4.022153 bits per byte.
Roughly 3.6 to 4.0 bits is a high level, but still lower than that of a random string (~4.2 bits).
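Why does the random string land near 4.2 bits and not higher? For 26 equally likely letters the ceiling is log2(26) ≈ 4.70 bits per character, but entropy estimated from a short sample is biased downward: in 40 characters some letters repeat and others never appear. The rough Python sketch below averages the estimate over many random strings of a given length. The seed, trial counts and function names are arbitrary choices of mine, and the trailing newline is ignored, so the figures will not match ent exactly.

```python
import math
import random
import string
from collections import Counter


def entropy(s: str) -> float:
    """Plug-in Shannon entropy of the character distribution, bits per character."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def mean_random_entropy(length: int, trials: int = 500, seed: int = 1) -> float:
    """Average entropy of `trials` random a-z strings of `length` characters."""
    rng = random.Random(seed)  # fixed seed, so runs are repeatable
    total = 0.0
    for _ in range(trials):
        s = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        total += entropy(s)
    return total / trials


if __name__ == "__main__":
    print(f"ceiling log2(26)       = {math.log2(26):.3f}")
    for length in (10, 40, 1000):
        print(f"mean over {length:4d} chars = {mean_random_entropy(length):.3f}")
```

The average climbs toward the ceiling as the strings get longer, which is also why short domain names are harder to judge by this measure.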
Any natural language, no matter how exotic, exposes some regularities (for example, some letters are much more frequent than others), which are (almost) absent in random strings. These regularities lower the final entropy level.
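Putting this together, a crude filter could score the longest label of a domain name and flag it when its entropy exceeds a threshold. This is a hypothetical Python sketch: looks_generated and the 4.1-bit threshold are my own assumptions, picked to sit between the natural-language and random figures measured above. Real domain labels are often much shorter than 40 characters, where the estimate is lower and noisier, so a practical threshold would likely have to depend on label length.

```python
import math
from collections import Counter

# Hypothetical cut-off (an assumption): between ~3.6-4.0 bits for the
# natural-language names above and ~4.2 bits for a 40-letter random string.
THRESHOLD_BITS = 4.1


def entropy(s: str) -> float:
    """Plug-in Shannon entropy of the character distribution, bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def looks_generated(domain: str, threshold: float = THRESHOLD_BITS) -> bool:
    """Hypothetical heuristic: flag a domain whose longest label has high entropy."""
    labels = [label for label in domain.lower().split(".") if label]
    if not labels:
        return False
    return entropy(max(labels, key=len)) > threshold


if __name__ == "__main__":
    for d in ("rzatstkzmnbpsnqiwhruzkbvegljbrcnenxgineq.com",
              "supercalifragilisticexpialidocious.com"):
        print(d, looks_generated(d))
```

The random label from the first example scores about 4.26 bits and is flagged; supercalifragilisticexpialidocious scores about 3.66 bits and is not.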