[Wolfram] Wordclouds for your book library

I once wrote about how many benefits you get if you store a copy of each of your books in plain-text form.

Today I created a wordcloud for each book using these plain-text files. Wolfram Mathematica can be run from the command line:

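% wolframscript -file create_wordcloud.m

The create_wordcloud.m script:

(* Collect all plain-text copies of the books, recursing into subdirectories *)
fnames = FileNames["*.my-txt", "/home/dennis/texts", Infinity];

(* Render a wordcloud for one text file and save it as PNG *)
processReal[fname_, fnamePNG_] := Module[{txt, cl},
  Print[fname];
  txt = Import[fname, "Text"];
  cl = WordCloud[txt];
  Export[fnamePNG, cl]]

(* Skip books that already have a wordcloud, so the script can be restarted *)
process[fname_] := Module[
  {fnamePNG = fname <> ".wordcloud.png"},
  If[FileExistsQ[fnamePNG], Print["already exists: " <> fnamePNG],
   processReal[fname, fnamePNG]]]

Scan[process, fnames]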

See also how the WordCloud function can be customized.

What nice results:

Bruce Schneier -- Applied Cryptography: Protocols, Algorithms, and Source Code in C

Bertrand Russell - A History of Western Philosophy

Silvanus Phillips Thompson -- Calculus Made Easy

William Chester Jordan -- Europe in the High Middle Ages: The Penguin History of Europe

Applied Cryptanalysis - Breaking Ciphers in the Real World - M. Stamp, R. Low (Wiley, 2007)

Charles Bell, Mats Kindahl, Lars Thalmann - MySQL High Availability: Tools for Building Robust Data Centers

Johnson, Edward D. - The Handbook of Good English, Revised (1991)

Network Security with OpenSSL (2002)

Kevin David Mitnick & William L. Simon - The Art of Deception: Controlling the Human Element of Security (2002)

Discussion

As they say, you can't judge a book by its cover. But you can definitely say something about its contents from its wordcloud.

Drawback: generation is slow, since Mathematica has to process the full text of every book. Of course, other libraries can be used for this as well.

N.B. Wolfram Mathematica can eat all your memory and crash with:

No more memory available.
Mathematica kernel has shut down.
Try quitting other applications and then retry.
malloc_consolidate(): invalid chunk size
Aborted

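So use ulimit, for example: ulimit -Sv 12000000 (the value is in kilobytes, so about 12GB of virtual memory). Mathematica will still stop when it hits the limit, so just restart the script; books that already have a wordcloud are skipped.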
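
The limit-and-restart routine could be wrapped in a small shell function, roughly like this. It is only a sketch: run_limited and its arguments are hypothetical names of my own, not part of wolframscript, and it assumes a shell whose ulimit supports -v (bash and dash do). The limit is set inside a subshell so it doesn't stick to your interactive session, and the number of restarts is capped in case one particular book keeps exhausting memory.

```sh
#!/bin/sh
# Sketch: run a command under a soft virtual-memory cap, restarting on failure.
# run_limited is a hypothetical helper name, not an existing tool.

# usage: run_limited LIMIT_KB MAX_ATTEMPTS COMMAND [ARGS...]
run_limited() {
    limit_kb=$1
    max_attempts=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # The subshell keeps the limit from leaking into the calling shell;
        # exec replaces the subshell with the command itself.
        if ( ulimit -Sv "$limit_kb" && exec "$@" ); then
            return 0
        fi
        echo "attempt $attempt failed, restarting" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (ulimit -v counts kilobytes, so this is about 12GB):
# run_limited 12000000 5 wolframscript -file create_wordcloud.m
```

Since create_wordcloud.m skips every book whose PNG already exists, each restart doesn't redo the finished ones.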

