Today I created a wordcloud for each book using these plain text files. Wolfram Mathematica can be run from the command line:
% wolframscript -file create_wordcloud.m
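(* Collect all text files under the directory, recursively *)
fnames = FileNames["*.my-txt", "/home/dennis/texts", Infinity];

(* Render a wordcloud for one text file and save it as PNG *)
processReal[fname_, fnamePNG_] := Block[{txt, cl},
  Print[fname];
  txt = Import[fname, "Text"];
  cl = WordCloud[txt];
  Export[fnamePNG, cl]]

(* Skip files whose wordcloud already exists, so the script can be rerun *)
process[fname_] := Block[{fnamePNG = fname <> ".wordcloud.png"},
  If[FileExistsQ[fnamePNG],
    Print["already exists: " <> fnamePNG],
    processReal[fname, fnamePNG]]]

Map[process, fnames]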
See also how the WordCloud function can be customized.
What nice results:
As they say, you can't judge a book by its cover. But you can definitely say something about its contents by its wordcloud.
Drawback: generation is slow. Of course, Mathematica has to process the full text of every book. And of course, other libraries can be used for this.
N.B. Wolfram Mathematica can eat all your memory and crash with:
No more memory available.
Mathematica kernel has shut down.
Try quitting other applications and then retry.
malloc_consolidate(): invalid chunk size
Aborted
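So use ulimit, like: ulimit -Sv 12000000 (a soft limit of about 12GB of virtual memory; the value is in KiB). Mathematica will stop anyway when it hits the limit, so just restart the script: it skips the wordclouds that already exist.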
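A minimal sketch of how I would wrap this, so the limit applies only to the Mathematica run and not to the whole terminal session. The helper name run_limited is my own invention, not part of any tool. It assumes a shell whose ulimit supports -v (bash and dash on Linux do).

```shell
# Hypothetical helper: run a command under a soft virtual-memory limit.
# $1 is the limit in KiB (the unit ulimit -v uses); the rest is the command.
run_limited() {
    limit_kib=$1
    shift
    (
        # The subshell keeps the limit from leaking into the calling shell.
        ulimit -Sv "$limit_kib" || exit 1
        "$@"
    )
}

# Usage (commented out, since it needs wolframscript installed):
# run_limited 12000000 wolframscript -file create_wordcloud.m
```

If the kernel dies, run the same command again. The script checks FileExistsQ for each PNG, so it continues with the books that are still missing.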