The dataset was generated by extracting the titles and abstracts of the 18,000 most-cited science articles according to Web of Science. The titles and abstracts for each subject were then fed into jsvine's Markov chain generator.
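A word-level Markov chain records which words followed each run of consecutive words (the "state") in the training text. It then generates new text by repeatedly sampling a successor of the current state. What follows is a minimal, standard-library illustration of that idea, not jsvine's implementation. The function names `build_chain` and `generate`, the sentinel tokens, and the default state size of 2 are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical sentinel tokens marking the start and end of a sentence.
BEGIN, END = "__BEGIN__", "__END__"


def build_chain(sentences, state_size=2):
    """Map each state (a tuple of state_size words) to every word seen after it.

    Successors are stored with repetition, so later sampling is
    weighted by how often each continuation occurred.
    """
    chain = defaultdict(list)
    for sentence in sentences:
        words = [BEGIN] * state_size + sentence.split() + [END]
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            chain[state].append(words[i + state_size])
    return dict(chain)


def generate(chain, state_size=2, rng=None, max_words=50):
    """Walk the chain from the start state until END or max_words words."""
    rng = rng or random.Random()
    state = (BEGIN,) * state_size
    output = []
    for _ in range(max_words):
        # Every reachable state was seen in training with at least one successor.
        next_word = rng.choice(chain[state])
        if next_word == END:
            break
        output.append(next_word)
        # Slide the window: drop the oldest word, append the new one.
        state = state[1:] + (next_word,)
    return " ".join(output)


# Usage with two made-up titles (illustrative input only).
titles = [
    "Expression of human protein receptors in cell lines",
    "Expression of human genes in cell culture",
]
chain = build_chain(titles)
print(generate(chain, rng=random.Random(0)))
```

Because `rng.choice` draws from a list that keeps duplicates, frequent phrasings in the source titles dominate the output. A larger `state_size` produces text closer to the originals. A smaller one produces stranger mixtures.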

I also created a word cloud from these titles (which suggests it pays to be a cell biologist who studies the expression of human protein receptors).
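Word clouds typically scale each word by how often it appears, after dropping very common function words. Here is a minimal sketch of that counting step. The short stopword list, the lowercase `[a-z]+` tokenization, and the name `word_frequencies` are assumptions for illustration, not the settings used for the cloud described here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real tools use much longer ones.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "by"}


def word_frequencies(titles, top_n=10):
    """Count non-stopword words across titles and return the top_n (word, count) pairs."""
    counts = Counter()
    for title in titles:
        # Lowercase and keep alphabetic runs only (so "T-cell" becomes "t", "cell").
        for word in re.findall(r"[a-z]+", title.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(top_n)
```

The resulting counts are what a word cloud renderer would map to font sizes.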

