Clio Wired

An Introduction to History & New Media

Week 5

Text Mining and Topic Modeling

  1. Ted Underwood, “Theorizing Research Practices We Forgot to Theorize Twenty Years Ago,” Representations 127, 1 (Summer 2014): 64-72
  2. Ted Underwood, “Where to start with text mining” (2012)
  3. Frederick Gibbs and Daniel Cohen, “A Conversation with Data: Prospecting Victorian Words and Ideas,” Victorian Studies 54, 1 (Autumn 2011): 69-77
  4. Cameron Blevins, “Space, Nation, and the Triumph of Region: A View of the World from Houston,” Journal of American History (2014) 101 (1): 122-147
    1. Cameron Blevins, “Mining and Mapping the Production of Space” (2014)
  5. Robert Nelson, “Mining the Dispatch: Introduction”
  6. Micki Kaufman, “‘Everything on Paper Will Be Used Against Me’: Quantifying Kissinger” (2014) (Also in blog format, with video)
Discussion Leader: Stephen Rusiecki


Compare results from the four ngram viewers below, and then compare those results with what Voyant tells you about a text.

  • Google Ngram Viewer
  • Bookworm
    • NB: the Chronicling America Bookworm only works for single words, not phrases
  • NYT Chronicle
    Choose a word or phrase to compare over time. Open a browser window for each of the ngram viewers and enter your word or phrase in each. Then compare the results: what differences did you find, and what might have caused them?
  • Voyant Tools
  • Embedding your results in your blog post:
    • Embedding Google Ngram Viewer (click on the Embed Chart button – copy the html code – in your blog post, select the ‘Text’ tab and paste that code into the window)
    • Embedding Voyant (click on the export/disk icon for the Voyant window you want to embed, copy the code, and paste as above – see also documentation)
    • Bookworm: you can’t embed, but you can link (click on the large link icon at top left, then copy and paste) and download static images of the chart as PDF, JPEG, or PNG (the arrow button on the right, under the link etc. icon)
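
When comparing the viewers, it may help to have a rough picture of the kind of number an ngram chart plots: the share of all words in a given year that match your search term. The sketch below is only an illustration under stated assumptions. The toy corpus is invented, the function names are hypothetical, and none of the tools above is claimed to work exactly this way. It does show why two viewers can disagree: a different underlying corpus, or different rules for splitting text into words, changes both the count and the total it is divided by.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency(docs_by_year, word):
    """Return {year: share of that year's tokens that equal `word`}."""
    word = word.lower()
    result = {}
    for year, docs in sorted(docs_by_year.items()):
        counts = Counter()
        for doc in docs:
            counts.update(tokenize(doc))
        total = sum(counts.values())
        # Guard against years with no text at all.
        result[year] = counts[word] / total if total else 0.0
    return result


# Invented toy corpus (an assumption for illustration, not real data).
toy_corpus = {
    1860: ["The railroad reached the town.", "War news filled the paper."],
    1870: ["The railroad and the telegraph changed the railroad towns."],
}

freqs = relative_frequency(toy_corpus, "railroad")
for year, share in freqs.items():
    print(year, round(share, 3))
```

Swapping in a different set of documents for the same years will change the curve even though the search term stays the same, which is one thing to keep in mind when the viewers give different answers.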

1 comment for “Week 5”

  1. September 28, 2014 at 1:21 pm

    1. Describe text mining in the context of the readings. What are its possibilities for historians? What are its pitfalls?

    2. Frederick W. Gibbs and Daniel J. Cohen believe that text mining is more relevant to open-ended questions, in which “the results of queries should be seen as signposts toward further exploration rather than conclusive evidence” (Gibbs and Cohen, 74). Explain what the authors mean by this statement.

    3. Ted Underwood contends that historians must overcome two obstacles before engaging in text mining: (1) getting the data you need, and (2) getting the digital skills you need. What digital skills does Underwood feel that historians should develop?

    4. According to Cameron Blevins, literary scholar Franco Moretti developed the digital method of “distant reading.” Describe the concept of distant reading. How is distant reading different from text mining? How is distant reading useful for historians?

    5. Cameron Blevins argues that the promise of digital history is “to radically expand our ability to access and draw meaning from the historical record” (Blevins, 146). Do you agree? What other possibilities might Blevins be overlooking?

    6. What is “topic modeling”? How does it relate to text mining and distant reading? How is it useful for historians?

    7. According to Ted Underwood, an Internet search is a form of data mining. But it is only useful if you already know what you are expecting to find. Do you agree? What is Underwood’s remedy for seeking the unknown and the unexpected from the digital record?
