Digital Archives Are a Gift of Wisdom to Be Used Wisely

by Roy Rosenzweig

June 2005


Originally published in The Chronicle of Higher Education, June 24, 2005, Volume 51, Issue 42, Page B20

“What’s the big deal?” was the grumpy question of a fellow participant in a workshop at the Library of Congress in the summer of 1996. The library was showing off its still very new digital archive, which it had dubbed American Memory. The workshop aimed to show how the Web-based repository of photographs, documents, newspapers, films, maps, and sounds could transform teaching. My colleague, who taught at a major research university, was unpersuaded. “I’d rather send students to the library,” he announced.

But to me, it was a big deal — a very big deal — and the answer to a problem I had been grappling with for more than 15 years. When I started teaching as a graduate student in the mid-1970s, I quickly learned that the best way to excite students about my field, history, was to involve them directly with the “stuff” of the past — the primary sources — and to show them, by asking them to do it, what it means to think like a historian. As a graduate-student instructor, I found that pretty easy. After all, I was at another of those big research institutions (Harvard University) with one of the nation’s greatest libraries. I could “send students to the library,” and in a short walk from their dorms, they could find more primary sources than they could exhaust in a lifetime.


When I arrived at George Mason University in the fall of 1981 as an assistant professor, things suddenly became much harder. We had a very modest library in those days. And, more problematic from the perspective of a 19th- and 20th-century American historian, it was a very new library, with relatively few old books, journals, and magazines. I could “send students to the library,” but they would not find the rich bodies of primary sources that Harvard had in abundance. A simple assignment asking them to compare advertisements in two popular magazines of the 1920s was out of the question, especially in an evening section of my survey course, filled with students who could not journey to more-distant libraries because of full-time jobs and family responsibilities.


I now know that my experience was not unique but was shared by scholars in many different fields, at many different institutions. Since then, however, much has changed in the world of Web-based teaching: We have an array of new opportunities, but we also have new limitations that we haven’t yet confronted.


I spent a lot of time in the 1980s devising less-than-satisfactory strategies to work around the constraints — photocopying piles of documents myself and putting them on reserve, for example. But in the latter part of the decade, I began to glimpse a solution. I read in computer magazines about this new thing called the CD-ROM, which could hold thousands of pages of text as well as photographs, sound files, and (later) moving pictures. In the early 1990s, I joined with my friends Stephen Brier and Joshua Brown at the American Social History Project, based at the Graduate Center of the City University of New York, to produce, with the help of the Voyager Company, such a disk. When Who Built America? appeared in 1993, we promoted it with an enthusiasm that now seems quaint. We would hold up the silvery, thin disk and exclaim (often to incredulous audiences) that it contained: Five thousand pages of text! Seven hundred images! Four hours of oral history, music, and speeches! Forty-five minutes of film!


Actually, our enthusiasm was already becoming dated in 1993. That year brought a much more momentous development for the future of technology and teaching than the publication of our CD-ROM — the appearance of Mosaic, the first easy-to-use graphic Web browser that ran on most standard computers. Between mid-1993 and mid-1995, the number of Web servers — the computers that house Web sites — jumped from 130 to 22,000.


Progress in the last 10 years has been nothing short of astonishing. The Library of Congress’s American Memory project now presents more than nine million historical documents. The New York Public Library’s Digital Gallery contains more than 300,000 images digitized from its extraordinary collections. PictureAustralia presents 770,000 images from 28 cultural agencies in that country; the International Dunhuang Project, a cross-national collaboration, serves up 100,000 digitized images of artifacts, manuscripts, and paintings from the trade routes of the Silk Road. Most dramatically, the search-engine behemoth Google has announced plans to digitize at least 15 million books. Hundreds of millions of federal, foundation, and corporate dollars have already gone into digitizing a startlingly large proportion of our cultural heritage, and more is to come.


That is about as dramatic a development in access to cultural resources in a single decade as any of us are likely to see in our lifetimes, and it has opened up enormously exciting possibilities for teachers not just of American history and culture but in numerous disciplines that have experienced similar transformations. To be sure, not everything will become digital (nor should it), but where we instructors once struggled with the scarcity of documents for our students to use, we now participate in what John F. McClymer, a historian at Assumption College, calls a “pedagogy of abundance.” The developments in history are broadly illustrative of both the possibilities and the problems of that pedagogy.


Has the new abundance of electronic resources solved all our difficulties as teachers? Can we now just “send students to the Web”? Most scholars and teachers would answer “no,” immediately starting to talk about the vast quantities of junk out there on the Web. I disagree. The quality of Web-based historical resources is surprisingly good and getting better. My concern is not that students will find junk online, but rather that they will fail to gain full access to the Web’s riches or won’t know what to do with those riches when they find them.


Complaints about the low quality of the Web’s resources were loudest in its early days. Just look to the pages of The Chronicle. In a November 1996 essay, a well-known historian proclaimed herself “disturbed by some aspects of … the new technology’s impact on learning and scholarship.” “Like postmodernism,” Gertrude Himmelfarb complained, “the Internet does not distinguish between the true and the false, the important and the trivial, the enduring and the ephemeral.” Internet search engines, she said, “will produce a comic strip or advertising slogan as readily as a quotation from the Bible or Shakespeare.” Himmelfarb was right to sense danger out on the Web — it offers a much less controlled environment than libraries, whose collections have been shaped by generations of professionals — and her worries have been regularly echoed by other scholars.


Yet like a living organism, the Web has developed two remarkable, if imperfect, sets of mechanisms for healing its defects.


The first are the automated approaches that have made the founders of Google billionaires. Himmelfarb was not the only person to notice the inadequacy of search engines in 1996. That year, two Stanford computer-science students, Lawrence Page and Sergey Brin, began building BackRub, a new search engine named for its then-unique capacity to analyze the “back links” to Web sites. Within two years, BackRub became Google, and its use of link analysis (and some other magic) to roughly rank the reputation of sites transformed Web searching.


Google’s ranking system has its limitations. The Hitler Historical Museum’s site, which takes an “unbiased” (i.e., uncritical) view of the German leader, shows up in the first 10 results for a search on “Adolf Hitler.” But the rankings do go some distance toward separating the wheat from the chaff. You can find the Holocaust deniers at the Institute for Historical Review on the Web, but not in the first 100 hits on Google (or Yahoo) if you search on “holocaust”; that may be because few reputable sites link to the so-called institute.


Perhaps less well recognized is that the same algorithmic procedures behind Google, combined with the direct access that the company (as well as Yahoo) offers to its data, open up more-advanced possibilities for sorting out good and bad information mathematically. For example, Dan Cohen, my colleague at the Center for History and New Media at George Mason, has developed H-Bot, the Automated Historical Fact Finder, which can answer historical questions like “When did Charles Darwin publish The Origin of Species?” with a surprising degree of accuracy, simply by querying Google and analyzing the results statistically.


But even the most refined statistical and mathematical tools are unlikely to be able to make the kind of qualitative judgments historians often need to make. A second set of more social mechanisms — nascent forms of peer review — helps keep students away from the bogus documents and poor-quality archives they will inevitably encounter online. Just as the Web has spawned plenty of problematic history Web sites, it has also provided a platform for dozens of Web resources with the goal of steering people away from those sites. For example, Thomas Daccord, a high-school teacher at Noble & Greenough School, in Dedham, Mass., has created Best of History Web Sites. History Matters: The U.S. Survey Course on the Web (developed by the social-history project at CUNY and the new-media center at George Mason) annotates the 850 best Web sites in American history; a sibling, World History Matters, at George Mason, has begun to do the same in that field.


Even more interesting is a kind of spontaneous review process generated by the mass of people on the Web. About four years ago, I stumbled across an interesting online historical “document” — an 1829 letter to President Andrew Jackson from Martin Van Buren, then governor of New York, warning of the threat that a new technology, the railroad, posed to the old technology of canals, and urging the federal government to intervene to “preserve the canals.” Van Buren’s worries sound suspicious to most American historians. After all, Van Buren opposed federal intervention in the economy. Yet, at least when I checked in early 2001, the document was presented credulously all over the Web. Libertarians at Citizens for a Sound Economy reproduced it to show how stupid politicians often pigheadedly refuse to allow “the market to work unimpeded by regulatory constraints.” The former president of the Federal Reserve Bank of Dallas (and now chancellor of the Texas A&M University System) used it in a speech that is posted online to chastise the “window breakers in Seattle” opposed to free trade.


But try entering “van buren canals andrew jackson railroads” in Google today. Your first hit is the “urban legends” page, which provides a detailed discussion of why the document is a fraud. Even the libertarians have gotten the message. Two readers of the sound-economy site have used the article’s comment feature to warn that the document is a fake. The same collaborative mechanisms of review — applied more systematically — have made the collectively produced and open-source encyclopedia Wikipedia a surprisingly credible resource for historical facts.


If the Web has become a less dangerous place for students to venture, however, it has also become a considerably more expensive arena, and that poses a much more serious problem for those who want to teach with primary sources. It is hard to remember now, but a decade ago the Web was largely a noncommercial world. It was only in 1995 that dot-com domains came to dominate over dot-edu addresses. Commercialization has had its impact on what we call the History Web, the online repository of digital primary and secondary sources. In fact, some of the most interesting and exciting of those sources are commercial products, often very costly ones, from giant information conglomerates.


For example, the Thomson Corporation offers Eighteenth Century Collections Online, which includes “every significant English-language and foreign-language title printed in Great Britain during the 18th century” — 33 million text-searchable pages and nearly 150,000 titles. “We own the 18th century,” a Thomson official boasts. Those who want their own share must pay handsomely. A university with 18,000 students can spend more than half a million dollars to acquire the full collection, depending on discounts it receives and other pricing factors. Another extraordinary digital collection, ProQuest Historical Newspapers, contains the full runs of a number of major newspapers. One of my colleagues uses it for weekly primary-source assignments that I could only have dreamed about back in 1981. But a typical university will have to shell out the equivalent of an assistant professor’s salary each year to pay for those digital newspapers.


It seems churlish to complain about extraordinary resources that greatly enrich the possibilities for online research and teaching. Surely Thomson, ProQuest, and other businesses are entitled to recoup their multimillion-dollar investments in digitizing the past. But it still needs to be observed that not every college can pay the entry fee to this new digital world. Some may have to decide whether it is more important to have extraordinary digital resources or people to teach about them.


Thus we are in danger of reproducing the information divide of yesterday — where the richest universities with the biggest physical libraries could offer students far better access to materials than other institutions. Of course there are powerful counters to commercialization, especially the support that public agencies and private foundations have provided for digitization and “open content,” as well as the eclectic and energetic efforts of enthusiasts and scholars who continue to post primary sources out of a passion for their fields.


But even when students have equal access to online resources, they do not necessarily have equal ability to make effective use of the new, global resource. For many students, the abundance of primary sources can be more puzzling and disorienting than liberating and enlightening. Sam Wineburg, a cognitive psychologist who teaches at Stanford’s School of Education, has spent 20 years observing classrooms and talking with both teachers and students about how students read (and misread) historical sources. As his research shows, instructors commonly overestimate students’ ability to analyze primary sources, failing to recognize the challenges that thwart understanding.


In my field, what do students make of the tens of thousands of photographs from the Farm Security Administration put online by the Library of Congress? Most often they see such powerful sources as transparent reflections of a historical “reality,” not, as a historian would, as imperfect refractions — ideological statements by reform-minded photographers who wanted to expose the poverty brought on by the Great Depression and advance the programs of the New Deal. In the resonant phrase of Randy Bass, a professor of English at Georgetown University and director of the university’s Center for New Designs in Learning and Scholarship, the Web has for the first time put “the novice in the archive,” giving access to people who were previously barred by the time and expense of getting to archives, or by the entrance requirements imposed by such collections. But novices still lack the skills for critically evaluating primary sources.


Thus far we have done much better at democratizing access to resources than at providing the kind of instruction that would give meaning to those resources. Hundreds of millions of dollars have gone into digitizing historical resources; the money devoted to using the Web to teach students the kinds of historical procedures that trained historians make part of their routine can be measured in the hundreds of thousands of dollars.


Still, there are some promising beginnings. Picturing Modern America 1880-1920, from the Center for Children and Technology, based in New York, offers some thoughtful “historical thinking exercises” for students that, for example, take them step by step through “reading” a photograph — first posing a question, then looking closely and gathering clues, and finally drawing conclusions. Our own History Matters and World History Matters provide guides to “making sense of evidence,” as well as illustrations of “scholars in action,” in which we show historians analyzing, for instance, a blues song, a Colonial newspaper, or a Thomas Nast cartoon.


In a new project that we have begun in collaboration with Wineburg and his colleagues at Stanford, with the support of the William and Flora Hewlett Foundation, we are building on those approaches on a site that we are calling Historical Thinking Matters. The site, which we will launch in 2006, will, for example, use video clips to model historical thinking; it will use pop-ups and other programming to scaffold primary sources in a way that encourages students to check sources, corroborate evidence, and contextualize it.


For the moment, the danger for students venturing onto the Web is not that they will find either bogus letters or comic strips, but that they won’t know how to “read” the vast number of valuable primary sources that they find. It remains to be seen whether we can create useful online aids that not only make information available, but assist users in learning to discriminate and analyze that information.


The larger lesson here is one that we should have learned over and over again in confronting new technology. The most difficult issues are economic, social, and cultural, not technological. The Web has given us a great gift — an unparalleled global digital library and archive that is growing bigger every day. Our task now is to make sure that it remains accessible to all, and to turn the novices we have admitted to it into experts who can use it with intelligence and thoughtfulness. If we can succeed not just in democratizing access to materials like online historical evidence but also in helping students make sense of that evidence, that will be a very big deal.

Author Bio

Roy Rosenzweig is a professor of history and new media at George Mason University and director of the university’s Center for History and New Media. He is co-author, with Daniel J. Cohen, of Digital History: A Guide to Preserving, Presenting, and Gathering the Past on the Web, scheduled to be published in the fall by the University of Pennsylvania Press.




The following are listed in the order in which they are discussed in the accompanying article:


American Memory


Web-based repository of photographs, documents, newspapers, films, and maps from the Library of Congress.



PictureAustralia


770,000 images from 28 cultural agencies in Australia.

International Dunhuang Project


A cross-national collaboration with 100,000 images of artifacts, manuscripts, and paintings from the trade routes of the Silk Road.

H-Bot, the Automated Historical Fact Finder


Automated answers to historical questions.

Best of History Web Sites


A guide created by Thomas Daccord, a teacher at Noble & Greenough School, in Dedham, Mass., to more than 1,000 sites.

History Matters: The U.S. Survey Course on the Web


Annotates 850 Web sites in American history.

World History Matters


A sibling to the History Matters site.

Eighteenth Century Collections Online


Includes “every significant English-language and foreign-language title printed in Great Britain during the 18th century,” according to the publisher. Charges substantial fees to libraries.

ProQuest Historical Newspapers


Full runs of a number of major newspapers. Charges substantial fees to libraries.

Picturing Modern America 1880-1920




Thoughtful “historical thinking exercises” for students from the Center for Children and Technology, based in New York.

