History of the Theological Commons

Background

In early 2008, Princeton Theological Seminary entered into an agreement with Microsoft to digitize print materials in the public domain (defined at that time as published in the U.S. before 1923). Project funding would come from Microsoft, materials would come from the library at Princeton Seminary (named Wright Library in 2021), and digitization would be performed by the Internet Archive in a scanning center located in the library. The goal was to digitize thousands of volumes on theology and religion for inclusion in Microsoft’s Live Search Books service (which was Microsoft’s answer to Google Books). This project aligned perfectly with Princeton Seminary leadership’s vision of extending access to its historical collections as a means of helping to shape a globally accessible digital library in which the theological disciplines would be represented. Other institutions joining in Microsoft’s ambitious program included the British Library, Columbia University, Cornell University, the New York Public Library, the University of California, the University of Toronto, and Yale University.

However, after only a few months, in May 2008, Microsoft abruptly ended the program, leaving Princeton Seminary and the other participants to consider how or whether to proceed without outside financial support. Recognizing the importance of the broader effort and remaining committed to its original vision, Princeton Seminary decided to move its digital efforts forward by retaining its relationship with the Internet Archive, a non-profit organization dedicated to building a free, open, and well-preserved digital library. Thus began a fruitful partnership that continues to the present. The Internet Archive continues to operate a regional scanning center housed in Wright Library. Several other institutions that had joined the Microsoft initiative took a similar approach and have continued to partner with the Internet Archive to digitize large quantities of materials from their public domain holdings.

Through ongoing funding from Princeton Seminary, the Digital Initiatives department in Wright Library routinely submits public domain print materials for digitization through the Internet Archive. No digitization is performed by Princeton Seminary staff or with Princeton Seminary equipment; instead, all digitization is performed in the on-site scanning center, which is staffed, equipped, and operated independently by the Internet Archive. Specifically, Princeton Seminary librarians select content for digitization in alignment with the library’s collection development policy, gather the selected physical volumes, and deliver both the volumes and their accompanying digital catalog records to the Internet Archive scanning center. After the non-destructive scanning process, library staff re-shelve the physical volumes. When the Internet Archive completes its processing, which includes rigorous quality assurance procedures, each item enters the vast Internet Archive online library and becomes fully discoverable and searchable through its website. Each volume can be read online using the Internet Archive’s BookReader, which provides a familiar reading experience using full-color images of each page of the volume, from cover to cover.

Through this partnership with the Internet Archive, Princeton Seminary has digitized tens of thousands of books, periodicals, manuscripts, photographs, and postcards, all of which are discoverable and viewable in the Princeton Theological Seminary collection at the Internet Archive as well as in the Theological Commons itself.

Building a Commons

Because the Internet Archive has been so successful in carrying out its mission of building and preserving an online digital library, the sheer size of the library is as daunting as it is impressive. For the researcher in theology and related fields, navigating these seas of data to find relevant material can be challenging, because the collection spans every conceivable subject, both academic and popular. In late 2010, Dr. Iain Torrance, then the president of Princeton Seminary, asked a subset of library staff members, known informally within the library at the time as “the digital team,” to consider how to maximize discoverability and access to the thousands of volumes Princeton Seminary had digitized, with a focus on the needs of students, scholars, pastors, church leaders, interested laypersons, and other researchers, both locally and worldwide, who would benefit from content particular to their domains of knowledge and practice. Starting from this seed, the digital team began building an information system, as yet unnamed, with these goals in mind.

A New System for Existing Data

Clearly the first step toward realizing this vision would be content selection. In phase one of this endeavor, we harvested the metadata and full text of every item in the Internet Archive that had originated from the library at Princeton Seminary, and we imported the data into our own database. In phase two, we took a detailed list of Library of Congress subject headings provided by our Collection Development Librarian and searched the Internet Archive for digital books bearing those standardized subject terms, irrespective of their library of origin; we then imported those items into our database in the same manner. Together, these two phases soon yielded a corpus of 50,000 digital texts.

Even this targeted subset, however, is large enough to require tools for finding works of interest to an individual researcher. First and foremost, of course, is the ability to search the digital library for specific words. This is the principal, revolutionary advantage of the digital representation of text as opposed to its physical embodiment, whether stone tablet, scroll, or printed book. This advantage is greatest if a digital book incorporates not only its metadata (its bibliographic description, as in a library catalog) but also its entire textual content. Knowing this, the Internet Archive not only produces a digital photograph of each page of a given volume but also runs OCR (optical character recognition) software on the page images to produce a digital text transcription of the book’s intellectual content. This step is indispensable, since a digital image of a page is no more searchable by keyword than ink on paper in a manuscript or book. Because the Theological Commons repurposes Internet Archive data, it inherits this advantage; moreover, because we downloaded the textual transcriptions into our own database, we can confine full-text searches to our targeted subset of digital texts.

Another important instrument for finding materials of interest is the use of facets, by which search results can be honed and refined according to predefined categories, such as date of publication, format of the physical item, language, and subject. The subject facet in the Theological Commons is noteworthy in that it represents a balance between complexity and simplicity. The terms for this facet are derived not from the seemingly infinite array of Library of Congress Subject Headings, but rather from the Library of Congress Classification system. In this approach, each digital resource in the Theological Commons is assigned to a broad taxonomy of knowledge containing about forty entries, a scheme that is neither overly complex nor overly simple and is therefore practical for narrowing search results.
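To illustrate how a classification-based subject facet can work, the following Python sketch maps the class letters at the start of a Library of Congress call number to a facet term. The mapping table is a hypothetical excerpt (the actual Theological Commons table of about forty entries is not reproduced here), and the function name `subject_facet` is illustrative only.

```python
import re
from typing import Optional

# Hypothetical excerpt of a class-to-facet table; the real table
# used by the Theological Commons (about forty entries) is not shown.
FACET_BY_CLASS = {
    "B": "Philosophy",
    "BL": "Religions",
    "BR": "Christianity",
    "BS": "Bible",
    "BT": "Doctrinal Theology",
    "BV": "Practical Theology",
    "BX": "Christian Denominations",
}

# LC call numbers begin with one to three class letters followed by digits.
CLASS_LETTERS = re.compile(r"([A-Z]{1,3})\d")


def subject_facet(call_number: str) -> Optional[str]:
    """Return the broad subject facet for an LC call number, or None."""
    match = CLASS_LETTERS.match(call_number.strip().upper())
    if match is None:
        return None
    # Exact lookup on the class letters; no fallback to shorter prefixes.
    return FACET_BY_CLASS.get(match.group(1))


if __name__ == "__main__":
    for cn in ("BT75 .H6", "B2430", "BM45", "unclassified"):
        print(cn, "->", subject_facet(cn))
```

Matching on the exact class letters, rather than falling back to a shorter prefix, matters here: class B is Philosophy, but BM (Judaism) must not inherit that term simply because it begins with B. Unmapped classes return None so they can be reviewed rather than silently misfiled.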

With these core features, with 50,000 digital texts, and with the name “Theological Commons,” this web-accessible resource was publicly released in March 2012.

Enhancements

That milestone provided an opportunity to enhance and refine the system in ways that would further fulfill the original vision. Princeton Seminary purchased new hardware, accompanied by software upgrades; the content expanded to more than 75,000 digital texts; and the digital library team undertook a series of functionality improvements to increase the usability of the Theological Commons in various ways.

If refining search results using facets is good, giving the researcher more flexibility in using those facets would be even better. One important enhancement to the original system allows the user to select multiple values from each facet. For example, rather than having to choose one option — either Theology or Philosophy as the subject, English or German as the language — the user can select all of these, or any other combination of classifications. Such combinations can even include a custom date range for the year of publication. This feature increases exponentially the paths by which a researcher can navigate this digital library, carving out multiple, overlapping slices of the database for searching or browsing.
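The multi-select behavior described above can be sketched in Python as follows. This is an illustration under assumptions, not the Theological Commons implementation: it assumes that values selected within one facet are combined with OR (Theology or Philosophy) while different facets are combined with AND (subject and language), and the record fields, the function name `apply_facets`, and the sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Mapping, Optional, Set, Tuple


@dataclass(frozen=True)
class Item:
    title: str
    subject: str
    language: str
    year: int


def apply_facets(
    items: Iterable[Item],
    selections: Mapping[str, Set[str]],
    year_range: Optional[Tuple[int, int]] = None,
) -> List[Item]:
    """Keep items that match any selected value within each facet (OR)
    and every facet that has selections (AND), plus an optional
    inclusive publication-year range."""
    kept = []
    for item in items:
        # Facets with an empty selection impose no constraint.
        if not all(getattr(item, facet) in values
                   for facet, values in selections.items() if values):
            continue
        if year_range is not None:
            start, end = year_range
            if not start <= item.year <= end:
                continue
        kept.append(item)
    return kept


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    catalog = [
        Item("A", "Theology", "English", 1850),
        Item("B", "Philosophy", "German", 1880),
        Item("C", "Theology", "Latin", 1860),
        Item("D", "History", "English", 1870),
        Item("E", "Philosophy", "English", 1910),
    ]
    hits = apply_facets(
        catalog,
        {"subject": {"Theology", "Philosophy"}, "language": {"English", "German"}},
        year_range=(1800, 1900),
    )
    print([item.title for item in hits])  # ['A', 'B']
```

Treating an empty selection as "no constraint" means a facet the user has not touched never filters anything out, so each added selection can only widen a facet or narrow the overall slice.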

Another enhancement is the integration of multi-volume sets. In the Internet Archive system, each volume of a multi-volume set is a discrete unit, without linkages to the other volumes comprising the set. For the Theological Commons, Princeton Seminary library staff have exerted considerable effort to reconstitute multi-volume sets, whether monographic or periodical. Each volume in the Theological Commons “knows” the parent set to which it belongs, allowing the user to call up the full set on demand.

The Theological Commons also allows Princeton Seminary librarians to define collections within the broader database, providing yet another way for researchers to divide and recombine the data to suit their research needs. Initially, two specialized collections were defined: the Benson Collection of Hymnals and Hymnology, and the T.F. Torrance Collection of Antiquarian Books. When viewing a given collection, the user can perform searches and utilize facets as usual, but the search results are restricted to works from that collection.

In these ways, the Theological Commons represents a blending of the aims and methods of mass digitization, in the manner of the Internet Archive or Google Books, with the informed selection and accurate resource description that are the hallmarks of librarianship. The goal is to walk the tightrope between quantity and curation to achieve a balance that maximizes the reach and usefulness of these digital resources for research and ministry.

Content from Other Institutions

The Theological Commons aspires to live up to its name as a “commons” — a central location or shared resource available to an entire community — in part by incorporating content from outside Princeton Seminary’s own library collections. As noted above, we have imported into the Theological Commons tens of thousands of digital books from the collections of other libraries and archives that have digitized materials through the Internet Archive.

We have deliberately set up multiple places in the Theological Commons where the contributor of the digital resource is acknowledged: the Frequently Asked Questions page; the Browse by Contributor feature, which lists all contributing institutions alphabetically with links to their materials; and the Item Details page for any given item in the system, where the “Contributor” line indicates the originating institution.

New Directions

In 2013, the Henry Luce Foundation awarded Princeton Theological Seminary a $1.5 million grant for the expansion of the Theological Commons in two important directions:

Digitizing and providing access to audio and visual materials, to help make digital media resources a key part of the study of theology and related fields
Digitizing and providing access to theological material published in or pertaining to Africa, Asia, and Latin America, to provide resources of relevance to diverse communities of faith around the globe

Through this generous funding, we were able to digitize, curate, and incorporate multiple extensive collections into the Theological Commons:

Princeton Theological Seminary Media Archive — over 6,200 unique, unpublished audio recordings rescued from obsolete and deteriorating reel-to-reel tapes, spanning the second half of the twentieth century; over 2,100 of these recordings are accompanied by searchable transcriptions of the spoken content, produced by human transcriptionists to provide accuracy exceeding 99% despite the frequent use of names and terminology specific to theology and related fields
Latin America Collection — over 6,400 books and periodicals, almost all in Spanish and Portuguese, from 1550 to 2015, including copyrighted material from participating publishers
Moffett Korea Collection — over 6,800 photographs and archival materials pertaining to the history of Korean missions, the Korean churches, and religion in Korea

Later Developments

Collections: As noted above, when the Theological Commons launched in 2012, it initially included two specialized collections. Today, the system incorporates two dozen such featured collections. Some of these are the result of collaborations between Princeton Seminary and other libraries and archives, whereby those organizations’ digitized materials are gathered together and showcased with a dedicated landing page describing the organization and the collection. Examples include the following:

Year Added	Collection
2015	Payne Theological Seminary and A.M.E. Church Archive
2016	Missionary Research Library Pamphlets
2019	Beth Mardutho: The Syriac Institute
2019	The Earl Palmer Collection
2023	Seminario Evangélico de Puerto Rico Collection

User interface: In late 2015 and early 2016, we overhauled the user interface to utilize “responsive web design,” meaning that the interface dynamically adjusts to the screen size and orientation of the device on which it is viewed, from smartphones to tablets to laptops to desktop windows of any size. The new design was released publicly in February 2016.

Images of church architecture: The James R. Tanis Collection of Church Postcards was added in October 2020 as the first image-only collection in the Theological Commons. This collection of 20,000 postcards depicting church architecture across the United States was digitized by the Internet Archive, after which library staff recorded the name, denomination, city, and state of all 20,000 churches, making the collection searchable.

Breaking down silos: Another means of incorporating relevant content has been to take projects that were originally created as individual “silos” of content — separate databases, each with its own website — and bring them into the Theological Commons, where the content can be discovered and utilized more effectively. Two such former silos are the Princeton Theological Seminary Journals, a project that began before the Theological Commons existed, and the Princeton Lectures on Youth, Church, and Culture, originally created in early 2013 when the Theological Commons was still nascent.

Digital audio: Building on the success of the digitization of audio and video from obsolete tape formats to create the Princeton Theological Seminary Media Archive (see above), library staff subsequently collaborated with the Media Services division of Information Technology Services at Princeton Seminary to incorporate over 2,200 audio recordings that were originally recorded in digital form, or “born digital.” These unique, unpublished recordings extend the Princeton Theological Seminary Media Archive to the near present. New recordings are added periodically.

Integration and dissemination: The Theological Commons integrates with other information systems, both internal and external, in multiple ways, making it easier for researchers to find relevant online materials. In Wright Library’s online catalog, each book or periodical that has been digitized includes a link to the digital version in the Theological Commons. In addition, every item in the Theological Commons, including audiovisual items as well as print materials, can be found using the search box prominently displayed on the Wright Library website. We also share our metadata with outside organizations so that users of those systems can find and use the wealth of resources in the Theological Commons: we have contributed metadata records that identify and describe the items in the Theological Commons to Atla, the Digital Public Library of America, and HathiTrust. In fact, any digital repository can download and repurpose these metadata records using our OAI-PMH service, which implements the Open Archives Initiative Protocol for Metadata Harvesting, a standard protocol widely used by content management systems in the cultural heritage sector.
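For repositories that want to harvest these records, the sketch below shows the general shape of an OAI-PMH client in Python using only the standard library: it issues ListRecords requests in the `oai_dc` (simple Dublin Core) metadata format, collects titles, and follows resumption tokens until the repository signals the last page. The base URL used in testing is a placeholder rather than the actual Theological Commons endpoint, and the helper names are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL for the given OAI-PMH base URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text) -> Tuple[List[str], Optional[str]]:
    """Extract Dublin Core titles and the resumption token, if any."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.findall(".//dc:title", NS) if el.text]
    token_el = root.find(".//oai:resumptionToken", NS)
    # An empty <resumptionToken/> marks the final page of results.
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


def http_fetch(url: str) -> bytes:
    with urlopen(url) as response:
        return response.read()


def harvest_titles(base_url: str, fetch: Callable[[str], object] = http_fetch) -> List[str]:
    """Follow resumption tokens until the last page and return all titles."""
    titles: List[str] = []
    token: Optional[str] = None
    while True:
        page_titles, token = parse_page(fetch(list_records_url(base_url, token)))
        titles.extend(page_titles)
        if not token:
            return titles
```

Passing a `fetch` function makes the paging logic testable without network access. A production harvester would also handle OAI-PMH error responses such as `noRecordsMatch` and honor HTTP 503 responses with a Retry-After header, which this sketch omits.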

Continual expansion: Each year on January 1, books published in the United States 95 years before the year just ended fall into the public domain as their copyright terms expire; for example, books published in 1928 entered the public domain on January 1, 2024. Accordingly, in the first few months of each calendar year, we use our partnership with the Internet Archive to digitize books from Wright Library’s collections that fit these criteria. Thereafter, one of the core activities of the Digital Initiatives department in Wright Library is to digitize additional books and periodicals from our holdings that are in the public domain but have not yet been digitized through the Internet Archive by another library. This work is ongoing, and we add new materials to the Theological Commons continually.
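The annual public-domain arithmetic can be expressed as a small Python function. It is a simplified sketch that assumes books first published in the United States before 1978 with their copyright properly maintained; other categories of works follow different rules. It also reflects the 1998 term extension, which held the cutoff at 1922 until 1923 works entered the public domain on January 1, 2019, consistent with the pre-1923 definition in use when the project began in 2008. The function names are hypothetical.

```python
# Simplified sketch: U.S. books first published before 1978 with copyright
# maintained. Unpublished, foreign, and post-1977 works follow other rules.

FROZEN_CUTOFF = 1922  # 1998-2018: the 1998 term extension froze new entries
TERM_YEARS = 95       # copyright term for these works, counted from publication


def latest_public_domain_year(as_of_year: int) -> int:
    """Latest U.S. publication year in the public domain on January 1
    of as_of_year (this sketch covers 1998 onward only)."""
    if as_of_year < 1998:
        raise ValueError("this sketch only covers 1998 onward")
    if as_of_year <= 2018:
        return FROZEN_CUTOFF
    # A 95-year term runs through December 31 of (publication year + 95),
    # so the work enters the public domain on January 1 of the next year.
    return as_of_year - TERM_YEARS - 1


def is_public_domain(pub_year: int, as_of_year: int) -> bool:
    return pub_year <= latest_public_domain_year(as_of_year)


if __name__ == "__main__":
    for year in (2008, 2019, 2024):
        print(year, latest_public_domain_year(year))
```

For example, `latest_public_domain_year(2024)` returns 1928 and `latest_public_domain_year(2008)` returns 1922, so a selection list for each January can be built by filtering holdings on publication year.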