Beyond the Paper Chase: Building a Comprehensive Online Romantics Bibliography
A Progress Report
Kyle Grimes, University of Alabama in Birmingham
Prepared for "Digitizing Romanticism," Session chaired by Neil Fraistat, University of Maryland
Part 2: Technical Issues
As soon as I began to work on the bibliography project, it became clear that some decisions made in the initial proposal and design phase would have very far-reaching consequences, and perhaps the most significant decision of all involves the fundamental database architecture. I considered a number of possibilities. One would have been a hierarchical subject tree (rather like the Yahoo! website or the Voice of the Shuttle), with more general categories as containers for more specific categories and so on. This was an intriguing approach (and one can imagine how wonderfully easy and enlightening it might be to browse through a Romanticism Yahoo), but ultimately it would require each bibliographical item to be placed in its own HTML page and then linked and cross-linked to all sorts of subject headings. The complexity of the linking is rather mind-boggling, though the fundamental idea, with its ease of design and virtually assured compatibility with any web browser, is still worth considering. Another possibility was to write the bibliography as one gigantic file, probably encoded either with some customized version of the emergent XML markup language or as an HTML table. This approach would allow for easy and precisely selective keyword searches, and it would offer a more efficient and systematic way of cataloging the basic bibliographical information than the subject hierarchy.
Unfortunately, the “one big file” approach also has some serious drawbacks: once the bibliography grows to more than just a few hundred items, the searches can become aggravatingly slow; most of the table will be simply empty space since the bibliographical data required for, say, a dissertation is radically different from that required for an article in an edited collection; it is tricky to handle works with multiple entries in the same category such as works with multiple authors or editors and works that are keyed to multiple subject headings; it is tricky to handle the problem of duplicate entries; and so forth.
The third option—and the one I’m most keen on at the moment—is to custom design a relational database that will maintain bibliographical information as an active relational connection established between several supporting tables. For those unfamiliar with database structures, let me offer a quick overview. First, imagine the “one big file” approach, set up in this case as an HTML table. Here is a simplified example, designed to represent an ordinary journal article:
Table 1: The Single-Table Approach
This seems simple enough: functional, clear, easy to search, and so forth. But problems begin to emerge very quickly when one actually starts to put together a database using the single-table approach. For one thing, the table here is clearly not designed to record the relevant information about, say, a dissertation, monograph, or website, and this problem leaves the designer with two possible solutions: 1) create separate tables for each critical genre, an approach that would necessitate separate searches for each, or 2) extend the number of columns in the table to include all bibliographical information for items from any conceivable critical genre, recognizing that for any individual item most of the columns would be empty. (This second approach was one of the blind alleys I stumbled into in my first efforts: I had a table with over forty columns, mostly empty.) Another, even greater, problem with the “one big file” approach involves works with multiple authors or editors. Suppose some item has two authors and is published in a collection with three editors. To record such an item, either the table would need to be further extended to encompass columns for “second author,” “third author,” “second editor,” “third editor,” etc., or the same item would have to be listed several times, once for each separate author or editor. Either way, one ends up with a horrendously inefficient database with much blank and/or much repeated information. Once the database grows to include more than a few hundred items, the result is an unsearchable mess.
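The empty-column problem can be made concrete with a small sketch. It uses Python's built-in sqlite3 module rather than the software discussed in this report, and the eighteen column names are hypothetical stand-ins for the wide table described above, whose actual columns are not reproduced here. The one row entered is the PMLA article cited later in this section.

```python
import sqlite3

# Hypothetical column names for a "one big table" design; the real
# forty-column table mentioned in the text is not reproduced here.
COLUMNS = [
    "author", "second_author", "third_author",
    "editor", "second_editor", "third_editor",
    "title", "journal", "volume", "number", "pages", "year",
    "collection_title", "publisher", "city",
    "degree", "university", "url",
]

conn = sqlite3.connect(":memory:")
column_defs = ", ".join(f"{name} TEXT" for name in COLUMNS)
conn.execute(f"CREATE TABLE items ({column_defs})")

# A journal article fills only the columns that apply to journal articles.
article = {
    "author": "Fraistat, Neil",
    "title": "Illegitimate Shelley",
    "journal": "PMLA",
    "volume": "109",
    "number": "3",
    "pages": "409-23",
    "year": "1994",
}
placeholders = ", ".join("?" for _ in article)
conn.execute(
    f"INSERT INTO items ({', '.join(article)}) VALUES ({placeholders})",
    list(article.values()),
)

# Every column the article did not fill is stored as NULL (None in Python).
row = conn.execute("SELECT * FROM items").fetchone()
empty = sum(1 for value in row if value is None)
print(f"{empty} of {len(COLUMNS)} columns are empty for this one article")
```

Even in this reduced table, more than half of the columns hold nothing for a single-author journal article, and the proportion only worsens as columns are added for other genres.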
A relational database offers a potential solution to the design and efficiency problems of the “one big file” approach. In a relational database, key information is divided into several different but more precisely defined tables. For example, suppose the journal article information in the example table above were recorded like this:
Table 2: Authors and Editors
Table 3: Journals
Table 4: Romantic Circles Items
In this relational database model, the different sorts of information that one might want to record about any given item are distributed into several different tables, and each item in these more specific tables is identified with an automatically assigned ID number (the first column of each table). The full bibliographical entry for any one item does not exist as a stable and complete row in any single table; rather, it exists as an active relationship between these different tables. For instance, Neil Fraistat’s article called “Illegitimate Shelley” that appeared in PMLA (May 1994, vol. 109, num. 3, pages 409-23) would now be represented in a separate table that, in essence, establishes a relationship between tables. A simple example:
Table 5: Journal Publication Information
Note that this table draws its contents not directly from the bibliographer's typed input but rather from its relationships with other simple tables. (In the sample table I’ve provided the relevant ID numbers for each of these bits of information; in the actual table, entry, or search form, one would see the full names instead, as in the parenthetical inserts.) In effect, the bibliographer’s task is to match up the appropriate ID numbers for the author, the work, the journal, and so forth. Instead of entering “Fraistat, Neil” in the table, the bibliographer would select “Fraistat, Neil” from a drop-down list built with the contents of the Authors and Editors table. Likewise, instead of entering “PMLA” as the journal, the bibliographer would select “PMLA” from a drop-down list consisting of the Journals table.
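For readers who prefer to see the relationships spelled out, here is a minimal sketch of the same idea. It uses Python's built-in sqlite3 module as a stand-in database, not the software in which the bibliography is actually built, and every table and column name (authors_editors, journals, items, journal_publications, and so on) is an assumption made for illustration. The helper full_entry is likewise hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors_editors (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE journals (id INTEGER PRIMARY KEY, title TEXT UNIQUE NOT NULL);
CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- The relating table stores ID numbers, not retyped names.
CREATE TABLE journal_publications (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES authors_editors(id),
    item_id INTEGER NOT NULL REFERENCES items(id),
    journal_id INTEGER NOT NULL REFERENCES journals(id),
    issue_date TEXT,
    volume INTEGER,
    number INTEGER,
    pages TEXT
);
""")

# Each name or title is entered once; SQLite assigns the ID automatically.
author_id = conn.execute(
    "INSERT INTO authors_editors (name) VALUES (?)", ("Fraistat, Neil",)
).lastrowid
journal_id = conn.execute(
    "INSERT INTO journals (title) VALUES (?)", ("PMLA",)
).lastrowid
item_id = conn.execute(
    "INSERT INTO items (title) VALUES (?)", ("Illegitimate Shelley",)
).lastrowid
publication_id = conn.execute(
    "INSERT INTO journal_publications "
    "(author_id, item_id, journal_id, issue_date, volume, number, pages) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    (author_id, item_id, journal_id, "May 1994", 109, 3, "409-23"),
).lastrowid
conn.commit()


def full_entry(conn, publication_id):
    """Assemble one citation by following the ID numbers back to their tables."""
    row = conn.execute(
        """
        SELECT a.name, i.title, j.title, p.issue_date, p.volume, p.number, p.pages
        FROM journal_publications AS p
        JOIN authors_editors AS a ON a.id = p.author_id
        JOIN items AS i ON i.id = p.item_id
        JOIN journals AS j ON j.id = p.journal_id
        WHERE p.id = ?
        """,
        (publication_id,),
    ).fetchone()
    name, title, journal, date, volume, number, pages = row
    return f'{name}. "{title}." {journal} {volume}.{number} ({date}): {pages}.'


print(full_entry(conn, publication_id))
```

The citation is never stored as a single row. It is assembled on request from the three supporting tables, which is the sense in which the entry exists as a relationship rather than as a record.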
The advantages of such a relational approach are manifold. For one, note that authors only need to be entered into the database once: the Authors and Editors table is, in essence, a simple listing (without duplications) of all the authors/editors in the database, whether they have written one short review or a scholarly corpus consisting of dozens of books, articles, edited collections, and scholarly editions. And this, in turn, facilitates author searches of the database as a whole—in theory, one need only find the author in the Authors and Editors table and then follow links to all the related tables. Further, if one wanted to catalog a work with two authors and three editors, it would be a simple matter to build an Authors/Works table that would look something like this:
Table 6: Authors, Editors, Works
Note that no space is left blank, and searches based on either authors or titles are now made quite simple and efficient—the computer no longer needs to process some forty columns of irrelevant information just to locate the right author. (A search on the author’s name will return all works associated with that author; a search on the title will return all authors or editors associated with that title.) The advantages of such a relational system become even more pronounced when one considers the use of a standard set of search keywords. For instance, suppose one wants to find all cataloged critical works having something to do with Felicia Hemans. It is an easy matter to build a table called “Topic Persons” listing all people who are subjects of the critical works cataloged in the Bibliography. Then one builds another table which, like table 6 above, simply associates Topic Persons with particular items from the bibliography. Searches become simple and efficient, regardless of the size of the bibliography, and, even better, one can now enforce a standardized way of representing people as topics—if “Shelley, Percy Bysshe” is listed in the Topic Persons table, the bibliography will not accept entries as “P. B. Shelley” or “Percy Shelley” or “Shelley, P. B.” or whatever. Such standardization obviously avoids any number of problems encountered in trying to do a simple topic search. Other advantages of the relational database system will become apparent in the “Conceptual Issues” section below.
But the speed and efficiency of a relational database do not come without costs. Most troubling among these costs: software compatibility issues. Unlike a simple ASCII-based HTML or XML approach, relational database engines rely on some kind of proprietary software; the most common on the web are packages like Oracle (which runs Amazon.com), SQL Server, Sybase, etc. My “beta version” (if I can glorify it with such a name) of the Romantic Circles Bibliography is built as an application within Microsoft Access 2000. The problem with these proprietary software packages is, of course, that some gateway needs to be put in place to connect the database engine with the web, so that one can manipulate the data from within any web browser rather than from within the software package itself. This can be rather tricky. Some database programs work on only certain kinds of servers; some demand that the user download a run-time version of the database software which may (or may not) be freely distributed; and so on. Having brought the Romantic Circles Bibliography to its current incarnation, I must say that I simply do not know yet how it will finally appear on the web. It’s possible that the HTML pages function within the Access software will suffice to make the database accessible via the web. (My early experiments with this have been promising, but I haven’t even begun to address the myriad bugs.) It’s also possible that some kind of conversion software will need to be developed so that queries to the database will be returned as HTML or, more likely, XML output to the browser. These are unsettling issues, but I am confident that some fairly simple solution will soon emerge.
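To suggest what such conversion software might involve, here is a rough sketch of a function that turns the rows returned by a query into XML text. It is written in Python with the standard sqlite3 and xml.etree.ElementTree modules, which stand in for whatever database engine and gateway are eventually chosen. The function name rows_to_xml, the element names, and the single flat table are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(cursor, root_tag="results", row_tag="item"):
    """Serialize the rows of an executed query as an XML string.

    Each row becomes one <item> element, and each column becomes a child
    element named after the column. Column names must be valid XML names.
    """
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        node = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            child = ET.SubElement(node, column)
            if value is not None:
                child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# A stand-in table, kept flat here so the example stays short.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (author TEXT, title TEXT, journal TEXT)")
conn.execute(
    "INSERT INTO entries VALUES (?, ?, ?)",
    ("Fraistat, Neil", "Illegitimate Shelley", "PMLA"),
)

cursor = conn.execute(
    "SELECT author, title, journal FROM entries WHERE author = ?",
    ("Fraistat, Neil",),
)
xml_output = rows_to_xml(cursor)
print(xml_output)
```

A web gateway would run a function like this on the server for each search and send the resulting text to the browser, so the user never touches the database software directly.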