Saturday’s Digitorium

Dr. Bohannon was speaking my language with her short talk on digital natives, digital literacy, and a sort of alternative and/or flipped approach to teaching first-year composition. It was the most informative session for me, and I want to follow her work on digital literacy, which is and will be a huge field for research in teaching and learning in secondary education. My question to her, and I hope I get to ask it, is this: how much do her department of DH and the KSU department of curriculum and instruction talk to each other? Surely they are trying to learn about her work in digital literacy, and maybe she could benefit from learning what they know about pedagogical knowledge.

Later: I just received an answer from Dr. Bohannon. She took classes in the college of education to learn how to write programmatic objectives that are measurable. She wants to work on fostering interrelationships between the humanities departments and the education departments and believes DH is the platform on which to build them. She said that, to her knowledge, there are no joint pedagogy projects between colleges of education and humanities departments. As articles on DH have noted, university departments are too fortified and need more interrelationship.

Though I had notes from the plenary by Dr. Miller, they now pale in comparison to Dr. Adams’ and Dr. Bohannon’s talks. What is most telling about DH is this: “What will measure success in the future is ‘12 people have adopted my code base for their project,’ not in what journal have you published.”

Dr. Adams wins for best puns and quotes: “There is an ungodly amount of Biblical texts…” “I don’t think anyone of us has divinity telling us what passage to read to give us meaning as did St. Augustine, but we do have computational tools to help us.” His project is fascinating. The larger implication is that the sort of metadata his project provides can help students learn to read, in the broad sense of the word “read”: interpreting meaning through close and distant reading.

Databases and Inquiry

The first thing I noticed about the two documents (one and two) that introduce and explicate the ongoing trend in the humanities to move from an exclusively “document-centric ideology” to include a “data-centric viewpoint” is the manner in which the information is presented. What was most likely intended as a quirky, fun mode of speech meant to relate to the target audience might also be construed as hubris-driven egoism. For a reader with a background in analyzing and deconstructing TEXTUAL OBJECTS (rather than, say, ahem, datasets), this piece of rhetoric is not without its own dilemmas.

While the documents tout a query-driven state of affairs within the humanities, they fail to present any viable questions, for example. The cheeky tone is on full display in the purposeful acknowledgment of copyright infringement, the use of Simpsons cartoons, and the syntax and diction implemented. While, admittedly, these concerns might be seen as non-entities in a didactic, elucidating document, they trigger my own radar in terms of questioning the rhetorical position of the text.

I also found instances wherein the textual choices were ripe for parody. The fact that the document which explains the move in the humanities from document-centric to include data-centric viewpoints was entirely textually oriented, for example, helps to show that this trend hasn’t exactly been put on display—again, while this is probably an unsound basis for argument within a document that is meant merely to highlight a trend and explain it to dilettantes like myself, it still seems worth noting.

No doubt, datasets and meta-data are extremely useful tools moving forward within the field of humanities. What one should refrain from nixing entirely from the conversation, though, are the basic tenets that have carried the field into its current state—namely, humanist concerns and rhetoric.

Looking at scale in humanities research

I think the idea of the humanities moving from a document-centric ideology to a data-centric ideology is an interesting one. Perhaps what the author of the text is getting at is that examining ideas in microcosm could be just as reductive a form of investigation as looking only at a large data set. Traditional anthropological field work, for instance, which takes into account first-hand interviews and case studies, is a document-centric form of research that can only go so far in understanding the practices of a particular culture. When data collection and statistical analysis come into play in a field that has previously neglected quantitative ways of knowing, the scale and scope of the investigation broaden too. I got kind of excited about the author’s assertion that “if print culture foregrounds answers and pushes questions into the background, then perhaps data culture may do the opposite: it privileges queries and treats answers as if they are ephemeral.” At first I was resistant to this idea, but I think there is some truth to it. I am all for privileging queries; I think the ways in which we acquire knowledge should be messy and creative, if that makes any sense. There is a lot of pressure on someone conducting research, in creating something like an ethnography (to use the example of anthropology again), to draw immediate conclusions or create conclusive findings based on a small set of data. The same thing could be said of the minutiae of literary analysis: looking at the work of one author or one poem in its historical context. This seems to be how systems of knowledge and education in the humanities are set up: a kind of defensive strategy based on specificity and rhetorical prowess. In response to some of the questions put forward at the end of Emily’s post, I think that placing data culture at the forefront of research can broaden the scope of a researcher’s investigation and therefore allow new pathways of knowledge to open up. This is what might allow fields of study in the humanities to receive more time and attention (which I think they deserve).

A Storm of Carbon and Attributes: Similarities Between Docs and Databases

Databases are extremely exciting for my future work in Digital Humanities. They are the infrastructure for the type of inquiry that I would like to do: overlapping discursive lexicons. They offer opportunities to track the minutiae that sometimes get overlooked in an interdisciplinary project. The major difficulty, however, in this stage of inquiry is finesse. I know that for my projects the appearance of an entity (word) is not determined by the text. The rhetorical play in scientific non-fiction during the sixteenth and seventeenth centuries is much more like literary play than a contemporary reader might expect. This at least seems to present a problem in classifying words with regard to genre. A sphere is a metaphor in a poem and a mathematical construct in a scientific monograph, for example. The overlapping rhetorical strategies and cultural assumptions, as well as the overlapping lexicon, may mean that the questions that “databased” (I had to) inquiry foregrounds are “how do I get from here to there?” or “how do I find ‘x’ and ‘y’ data accounting for complexities ‘a’ and ‘b’?”

The slide shows that we read for today pointed to a difference between traditional document-centric inquiry and databased inquiry. One of the authors claimed that one foregrounds questions and the other pushes them into the background. I do wonder what happens when we explore the similarities between documents and databases. It seems that all words (as entities) have several attributes—some contextual and some grammatical. The pronoun “she” has the grammatical attribute “pronoun” as well as a contextual attribute of referring to an agent, that is itself a collection of data entities each with a string of attributes. It is difficult to remember at times that Lady Macbeth and Hamlet are mere words on a page—entities with attributes.

Hamlet is a man that wears black, not green.

It is tempting to view “Hamlet” as an entity, but the concept of “Hamlet” is more of a table that includes “black” but excludes “green,” which in turn excludes “pronoun.” Character, then, seems to be a series of inclusions and exclusions, one layered on another. The data that makes up “Hamlet” is defined by some sort of linguistic data made of “is” and “is not” (I am going to stop this train of thought because I am colliding with Derrida’s trace).
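To see what “a series of inclusions and exclusions” might look like as data, here is a toy sketch in Python. The structure (a record with an `includes` set and an `excludes` set) and the `describes` helper are my own assumptions for illustration, not anything from the slides; the attributes are only the ones mentioned above.

```python
# Toy model (hypothetical): an entity is a name plus the attributes it
# includes ("is") and the attributes it excludes ("is not").
hamlet = {
    "name": "Hamlet",
    "includes": {"man", "wears black"},
    "excludes": {"wears green", "pronoun"},
}

she = {
    "name": "she",
    "includes": {"pronoun", "refers to an agent"},
    "excludes": set(),
}


def describes(entity, attribute):
    """Return True if the attribute is included, False if it is excluded,
    and None if the record says nothing either way."""
    if attribute in entity["includes"]:
        return True
    if attribute in entity["excludes"]:
        return False
    return None


print(describes(hamlet, "wears black"))
print(describes(hamlet, "wears green"))
print(describes(hamlet, "Danish"))
```

The third case is the interesting one: the record is simply silent about anything nobody thought to enter, which a reader of the play would never be.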

Defining both textual information and data in this way suggests to me that document data and database data do behave in similar ways. One of the chief differences is actually in labor. Narrative is a dominant mode perhaps because of the labor that it requires to extract the data, and labor requires time. Extracting data from a database (depending on the complexity and the power of the machine) is the labor of waiting while the machine crunches the data (I am aware that I am, for the moment, excluding database construction (authorship?) and query formation). The experiences of reading and waiting are different (especially if you are productive while waiting, perhaps using that time to do some light reading), and our experience of time is altered as a result. I am unable here to address the similarities between database design and authorship, but I think there are connections there as well. My conclusions after thinking and writing about databases are fuzzy at best, but the exercise still feels enlightening and exciting. My understanding of the nature of databases reaffirms my approach to texts, but perhaps it’s partly how I am sorting the data.

Whatever we understand, we understand according to our own nature.

Databases: Doc or Not?

After reading the two slide shows, I think I’m more confused about databases than when I began. I had always thought of a database the way it’s described in the second set of slides: sort of like spreadsheets where one doesn’t see the whole sheet at once and only pulls out the records needed for a particular task (the difference between databases and spreadsheets being that databases can interact with one another). In the first slide show, however, Quamen seems to view databases as pure data, not as documents, since he says both “documents and databases” can co-exist. Is a database sort of like Plato’s ideal solids, in that it doesn’t physically exist anywhere as a document? Is the spreadsheet comparison just a way for us humans to give a structure to something that is really only bits of data scattered over a server?

In the second slide show, the description of a database as “a high-quality representation of the real world” muddied the waters further for me. How can a collection of data represent the real world at “high quality”? To use the example database, a table of information listing species of birds could, I suppose, literally represent real birds, but I don’t see this representation as being high quality, or really anything above rudimentary. The table doesn’t even stand in for actual individual birds, just species. Likewise, the table of data about club members could be said to represent them, but a person’s name and phone number are such a tiny fraction of who s/he is that I take issue with calling it either a high-quality or a real-world representation. I suppose I’m just arguing semantics, as I am when I question whether a database is a document. And I guess the answer doesn’t really matter as long as I understand how databases work, which I do.
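For what it’s worth, here is roughly how I picture the club-members table, sketched with Python’s built-in sqlite3 module. The table name, column names, and the made-up members and phone numbers are all my own assumptions, not the actual example from the slides.

```python
import sqlite3

# In-memory database; the schema and every row are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO members (name, phone) VALUES (?, ?)",
    [
        ("Member One", "555-0101"),
        ("Member Two", "555-0102"),
        ("Member Three", "555-0103"),
    ],
)

# The query pulls out only the record needed for the task at hand,
# rather than showing the whole "sheet" at once.
record = conn.execute(
    "SELECT name, phone FROM members WHERE name = ?", ("Member Two",)
).fetchone()
print(record)
```

Seen this way, everything the database “knows” about a member is the pair of values that comes back, which is about as rudimentary a representation of a person as I can imagine.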

Randomly:

Data culture privileges queries and treats answers as if they are ephemeral.

This quote reminded me of the part of The Hitchhiker’s Guide to the Galaxy where, after millions of years of computation, the supercomputer Deep Thought calculated the answer to the question of the meaning of life: 42. It wasn’t until after Deep Thought announced the answer that anyone realized they’d forgotten to ask him what the question was.

Maybe the question is “Are databases documents or not?”

Databases in DH

I echo Matt Smith’s sentiments about the author assessing value between two mediums, namely print and digital. I don’t believe that digital is the only medium that foregrounds questions. Many research articles begin with questions, provide some answers, and more often than not end with more questions for the future. However, the author of “Databases: Intro to Relational Databases & Structured Query Language” may mean that digital mediums are more question-agnostic than print media. When using a database, answers are perceived as less concrete, since in data cultures or digital mediums they rely purely on the query. Print media require disaggregating data from a research question and reconstructing the data when trying to use it for another purpose. Databases leave more room for manipulating data, yielding more uses out of a data set. The data can be contextualized for more than one purpose because queries can be manipulated. So data cultures are transforming the humanities because researchers no longer reverse engineer documents to find the questions; instead, the questions guide the data that is being collected in the digital medium.

Image vs. Print Culture

One of the fundamental issues concerning the disparity between print and image culture is the notion that “print culture foregrounds answers and pushes questions into the background.” I tend to balk when one thing is contrasted—at its very essence—to another thing, when lines are drawn and values are assessed. Personally, I found that books foregrounded information and pushed questions into the background until I began reading critically. So, is this a problem of the medium the information is delivered in or a problem of the consumer of that information? There are those that see images, a graph or a politically-charged image, and immediately assume it is indisputable information.

Information, whether image or data, will always be interpreted differently, but I think the important thing here is that the information is “as question-agnostic as they can be.” Print and image culture are capable of asking and answering the same questions. They merely perform the task in different ways. The idea that the scholar is able to “see emergent patterns in the chaos of data” is integral to expanding new projects and facilitating current ones, but ultimately those emergent patterns will be read, analyzed, and processed by a skilled critical eye. The tools and languages we have learned about seem to facilitate and organize, but the print culture and the print critic are still necessary in order to make use of those tools and languages. I found the brief introduction to how databases actually work to be quite interesting and quite useful for how I go about my own research. Of the many useful tidbits I’ve picked up, I find that digital humanists have an incredibly scientific approach to building a set of queries. I often forget the traditional inductive approach in favor of allowing my interest in a certain subject to govern my research. Not only do they build hypotheses scientifically, but they also collect and organize that data in a necessarily logical pattern. I see the appeal of performing traditionally subjective research in a quantitative, scientific way. I grew up in the age that saw Dead Poets Society (jumping on chairs, reciting poetry in one’s whitey-tighties*, and chanting carpe diem) as the definition of English, so it is a nice breath of calculated air to entertain the prospect of rigor in research and eidetic certainty in our results.

*Don’t act like you didn’t have a pair.

Privileging Questions in Databases and Digital Humanities

“If print culture foregrounds answers and pushes questions into the background, then perhaps data culture may do the opposite: it privileges queries and treats answers as if they are ephemeral.” In the slides we read for today, this was marked as an “interesting idea,” and I think it is interesting, not only for discussions about databases versus documents, but for the humanities as a whole. I ran into this idea recently in a different context while reading for the Shakespeare Performance class that many of us are in. We were assigned a 2008 essay called “Adaptation Studies at a Crossroads” by Thomas Leitch, which reviewed the most recent scholarly work focusing on the adaptation and appropriation of literary texts. One of Leitch’s criticisms of undergraduate textbooks on the subject is that “they are limited not because they give incorrect answers to the questions they pose, but because those questions themselves are so limited in their general implications” (68). Leitch further asserts that sometimes “the question is more valuable than any answer” (75) and endorses textbooks that raise productive questions about adaptation studies even, or especially, when those questions are unanswerable.

It’s interesting that Leitch would raise some of the same issues as the DHSI slides, as he is largely dealing with the same kind of shift from print (books) to digital (movies). I think the privileging of questions rather than answers that Leitch and the DHSI lecture bring up is a productive approach to the humanities, and one that is sometimes itself underprivileged. Asking the right kinds of questions is always more important and more productive than definitively answering the wrong ones. And our answers to these questions often change over time anyway. In light of these observations, I’m not going to attempt a conclusion to this post; instead, here are some questions that might prove productive:  How can or will databases change our modes of research? What are the relative advantages and disadvantages of these changes? How can we use quantitative data provided by databases within the framework of disciplines that focus on the “human,” or at least on things that we believe to be largely unquantifiable?

A Database for “One Idea”

I found the PowerPoint on Intro to Relational Databases and SQL fairly straightforward and informative. However, a large portion of “Why Databases?” was unclear without the author’s explanation. In an attempt to expand on the argument for databases in Digital Humanities, I’d like to propose a possible interpretation of the section so nicely decorated with Simpsons characters. As an introduction to the “One Idea” model, I wonder if the photos of the Table of Contents and Index represent inflexible data sets, an old model for cross-referencing that is static and limited compared to databases. The “One Idea” seems to be something that could be stored as a database or as a document, perhaps a research topic about which a scholar wants to gather and curate essays. The first contributor adds four essays to this document or database. The second contributor adds three essays, which become part of the curated information on the topic. The pattern continues until the “One Idea” includes 18 essays. If the essays and the data about them, including subject tags, are stored in a database rather than only in document files, it becomes possible through querying to pick out which essays meet certain additional criteria, in this case, perhaps those that reference Shakespeare’s works. Using the database, this list of essays (generated by querying the subgroup Shakespeare) can also be sorted using other data attached to each essay, perhaps in this case the year the essay was written. This allows for much more flexible and useful searching in a digital archive than links to documents alone provide. I’d be interested to hear other theories about this section, if anyone else was curious about its contribution to the overall argument.
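To make this reading concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table and column names (`essays`, `tags`, `title`, `year`, `tag`), the placeholder titles, and the years are all my own assumptions for illustration; none of them come from the slides, and I have used four essays rather than all 18.

```python
import sqlite3

# In-memory database; every name and value below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE essays (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        year  INTEGER NOT NULL
    );
    CREATE TABLE tags (
        essay_id INTEGER NOT NULL REFERENCES essays(id),
        tag      TEXT NOT NULL
    );
""")

conn.executemany(
    "INSERT INTO essays (id, title, year) VALUES (?, ?, ?)",
    [
        (1, "Placeholder essay A", 2009),
        (2, "Placeholder essay B", 1998),
        (3, "Placeholder essay C", 2004),
        (4, "Placeholder essay D", 2012),
    ],
)
conn.executemany(
    "INSERT INTO tags (essay_id, tag) VALUES (?, ?)",
    [
        (1, "Shakespeare"), (1, "adaptation"),
        (2, "Shakespeare"),
        (3, "Milton"),
        (4, "Shakespeare"), (4, "film"),
    ],
)


def essays_tagged(tag):
    """Return (title, year) for every essay carrying `tag`, oldest first."""
    rows = conn.execute(
        """
        SELECT e.title, e.year
        FROM essays AS e
        JOIN tags AS t ON t.essay_id = e.id
        WHERE t.tag = ?
        ORDER BY e.year
        """,
        (tag,),
    )
    return rows.fetchall()


shakespeare_essays = essays_tagged("Shakespeare")
for title, year in shakespeare_essays:
    print(year, title)
```

The same stored essays could just as easily be queried by a different tag or sorted by title. That is the flexibility a static table of contents or index can’t offer.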

Boo-Boo the Barrage Balloon Meets TEI

I guess I’m officially a nerd, because TEI is exciting. Until this reading, I had never heard of SGML, much less that HTML originated from it. Since TEI is also a form of SGML, it and HTML must be siblings, or at least cousins (although HTML sounds like the black sheep of the SGML family). Fun! I also appreciate the explanation of XML, as I wasn’t really sure what it was. My only complaints about the reading were all the broken links (which made it hard to understand some of the instructions, since they referred to documents that aren’t there anymore), the typos, and the very 1990s frame formatting. I’m guessing “A Very Gentle Introduction” is also very old in computer years.

I was interested in the idea that “preservation is a key problem for an emerging digital culture,” something I hadn’t really considered before this class. Our discussions on bitrot and other issues of digital deterioration have helped make me more aware of the problem, but I’m still a bit stuck in the mindset of “going digital means preserving.” Part of my impetus for my barrage balloon DH project is to preserve original photos and other balloon-related memorabilia in digital scans and to disseminate them online. Sharing the photos I collect is still best accomplished digitally, but could the actual photographs be better preserved than the digital images I make of them, despite the threat of fire, acid, vermin, etc.? To bring my questions more in line with textual documents, what about my very fragile copy of the World War II children’s book Boo-Boo the Barrage Balloon? What can TEI do for Boo-Boo and his compatriots Blossom and Bulgy?

After reading about TEI’s encoding options for various text elements, I can guess that TEI would let me encode the text of Boo-Boo the Barrage Balloon with indicators of quotations and formatting, milestone events (like when Boo-Boo saves London from the Nazis), a bibliography, and a header about the book itself. (Incidentally, I appreciate Mueller’s inclusion of hyperlinks in his TeiXBaby language to update TEI for the Web.) I could then use CSS to format the text of Boo-Boo to make it approximate the text in the book – though without the charming illustrations. Once I get the hang of TEI, I might take a stab at encoding Boo-Boo. I doubt he will be of much interest to scholars, but it will give me some practice!

Speaking of encoding, the reading’s instructions for encoding TEI were a bit confusing to me, although familiarity with HTML helps (especially with containers like <head>, <div>, and <p>, which share their names with HTML elements, even if <head> marks a heading in TEI rather than a document’s metadata). I expect it will make more sense once I actually start encoding, and I’ll learn the language as I go. I can’t wait to get started!
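As a first stab, here is a rough sketch of what a tiny encoded fragment of Boo-Boo might look like, along with a few lines of Python (using the standard xml.etree.ElementTree module) to pull pieces back out of it. The element names are real TEI ones, but I am assuming current TEI P5 conventions, including the namespace, which may differ from the older version the reading describes. The chapter heading, narration, dialogue, and the milestone’s attribute values are placeholders of my own, not text from the book.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical fragment: real TEI element names, placeholder content.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Boo-Boo the Barrage Balloon</title></titleStmt>
      <publicationStmt><p>Unpublished practice encoding.</p></publicationStmt>
      <sourceDesc><p>Placeholder description of the print source.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="chapter" n="1">
        <head>Placeholder chapter heading</head>
        <milestone unit="event" n="rescue"/>
        <p>Placeholder narration. <q>Placeholder line of dialogue.</q></p>
      </div>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(SAMPLE)

# The header describes the book itself.
title = root.find(".//tei:titleStmt/tei:title", TEI_NS).text

# Quotations and milestones can be listed without reading the whole text.
quotes = [q.text for q in root.iterfind(".//tei:q", TEI_NS)]
milestones = [m.get("n") for m in root.iterfind(".//tei:milestone", TEI_NS)]

print(title)
print(quotes)
print(milestones)
```

Once the text is marked up this way, the markup itself becomes something I can query, which ties TEI back to our database discussions.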