Author Archives: Geoffrey Emerson

Gödeling Databases?

In the Quamen document I was struck by the statement that syntactic web searches are fundamentally binary. This should not have been revelatory, but it really connected with a post-deconstruction-era understanding of language—or rather didn’t connect. As we develop more sophisticated ways of approaching the data that we normally just read, it’s both maddening and invigorating to confront a familiar problem. But semantic web databases represent an answer to the binary limitation by introducing a range of predicates linking subject and object, rather than just offering an attribute connected to an entity.

How I understand the relationship between traditional databases and semantic web databases reaches back to my post about databases a few weeks ago, in which I looked for similarities between documents and databases. I believe the similarity is to be found in the ideas of inclusion and exclusion—in a document, each word introduces a range of possibilities for the next word and so on, while a database includes or excludes based on a particular attribute. After a query is levied at the dataset, human readers can then begin to interpret based on a set of inclusions and exclusions. This model, however, rests on a binary system—each query a matter of the presence or absence of a particular attribute—i.e. “database, give me a list of all texts that are novels and tell a strictly linear narrative.” There are four possibilities: (yes, yes)—the data that is retrieved or included—(no, yes), (yes, no), and (no, no). Each ordered pair represents a binary opposition through which I can include or exclude the entity. Insofar as I understand them, however, semantic web databases break free of this system and can engage more nuanced queries.
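To make the binary model concrete, here is a quick Python sketch (the catalogue entries and attributes are invented for illustration): a two-attribute query partitions the records into the four (yes/no, yes/no) cells, and only the (yes, yes) cell is retrieved.

```python
# Toy catalogue: each record either has an attribute or it doesn't.
texts = [
    {"title": "Text A", "novel": True,  "linear": True},   # (yes, yes)
    {"title": "Text B", "novel": True,  "linear": False},  # (yes, no)
    {"title": "Text C", "novel": False, "linear": True},   # (no, yes)
    {"title": "Text D", "novel": False, "linear": False},  # (no, no)
]

# "Give me all texts that are novels and tell a strictly linear narrative":
# only the (yes, yes) cell survives the query.
included = [t["title"] for t in texts if t["novel"] and t["linear"]]
print(included)  # ['Text A']
```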

Triplestore databases are built on a set of rules similar to those of traditional relational databases, but they can represent data with much more fidelity. I wonder how far this fidelity goes, however. It seems that we could be constructing another Borges tale: a database that represents the world so well that the barrier between the thing and its representation begins to fade. Further, is there any chance that we are moving toward a systemic understanding of the universe that will require the system to describe itself? I am really just thinking of Kurt Gödel’s incompleteness theorems, which showed that any formal system complex enough can encode statements about itself, and that such a system cannot be both complete and consistent. Once a system turns on itself in this way, it basically falls apart—true systemic fidelity is untenable.
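As a rough illustration of the shift, here is a toy, in-memory stand-in for a triplestore (real ones are built on RDF and queried with languages like SPARQL; the triples below are invented): each fact is a (subject, predicate, object) triple, and a query can leave any of the three positions open.

```python
# A minimal in-memory "triplestore": facts as (subject, predicate, object).
triples = [
    ("Hamlet",  "wears",     "black"),
    ("Hamlet",  "is_a",      "prince"),
    ("Hamlet",  "speaks_to", "Ophelia"),
    ("Ophelia", "is_a",      "noblewoman"),
]

def match(s=None, p=None, o=None):
    """Return all triples fitting the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

print(match(s="Hamlet"))  # every predicate linking Hamlet to an object
print(match(p="is_a"))    # every classification, regardless of subject
```

Instead of asking whether an entity has an attribute (yes or no), the query ranges over named relationships, which is where the added nuance comes from.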

I do wonder, however, about a phrase that came up repeatedly in the Bizer piece. Bizer describes linked data as “a web of things in the world, described by data on the web,” which made me think about a number of issues regarding the relationships between objects in the world and the world’s relationship to representation. It seems that we want to draw back the curtain of this world to see the raw data of existence, but this process is predicated on a human perspective. Is the data on the web in Bizer’s statement data about the world, or data about us? I think it’s likely that it’s split between these two possibilities, but how well can a semantic web database tell the difference?

A Storm of Carbon and Attributes: Similarities Between Docs and Databases

Databases are extremely exciting for my future work in Digital Humanities. This is the infrastructure for the type of inquiry that I would like to do—overlapping discursive lexicons. It offers opportunities to track the minutiae that sometimes get overlooked in an interdisciplinary project. The major difficulty at this stage of inquiry, however, is finesse. I know that for my projects the appearance of an entity (word) is not determined by the text in which it appears. The rhetorical play in scientific non-fiction during the sixteenth and seventeenth centuries is much more like literary play than a contemporary reader might expect. This at least seems to present a problem for classifying words with regard to genre. A sphere is a metaphor in a poem and a mathematical construct in a scientific monograph, for example. The overlapping rhetorical strategies and cultural assumptions, as well as the overlapping lexicon, may mean that “databased” (I had to) inquiry foregrounds questions like “how do I get from here to there?” or “how do I find x and y data while accounting for complexities a and b?”

The slide show that we read for today pointed to a difference between traditional document-centric inquiry and databased inquiry. One of the authors claimed that one foregrounds questions while the other pushes them into the background. I do wonder what happens when we explore the similarities between documents and databases. It seems that all words (as entities) have several attributes—some contextual and some grammatical. The pronoun “she” has the grammatical attribute “pronoun” as well as a contextual attribute of referring to an agent that is itself a collection of data entities, each with a string of attributes. It is difficult to remember at times that Lady Macbeth and Hamlet are mere words on a page—entities with attributes.

Hamlet is a man that wears black, not green.

It is tempting to view “Hamlet” as an entity—but the concept of “Hamlet” is more of a table that includes “black” but excludes “green,” which in turn excludes “pronoun.” Character, then, seems to be a series of inclusions and exclusions, one layered on another. The data that makes up “Hamlet” is defined by some sort of linguistic data made of “is” and “is not” (I am going to stop this train of thought because I am colliding with Derrida’s trace).
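A toy sketch of that layering, assuming a simple set-based model (the attribute labels are invented for illustration): the entity is just a set of included attributes, and exclusion is read off as absence from the set.

```python
# "Hamlet" modelled as a table of inclusions; exclusion is absence.
hamlet = {"man", "wears_black", "prince", "melancholy"}

def includes(attribute: str) -> bool:
    """True if the attribute belongs to the 'Hamlet' table."""
    return attribute in hamlet

print(includes("wears_black"))  # True: Hamlet wears black
print(includes("wears_green"))  # False: not green
print(includes("pronoun"))      # False: "Hamlet" is not a pronoun
```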

Defining both textual information and data in this way suggests to me that document data and database data do behave in similar ways. One of the chief differences is actually in labor. Narrative is a dominant mode perhaps because of the labor it requires to extract the data, and labor requires time. Extracting data from a database (depending on its complexity and the power of the machine) is the labor of waiting while the machine crunches the data (I am aware that I am, for the moment, excluding database construction (authorship?) and query formation). The experiences of reading and of waiting are different—especially if you are productive while waiting, perhaps using that time to do some light reading—and our experience of time is altered as a result. I am unable here to address the similarities between database design and authorship, but I think there are connections there as well. My conclusions after thinking and writing about databases are fuzzy at best, but it still feels enlightening and exciting. My understanding of the nature of databases reaffirms my approach to texts, but perhaps that’s partly a function of how I am sorting the data.

Whatever we understand, we understand according to our own nature.

Do DHers Dream of Electric Questions?

The first question that colleagues often ask when I talk about large amounts of encoded text is what to do with, or how to use, such a data set. Encoding every word of Shakespeare does not strike most academics or interested parties as particularly useful. Isn’t it just a concordance? But the advantage of a digitally encoded corpus is the ability to collaborate with an expert in writing applications with which to manipulate the data through metadata. This process requires a good deal of imagination to form questions with which we can ply a corpus of text. The applications and scripts allow interested parties two main options. The first is the novel or unlooked-for discovery within a body of text that was previously inaccessible or obfuscated by the sheer amount of data. The second is to pose a large-scale question (one that may need revision) that previously required speculation or generalization. I am not claiming that big data renders these types of rhetorical and logical moves obsolete, but it can open doors to questions that seem out of reach for individual researchers.

Let me try to be a little more specific. Topic modelling is one way that we are able to dynamically sift through large amounts of information. David M. Blei provides an introduction to the concept here:

http://www.cs.princeton.edu/~blei/papers/Blei2012.pdf
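For a hands-on sense of what Blei describes, here is a minimal sketch using scikit-learn’s implementation of latent Dirichlet allocation (my choice of library, not one named in the readings; the four-document corpus is invented and far too small for real topic modelling, which wants thousands of documents):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# A toy corpus about the word "sphere", mixing literary and
# mathematical uses (invented documents, purely for illustration).
documents = [
    "the sphere is a metaphor for the soul in this poem",
    "the sphere is a mathematical construct with radius and volume",
    "the soul ascends through celestial spheres in the old cosmology",
    "volume and radius determine the geometry of the sphere",
]

# Turn raw text into word counts, dropping common English stop words.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(documents)

# Fit a two-topic model.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Print the highest-weighted words for each inferred topic.
words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[-4:]]
    print(f"Topic {i}: {top}")
```

In principle, one topic should gravitate toward the literary vocabulary and the other toward the mathematical, though with a corpus this small the split is not guaranteed.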

The trick is to figure out the ways in which interested parties can, as he puts it, “zoom in” and “zoom out” to relevant information and data. It allows readers the opportunity to find patterns and connections, and to manipulate discourse. A more specific example still (and one that I find fascinating as a fan and as an intellectual) is the Philip K. Dick android built by Hanson Robotics and researchers at the Institute for Intelligent Systems, like Andrew Olney at the University of Memphis. The android was built to look like Dick, complete with beard, eyes, and facial expressions, but more than this, it was programmed to speak like him.

http://youtu.be/1bYiXIVyguU

The software developers used typical conversational models called bots to generate the grammatical glue of the conversation, but they also used topic models and concept maps of Dick’s corpus of writing to create responses to verbal questions out of the writer’s own words. After hearing Olney lecture on how he managed to achieve a conversable Philip K. Dick android, I got excited not only about the possibility of electric sheep, but also about new ways to manipulate discourse. I am still looking for the right questions to ask of this methodology, but tracking concepts between images, poetry, fiction, non-fiction, journals, etc. can allow us to recreate not just discourse, but echoes of a discussion.
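To gesture at how answering “in the writer’s own words” might work, here is a deliberately simple sketch. It is not the Hanson Robotics/Olney system; it swaps in plain TF-IDF retrieval for their topic models and concept maps, and the corpus sentences are invented stand-ins, not actual Philip K. Dick quotations.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented stand-in sentences for an author's corpus.
corpus = [
    "The android wondered whether its memories were its own.",
    "Reality has a way of dissolving when you examine it too closely.",
    "Every machine dreams in the language of its maker.",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(corpus)

def respond(question: str) -> str:
    """Return the corpus sentence most similar to the question."""
    q = vectorizer.transform([question])
    scores = cosine_similarity(q, matrix)[0]
    return corpus[scores.argmax()]

print(respond("What is reality?"))  # picks the sentence sharing vocabulary
```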

Digital Humanities and some questions about its community

Johanna Drucker, Paul Barret, and the Founding Principles of dhpoco.org all interrogate the idea of community in the production and consumption of academic digital media. Debates about digital humanities can shed new light on these discursive communities and the relationships upon which they are predicated. Producers and consumers of digital media form a vast Venn diagram whose members continually shift from one circle to the other. Perhaps digitization intensifies the mobility of community members, who can go from curator, to blogger, to (en)coder, to writer, to scholar—professional—public. Here, at the beginning of the twenty-first century, we work from the assumption that controlling information equals power, which is true for the most part, but it becomes increasingly clear that the relationship between information and community members is anything but static. As participants change roles from knowledge producer to consumer, they change their relationship to power. It may be that this change subverts the infrastructure of knowledge distribution that western culture has relied upon for centuries, but I remain confident that patterns will emerge—patterns that may have been there the whole time—even if I cannot presently articulate them.

Drucker says “the humanities have a role to play in demonstrating that knowledge is historically and culturally situated.” The demonstrating and situating allude to a wider context and assume that information moves in one direction. Her emphasis on monographs is understandable—they allow professionals to craft knowledge before it is published, and allow peers to ensure quality control. I am curious, however, whether this process is mutually exclusive with digital knowledge production, or whether some of these strengths could be adapted to other mediums. I want to be clear that I share Drucker’s values and concerns regarding quality and professionalism, which amount to accountability to the community. I wonder, though, about the wider context when the humanities demonstrate and situate—demonstrate and situate to whom? Is the community insular, in that its members are only accountable to one another? Can the community broaden its boundaries and be held accountable to the contemporary culture to which it is demonstrating, and in which it is situating its subject? I am not suggesting we just hand over the keys to the tower, but the physician is accountable to patients as well as peers, and the lawmaker to constituents as well as colleagues.

Reticent Enthusiasm: Responding to the Digital Humanities Manifestos and Defining DH

Perhaps, even in a theoretical context, “what is Digital Humanities?” is the wrong question. Most of the answers are too vague to be of any use to either resistant or enthusiastic parties, and the inexactness is necessary to describe the field/methodology/toolset … etc. I am not sure I have the right question to ask, but maybe “what can Digital Humanities do for me?” or “what can Digital Humanities do for a given community?” would be a good place to start. Both versions of the Digital Humanities Manifesto rest on an idea of community that includes a mix of collaborators and consumers (gasp). Overall, the idea of community in these texts is ambiguous. It is supposed to be united in some utopian digital-humanist future, but not all rhetorical communities function in identical ways. I am enthused by the prospect of rarefying the distinction between academic and public audiences, but I am unsure that the distinction would disappear altogether. One of the deepest concerns in Freshman Composition is getting students to identify and adapt to particular audiences, yet I cannot help but wonder whether the universalism expressed in the manifestos is compatible with this skill. Basically, I am concerned that the rhetoric in the manifestos determines both the object (DH project content) and the subjects (the participants in DH projects).

The main underlying difference between the questions I mentioned earlier is a prescriptive versus a descriptive approach to DH. Based on the theoretical discourse of the Digital Humanities Manifesto 1.0 and 2.0 and Helle Porsdam’s “Too much ‘digital’, too little ‘humanities’?,” a prescriptive approach must retreat into ambiguity because much of the potential of DH remains unexplored. Rapidly advancing technology forces any definition of DH to speculate and predict—generating a definition that would determine the parameters of DH going into the future, and making the answer prescriptive even if it does so unconsciously. Perhaps this temporal paradox could be avoided, but I want to put forward the possibility that even after a larger sample of DH projects and tools is available, alongside a stable predictive model for advancing technologies, the law of accelerating returns [1] potentially puts a satisfying definition out of reach.

While the lofty rhetoric of the manifestos gave me pause (particularly the claim that DH is the key to some utopian future), I agree that the democratization of information and its effect on social and economic systems is powerful and has great potential for positive change. Melding academic research and entertainment has the potential to rehabilitate enthusiasm and passion as identity markers—getting them a seat at the cool table in the cafeteria. Combining research and entertainment also achieves the value-shift that the manifesto calls for, since it has the potential to encourage valuing knowledge for its own sake rather than as a commodity to exchange. I do not think edutainment will be the downfall of capitalism, or give rise to anarchy, but it can be a force for positive change.

[1] Ray Kurzweil’s term for the exponential growth of technological capability.