Response to “Linked Data - The Story So Far”

Architecture requires conflict. Clean resolution. Choosing a value. Uncertainty is required (Nikolov et al., 2008). Content feeds attribution, inference is “Oh yeah?”

Coarse-grained measures emerge on the silk (Volz et al., 2009). Represent a face.

Drawn provenance.

Digitorium: Approaches in Digital Pedagogy

This panel (Thursday, 3:30-4:45) was really interesting to me because it got me thinking about ways I can bring DH into the EN 103 courses I’m teaching in the fall. In “UA Genealogies: Historical Archives and Storytelling,” Dr. Lauren Cardon talked about a fascinating 103 project she taught. Her students conducted genealogy research on one branch of their family, then composed 500-word narratives with digital images and uploaded them to WordPress. Thirteen of the best projects also went on display in Hoole. Lauren talked about how the project helped students make personal connections to the campus community through working with the ADHC and Hoole, and how it taught them skills in archival research and digital gallery-building.
Historical Archives and Storytelling on the ADHC site

Dr. April Morris presented “Beyond the Book Report: Digital Tools, Undergraduate Engagement, and Intellectual Experimentation,” in which she described her experiments in bringing DH tools into her art history classes. As exigence, she explained that the range of digital materials and technologies for presenting art is “exploding” and that students like and respond well to digital materials. April showed us a series of art history course websites she had designed, which became successively more integrated into the class and involved more student input. One of the coolest student projects is a “mortuary temple to Brendan Fraser” that a current student is building in Minecraft. (!?) April stressed that students need a sense of ownership over their digital projects, and she emphasized the importance of letting students present their research in class to build community.

Both presentations gave me inspiration for my upcoming courses, which will focus on visual pedagogy of World War II propaganda (including barrage balloons, of course). I had already planned on having my students keep Tumblr blogs, but now I’m considering having them work on a course website instead (or in addition), as well as adding a digital component to their researched position paper. I also plan to use DH resources (including ArtStor and other things I’ve discovered in this class) to give my students access to more media than is available traditionally.

A Web of Things

The Bizer, Heath, and Berners-Lee article made some points that intersect with the humanities and with previous discussions we have had in this class. One is slightly disturbing to me: “Linked Data uses RDF to make typed statements that link arbitrary things in the world. The result, which we will refer to as the Web of Data, may more accurately be described as a web of things in the world, described by data on the Web” (2). I’m not sure how literally I am supposed to take this statement, but to me, things in the world are not data. They can be described by data, but the Web of Data cannot include things in the world. The web refers to things in the world and identifies many different bits of data that refer to the same thing in the world. When things are related in the real world, this relationship is distinct from the data relationship, or predicate, between them. I’m uncomfortable with the appropriation of real objects by the (capitalized, as Michael points out) Web of Data, and I wonder what the authors intended to communicate by including the world in their data web and emphasizing the distinction (or non-distinction) between a web of things and a Web of Data. I’m interested to hear how others read this phrase.

The rhetoric that bothers me did not extend throughout the article: “The RDF Vocabulary Definition Language (RDFS) (Brickley & Guha, 2004) and the Web Ontology Language (OWL) (McGuinness & van Harmelen, 2004) provide a basis for creating vocabularies that can be used to describe entities in the world and how they are related” (4). Here, it is clear that the web describes real things and relationships rather than including them. I really like the idea that people can create their own words within an existing language to describe real relationships that have not yet been defined on the web of data. I feel like writers who have created new words would be into that. Jabberwocky! These data authors translate a relationship from the real world, and the word for it, into a data vocabulary. The individual creativity and control this gives contributors seems to fit right into the collaborative nature of DH and the internet as a whole.
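The idea of contributors coining new words inside an existing vocabulary can be made concrete with RDFS. This is a minimal sketch in plain Python (no RDF library), assuming a hypothetical namespace `http://example.org/vocab#` and a hypothetical coined term `isPenPalOf`, which the contributor declares to be a `rdfs:subPropertyOf` the existing FOAF term `foaf:knows`. Under the RDFS rule for sub-properties, any statement made with the new word also counts as a statement made with the older, more general word. That is how a coined term stays connected to the shared language.

```python
# Minimal sketch of RDFS sub-property inference in plain Python (no RDF library).
# Triples are (subject, predicate, object) tuples of URI strings.
# The ex: namespace and the coined term "isPenPalOf" are hypothetical.

RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
EX = "http://example.org/vocab#"  # hypothetical vocabulary namespace

triples = {
    # A contributor coins a new word and ties it to an existing one.
    (EX + "isPenPalOf", RDFS_SUBPROPERTY_OF, FOAF_KNOWS),
    # A data statement that uses the new word.
    (EX + "alice", EX + "isPenPalOf", EX + "bob"),
}


def infer_subproperties(graph):
    """Apply the RDFS rule: if (p subPropertyOf q) and (s p o), add (s q o).

    Repeats until nothing new appears, so chains of sub-properties also work.
    """
    graph = set(graph)
    while True:
        parents = [(p, q) for (p, pred, q) in graph if pred == RDFS_SUBPROPERTY_OF]
        new = {
            (s, q, o)
            for (s, pred, o) in graph
            for (p, q) in parents
            if pred == p
        } - graph
        if not new:
            return graph
        graph |= new


inferred = infer_subproperties(triples)
print((EX + "alice", FOAF_KNOWS, EX + "bob") in inferred)  # True
```

A real triplestore or RDF library would handle this kind of inference for you. The loop here only shows the one rule at work.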

On another note, the biography of Tim Berners-Lee has an especially impressive and punchy beginning and ending. Just in case not everyone scrolled down all the way, these are some pretty fun facts: “Sir Tim Berners-Lee invented the World Wide Web…In 2001 he became a Fellow of the Royal Society” (26).

Linked Data and other Capitalized Terms

The first item I read for today was the PDF covering how queries are executed in the various query languages currently available. CouchDB seems to be an easy-to-understand way to create and search a database without using SQL, which, as I understand it, is closer to how large-scale search engines like Google operate. I found this information interesting, useful, and completely accessible given the images and context. Also, the PDF is HUGE; I’m relieved that we only covered a short portion of it. It gives examples of each of the languages it covers and was clearly written with students, and professionals, in mind.

The second article, on linked data, was no less interesting, but perhaps a little less accessible. The concept of Linked Data on the Web is ambitious, and it reminds me very much of the way people make connections neurologically. Associating various “real-world entities,” to use a term from the article, with a field of tangentially related items is exciting because it mirrors the process we use to create meaning and understand the world, only on a much larger scale and at speeds previously unthinkable (literally[!]). The concept, again, is exciting, but at times I felt left out of the conversation by the esoteric technical terminology. I’m sure the authors had a slightly different audience in mind than a graduate student like me, and it showed, which is by no means a criticism. I simply found this piece slightly less accessible than the previously discussed excerpt. That said, it also seems a fitting wrap-up to the conversation that has taken place in this course up to this point. All of the points connect. Huzzah!

While it was slightly difficult to navigate the terminology used to describe structured data on the Web, it brought to mind phenomena that already occur: I’m looking at you, tailored advertising on Facebook, related videos on YouTube, and the like. With Linked Data, I pictured myself sent even further down the proverbial rabbit hole of available and related information that one might experience, say, while cruising the interweb highways of Wiki pages. But, of course, Linked Data’s typed links are different from standard hypertext links. I’m interested in the concept, again, because I feel that it mirrors the way human beings think, but I’m also curious how this Linked Data apparatus might be used to benefit the classroom and further the conversation. Query-based intellectual activity is fascinating, and this type of field would no doubt be ripe for such an endeavor.

Digitorium: The Future of Textual Editing

One of the most interesting presentations I saw at the Digitorium conference was the plenary by Dr. David Lee Miller: “The Shapes of Text to Come: Textual Editing and Scholarly Publishing in the Age of Open Access.” The most interesting part of his talk, for me, was about the Spenser Archive and the digital edition he was working on. He showed us how the digital edition of The Faerie Queene will work. Readers will be able to choose the views they want: they can see the text in modern or original spelling, make glosses and notes appear and disappear at will, and view the changes that editors have made to their copy text with one click. This is not all the digital text will do, but it was enough to sell me on the idea of a digital edition. I’m always a little bit skeptical about reading a long text onscreen. I seem to concentrate better and absorb more when I read in print, and I think many of us have had this experience. The digital scholarly edition, however, seems to have so many capabilities that the print one doesn’t. I’ve been reading Arden editions of Shakespeare’s plays recently, and I find all the notes and glosses and symbols and alterations in the text so complicated and confusing (especially for King Lear) that it’s hard to concentrate on the text itself. I sometimes get distracted by notes I don’t need, run into abbreviations or symbols that take me out of the reading, or have to flip back to the introduction to reread a passage about the editing choices. The kind of digital edition Dr. Miller showed us could fix those problems. In fact, the features of his digital edition seemed so convenient that I can no longer consider reading a print edition more effective, for my own purposes, than reading an online one. I’m excited by these developments in the field of textual editing, and I hope these kinds of digital editions will catch on sooner rather than later.

‘The Global Data Space’

I think this week’s readings on linked data were a good way to end the class. Now that we know about some of the DH resources out there, and how those resources can make information more readily available, we’re learning how all this information can be linked and connected. The creation of the “global data space” (1) that Bizer, Heath, and Berners-Lee are talking about is something I’ve never really thought about before. This project is so massive and involved that I’m not even sure you can really call it a project. The ultimate goal, according to the authors, is “being able to use the Web like a single global database” (17). It requires worldwide cooperation and overcoming the various research challenges that the authors outline. They state that “If the research challenges highlighted above can be adequately addressed, we expect that Linked Data will enable a significant evolutionary step in leading the Web to its full potential” (20). This seems to me like a very big “if.” They detail seven different challenges, all of which seem, if not insurmountable, at least very difficult to accomplish. Maintaining the quality of this kind of global database, or even maintaining the database itself, seems particularly problematic. I also wonder about the scale of this project in terms of time. Are there specific goals, or rather, do the authors have specific expectations for what will be happening in five, ten, twenty, or fifty years? Perhaps I shouldn’t even think in terms of a “project,” but rather in terms of progress toward an ideal that may never actually be achievable.

Godeling Databases?

In the Quamen document I was struck by the statement that syntactic web searches are fundamentally binary. This should not have been revelatory, but it really connected with a post-deconstruction era understanding of language—or rather didn’t connect. As we develop more sophisticated ways of approaching the data that we normally just read, it’s both maddening and invigorating to confront a familiar problem. But semantic web databases represent an answer to the binary limitation by introducing a range of predicates linking subject and object rather than just offering an attribute connected to an entity.

How I understand the relationship between traditional databases and semantic web databases reaches back to my post about databases a few weeks ago, in which I looked for similarities between documents and databases. I believe the similarity lies in the ideas of inclusion and exclusion. In a document, each word introduces a range of possibilities for the next word, and so on, while a database includes or excludes based on a particular attribute. After a query is levied at the dataset, human readers can begin to interpret based on a set of inclusions and exclusions. This model, however, rests on a binary system, each query a matter of the presence or absence of a particular attribute, e.g. “database, give me a list of all texts that are novels and tell a strictly linear narrative.” There are four possibilities: (yes, yes), the dataset that is retrieved or included, plus (no, yes), (yes, no), and (no, no). Each ordered pair represents a binary opposition through which I can include or exclude the entity. Insofar as I understand them, however, semantic web databases break free of this system and can engage more nuanced queries.
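This is a small sketch of the binary model described above, assuming a hypothetical four-text catalogue in which each text is reduced to two yes/no attributes (`is_novel`, `is_linear`). It enumerates the four ordered pairs and runs the “novels that tell a strictly linear narrative” query. Only the (yes, yes) text survives, because each attribute check is a separate include/exclude decision.

```python
from itertools import product

# Hypothetical toy catalogue: each text is reduced to two yes/no attributes.
catalogue = {
    "Text A": {"is_novel": True, "is_linear": True},
    "Text B": {"is_novel": True, "is_linear": False},
    "Text C": {"is_novel": False, "is_linear": True},
    "Text D": {"is_novel": False, "is_linear": False},
}

# Every possible answer pattern for two attributes: 2 x 2 = 4 ordered pairs.
patterns = list(product([True, False], repeat=2))


def query(rows):
    """'Give me all texts that are novels AND tell a strictly linear narrative.'

    Each attribute check is a binary include/exclude decision,
    so only the (yes, yes) pattern is retrieved.
    """
    return [name for name, attrs in rows.items()
            if attrs["is_novel"] and attrs["is_linear"]]


print(len(patterns))     # 4
print(query(catalogue))  # ['Text A']
```

The other three patterns are not wrong answers. They are simply excluded, which is the limitation the post contrasts with semantic web databases.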

Triplestore databases are built on a similar set of rules to traditional relational databases, but they can represent the data with much more fidelity. I wonder how far this fidelity goes, however. It seems that we could be constructing another Borges tale: a database that represents the world so well that the barrier between the thing and the representation begins to fade. Further, is there any chance that we are moving toward a systemic understanding of the universe that would require the system to describe itself? I am really thinking of Kurt Gödel’s incompleteness theorems. As I loosely understand them, they show that a formal system complex enough to describe itself cannot be both consistent and complete: if it is consistent, it contains true statements that it cannot prove. In that sense, once a system reaches that level of complexity, it basically falls apart, and true systemic fidelity is untenable.

I do wonder, however, about a phrase that came up repeatedly in the Bizer piece. Bizer et al. describe linked data as “a web of things in the world, described by data on the web,” which made me think about a number of issues regarding the relationships between objects in the world, and the world’s relationship to representation. It seems that we want to draw back the curtain of this world to see the raw data of existence, but this process is predicated on a human perspective. Is the data on the web in Bizer’s statement data about the world, or data about us? I think it’s likely split between these two possibilities, but how well can a semantic web database tell the difference?

Linked Drifblim Data

While reading Bizer, Heath, and Berners-Lee’s article, I decided to try out some of the linked data resources they mention. I found links to some browsing and searching tools on LinkedData.org and tried out the LOD Cloud Cache search engine (since it allows traditional searches with words instead of URIs). I searched for my favorite Pokemon, Drifblim.

To be honest, I didn’t really understand the results. The first result was Drifblim’s entry in DBpedia. DBpedia’s entry listed some subject/predicate/object triples about Drifblim, where Drifblim is the subject. For instance, Drifblim (subject) is the primary topic of (predicate) http://en.wikipedia.org/wiki/Drifblim (object – Wikipedia’s entry on Drifblim). I also got to see the “sameAs” predicate in use, where DBpedia noted that Drifblim “is sameAs of” Grodrive (Drifblim’s French name). Clicking on each type of predicate in DBpedia gave me more information about it, although again I had a hard time understanding what the information meant.
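Here is one way to read those results. The triples can be written as plain (subject, predicate, object) tuples. This is a minimal sketch. Only the Wikipedia URL comes from the post. The DBpedia resource URI and the French “Grodrive” URI are assumptions based on DBpedia’s usual naming patterns, not copied from the live pages. `foaf:isPrimaryTopicOf` and `owl:sameAs` are the standard FOAF and OWL terms that DBpedia’s “primary topic of” and “sameAs” labels appear to correspond to. DBpedia’s “is … of” wording appears to mark a link read backwards, with Drifblim as the object. For that reason the sketch stores Grodrive as the subject and follows `sameAs` in both directions, since it is a symmetric relation.

```python
# The Drifblim triples described above as plain (subject, predicate, object) tuples.
# Only the Wikipedia URL comes from the post; the DBpedia-style URIs are
# assumptions based on DBpedia's usual naming patterns.
DBR_DRIFBLIM = "http://dbpedia.org/resource/Drifblim"      # assumed URI
FR_GRODRIVE = "http://fr.dbpedia.org/resource/Grodrive"    # hypothetical URI
FOAF_IS_PRIMARY_TOPIC_OF = "http://xmlns.com/foaf/0.1/isPrimaryTopicOf"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

triples = [
    (DBR_DRIFBLIM, FOAF_IS_PRIMARY_TOPIC_OF, "http://en.wikipedia.org/wiki/Drifblim"),
    # "Drifblim is sameAs of Grodrive" reads the link backwards:
    # Grodrive is the subject and Drifblim the object.
    (FR_GRODRIVE, OWL_SAME_AS, DBR_DRIFBLIM),
]


def objects(subject, predicate, graph):
    """Return every object linked to `subject` by `predicate`."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def same_as(resource, graph):
    """owl:sameAs is symmetric, so follow the link in both directions."""
    forward = {o for (s, p, o) in graph if p == OWL_SAME_AS and s == resource}
    backward = {s for (s, p, o) in graph if p == OWL_SAME_AS and o == resource}
    return forward | backward


print(objects(DBR_DRIFBLIM, FOAF_IS_PRIMARY_TOPIC_OF, triples))
# ['http://en.wikipedia.org/wiki/Drifblim']
print(same_as(DBR_DRIFBLIM, triples))
# {'http://fr.dbpedia.org/resource/Grodrive'}
```

Treating `sameAs` as symmetric is what turns “Drifblim is also called Grodrive” into something closer to an equation. Asking from either name reaches the other.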

Another linked data search result was to a “WikiData” page about Drifblim – what I assume is a wiki-based database of linked data, an alternative to Dbpedia. This page had some different triples, such as Drifblim (subject) has the label or name (predicate) of Drifblim, Drifzepeli, Fuwaride, etc. (object – names for Drifblim in other languages). It also included a “description” predicate which told me that Drifblim is a species of Pokemon.

Although I didn’t really know what to do with my results, I can see how they exhibit what Bizer et al. call “a more detailed interface to the user that exploits the underlying structure of the data.” To me, the results seem almost mathematical – instead of saying “Drifblim is also called Grodrive,” it’s like a formula that reads DRIFBLIM = GRODRIVE. This tells me the structure of the data, that the two pieces of data are equivalent, and it would definitely be more machine-readable than an English phrase. I can see linked data being more useful for programmers and developers (and programs and applications) than for normal users like me. The article’s section on “domain-specific applications” gives some good examples, such as Revyu, which provides richer data to end users by querying for linked data behind the scenes. (I searched for Drifblim on Revyu, but sadly found nothing. Maybe it’s time to write a review…) I think other applications that draw on scholarly data, like Talis Aspire, would be great for DH projects. I’m left wondering how much linked data has developed since this article’s publication (in 2009?). From what I saw online, not much has changed, and I’m wondering how far developers have gotten toward making linked data useful for the average user, or toward “the ultimate goal of being able to use the Web like a single global database.”

“a web of things in the world, described by data on the Web”

by rumwik

Does Data Dream of Human Queries?

While reading the articles for Tuesday’s class, I was struck by two things: 1) how complicated a simple search can be, and 2) how searching is seemingly becoming sentient. As far as I can tell, the difference between syntactic and semantic search is that the latter has a sort of awareness of connotation, context, and meaning-as-use. This implicitly connects with John Searle’s notion of speech acts, and it explicitly connects with his thought experiment, the ‘Chinese Room.’ So it seems that a search is basically a form of ‘perlocutionary act,’ in which one person makes a statement and another processes that statement as a request and performs the task. This is not as simple as a syntactic search function, because queries (human and digital) often require subtleties such as connotation and context. If I were in class with a friend and said I would really enjoy reading a book that they happened to own, the contextual meaning is clearly that I might want to borrow the book, but that meaning is not on the surface.

Similarly, this relates to the ‘Chinese Room’ as a thought experiment meant to show that Strong AI is a misnomer and that, in fact, computers are incapable of real consciousness. This makes me reconsider what true consciousness is and what it means to interact between the lines. Computers have, for some time, been able to process commands, and Searle essentially argues that processing a command is the extent of their function, but the Quamen slides suggest a very different interaction. The semantic web “invents a language so that the meaning of pages is searchable,” and the database or search engine is able to process that meaning and interact with your query. It is startling to me that computers are becoming more adept at this process, and perhaps better at understanding human interaction and undertones than we are.

This, in conjunction with Linked Data, will generate still more interaction. It seems that the transition from syntactic to semantic search parallels the transition from HTML to RDF. According to the article, Linked Data uses RDF to “make typed statements that link arbitrary things in the world,” producing “a web of things in the world, described by data on the Web” (2). Data will begin assessing the world and producing statements about it, which makes me wonder where consciousness begins and ends. My computer can interact with other computers, it understands its surroundings and responds to them, and it understands my queries even if they’re slightly wrong or predicated on context. Do you think my computer judges me when I binge-watch House of Cards?

Linked Data

Just to understand what linked data is, I found the “for dummies” version. The Bizer et al. article told me that RDF triples are helpful when there are two or more datasets and I’m trying to search for two things that have a relationship. The two things I’m searching for have URIs and are linked together, so if I search for Albert Einstein across multiple databases, Theory of Relativity will appear in the results as well, because the two things are related: they are linked by tagging. This brings us back to earlier conversations about documenting the methodology for tagging. If I work on a project that is going to create a web of linked data, then the tagging methodology must be documented for all the RAs to follow, and it would be helpful if it were published so people know how to search and have an explanation for what comes up during a query.
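Here is a small sketch of the Einstein example, assuming two separately published datasets. Every URI and predicate in it (the `example.org` addresses, `bornIn`, `developed`) is a hypothetical placeholder, not taken from any real dataset. Because both datasets use the same URI for Einstein, merging them is just a set union, and the link to the theory of relativity becomes reachable from a search that starts at Einstein.

```python
# Two separately published datasets. All URIs are hypothetical placeholders.
EINSTEIN = "http://example.org/people/Albert_Einstein"
RELATIVITY = "http://example.org/theories/Theory_of_Relativity"
ULM = "http://example.org/places/Ulm"
BORN_IN = "http://example.org/vocab#bornIn"
DEVELOPED = "http://example.org/vocab#developed"

dataset_a = {(EINSTEIN, BORN_IN, ULM)}           # e.g. a biography dataset
dataset_b = {(EINSTEIN, DEVELOPED, RELATIVITY)}  # e.g. a history-of-science dataset

# Merging is just set union: triples that share a URI now connect.
merged = dataset_a | dataset_b


def related(uri, graph):
    """Everything one link away from `uri`, following links in either direction."""
    outgoing = {o for (s, p, o) in graph if s == uri}
    incoming = {s for (s, p, o) in graph if o == uri}
    return outgoing | incoming


print(RELATIVITY in related(EINSTEIN, dataset_a))  # False: dataset A alone misses it
print(RELATIVITY in related(EINSTEIN, merged))     # True: the shared URI links them
```

The shared URI is doing all the work here. If two RAs tagged Einstein with different identifiers, the union would hold two disconnected nodes, which is one concrete reason a documented tagging methodology matters.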

The tool I reviewed attempts this. If you remember, the query function didn’t work at the time of the tool review, but it pulled from Google’s database and used crowdsourced tag overlays to produce results from queries. It uses the Web of Science model, which I think is, or is related to, linked data. After seeing this tool and its methodology, we may ask ourselves, for the purposes of our particular projects, whether it’s better to crowdsource the tags/links or to have a method in place for RAs to tag/link.

Scholarometer

Digital Humanities Blog

University of Alabama