Category Archives: Digital Humanities

Interview for 4Humanities

[This interview by Ernesto Priego appeared earlier today in 4Humanities. Ernesto and I also created a Spanish version.]

Ernesto Priego: Can you describe who you are and what you do?

Alex Gil: I am @elotroalex, the other Alex. I take to heart Sartre’s admonition that we are nothing and add Borges’ insight that we are someone other to the world. Columbia University employs me at the moment with the ambiguous title Digital Scholarship Coordinator to help bridge the widening gap between libraries and researchers. Part of my duties includes preparing the subject and reference librarians to become the consultation arm of the Digital Humanities Center. Another big chunk of my time goes to consulting and training graduate students and faculty in History and Humanities in a variety of subjects, which I like to divide into three broad categories: remediation and curation, computational methods, and scholarly communications.

I hear in your question another question: who have I been? A scholar, an editor, a conferencier, a digital tinkerer, a writer of tales and poems, an architecture student, a pre-med student, an odd-jobber. No wonder I am attracted to the digital humanities!

I finished my dissertation at the University of Virginia’s English Department on the francophone poet and statesman, Aimé Césaire. I was lucky enough to find the earliest manuscript of a play he wrote during the Vichy occupation of his native land, Martinique. He later transformed that play into an oratorio and back into a play for different audiences at different points in his career. I used what we could call algorithmic thinking to sort out the complexities of that transformation.

In the years leading to my defense, I became more and more involved with the life of NINES and the Scholars’ Lab, eventually becoming a fellow of both. In the latter I was also part of the first run of the Praxis Program. In all honesty, I would not be who I am today without those experiences.

Caribbean Literary History, Textual Scholarship, Digital Humanities and now a Librarian. What a mix!

EP: We share various common interests, including a background in English literature. You are also a fellow HASTAC scholar. Can you elaborate on how you perceive the interconnections/relationships between literary studies and digital scholarship?

AG: Literary studies has much to offer digital scholarship and vice versa. I learned early on in my training that interpretation, understood as repetition-with-a-difference, was essential to our experience as human beings. For all the genre pieces “against the digital humanities” claiming we are naïve about data, I knew that DH’ers with a literary background had always brought that sensitivity to texts, and that only a small number of us were under the spell of naïve empiricism. The rest of us are still primed to ask the big questions of the machines.

I have been lucky to cut my teeth at the University of Virginia, where the traditions of textual scholarship and digital humanities are both strong and getting stronger. I was even luckier to become a researcher at a time in which we are wrestling with the difference between distant and close reading, between heuristic and algorithmic thinking, confronted with a new sense of textual enormity. These are questions that go to the very heart of interpretation.

Yes, I’m attracted to digital scholarship because I can get better search results, publish quicker and more openly, automate much tedious work, etc., but the reason that really carries my heart at the end of the day is the possibility of teasing out the human from the mechanical. Before Turing we had only one universal machine to worry about. Now we have two, and we are being asked to name the difference. Aha! A task for the literary scholars who command the machines!

As fundamental as the question of our humanity vis-à-vis the machines may be, an even more important contribution for literary scholars in the digital age is the creation of a reliable digital archive. I have already gone on record on this, and I will say it again: The most important task of the 21st-century scholar is remediating our past and opening it up to the world. Literary scholars in particular have kept the tradition of textual scholarship alive (as opposed to historians, for example), and now that we face a monstrous task of repetition, we are the ones who can lead the crowds we need to make sure we have a reliable past to offer future generations. For some marginal archives our task is urgent and spells the difference between survival and oblivion. I’m thinking here of archives in poor and middle-income countries, where governments can’t or won’t do much to preserve our documents from the elements.

EP: Not all humanities departments or schools have frameworks, resources or even inclinations towards engaging in the theory and practice of digital scholarship. When did you first become attracted to it, and how have you experienced this transitional phase in which digital scholarship is not yet the openly-accepted, homogeneously self-described way of being a humanities scholar, at least not everywhere in the world?

AG: I became attracted to the digital humanities about 8 years ago when I found a dearth of digital resources on Caribbean scholarship and its primary sources. I had the ambitious, if naïve, idea at the time that we could build an archive of canonical Caribbean literature. I approached a professor of Caribbean literature at the University of Virginia who was supportive, and we set to work. We got as far as organizing a stay at the Bellagio Center on Lake Como, Italy, with some top Caribbean scholars and a team from the e-text center at UVa. We realized there that we had many challenges ahead, and the veil of naïveté dropped within 10 days. When I came back to the US I began learning skills, starting with TEI. I was lucky that UVa has such a strong DH presence on grounds. The rest is history.

Interestingly enough, the English Department is a very traditional department, and much of the DH work I did at UVa happened outside of it. Realizing that even at a place like UVa, DH is marginal to the departments can be eye-opening. Things seem to be changing now, and we’re starting to see more buy-in from departments.

We also have to acknowledge that computational methods do not encompass all of what we do under the banner of DH, and that while most scholarly communication will probably transition online eventually, algorithmic approaches to literature will remain theoretical: one theory amongst many. In the years ahead I imagine we will continue to negotiate that difference. While future generations of scholars will probably be comfortable with online venues of publication for their research, I don’t see us all doing topic modelling or network analysis. In that sense, above all, I think the biggest change will come in the form of multi-modal, multi-media approaches to publication.

We must not forget either that these changes will not happen evenly across the world or even within national borders. I for one expect that some countries outside of the rich north will produce very innovative and striking forms of digital scholarship because they are not bound by the same institutional histories. For that to happen, though, we must overcome the same assimilationist forms of thought that anti-colonial and even postcolonial thinkers have combated. Perhaps those forms of scholarship will come from citizen scholarship, a secret hope of mine.

EP: Can you discuss further what you understand by “citizen scholarship”?

AG: In the Caribbean, for example, most scholars/writers/artists don’t work for the academy. Many have affiliations with cultural groups of different sorts, or freelance. A great many of them work at banks, advertising agencies, you name it. Beyond them, though, we have a cadre of aficionados, a wiki-style crowd that engages with the preservation and critique of our cultural heritage. Citizen scholarship refers to the aggregate activities of these groups and individuals. I know of very good scholars who never obtained a PhD.

What we count as scholarship is also important here. Sonya Monjar, for example, directs a great project in Puerto Rico, “Esta Vida Boricua,” which focuses on life narratives. Some may say that this is mostly biographical work, but I see it as archival synthesis. Hers is just one example of many outside of our traditional fields of vision that push the boundary between citizen engagement and scholarship.

The reason I hope to see more of these kinds of projects, and a growth of scholarly interest on the part of the populace, stems from the enormous need we have to remediate our material past. When Walter Benjamin warned us that the past was in danger, he spoke from a moment in the history of media when the work of curation and critical investigation was too expensive to be truly popular. Our moment is different, at least when it comes to the medium. The two things stopping us from rescuing the past that Benjamin favored (the one obscured by propaganda, profit interests and/or hegemonic ideology) are time and will. The average person in the Caribbean and other middle- to low-income areas needs to spend their time scraping together a living. Scholarly pursuit becomes a luxury under those conditions. Many things need to happen before a true public scholarly culture comes to life, granted, but I’m reassured by the fact that the medium is already there to facilitate it.

EP: Finally, what strategies would you recommend to scholars (in academic institutions or not) interested in contributing to an international public scholarly culture?

AG: Start collaborating with someone who lives very far away from you. We have great tasks ahead of us. If remediating our archive responsibly is our most pressing need, as I argue above, then we have a great opportunity to collaborate on digitization projects that transcend boundaries. As a rule rather than an exception, archives are scattered. This creates many opportunities for us to build bridges between communities. At the moment I am involved in the Global Outlook DH initiative, a brand-new Special Interest Group of the ADHO.

Our shared goal is to shed more light on the state of our global union and build bridges whenever possible. We are just starting out, but we hope to foster precisely those forms of shared archive building and playing that will lead to a global public scholarly culture. We have already started making wonderful progress in Cuba, where next year we will host the second THATCampCaribe. In the summer we hope to roll out Around DH in 80 Days, a tour of digital scholarship and curation around the world. I see other groups making great efforts to truly go beyond the rich countries: HASTAC and 4Humanities, to name two of the most visible ones. For these reasons and more, I predict this will be a year of many breakthroughs for digital scholarship in a global key.

The Internet was close to a blank slate at some point. Now it’s quickly becoming the dominant image of our cultural heritage. When it comes to the narratives we tell about our cultural and political history, at least in the West, in this our new mirror we have an image that takes us back to canonical ideas of the West that have long been undermined in the Gutenberg galaxy. If the image of a shared cultural heritage is to be a non-hegemonic, honest reflection of ourselves, we must understand we are at heart working on a shared archive. True international collaboration around digitization, and the play it enables, is a sine qua non of this archive. If I’m right, I hope the question on everyone’s mind will be not whether you are collaborating, but with whom.

Theory is dead, long live theory!


Follow me through a caricature in two movements for mixed company.

First movement (Allegro Appassionato): Excessive Theory can lead to blindness.

For all the wonderful machinery that post-war theorists in the humanities have built to help us speak truth to power, many are particularly blind to their own appetites within the institution that shelters them from the elements. Nowhere is the operative ideology more evident than at the two points de capiton that bring us here: Building and Theory.

Fun fact: The discursive class depends on the already-built: Google, libraries, bibliographies, editions, audiences, Zotero, etc. You wouldn’t know this from the patronly anxieties of some of the inmates of the Republic of Intellectua. These anxieties have a long and honorable genealogy that, according to one imaginary (the dominant one), harks back to Aristotle.

According to the original peripatetic, DH’ers, or βάναυσοι as he called them, didn’t get to be citizens of his ideal Republic. We are not truly descendants of the Greeks, but because so many of us have a family member who traces their lineage to the divorce, the desire to banish the βάναυσοι still haunts our family reunions. Case in point: At the Po-Co dinner table, where I frequently break bread, most folks have an ongoing beef with what we call instrumental reason, justly associating some forms of it with a litany of evils. Without meaning to, many of us get caught in the rhetoric and end up joining Aristotle instead of Fanon. When such embarrassments occur, as always, alibis and disavowals ensue.

I laugh when I smell denial (from the Greek, ὑπόκρισις). The denial comes in several flavors. Some, like Aristotle, unapologetically place themselves above the clang of the anvil: The labor that sustains theory is visible, but unworthy. Others, with more democratic aspirations, re-appropriate the word ‘work’ (or, as we would say, the work that the word ‘work’ is doing). Focusing on the wrong difference internal to work itself (my critical work vs. your what-is-it-that-you-do-again? work), these folks inadvertently reaffirm their privilege, ending up back on Aristotle’s hillock where the clang don’t reach. An even more astute bunch builds elaborate and useful discourses on tools and work. They slum it, as the vox populi would have it. In many cases, though, they too abstract the work that enables their own. An endearing new offspring of the former acknowledges the instrumental intellect that makes discursive spaces possible in the first place, but claims its hands are tied when it comes to changing the reward structures: Je sais bien, mais quand même… (I know very well, but all the same…)

In all cases, an old class is perennially performed into existence: the service class. It doesn’t help that some folks are eager to accept the title and the role on unfair terms. We now have an opportunity to make a dent in the unbecoming tradition of la Distinction, and I hope we embrace it. It will involve a shift in both constructs: not towards a community of over-extended multi-classers (everyone knows an all-bard party would suck). We can push instead for spaces where both (de)formative activities, building and debating, occur simultaneously on equal footing, so that individual tendencies may bloom to the benefit of the party formation.

I bear witness. Despite having won the service trifecta (graduate student, textual scholar and digital humanist), I have always enjoyed playing the critical rebel; and, because a textual scholar and all-around tinkerer knows how to construct the materials he and others critique, I have always enjoyed what Hegel saw as the bias of service: absolute Wissen (absolute knowing).

The oscillation between theory and practice, between solitude and service, discourse (the quintessential form of solitude) and building (the quintessential form of community), is our only hope out of our disavowals. I’m the first to admit knowledge does not belong exclusively to those with their backs bent over a Hinman collator or their eyes glued to a UNIX shell. I claim that detachment from such activities, even when they are your own, often leads to unwarranted arrogance and avoidable error; and, even more confidently, that the self-aware design and implementation of hermeneutic machines (whether editions or apps) can yield unique counter-intuitive insights with the cherished rigor of our most revered by-laws.

Second Movement (Pianissimo): The database is NOT the theory.

At the Scholars’ Lab, we recently wrapped up work on the first year of the Praxis Program. The tool we built, Prism, is a replica of a pen-and-paper exercise. We designed it for people in and outside the digital humanities. Our goal was to enable an exercise out of which new interpretations could be born. While we built it, we learned a great deal about the nature of annotation, fragmentation, constrained categories and hermeneutic difference. And if indeed we learned how to play with CoffeeScript, Cucumber, Sass, and a bagful of gems, we (and I mean everyone playing Praxis) still continued to engage, tacitly and explicitly, with old and new discourses. Without a doubt, many of our decisions about the direction of Prism owed as much to books we read eons ago as they did to the Law of the Rails. (If you are curious, some traces of our sentimental education persist in the blog.) Besides the in-process theory, oftentimes internalized before we could make it public, the door is now also open to discourse on the texts under scrutiny, and/or on the tool as a model for interpretation in general.

Here is one of those claims I would build on top of Prism:

Interpretation is a social phenomenon whereby we map our differences onto a shared text.

That is a theory, NOT a database. Disagree with me and we can have a word, not a database.

All that to say, I wish my friends would avoid the infelicitous formula, “the database is the theory,” itself a caricature of the worst paradox-mongering coming out of the High Theory that the phrase tries to undermine in the first place. “They stab it with their steely knives,/But they just can’t kill the beast.” The problem, as I see it, is that those who feel they stand outside the traditions of humanities computing hear an easily disputable theory, when in truth it is evident we ALL build and we ALL make claims. If we sometimes feel undermined by the discursive class, we should not retaliate by too readily collapsing the distinction between praxis and theory. Too many farm revolutions have failed because the pigs have moved into the farmer’s house. If we could instead diligently advertise that no one around here is mistaking building and discourse for the same activity, I think that would go a long way to undermine our imagined and coagulating borders.

The phrase also distracts our audiences from a more important argument, with which it gets conflated, coming from Bethany Nowviskie, Steve Ramsay and other McGanners: to wit, that building can be an interpretative act. Those wonderful interpretative acts can only be a database, though, via weak metonymy. If we skip the catchy phrase, I think we can all agree: while all theory is interpretative, inasmuch as it can be extra-discursive, not all interpretation is theoretical (cf. modernist sculpture). If we want spaces, even departments, with builders and theorists sharing professional rewards and the pedagogical load, I vote we continue to refine this promising theory.

In all fairness, the database needs to be a theory as much as a fish needs to cross Niagara Falls on a bicycle. A more productive exploration of the relationship between the two would attempt to uncover the algorithmic, tabular, mechanical structures behind theory, and/or would attempt to make explicit the theory that walks in line with building a particular database, or a digital archive. Both of these humble approaches, coupled with the above, would surely ingratiate us with the barbarians at the door.

Of course, we must continue to ensure we are not saddled with unnecessary burdens from folks who would see us as the help, Eternal September and all. I suggest we turn the tables and recognize that discourse provides us with a service. In order for us to perform such a wonderful legerdemain, we must constantly re-acquaint ourselves with the fragile and unique deformities of the humanities and the social sciences. I’m not kidding when I say that talkers are hackers too (yes, even if most of them just work those legacy systems we call books). If we were to lay bare their own mechanical and material exigencies, find the lingua franca that is always-already there to unite us, and do so without confirming their worst fears of obsolescence in the age of Google, we might just save ourselves from our exilic tendencies.

At the time when that crumbling democracy was transitioning from an oral to a written tradition, Aristotle’s intellectual grandfather, a stonemason by some accounts and a gadfly by all, was found guilty of seducing the young. Before he approached the hemlock, the man forbade the next generation from mourning; instead, he asked them in no uncertain terms to carry on the conversation. We have this in writing: Socrates is dead, long live Socrates!

…and then the Herokulypse

[Cross-posted on the Scholars' Lab blog]

After two-odd years hanging around the Scholars’ Lab and earning my badges in the DH community, I finally learned a lesson that should be required learning for all newcomers: plumbing is real. I mean, I was more or less aware of its existence: brief sightings, a shudder here and there from a ghostly presence. Problem is, I’ve been focusing on the flashy, large, important, big, fancy, loud, loud, loud uses of already-made tools, or those tools I dream of, five-million-dollar, rest-of-your-life tools. You know: The shiny stuff.

For the past couple of weeks, I have been working instead on the small stuff that needed to be done to roll Prism into production. Enter the plumbing. What I thought would be a series of small tasks turned out to be a major time vacuum. At issue was getting Heroku to play nice with what we had built in the development branch. The first two weeks, Heroku would not even display our site. A series of ‘Application Error’ messages was all I got. The culprits, in no particular order: the Asset Pipeline, Devise and Jasmine. Eventually, with help from above (i.e. E. Rochester and W. Graham), we got the site running …and then the Herokulypse.

Once in a while a bug comes, so uncanny, so daunting, that it makes you want to become a novelist. That was the Herokulypse. I obsessed about it for three days at the expense of my dissertation and everything else, with no results. The great obi-wayne-kenobot finally found the problem. To my relief, I had been on the right track trying to solve it; I just hadn’t figured out the part about disabling page caching on the pages controller. Live and learn, and learn I did: Plumbing is real.

I found the lesson timely at a moment when we are debating the obstacles and affordances of coding for digital humanities. The experience with the Herokulypse really brought home for me the idea that code is labor, and that the digital humanities puts real pressure on our notions of leisure, labor and power. I am still working out these issues (issues which all my predecessors seem to have encountered in one way or another) and will be sure to report back to the public when I have more insights.

In the meantime, I won’t ask you to be careful of what you wish for. On the contrary, I will encourage you to scurry down the rabbit hole of code, that you may never think yourself superior to anyone who leans on the side of hack over yack.

Day of DH

Around here we usually wake up at about 6:30 am to Henry’s complaints. He asks to be let out of the room, but has not learned how to open the door himself. I’m sitting on the couch in the living room with half an eye closed. Henry stares at me with an impish face covered with ink. In a father-son compromise meant to give me the time to write this blogpost, I gave him a rare treat: pen and paper.

Fail…

Thus begins my Day of DH 2012.

on sequences, noise and Juxta.

[Cross-posted at the NINES blog]

If you mention tokens and strings to a textual scholar, do not be surprised to receive a polite reprimand in response. Most consider the vocabulary inherited from computer science unworthy of the rich realities of the texts they hold dear, and with good reason. We have already endured an uphill battle against older, more insidious forms of abstraction that would have us believe texts are written in the heavens with capital T’s. At the same time, a growing number of well-intentioned scholars are content to use digital tools that manipulate texts precisely at this level, without asking too many impertinent questions about the black-box processes that give them handy results. The classic example of this disavowal is the black-boxed use of Google, which I venture has become a staple of scholarship everywhere. In a sense, we are all forced in one way or another to rely on black-boxing. Slowly but surely, we realize we are in a world of “black-boxes all the way down.”

Despite the need to black-box some areas of our workflow in order to move along, we ignore some black-boxes at our own peril. Not only do we risk transferring agency to a half-understood process, we may miss key insights on our own scholarly procedures. After a few years of using the Juxta tool to help me collate the fascinating mutations of Aimé Césaire’s Et les chiens se taisaient, I finally took a peek inside the internal processes of the software. In my defense, I really wasn’t ready to look behind the curtain before dusting off my math skills, learning some of the basic vocabulary of computer science and acquiring basic code literacy. Although I still feel there is much more to understand, I have seen enough to know that I can never look at comparisons or textuality the same way again.

Perhaps a bit of background would be in order. As I pointed out in an earlier post, Juxta cannot handle the Césaire texts adequately unless you break them into smaller chunks. The main problem was and is the large number of transpositions between one version of Et les chiens se taisaient and the next. I did the work of cataloguing and diagramming the many ‘moves’ by hand, using Juxta to compare each block of text internally. Earlier this year I started having doubts about my ability to capture all matching blocks between one version and the next, especially in those comparisons that revealed upwards of 70 moves! I noticed that Juxta caught some matches, so I tried a small experiment.

I took my working text of the typescript (TS) and the editio princeps (EP) and processed them whole through Juxta. I carefully used the results to remove from the original TS and EP files all the matches caught by the first run. Once I had removed every match from both files, I ran the smaller files through Juxta. Of course, the next set of matches was different from the first. I went ahead and carefully removed those matches from the files. This process continued for 10 or 11 runs, until I eventually had two tiny files with text I was reasonably certain was mutually exclusive. I was so excited, I even made a screencast explaining the process. (Later I realized that this method does not guarantee 100% accuracy, but I’m getting ahead of myself.)

The experiment proved that I had indeed missed some matches, despite months of working with these texts the traditional way. I was very satisfied with my ingenuity, but I still didn’t understand why Juxta matched some things and not others. While I was thus occupied, a team of friends and colleagues were beginning to see some positive results porting the output of SuperFastMatch to the Juxta API. My dream of having my texts represented using the powerful Juxta visualization suite was getting so close I could taste it. But… I understood even less about SuperFastMatch than I did about diff. Enough was enough.

I had the faint notion that Juxta used a modified version of the diff utility, so I started my research there. Apparently, the diff family builds on a solution to the longest common subsequence problem. What Juxta was catching as a match in every run of my experiment was indeed a longest common subsequence. Here is where a hundred questions, questions I would’ve never thought to ask had I stayed outside the box, took center stage in my research: What does it mean that a complex comparison set has several levels of overlapping subsequences? What do these levels tell us about textual sequence in general? What’s the relationship between these sequences and the process by which a text is actually rearranged from one version to the next by human agency?

String matching 101: The longest common subsequence of two strings is the longest set of tokens that follow each other in the same linear order in both strings, despite any intervening tokens. In the case of Juxta, which seems to be running a wdiff flavor of the Google diff tools, the tokens in question are words. For example, given the following two strings, where each token is represented by a letter of the alphabet: 1) ABCXDEYFZ, and 2) ABMCDXYZEFN, the sequence ABCDEF can be said to be a longest common subsequence (though not the only one: ABCXYZ is just as long). If we ran this example through Juxta, M, N, X, Y and Z would be highlighted in green, while the longest common subsequence would remain unformatted. This is the principal method by which Juxta can claim to mark difference. As long as you work with simple texts, texts in which there is one clearly recognizable longest common subsequence with minor interruptions, this technique can be very effective. On the other hand, texts with many transpositions ‘break’ because mutually exclusive large subsequences intersect each other. Realizing the reason for Juxta’s limitations, I couldn’t help but think that textual scholars have also been operating with a human version of diff, assuming a long stable sequence against which differences move about.
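The longest common subsequence problem can be illustrated with the textbook dynamic-programming solution. This is a minimal sketch, assuming single letters as tokens as in the example; it is not Juxta’s code, and the name `lcs_pairs` is hypothetical. It returns matched index pairs so that the unmatched (“highlighted”) tokens of either text can be recovered.

```python
def lcs_pairs(a, b):
    """Return index pairs (i, j) aligning one longest common subsequence of a and b.

    table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    When several subsequences share the maximal length, ties are broken
    by a fixed rule, so only one of them is returned.
    """
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk forward through the table, taking a match whenever tokens agree.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1  # skipping a[i] loses nothing
        else:
            j += 1  # skipping b[j] is strictly better


    return pairs


first = list("ABCXDEYFZ")
second = list("ABMCDXYZEFN")
pairs = lcs_pairs(first, second)
common = "".join(first[i] for i, _ in pairs)
matched_first = {i for i, _ in pairs}
matched_second = {j for _, j in pairs}
unmatched_first = "".join(t for k, t in enumerate(first) if k not in matched_first)
unmatched_second = "".join(t for k, t in enumerate(second) if k not in matched_second)
print(common, unmatched_first, unmatched_second)  # ABCDYZ XEF MXEFN
```

Because of its tie-breaking rule, this sketch settles on ABCDYZ rather than ABCDEF for the example strings. Both have length 6, which is exactly the kind of ambiguity that multiplies once transpositions pile up.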

A few weeks ago, I also started thinking about the possibility of automating my experiment by writing a script to do what I was doing ‘by hand.’ I baptized my method Poor Man’s String Matching, but it could more appropriately be called an iterative diff. Once I set out to do the work of recognizing and stashing sequences programmatically, I started seeing the problems with my solution. Though these problems are not insurmountable, they reveal an enormous amount about our assumptions.
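An iterative diff of this kind could be scripted along the following lines. This is a hypothetical sketch, not the author’s script. `lcs_pairs` is the textbook dynamic-programming LCS, `iterative_diff` and its `max_runs` cap are illustrative names, and tokens are single letters rather than words. Each run stashes one longest common subsequence and removes it from both texts before the next run begins.

```python
def lcs_pairs(a, b):
    """Index pairs (i, j) aligning one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def iterative_diff(a, b, max_runs=20):
    """Repeatedly stash an LCS of the remaining tokens and remove it from both texts.

    Returns the runs (as pairs of original positions) plus the leftover
    tokens, which are candidates for text unique to one version.
    """
    # Keep (original_position, token) so every run maps back to the full texts.
    rest_a, rest_b = list(enumerate(a)), list(enumerate(b))
    runs = []
    for _ in range(max_runs):
        pairs = lcs_pairs([t for _, t in rest_a], [t for _, t in rest_b])
        if not pairs:
            break  # nothing left in common
        runs.append([(rest_a[i][0], rest_b[j][0]) for i, j in pairs])
        used_a = {i for i, _ in pairs}
        used_b = {j for _, j in pairs}
        rest_a = [x for k, x in enumerate(rest_a) if k not in used_a]
        rest_b = [x for k, x in enumerate(rest_b) if k not in used_b]
    return runs, rest_a, rest_b


first, second = list("ABCXDEYFZ"), list("ABMCDXYZEFN")
runs, left_a, left_b = iterative_diff(first, second)
for number, run in enumerate(runs, start=1):
    print(number, "".join(first[i] for i, _ in run))
print("only in second:", "".join(t for _, t in left_b))
```

On the example strings this yields two runs, ABCDYZ and then XEF, leaving M and N as text found only in the second version. Like the manual procedure, it guarantees nothing about accuracy. The first run greedily claims tokens that a human reader might assign to a different block.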

The two main problems are handling ‘noise’ and defining what counts as a coherent textual block. The latter is too difficult a problem to cover in a blogpost, but it is important enough that I dedicate a chapter of my dissertation, endearingly called “Legology,” to solving it. I take noise to be those little isolated fragments, usually single words, that are part of the longest common subsequence but which cannot be said to belong to a textual block. Here is where the humanist parts ways with the computer scientist or the mathematician. For these two, even a sequence of length zero counts as a sequence! Although there can be isolated tokens that could interest a scholar comparing two texts (rare words or proper names, for example), we are more often than not going to worry about two or more concatenated words, and we would certainly not call anything less a sequence. At least, I wouldn’t.

[Image: Juxta and noise]

Noise can be of two kinds: the noise that happens outside of the blocks of interest, and the noise within blocks of interest. In Juxta these can be seen as white fragments in a sea of green, and green fragments in a sea of white. These are very different creatures and need to be dealt with separately. Noise can lead to small errors if we run a straight iterative diff, eliminating every longest common subsequence in each iteration. The errors come from the probability that a word caught in a sequence belongs to a smaller intersecting common subsequence.
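The distinction between blocks and noise can be made concrete by splitting an LCS alignment into runs that are contiguous in both texts and setting aside runs shorter than a threshold. This is a hypothetical sketch, not the solution the dissertation develops. The contiguity rule, the function names and the `min_len=2` default (echoing the “two or more concatenated words” above) are all assumptions. The input is the alignment of ABCDEF between the two example strings from the String matching 101 paragraph, written out by hand as (position in first, position in second) pairs.

```python
def split_into_blocks(pairs):
    """Group aligned (i, j) pairs into runs that are contiguous in both texts."""
    blocks = []
    for i, j in pairs:
        # Extend the current block only if both positions advance by exactly one.
        if blocks and blocks[-1][-1] == (i - 1, j - 1):
            blocks[-1].append((i, j))
        else:
            blocks.append([(i, j)])
    return blocks


def separate_noise(pairs, min_len=2):
    """Split blocks into those long enough to count as sequences and leftover noise."""
    blocks = split_into_blocks(pairs)
    kept = [block for block in blocks if len(block) >= min_len]
    noise = [block for block in blocks if len(block) < min_len]
    return kept, noise


first = "ABCXDEYFZ"
# ABCDEF aligned against "ABMCDXYZEFN": (position in first, position in second).
alignment = [(0, 0), (1, 1), (2, 3), (4, 4), (5, 8), (7, 9)]
kept, noise = separate_noise(alignment)
print(["".join(first[i] for i, _ in block) for block in kept])   # ['AB']
print(["".join(first[i] for i, _ in block) for block in noise])  # ['C', 'D', 'E', 'F']
```

Only AB survives as a block. C, D, E and F belong to the longest common subsequence but sit isolated, roughly the white-fragments-in-a-sea-of-green case. Note that D and E are adjacent in the first string but not in the second, so this rule treats them as separate fragments. What to do with the set-aside tokens (for instance, returning them to the pool for a later run) is where the real design choices lie.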

To understand this properly, imagine we compare the results of the first diff run to the results of a human being who only matches blocks of text that are clearly matches. The human’s results would not be exactly the longest common subsequence, but they would definitely be more useful. Since we are interested in blocks, letting the computer net everything would likely lead to the accidental disintegration of smaller blocks of interest.

Just as I learned some odd lessons about the role of sequence in comparison sets by studying the longest common subsequence problem, I also found some unexpected lessons about textuality from trying to solve the noise problem. If you’re interested in my solution, I invite you to read my dissertation when it comes out. In the meantime, I encourage the textual scholars who are reading this to try to solve these problems on their own, to engage with the procedures that make our machines tick, and to do it without taking off their humanities hats.

If we don’t learn how to think with our machines, what choice will they have but to think for us?

Derri(co)da

[A slightly modified version of this post was originally posted as a code-critique for the Critical Code Studies Working Group 2012]

Language: English and French.

First of all, I want to thank the organizers for inviting me and for inspiring such an engaging debate.
I apologize for my late entry. Except for a premature comment in week 2, I have just been observing and absorbing, never sure when the right time to join the dance was. My hesitation comes as much from a noob-complex as from the nature of my intervention. In brief, I want to explore the ways in which critical discourse in general, and literary criticism in particular, are already procedural, and what it would mean to write code to express and critique natural language discourse. The can of worms I feel I am opening has been opened before in many different contexts, going back, in my estimation, as far as Aristotle. We could justify my intervention by claiming that any code we generate to express or critique natural language discourse can itself be critiqued back from a CCS point of view. The process looks something like this:

Human Discourse –> Analogical Code –> Code Critique

The example texts that I want to use themselves generate a further mirroring that might throw this half-blind enterprise into the proverbial mise-en-abyme. We play at the edge of the cliff at our own pleasure! The example texts have been collected under the title Ulysse gramophone/Deux mots pour Joyce by Jacques Derrida and published by Éditions Galilée in 1987. If I am not mistaken, the texts are available in translation in Derek Attridge’s compilation Acts of Literature. As far as I know, they are the first sustained attack by Derrida on what was then (1980s) called humanities computing, though that attack was an extension of his life-long agonism with formal logic. They also mark the first time Derrida engages with Joyce directly. In these studies, both originally delivered as talks, Derrida makes several moves that make him an irresistible target for my meditations on discourse and procedure.

Let me start by making the most dangerous move and offer a brief <ul> of Derrida’s relevant points for those who are not familiar with these texts:
  • Derrida claims that Joyce is a “logiciel” (~software), a “joyciciel” (22-23) that reduces our computers to “un jouet d’enfant préhistorique” (a prehistoric child’s toy). Granted, he made his claims before the advent of the internet, but his comments were mostly directed at the logic part of logiciel, so … 
  • According to Derrida, the main power of the joyciciel lies in its ability to predict the scholarly moves of generations of Joyce scholars to come.
  • Derrida claims that we cannot exhaust the identities held in potentia by the portmanteaus and puns in Joyce, nor can we reduce any word deployed by Joyce to an identity in the first place. To prove his point he uses the phrase “He War” from Finnegans Wake (FW) in Deux mots pour Joyce (DMPJ) and the word “Yes” from Ulysses (U) in Ulysse gramophone (UG). Notice, for example, how “He War” could point to many different languages to generate “He was,” “He was war,” “He who was is war,” plus it suggests many homophones: “hear,” “ar,” “ear,” usw. In the case of “Yes,” Derrida points us to the many different contexts, as he is wont to do, to show how unstable that little word can be. My favorite is the “Oui?” that the French use when they answer the phone, where we would say “Hello,” or, as Derrida would have it, “Yes, I am here.” 
  • Finally, Derrida claims that the whole of the academic enterprise is itself a “computer de toute la mémoire” (a computer of all memory), whose main goal has been to “programmer pendant des siècles la totalité des recherches dans le champ onto-logico-encyclopédique — tout en commémorant sa propre signature” (to program for centuries the totality of research in the onto-logico-encyclopedic field — all the while commemorating its own signature) (97). In this regard, he contends that experts are “pre-programmed” by their research questions, especially by the limits we impose on what counts as a valid intervention. Funny that he reduces others to procedural approaches while sparing himself and Joyce!
As you can see, this is begging to be addressed. I feel like I could write pages of critical prose in response, but I thought that a more appropriate response would be in the form of what I call ‘useless’ code, one that exploits the “extra-functional significance” that many of you wish to derive from perfectly useful code. This whim is both a goad to push me to deepen my code ‘competency,’ as @samplereality would have it, and the cheekiest revenge on such a Gargantuan critique of computational methods. Consider this also my call for a more able proceduralist to help me answer Derrida’s (and apparently Joyce’s) challenge.

Here is one of the tentative avenues by which I think we can approach this hydra, brought to you in an appropriately unnecessary <ol>: 

  1. We can attempt to code Derrida’s own scholarly methods:
    He begins his study of the word “yes” in Ulysses by counting 222 occurrences… by hand (74). He immediately follows with a playful footnote where he quotes another scholar citing 354 occurrences. Counting words is trivial, and Derrida only does so to complicate it immediately by pointing out that the other scholar also noted that the Irish ‘ay’ should be counted as well. Derrida then spends countless pages showing how a) ‘yes’ can be said without saying it and by other means (including the word ‘no’!); and b) saying ‘yes’ itself says many different things. The point is, of course, that we can never exhaust the possibilities computationally. It is here that we agree with Derrida, but rewrite his ‘proof’ using useless code, one rhetorical move at a time. Take for example the following line from Derrida: “Yes ne peut donc être, dans Ulysse, qu’une marque à la fois parlée et écrite, vocalisée comme graphème et écrite comme phonème, oui, en un mot gramophoné” (Yes has no other choice but to be, in Ulysses, both spoken and written, vocalized as grapheme and written as phoneme, yes, in a word, gramophoned) (75). Well, here’s a (naive) tiny code-critique:

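    Required: CMUdict (the CMU Pronouncing Dictionary) and a plain-text edition of Ulysses
    Language: Ruby
    [ruby]# Both file names are placeholders: point them at your own copies.
    class Word
      def initialize(word, edition: "ulysses.txt", dictionary: "cmudict.dict")
        @word = word.downcase
        @edition = edition
        @dictionary = dictionary
      end

      # Tokenize the words in your fave edition of Ulysses.
      def tokenize_written
        File.read(@edition).downcase.scan(/[a-z']+/)
      end

      # Load the CMUdict (or equiv.) as word => [phonemic transliterations].
      # Read as Latin-1 so that a stray non-ASCII byte cannot break parsing;
      # comment lines start with ";;;", variants carry a "(1)"-style suffix.
      def tokenize_spoken
        File.foreach(@dictionary, encoding: "ISO-8859-1").each_with_object({}) do |line, dict|
          entry, *phonemes = line.split
          next if entry.nil? || entry.start_with?(";;;")
          key = entry.downcase.sub(/\(\d+\)\z/, "").encode("UTF-8")
          (dict[key] ||= []) << phonemes.join(" ")
        end
      end

      # Naive match: the word is both written in the novel and has at least
      # one phonemic transliteration; it is, in a word, gramophoned.
      def find_match
        tokenize_written.include?(@word) && tokenize_spoken.key?(@word)
      end
    end

    derrida_yes = Word.new("yes")
    if derrida_yes.find_match
      puts "Derrida was right after all!"
    else
      puts "Bunk!"
    end[/ruby]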
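Derrida's hand count (and the 354 of his footnote) is the one step in this whole argument that a machine really does trivially. A minimal sketch, assuming a plain-text edition of the novel is at hand; `count_forms` is a hypothetical helper, and the sample string is only the closing words of Ulysses standing in for the whole text. The totals you get will depend on the edition and on what you decide counts as a word, and no count touches the ‘yes’ that is said without being written.

```ruby
# Count case-insensitive occurrences of each form among a text's words.
def count_forms(text, forms)
  tokens = text.downcase.scan(/[a-z]+/)
  forms.map { |form| [form, tokens.count(form.downcase)] }.to_h
end

# The closing words of Ulysses as a stand-in for the whole novel.
sample = "and yes I said yes I will Yes."
count_forms(sample, %w[yes ay]).each { |form, n| puts "#{form}: #{n}" }
# prints:
# yes: 3
# ay: 0
```

For the full novel, pass `File.read` with the path to your own copy instead of the sample string. Matching whole tokens keeps "yesterday" out of the count for "yes".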


I close by inviting you to think of other ways in which we can use code to re-express critical discourse and in which procedural thinking can be used as an analogy for specific rhetorical/scholarly gestures. Could it be useful, say, to rewrite the history of criticism by other means? Am I wrong, or just wrong-headed?

[...]
Category: Digital Humanities

other ways of doing it

Colleagues in North America have a lot to learn about the possibilities of the digital world from my colleagues down in the global south. Sometimes, when we feel we know a tool well, someone comes along and uses it in a strange and exciting new way. I have experienced this in my own work, where I have discovered some uses off the beaten path for the Juxta collation tool. Thinking about the different global approaches to new media, I was struck in particular by the way Dominican intellectuals and artists used Facebook as a full-blown publishing platform. I’ve been told that this might be a Caribbean-wide phenomenon. Sadly, Facebook limits you to a bubble of ‘friends’ that makes it difficult for me to survey too far afield. Of course, I am also limited because my approach to ‘friending’ is radically different from the one used by those publishing work on Facebook. While I have to justify each acceptance with some personal narrative (which privileges the random acquaintance at the coffee shop over those interested in my work), my counterparts amass followings based on interest.

Virgin Islands, 1941

There are two kinds of publishing practices I can distinguish from my limited POV. One is exemplified by the work of one of the Dominican Republic’s foremost living thinkers and poets, and one of my first mentors: Armando Almanzar Botello. He has disassembled his book of poetry, Cazadores de Agua, and re-mediated each of the poems on Facebook and two blogs (I still don’t know why he has two blogs). He has re-published each of the poems in its entirety several times on Facebook using the note feature, effectively recycling the poems every so often. I have never seen a similar publishing rhythm, let alone under such constraints. Paradoxically, even as he limits his audience to his ‘friends,’ he has never reached a larger audience (as far as I can tell). The poems look clunky on Facebook, to say the least, but they are read and commented on by a large group of interested readers. In that sense, the community he has built around his work using Facebook is not that different from the small communities DHers in the north build around their public work in more open venues.

The same can be said of the second kind of publisher, exemplified by senior Dominican historian Frank Peña. Dr. Peña publishes highly charged polemics on his page, ranging from 1000 to 2000 words, also using the note feature. The pieces are well documented and written in unimpeachable Spanish. He usually draws 50+ comments on these pieces, even several days after they are published. In contrast to Armando, Dr. Peña is publishing original material of the sort one would associate with a political blogger. He is not the only one using Facebook as a blog. If I had to venture a guess, I would say the practice is born out of Facebook’s ease of use compared with even the most user-friendly blogging platforms. Or we could say that these intellectuals have found a vital way of building community, one that can reach that ever-elusive anonymous public of the public humanities. Although I remain critical of locking the content within a bubble of followers and the unsearchable abyss of the Social Network, I do have to admit that the intellectual communities bubbling up around these writers are vibrant, relevant and anything but ephemeral.

Praxis, MLA 2012 and timeliness

[Cross-posted at the Scholars' Lab blog]

I’m finally settling back into my C’ville routine. My last stop this winter break was the MLA convention in Seattle. Like many of my colleagues, I also felt that “the MLA’s heart (like a post-holiday Grinch) grew at least three sizes over the four days of the 2012 conference.” While last year’s convention echoed, with some anxiety, a prominent informer’s assessment that DH was “the next big thing,” this year felt more like “Hey, I like that. How do I do it?” This was an especially good year for those in the business of rethinking the future of graduate methods training (ahem, ahem) and of graduate futures in general. Needless to say, I felt really great about being part of the first cohort of Praxis.

Saturday evening I had a chance to catch up with one of my early undergraduate mentors. He had questions. He wanted to know what I knew about the DH world. I’m sure half of his curiosity came out of an earnest desire to hear the tale of my travels. The other half was a shrewd (and responsible) move to build a vocabulary for the conversations his department will inevitably have this year with the dean, other departments, the library, etc.: Can an isolated DHer work well with limited resources? Do you need a center? How do you get graduate students involved? Our conversation went on for a good three hours, and it was very rewarding to offer a candid assessment of the field from where I’m standing.

I also realized that where I’m standing is what in battle we would call the high ground. I don’t mean the privilege of hobnobbing with the enormous DH talent we have on grounds. Nothing of the sort. Although projects are a whole different affair, you could develop decent DH skills and ideas even if you were connecting from Pie Town. I mean the privilege of seeing graduate methods transformation first-hand. I agree with Brooke that there is a continuum that links us to analog models in the department (at least at UVa). But the continuum does eventually lead to new ground.

What I’ve seen, of course, has been well recorded by all the Praxis bloggers. If this is your first time hearing about Praxis and you are interested in the fresh air blowing our way, I encourage you to read more…

towards a geo-textual humanities

[Reposted from the Scholars' Lab blog]

Maps are texts, and texts are maps.

At the beginning of the movie The English Patient, as Márta Sebestyén’s “Szerelem, szerelem” overcomes our senses, a paintbrush traces the figure of human swimmers on a yellowing page. The black ink soon gives way to a skin-colored desert landscape sifting beneath our aerial view, evoking hands moving over human curves. Skin, page, territory, all united by the theme of lost love. I can’t think of a better image to describe how we are wedded to the n-dimensions of the textual condition.

As we get ready to think about design, I want to outline a few of the ways we can abstract the material reality of print to a totality of 1s and 0s. In my own work I have been trying to create a digital edition of Aimé Césaire’s Et les chiens se taisaient that is both pleasant to read and allows for some algorithmic manipulation of the textual territory. My goals lead me to seek the chimera of HTML forgeries as opposed to the classic images with texts beneath them. The experience taught me a great deal about the process of remediation.

There are many ways we can remap texts online. We can have a simple image. We can have text behind that image, like your typical PDF. We can map out the position of text and white space on that image by overlaying a basic Cartesian x and y grid on top (or is it below?). We can name areas on that grid like land-grabbers use contracts to justify their fences. We can query the areas, we can query the points, we can query the text. We can overlap those areas, like the map of Aztlán tensely overlaps with the map of the United States, like our Prism diffracts difference. We can create replicas from scratch using HTML, using Canvas, and trade grain for the possibility of playful deformation and a digital audience born into cool media. We can standardize our geo-textual mark-up, make a TEI out of HTML/CSS, opening the door for large-scale analysis of page design in book history. Heck, we can just put our UTF-8 txt files out there and sit back and wait for our computer overlords to tell us that the eternal present of spotless text was all we ever needed. Lord knows, most literary scholars haven’t done better than that. (I will rebel against that last possibility the way I rebel against propaganda, the way I rebel against the early Wittgenstein, who wanted to get rid of love because we couldn’t fit it in just one map.)
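A minimal sketch of the grid idea, assuming rectangular zones measured in pixels from the top-left corner of a page image. The `Zone` struct, the coordinates and the bracketed placeholder texts are all invented for illustration (only the title is Césaire's). It runs the three kinds of query listed above: points, areas (including overlapping ones), and text.

```ruby
# Hypothetical page model: named rectangular zones on a page image, each with
# its transcribed text. Coordinates are pixels from the top-left corner.
Zone = Struct.new(:name, :x1, :y1, :x2, :y2, :text) do
  # Query a point: does (x, y) fall inside this zone? Edges count as inside.
  def contains?(x, y)
    x.between?(x1, x2) && y.between?(y1, y2)
  end

  # Query an area: do two rectangles overlap?
  def overlaps?(other)
    x1 <= other.x2 && other.x1 <= x2 && y1 <= other.y2 && other.y1 <= y2
  end
end

# Invented coordinates and placeholder texts for a single page.
page = [
  Zone.new("title",  100,  40, 500, 100, "Et les chiens se taisaient"),
  Zone.new("body",    80, 140, 520, 700, "[transcribed body text]"),
  Zone.new("margin", 480, 300, 580, 420, "[handwritten correction]")
]

# Which named areas cover a given point?
p page.select { |zone| zone.contains?(150, 80) }.map(&:name)
# → ["title"]

# Which areas overlap, like one map laid over another?
p page.combination(2).select { |a, b| a.overlaps?(b) }.map { |a, b| "#{a.name}/#{b.name}" }
# → ["body/margin"]

# Which areas hold a given word?
p page.select { |zone| zone.text.include?("chiens") }.map(&:name)
# → ["title"]
```

Rectangles keep each query down to a handful of comparisons. In an actual edition these coordinates would more likely live in markup, for instance TEI's facsimile `<zone>` element with its `ulx`, `uly`, `lrx` and `lry` attributes, rather than in code.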

Let us move towards a geo-textual humanities, conscious that there are swimmers in the desert of the page.

(to be continued…)

Mimesis and Computers

[Reposted from the Scholars' Lab blog]

“Computers are inherently dumb.” I hear this all the time, even from folks in computer science. I like to think of them as marionettes.

After Wagner called for a Gesamtkunstwerk (total work of art), many European artists and thinkers reacted strongly to it (Nietzsche being the most famous case). This reaction eventually led to a modernist distrust of theater in general, and of human actors in particular. Think for example of Bertolt Brecht’s Verfremdungseffekt (estrangement effect). Somewhere in between Wagner and Brecht, the English artist Edward Gordon Craig suggested that human actors should be replaced by marionettes. As you can imagine, this did not go over well with the actors’ guild.

I hear echoes of those debates and cultural shifts in our moment, when computers are starting to resemble us more and more. Computers don’t always replace us in the way that machines replaced farmers or smiths, although there are still parallels between our anxieties and those of the industrial and agricultural revolutions. And just as machines then generated monstrous forms of mechanized human labor, computers do the same (if you don’t believe me, ask any of my students for Project Tango). However, I see another anxiety, which is not necessarily that of machines replacing familiar mechanical tasks with unforeseen ones. I’m talking about our fear of marionettes. More specifically, the fear that we will mistake the marionettes for human beings.

Quixote fights the puppets

The true marionette is always controlled by a human, and so are computers… ultimately. We ventriloquise through them, and they only talk back to us according to our ridiculously precise instructions. I’m not talking about Bina48. She’s kind of creepy. I’m talking about the ways in which a Google search acts like an operator at the end of a 411 call, or the way that Netflix suggests what we might like. There are two approaches to figuring out what counts as a title in a large repository: we can tag it, or we can write an algorithm that does it for us. Don’t worry, we’re not completely there yet. At some point that metadata might pass the Turing test. When it does, by definition, users will think a human did the work… but wait a second.

Prism is less interested in how humans might be fooled by the marionettes than in how we can fool the marionettes into behaving like us. Sometimes that line is blurred. The ‘text mining’ component, as I have understood it, seems like the bastard child of natural language processing and web crawling. The goal here is not to count words (although that is a time-honored human activity), but to abstract semantic relationships that can be used to query large data sets. When we Google something, we are doing something akin to that, except we never think Google is run by a million efficient munchkins. When we get results for our perhaps-to-be Prism queries, we use those results in public at our own risk. That there will always be Quixotes in the audience… well…

Take-home tweet: Even if we replace actors with marionettes, the plot stays the same.