Process and Processing: Methods for Discovery and Representation in Data

This semester I am teaching a graduate seminar, Techniques in Information Visualization, and although my classes never enroll only cinema students, this one includes graduate students from five disciplines, at both the MA and PhD levels: English, Education, Journalism, Public Planning & Development, and Architecture. This diversity makes the class both extremely rich and challenging to plan and lead. In short, it helps all of us get outside of our comfort zones. The lack of a shared vocabulary, for instance, means jargon must be reexamined and either justified or abandoned.

So what does this have to do with data and its management? Well, there is no better topic for the type of defamiliarization inherent in a class like this than that of information visualization. As numerous posts in this carnival have pointed out, thinking differently about what constitutes a datum and how it leads to information is incredibly important. This is true in all fields. Humanities researchers frequently feel their work has little if any data, while many “hard” and social scientists feel that the inclusion of a survey mechanism adds a statistical element to a project, as though self-reporting is transparent and always accurate.

After deciding what the dataset will be, I find there are two main uses for information visualization: discovery and representation. The very act of visualizing complex datasets can be illuminating; it is often the only way to see connections and trends among results. But once these insights are gleaned, the visualization needs to be “cleaned up,” so to speak, in order to emphasize the insights to be conveyed. Sometimes this means excluding outliers and sometimes it means simplifying certain aspects, but it should always be a rhetorically savvy act, one with intentionality. Both activities, discovery and representation, are extremely useful in contemporary culture. As more of the world becomes data driven, we must interrogate the basis of any dataset, as well as its representation. To this end, I will briefly discuss a tool and a method for visualizing data that can defamiliarize it in the process.

The first is a tool, but it’s actually so much more: Processing. As the Processing site notes, it is “a programming language, development environment, and online community.” Processing lives at the intersection of math and the visual arts, rendering and displaying data dynamically. Perhaps one of the most prominent projects built with it is We Feel Fine, by Jonathan Harris and Sep Kamvar, which scours the internet every ten minutes for blog posts containing the phrases “I feel” and “I am feeling” and then displays this data dynamically, with visually stunning results. Processing has been used for rapid prototyping as well as full-scale projects, and there are numerous examples on its site. It is free and open source, with excellent documentation and some great tutorials. There is also Processing.js, a JavaScript port that brings Processing sketches to the web through HTML5. We have been slowly rolling it out in our curriculum (the Media Arts + Practice Division of the School of Cinematic Arts), and it’s been exciting to play with the possibilities. At the very least, Processing can foster algorithmic literacy, and it allows one to dive in with very little coding. In so doing, it can expand the possibilities for working with both data and code.
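
To make the idea of scanning text for these phrases a bit more concrete, here is a minimal sketch in Python. It is not We Feel Fine's code, and it is not Processing (whose Java-based sketches would be the natural home for the visual side); it only illustrates the first step, finding the word that follows “I feel” or “I am feeling” in a batch of post texts and tallying it. The `tally_feelings` function and the sample posts are hypothetical, and fetching posts from the web is left out entirely.

```python
import re
from collections import Counter

# Hypothetical illustration, not We Feel Fine's implementation: match the two
# phrases the project looks for and capture the word that immediately follows.
FEELING_PATTERN = re.compile(r"\bI (?:feel|am feeling)\s+(\w+)", re.IGNORECASE)


def tally_feelings(posts):
    """Count the words that follow "I feel" or "I am feeling" across posts."""
    counts = Counter()
    for post in posts:
        for match in FEELING_PATTERN.finditer(post):
            counts[match.group(1).lower()] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample posts, standing in for text gathered from blogs.
    sample = [
        "Today I feel happy.",
        "I am feeling tired, and honestly I feel old.",
        "i feel happy again",
    ]
    print(tally_feelings(sample).most_common())
```

A tally like this is only the raw material; the visual mapping of each feeling to color, size, or motion is where a tool like Processing takes over.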

Similarly, the method I turn to can also function as a bridge between the sciences and the humanities. My current research, the Large Scale Video Analytics (LSVA) project, brings the power of supercomputing to bear on massive video databases (this includes digitized films as well as natively digital video). Not only are most image-based repositories incompletely tagged, but there is also a loss inherent in any transfer from one semiotic register (image) to another (word). As such, my team is attempting to enhance machine-read image queries by deploying several of them in a single search across thousands of videos, while also allowing crowd-sourced tagging.

One of our team members, Dave Bock, is a visualization expert at the NCSA (National Center for Supercomputing Applications), and he normally works with chemists and physicists, visualizing their data. In talking about the LSVA, I often use the mantra “Video is the big data issue of our time,” though it has remained more of an abstraction to me: a concept that seemed as though it would help the supercomputing people understand the importance of this work. But Dave began treating the moving images as actual data and visualizing them with the same methods he employs for scientific data.

The early results are amazing in that they have helped us to clearly see similarities and differences in things like color timing, shot angle, edit lengths, et cetera, across videos and across archives without the nagging impulse to focus exclusively on the content of the video. The difficulty with filmic media is that it always seems to be capturing something real, some aspect of the material world. It seems objective, a mechanical rendering of the world. Its constructed nature is often difficult to stay aware of, and yet to film something is to frame it, and to frame is to exclude all else. Moreover, editing footage is also a rhetorically sophisticated act, one that is not ideologically neutral. But treating footage like data de-emphasizes this level of the film, and we can begin to speculate about the cumulative impact of these screens that bombard us daily. Also, by establishing a “barcode” of sorts for one film, we can compare it across thousands in a way that viewing simply would not allow: when there is more video produced each day than a human can view in a lifetime, we simply must find different research methodologies.
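
The “barcode” idea can be illustrated with a common technique often called a movie barcode: reduce each frame to its average color and line the results up in time order. The Python sketch below is a minimal illustration under stated assumptions, not the LSVA pipeline: frames are assumed to be already decoded into lists of (r, g, b) pixel tuples, and `frame_average`, `barcode`, and `barcode_distance` are hypothetical names. Averaging color only speaks to something like color timing; measuring edit lengths or shot angle would require other analyses.

```python
from math import dist

# Hypothetical sketch: each frame is assumed to be a pre-decoded list of
# (r, g, b) pixel tuples; decoding actual video files is out of scope here.


def frame_average(frame):
    """Average RGB color of one frame."""
    n = len(frame)
    return tuple(sum(pixel[c] for pixel in frame) / n for c in range(3))


def barcode(frames, width):
    """Collapse a sequence of frames into `width` averaged color columns."""
    if not 1 <= width <= len(frames):
        raise ValueError("width must be between 1 and the number of frames")
    averages = [frame_average(f) for f in frames]
    columns = []
    for i in range(width):
        # Split the frames into `width` contiguous, nearly equal chunks.
        start = i * len(averages) // width
        end = (i + 1) * len(averages) // width
        chunk = averages[start:end]
        columns.append(
            tuple(sum(color[c] for color in chunk) / len(chunk) for c in range(3))
        )
    return columns


def barcode_distance(a, b):
    """Mean Euclidean RGB distance between two barcodes of equal width."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same width")
    return sum(dist(x, y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    red = [(255, 0, 0)] * 4
    blue = [(0, 0, 255)] * 4
    film_a = [red, red, blue, blue]
    film_b = [red, blue, blue, blue]
    print(barcode_distance(barcode(film_a, 2), barcode(film_b, 2)))
```

Resampling every film to the same number of columns is the design choice that makes comparison possible: a ninety-minute feature and a thirty-second clip both become barcodes of equal width, so one film can be measured against thousands of others.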

As with Processing, the benefits of this approach are neither immediately nor straightforwardly apparent, though they are many. To visualize image-based media and to spatialize time-based media is a bit trippy, but I am confident it will be the source of new insights and help us form more sophisticated research questions about video data.

Author

Virginia Kuhn

Associate Professor, School of Cinematic Arts, University of Southern California.