Pre-Foundations of Digital Methods
Notes from the DMI Summerschool Certificate Program Day 1 by Erik Borra, Anne Helmond, Sabine Niederer, Michael Stevenson and Esther Weltevrede.
An introductory lecture to digital methods by Prof. Dr. Richard Rogers.
Internet ‘qualities’ that pre-configure research (historical issues)
- Internet researcher is internet user: Every internet researcher is also an internet user (their own informant) and lacks naiveté. Elsewhere it is rare that researchers are also users of what they research.
- Internet as virtual: Research into the ‘virtual’ and ‘virtual research environments’; virtual in three senses at the same time. Richard will try to put the virtual to rest once and for all this morning.
- Internet as lacking reliability: (quality of information online)
- Industry methods: Methods of the medium, those that are built in. Industry (audience marketing) ‘methods’ are often traditionally dominant (hit counts, trends). Joseph Turow: direct marketing (niche, not mass). How methods are done online.
1. Everyone an internet researcher
Internet users’ capacities as re/searchers. “Much internet research is itself motivated by scholars discovering email, usenet, and so on. There is nothing wrong with that. This is how it should be. However, we should be cautious about overlapping method and experience.” Steve Jones (1999), “Studying the Net: Intricacies and Issues”, in Doing Internet Research.
First pre-foundation: Where does your own experience end and research begin? Where does your own experience end and method begin?
Other side: Does internet research require you to be geeky, tech savvy?
2. The virtual
Second pre-foundation: Notion of the virtual. Cyberspace: a Google Images query shows that the associations are with a mysterious network spatiality. The images show routes through network spatialities. The study of cyberspace had to do with these particular routes through network space.
The legacy of cyberspace. Terminological persistence of cyberspace (in security studies discourse).
“the National Strategy to Secure Cyberspace may be read as yet more evidence of cyberspace’s demise – if not through dismissal, then by depriving cyberspace of its revolutionary connotations to the point that it is no longer recognizable. Where the EFF and others argued for a ‘stateless’ cyberspace, the U.S. government now envisions ‘A Nation in Cyberspace.’” (Stevenson, 2009)
Virtual image query: cybersex, virtual worlds and virtual reality as distinctive, head-mounted displays (HMD).

The continued use of “the virtual”: Apologists, or field-builders?
- That which is not actually existing (virtual reality)
- Between the actual (meaning ‘now’, as in French, Dutch and German; in English it does not mean ‘now’) and the real (disembodied, time-shifted or space-shifted)
- Existing in effect (or in effect only) (real-world effects, consequences of virtual actions having real effect)
- Simulated (“artificial”) (virtual world)
- Representation (not the real thing) (as another way to oppose the real: is an issue crawler map a representation of the network or is it the network itself?)
- Counterfeit (fake) (hocus pocus)
- Related to online research (virtual ethnography; virtual methods; virtual research environment).
Why is it that the ‘virtual’ is related to online research? What is the link? What do you mean by ‘virtual’? Especially in science.
3 Virtuals or 3 ways of seeing the web. Periodizing the web’s study.
- Web as Cyberspace (1994-2000) Virtual as distinct from the real. Cyberstudies. Ontological distinction.
- Web as Virtual Society? (2000-2007) Virtual is part of the real. Virtual methods
- Web as Virtual? Society (2007-) ‘Virtual’ as indication of the real. Digital methods
Use online data and ground claims offline, or make ‘online grounded’ claims? In Julian Dibbell’s Play Money, virtual worlds have been domesticated and have become ‘normal.’
3. Historical Problem with Web Data
Cyberspace and the virtual: why the internet has a serious reputation problem. Web data’s incapacity to stand alone.
- One issue is the messiness of Web data and the need for data cleansing heuristics. The uncontrolled Web creates numerous problems in the interpretation of results.
- Indeed a skeptical researcher could claim the obstacles are so great that all Web analyses lack value.
- One response to this is to demonstrate that Web data correlate significantly with some non-Web data in order to prove that the Web data are not wholly random.
Thelwall et al. Webometrics. Annual Review of Information Science and Technology (2005) vol. 39 (1) pp. 81-135
DMOZ and the Yahoo Directory relied on human editors, but they have been overtaken by the algorithm. Because the Web is uncontrolled it is messy, and it needs to be cleaned. The Internet needs a cleaning company.
It is a medium of self-publication.
What are the conditions of proof? How do you validate the data? Correlate it with non-Web data: Google Flu Trends, for example, is compared with data from the CDC, the clearing house for medical data.
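The validation step described here, checking whether a Web-derived series tracks a non-Web baseline, can be sketched as a simple correlation test. This is a minimal illustration only: the function name `pearson` and both series are assumptions made up for this example, not real query or CDC figures, and Pearson correlation is just one possible measure.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)

# Invented weekly values, for illustration only (not real data).
web_signal = [12, 15, 21, 30, 28, 19, 14]                # e.g. query volume
offline_baseline = [1.1, 1.3, 1.9, 2.8, 2.7, 1.8, 1.2]   # e.g. reported cases

r = pearson(web_signal, offline_baseline)
print(f"Pearson r = {r:.3f}")
```

A high coefficient only suggests that the Web data are not wholly random with respect to the baseline. It treats the offline data as the ground, which is exactly the ordering that the later question "Is the online the baseline?" puts in doubt.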
4. Virtual (digitized) Methods
Digitized methods
- Online surveys – Finding the list to mail them to. Problematic question: What is the response rate? You don’t know. Is that important? Virtual methods weaken the status of web data: instead of results you get indicators. These methods have been moved online without an understanding of the online.
- Online interviews – Become written exchanges by email
- (Online) observation – Watching users over the shoulder
- Online samples – Become difficult. Also the issue of exhaustiveness
- Online investigative reporting – Fact-checking by phone/surfing?
The order of checking is challenged: “Actually, I think I have enough keywords now to consult Google.”
Clean data has to do with the expanse and scope of the data. Has the cleaning company come through?
Differences in Web data: Ones that are complete and ones that are from a single space. Noshir Contractor wouldn’t touch any data unless it was complete / exhaustive, otherwise it was too messy.
Samples are a serious issue. How do you know whether it is good/complete enough? What is a reasonable way to collect a list of Dutch/Argentinian blogs? How to do that? It’s a sampling issue, and if it needs to be exhaustive, good luck! The internet resists exhaustiveness (by its very nature?)
Digital Methods
Distinction between methods that migrate to the medium and those ‘native’ to it.
- General Digital Methods Protocol – Follow the Medium
- Which objects are available? (Links, tags, threads, timestamp)
- How do dominant devices and platforms handle them?
- How to learn from and repurpose medium methods (Mash-up)
- Are findings grounded in the online? Is the online the baseline?
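As one illustration of treating the link as an available object, the sketch below collects hyperlinks from a page and counts outlinks per host. It is an assumption-laden toy, not a tool from the lecture: the class `LinkCollector`, the function `outlink_hosts`, and the sample HTML with its example.com/.org/.net addresses are all hypothetical, and only Python standard-library calls are used.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))

def outlink_hosts(html, base_url):
    """Count links per external host, ignoring links back to the page's own host."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    hosts = (urlparse(link).netloc for link in collector.links)
    return Counter(h for h in hosts if h and h != base_host)

# Hypothetical page content, for illustration only.
sample_html = """
<p><a href="/about">About</a>
<a href="http://example.org/report">Report</a>
<a href="http://example.org/data">Data</a>
<a href="http://example.net/">Partner</a></p>
"""
counts = outlink_hosts(sample_html, "http://example.com/index.html")
print(counts.most_common())
```

Counting which hosts a page links to is a crude version of what link-based devices do at scale. The same pattern could be pointed at tags, threads or timestamps, depending on which objects the platform exposes.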
The online as a baseline example: What’s Cooking on Thanksgiving – Interactive Graphic – NYTimes.com
Digital Methods challenge: The extent to which the web hereby becomes a completely new set of serious data and a new site for very serious analysis.



