Innovative use of crowdsourcing technology presents novel prospects for research to interact with much larger audiences, and much more effectively than ever before
Posted: August 25, 2011 Filed under: crowdsourcing | Tags: crowdsourcing, galaxy zoo, impact (Originally published in LSE Impact Blog, 25 August 2011)
In the push to make clear and unquestionable links between research and its effects on society, academics with seemingly esoteric projects might struggle to make their work accessible and interesting to the public. But projects centring on Scots language dictionaries, tattered Greek papyri and Bentham’s philosophy of utilitarianism have all made the jump through innovative use of crowdsourcing. A growing number of projects, such as Ancient Lives, Transcribe Bentham, Old Weather and Scots Words and Places, are making sophisticated use of the web to actively engage the general public as contributors to their research.
Old Weather, for example, invites the general public to transcribe naval logs, thus providing crucial meteorological data for climate scientists, as well as opening up sources for the history of the British navy. Transcribe Bentham works with a range of groups, in particular schools, to decipher the numerous papers of Jeremy Bentham. For such projects, securing user contributions is about much more than impact. They provide a venue for communities outside academia to play a meaningful role within university research, providing insight and knowledge, saving time, and facilitating the route towards high-quality outputs.
It is worth remembering that crowdsourcing predates the digital era; the Oxford English Dictionary was initially built on contributions from volunteers and there is a long tradition of active contributions from the public within many fields of the social sciences.
But the development of crowdsourcing on the internet has rapidly accelerated the sophistication of its methodologies. Recent projects have been particularly adept at using social media, developing refined mechanisms for ensuring that contributions are quality assured, working with large data sets, and creating interfaces that interact in a way that reduces complexity and confusion.
These developments mean that there are suddenly novel prospects for future projects to interact with much larger audiences than previously, and to do so in a much more effective manner.
Of course, there are plenty of research projects that do not lend themselves to this kind of public engagement whatsoever. That’s fine. But for other projects, even those that could seem recondite in nature, there are opportunities to explore.
So as crowdsourcing advances, a vital factor will be the sensitivity with which the needs and motivations of those taking part are understood. If the research community engages the public in a utilitarian sense, as just cogs in a larger research wheel, then the whole methodology will become imperilled. Understanding what moves an inhabitant of a specific community, a child in the schoolroom, or the ‘silver surfer’ with a new internet connection, and making sure their input is suitably recognised is crucial.
Engagement, as Chris Batt pointed out in his report on the topic, must be a two-way conversation: “knowledge co-creation and exchange rather than simply knowledge transfer: a dialogue which enriches knowledge for mutual benefit.”
The task of the University of Oxford’s RunCoCo team was to develop guidelines for projects wishing to develop digitised collections by asking the public to upload their own content or to add information to existing resources, as happened with the highly successful Great War Archive. Equally, the Citizen Science Alliance is working according to firm principles on how to interact with its users, as articulated in Arfon Smith’s podcast on the success of the Galaxy Zoo project. Indeed, the Alliance is now looking for other researchers with whom to work and is requesting proposals for ideas.
If crowdsourcing is to continue to be embedded in research, then it is the principles and thinking drawn from RunCoCo or the Citizen Science Alliance that need to be adopted, adapted and implemented. There is a wealth of UK research that can be enhanced by the involvement of an engaged, knowledgeable and passionate UK public.
Crowdsourcing and Variant Digital Editions – some troubles ahead
Posted: July 18, 2011 Filed under: crowdsourcing | Tags: crowdsourcing (This post was first published on the JISC Digitisation blog, July 2011)
Projects like UCL’s Transcribe Bentham and New York Public Library’s What’s on the Menu? have done groundbreaking work in engaging the public to transcribe their manuscript collections.
Crowdsourcing allows rapid, and it seems high-quality, creation of transcribed data from original documents. Transcribe Bentham has so far created 1,330 transcribed versions, and only a handful have been rejected for a lack of quality. Previously, such scholarly transcription would have taken considerable time and effort, spanning many years.
With notable successes like these, crowdsourcing is now becoming more familiar as an academic tool. But for certain datasets, particularly ones of considerable academic importance, this could bring some problems, because crowdsourcing has the ability to create multiple editions.
For example, the much-lauded Early English Books Online (EEBO) and Eighteenth Century Collections Online (ECCO) are now beginning to appear on many different digital platforms.
ProQuest currently hold a licence that allows users to search over the entire EEBO corpus, while Gale-Cengage own the rights to ECCO.
Meanwhile, JISC Collections are planning to release a platform entitled JISC Historic Books, which makes licensed versions of EEBO and ECCO available to UK Higher Education users.
And finally, the Universities of Michigan and Oxford are heading the Text Creation Partnership (TCP), which is methodically working its way through releasing full-text versions of EEBO, ECCO and other resources. These versions are available online, and are also being harvested out to sites like 18th Century Connect.
So this gives us four entry points into ECCO – and it’s not inconceivable that there could be more in the future.
What’s more, there have been some initial discussions about introducing crowdsourcing techniques to some of these licensed versions, allowing permitted users to transcribe and interpret the original historical documents. But of course this crowdsourcing would happen on different platforms with different communities, who may interpret and transcribe the documents in different ways. This could lead to the tricky problem of different digital versions of the corpus. Rather than there being one EEBO, several EEBOs would exist.
But this is part of a larger problem. If there are multiple versions of the original content, then which one is the one you use? In fact it’s not only about the content. Which platform works quickest? Which gives the most ‘accurate’ search results? Which one provides enhanced tools for analysis? Which gives the best results for your particular area of research? Where do you send your students? Which one do you cite?
Most importantly, which one do you trust? And why?
In ‘traditional scholarship’, different editions of original documents would be published at, for example, 50-year intervals, and it would be part of the scholarly workflow to review and criticise such editions. The complexity and proliferation of digital resources radically changes this – not only are there more digital resources, but the knowledge and skills needed to critically analyse a resource are considerably widened.
At the moment, there are no immediate solutions for these challenges. But it’s clear that the potential of the Internet continues to fracture existing practices of scholarship. Despite the care, attention, and research intelligence that has gone into creating EEBO, ECCO and their various platforms, the potential for academics, funders and publishers to push forward and develop new digital ideas means that the notion of the Internet as a place where traditional scholarly practices can simply be repeated continues to disintegrate.
