As part of a survey by the UK’s Arts and Humanities Research Council, I was asked how I made use of Twitter. Here are the responses I gave.
1. Which social media services do you currently use for professional networking or discussing your research?
Twitter (a lot), Google Plus, LinkedIn (a little bit)
2. What do you see as the benefits of engaging with social media for you as a researcher?
I am not strictly a researcher, but am involved in a lot of projects related to digital humanities, digital libraries and information science. Twitter is great as it allows one to build networks and to learn what is going on elsewhere. The latter is really important – I have a much better idea of work being done around the globe in the field of digital humanities, and can garner that basic information much more quickly than from other sources (e.g. conferences, papers). Twitter is not a replacement for that latter type of scholarly comms though; it is a supplement.
3. Have you encountered any problems or barriers to using social media in relation to your research work?
You need to be aware of the limits. Twitter is good for starting or maintaining some social connections, but a lot more is needed if you want to have in-depth conversations. I also find that I share viewpoints with the people I am in contact with on Twitter, so less direct argument and critique happens (although in my Twitter stream there is plenty of critique of third parties that are not part of that group).
4. Do you find it easy to find and connect with other researchers in the arts and humanities fields?
Yes, very easy to connect with people in information science, digital humanities and libraries. But I think this group is well disposed to Twitter in the first place. I am also interested in connecting with art historians, but there are very few on Twitter. A community needs a critical mass of numbers to be worthwhile.
5. How do you usually find out about other current research projects in your field?
Nearly always via Twitter; but conferences and word of mouth can help provide much more illumination.
Abstract submitted to DH2014 by Alastair Dunning (The European Library) and Clemens Neudecker (KB National Library of the Netherlands)
Within the Digital Humanities, there is a long history of debate and discussion as to how texts are accurately represented in digital form. Arguments as to how texts are encoded in both a logical and semantic sense are a recurring feature of past DH conferences.
Yet the intense intellectual focus on the precise details of marking up small corpora or even individual texts has masked the fact that issues related to the representation of large corpora of digitised materials – books, manuscripts, newspapers, records etc. – have too often been ignored. Libraries, archives, museums and other collecting institutions have now been digitising corpora of material for many years, but with very few exceptions it is still quite rare for an entire run of primary sources to be digitised and made available online.
This means that there are gaps within the digital record. Yet it is unusual for online resources to actively demonstrate these gaps; resources may be advertised as a growing corpus, but when searching through or downloading a digital resource there is rarely any indication of what has not been digitised. This skews the scholar’s sense of the nature of the collection they are working with and erodes trust.
This problem is compounded by the assumption among end users that when they search a digital resource, they are actually searching over everything in the original archive. In most cases, this is far from being the case.
This long paper looks at this problem in the context of the Europeana Newspapers project (www.europeana-newspapers.eu), a three year, four million euro project, which is creating full-text for 10m pages of digitised newspapers from 12 libraries across Europe, and also developing an interface to allow for cross searching of over 18m newspaper pages. The final interface, available from the European Library in 2014 (http://www.theeuropeanlibrary.org/), will also provide keyword searching over the OCRd (Optical Character Recognition) text and allow users to compare different newspapers from around Europe published on the same day.
While it is an ambitious project, it is only a drop in the ocean of the overall number of digitised newspapers in Europe (a conservative calculation within the project put the number of digitised newspaper pages in European libraries at 130m). What appears on the final interface will only be a sample of what actually exists in European libraries.
Moreover, other issues – political, economic, legal and technical – mean that the quality and national distribution of newspapers in the project (and therefore represented in the final online interface) are unevenly balanced. For the resource to be trusted by the academic community, this lack of balance must be acknowledged.
In terms of the economic and legal issues, the project is integrating newspapers from 12 existing online newspaper libraries, each of which has a different business model. These different business models affect the final project interface. The National Library of Turkey and the British Library newspapers operate behind a paywall, for instance – therefore the final Europeana Newspapers site will not be able to directly show images from their collections.
Other libraries are wary of sharing full-resolution images, with the legitimate fear that the users will no longer visit their own national website. In such cases, only fragments of their newspaper images will appear in the central site. Legal issues are also pertinent; some libraries are unsure of the copyright status of some of their historic newspapers and therefore do not want to commit to allowing another entity to publish them.
In addition, there are several technical issues impeding uniform access to the resources. Nearly every digital newspaper collection today contains full-text derived from automatic processing with OCR software. But while some newspaper repositories grant access to that full-text, often it is hidden and exposed only as an index for searching: not available to the end user for online display or (programmatic) download, and sometimes not even for indexing by Google.
In other cases, full-text is made available, but not for the entirety of the collection, either due to IP issues or because the content holder took a deliberate decision not to show the full-text to the user, often because of the error rate in the OCRd text. Regularly there is insufficient information provided about the OCR error rate of a particular digital resource, which makes it even harder to assess how much of the content can realistically be retrieved through a full-text search.
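The effect of OCR error rates on retrievability can be illustrated with a back-of-the-envelope calculation (an illustrative sketch of my own, not a figure from the project): if each character is recognised independently with accuracy p, a word of n characters survives intact with probability roughly p^n, so even an apparently high character accuracy can translate into poor full-text recall.

```python
# Illustrative sketch: probability that a whole word is correctly OCRd,
# assuming each character is recognised independently with accuracy p.
def word_recall(char_accuracy: float, word_length: int) -> float:
    return char_accuracy ** word_length

for p in (0.99, 0.95, 0.90):
    print(f"char accuracy {p:.0%}: 5-letter word recall ~ {word_recall(p, 5):.0%}")
```

At 99% character accuracy a five-letter word is retrievable about 95% of the time; at 90%, only about 59% – which is why undisclosed error rates make search results so hard to interpret.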
There are also differences in how digital facsimiles are made accessible. Many recent online newspaper portals use the JPEG2000 image file format. The benefit of this is the ability to zoom more or less seamlessly in and out of the digital facsimile. But since JPEG2000 has not been around for very long in the digitisation community, many collections that were digitised in the past are only available in TIFF format. This means that zooming can only be provided in a static way on these images, e.g. through JPEGs at different resolutions. As a result, it is often not possible for researchers to explore these legacy resources in the same way as they do recently digitised materials.
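The static fallback for TIFF-only collections amounts to pre-computing a small pyramid of resolution levels. A hypothetical helper of my own (no particular repository works exactly this way) sketches the idea:

```python
# Hypothetical sketch: derive static zoom levels (halving each step) for a
# legacy TIFF master, as a stand-in for JPEG2000's seamless zooming.
def zoom_levels(width: int, height: int, min_side: int = 150) -> list[tuple[int, int]]:
    levels = []
    while min(width, height) >= min_side:
        levels.append((width, height))
        width, height = width // 2, height // 2
    return levels

# A newspaper page scanned at 6000 x 8000 px yields six fixed zoom steps.
print(zoom_levels(6000, 8000))
```

Each level would be rendered as a separate JPEG; the viewer can then only jump between these fixed steps, rather than zooming continuously as JPEG2000 tiling allows.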
In other cases, digital facsimiles have been produced by capturing existing microfilm copies rather than the original source material, thus the digital versions expose artefacts that were not present in the original paper source, but only introduced in the microfilm. However, this type of provenance is most typically not available to end users who are left alone in their interpretation of the differences in resource presentation and functionality.
Finally, the metadata standards used to describe the digital contents also vary. Not only are there different representations in use for encoding full-text, such as plain text, ALTO or TEI; descriptive metadata is also commonly encoded in different standards, and with different degrees of granularity. While standard bibliographic information such as the title or date of publication is commonly available, more specific information (on, for example, a particular article or the names of persons or places occurring in it) rarely is. Within the Europeana Newspapers project, a subset of 2m pages out of the total 10m will be refined further down to the article level, thus enabling more sophisticated search and retrieval functionality than the remaining 8m pages.
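For readers unfamiliar with ALTO: it records a confidence score for each recognised word (the WC attribute), which is one way OCR quality information could be surfaced to users. The fragment below is invented for illustration; only the ALTO element and attribute names are real.

```python
# Sketch: reading per-word OCR confidence (the WC attribute) from an ALTO
# fragment. The sample XML is invented for illustration.
import xml.etree.ElementTree as ET

alto = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="EUROPEANA" WC="0.97"/>
    <String CONTENT="NEWSPAPERS" WC="0.64"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

root = ET.fromstring(alto)
# Match on the local tag name so the sketch works regardless of ALTO version/namespace.
words = [el for el in root.iter() if el.tag.endswith("String")]
scores = [float(el.get("WC")) for el in words]
print(f"mean word confidence: {sum(scores) / len(scores):.3f}")
```

A repository that exposed even this simple aggregate per page would let researchers judge how much of a full-text search they can trust.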
A central point of this paper is that these issues are not just issues for librarians; it is not simply a matter of showcasing how a digital resource is constructed. Rather, there is an urgent need to demonstrate how such issues have a profound effect on the academic community’s engagement with online resources.
If a researcher wants to conduct a comparative analysis of newspapers in Chronicling America (the US historic newspaper site), the National Library of France and the British Library, she will have to use three different interfaces with different levels of content and metadata quality. Moreover, she will also have to grasp the particularities of each of these collections with regard to their quality and completeness and what that entails for her research.
This paper will conclude with some recommendations for how those building digital resources can make their content choices more transparent. Informed dialogue between cultural heritage organisations and the research communities is required. It calls for creators to tear down the illusion of completeness and help persuade end users that many digital resources are fragmentary things, where the representation of absence is just as important as the representation of existence.
- For a brief summary of the issue see Julia Flanders, “Collaboration and dissent: challenges of collaborative standards for digital humanities” in Collaborative Research in the Digital Humanities (eds. Marilyn Deegan, Willard McCarty), 2012. The TEI mailing list provides ample evidence of such discussion http://listserv.brown.edu/archives/cgi-bin/wa?A1=ind1309&L=TEI-L.
- For instance, Johanna Drucker in “Performative Materiality and Theoretical Approaches to Interface” Digital Humanities Quarterly (2013, Volume 7 Number 1) and also “Humanities Approaches to Graphical Display” Digital Humanities Quarterly (2011, Volume 7 Number 1) addresses theoretical concerns relating to the interface but with less focus on its practical representation within online resources. The issue has received much more attention in the world of 3D visualisation, e.g. with the creation of the London Charter (http://www.londoncharter.org/).
- See “History, Digitized (and abridged)” for a summary of the extent of digitisation in 2007. http://www.nytimes.com/2007/03/10/business/yourmoney/11archive.html?pagewanted=all&_r=1&.
- One of the findings in Reinventing research? Information practices in the humanities, Research Information Network, 2011, http://www.rin.ac.uk/our-work/using-and-accessing-information-resources/information-use-case-studies-humanities.
- Google Generation, David Nicholas, Ian Rowlands, Paul Huntington, 2007 http://www.jisc.ac.uk/whatwedo/programmes/resourcediscovery/googlegen.aspx.
- Alastair Dunning, European Newspaper Survey Report, 2012, http://www.europeana-newspapers.eu/wp-content/uploads/2012/04/D4.1-Europeana-newspapers-survey-report.pdf.
- For a comparative study of search ranking of digital newspaper repositories see Digital collections: If you build them, will they visit?, Frederick Zarndt et al., IFLA WLC2013, Newspaper and Genealogy Section, Singapore, http://www.ifla.org/files/assets/newspapers/Singapore_2013_papers/day_1-_01_xzarndt_frederick_et_al_digital_collections.pdf.
- Digitalisierte Zeitungen und OCR: Welche Forschungszugänge erlauben die digitalen Bestände?, Jan Hillgärtner, 18/03/13, http://newsphist.hypotheses.org/23.
- For a study in the methodology and analysis of digitised newspapers vs. paper copies see The Digital Turn. Exploring the methodological possibilities of digital newspaper archives, Bob Nicholson, in Media History Vol. 13, Issue 1 2013, Special issue: Journalism and History: Dialogues.
- For an example of this issue within the Digging into Data projects see One Culture. Computationally Intensive Research in the Humanities and Social Sciences A Report on the Experiences of First Respondents to the Digging Into Data Challenge, Christa Williford and Charles Henry, 2012 http://www.clir.org/pubs/reports/pub151 and also the aforementioned Reinventing research? Information practices in the humanities.
One of the reasons scholars complain about digital resources and the decline of the traditional bookstack is that they have lost the joy of serendipity.
That is, the moment when you come across one interesting book that you might never have heard of, while actually searching for another.
I’m not sure whether such serendipity is really integral to academic research, or whether it’s just a pleasurable moment that provides an occasional spark of inspiration.
But it’s interesting to see two new tools online that promise to return digital resources to users in a way that does not have the cold logic of a Google search result.
The Mechanical Curator (http://mechanicalcurator.tumblr.com/) randomly selects small illustrations and ornamentations from the pages of 17th-, 18th- and 19th-century books at the British Library.
Meanwhile, Serendip-o-matic (http://serendipomatic.org/) allows users to enter some text and then retrieve results from various digital libraries.
I doubt these are the only tools like this – I’m quite sure some interface designers have also been introducing an element of ‘randomness’ (although randomness is not really the right term) into other digital libraries as well.
It will be interesting to see how much usage such concepts get. Will they provide something that is missing from current search practices? Or will the combination of Google’s ‘Search’ and ‘I’m feeling lucky’ mean that such approaches to information retrieval remain decorative toys?
Keynote presentation given at Document Engineering 2013 conference, Florence, Italy, September 2013.
The most common problem for digitisation projects, in the UK at least, has been the long-term sustainability of the interfaces designed to surface the digitised material. Much of the work undertaken by the Strategic Content Alliance summarises and addresses this issue.
Part of the problem has been funding structures. Funding bodies (such as the AHRC, JISC or the New Opportunities Fund) could support innovative projects to digitise and publish scholarly materials, special collections and cultural heritage, but they could not supply the costs for the continued maintenance of hardware, software and the ongoing addition of content and marketing required to establish the ongoing success of the project. In many cases, institutions did not have the means in place to refresh and renew this content.
This realisation led Jisc to create a programme on the institutional skills required to ensure that digitisation projects became embedded within their institution’s digital offerings, rather than remaining as add-ons. The Content Clustering and Sustaining Digital Resources report summarised some of these issues.
Given this evolution of thinking, it is heartening to see great examples of UK universities building sustainable digital libraries. The London School of Economics (LSE) Digital Library is a great example, and shows all signs of getting stronger and stronger.
The LSE has had a mixture of internal and external project funding, but its digital content is drawn into an overarching digital library. Collections (such as Beatrice Webb’s diaries and Russian Childcare Posters) are not treated as stand-alone resources, but share the same back-end technical infrastructure and front-end interface. Wrapping everything into a single infrastructure has the additional effect of making it easier to build digital preservation of the objects into the library’s workflow, and of letting usability experts deal with one interface rather than many – both vital tasks for any successful digital library.
Licensing is clearly handled (for example, a CC BY-NC-SA licence is used in this poster about immigrant labour), URLs are clearly referenceable, and descriptions of collections clearly indicate what content has been digitised and made available. The interface is clean and simple, and (with only a few collections at the moment) easy to navigate.
And finally, the clustering of collections in one place also makes cross-searching easy, and throws up interesting juxtapositions such as this:
From the diary of Beatrice Webb, writing on DH Lawrence in 1934: “This sex … is too divorced from conscious hygiene, personal affection, or social obligation; it is wholly antagonistic … to the development of any human or social purpose in life.”
And then from the student newspaper in 2005: “The top 10 places to have sex at the LSE”.
The Creative Europe programme is the EU’s funding for the cultural and creative sector for 2014-20.
It brings together other more specific programmes from the 2007-13 period, such as the MEDIA programme, which funded the development of audiovisual skills and infrastructures, largely related to cinema.
Much of the new funding is earmarked for continuing quite specific tasks – the distribution of European film, and the translation of European-language literature. Other pockets of funding are dedicated to cross-border cultural performance. Documentation on the website also mentions building banking expertise and easier access to private funding.
There is a fair bit of mention of digital, but in quite a generic way. From a cultural heritage point of view, there might well be possibilities in addressing the infrastructural shift to digital, and also the different business models required by that shift. There is, unsurprisingly, mention of transnational access; that is something else the cultural heritage sector might support with digital access.
The press release has some good pieces of information, such as the proposed budget of 1.8bn Euros. There are also associated FAQs. The (undated) recommendations from the European Commission to the Council and Parliament go into further detail.
The (undated) pamphlet on the European Commission website (pdf) suggests the details of the programme are not yet confirmed.
As with Horizon 2020, it may well be that the European Commission’s website is out of date, representing initial plans rather than the current state of affairs. It is annoying that it is not updated, and that there is no newsletter or mailing list to subscribe to. The We are More website suggests that the funding may drop to around the 1.3bn mark after discussions between the European Council, Parliament and Commission in April 2013.
There is no mention of when the first calls might be announced, but according to other sources this will be in mid-late 2013. Like Horizon 2020, I guess that will require further discussion between EU institutions to finalise matters.