Six Themes from Europeana Tech

There were numerous great presentations and round table discussions at the Europeana Tech event last week, held at the Bibliothèque nationale de France. Here are some of the key points that might be of interest to libraries and other cultural heritage organisations.

Bibliothèque nationale de France

1. Maps and geo-referencing remain cool

Thousands of maps were extracted and identified from the British Library Labs collection, and Wikimedians then did an awesome job categorising them. Alternatively, use the LoCloud Historic Place Names service to help identify places within documents.

2. Publishing images online? Use the International Image Interoperability Framework (IIIF)

Want your digital images to be used by others in a controlled way, without the difficulty of FTP or hard disk transfers? If you use the IIIF standard for publishing images, it becomes easy to share images, manage their re-use and analyse their usage by others.
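To make that a little more concrete, here is a minimal sketch of how a IIIF Image API request is put together. The URL pattern (identifier, region, size, rotation, quality, format) comes from the IIIF Image API specification; the server address and image identifier below are made up for illustration and are not a real service.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    The identifier is percent-encoded so that any slashes in it are not
    mistaken for path separators.
    """
    encoded_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base, identifier):
    """Build the URL of the info.json document that describes an image."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


# Hypothetical server and identifier, for illustration only.
BASE = "https://example.org/iiif"
IMAGE_ID = "maps/sheet1"

# The whole image at the largest size the server offers.
print(iiif_image_url(BASE, IMAGE_ID))

# A 1000x800 pixel detail starting at (200, 300), scaled to 500 pixels wide.
print(iiif_image_url(BASE, IMAGE_ID, region="200,300,1000,800", size="500,"))

# Technical description of the image (dimensions, supported features).
print(iiif_info_url(BASE, IMAGE_ID))
```

Because every request is just a URL, anyone re-using the image fetches it from your server rather than from a copy on a hard disk, which is what makes the re-use manageable and measurable.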

3. Metadata is often minimal … face it!

Collections like DigitalNZ and the Cooper Hewitt design museum have many records with sparse metadata. Their interfaces accept this and try to adapt, rather than just leaving lots of inexplicable white space.

4. But there are ways to improve metadata

OpenRefine is ‘Excel on Steroids’: it offers powerful ways to make bulk adjustments to open data. Meanwhile, the amazing release of millions of images on Flickr by BL Labs has been accompanied by automated methods of tagging that data.

5. Unconnected project-based services

There are plenty of great tools for cultural heritage created by different EU projects. But sustainability remains an issue. Could the Europeana Cloud service provide a better way to connect data to the most valuable services for curating, enriching and exploiting that data?

6. Wikidata as the basis for everything

Wikipedia has increasingly become a popular place for libraries to embed links to their content. But there is growing interest in what libraries and others can do with Wikidata: providing structured data about books, manuscripts, letters, diaries and other collections, and forming a backbone of verifiable statements that can actually support and improve Wikipedia.

Some quick principles for creating digitised culture

Getting lost in the mire of massive European projects, I am trying to put together some principles to remind me of what I am trying to work on. A first draft is below!

  • Always do user research. However great your knowledge and intelligence, you cannot predict what 10s, 100s or 1000s of users will do.
  • Use existing infrastructure to make life easier. C’mon, Google Docs is pretty cool.
  • “Nobody ever complained about a website being too easy to read” (thanks Dean Birkett)
  • Data should be free and easy to download at a granular level. PDF bad, CSV good …
  • … but think context too … CSVs will mystify some people.
  • Be open and transparent in your process. Yes, it hurts. But then everyone knows where you are and what you are trying to do.
  • Avoid vapourware. If something’s not really ready yet, don’t say it is.

They are all pretty obvious, but it is useful to remind yourself of them from time to time. I’m also thinking about doing one on workplace behaviour.

The Great Twentieth-Century Hole Or, what the Digital Humanities Miss


Presentation given at DH Benelux June 2014

Presentation on Europeana Newspapers

Presentation given at British Library information day on digitised newspapers

Digitisation Projects Classified by Date of Corpus

At the DH Benelux Conference in The Hague in June, I’m looking into the extent to which the Digital Humanities ignores the twentieth century. The abstract is here.

As part of this work, I’ve been investigating the projects undertaken at various DH centres, in particular those projects that are working with a specific corpus of data (as opposed to doing networking, or tools development), and the dates of those corpora.

I’ve taken some significant DH Centres and marked each of the projects according to a very rudimentary temporal classification: ‘Classical, Medieval, Renaissance, 18th century, 19th century, 1900-1950, 1950 onwards’.

The Google spreadsheet with the results so far is at

So far, I’ve included

Department of Digital Humanities, King’s College London
Huygens Institute, National Library of Netherlands (The Hague)
Maryland Institute for Technology in the Humanities, University of Maryland
Centre for Literary and Linguistic Computing, University of Newcastle
Center for Digital Research in the Humanities, University of Nebraska

There is a sixth tab with the total number of projects.

There is a fuller list I wish to explore on the ‘Totals’ tab of the published spreadsheet. Any more links to identifiable lists of projects based at DH Centres would be gratefully received!

PS I’m aware there are a whole bunch of methodological/sampling problems with focussing on ‘projects in DH Centres’! I’m hoping to bring this out in the paper.

The Great 20th-Century Hole; What the Digital Humanities Miss

(Abstract submitted for the DH Benelux 2014 conference, The Hague, June 2014)

Over the past few years, there have been endless debates about the definition of the Digital Humanities. Many angles are considered – the practitioner of DH as builder, as coder, as theorist, as user – and also where the practitioner of DH sits and works – in the library, at home, in a ‘laboratory’, in the computer science department or with other disciplines.

However, this paper argues that another angle has been ignored: a temporal one. The Digital Humanities has an uncritiqued bias towards the pre-20th century. The projects, papers, books and conferences that constitute the field of Digital Humanities (or at least the Digital Humanities within the western tradition) have taken as their objects of study the classics, the Middle Ages, the early modern period, the Enlightenment and the nineteenth century. The twentieth century, arguably the most important era of study for the humanities, remains relatively untouched as a point of investigation. Whereas there is a mass of projects related to the digitisation of early printed books, manuscripts, maps and early photography, those related to film and media, contemporary books, modern letters, documents or recent politics are relatively scarce.

The paper draws on evidence from projects such as Europeana Newspapers; programmes like Digging into Data; centres such as the King’s College London Department of Digital Humanities; events like the annual DH conference; and books such as the recent Debates in the Digital Humanities to indicate the extent of this bias. It explores the extent to which projects relating to the twentieth century feature within such academic endeavour.

The paper explores the reasons for this bias. Not surprisingly, reasons of licensing and copyright play a role. The copyright status of much twentieth-century material creates a barrier that seems to block engagement from the outset. Indeed, it will be argued that this is the key problem, and one that the community has been slow not only to address but even to recognise. But there are other reasons to consider as well: issues relating to economics, file formats, and the ambitions and relationships of individual disciplines within the humanities to the digital.

It concludes that if the Digital Humanities wishes to fully live up to its potential, it needs to conceive of itself in a particular way and tackle these problems as part of a larger alliance. The type of partnerships that scholars within the DH umbrella have formed (with librarians, archivists, publishers) need to be reformulated and strengthened. The twentieth-century hole is a massive problem for the digital humanities, and one that the community can only deal with by presenting and articulating the issue as part of a larger group of interested stakeholders.

