Making Research Data open – starting with the low-hanging fruit?

One thing I expected with my new job was to have difficult arguments with scientists about why they should make their data open. While it received a lot of criticism on Twitter, I had a sneaking feeling that the ‘research parasite’ editorial published earlier this year reflected a larger, if unstated, line of thought.

However, initial conversations at Delft seem to imply something a little different. While there may be pockets of resistance, there are plenty of scientists who are intrigued by, or committed to, openness.

Therefore I suspect the more difficult part of publishing open data is actually sorting out the details: how to deal with versions and problematic formats, how to get high-quality metadata, and how to work out costs. And how best to pick the low-hanging fruit, rather than clambering to the top of the tree to find it.

Part of the reason for this might be that TU Delft already has an impressive Open Science programme. The focus for ‘open’ is not just Open Access for articles; it covers the whole lifecycle of work in teaching and research. So there is an Open Education programme for MOOCs and OERs and an Open ICT programme, sitting alongside the push for openness in Research Data. Whatever role a member of staff or a student is playing, they will be exposed to openness.

Notes from my first day in the TU Delft Research Data team

Services offered by or through Delft Library Research Data Services

  1. Data Archive – repository for storing data. Question of branding: archive or repository? Smaller issues are logged via Bugzilla and Trello; larger ideas for change (e.g. a dark archive, restricted-access requirements, interface design) still require an evidence base from across the university before they can be implemented. Embargoes to be offered in the self-upload form. The workflow involves a data moderation process, checking both the quality of the metadata and the technical quality of the content. All conversations go through the data officer. It is rare to get functional process requirements from researchers; most are driven by the library, e.g. the implementation of ORCID.
  2. DataCite – international service, delivered via RDS, for assigning DOIs to data. The Library is a DataCite member.
  3. Dataverse – generic tool for managing data during research projects. Hosted by DANS, with a specific instance for use by 3TU (Delft, Eindhoven, Enschede).
  4. Open earth Labs – created by Delft for the Geosciences. Helps manage research data with a focus on geo data; more complex than Dataverse.
  5. Data Management Plan assistance. There are more requests for ‘Data Paragraphs’ in pre-proposals than for actual full plans.

On leaving Europeana (Part 1)

As of 1st February, I will be leaving Europeana (and The European Library) to take up a new role within the Research Data team at the Technical University of Delft.

I leave Europeana with a heavy heart. It is a unique organisation, with creative people and an ambitious desire to make significant changes to how European cultural heritage is shared in a digital world.

It’s not straightforward working there. You are trying to create winning products based on strategic interests ranging from those of famous international galleries to tiny military museums, from renowned centuries-old libraries to new city libraries, and from thousands of archives, both jumbled and organised. Add in the multilingual element, the hugely different approaches to licensing and metadata across the continent, plus the friendly concerns of our funders at the Commission, and you are left as the juggler keeping several balls aloft.

Ringling Bros and Barnum & Bailey, Circus Museum, CC-BY-SA

In the face of that, Europeana’s achievements are impressive. It has done so much to standardise licensing within the cultural heritage domain with the Europeana Licensing Framework. Many other fields of knowledge (e.g. the academic sector) are crying out for an approach like this. Once you have licensing harmony, the re-use and remixing of big data turn from a distant possibility into an achievable reality.

The Europeana Data Model helps make data interoperable. Without standardised data, nothing hangs together: not a portal, not linked data, not an API. Enough said.

The recently published Europeana Publishing Framework has a really nice carrot-and-stick approach to making the cultural heritage sector improve the quality of its content and its ease of re-use. It’s not enough to stick a crappy low-resolution JPEG behind a rubbish HTML page with an inexplicable URL. Content needs to be instantly accessible, downloadable and permanent, for both machines and humans.

Finally, I really like the way the portal is developing. Its recent redesign is much easier on the eye. More importantly, the emphasis on thematic collections (starting with art history and music) is vital to give some focus. Developing a content strategy that helps create a critical mass of content and metadata is, in my humble opinion, one of the most important things Europeana can do in 2016.

On a personal note, here are three things I am really proud of from my time here:

  1. The European Library assembled one of the largest releases of open data in the cultural sector. The Linked Open Data release of over 90m bibliographic records was achieved only after a massive process of ingesting data, working through the licensing conditions of many different national libraries, and then working with the team to create and publish the linked data.
  2. The great team at The European Library also put together the largest open archive of historic newspapers in Europe. Centralising data from over 20 libraries, with over 11 million images and pages of full text, was a mammoth achievement. It is very gratifying to look at the user stats: the average user spends nearly 15 minutes on the site, an incredibly high figure.
  3. Finally, introducing Europeana Research. There is so much potential for the cultural sector and the digital humanities to work closely together. Some of Europeana’s plans for the next year, including a grants programme for researchers to use open cultural data, look really exciting. Being part of the team getting this off the ground was a privilege.

None of this would have been possible without all the people both within Europeana and at other project partners. I can’t name everyone (and you are great even if you are not on this list!), but some of the great people I have worked with closely in the office over the course of the four years include: Nienke and her infectious drive and optimism, Markus, Alena and Nuno’s technical genius, Natasa and Adina’s tenacity with data, Valentine and David’s stupendous all-round knowledge, and Harry and Jill’s passion and commitment. There may be many balls in the air at Europeana, but there are also many safe hands to catch them.

On that person who understands end users AND developers

When digital humanities projects first got going many years ago, one of the prized members of any project team was the person who could connect what the researchers wanted with what the technical developers had to do.

That person never really had an official title, but if you didn’t have someone in that role, you tended to end up with horribly ugly sites that served the research aims of at most two researchers, if you were lucky. Think of 20 search boxes on a screen, with drop-down menus offering over 50 categories to choose from.

It’s amazing to think how the universe of web design has moved on since then. You never hear the term ‘webmaster’ any more; there’s a spectrum of different tasks and titles (from user researcher to front-end developer) needed to convert user needs into gleaming digital products.

Any digital humanities project (or, better still, centre) that wants to manage successful and lasting services over time needs those roles. And as expectations about what the web can deliver continue to increase, so does the need for anyone who can create a loop between what the users want and do, and what the developers then build.

In large private companies (oozing the kind of cash that digital humanities projects can only dream of), there are separate roles for this: undertaking user research, drawing wireframe outlines, designing graphics and ‘look-and-feel’, designing user interaction, and then user testing and feedback.

Most public sector bodies are fortunate to have one person to do any of that. Europeana has been lucky to have Dean Birkett as part of that connection between what users want and how a website works.

Dean’s work is highly impressive: he is able to understand user needs and quickly sketch and conceptualise ideas that can be passed on to developers. He’s heading off to do some freelance work and he will be sorely missed in the office.

Before he left, he mentioned some books that are key for bridging the gap between users’ needs and completed digital products. They are useful for any digital project that wants to make sure it is delivering what its users want.

Six Themes from Europeana Tech

There were numerous great presentations and round-table discussions at the Europeana Tech event last week, held at the Bibliothèque nationale de France. Here are some of the key points that might be of interest to libraries and other cultural heritage organisations.

Bibliothèque nationale de France

1. Maps and geo-referencing remain cool

Thousands of maps were extracted and identified from the British Library Labs collection, and Wikimedians then did an awesome job categorising them. Alternatively, use the LoCloud Historic Place Names service to help identify places within documents.

2. Publishing images online? Use the International Image Interoperability Framework (IIIF)

Want your digital images to be used in a controlled way by others, without the difficulty of FTP or hard-disk transfers? If you use the IIIF standard for publishing images, it becomes easy both to share images and to manage and analyse their re-use by others.
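As a rough illustration of why this is easier than shipping files: under the IIIF Image API, every image is requested through a predictable URL pattern, `{server}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, so a re-user can ask for a crop, a thumbnail or a greyscale version without any file transfer. The minimal Python sketch below builds such URLs. It assumes a server implementing Image API 2.1 or 3.0 (where `max` is a valid size); the server address and image identifier are hypothetical.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: int = 0,
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build a IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # The identifier must be percent-encoded, including any "/" it contains
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base: str, identifier: str) -> str:
    """URL of the image information document (info.json) for one image."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


if __name__ == "__main__":
    server = "https://images.example.org/iiif"  # hypothetical IIIF server
    # The whole image at its maximum available size
    print(iiif_image_url(server, "maps/sheet 12"))
    # A 200-pixel-wide greyscale thumbnail of a 1000x800 region at the top left
    print(iiif_image_url(server, "maps/sheet 12",
                         region="0,0,1000,800", size="200,", quality="gray"))
    # The machine-readable description of the image's sizes and capabilities
    print(iiif_info_url(server, "maps/sheet 12"))
```

Because the request is just a URL, the publishing institution keeps the master file on its own server. It can also see every derivative that is requested, which is what makes re-use easy to manage and measure.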

3. Metadata is often minimal … face it!

Collections like DigitalNZ and the Cooper Hewitt design museum have many records with sparse metadata. Interfaces accept this and try to adapt, rather than just leaving lots of inexplicable white space.

4. But there are ways to improve metadata

OpenRefine is ‘Excel on Steroids’, offering powerful ways to make bulk adjustments to open data. Meanwhile, the amazing release of millions of Flickr images from BL Labs has been accompanied by automated methods of tagging them.

5. Unconnected project-based services

There are plenty of great tools for cultural heritage created by different EU projects. But sustainability remains an issue. Could the Europeana Cloud service provide a better way to connect data to the most valuable services for curating, enriching and exploiting that data?

6. Wikidata as the basis for everything

Wikipedia has become an increasingly popular way for libraries to embed links to their content. But there is growing interest in what libraries and others can do with Wikidata: providing structured data about books, manuscripts, letters, diaries and other collections, and forming a backbone of verifiable statements that can actually support and improve Wikipedia.

Some quick principles for creating digitised culture

As I get lost in the mire of massive European projects, I am trying to put together some principles to remind me of what I am working towards. A first draft is below!

  • Always do user research. However great your own knowledge and intelligence, you cannot know what 10s, 100s or 1000s of users will do.
  • Use existing infrastructure to make life easier. C’mon, Google Docs is pretty cool.
  • “Nobody ever complained about a website being too easy to read” (thanks Dean Birkett)
  • Data should be free and easy to download at a granular level. PDF bad, CSV good …
  • … but think context too … CSVs will mystify some people.
  • Be open and transparent in your process. Yes, it hurts. But then everyone knows where you are and what you are trying to do.
  • Avoid vapourware. If something’s not really ready yet, don’t say it is.

They are all pretty obvious, but they are useful reminders from time to time. I’m also thinking about doing some on workplace behaviour.

The Great Twentieth-Century Hole, or What the Digital Humanities Miss


Presentation given at DH Benelux, June 2014

