Digitised scans of historic newspapers create rather large file sizes. If you need to see a text up close, a low-resolution version just won’t do – words and characters are too blurry. So libraries that have undertaken digitisation projects on newspapers create individual master files of anything from 10 to 50 GB per image.
This creates quite a challenge for The European Library (TEL) within the Europeana Newspapers project. TEL is creating an end-user interface for these historic documents, assembling around 10m images of newspaper pages from the 12 library partners involved. To create a successful user experience, TEL needs to be able to present good-quality images – maybe not master files, but images of at least 0.5 Megabytes (MB) and up to 2.5 MB.
Great for the user, but a headache for the technical manager. 10m images at an average of 1.5 MB per image demand a total of around 15m Megabytes of server space (around 14 Terabytes). This is okay in a project setting, but not sustainable in the long term.
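The storage estimate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the image count and average size come from the figures above, while the function name and the choice to show both decimal and binary conversions are assumptions made for illustration. The "around 14 Terabytes" figure matches the binary conversion (dividing by 1024 twice); a decimal conversion gives 15.

```python
# Figures quoted in the text above.
IMAGE_COUNT = 10_000_000      # around 10m newspaper page images
AVERAGE_IMAGE_MB = 1.5        # assumed average size per delivery image


def total_storage_mb(image_count: int, average_mb: float) -> float:
    """Return the total storage needed, in megabytes (hypothetical helper)."""
    return image_count * average_mb


total_mb = total_storage_mb(IMAGE_COUNT, AVERAGE_IMAGE_MB)

# Decimal convention: 1 TB = 1000 * 1000 MB.
decimal_tb = total_mb / 1000**2

# Binary convention: 1 TiB = 1024 * 1024 MiB.
binary_tib = total_mb / 1024**2

print(f"Total: {total_mb:,.0f} MB")
print(f"Decimal: {decimal_tb:.1f} TB")
print(f"Binary: {binary_tib:.1f} TiB")
```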
Therefore the project has come up with a new solution.
Rather than all the images being centrally harvested and then stored at TEL, some libraries have offered TEL access to their image servers, i.e. their own hardware space where suitably sized images are stored.
When a user requests a particular image (via search or browse), the TEL interface dynamically grabs that image from the source library.
Have a look at an 1814 issue of the Viennese newspaper Wiener Zeitung from the National Library of Austria. Here the user can zoom in and out and explore the image within the TEL interface – but the digital version remains housed in Vienna.
This approach has another advantage: it lets the curator of the original material maintain control of the digitised versions. Lack of control is one of the reasons managers cite for being reluctant to share content with third-party publishers.
However, not all libraries in the project have taken this approach, as it takes a bit of effort to allow the images to be grabbed in this way.
Copies of newspapers from the National Library of Latvia (such as a 1914 issue of ‘Drywa’) are therefore pre-harvested and stored at TEL.
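The mixed arrangement described above, with some images fetched on demand from a partner library and others served from pre-harvested copies, could be sketched roughly as follows. This is not TEL's actual implementation: the `PageImage` record, the `resolve_image_source` function, and the example URL and path are all hypothetical, chosen only to illustrate the idea of preferring the source library's image server and falling back to a local copy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageImage:
    """Hypothetical record for one newspaper page image."""
    page_id: str
    remote_url: Optional[str] = None   # set when the library exposes its image server
    local_path: Optional[str] = None   # set when the image was pre-harvested


def resolve_image_source(page: PageImage) -> str:
    """Return where the interface should load the image from.

    Prefers the source library's own server, so the digital version
    stays with its curator; otherwise falls back to the stored copy.
    """
    if page.remote_url is not None:
        return page.remote_url
    if page.local_path is not None:
        return page.local_path
    raise LookupError(f"No image source known for page {page.page_id}")


# Illustrative records only; the URL and path are made up.
remote_page = PageImage(
    page_id="wiener-zeitung-1814-p1",
    remote_url="https://images.example.org/wiener-zeitung/1814/p1.jpg",
)
harvested_page = PageImage(
    page_id="drywa-1914-p1",
    local_path="/srv/harvested/drywa/1914/p1.jpg",
)

print(resolve_image_source(remote_page))
print(resolve_image_source(harvested_page))
```

Keeping the choice inside a single function means the rest of the interface does not need to know which libraries have opened up their image servers, so a partner could switch from pre-harvesting to remote delivery by changing its records alone.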
But as knowledge of this technique increases, I imagine it will become more popular. Rather than pre-assembling such collections and going through the time-consuming and costly process of harvesting and then storing them, third-party aggregators will be able to curate, showcase and publish specific collections drawn from a variety of sources. The result is that content no longer remains trapped in institutional silos, but can be more easily seen and contextualised in a variety of different settings.