One dangerous assumption commonly heard these days is that any kind of institutional repository will be able to undertake long-term preservation. Indeed, many people still believe that putting something on the Internet will ensure its long-term preservation.
Thus it’s useful to have some clarity about precisely what a digital repository is and whether it can be trusted for long-term preservation.
Various international bodies have come together to produce a checklist of ten points that define such a trustworthy repository.
1. The repository commits to continuing maintenance of digital objects for identified community/communities.
2. Demonstrates organizational fitness (including financial, staffing structure, and processes) to fulfill its commitment.
3. Acquires and maintains requisite contractual and legal rights and fulfills responsibilities.
4. Has an effective and efficient policy framework.
5. Acquires and ingests digital objects based upon stated criteria that correspond to its commitments and capabilities.
6. Maintains/ensures the integrity, authenticity and usability of digital objects it holds over time (see the fixity sketch after this list).
7. Creates and maintains requisite metadata about actions taken on digital objects during preservation as well as about the relevant production, access support, and usage process contexts before preservation.
8. Fulfills requisite dissemination requirements.
9. Has a strategic program for preservation planning and action.
10. Has technical infrastructure adequate to continuing maintenance and security of its digital objects.
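Point 6 is one of the few on the list that can be partly automated, through routine fixity checking: recomputing checksums and comparing them with values recorded at ingest. Below is a minimal sketch in Python; the JSON manifest format and file names are my own illustration, not a feature of any particular repository software.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Recompute the SHA-256 checksum of a file, reading in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_fixity(manifest_path: Path) -> list[str]:
    """Return the paths whose current checksums differ from those at ingest.

    Assumes (hypothetically) a JSON manifest mapping relative paths to
    SHA-256 digests, e.g. {"objects/report.pdf": "ab12..."}.
    """
    manifest = json.loads(manifest_path.read_text())
    base = manifest_path.parent
    return [
        rel for rel, recorded in manifest.items()
        if sha256_of(base / rel) != recorded
    ]

if __name__ == "__main__":
    failures = verify_fixity(Path("manifest.json"))  # hypothetical file name
    print("fixity failures:", failures or "none")
```

Real repositories wrap this in scheduling, logging and repair workflows; the point is simply that integrity over time is a verifiable property, not just a promise.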
Compare these to many of the ‘repositories’ currently in existence and you will see how many do not guarantee long-term preservation.
The British Library is hosting a conference on a topic in which there is increasing interest: large-scale audio digitisation, as exemplified by the BL’s own Archival Sound Recordings project.
According to the project blurb:
“Unlocking Audio is an international conference exploring the planning and strategies required for the successful execution of large-scale audio digitisation projects, and the technical and practical issues involved.”
Historical GIS (Geographical Information Systems)-based projects such as the Vision of Britain project at the University of Portsmouth can be extremely rich resources, visually and statistically, but carry extra layers of complexity.
In terms of copyright, there are several layers of rights that need to be cleared for such resources. First there are the maps which are scanned in; secondly there are the administrative boundaries that define regions, units, places, counties, parishes or whatever spatial unit is appropriate; thirdly there are gazetteers which provide indices to the changing names of these spatial units; and finally there is the actual data that fills such units, whether this be mortality rates, election data or population reports.
The resource creator, or the institution they work for, needs to get permission from all these rights owners in order to digitise, deliver, back up and preserve the data, which is a time-consuming task at the best of times. The ideal solution of convincing them all to use Creative Commons licences, so that the rights flow to the resource creator, is nice in theory but difficult in practice.
Perhaps a clearing-house or licensing facility is needed on behalf of the educational sector, one which could a) obtain the necessary permissions or b) even act as a rights-holder for essential GIS data such as gazetteers and boundary data. This would free academic specialists to build GIS resources without getting caught up in copyright obstacles.
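As a rough illustration of how the clearance burden accumulates, here is a toy sketch that models the four layers described above, each with its own rights holder. Every name in it is hypothetical; the point is only that the set of permissions grows with each layer of the resource.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layer:
    """One layer of a historical GIS resource and the party holding rights in it."""
    name: str
    rights_holder: str

def permissions_needed(layers: list[Layer]) -> set[str]:
    """Every distinct party whose permission must be cleared."""
    return {layer.rights_holder for layer in layers}

# Hypothetical example mirroring the four layers described above.
resource = [
    Layer("scanned historical maps", "map archive"),
    Layer("administrative boundary data", "boundary data supplier"),
    Layer("gazetteer of changing place names", "gazetteer compiler"),
    Layer("statistical data (mortality, elections, census)", "data publisher"),
]
print(permissions_needed(resource))  # four separate clearances for one resource
```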
Sometimes you are so involved in the digital library world you forget why you are so concerned with digitising stuff.
Simon Tanner’s 2005 report for UNESCO (Digital Libraries and Culture) reminds the reader of some of the universal reasons for digitising cultural content.
- It democratises access to the arts
- It safeguards cultural artefacts under risk of destruction
- It nurtures notions of ‘home and family’ by dealing with diaspora, displacement and cultural identity
I’ll post the link once Google locates the public version!
The Centre for Data Digitisation and Analysis (CDDA) at Queen’s University Belfast, and BOPCRIS at the University of Southampton are two of the leading digitisation units in the UK Higher Education sector.
Both are engaged in a wide range of projects (such as the Stormont Debates digitisation or the 18th-century Parliamentary Papers) and play an instrumental part in the UK’s digitisation infrastructure.
Most notable about CDDA is that it has a good relationship with local students, providing them with the skills and training to undertake data capture and processing work. The centre has around 5 or 6 flatbed scanners, 4 or 5 book scanners on tables and one larger book scanner for more difficult material.
BOPCRIS’s jewel is its robotic scanner, which uses suction to lift and turn pages automatically before taking pictures. BOPCRIS also has 4 or 5 book scanners on tables and one larger book scanner standing on its own.
The UK’s central archive for arts and humanities data, the Arts and Humanities Data Service (AHDS) has unexpectedly lost half its funding after the Arts and Humanities Research Council withdrew the half a million pounds it contributed towards the running of the service.
I have a slightly vested interest, in that I worked for the AHDS for several years, but even so this seems a poor decision: it endangers a unit that is responsible for the long-term preservation of digital data and has played a pioneering role in setting up a working preservation service. There are not many others in existence in the world.
Even stranger is the four-line justification given on the funding council’s website. Most seasoned observers will note that the statements are simply not true; the lack of accompanying evidence is telling.
Both the BBC and the Guardian have run stories on an extraordinary digitisation project relating to ‘un-shredding’ Stasi Cold War spy documents.
At the end of the Cold War, the documents were ripped into pieces, but archivists started reassembling them because of their obvious historic value.
The task was accelerated when they started to digitise the pieces. To quote the Guardian:
“The machine works by scanning the document fragments into a computer image file. It treats each scrap as if it is part of a huge jigsaw puzzle. The shape, colour, font, texture and thickness of the paper is then analysed so that eventually it is possible to rebuild an electronic image of the original document.”
This is an extraordinary process!
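To make the jigsaw analogy concrete, here is a minimal sketch of pairwise fragment matching. The feature set, weightings and names below are my own simplification for illustration; they are not the actual software used on the Stasi archive, which analyses far richer features.

```python
import math
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Fragment:
    """A scanned scrap, reduced to a few comparable features.

    This feature set is a deliberate simplification of the shape, colour,
    font, texture and thickness analysis the Guardian describes.
    """
    ident: str
    mean_rgb: tuple[float, float, float]
    thickness_mm: float
    edge_profile: list[float]  # distances sampled along the torn edge
                               # (assumed equal length for all fragments)

def similarity(a: Fragment, b: Fragment) -> float:
    """Score how plausibly two scraps came from the same sheet (higher is better)."""
    colour_dist = math.dist(a.mean_rgb, b.mean_rgb)
    thickness_dist = abs(a.thickness_mm - b.thickness_mm)
    # Two halves of a tear should mirror each other, so compare one
    # edge profile against the reverse of the other.
    edge_dist = math.dist(a.edge_profile, list(reversed(b.edge_profile)))
    return 1.0 / (1.0 + colour_dist + 10.0 * thickness_dist + edge_dist)

def best_pairs(fragments: list[Fragment]) -> list[tuple[str, str, float]]:
    """Rank every candidate pairing by similarity, best match first."""
    scored = [
        (a.ident, b.ident, similarity(a, b))
        for a, b in combinations(fragments, 2)
    ]
    return sorted(scored, key=lambda t: t[2], reverse=True)
```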
One can easily see plenty of other applications where the ability to analyse digitised materials could produce innovative and unexpected results: any kind of historical document whose meaning has been eroded by physical damage could be aided by extending this technology. One can also see the need for a really complex metadata schema to ensure all the relevant technical metadata which informs the analysis software is in place.
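As a sketch of what such a schema might need to record per fragment, the record type below is illustrative only: the field names are hypothetical, and a production schema would map them onto established standards such as PREMIS (preservation metadata) or NISO MIX (technical image metadata).

```python
from dataclasses import dataclass, field

@dataclass
class FragmentScanMetadata:
    """Technical metadata a matching engine would need for each scanned scrap.

    Field names are illustrative only; a production schema would map them
    onto established standards such as PREMIS or NISO MIX.
    """
    fragment_id: str
    scanner_model: str
    resolution_dpi: int
    bit_depth: int
    colour_profile: str            # e.g. "Adobe RGB (1998)"
    paper_thickness_mm: float
    capture_timestamp: str         # ISO 8601
    software_versions: dict[str, str] = field(default_factory=dict)
```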