
Preservation (or archiving) is seen by many as a major problem. And so it is, but it is not a problem that should stop or slow the creation of an institutional repository. There are at least ten years in which to resolve the issue before it becomes pressing.

The hardware issue is simple: the most research-active university in Australasia will probably only just fill a 100GB disk in 5 years if it puts a mandatory policy in place now (otherwise, say, 20GB). At the end of five years the server will be replaced on a normal ICT cycle, and the next five years' growth will need to be accommodated on a larger disk. Who can doubt that at the end of 5 years a 200GB drive will be commonplace, or indeed regarded as small? Even in 2005, 600GB and 1TB drives are available.
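The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the constant deposit rate (the five-year totals spread evenly over each year and carried on unchanged into the second five years) is an assumption, not a measured figure.

```python
def cumulative_storage_gb(gb_per_period, period_years, horizon_years):
    """Return the cumulative storage (in GB) at the end of each year.

    Assumes a constant deposit rate: gb_per_period spread evenly over
    period_years, continuing unchanged up to horizon_years.
    """
    rate_per_year = gb_per_period / period_years
    return [rate_per_year * year for year in range(1, horizon_years + 1)]


# Estimates from the text: 100GB in 5 years with a mandatory policy,
# about 20GB in 5 years without one. The 10-year horizon is an assumption.
mandatory = cumulative_storage_gb(100, 5, 10)
voluntary = cumulative_storage_gb(20, 5, 10)

for year, (m, v) in enumerate(zip(mandatory, voluntary), start=1):
    print(f"Year {year:2d}: mandatory {m:6.1f} GB, voluntary {v:5.1f} GB")
```

Under these assumptions the mandatory case reaches 100GB at the first server replacement and 200GB at the second, which is why a 200GB drive is the natural point of comparison.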

The software issues are more problematic, but if text documents are stored in a PDF or XML version (perhaps alongside other formats), there is a good chance that software to read and render them will be available for at least that period, and perhaps very much longer. For some non-text items, XML might still provide a reasonable option. Scanned images pose yet a third problem. However, to delay until these issues are all resolved (they never will be, given the nature of the ICT industry) would be as silly as delaying buying a computer because it will be cheaper tomorrow.

Persistent URIs are also important, but over this time frame all software packages provide a persistent URI, though they differ in the scheme they use.

It is also relevant to consider that preservation of refereed research articles is actually a voluntary activity. Publication of the article transferred the responsibility for its long-term preservation to the publisher, where it has resided for all of the last century. Parallel archiving of research articles will aid their longevity by increasing the number of sites that hold a copy, but it is quite clearly an add-on to present scholarly publication processes.

But what about ETDs (electronic theses and dissertations)? As long as a university still requires a paper copy of a thesis to be bound and stored in its Library, the long-term preservation is again already covered by the paper copy. The volume of electronic theses is not a large problem in any case; the availability of software to read them 50 years from now might be. A particular problem is posed by ETDs that are intrinsically electronic, and which lose some of their content if reduced to paper (if indeed that is possible). However, at present there are very few of these because of university regulations (maybe none yet in Australia?). Their preservation needs to be carefully considered, especially since they may well use uncommon software, computer languages or systems.

One solution is to pass the problem to a central archive office that is funded to look after preservation. One proposal under consideration is an annual scan of the University of Tasmania institutional repository, copying the year’s crop of documents to the State Library of Tasmania’s STORS Service. Similar arrangements might be made elsewhere in Australia.
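As a rough illustration of what such an annual scan could look like, the sketch below copies every file last modified in a given year from a repository directory into a per-year folder of an archive directory. Everything here is an assumption made for the example: the function name, the idea that both the repository and the archive are plain directories, and the use of file modification time to decide which documents belong to the year. The real repository software and the STORS Service will have their own interfaces.

```python
import shutil
from datetime import datetime
from pathlib import Path


def copy_years_documents(repo_dir, archive_dir, year):
    """Copy files under repo_dir last modified in `year` to archive_dir/<year>/.

    Hypothetical helper: relative paths are preserved, and the list of
    copied target paths is returned.
    """
    repo = Path(repo_dir)
    target_root = Path(archive_dir) / str(year)
    copied = []
    for path in sorted(repo.rglob("*")):
        if not path.is_file():
            continue
        # Assumption: modification time stands in for the deposit date.
        modified = datetime.fromtimestamp(path.stat().st_mtime)
        if modified.year != year:
            continue
        target = target_root / path.relative_to(repo)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves file timestamps
        copied.append(target)
    return copied
```

A call such as `copy_years_documents("/path/to/repository", "/path/to/archive", 2005)` (both paths are placeholders) would then be run once a year. A production arrangement would more likely select documents by their deposit date in the repository's metadata than by file timestamps.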

You may be interested in one or more of the projects listed on the espida site of the University of Glasgow, especially espida itself and ERPANET.


Page last modified on 01 November 2005, at 02:57 PM Tasmanian Time