Monday 10 September 2007

Decommissioning Repositories

I had an interesting discussion with Chris (our repository technical guy) today. We host a couple of repositories in ECS that have been used by long-term projects which have now ended. A repository was appropriate to create for them because both were multi site projects - one UK project with six collaborating universities and one EU one with dozens and dozens of partners. Each repository formed a useful way of collecting project outputs and other publications that were relevant to the project's goals, and because each project was relatively long-lived (six years or so) then they were thought of as autonomous quasi-organisations in their own right. And for that very reason, the anticipated contents would not have fitted into a single institutional repository - the majority of course coming from the other institutions!

But now the party's over, there is no more funding, and none of the partner institutions has offered to keep the repository going in perpetuity. Not even the hosting institution or the ex-manager wants to keep their repositories going. We know that even if we don't turn them off their hosting hardware will fail in a few of years. That sounds like very bad news because a repository is supposed to be forever! Was it irresponsible to create these repositories in the first place? Should it be forbidden to create a public repository whose life is guaranteed to be less than a decade? Or perhaps that should be factored into the original policy-making - "this repository and all its contents are guaranteed up to 31st December 2017 but not after". If that were machine readable then the community could have decided whether they want to mirror the collection, or selected bits of it.

However, an easier solution appears to be at hand (or at least for EPrints). A repository has two functions - (a) collection / management of information by registered users and editors and (b) dissemination of that information to all and sundry. Once a repository is decommissioned and its managers and depositors have ceased to use it then the former activity ceases, but the latter can go on in perpetuity. A static website is much easier to run that a repository - it is just a set of files, overseen by a web server instead of a database and a hundred active Perl / Java classes. The dissemination (public) part of the repository can be turned into a static website and simply grafted on to the hosting institution's static web space (using an apache virtual host to keep the URLs identical).

To activate this change, the EPrints repository template needs to be edited to delete all reference to "logging in" or "dynamic site searching" and then all of the static pages need to be regenerated to use the new template. Once that has happened, the repository's 'html' and 'documents' subdirectories can just be transferred to a new web server. The URLs will all be retained intact, the metadata and documents will all be retained intact, the 'collections' will all be retained intact (e.g. view by research group, view by project, view by subject or view by year) and to an external user the repository will look and act much the same.

Only two extra considerations are left - firstly an OAI-PMH static file will have to be generated by the old repository for its holding to still be usable by OAI services in their new location. But more importantly, the hosting institution should consider establishing some light-touch policies for this repository "fossil" - especially with regard to continuing preservation of access and preservation of the documents.

According to the Veterinary Dictionary on answers.com, 'senescence' is the "depression of body functions as part of the process of growing old." I think this accurately describes the process outlined above, so I shall start referring to repository senescence. Of course, this procedure could be applied to a live repository, in order to create a static copy for distribution by CD. This would be ideal for conference / workshop series or electronic journal publishers.

NB The procedure applied above is easy to achieve with EPrints precisely because it was designed to eliminate processing load on the server by making as much of the repository as possible servable as pre-generated static web files.

NNB An alternative approach might be to import all the holdings from the project repository into the institutional repository. But since each of these projects consisted of so many partners, most of contents would fall outside the collections policy of the host IR. Very few IRs actually want a collection of material that is 95% created by other institutions, and nor do the 95% of authors want to see their work bolstering another University's profile! Various IRs have managed to square the circle when it comes to articles of journals that they publish in their presses, but it seems for now that project partnership is a different relationship.

2 comments:

  1. Sounds like a planning failure to me. Of course one can't guarantee that one's repository will stay open indefinitely -- but one CAN plan for a graceful handoff if it doesn't.

    ReplyDelete
  2. A graceful handoff to what? There are no other repositories that will take the whole set of items! Perhaps in the future when the repository environment in the UK is more mature/active. It's true that recent project funding agreements are beginning to declare that the project websites will be maintained for five years beyond the life of the project. But even so, that's still a stay of execution.

    I think that "planning failure" is probably right, but then the planning horizon didn't extend this far. No librarians involved, you see :-)

    ReplyDelete