Friday 26 June 2009

Hardworking Repositories: The Global Picture

To round off the picture of hardworking repositories (i.e. repositories that receive regular daily deposits), here are the global top ten repositories, listed with the number of days in the last year on which deposits were made. The data is obtained from the Registry of Open Access Repositories (ROAR).

1. ORBi (University of Liege, Belgium), 311 days
2. IR of the University of Groningen (Netherlands), 301 days
3. KAR - Kent Academic Repository (UK), 286 days
4. University of Southampton: School of Electronics and Computer Science (ECS)
5. UBC cIRcle (University of British Columbia, Canada), 269 days
6. LSE Research Online (London School of Economics, UK), 260 days
7. EEMCS EPrints Service (School of Electronics and Computer Science, University of Twente, Netherlands)
8. LUP: Lund University Publications (Sweden), 259 days
9. UPSpace at the University of Pretoria (South Africa), 257 days
10. University of Tilburg (Netherlands), 256 days

There are all sorts of caveats attached to this list! Firstly, I removed two entries because they were national rather than institutional in scope. Secondly, I left in two departmental repositories (ECS and EEMCS) because, dammit, if a department can achieve regular deposits then so should a whole institution! Thirdly, this table depends on OAI-harvested data from ROAR, so any problems with a repository's OAI feed will affect the analysis. And perhaps most importantly, this table does not take into account the types of deposit made on the days in question. They could be research articles, research data, teaching material, holiday photographs, or bibliographic records sans open access full text. For example, the UBC repository consists mainly of student theses and dissertations.
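ROAR's own harvesting code isn't described here, so what follows is only a minimal sketch, assuming the day counts come from the `<datestamp>` in each record header of an OAI-PMH ListIdentifiers response. The function name `header_datestamps` and the `example.org` identifiers are hypothetical, and the sample omits the required `responseDate` and `request` elements and any `resumptionToken` that a real response would carry. Note that OAI-PMH defines the datestamp as the date of creation, modification or deletion of a record, so it is not necessarily the deposit date.

```python
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"
OAI_NS = {"oai": OAI}

def header_datestamps(xml_text):
    """Hypothetical helper: return the <datestamp> of every non-deleted
    record header in one OAI-PMH ListIdentifiers (or ListRecords) page."""
    root = ET.fromstring(xml_text)
    stamps = []
    for header in root.iter("{%s}header" % OAI):
        if header.get("status") == "deleted":
            continue  # a deleted record is not a deposit
        stamp = header.find("oai:datestamp", OAI_NS)
        if stamp is not None and stamp.text:
            stamps.append(stamp.text.strip())
    return stamps

# Simplified, hypothetical response page (real responses contain more elements).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:1</identifier><datestamp>2009-06-24</datestamp></header>
    <header><identifier>oai:example.org:2</identifier><datestamp>2009-06-25</datestamp></header>
    <header status="deleted"><identifier>oai:example.org:3</identifier><datestamp>2009-06-25</datestamp></header>
  </ListIdentifiers>
</OAI-PMH>"""

print(header_datestamps(SAMPLE))  # ['2009-06-24', '2009-06-25']
```

A large harvest would process each page the same way, following the `resumptionToken` until the list is exhausted.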

As I said in the last two postings on this blog, this list simply reflects how much deposit activity a repository sees on a daily basis. It deliberately counts the days on which deposits were made rather than the number of deposits, in order to smooth over the effect of batch imports from external data sources. The emphasis is on finding a simple metric that highlights embedded use of a repository across a whole institution.
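The metric itself is simple to compute once the datestamps are in hand. This is a minimal sketch, assuming ISO 8601 datestamps and a 365-day window ending on a chosen date; the function name `deposit_days` and the sample dates are hypothetical. Counting distinct days rather than records is what makes a 500-record batch import count the same as a single deposit.

```python
from datetime import date, timedelta

def deposit_days(datestamps, end, window_days=365):
    """Hypothetical helper: count distinct calendar days with at least one
    deposit in the window (end - window_days, end]. Each datestamp is an
    ISO 8601 string such as '2009-06-25' or '2009-06-25T14:03:00Z'."""
    start = end - timedelta(days=window_days)
    days = set()
    for stamp in datestamps:
        day = date.fromisoformat(stamp[:10])  # keep only the YYYY-MM-DD part
        if start < day <= end:
            days.add(day)
    return len(days)

# A batch import of 500 records on one day counts once, like a single deposit.
batch = ["2009-03-02T09:00:00Z"] * 500
regular = ["2009-06-22", "2009-06-23", "2009-06-24"]
print(deposit_days(batch + regular, end=date(2009, 6, 26)))  # 4
```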


  1. There is another caveat (at least in DSpace repositories): you are collecting and counting the number of days with new records, and that may be only weakly related to deposit dates, as repositories may have intermediate steps (metadata validation or others). A repository can register 20 days of new records from 10 days of deposits, or the other way around; a batch import of dozens of papers in one day may show up as several days of "new" records.

  2. Sure, editorial workflow makes the deposit date uncertain in any repository (not just DSpace). Even the author's self-archiving process might be split across several sessions or days, at least in EPrints.