Friday, 17 April 2009

EPrints and its Development

I'm in the process of writing a paper about the first ten years of EPrints (yes, it'll be 10 years old at the end of October 2009), and I've been trying to put together a comprehensive overview of the internal construction of EPrints as it stands in 2009: what you might call an "architecture diagram for users".

Stung into action by John Robertson's recent blog entry on repository developments, which mentions only a few of the ideas that we are working on, I thought it might be a good idea to share a draft version of this diagram.

The PDF linked from this posting shows my understanding of the internals of EPrints, highlighting the bits that we are working on at the moment in the version 3.2 development track.

Some more details about the 24 new features planned for the next release of EPrints can be found on the EPrints Wiki. Presentations and demos will be forthcoming at Open Repositories 2009, ECDL, OAI6, Sun PASIG and all good repository workshops in your area :-)

Cloud, Web, Intranet and Desktop Connectivity - repository data can now be stored in the cloud, on the web, on an intranet storage service, on a local disk or on any combination of the above. Also, the contents of the repository can be mounted on the user's desktop as a 'virtual file system'.

Desktop Document Support - thumbnail generation and embedded metadata extraction are provided for Microsoft Office documents. Media copyright checklists are generated for PowerPoint slideshows to assist Open Access clearance for lecture slides. Complex thumbnails are now supported, such as multi-image thumbnails for a slideshow or an embedded FLV clip of a video.

Research Management - support for new kinds of administrator-defined data objects, with project, organisation and people datasets provided as standard for compatibility with Current Research Information Systems (CRIS). Citation reporting will use ISI's Web of Science as well as Google Scholar.

Preservation Support - Preservation Planning Capabilities embedded in the repository using PRONOM and DROID.

Improved EPrints Data Model - files, documents, users and all other data objects now have persistent URIs, as eprints already do, and arbitrary relationships can be defined between them. An RDF export plugin provides linked data capabilities, and a new REST interface provides an API to all EPrints data.
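To make the data model idea a little more concrete, here is a minimal sketch of objects that each carry a persistent URI and can be linked by arbitrary named relationships, which then flatten naturally into RDF-style triples. It is only an illustration and not EPrints code: the base URL, the URI pattern, the class name DataObject and the relationship names hasDocument and hasFile are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical repository base URL, used for illustration only.
BASE = "http://repository.example.org"


@dataclass
class DataObject:
    """Any repository object: an eprint, document, file, user, and so on."""
    dataset: str   # e.g. "eprint", "document", "file", "user"
    obj_id: int
    relations: list = field(default_factory=list)

    @property
    def uri(self) -> str:
        # Assumed URI pattern: one stable identifier per object.
        return f"{BASE}/id/{self.dataset}/{self.obj_id}"

    def relate(self, predicate: str, other: "DataObject") -> None:
        # Record an arbitrary named relationship to another object.
        self.relations.append((predicate, other))

    def triples(self) -> list:
        # Flatten relationships into (subject, predicate, object) URI triples,
        # the shape an RDF export would need.
        return [(self.uri, pred, obj.uri) for pred, obj in self.relations]


eprint = DataObject("eprint", 42)
document = DataObject("document", 7)
pdf_file = DataObject("file", 19)

eprint.relate("hasDocument", document)   # hypothetical relationship name
document.relate("hasFile", pdf_file)     # hypothetical relationship name

for triple in eprint.triples() + document.triples():
    print(triple)
```

The point of the sketch is simply that once every object, not just the eprint, has its own stable URI, relationships between objects become plain statements that can be exported as linked data or served through an API.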

Improved Interoperability and Standards - SWORD 2 (v1.3 Specification), new OAI-ORE Import and Export Plug-ins, RDF plugins improved to provide better support for W3C Linked Data, CERIF support for Current Research Information Systems and enhanced compatibility with DRIVER project systems.

Miscellaneous Improvements - there are more enhancements to repository administration and improvements to the way that abstract pages are generated. IRStats/EPStats are better integrated with the EPrints distribution. Autocompletion/Name Authorities have been added for Institutions and Geographical Places (both with geolocation data). Enhanced User Profiles allow for more CV-relevant information than just publication lists. User-defined collections provide "shopping trolley" functionality for ephemeral compilations as well as persistent collections. A Scheduler / Calendar supports planning for embargoes, licenses, preservation activities, periodic maintenance activities etc. Quality Assurance Issues can be manually raised and resolved. PDF coverpage capabilities will be provided as standard.


  1. Good to see the roadmap... I note the automated extraction of Office metadata. I found to my cost that the simpler default bits were very misleading (people have a habit of taking one doc, wiping the contents and writing something new, so even the author is wrong in the resulting .doc), and as for Acrobat under the various versions... words fail me.

    The move towards data is implicit rather than explicit - could you expand a bit?

    As a small suggestion, we are using NESSTAR as a stable URL provider and access medium for several research project repositories, implemented variously in Sharepoint and the Napier Knowledge Based System Framework base system feeding several such with stable datacube access... (for example). It seems to me that a similar approach might be a useful stepping stone for extending data handling in classical document repositories... we are doing this solely as a demo (but fully functional for the two huge datasets named) for the ETIS and ETIS+ datasets in the NKBS, but it works so smoothly that I suggest you might take cognizance of this halfway house...

  2. So there's a SWORD 2 (v1.3 Specification) interface? Good news ;)


  3. In reply to Adrian - hangs head in shame. Public fail. Yes, the software works but we have been slow in delivering it to Adrian, the Swordmeister General.