RepositoryMan: May 2009

Friday, 29 May 2009

Google Wave

There's an urgent need to develop preservation / e-research / e-learning / rights management strategies for Google Wave.

There. That's my bid for some inevitable digital library memes.

Wednesday, 27 May 2009

Don't ever stop adding to your body of work

I've just returned from the high octane, tech-frenzied social whirl that is Open Repositories 2009 (or #or09 to its delegates). It's a week full of diverse and diverging agendas (cloud this, desktop that, policy the-other) that make your head spin. There are new product announcements (EPrints 3.2 / DSpace 1.5 / Zentity) and new initiatives being explained (DuraSpace). And new demos of new features. It's normal to go to conferences to show off products that you've only just finished, hoping that the demos hang together. Now the Developer Challenge means that we're all there showing off things that we hadn't even started! It's mad, completely mad, and I wouldn't miss it for the world.

So I came back with a kind of tech-hangover - and spent a couple of days feeling the backlash response of "what does it all mean?" and "what is the point?" It's all very exciting, but are we actually going anywhere that we all want to be?

Surprisingly, the cure came in the form of a Presidential address reported in the Washington Post. Under the headline "Don't ever stop adding to your body of work" Barack Obama talked about the need to keep on contributing to a lifetime of achievement. I'm a sucker for a good metaphor, and I read this as a message to institutions and faculty about using a repository to reify their contribution to science and scholarship, to manifest their body of work.

That is what building a body of work is all about - it's about the daily labor, the many individual acts, the choices large and small that add up to a lasting legacy. It's about not being satisfied with the latest achievement, the latest gold star - because one thing I know about a body of work is that it's never finished. It's cumulative; it deepens and expands with each day that you give your best, and give back, and contribute to the life of this nation. (Barack Obama delivering the commencement address at Arizona State University.)

This is what repositories are really about: making the abstract concrete and fleshing out CVs. Collecting evidence of intellectual creativity, supporting research activities and profiling the emergence of innovative individuals, collaborations and communities. Evidence that spans whole careers and beyond.

This was also the message of David Shulenburger's closing keynote at the SPARC Digital Repositories meeting in November 2008: the job of the institutional repository is to tell the story of "what we've achieved" to its faculty and its institution's funders and supporters.

Back at home, this is why we keep doing what we're doing. Not just so that we can play with new development features, but so that we can get a job done. So that we can build the infrastructure of our institutional memory, tell our institutional story and provide a platform for our future institutional success.

That's me done. I'm back to hacking shell scripts and XML.

Friday, 22 May 2009

A Distilled Guide to EPrints v3.2

Having spent an entire morning talking about new EPrints features at OR09, I thought that it would be great to have a really (really) condensed version of the talk as a public guide to how EPrints is evolving. I spent my last day in Atlanta reducing the presentation to just 9 pages - if you don't include the title and acknowledgements. The result is a brief account of all the features that make EPrints a serious repository platform: effective data model, flexible storage options, choice of APIs, support for the researcher's tasks, and reporting usage, impact and research information.

I'll try and update this as v3.2 develops; please let me know what other information you would like to see!

Thanks for a great Open Repositories experience in Atlanta - see y'all again soon!


Friday, 15 May 2009

PhD studentship in Digital Rights and Digital Scholarship

EPrints Services are funding a PhD studentship in Digital Rights and Digital Scholarship at the EPSRC Web Science Doctoral Training Centre at the University of Southampton.

The Web has had a huge impact on society and on the scientific and scholarly communications process. As more attention is paid to new e-research and e-learning methodologies it is time to stand back and investigate how rights and responsibilities are understood when "copying", "publishing" and "syndicating" are fundamental activities of the interconnected digital world.

Applicants with a technical background (a good Bachelor's degree in Computer Science, Information Science, Information Technology or similar) are invited for this 4-year research programme, which begins in October 2009 with a 1-year taught MSc in Web Science, followed by a three-year PhD supervised jointly by the School of Law and the School of Electronics and Computer Science. The full four-year scholarship (including stipend) is available to UK residents.

EPrints Services provide repository hosting, training and bespoke development for the research community and are funding this research opportunity to promote understanding of the context of the future scholarly environment.

Further information:

EPSRC Web Science Doctoral Training: http://webscience.ecs.soton.ac.uk/dtc

EPrints Services: http://www.eprints.org/

Enquiries should be addressed to Dr Leslie Carr (lac@ecs.soton.ac.uk) in the first instance.

Thursday, 14 May 2009

Repositories and Research information

I've just spent three days in Athens at the euroCRIS meeting, discussing the relationship between repositories and Current Research Information Systems. The idea behind a CRIS (plural CRIS, not CRISes) is that it forms a cross-institutional information layer that aggregates information from the library (publications), human resources (personnel and organisational structure), finance department (projects and grants), estates management (facilities and equipment) and external sources (funding programmes, citation data), and so integrates at some level with the set of services provided by a repository.

The CRIS initiative comes out of an administrative background (starting in 1991), so it predates repositories and exists tangentially to them. A CRIS is typically concerned with repository metadata (how many papers? which publishers? written by whom?) but not with the repository's data contents. So my concern was that the repository should not be sidelined or marginalised, but instead seen as a mature partner among the information services provided across the institution. The experience gained in the UK's recent research assessment exercise (documented in Institutional Repository Checklist for Serving Institutional Management) has very clearly been that the library, through the repository, brings enormous experience in handling bibliographic information, ensuring its quality and providing basic auditing of claims of authorship and publication. Treating the repository as a superfluous adjunct to an administrative catalogue is to miss the benefit that a managed repository has to offer.

At the meeting many universities from across Europe spoke of how they were trying to make the two systems work together in one form or another. In some ways, the innovation is not technical, but simply in the concept that institutional information should not be siloed, but that it can be shared between administrative domains for the benefit of the whole institution.

On the technical side, CERIF (Common European Research Information Format) is the data sharing and interoperability standard that euroCRIS are promoting. Now on its third major iteration since 1991, it models many of the entities found in the research environment, particularly people, institutions, projects and research publications, patents and products. The standard is expressed in the language of the relational database, with individual tables defined for each kind of entity. Its particular novelty is that roles like "author" or "project manager" are relationships between independent entities (people, publications or projects) rather than attributes of those entities, and that all relationships are constrained to an explicit time-period.
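To make the "roles as time-bounded relationships" idea concrete, here is a minimal Python sketch. It is not CERIF itself: the class name RoleLink, its fields and the roles_on helper are hypothetical, chosen only to mirror the idea that the role lives on a dated link between two independent entities (referenced here by id) rather than on either entity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RoleLink:
    """Hypothetical link record: a role held between two independent entities.

    The role is stored on the link, not on the person, project or publication,
    and every link is bounded in time (end=None means still open).
    """
    from_id: str   # e.g. a person
    to_id: str     # e.g. a project or publication
    role: str      # e.g. "author", "project manager"
    start: date
    end: Optional[date] = None

    def holds_on(self, day: date) -> bool:
        """True if the relationship is in force on `day` (bounds inclusive)."""
        return self.start <= day and (self.end is None or day <= self.end)


def roles_on(links, entity_id, day):
    """Sorted (target id, role) pairs that entity_id holds on the given day."""
    return sorted(
        (link.to_id, link.role)
        for link in links
        if link.from_id == entity_id and link.holds_on(day)
    )


# Hypothetical sample data: person p1 managed project proj1 for two years
# and has been an author of pub1 since mid-2008.
links = [
    RoleLink("p1", "proj1", "project manager", date(2007, 1, 1), date(2008, 12, 31)),
    RoleLink("p1", "pub1", "author", date(2008, 6, 1)),
]

print(roles_on(links, "p1", date(2008, 7, 1)))
print(roles_on(links, "p1", date(2009, 5, 14)))
```

Because the role and its dates sit on the link, the same person can hold different roles on different entities over time without any change to the person record itself.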

These requirements are straightforward to satisfy in EPrints - each new entity type (e.g. project) is just an extra dataset with an independent metadata schema and its own workflow and display rules. So an EPrints repository should be able to take on a useful role within a CRIS environment, deploying its comprehensive set of services for ingesting and managing project and personnel data, as well as research publication data. What is not yet clear is whether EPrints should be a helpful adjunct to, a useful component of, or a competent replacement for a CRIS.

That dilemma will be partly solved by the new JISC R4R (Ready for REF) project, whose aim is to investigate the use of CERIF as a mechanism for exchanging research information between universities (e.g. supporting the movement of staff throughout their careers). R4R, which is a joint activity between King's College London and the University of Southampton, is focusing on the transfer of research information in the context of the forthcoming UK Research Excellence Framework (REF) activities.

In the meantime, there is a lot of interest in this area: the report on Serving Institutional Management that I mentioned above was the most-downloaded item of the OR08 conference.

Thursday, 7 May 2009

Batch Updates

I've been taking advantage of the new ISI license to import citation counts into our school repository.

Now we have Web of Science and Google Scholar citation counts listed for matching eprint records, you can search for eprints that fall into a citation range (e.g. 10 or more) and you can order search results by either type of citation count.
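The two search features described above, a citation-range filter and ordering by either count, amount to a filter and a sort. The Python sketch below is illustrative only: the field names wos_cites and gs_cites and the sample records are made up, and None stands for "no matching citation record found".

```python
# Made-up sample records; "wos_cites" (Web of Science) and "gs_cites"
# (Google Scholar) are hypothetical field names. None = no match found.
records = [
    {"id": 101, "title": "Paper A", "wos_cites": 12, "gs_cites": 30},
    {"id": 102, "title": "Paper B", "wos_cites": None, "gs_cites": 9},
    {"id": 103, "title": "Paper C", "wos_cites": 4, "gs_cites": 15},
    {"id": 104, "title": "Paper D", "wos_cites": 25, "gs_cites": 41},
]


def in_citation_range(records, field, low, high=None):
    """Records whose `field` count lies in [low, high]; high=None means no upper bound.

    Records with no count for `field` are excluded.
    """
    return [
        r for r in records
        if r.get(field) is not None
        and r[field] >= low
        and (high is None or r[field] <= high)
    ]


def order_by_citations(records, field):
    """Highest count first; records without a count for `field` go last."""
    return sorted(records, key=lambda r: (r.get(field) is None, -(r.get(field) or 0)))


print([r["id"] for r in in_citation_range(records, "wos_cites", 10)])  # "10 or more"
print([r["id"] for r in order_by_citations(records, "gs_cites")])
```

Keeping unmatched records (None) distinct from zero matters here: an eprint that the citation source simply doesn't index is not the same as one that has never been cited.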

Now I'm being asked to provide reports of h-factors and citation averages and community normalised bibliometrics. What larks! I've had to draft in Perl assistance to write the necessary scripts.
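For anyone writing similar report scripts, the two simplest measures requested are short functions. This Python sketch assumes the "h-factor" above refers to Hirsch's h-index (the largest h such that h papers each have at least h citations) and computes a plain mean; the counts are made up, and community-normalised measures are not shown because they need external field baselines.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    h = 0
    # Walk the counts from highest to lowest; rank is 1-based.
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def mean_citations(citations):
    """Average citations per paper; 0.0 for an empty list."""
    citations = list(citations)
    return sum(citations) / len(citations) if citations else 0.0


# Made-up citation counts for one researcher's papers.
counts = [10, 8, 5, 4, 3]
print(h_index(counts))         # 4
print(mean_citations(counts))  # 6.0
```

The same functions work with either Web of Science or Google Scholar counts, but the two sources will usually give different answers for the same person, so a report should say which one it used.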

But what it's taught me is that we're still missing out on an awfully big proportion of our school's research outputs - and we're an engineering school, not a humanities school. So I'm looking to add a THIRD source of citation data - the ACM Digital Library. The ACM run many of the journals and conferences that our researchers publish in - journals and conferences that ISI don't index. And then there's Scopus - that would potentially be a FOURTH citation data source. It looks like we'll need to have a separate "evidence of impact" dataset in the repository.

Integrating all this extra data has been made very easy by some developments from Chris Gutteridge and Tim Brody. Firstly, the EPrints import framework now supports an update option that allows you to merge new data with existing records. Secondly, the Microsoft Excel exporter (which is so useful for generating complex reports and charts) now has a matching importer. Combine these two features together and you can use all the user interface features of a spreadsheet to do large-scale, batch data amendments outside the repository environment and then commit the updates to the repository. This is great for spotting and fixing metadata errors.
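The "update" style of import described above can be pictured as a keyed merge. The Python sketch below is not the EPrints importer (which is written in Perl and has its own interface); it only illustrates the semantics under stated assumptions. A CSV export stands in for the Excel round trip, the key field eprintid and the other field names are hypothetical, blank cells leave existing values untouched, and rows with no matching record are reported rather than created.

```python
import csv
import io


def merge_updates(records, csv_text, key="eprintid"):
    """Merge edited spreadsheet rows into existing records, keyed on `key`.

    Only non-empty cells overwrite existing values (as text); rows whose key
    matches no existing record are returned instead of being created.
    """
    by_id = {str(r[key]): r for r in records}
    unmatched = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        target = by_id.get((row.get(key) or "").strip())
        if target is None:
            unmatched.append(row)
            continue
        for field, value in row.items():
            # Skip the key itself, overflow columns (field None) and blank cells.
            if field is None or field == key or not value or not value.strip():
                continue
            target[field] = value.strip()
    return unmatched


# Hypothetical repository records, one with a typo in its title.
records = [
    {"eprintid": 1, "title": "Web Sience Primer", "wos_cites": 3},
    {"eprintid": 2, "title": "Repository Metrics", "wos_cites": 7},
]

# The same records after editing in a spreadsheet: title fixed, one count updated.
edited = """eprintid,title,wos_cites
1,Web Science Primer,
2,,9
99,Orphan row,1
"""

unmatched = merge_updates(records, edited)
print(records)
print(unmatched)
```

Everything read back from the spreadsheet arrives as text, so a real import would also need to convert numeric fields such as citation counts before committing them.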

Tuesday, 5 May 2009

Repository as Platform? Or Product?

Is repository software (DSpace, EPrints, Fedora) a platform to build on, or a shrinkwrapped product to unpack and use? There are at least two answers to this question, and each package has to strike the right balance for its intended community.

I note the results of the recent DSpace community survey, which show that 80% of repositories use the default metadata configuration, 78% have made at most "minor cosmetic" changes to the configuration and 62% use no add-ons beyond the distributed core code (stats, SWORD, Google indexing etc.).

This seems to support the view that if a new feature isn't a standard part of the core repository, it won't be used. That is a challenge for repository software designers and for repository projects. For example, how do you make the repository user interface pleasing and useful to artists, engineers, teachers and researchers all at the same time?

Answers on a postcard, please!