Monday 21 July 2008

Top Gear, Top Blokes

Fans of the BBC's Top Gear show are having to wait 21 years for studio tickets as the waiting list is now over 336,000 people long, according to Autoblog.

Mind you, that's nothing compared to the 100 year wait that institutional repository fans might have to endure to reap the benefits of the ROAD project's latest experiment. In a stunt very reminiscent of the Top Gear programme, Stuart Lewis and his team of repository torturers are going to stuff a million items into the ingest interfaces of DSpace, EPrints and Fedora repositories. If this really were "Top Gear", two repositories would explode and the winner would be Stuart Lewis with a wallet of rewritable DVDs. Since this isn't "Top Gear", all that will happen is that some of the repositories might slow down unacceptably and will need to have their storage or metadata modules re-engineered to work efficiently at this scale.

But what's the 100 year wait about? That's how long it would take for an Institutional Repository working at full efficiency to accumulate a million items, given that the average institution has about 1000 academics who each deposit a research or teaching output around once a month (or 10 times a year, given time off for vacations and admin). That makes about 10K items per year, 100K items per decade or a million items per century accruing to your repository. And given that most IRs aren't operating at that level of efficiency yet, the Repository Managers of the next century can safely drink a toast to the ROAD team for setting their minds at ease.
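The back-of-envelope sums above work out like this (a quick sketch; the academic headcount and deposit rate are the figures assumed in the post, not measured values):

```python
# Rough estimate of how long an institutional repository would take
# to accumulate a million items, using the post's assumed figures.
academics = 1000          # academics at a typical institution (assumption)
deposits_per_year = 10    # roughly one deposit a month, minus breaks (assumption)

items_per_year = academics * deposits_per_year      # 10,000 items per year
years_to_million = 1_000_000 / items_per_year       # 100 years

print(f"{items_per_year} items/year -> {years_to_million:.0f} years to a million")
```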


  1. Interesting but how long would it take if it was an automated tool (or 'Robot Scientist') curating research results at the rate of 1 item per minute?-)

  2. Really good question and I'm going to deliberately take a contrary position, just because it's more fun :-)

    Is the Robot Scientist running lots of different experiments? Is it curating analyses and creating new scientific knowledge? Are the deposits in any way comparable to the research outputs that are being put (very slowly) into an institutional repository?

    Or is it just a big database storing more heterogeneous and uninterpreted data every 60 seconds?

  3. ...but in answer to your question, about 5 years, assuming that it's not allowed to run overnight and at weekends when the security staff have gone home.

    And assuming that nothing can go wrong, can go wrong, can go wrong....

  4. I've just read the RobotScientist home page and I see that the project is aiming for exactly the kind of quasi-human non-schema-conformant high-level knowledge that I was assuming would only happen in the IR.

  5. 'Non-schema-conformant'? Which schema are you parsing against? There's also an interesting ontology under development (EXPO), based on a general ontology of experiments.

  6. Yes the Robot Scientist is running lots of different experiments, curating analyses and creating new scientific knowledge. It generates hypotheses and analyses results. And we hope to show comparable research outputs deposited in the repository. We also have projects to look at the formalisation and automation of scientific outputs/papers/results in general, and we are not alone in this area of work.

    In October we'll acquire a second robot that will do different experiments (drug screening) and produce even more data. The robots are designed to run over nights and weekends and in these early days plenty of things do go wrong. However repositories will still need to keep up with e-Science - our automation technology will only improve from here on...
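Comment 3's "about 5 years" estimate can be checked with the same sort of arithmetic (a sketch only; the one-item-per-minute rate comes from comment 1, while the 12-hour, 5-day working week is a guessed figure that happens to reproduce the estimate):

```python
# Checking comment 3's estimate: one deposit per minute, but only while
# the building is open. The opening hours below are an assumption.
items_per_minute = 1
minutes_per_day = 12 * 60        # assumed 12-hour working day
days_per_week = 5                # no weekends

items_per_week = items_per_minute * minutes_per_day * days_per_week  # 3600
years = 1_000_000 / items_per_week / 52                              # ~5.3

print(f"About {years:.1f} years to curate a million items")
```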