Thursday, 9 August 2007

Data in Repositories

Although I've been involved in several UK repository projects with bona fide hard scientists who have been investigating the use of repositories for storing data (JISC EBank UK, JISC R4L), I'm a bit of a newcomer to the practicalities of storing data in a repository. At one level it's an easy task - just upload a file and add some metadata - so the process is indistinguishable from depositing a journal article. The difference is that humans can interpret the contents of "articles", whereas it is a lot more difficult to understand a spreadsheet or a data table unless the creator has gone to considerable lengths to document it.

This was brought home to me when I wrote an article on evaluating repositories that was based on a huge spreadsheet of data that I had collected from a registry of repositories. I uploaded the spreadsheet to the repository, and then realised that it was almost useless because no-one else could interpret all the columns of data, let alone discern which columns were intermediate calculations and which were genuine "results". I have tried, on a number of occasions, to "document" spreadsheets so that there are different, self-explanatory regions, but it almost always comes down to the fact that I would be better off creating a new article that explains the spreadsheet.

So I am very interested to see that Apple have just released a new application that tackles exactly this issue - a spreadsheet that is constructed as a set of tables on a sheet of text and images. I have just ordered a copy, and I hope that it will make my job (as a repository user and manager) a bit easier!

