Thursday 9 August 2007

Data in Repositories

Although I've been involved in several UK repository projects with bona fide hard scientists who have been investigating the use of repositories for storing data (JISC EBank UK, JISC R4L), I'm a bit of a newcomer to the practicalities of storing data in a repository. At one level it's an easy task: just upload a file and add some metadata. In other words, it's a process indistinguishable from depositing a journal article. The difference is that humans can interpret the contents of an "article", whereas a spreadsheet or a data table is a lot more difficult to understand unless its creator has gone to considerable lengths to document it.

This was brought home to me when I wrote an article on evaluating repositories that was based on a huge spreadsheet of data I had collected from a registry of repositories. I uploaded the spreadsheet to the repository and then realised that it was almost useless, because no-one else could interpret all the columns of data, let alone tell which columns were intermediate calculations and which were genuine "results". I have tried on a number of occasions to "document" spreadsheets by dividing them into self-explanatory regions, but it almost always turns out that I would be better off writing a new article that explains the spreadsheet.
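One low-tech way to make a deposited spreadsheet interpretable is to deposit a small "data dictionary" file beside it, with one row per column saying what the column holds and whether it is a raw input, an intermediate calculation or a genuine result. The sketch below writes such a file as CSV. The column names, roles and descriptions are hypothetical placeholders, not the contents of the spreadsheet described above.

```python
import csv
import sys

# Hypothetical column descriptions: the names stand in for whatever the real
# spreadsheet contains. "role" separates raw inputs, intermediate calculations
# and the genuine results a reader should actually look at.
COLUMNS = [
    {"name": "repository", "role": "input",
     "description": "Repository name as listed in the registry"},
    {"name": "record_count", "role": "input",
     "description": "Number of records reported by the registry"},
    {"name": "records_per_month", "role": "intermediate",
     "description": "record_count divided by months since the repository started"},
    {"name": "score", "role": "result",
     "description": "Final figure reported in the accompanying article"},
]


def write_data_dictionary(columns, out):
    """Write one CSV row per column: its name, its role and a plain description."""
    writer = csv.DictWriter(out, fieldnames=["name", "role", "description"])
    writer.writeheader()
    for col in columns:
        writer.writerow(col)


def result_columns(columns):
    """Return the names of the columns that hold genuine results."""
    return [col["name"] for col in columns if col["role"] == "result"]


if __name__ == "__main__":
    # Deposit this output as a separate file next to the spreadsheet itself.
    write_data_dictionary(COLUMNS, sys.stdout)
```

Keeping the description in a separate, plain-text file means it survives even if the spreadsheet's own formatting or comments are lost on conversion.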

So I am very interested to see that Apple have just released Numbers (part of iWork '08), a new spreadsheet application that tackles exactly this issue: each sheet is a canvas on which tables are laid out alongside text and images. I have just ordered a copy, and I hope that it will make my job (as a repository user and manager) a bit easier!

Tuesday 7 August 2007

Cobbling it Together, or How To Make a Slideshow from a Repository

Being a Computer Scientist, I tend to look for automated solutions to problems, but sometimes that just ain't the best way. When I began to think about ways of creating slideshows from PowerPoints stored in the repository (described a couple of entries ago), I imagined that I would get someone to write me a nice little program. But I realised that it's just as quick to do it myself, using the repository pages and Adobe Acrobat:

(a) identify all the relevant eprints with Search or Browse.
(b) drag each interesting PowerPoint link into a notebook (anything that will let you drag and drop a link; I used Google Notebook)
(c) make sure that the page of notes containing links to the PowerPoint files appears somewhere on the web (Google Notebook creates a URL for your shared entries).
(d) Open Adobe Acrobat
(e) From the File Menu, choose "Create PDF from Web Page..."
(f) Type in the URL of the notes page
(g) Set the depth of the crawl to 2
(h) Click on the "Create" button
After a few minutes, Acrobat will have pulled in each of the PDF files into one long PDF file. It will also have created pages for each of the HTML links it followed.
(i) Use Document/Delete Pages... to get rid of the unwanted HTML pages.
(j) Set the Document's Initial View to "Full Screen" in the File/Properties... menu
(k) Set the Full Screen options to "Loop after last page" and "Advance every 5 seconds" in the Acrobat Preferences.

It involves a bit of messing around, but it is relatively quick and gives you lots of control. In a perfect world, EPrints would provide an "Export to PDF Slideshow" plugin, I suppose!
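Steps (a) to (c) amount to producing a web page that links to each PowerPoint file. If that step were ever automated, it might look something like the sketch below, which filters a list of (title, URL) pairs down to PowerPoint documents and writes a minimal HTML page of links. The input list is a hypothetical stand-in for the results of a repository search; how you obtain it is left open. The resulting page would be published on the web and given to Acrobat in place of the shared notes page.

```python
import html


def powerpoint_links(eprints):
    """Keep only (title, url) pairs whose URL looks like a PowerPoint file."""
    return [(title, url) for title, url in eprints
            if url.lower().endswith((".ppt", ".pptx"))]


def links_page(eprints, heading="Slideshow sources"):
    """Build a minimal HTML page with one link per PowerPoint document."""
    items = "\n".join(
        f'  <li><a href="{html.escape(url, quote=True)}">{html.escape(title)}</a></li>'
        for title, url in powerpoint_links(eprints)
    )
    return (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        f"<title>{html.escape(heading)}</title></head>\n"
        f"<body>\n<h1>{html.escape(heading)}</h1>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    )


if __name__ == "__main__":
    # Hypothetical example input: the URLs are illustrative only.
    found = [("Conference talk", "http://example.org/123/talk.ppt"),
             ("Journal paper", "http://example.org/456/paper.pdf")]
    print(links_page(found))
```

Filtering on the file extension is the crudest possible test; a repository that records each document's format in its metadata would give a more reliable filter.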

Monday 6 August 2007

Preserving the Past

This post is more about repository usage than repository management, but it lets me reflect on my own usage and deposit practice and consider how I might need to support other lecturers like me.

Our repository has a fairly liberal accession policy: if you think it's a research output, it's in. This policy has stretched in all sorts of directions; for example, professors have recently taken to depositing clips of TV news programmes that mention their research. However, it has never been pushed towards the preservation agenda.

Today I've been tidying my office, my first proper attempt since we moved to a new building with smaller offices in December. I have finally had the chance to re-evaluate the contents of all those old box-files, last looked at four years ago during the previous move, and I have discovered a set of CD-ROMs from one of my old PhD students, who left me his thesis together with demonstrations and presentations of a handful of projects he was working on. So I have come over all preservation-minded, and I'm wondering how to deposit this in the repository. A lot of it was never published, but it was demonstrated internally in a large multi-institutional project, and I am loath to let it be forgotten. He did put his thesis in the repository as soon as he graduated in 2002 (bless him!), but the rest needs examining. I think it's all screendumps, PowerPoints and websites, and the dynamic websites he worked on had published, static equivalents, so there is no issue of software emulation. Pity: I'd like to try out some of VMware's virtualisation mechanisms with EPrints.
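Before depositing material like this, a quick inventory of what is actually on the discs would confirm (or refute) the guess that it is all screendumps, PowerPoints and static web pages. The sketch below counts files by extension under a directory and lists any extensions outside an "expected" set. The expected set is an assumption chosen to match that guess; anything it flags (an executable, say) is exactly what might need emulation.

```python
import os
from collections import Counter

# Assumed "expected" formats: screendumps, presentations and static web pages.
EXPECTED = {".png", ".jpg", ".gif", ".bmp", ".ppt", ".htm", ".html", ".css"}


def inventory(root):
    """Count files under root by lower-cased extension ('' for no extension)."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            counts[os.path.splitext(name)[1].lower()] += 1
    return counts


def unexpected(counts, expected=EXPECTED):
    """Return the sorted extensions that fall outside the expected set."""
    return sorted(ext for ext in counts if ext not in expected)


if __name__ == "__main__":
    # Hypothetical mount point for one of the CD-ROMs.
    found = inventory("/media/cdrom")
    for ext, n in found.most_common():
        print(f"{ext or '(none)'}\t{n}")
    print("needs a closer look:", unexpected(found))
```

Extensions are only a hint about format; a file-identification tool would be more trustworthy, but a count like this is enough to decide which discs need examining first.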

Friday 3 August 2007

More Mundane Work

Just so that you don't think that my life is all "wandering through labs" and "having lovely ideas about how to show off our research", I have got a list of edits to make to a professor's eprints. He has been complaining that his publications aren't being correctly categorised by the repository, and I assumed that there was some kind of bizarre bug that we were responsible for. But it turns out that he has simply filled in the "type" of each publication incorrectly. So I promised to sort them all out for him, but to do that I'll have to track down the proper conference details for each publication.

Why did I offer to do it for him? Why didn't I stick to my self-archiving principles? Did I mention that he was a professor and I'm not?
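Mistyped records like these can sometimes be spotted mechanically before anyone has to go hunting for conference details. The sketch below assumes, hypothetically, that the usual mistake is a conference paper entered as a journal article, and that each record is a dictionary with "id", "type" and "publication" keys (the type values "article" and "conference_item" are assumed here, not taken from the post). It flags records typed as articles whose venue name contains a conference-like word; it is a heuristic for triage, not a fix.

```python
# Hypothetical heuristic: words suggesting a venue is a conference, not a journal.
CONFERENCE_WORDS = ("proceedings", "conference", "workshop", "symposium")


def suspicious_types(records):
    """Return ids of records typed as articles whose venue looks like a conference."""
    flagged = []
    for rec in records:
        venue = rec.get("publication", "").lower()
        if rec.get("type") == "article" and any(w in venue for w in CONFERENCE_WORDS):
            flagged.append(rec["id"])
    return flagged


# Made-up example records for illustration.
records = [
    {"id": 1, "type": "article", "publication": "An Example Journal"},
    {"id": 2, "type": "article", "publication": "Proceedings of an Example Workshop"},
    {"id": 3, "type": "conference_item", "publication": "Proceedings of an Example Conference"},
]
print(suspicious_types(records))  # prints [2]
```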

Creative Uses of a Repository

I was just walking through our lab this afternoon. The "lab" is the open plan area that the research staff and students inhabit in the Intelligence, Agents, Multimedia group. It is on the top floor of our new building, which enables me to say to visitors as they step out of the lift "Welcome to the IAM group - the highest level of abstraction in the School". (That kind of thing passes for wit in Computer Science circles.)

Back to the plot - as I was walking through the lab, I was struck by how many posters our researchers have accumulated from conference visits. They are stuck up all over the cubicle walls that separate out the different research areas. Some of these posters are looking a bit the worse for wear (they used to be up in the old building) but none the less they give a good impression of the research that this group has undertaken recently.

Now, as long as all these posters are actually stored in the repository, we could use them to provide a rotating display on a public plasma screen; there is an increasing number of these screens in buildings all over the department. All it would take is a little script that picks a PowerPoint or PDF poster from our eprints repository, displays it for two minutes and then moves on to the next, choosing each time a poster from a different author, discipline or research team. All I'd have to do is find a spare screen!
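The scheduling part of that little script could be as simple as the sketch below: group the posters by research team, then take one poster from each team in turn, so consecutive posters come from different teams whenever there are at least two. The (team, URL) pairs are assumed to come from the repository somehow, and show() is a hypothetical placeholder for whatever actually puts a poster on the screen; grouping by author or discipline instead would only change the key.

```python
import time
from collections import defaultdict, deque

DISPLAY_SECONDS = 120  # two minutes per poster


def rotation(posters):
    """Cycle through posters forever, taking one from each team in turn.

    posters: list of (team, poster_url) pairs (hypothetical input format).
    """
    queues = defaultdict(deque)
    for team, url in posters:
        queues[team].append(url)
    teams = list(queues)
    if not teams:
        return  # nothing to show
    while True:
        for team in teams:
            queue = queues[team]
            url = queue[0]
            queue.rotate(-1)  # send this poster to the back of its team's queue
            yield team, url


def run(posters, show):
    """Call show(url) for each poster in turn, pausing DISPLAY_SECONDS between them."""
    for _team, url in rotation(posters):
        show(url)
        time.sleep(DISPLAY_SECONDS)
```

For example, `run(posters, print)` would simply print each poster URL every two minutes, which is a handy way to check the order before pointing it at a real screen.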

My list of things to do this summer isn't getting any shorter!