Wednesday 23 April 2008

Cloud Computing and Cloud Thinking

Hello from the 2008 Web Conference in Beijing! Yesterday I took part in the Web Science workshop on Web Evolution and spent my evening uploading all the presentations to the Web Science EPrints repository, feeling a bit like Cinderella while my senior colleagues from Southampton went to a reception hosted by Microsoft. While I was uploading a 15MB PDF over a very slow connection, I took the opportunity to have dinner in the hotel's Brazilian restaurant. Several Caipirinhas later I returned to finish off the repository management tasks in much improved humour :-)

Today is the first day of the main conference, and the keynote speech was given by a Chinese VP from Google on "Cloud Computing". He covered all the basics about Cloud Computing and particularly about Google's internal cloud infrastructure and their cloud-based user applications. Now I'm very interested in Cloud Computing as a Computer Science Researcher and Lecturer, and I'm looking at including it in my teaching and in my work. Hurrah for David Flanders and his Fedorazon project who are giving us advice about running EPrints in the Amazon cloud.

However, it also seems to me that all this hard work and infrastructure is just moving our current working practices from our laptops and workstations to yet another exciting new platform. Instead of having my files stored on an identifiable piece of hardware in a known location, they are now stored somewhere unknown and unknowable, but invisibly managed, replicated and always available. This might offer various advantages, but it is a fairly superficial change in my working life.

What I'm really interested in is not a shift in technology, but a shift in human behaviour. Not cloud computing but cloud thinking. Encouraging researchers and scholars to move their ideas from the private and inaccessible domain of their laptops or workstations or manuscripts or CD-Rs into the public domain of the Web to increase the efficiency of the research process and to improve the sum total of human knowledge. Just putting documents or data in the cloud doesn't make them any less private. Moving all of research into the cloud wouldn't increase the sum total of disclosed human knowledge - and that's what I think is really important.

It's all part of the Open Access ideal - don't withhold your intellectual capital unnecessarily. And cloud computing (like service-oriented architectures and any other platform infrastructure) may be a useful step in the right direction, or it may be a complete red herring.

Saturday 19 April 2008

Beware What You Wish For

Now that our repository has been upgraded to EPrints 3.1, the repository technical support team (that's Chris) has agreed that the repository management (that's me) should be allowed to have control over the new web-based management tools. In theory, I had the right to this level of control before, but in practice it meant logging into the command line of an infrastructure machine for which I wasn't supposed to have login access. This was part of the management/technician rift that made us put as much repository administration as possible into the web interface of 3.1.

Still, now it's actually arrived, I've realised that all the excuse making and prevaricating that I did before just won't work. The magic words "Oh yes, I need to get the web programming team to look at that" have saved me a lot of work in the past. Now the game is up, my cover is blown and I'll just have to do it myself.

The first thing is to fix the citation styles (we have the italics in the wrong place, and book sections aren't flagged as such). I've got a nice email from Pauline Simpson on the topic somewhere. Then alter the QA audit to ferret out never-published papers. Then update the by-group view pages and their sub-orderings. Then I can take a look at the new tagcloud view-by-keyword styles and the new community of practice co-author listings.

Wish me luck. I hope I don't press the wrong buttons!

Wednesday 16 April 2008

Georgia State vs The Publishers

Apparently Georgia State University has been providing teaching materials to its students without getting the necessary copyright clearance. See the publishers' press release for one side of this story.

I really shouldn't raise my head in public about this lawsuit, because I try to keep quiet about non-OA issues in case I confuse the issues. However, what stands out to me in the above document is something commonly seen in the Open Access debate: publishers glorifying their role. Here's a quote:

“University presses are integral to the academic environment, providing scholarly publications that fit the needs of students and professors and serving as a launch pad from which academic ideas influence debate in the public sphere,” said Niko Pfund, Vice-President of Oxford University Press. “Without copyright protections, it would be impossible for us to meet these needs and provide this service.”

The inference to be drawn from the above paragraph is the obviously false "without copyright protections there would be no scholarship". I suggest the following translation into more grounded reality (copy editing services provided free on this occasion):

“University-based publishing companies are part of the academic food-chain, selling scholars' publications to needy students and professors and serving as one of the channels from which academics' ideas influence debate in the public sphere,” said Niko Pfund, Vice-President of Oxford University Press. “Without copyright protections, it would be impossible for us to meet our needs and provide this business.”

Sunday 13 April 2008

Cow Tipping and All That Jazz

Last week (being the week after That Conference) I was able to escape the country and visit fellow blogging repositarian Dorothea Salo in Wisconsin. Despite warnings of freezing weather and record-breaking snowfalls, I arrived at Dane County airport to the very English sight of grey clouds and heavy drizzle. Dorothea introduced me to Kristin Eschenfelder, who is a researcher in social informatics, and we all spent a very pleasant evening at an Indian restaurant talking social epistemology and information flows in open source software networks.

The following day I had the pleasure of sitting in on a MINDS management meeting (MINDS is the DSpace institutional repository of the University of Wisconsin). Despite the fact that Southampton and Wisconsin have different educational and funding contexts at the national level and different university structures and management at the institutional level, it was very clear that the challenges and activities of repository management are identical for host and guest. There really ought to be an international repository managers' organisation, independent of the software platforms and the agendas. Neither of us was able to be at the Repository Managers session at OR08 (Dorothea didn't have the funding to attend the conference and I was too involved in conference administration during the event) but I hope that there might be some movement towards that in the aftermath of the conference.

Then it was on to Chicago (even more rain) where I had been invited to speak about EPrints at a CARLI meeting (Consortium of Academic and Research Libraries in Illinois), alongside Tim Donohue (DSpace Committer) and Sarah Shreeve (IDEALS repository manager). Together with Dorothea, Tim and Sarah have been developing BibApp - a bibliography managing application that works alongside repositories. BibApp was one of the finalists in the OR08 Developer Challenge, but this was my first chance to get a close-up look at the software. Previously it had been DSpace-specific software, but in its latest version it integrates with EPrints via SWORD. It contains some potentially very useful functionality for librarians - it extracts lists of publishers from authors' bibliographies and alerts them to those that have the most permissive Open Access policies as stated in the ROMEO database. The intriguing thing from my POV is that BibApp is deliberately implemented as a separate application that works alongside repositories, but how much of it can be achieved inside a repository? What is the best location for repository-enhancing functionality? Where are services located, and who takes responsibility for them? More of this later I think!

PS If you're wondering about the title of this post, Cow Tipping is a rural Wisconsin pastime and All That Jazz is a song from the musical "Chicago".

Upgrading Repositories

Repository upgrades are a blessing to their users (better interface, better services, fewer bugs) but can be a worry to the technical support staff. The key issue is that while Version(n) = Version(n-1) + Upgrade + 1 hour or less, it may be the case that LocalizedRepositoryVersion(n) = LocalizedRepositoryVersion(n-1) + Upgrade + 1 month or more.

When we released EPrints v3 last year, we knew that the fundamental rewrite needed to achieve such a big jump in terms of repository functionality was going to lead to a bigger upgrade effort. Although anyone starting off with an EPrints v3 repository found it easy to install, upgrading required a migration wizard to assist the process.

Having gone through all that, it was always our ambition that EPrints 3.1 would be a "trivial" upgrade process, and in fact that was part of the design objective for EPrints 3.0. Still, as the list of new features in EPrints 3.1 grew and grew, I began to worry about what this would mean for people who had to install it. But good news - we installed it on our main server last week and it took "less than an hour". Bear in mind that our main server runs EIGHT repositories from the same installation code, and so required eight sets of configuration checks and tweaks.

(In case you're wondering, those eight repositories consist of four major repositories - the ECS school repository, the public EPrints demo repository, the public EPrints software distribution repository and the Cogprints research repository - and four experimental repositories used by minor projects and workshops.)

Based on this experience, we can say with some confidence that a single repository can be upgraded to version 3.1 in less than ten minutes. Of course, once you've upgraded you'll probably want to spend some considerable time playing around with the new facilities and configuration options, but that won't be the technical support guy's job. In EPrints 3.1 the repository configuration is all done by the repository manager, through the web interface.

What a Long, Strange Trip It's Been

The University's Easter Vacation is just coming to an end, and things are returning to normal after the week-long international festival of repository vitality that was OR2008. I've still got to sort out the financials and finalise the web site, but I've been spending most of my time on the conference repository in the last week.

The thing that no-one tells you about repositories is that they are a lot like children. They end up being wonderfully satisfying, but they take an awful lot out of you and they go through phases of being messy and uncontrollable. This has dawned on me over the last few weeks in dealing with the OR08 repository, which is just emerging into the phase where I'm feeling really proud of it. It started off only a few days before the conference, when I realised that it was going to be easier to put all the presentations into a repository than manage them all on a website. For the last conference I ran (WWW2006) we put all the presentations in a directory on a webserver and generated all the pages and links from a flatfile database using PHP. I didn't seriously consider using a repository for this conference for a couple of reasons: (a) politics - choosing a specific repository platform (like EPrints) didn't seem very much in keeping with the non-partisan nature of the conference series and (b) policy - I have no perpetual mandate for launching a repository for the conference series, and making one for the single event seems a bit profligate given the rhetoric of persistence that repositories are couched in. In the end practicality won out over politics and policy, because repositories have moved on so much in the last couple of years that they have become genuinely useful tools for large-scale information acquisition, processing and dissemination for the web. Sure, if you have a small workshop with a dozen papers to publicise just bung up a website, but with 30 papers+presentations and 50 posters+artwork and user groups providing another 40 presentations (plus a couple of BOFs), a repository becomes an invaluable infrastructure for collecting and displaying material.

I touch on this dichotomy in my own paper at the conference (End-of-Life Scenarios for Virtual Organisation's Repositories) which is all about balancing the immediate usefulness of a repository with the responsibility for sustaining it into the future. In some ways it's an argument not about repositories in particular, but about web resources in general. And perhaps the analogy with children is apt once more - there's a certain excitement in making them, but then someone has to stick around and pay the bills.