RepositoryMan

Friday, 17 April 2015

EPrints for EPSRC Data Management

The following simple Research Data Management advice has just been sent around my institution for staff publishing papers, to satisfy the new EPSRC data mandate. Although each institution will provision research data differently, it was great to see all the work that has been done over the last few years distilled into a simple set of instructions that even professors can understand!

Write the paper
Login to EPrints
Go into Manage Deposits
Click on the Add New Data Set button
Upload an Excel spreadsheet containing the data from the paper
Fill in as many of the questions as you can, making sure you describe what the data corresponds to in the paper (e.g. Fig 1 etc…)
You can link it to the grant that funded it (these should be in the system already)
In the options for the upload I made the data “visible to registered users only” and embargoed it until the end of the year with “publication pending” as the reason.
Email researchdatamanager@yourinstitution.ac.uk to get a DOI - the repository team will check what you’ve entered at the same time.
Write the following in the acknowledgements of the paper, "The data for this paper can be found at doi:10.the/DOI/you.received.above"
Submit paper
When the paper is accepted, make visible to all, remove embargoes, and link it to a copy of the paper that has been uploaded onto the system.

Southampton's repository has an extended set of metadata fields to describe datasets. These fields are part of the ReCollect EPrints Bazaar plugin, which was developed by the UK Data Archive and the University of Essex as part of the JISC MRD Research Data @Essex project.

Thursday, 31 January 2013

The Basics of Scholarly Communications in the UK

In the decade since the Budapest Open Access Initiative declared a new public good, there have been many expositions of the advantage and inevitability of Open Access and its consequences for new modes of scientific enquiry. Tony Hey (who has just claim to 'first cause' of UK open access in his position of Head of Electronics and Computer Science at the University of Southampton) has recently started a series of blog posts A Journey to Open Access that gives a very accessible introduction to the topic. Stevan Harnad (who was given a chair in ECS by the same Tony Hey) also blogs extensively at Open Access Archivangelism.

In my lesser role of championing repositories and developing the capabilities of the EPrints platform, I have had the privilege of working with library and information professionals to try to explain the principles of Open Access to a broad range of academics and researchers, and I have been struck by the almost total lack of understanding of the UK scholarly communication infrastructure shown by my research colleagues.

To help those who have been too busy writing papers to appreciate how those papers appear and now find themselves über-confused and offended by the Finch regime, I offer the following diagram as an introduction to Everything You Need To Know on the topic. Forget the dissemination of papers and the transfer of knowledge that form the scholarly publishing cycle: this is all about influence and power.

Publishing companies have pushed governments towards Gold Open Access (more money for publishers) and pulled universities away from Green Open Access (no-cost parallel dissemination). Researchers themselves have sided with publishing companies and learned societies (who act like sub-branches of publishing companies) to try to maintain the stability of the publishing industry, irrespective of the health of the university sector on which it depends!

Consequently, we now have a government proposal (the Finch report) to pay publishers twice: once to make UK research open access, and again to retain subscription access to the non-UK material. It's a kind of Westminster Open Access Initiative stating that an old tradition of scholarly publishing and a new technology of the Web have converged to make possible an unprecedented injection of public cash for publishers.

The only reasonable way forward is for researchers to take the initiative, and to show the kind of academic leadership that Professors Hey and Harnad demonstrated a decade ago - to start being proactive in their own scholarly communications. The easiest way to do that is to start using the existing repository infrastructure provided by their universities and supported by their libraries.

Researchers already hold all the cards; they don't need to be held to ransom in this Finchian standoff. They are the producers, consumers and quality control agents that create every aspect of the literature, and they are also the community that defines its own criteria for professional advancement and assessment. Everything they think that they depend on the publishing industry for, they can actually achieve for themselves.

Thursday, 29 November 2012

Repository Twitter Training

In a previous post I reported on using EPrints to gather data from Twitter in order to support researchers in the social sciences, particularly those looking for evidence of social processes or for the impact of the Web on society. The work was also reported at OR2012 in Edinburgh in a paper Microblogging Macrochallenges for Repositories that described the work involved in adapting EPrints to support this task.

Having got some more experience from running a pilot service at Southampton, we would like to invite anyone from the repository community who is interested in this work to join in a training session at the University on Tuesday 11th December from 1-3pm (buffet lunch included).

The first hour will focus on using the service: how to harvest Twitter streams, how to monitor the harvesting process, how to use the repository tools to analyse the collection of tweets, how to export the data to other visualisation and analysis services and how to deposit the analysed data in an institutional repository.

The second hour will discuss the management of the service itself: how to install Twitter-harvesting functionality using the EPrints Bazaar, how to manage the functionality, how to integrate it with your institution's other repository services and considerations for the licensing and ethical restrictions on gathering and using Twitter data.

If you are interested in attending or finding out more information, please email me, lac@ecs.soton.ac.uk.

Monday, 12 November 2012

Repositories, Theses and Graduation Ceremonies

I was attending my son's graduation ceremony at Bournemouth University last week. While waiting for his turn, the title of a graduating student's PhD thesis was read out. It caught my attention (it was about TV production on Dr Who) and so I slipped out my iPhone, googled the student's surname, a word from the title and the name of the university and found the thesis available in the Bournemouth Institutional Repository (first result). I was able to download and start skim-reading the PDF before the student had returned to his seat.

It's difficult to express what a genuinely exciting experience this was - it felt like I had arrived in the future! This is a repository use case that I had never thought of, and everything just worked.

Congratulations to Bournemouth's repository team on the hard work they have put in to making the experience join up. Also, congrats to Andrew Ireland on a really interesting thesis!

PS Universities really should consider letting graduation audiences see some of the really impressive work that their students have done. Perhaps an onstage projection of a poster from their final dissertation while they walk across the stage?

Friday, 20 July 2012

Changing Lightbulbs

Some more reflections on the road(s) to Open Access...

Q: How many publishers does it take to change a lightbulb?

A: The lightbulb doesn't need changing because everyone has bought torches.

Q: How many funders does it take to change a lightbulb?

A: One to run a community lightbulb changing programme, and another to bulk purchase torches.

Q: How many librarians does it take to change a lightbulb?

A: About 0.25FTE, but the lightbulb has to have a CC-BY license.

Thursday, 19 July 2012

Open Access Joke. Spoiler: not funny at all

Q: How many Finch committee members does it take to change a lightbulb?

A: The lightbulb doesn't need to be changed, it just needs a large injection of public funds to transition it to a more illuminating condition.

One of the Finch committee members has gone public on the tricky balancing act that the committee tried to maintain. In his words "Green was unacceptable to funders unless learned societies and publishers were willing to allow it". In my words, the committee was structured so that publishers' interests trumped all other considerations.

Wednesday, 18 July 2012

Gold Finch and Green Open Access

The UK's Finch Recommendations on Open Access, much of which looks suspiciously like a blank cheque that the research sector has to write to one of its support industries, have stirred a lot of debate. Still, the government has supported them, and RCUK has been careful to publicly support them even while ensuring that they don't interfere too much with its current policy of open access mandates. But while I'm frustrated at the Finch recommendations and relieved that they haven't stopped the funding councils' support for the UK's rich open access repositories infrastructure, I do think there might be some positive outcomes for OA.

Let's not lose sight of the fact that the Open Access proposition is very simple, but quite radical:

Universities are disruptive communities - they create new knowledge and transfer it to society through teaching, training and all kinds of impact mechanisms.
The Web is a disruptive technology - it drastically reduces the difficulty of sharing knowledge between multiple parties, across the world.
Open Access is a disruptive idea - it rebuilds universities' research communications on the Web's more efficient communications platform.

The context in which Open Access operates is less simple. Scholarly communication is a complex network of stakeholders whose principal output is "The Scientific Literature" and whose major outcome is "The Progression of Scientific and Scholarly Knowledge". But each stakeholder participant in this network is driven by other outputs and outcomes: individual researchers have careers to develop and families to feed; universities have reputation to develop and sustainability to ensure; publishing companies have profits to increase and shareholders to benefit; research funders have governments to impress; governments have lobbyists and voters to satisfy and industries to benefit. The meshing of these diverse motivations into a stable network of 'players' that produce such a lasting and valuable resource is a tribute to the decades of investment into the bigger picture of scientific progress by all parties. The astonishing thing about scientific publishing is not that it has been done well, but that it has been done at all.

The Open Access idea is particularly welcomed by those who see the stresses in the network threatening its viability or choking its productivity. On the other hand, where Open Access practice is actually adopted, it is by those researchers who see it as an effective route to getting their job done regardless of the "complex network of stakeholders". In other words, open access flourishes in disruptive communities who adopt new practices to improve their own capabilities, regardless of the consequences. Disruptive technologies aren't disruptive just because they exist, but because they are adopted, used and gradually mainstreamed. The network works around this disruption - new players emerge, new practices are fashioned, new relationships are formed, new contracts are negotiated - and an improved network results that is better fit to the current conditions.

Willetts' strong words directed to publishers at the recent Publishers' Association indicate that the government really has adopted the Open Access ideal and is not taking many prisoners along the way:

Provided we all recognise that open access is on its way, we can then work together to ensure that the valuable functions you carry out continue to be properly funded

The role of the Finch recommendations is to coerce the current research publishing players into accepting that Open Access is a reality that they must adopt by offering them a lifeline that allows them a chance of transitioning to the realities of a new Open Access publishing network.

Many of us think that this is pointless because we believe that the new network needs leaner, more efficient participants rather than the same old players. But the effect of the Finch lifeline may be a radical restructuring of the network, as Chris Keene (EPrints repository manager at Sussex) has pointed out in discussions on the UKCoRR mailing list. Payment of the APC (article processing charge) changes the relationship between publishers and researchers.

So although Finch's proposal may seem retrograde, superfluous and overly generous to the publishing industry, it does lead publishers by the nose to a much more exposed position. Now they have to deal with every author of every research paper and justify their costs on a much greater scale. Previously cost negotiations have been handled once per year per institution, and then with the library as an intermediary. Now they have to deal with angry and cash-strapped researchers on a daily basis - those that lived by the market will probably die by the market in a thousand hand-to-hand combats.

In the meantime, quite unlauded by Dame Finch, the UK has a robust infrastructure that actually delivers Open Access through an excellent network of institutional repositories together with training and advocacy programmes from each University library, all underpinned by a decade of technology R&D, policy development and professional practice funded by JISC. Finch doesn't predict a smooth transition to publisher-led Open Access, and the research community's response seems to back her predictions up. But the RCUK response shows what the UK is actually really good at - pragmatism - and likely means an increased role for repositories and the emergence of a more balanced and thoroughly hybrid environment as the network of stakeholders all seek to come to a new equilibrium.

Tuesday, 3 April 2012

Soton Labs: Embedded Repository Experimentation

We are just in the second stage of the transition for the ECS repository - all the data has been copied across to the main Southampton Institutional Repository, all the ECS repository URLs now redirect there as well, and we are in the middle of data reconciliation and de-duplication. This is very exciting, because the university finally has a single OA research service, with all stakeholders pulling in the same direction and providing a unified view of the university's research output for business, research, education and administration purposes. Huge thanks to Wendy White, Simon de Montfalcon and the rest of the library team, as well as Tim Miles-Board, Tim Brody and the rest of the EPrints Services team for making the whole venture run so smoothly!

Even more exciting for us is the fact that we are now about to set up a new programme of repository activity called "Soton Labs". Inspired by the idea of Google Labs, it is an institutional space for experimentation and innovation around research information systems, and EPrints will form its backbone. Driven by the needs of the research staff, it will be informed by a whole range of experience and ideas (many gathered from research council and JISC projects) that can be offered to staff on the famous "permanent beta" experimental basis until they are ripe for integration into the main (business critical) repository. Unlike the ECS repository, which was focused on a single department's needs, Soton Labs will have a broader brief: to deliver cutting edge services and to facilitate new improved practice for early adopters throughout the whole institution.

I've got a shortlist of tasks that we hope to address in the coming months:

live collection of research data
simple metadata schemas for research data archiving
collections of documentation around research proposals (bids, reviews, responses)
research projects
linked data.

So you can see that rather than reducing the repository activity in Southampton by halving the number of installations, we're stepping up the pace of repository development.

Tuesday, 13 March 2012

Lunch Talking at SPARC 2012

In the lunch break at SPARC 2012 today our table was discussing the negotiation of author rights for repository deposits. In lamenting how authors tend to be backed into a corner by the publisher's last-minute demands to sign the copyright transfer form (or else forfeit their publication opportunity), a delicious and subversive idea arose. I present it for you here, without any claim of endorsement by SPARC or my lunchtime companions.

PLOS NULL: the high profile, high impact journal that publishes articles that have been peer reviewed, accepted and corrected for publication by third party journals whose lawyers have then refused to agree the author's pro-repository copyright transfer amendment.

Monday, 12 March 2012

Value Transactions and The Publishing Business Model

I'm at the SPARC2012 Open Access conference, and all this talk about Open Access is reminding me that the issue of scholarly publishing is actually very straightforward.

Publishing companies have a very simple business model - they take authors' articles, add value and charge for that value. You can see this process illustrated in the diagram below, with the various stages in publishing an article broken out between the different parties, and each transaction explicitly labelled with its typical financial charges and legal agreements.

A decade on from the original Budapest Open Access Initiative and here we are in Kansas City just about to start discussing more of the nuances and implications of this obvious publishing model.

Thursday, 5 January 2012

Mendeley Open Access Update

In the last six months since I analysed Mendeley's contribution to Computer Science OA in June 2011, they appear to have increased their membership of that community by 37% and the ratio of full text documents to community members has increased from 0.66 to 0.71. The number of OA documents has increased by 47% to 11,757 and the number of OA active users (i.e. users who have made at least one document public through Mendeley's servers) has risen by 46% to 2,441 but still represents only 15% of the total membership of that community.

Congratulations to Mendeley - their service is obviously rising in popularity and hence in significance to the community. OA analysts will note that the increase in open access documents comes from increased membership, rather than a change in behaviour of the community.

Wednesday, 26 October 2011

Rethinking the Open Access Agenda

I used to be a perfectly good computer scientist, but now I've been ruined by sociologists. Or at least that is what Professor Catherine Pope (the Marxist feminist health scientist who co-directs the Web Science Doctoral Training Centre with me) says. I am now as likely to quote Bruno Latour as Donald Knuth, and when I examine "the web" instead of a linked graph of HTML nodes I increasingly see a complex network of human activity loosely synchronised by a common need for HTTP interactions.

All of which serves as a kind of explanation of why I have come to think that we need to revisit the Budapest Open Access Initiative's obsession with information technology:

An old tradition and a new technology have converged to make possible an unprecedented public good. The old tradition is the willingness of scientists and scholars to publish the fruits of their research in scholarly journals without payment, for the sake of inquiry and knowledge. The new technology is the internet. The public good they make possible is the world-wide electronic distribution of the peer-reviewed journal literature and completely free and unrestricted access to it by all scientists, scholars, teachers, students, and other curious minds. Removing access barriers to this literature will accelerate research, enrich education, share the learning of the rich with the poor and the poor with the rich, make this literature as useful as it can be, and lay the foundation for uniting humanity in a common intellectual conversation and quest for knowledge. (see http://www.soros.org/openaccess/read)

BOAI promises that the "new technology" of the Internet (actually the Web) will transform our relationship to knowledge. But that was also one of the promises of the electric telegraph a century ago:

From the telegraph's earliest days, accounts of it had predicted "great social benefits": diffused knowledge, collective amity, even the prevention of crimes. (Telegraphic realism: Victorian fiction and other information systems by Richard Menke.)

There has been much good and effective work to support OA from both technical and policy perspectives - Southampton's part includes the development of the EPrints repository platform as well as the ROAR OA monitoring service - but critics still point to a disappointing amount of fruit from our efforts. Repositories multiply and green open access (self-deposited) material increases; knowledge about (and support for) OA has spread through academic management, funders and politicians, but it has not yet become a mainstream activity of researchers themselves. And now, a decade into the Open Access agenda, we are grasping the opportunity to replay all our missteps and mistakes in the pursuit of Open Data.

I am beginning to wonder whether by defining open access as a phenomenon of scholarly communication, we mistakenly created from the outset an alien and unimportant concept for the scientists and scholars who long ago outsourced the publication process to a support industry. As a consequence, OA has been best understood by (or most discussed by) the practitioners of scholarly and scientific communication - librarians and publishers - rather than by the practitioners of scholarship and science.

We have seen that the challenge of the Web can't be neatly limited to dissemination practices. In calling for researchers to open the outputs of their research, we inevitably ask them to reconsider the relationship that they have with their own work, their immediate colleagues, their academic communities, their institutions, funders and their public. It turns out that we haven't been able to divorce the output of research from the conduct and the context of research activity. Let's move on from there.

In a recent paper Openness as infrastructure, John Wilbanks discussed the three missing components of an open infrastructure for science: the infrastructure to collaborate scientifically and produce data, the technical infrastructure to classify data and the legal infrastructure to share data - extending the technical infrastructure with a legal framework. I think that we need to go further and refocus our efforts and our rhetoric about "Open Access to Scientific Information" towards "Open Activity by Scientists" supported by three kinds of infrastructure:

Human Engagement
Methodological Analysis and
Social Trust.

The aim of open access to scientific outputs and outcomes will not occur until scientific practitioners see the benefit of the scientific commons, not as an anonymous dumping ground for information that can be accessed by all and sundry, but as a field of engagement that offers richer possibilities for their research and their professional activities. To realise that, scientists need more than email and Skype to work together, more than Google to aggregate their efforts and more than a copyright disclaimer to negotiate and mediate the trust relationships that make the openness that OA promises a safe and attractive, and hence realistic, proposition.

What I'm saying isn't new - there has been lots of effort and discussion about improving the benefits of repository technology to the end user/researcher, and about lowering the barriers to use. JISC has funded a number of projects in its Deposit programme, trying various strategies to increase user engagement with OA. As well as continuing to pursue this approach, we also need to step back from obsessing about the technology of information delivery, think bigger thoughts about scientific people and scientific practice and tell a bigger and more relevant story.

Sunday, 9 October 2011

Using EPrints Repositories to Collect Twitter Data

A number of our Web Science students are doing work analysing people's use of Twitter, and the tools available for them to do so are rather limited since Twitter changed the terms of their service so that the functionality of TwapperKeeper and similar sites has been reduced. There are personal tools like NodeXL (a plugin for Microsoft Excel running under Windows) that do provide simple data capture from social networks, but a study will require long-term data collection over many months that is independent of reboots and power outages.

They say that to a man with a hammer, every problem looks like a nail. And so perhaps it is unsurprising that I see a role for EPrints in helping students and researchers to gather, as well as curate and preserve, their research data, especially when the data gathering requires a managed, long-term process that results in a large dataset.

EPrints Twitter Dataset,
Rendered in HTML

In collecting large, ephemeral data sets (tweets, Facebook updates, Youtube uploads, Flickr photos, postings on email forums, comments on web pages) a repository has a choice between:

(1) simply collecting the raw data, uninterpreted and requiring the user to analyse the material with their own programs in their own environments

(2) partially interpreting the results and providing some added value for the user by offering intelligent searches, analyses and visualisations to help the researchers get a feel for the data.

We experimented with both approaches. The first sounds simple and more appropriate (don't make the repository get in the way!), but in the end the job of handling, storing and providing a usable interface to the collection of temporal data means that some interpretation of the data is inevitable.

So instead of just constantly appending a stream of structured data objects (tweets, emails, whatever) to an external storage object (a file, database or cloud bucket) we ingest each object into an internal eprints dataset with appropriate schema. There is a tweet dataset for individual tweets, and a timeline data set for collections of tweets - in theory multiple timeline datasets will refer to the same objects in the tweet dataset. These datasets can be manipulated by the normal EPrints API and managed by the normal EPrints repository tools: you can search, export and render tweets in the same way that you can for eprints, documents, projects and users.
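The relationship between the two datasets can be sketched in a few lines of Python. This is only an illustration of the idea that several timelines refer to one stored copy of a tweet; the class and field names (Tweet, Timeline, TweetStore, tweet_ids) are hypothetical and are not the EPrints schema or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Tweet:
    """One record in the (hypothetical) tweet dataset."""
    tweet_id: int
    author: str
    text: str
    created_at: str  # ISO 8601 timestamp


@dataclass
class Timeline:
    """One record in the timeline dataset: a search plus the ids it matched."""
    search: str
    tweet_ids: List[int] = field(default_factory=list)


class TweetStore:
    """Keeps each tweet once, however many timelines refer to it."""

    def __init__(self) -> None:
        self.tweets: Dict[int, Tweet] = {}

    def ingest(self, timeline: Timeline, tweet: Tweet) -> None:
        # Store the tweet only if it is new; always record the reference.
        self.tweets.setdefault(tweet.tweet_id, tweet)
        if tweet.tweet_id not in timeline.tweet_ids:
            timeline.tweet_ids.append(tweet.tweet_id)

    def resolve(self, timeline: Timeline) -> List[Tweet]:
        """Return the tweet objects that a timeline refers to, in order."""
        return [self.tweets[i] for i in timeline.tweet_ids]


# Two timelines whose searches both match the same tweet.
store = TweetStore()
drwho = Timeline(search="#drwho")
bbc = Timeline(search="#bbc")
shared = Tweet(1, "alice", "Watching #drwho on #bbc", "2011-10-09T12:00:00")
store.ingest(drwho, shared)
store.ingest(bbc, shared)
store.ingest(drwho, Tweet(2, "bob", "#drwho finale!", "2011-10-09T12:05:00"))
```

Here the store holds two tweets even though three references were ingested, which is the saving that a shared tweet dataset is meant to give.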

EPrints collects Twitter data by regular calls to the Twitter API, using the search parameters given by the user. The figure on the left shows the results of a data collection (on the hashtag "drwho") resulting in a single twitter timeline that is rendered as HTML for the Manage Records page. In this rendering, the timeline of tweets is shown as normal on the left of the window, with lists of top tweeters, top mentions, top hashtags and top links together with a histogram of tweet frequency on the right. These simple additions serve to give an overview of the data to the researcher - not to try to take the place of their bespoke data analysis software, but simply to help understand some of the major features of the data as it is being collected. The data can be exported in various formats (JSON, XML, HTML and CSV) for subsequent processing and analysis. The results of this analysis can themselves be ingested into EPrints for preservation and dissemination, along with the eventual research papers that describe the activity.
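The kind of overview described above (top tweeters, mentions, hashtags and links, plus a frequency histogram) and the JSON and CSV exports can be approximated with the Python standard library. This is a minimal sketch under stated assumptions, not the EPrints implementation: the record layout (author, created_at, text), the sample tweets and the hourly histogram buckets are all invented for illustration.

```python
import csv
import io
import json
import re
from collections import Counter

# Hypothetical harvested records; a real timeline would hold many more fields.
tweets = [
    {"author": "alice", "created_at": "2011-10-09T12:00:00",
     "text": "Loved #drwho tonight @bob http://example.org/a"},
    {"author": "bob", "created_at": "2011-10-09T12:20:00",
     "text": "@alice agreed, #DrWho was great"},
    {"author": "alice", "created_at": "2011-10-09T13:05:00",
     "text": "Rewatching #drwho already http://example.org/a"},
]


def summarise(records, top_n=3):
    """Count the most frequent tweeters, mentions, hashtags and links."""
    tweeters = Counter(r["author"] for r in records)
    mentions = Counter(m.lower() for r in records
                       for m in re.findall(r"@(\w+)", r["text"]))
    hashtags = Counter(h.lower() for r in records
                       for h in re.findall(r"#(\w+)", r["text"]))
    links = Counter(u for r in records
                    for u in re.findall(r"https?://\S+", r["text"]))
    # Histogram of tweet frequency, bucketed by hour (first 13 characters
    # of an ISO 8601 timestamp, e.g. "2011-10-09T12").
    per_hour = Counter(r["created_at"][:13] for r in records)
    return {
        "top_tweeters": tweeters.most_common(top_n),
        "top_mentions": mentions.most_common(top_n),
        "top_hashtags": hashtags.most_common(top_n),
        "top_links": links.most_common(top_n),
        "histogram": sorted(per_hour.items()),
    }


def to_csv(records):
    """Serialise the records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["author", "created_at", "text"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


summary = summarise(tweets)
as_json = json.dumps(tweets, indent=2)
as_csv = to_csv(tweets)
```

The summary is deliberately shallow: it gives a feel for the collection while it grows, and the exported JSON or CSV is what a researcher would feed into their own analysis software.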

All this functionality will soon be released as an EPrints Bazaar package; as of the time of writing we are about to release it for testing by our graduate students. The infrastructure that we have created will then be adapted for other Web temporal data capture sources as mentioned above (Flickr, YouTube, etc).

Sunday, 26 June 2011

Mendeley: Measuring OA rates

Having talked about Mendeley's OA deposit rates in my last blog post, I thought it worthwhile to check how representative my chosen discipline (Computer Science) was. Rather than download the entire community for each other discipline, I have performed a quick and dirty sample of some of the available literature in each discipline using the search function. Each Mendeley search result offers the option of saving the PDF (if available) to your library, so it is a simple matter to wget some search results and grep for PDFs.
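For anyone who prefers a script to wget and grep, the counting step might look something like the following Python sketch. It assumes the search results page has already been saved locally, and it treats any link whose target ends in .pdf as an available PDF; that rule and the sample markup are assumptions for illustration and are not taken from Mendeley's actual pages.

```python
from html.parser import HTMLParser


class PdfLinkCounter(HTMLParser):
    """Counts <a> tags whose href points at something ending in .pdf."""

    def __init__(self):
        super().__init__()
        self.pdf_links = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Ignore any query string when testing the extension.
        if href.split("?", 1)[0].lower().endswith(".pdf"):
            self.pdf_links += 1


def count_pdf_links(html):
    """Return the number of PDF links found in a page of HTML."""
    parser = PdfLinkCounter()
    parser.feed(html)
    parser.close()
    return parser.pdf_links


# Hypothetical stand-in for one saved page of search results.
sample_page = """
<ul>
  <li><a href="/research/paper-one/">Paper one</a>
      <a href="/download/paper-one.pdf">Save PDF</a></li>
  <li><a href="/research/paper-two/">Paper two</a></li>
  <li><a href="/research/paper-three/">Paper three</a>
      <a href="/download/paper-three.PDF?dl=1">Save PDF</a></li>
</ul>
"""

pdf_count = count_pdf_links(sample_page)
```

Two of the three sample results offer a PDF, so the count for this page is 2.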

The table below shows the results of this procedure for 12 disciplines (Computer Science and 11 others, with two illustrative keywords each). The "available PDFs" column records the number of PDFs offered on the first page of the search results (each page contains 200 results); the total number of results shows the relative coverage of the topic in Mendeley.

Computer Science appears to be in the 5-10% range of OA (18 or 11 PDFs out of a page of 200 results) which does seem to be just about average. Social Science, Medicine, Health Science, Economics and the Humanities appear to have fewer PDFs and Maths and Physics appear to have rather more.

Search term	Discipline	Available PDFs	Total Results
chromatography	Chem	10	14260
crystallography	Chem	27	4921
JAVA	CS	18	848
software	CS	11	15185
geology	Earth	36	4180
hydrodynamic	Earth	40	2853
econometrics	Economics	13	565
microeconomics	Economics	5	88
biodiversity	Env	14	4668
climate	Env	14	13003
nursing	Health	6	10723
palliative	Health	6	1978
archaeology	Hum	6	1730
Foucault	Hum	11	248
algebra	Math	101	4424
cohomology	Math	171	525
cancer	Med	11	52315
pharmacology	Med	4	62285
quasar	Phys	127	556
telescope	Phys	101	2347
cognition	Psy	11	18805
schizophrenia	Psy	17	4055
criminology	SocSci	2	154
sociology	SocSci	2	2005
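The table can be turned into a rough percentage per discipline by pooling the two keywords. The Python sketch below does this with one assumption that is mine and not stated in the post: where a search returned fewer than 200 results in total (microeconomics, criminology), the first page is taken to show only that many, so the denominator is the smaller of 200 and the total.

```python
from collections import defaultdict

PAGE_SIZE = 200  # results per page, as stated in the post

# (search term, discipline, available PDFs, total results), from the table.
ROWS = [
    ("chromatography", "Chem", 10, 14260),
    ("crystallography", "Chem", 27, 4921),
    ("JAVA", "CS", 18, 848),
    ("software", "CS", 11, 15185),
    ("geology", "Earth", 36, 4180),
    ("hydrodynamic", "Earth", 40, 2853),
    ("econometrics", "Economics", 13, 565),
    ("microeconomics", "Economics", 5, 88),
    ("biodiversity", "Env", 14, 4668),
    ("climate", "Env", 14, 13003),
    ("nursing", "Health", 6, 10723),
    ("palliative", "Health", 6, 1978),
    ("archaeology", "Hum", 6, 1730),
    ("Foucault", "Hum", 11, 248),
    ("algebra", "Math", 101, 4424),
    ("cohomology", "Math", 171, 525),
    ("cancer", "Med", 11, 52315),
    ("pharmacology", "Med", 4, 62285),
    ("quasar", "Phys", 127, 556),
    ("telescope", "Phys", 101, 2347),
    ("cognition", "Psy", 11, 18805),
    ("schizophrenia", "Psy", 17, 4055),
    ("criminology", "SocSci", 2, 154),
    ("sociology", "SocSci", 2, 2005),
]


def oa_share_by_discipline(rows, page_size=PAGE_SIZE):
    """Percentage of first-page results offering a PDF, pooled per discipline."""
    pdfs = defaultdict(int)
    shown = defaultdict(int)
    for _term, discipline, available, total in rows:
        pdfs[discipline] += available
        # Assumption: a page cannot show more results than the search found.
        shown[discipline] += min(page_size, total)
    return {d: 100 * pdfs[d] / shown[d] for d in pdfs}


shares = oa_share_by_discipline(ROWS)
for discipline, share in sorted(shares.items(), key=lambda kv: kv[1]):
    print(f"{discipline:10s} {share:5.1f}%")
```

On these figures Computer Science comes out at 7.25%, with Maths at 68% and Physics at 57%. With only two keywords per discipline this remains a quick and dirty sample, not a measurement.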

Mendeley: Download vs Upload Growth

There was a lot of talk about Mendeley at OAI7 in Geneva, especially the news that in the first quarter of 2011 the number of articles downloaded for free jumped from 300,000 to 800,000. That's really good news, confirming Mendeley as a successful service in the Open Access domain. Having done an analysis of Mendeley's impact on Open Access (see Comparing Social Sharing of Bibliographic Information with Institutional Repositories) just under a year ago, I thought I'd repeat the analysis to see the extent of the impact of their growth on deposits as well as downloads.

Results: the number of members of the Computer Science discipline appears to be 2.2x larger than last August (increased to 74736 from 34230.) Of these, only 12102 appear in the Computer Science directory listing, whose contents are now filtered by Mendeley according to their "profile completion"; the gross number was kindly provided for me by Steve Dennis at Mendeley. This filtering takes care of the long tail of accounts that have never been used. Of the filtered users, 1676 are "OA active", having publicly shared at least one PDF document (up 21% on last August). The total number of PDFs shared by this group is 8014, up 16% on last August with 4.8 PDFs being shared per "active OA user" (down from 5.0 last August).
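The ratios quoted above can be checked with a few lines of arithmetic. The Python sketch below uses only the figures given in this post; the implied values for last August are back-calculated from the stated percentage increases, so they are approximations and not reported numbers.

```python
# Figures quoted in the post.
members_last_august = 34230
members_now = 74736
listed_members_now = 12102   # after filtering by "profile completion"
oa_active_users_now = 1676   # up 21% on last August
shared_pdfs_now = 8014       # up 16% on last August

# Growth in membership of the discipline.
membership_growth = members_now / members_last_august

# PDFs shared per "OA active" user.
pdfs_per_active_user = shared_pdfs_now / oa_active_users_now

# Share of the filtered directory listing that is "OA active".
oa_active_share = 100 * oa_active_users_now / listed_members_now

# Back-calculated (approximate) values for last August.
implied_users_last_august = oa_active_users_now / 1.21
implied_pdfs_last_august = shared_pdfs_now / 1.16
implied_pdfs_per_user_last_august = (
    implied_pdfs_last_august / implied_users_last_august
)

print(f"membership growth:      {membership_growth:.2f}x")
print(f"PDFs per active user:   {pdfs_per_active_user:.2f}")
print(f"OA active share:        {oa_active_share:.1f}%")
print(f"implied ratio last Aug: {implied_pdfs_per_user_last_august:.2f}")
```

The rounded results agree with the text: membership is about 2.2 times larger, each active user shares about 4.8 PDFs, and the back-calculated ratio for last August is about 5.0.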

So a big increase in user numbers results in a small increase in publicly shared PDFs, confirming (I think) that Mendeley are not preaching to the choir, and are mainly attracting users who are not already "OA active". Users of Mendeley have clearly transitioned from "scholarly knowledge collectors" to "scholarly knowledge sharers". The challenge still remains how to change their behaviour from "scholarly asset maintainers" to "scholarly asset sharers".

Wednesday, 27 April 2011

Experimenting With Repository UI Design

I'm always on the lookout for engaging UI paradigms to inspire repository design, and I recently noticed that Blogger has made some new "dynamic views" available. These provide a variety of smart presentation styles that aren't a million miles away from the ones emerging in smartphone apps, combining highly visual and animated layouts.

So I've imported some repository contents into Blogger to get some hands-on experience, and I'd be interested in any feedback on whether this looks useful or compelling.

The new blog is called Mike O'Lection - it's a little DSpace repository joke.

New views

Sidebar: http://mikeolection.blogspot.com/view/sidebar
Timeslide: http://mikeolection.blogspot.com/view/timeslide
Mosaic: http://mikeolection.blogspot.com/view/mosaic (very Tumblr)
Snapshot: http://mikeolection.blogspot.com/view/snapshot
Flipcard: http://mikeolection.blogspot.com/view/flipcard

Original repository pages: http://eprints.ecs.soton.ac.uk/17386/, http://eprints.ecs.soton.ac.uk/21289/, http://eprints.ecs.soton.ac.uk/21622/, http://eprints.ecs.soton.ac.uk/21030/

These views suit various different types of material, but the constant theme that is emerging is that a good visual is pretty much de rigueur for any resource. This means that relying on the thumbnail image of an article's first page is not going to be a good strategy (hint: they all look the same). I can foresee the need to extract figures and artwork from the PDFs and Office documents uploaded to a repository.

(Over the next few days I hope to put some more examples on the blog to help get a better feel for how this will work. But I think I might make a bulk Blogger exporter for EPrints because manual cutting and pasting is only enjoyable for a few minutes!)

Tuesday, 26 April 2011

Mobile Use of Repositories

While looking at the impact of mobile devices on the development of the Web I found useful information in this March 2011 press release from web analytics company StatCounter, charting the rise of Android.

StatCounter data also pinpoints the rise and rise of mobile devices to access the Internet. The use of mobile to access the Internet compared to desktop has more than doubled worldwide from 1.72% a year ago to 4.45% today. The same trend is evident in the US with mobile Internet usage more than doubling over the past year from 2.59% to 6.32%.

I thought I'd see whether this behavior applies equally to repositories, so I had a poke around in the usage stats for eprints.ecs.soton.ac.uk and this is what I found:

53,285 PDF downloads from 27 March 2011 (4am) to 3 April 2011 (4am).
Of these, 33,304 are attributed to crawlers and 19,981 to real browsers.
Only 0.93% of the browser downloads occur on mobile devices (70% iOS, 22% Android, 7% Blackberry and 1% Symbian).

The use of mobiles that we are seeing for accessing research outputs in repositories is less than 1/4 of the general use of mobile Internet. An obvious reason for that is the unpalatable mixture of PDF pages and small devices, but popular applications like Mekentosj's Papers and Mendeley for iPhone seem to indicate that an attractive mobile experience should be possible.
That implies that there's another exciting opportunity for repository developers to up their game!
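
The comparison with general mobile Internet use can be checked with a few lines of Python. This is only a sketch using the figures quoted in this post; the variable names are illustrative, and the absolute number of mobile downloads is an estimate derived from the rounded 0.93% figure.

```python
# Download counts for eprints.ecs.soton.ac.uk over the sampled week.
total_downloads = 53285
crawler_downloads = 33304
browser_downloads = 19981

# Shares quoted in the post.
mobile_share_repository = 0.0093  # 0.93% of browser downloads
mobile_share_worldwide = 0.0445   # StatCounter worldwide figure, 4.45%

# Sanity check: crawler and browser downloads account for the total.
assert crawler_downloads + browser_downloads == total_downloads

# Estimated from the rounded percentage, so treat it as approximate.
mobile_downloads = browser_downloads * mobile_share_repository
relative_use = mobile_share_repository / mobile_share_worldwide

print(f"Estimated mobile downloads: {mobile_downloads:.0f}")
print(f"Repository mobile share relative to worldwide: {relative_use:.2f}")
```

The ratio works out at roughly 0.21, which is where the "less than 1/4" comparison comes from.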

Thursday, 14 April 2011

Faculty of 1000 Posters - Still Looking for a Silver Bullet

The F1000 Open Access Poster Repository was brought to my attention by a recent Tweet. I love repositories with posters in - they're copyright-lite and very visually attractive - and I've long advocated for more use to be made of these kinds of scholarly communication. With some success, I have pushed hard for the poster artwork to be made available online in all the conferences I have been involved in organising.

The Faculty of 1000 has a special relationship with some Biomedical conferences, inviting authors to upload their posters to the open access F1000 site. Perhaps this is an effective new way of gaining open access to specific kinds of early-report research material?

The F1000 posters site contains 909 posters. 649 of those are derived from 28 invited conferences (an admirable average of 23 posters per conference), and the remaining 260 posters are uploaded on an ad hoc basis from authors attending 148 other conferences (an average of about 1.8 posters per conference).

While it is clear that the invitation approach is much more effective than the laissez-faire approach, the huge size of biomedical conferences (often displaying several thousand posters over the course of four days) means that the overall success rate of this OA strategy is only 4.2% (a figure I reached by counting the total number of posters at a sample of 7 of the 28 invited conferences).
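
The per-conference averages can be reproduced with a short Python sketch based only on the counts quoted above. The variable names are illustrative, and the 4.2% success rate is not recomputed because the poster totals for the 7 sampled conferences are not given here.

```python
# Poster counts quoted for the F1000 poster repository.
invited_posters, invited_conferences = 649, 28
adhoc_posters, adhoc_conferences = 260, 148

# Sanity check: the two groups account for all 909 posters.
assert invited_posters + adhoc_posters == 909

avg_invited = invited_posters / invited_conferences
avg_adhoc = adhoc_posters / adhoc_conferences

print(f"Invited conferences: {avg_invited:.1f} posters each")
print(f"Ad hoc conferences: {avg_adhoc:.2f} posters each")
print(f"Invitation advantage: {avg_invited / avg_adhoc:.1f}x")
```

On these numbers an invited conference yields about 13 times as many posters as an ad hoc one, which is the gap the comparison above rests on.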

So, still no silver OA bullet!

Monday, 21 March 2011

I Won't Review Green OA, It's Spam - I DO NOT LIKE IT Sam-I-Am

According to the Times Higher, Michael Mabe (chief executive of the International Association of Scientific, Technical and Medical Publishers and a visiting professor in information science at University College London) fears that repositories are essentially "electronic buckets" with no quality control. He also expressed doubts that the academy would be able to successfully introduce peer review to such repositories, partly because it would be difficult to attract reviewers who had no "brand allegiance" to the repositories.

Let's think about this....

Q: Who are the authors of papers?

A: Researchers.

Q: Who puts papers in repositories?

A: The authors.

Q: Who reviews papers?

A: The authors of other papers.

Q: Where do they get papers to review?

A: From a URL provided by the journal editorial board.

Q: Who are the editorial board?

A: Authors of other papers.

Q: Just remind me what the publishers do?

A: Their most important job is to organise the processes that get the peer review accomplished by the other authors (see above).

Q: Where does the brand value of a journal come from?

A: It's a bit complicated, but mainly from the prestige of the authors on the editorial board and the prestige of the papers that the authors write. There is a default brand that comes from the publishing company that owns the journal, but of course that comes recursively from the brand value of all the journals that it owns.

Q: "Electronic buckets" don't sound very valuable, do they?

A: No they certainly don't - I mean, imagine the kind of material that normally ends up in a bucket! Who would want to peer-review that? But hang on - who stores stuff in buckets anyway? That's a bit of a problematic metaphor for a storage system! Try replacing "buckets" with "library shelves" and the statement becomes more accurate. What kind of material do you find on library shelves? Things that people might want to read. Things that people might want to review.

Q: But how would authors know what to review in a repository without the publishing company's branding?

A: I suppose an editorial board would send them a URL.

Friday, 11 March 2011

You Can't Trust Everything You Read on the Web

Houston, we have a problem. It turns out that trusting repositories as authoritative sources of research information is all very well and good, except when the repository is an authoritative source of demonstration (fake) documents. Sebastien Francois (one of the EPrints team at Southampton) has just reported that Google Scholar is indexing the fake documents that we make available in demoprints.eprints.org.

So when your weaker students start citing

Freiwald, W. and Bonardi, X. and Leir, X. (1998) Hellbenders in the Wild. Better Farming, 1 (4). pp. 91-134.

you know that it's just a teensy misunderstanding, OK? But if anyone needs their citation count artificially boosting, I have a repository available to monetize.