RepositoryMan: February 2011

Three years ago on this blog (doesn't time fly!) I contrasted the efforts that librarians and academics could make in furthering Open Access. My argument (such as it was) focused on the relationships between the two communities, noting that when it came to research, librarians could only advise and assist but that academics could lead and command. Or at least in theory! In particular I backed the idea that change would come from senior managers in the academic world and from research funders. In the intervening time we have indeed seen a big increase in OA leadership in the form of mandates being adopted, but I wonder if the pace of change is not about to put even researchers in the back seat.

The Web was developed at CERN, in Switzerland, and took over the world in more than a geographic sense. It emerged from its home in a highly-funded, very collaborative, international research laboratory and carried the culture and design assumptions of its birthplace (open information exchange, minimal concern over intellectual property control, no requirements for individuals to monetize knowledge production) and stamped them on the rest of society, regardless of society's estimation of its own needs (for more, see the presentation The Information Big Bang & Its Fundamental Constants). One manifestation of the clash between the Web and "how society has historically operated" was the Budapest Open Access Initiative some ten years after the initial development of the Web.

The Web's culture of open information exchange has more recently had a very visible effect in the area of Open Government Data. A simple re-statement of the objectives of the Semantic Web as The Five Stars of Linked Data has powered a tremendous focus of activity in national and local government when allied with political agendas of Transparency and Accountability. Portals like data.gov.uk and data.gov provide access to "the raw data driving government forward" which can be used to "help society, or investigate how effective policy changes have been over time". In the UK, the Treasury's COINS database of public spending is one of 5,600 public datasets that have been made available as part of the initiative. In the US, the Open Government Directive requires each department to publish high value data sets and states that "it is important that policies evolve to realize the potential of technology for open government." Both the US and UK governments see the opening up of public data as a driver for political improvement, innovation and economic growth, with the Public Data Corporation as the focus of British development of an entire social and economic Open Data ecosystem.

Having watched Open Access lobbyists engage in political processes in the UK and US (with a handful of Senators, Congressmen and MPs sometimes for OA and sometimes against) it is rather a shock to see the President and the Prime Minister suddenly mandating a completely revolutionary set of national policies based on the technological affordances of the Web, and in the teeth of plenty of advisors' entrenched opposition. And rather a shock to realise that offices even more elevated than a vice chancellor are enthusiastically joining the world of open resources and open policies.

Of course, data and publications are different things: publications are owned by private publishing companies rather than stockpiled by the government. However, the decade of Open Access debate has shown that progress in OA (and OER and open data) is impeded more by individual and institutional inertia than by corporate opposition. When the highest offices of government are confidently pushing forward a programme of open participation, will academics have the luxury of treading water?

How will our governments' sudden enthusiasm for open data affect Open Access? Perhaps not at all. Perhaps Universities are too insulated from the administrative whims and shocks of Washington and Whitehall to be affected. (How many researchers have even heard of data.gov?) Even so, governments will indirectly cause a shakeup in the administration of public research funding, and the infrastructure needed for universities to adequately respond to the requirements of open funders will cause them to become more open themselves.

The public climate that informs the private OA debates and decisions in University boardrooms will change; pro-OA researchers and librarians will no longer be arguing from such a defensive position, nor be seen as idealistic hippies. Even in the absence of direct government mandates, pro-OA decisions will be easier to support and less contentious to implement. The values of the research communities will change as public values and expectations change - when even governments become more accountable through open data, research communities that insist that their data and their research are their private property, for the sole benefit of furthering their own careers, will soon appear old-fashioned and their position untenable.

So watch this space. It may be that Cameron and Obama will indirectly achieve what Harnad and Suber have been toiling for. I wonder what I'll have to say in another three years' time?

The mantra of open data is: put your data on the web / with an open license / in a structured, reusable format / that is open / using open identifiers / that are linked with other data.

The third step/star in this process is commonly explained as using CSV rather than Excel (because the former is an open format, but the latter is a closed proprietary standard). You'll see this position stated at Linked Data Design at the W3C and sites all around the world are copying it.

We really need to think a bit harder about this: Excel's native format is an open standard, and although an XML encoding of the complete semantics of a spreadsheet is hardly a straightforward thing to deal with, it is simple enough to extract data from. In particular, I don't see that it is significantly more difficult than dealing with CSV!

Once you've unzipped the Office Open XML data, you can iterate around the contents of the spreadsheet, or extract individual cells with ease. And without any .NET coding or impenetrable Microsoft APIs. Here's a simple example that lists the addresses and contents of all the cells in a spreadsheet.

<xsl:template match="/">
  <xsl:for-each select="/worksheet/sheetData/row/c">
    <xsl:value-of select="@r"/> = <xsl:value-of select="v"/>
    <!-- newline, so each cell is listed on its own line -->
    <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
</xsl:template>

Of course it's simplified: I've missed off the namespaces, strings are actually stored in a lookaside table, and there are multiple sheets in a single document, but even so I'd rather wrangle XML than wrestle with CSV quotes any day.
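For anyone who wants to see those missing pieces filled in, here is a rough sketch in Python (my choice for illustration, since XSLT can't do the unzipping itself). It uses only the standard library to open the package, apply the SpreadsheetML namespace and look strings up in the shared-string table. The function names `read_shared_strings` and `list_cells` are made up for this example, and the default part name `xl/worksheets/sheet1.xml` is an assumption: it is the conventional location of the first sheet, but a robust reader would resolve sheet parts through `xl/workbook.xml` and its relationships. Inline strings, dates and formulas are not handled.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of SpreadsheetML, the worksheet vocabulary inside an .xlsx package.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_shared_strings(package):
    """Return the shared-string (lookaside) table as a list; empty if absent."""
    if "xl/sharedStrings.xml" not in package.namelist():
        return []
    root = ET.fromstring(package.read("xl/sharedStrings.xml"))
    table = []
    for item in root.findall("m:si", NS):
        # A string item is either a single <t> or several rich-text runs <r><t>.
        runs = item.findall("m:t", NS) + item.findall("m:r/m:t", NS)
        table.append("".join(run.text or "" for run in runs))
    return table


def list_cells(xlsx, sheet_part="xl/worksheets/sheet1.xml"):
    """Yield (address, value) for every cell that has a <v> child.

    `sheet_part` is an assumed, conventional part name (see the note above).
    """
    with zipfile.ZipFile(xlsx) as package:
        strings = read_shared_strings(package)
        root = ET.fromstring(package.read(sheet_part))
        for cell in root.findall("m:sheetData/m:row/m:c", NS):
            v = cell.find("m:v", NS)
            if v is None:
                continue  # empty or inline-string cell: skipped in this sketch
            value = v.text or ""
            if cell.get("t") == "s":
                # t="s" means <v> holds an index into the shared-string table.
                value = strings[int(value)]
            yield cell.get("r"), value
```

Something like `for address, value in list_cells("spending.xlsx"): print(address, "=", value)` (with a hypothetical file name) then gives the same kind of listing as the stylesheet above, with the shared strings already resolved.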

Sunday, 27 February 2011

Open Access - Who Calls the Shots Now?

Rehabilitating The Third Star of Linked Data