Friday, 17 April 2009

Google and Repositories

Continuing yesterday's comments on the effect of Google's PageRank on resource discovery, there is a further Google effect that compounds the problem of discovering resources in repositories. Google doesn't treat each resource separately. Instead, it groups together all the results from a single site and shows only the top two resources from that site, no matter how many should appear.

For example, if I search for the terms "ontology" and "hypertext" directly in our school repository, 8 articles are returned. If I do the same search in Google, our repository gratifyingly appears at the top of the list of results, but only TWO of those items are listed, together with a discreet link to more items from this site.
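To make the effect concrete, here is a toy model of this kind of per-site grouping. It is only a sketch under stated assumptions: Google's real ranking and grouping rules are not public, the cap of two results per site is simply what the search above shows, the "site" is assumed to be the hostname, and the repository hostname and URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def crowd_by_host(ranked_urls, per_host=2):
    """Toy model (an assumption, not Google's algorithm): walk an already
    ranked result list and keep at most `per_host` results per hostname.
    Returns the URLs that are shown, in rank order, and a count of how many
    results from each host were folded behind a 'more from this site' link."""
    kept = defaultdict(int)    # results shown so far, per hostname
    hidden = defaultdict(int)  # results suppressed, per hostname
    shown = []
    for url in ranked_urls:
        host = urlsplit(url).hostname
        if kept[host] < per_host:
            kept[host] += 1
            shown.append(url)
        else:
            hidden[host] += 1
    return shown, dict(hidden)


if __name__ == "__main__":
    # Hypothetical example: 8 matching repository articles, plus one page
    # from another site that happens to rank fourth.
    results = [f"http://eprints.example.ac.uk/{n}/" for n in range(1, 9)]
    results.insert(3, "http://www.example.org/hypertext-ontologies.html")

    shown, hidden = crowd_by_host(results)
    for url in shown:
        print(url)
    print("more results from this site:", hidden)
```

With these made-up inputs only the first two repository articles and the outside page are shown, and the other six repository articles sit behind the single "more results" entry, whatever their individual relevance.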

So not only is your article in competition with every other web page on the planet, it is doubly in competition with the other articles in your own repository, any of which could deprive it of its rightful place in the rankings.

This means that we need to think about redesigning our repository pages to link to "other related work" that the visitor may not have seen listed in Google.


  1. One way to handle this would seem to be to create virtual hosts (using, of course, subdomains of the original repository) to break out communities, journals, and the like. That way there is more of a chance for more "hits" to appear in the search results, because to Google they appear to come from different sites. Of course, I only have a passing familiarity with search engine optimization, so this might not actually work...

  2. "Other related work" is already a feature in EPrints: it is the list of items retrieved when a user clicks on the subject heading in the repository record. That is, if the user has even looked at the repository's metadata record, of course!