Sunday, 26 October 2008
Patterns in Repository Access
The clocks have gone back this morning, and I was looking for something to do with my extra hour. Having tidied the kitchen cupboards, I thought I'd have a play with the Google Analytics results for our school repository.
I've only ever reported summaries of download data to our research committee - and that data is pretty constant at 30,000 full-text downloads per month, or roughly a million papers every three years. So I was interested to see how the daily pattern of repository accesses varies over the academic year, and how that variation itself seems to repeat every year. The image attached to this posting shows the daily downloads (recorded by Google Analytics) plotted over the last year (October 27 2007 - October 26 2008) in blue, with the previous year's data also plotted in green.
The rapid oscillations are the weekly rise and fall - a peak on Mon/Tues followed by a gradual, slight decline over the week and a slump on Saturday (to around 1/3 of peak levels) with a slight rise on Sunday. I ran the Google Analytics results through Excel, ignoring weeks with public holidays or traditional staff vacations (where access levels are significantly lower and patterns of attendance are less predictable). Across the remaining 58 high-activity weeks, the share of accesses by day is Monday 18%, Tuesday 18%, Wednesday 17%, Thursday 17%, Friday 16%, Saturday 7%, Sunday 8% (the figures are rounded, so they sum to 101%).
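For anyone who would rather not do this in Excel, here is a minimal Python sketch of the same kind of weekday breakdown. It is an illustration under stated assumptions, not the calculation actually used for the figures above: the function name `weekday_shares`, the dictionary-of-dates input and the `excluded_weeks` parameter are all hypothetical, and it pools downloads across all retained weeks rather than averaging per-week percentages.

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_shares(daily_counts, excluded_weeks=frozenset()):
    """Return each weekday's percentage share of total downloads.

    daily_counts   -- dict mapping datetime.date -> downloads on that day
                      (assumed to have been exported from the analytics tool)
    excluded_weeks -- set of dates, each the Monday of a week to ignore,
                      e.g. weeks containing public holidays or vacations
    """
    totals = [0] * 7
    for day, count in daily_counts.items():
        # date.weekday() is 0 for Monday, so this finds the week's Monday.
        week_start = day - timedelta(days=day.weekday())
        if week_start in excluded_weeks:
            continue
        totals[day.weekday()] += count

    grand_total = sum(totals)
    if grand_total == 0:
        raise ValueError("no downloads left after excluding weeks")
    return {name: 100.0 * total / grand_total
            for name, total in zip(WEEKDAYS, totals)}
```

Identifying holiday weeks by their Monday keeps the exclusion list short and easy to maintain by hand, which is roughly what the manual filtering described above amounts to.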
What surprised me was how similar the gentle falls and rises over the academic year are on both curves. The places where the match is less than exact correspond to the start of the graph (there is no data for Oct-Nov 2006) and to Easter in each year (early April in 2007 and late March in 2008).
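One way to put a number on "similar" is sketched below. This is a hedged illustration rather than an analysis that was actually run on the repository data: it smooths away the weekly oscillation with a 7-day average, then computes the Pearson correlation between one year and the year before. The function names and the dictionary-of-dates input are hypothetical. The previous year is taken as 364 days (exactly 52 weeks) earlier, not 365, so that Mondays line up with Mondays.

```python
from datetime import date, timedelta
from math import sqrt


def weekly_smooth(series):
    """7-day trailing mean, which removes the weekly rise and fall."""
    return [sum(series[i - 6:i + 1]) / 7 for i in range(6, len(series))]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def year_on_year_similarity(daily_counts, start, days=364):
    """Correlate smoothed downloads with those from 52 weeks earlier.

    daily_counts -- dict mapping datetime.date -> downloads on that day
    start        -- first day of the later year; every day from
                    start - 364 days onwards must be present in the dict
    """
    lag = timedelta(days=364)  # 52 weeks, so weekdays stay aligned
    this_year = [daily_counts[start + timedelta(days=i)]
                 for i in range(days)]
    last_year = [daily_counts[start + timedelta(days=i) - lag]
                 for i in range(days)]
    return pearson(weekly_smooth(this_year), weekly_smooth(last_year))
```

A value close to 1 would support the impression given by the graph. Since there is no data for Oct-Nov 2006, `start` would need to be chosen late enough that the earlier year is fully covered, and the shifting date of Easter would be expected to pull the figure down a little.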
I'm not sure that there's a moral to this posting, apart from the fact that there seems to be a hidden regularity in the repository downloads. I must set a student to investigate!