Repositories Should be More Like Email (apparently)

See below for a summary of an interesting JCDL 2008 paper that adds to the "repositories - they're all wrong" debate. Cathy Marshall is well known (and, I think, well loved) in the hypertext community for her ethnographic studies of information handling, and here she reports on a small-scale study of the information management practices of research authors as they go about the task of writing papers, and the implications for repositories. The paper is noteworthy because it highlights the role of email as a personal archiving solution and argues that any repository platform will need to do better than email on a range of criteria to gain user acceptance.

Well, it's a new target for repository developers, and perhaps a new marketing slogan to look forward to (EPrints: Sucks Less Than Hotmail).

From my experience, the paper rings true in its description of ad hoc and distributed author processes, but it is focused on a small group of computer scientists, all of whom use LaTeX and BibTeX, so I don't know exactly how applicable its message is across the whole institution.

Marshall, C. C. 2008. From writing and analysis to the repository: taking the scholars' perspective on scholarly archiving. In Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries (Pittsburgh, PA, USA, June 16-20, 2008). JCDL '08. ACM, New York, NY, 251-260. doi: 10.1145/1378889.1378930.

(For those without subscriptions to the ACM Digital Library, Google Scholar will point you at a preprint.)

ABSTRACT: This paper reports the results of a qualitative field study of the scholarly writing, collaboration, information management, and long-term archiving practices of researchers in five related subdisciplines. The study focuses on the kinds of artifacts the researchers create in the process of writing a paper, how they exchange and store materials over the short term, how they handle references and bibliographic resources, and the strategies they use to guarantee the long term safety of their scholarly materials. The findings reveal: (1) the adoption of a new CIM infrastructure relies crucially on whether it compares favorably to email along six critical dimensions; (2) personal scholarly archives should be maintained as a side-effect of collaboration and the role of ancillary material such as datasets remains to be worked out; and (3) it is vital to consider agency when we talk about depositing new types of scholarly materials into disciplinary repositories.

The Bits I Underlined

Furthermore, from the point of view of the researchers and scientists themselves, institutional archiving arrives on the scene late in the process; the deposit of publications and datasets is an afterthought to the actual work, the research and writing. What would make archiving more integral to the entire process? What does scholarly archiving look like today from the scholar's perspective? How can normal collaborative interactions be used to improve repository quality?

I make an effort to focus closely on the practices and artifacts relevant to maintaining personal archives and contributing to institutional repositories.

Second, participants feel that versions record the development of ideas, a trail that may prove important. But how important? Much of the history and provenance of an idea can be reconstructed from communications media like email, especially when it is combined with intrinsic metadata such as file dates. Thus benign neglect coupled with imaginative interpretation will get you pretty far in reconstructing a publication's history.

What is most apparent throughout this discussion is that personal archiving is a side effect of collaboration and publication: for example, if email is used as the mechanism for sharing files, it also becomes the nexus for archiving files. If one's CV is the means by which a public list of publications is maintained, it is also used as a pointer for oneself to the most authoritative version of a publication. Personal archiving can be both opportunistic and social: participants talked about tracking down public versions of their own publications to reclaim copies of lost work.

Email is cited as a good permanent store for three reasons: (1) it is easy to browse chronologically, which makes retrieval easy and lifts the filing and organizing burden; (2) intrinsic metadata supports the reconstruction of context (for example, who made particular revisions and why); and (3) email is usually accessible from any web browser. If email is used as an archive, some care must be taken to ensure everything that is important is actually in email. Some archival material is normally in email (reviews, for example) and no extra effort needs to be expended to make it part of the record. Other types of artifacts (run output, for example) must be put into email deliberately. Email is a sufficiently good archive that some participants made the effort...

It is easy to see how email provides just enough mechanism to fulfill the minimal version of these requirements. Any CIM infrastructure must beat email along all of those dimensions if it is to be adopted in email's stead.

