By Peter Sefton
In the week of 9th – 13th of July I attended the Open Repositories 2012 conference in Edinburgh. I’m on the organising committee, taking a special interest in the ‘developer challenge’ event, which has become an important strand of the conference DNA. I was chair of the judging panel, and tried to help Mahendra Mahey of UKOLN and team encourage entrants, provide feedback on ideas and so on. That’s a contributing factor to my not getting to that many sessions, but I came away with a good sense of what repository developers are thinking about and working on. I’ve had quite a few exciting discussions about what we can do for Open Repositories 2013 on Prince Edward Island.
The good news is that the sessions are online in video form, so I can go back to the ones that I wanted to see and missed, particularly the session on name and data identifiers. If the ORCID ID system works, then that will be a very good thing; watch the presentation to see why. Simeon Warner makes the point that if repositories want to play an important role in scholarship then they need to engage with ORCID. I think this is a strong argument for repositories getting involved early, rather than sitting back and waiting to see if ORCID fails, which it might if everyone sits back and waits. But it might not fail, and then the repositories that stay out will be marginalised in a world of research metrics glued together by ORCID IDs.
It’s a bit difficult to correlate sessions from the program with the YouTube recordings but you can search for the ones of interest.
It’s all about Research Data
OK, so it’s not all about data, but overall, the conference continued the trend of increasing attention on Research Data Repositories (RDRs) and the research process. And that’s not just me, that’s the data speaking (although the data are of dubious quality). Peter Burnhill wrapped up proceedings with a talk illustrated by a tweet-driven word cloud, built from tweets about the conference. Data was big, along with the obvious terms like open and repository.
Figure 1 Image via http://cdrs.columbia.edu/cdrsmain/2012/07/the-news-from-edinburgh-and-open-repositories-2012/ attributed to Adam Field
When the conference first started in Sydney 7 years ago it was mostly about Institutional Publications Repositories (IPRs), with some discussion about how they might be better integrated into research processes such as data collection and publishing. Back in those days presentations even started out with helpful definitions of the word repository. IPRs are still with us, of course, but the discussion of integration with scholarly processes of all kinds has moved from “we should” to “we are”.
There are now a lot more general digital library type systems being discussed too. The change in approach as we move beyond IPRs to RDRs was the topic of a presentation that I gave in the short-papers stream at the conference, prepared with four other Australians from three other institutions. I have posted the presentation including the speaker notes on my blog. By the way, this presentation won best-in-session.
Research data management had a couple of sessions, including this one with an appearance by Natasha Simons from Griffith (about six minutes in) talking about their Research Hub and a very different perspective from Anthony Beitz at Monash (at about 58 minutes), where there is no hub. Anthony makes the point that a single research data management system is never going to suit all researchers and emphasises the importance of dealing with research communities. In between is Sally Rumsey talking about Oxford’s developing institutional approach. All three are worth watching for those of us implementing research data systems, whether centralised or not.
One of the big words in the word cloud is “challenge”, which reflects the amount of effort that JISC put into promoting the developer challenge. I’ll say a bit about this year’s winners, using the notes I gave to Mahendra.
The winning entries in the developer competition both came from the data side. The runners up, Keith Gilbertson & Linda Newman, showed an idea for a small, simple mobile app to capture video or audio and deposit it to a repository, but with a twist – the option to send it to a machine or human transcription service. The idea of using Microsoft’s speech conversion service got this one the special Microsoft prize, some .NET Gadgeteer hardware.
Patrick McSweeney’s winning entry “Data Engine” tackled research data management by bringing useful tools for data wrangling and visualisation into the repository. Patrick picked the challenges facing one PhD student in Engineering to illustrate a prevalent problem: a lack of generic tools for managing tabular data. There’s plenty of action in this space at the moment – for example the DC21 Data Capture app being developed for the Hawkesbury Institute for the Environment at UWS has some things in common with Pat’s app, as does Orbital, which Nick Jackson and Joss Winn from Lincoln talked about in the ‘non traditional content’ section, where I guess ‘traditional’ means PDF versions of papers.
I am hoping that Pat can leverage his win to make it out to Australia for eResearch Australasia, and we can get him talking with some of our local developers.
From the runners up I was very excited by “Is this research readable?”, an idea by Cameron Neylon, implemented by Ben O’Steen. The idea is to get some hard data on the accessibility of research: take a statistically significant number of research articles via a randomly generated set of DOIs and get people world-wide in different places and on different networks to report if they can access them. This entry is a very important contribution to the ongoing debate about the evolution of scholarly publishing. If implemented fully it will allow a global, crowd-sourced, statistically significant survey of how much of the online scholarly record is actually accessible to various people in various parts of the world, a topic about which much has been said, but for which we have very little hard data. I hope that Cameron along with his employer PLOS will continue this important campaign – perhaps declaring a world-wide ‘Is this research readable’ month later in 2012.
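To make the survey idea a little more concrete, here is a minimal sketch in Python of how crowd-sourced reports might be turned into an estimate. This is my own illustration, not Cameron and Ben's implementation: the function names, the shape of a report (DOI, region, accessible or not) and the choice of a Wilson score interval are all assumptions, and the DOIs in the usage lines are made-up placeholders.

```python
import math
import random


def sample_dois(all_dois, k, seed=None):
    """Draw k DOIs at random, without replacement, from a known list (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(all_dois, k)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% confidence."""
    if n == 0:
        raise ValueError("need at least one report")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def summarise(reports):
    """Group (doi, region, accessible) reports by region.

    Returns {region: (share_accessible, low, high, number_of_reports)}.
    """
    counts = {}
    for _doi, region, accessible in reports:
        ok, total = counts.get(region, (0, 0))
        counts[region] = (ok + int(accessible), total + 1)
    summary = {}
    for region, (ok, total) in counts.items():
        low, high = wilson_interval(ok, total)
        summary[region] = (ok / total, low, high, total)
    return summary


if __name__ == "__main__":
    # Placeholder DOIs and regions, purely for illustration.
    reports = [
        ("10.9999/example.1", "AU", True),
        ("10.9999/example.2", "AU", False),
        ("10.9999/example.1", "UK", True),
    ]
    for region, (share, low, high, n) in summarise(reports).items():
        print(f"{region}: {share:.0%} accessible ({low:.0%} to {high:.0%}, n={n})")
```

With only a handful of reports per region the intervals are very wide, which is really the point of wanting a statistically significant number of articles and reporters.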
I’ll be feeding insights and information from the conference into the various projects we have running at UWS and talking with the library systems team about the latest in Fedora Commons repositories. We’re running the Fedora-compatible ReDBox system as part of our Research Data Repository infrastructure. There was some interesting discussion in the Fedora Commons session about how to align the way various applications can play nicely together; this is definitely something the Australian community should get involved in.
Figure 2 Me, chairing a session, photo by Jonathan Markow from the Duraspace Foundation
Copyright Peter Sefton, 2012. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. <http://creativecommons.org/licenses/by-sa/2.5/au/>