How eResearch-y are you?

How eResearch-y are you? An extremely serious quiz comprised of eight multiple-choice questions.
  1. Your latest published article has a graph in it. If the eResearch police asked you to reproduce the plot exactly using the original data you’d:
    A. Check out the code archived with the article, and re-run the make-file, which would not only re-generate the plot using Knitr, but the whole article, which would also be made available as an interactive website using Shiny with an option to re-run the models on data which is crowd-transcribed from the logs of 17th-century slave ships.
    B. Redo the diagram in Excel, using the clearly set out method and supplemental material from the article.
    C. Find the data (by borrowing back last year’s laptop from a postgrad), then fiddle around with what you think is the right spreadsheet to make something that looks pretty much like the one in the paper.
    D. Plot? What plot? And what was all that babble in option A?
  2. Turns out that some of the photos and recordings you made when documenting a research site contain images and sounds of a Yeti. If you can provide complete records of where and when you collected this data, you can collect a $1,000,000 prize from a cable TV station. Your next step is to:
    A. Provide the DOI to the dataset which you have archived in your institution’s data repository. The repository record with the data attached provides all the information required to support your claim.
    B. Scan the relevant pages from your field notebook and annotate these with supporting information specific to the Yeti sighting.
    C. Rummage around the office: you last saw that scrap of paper you scribbled on during the fieldwork in the pile on top of your filing cabinet.
    D. Quickly throw together some handwritten notes and scorch them with a candle so they look old. No actually, you couldn’t be bothered. Also you don’t believe in Yetis or Santa.
  3. You have so much data to analyse and your models are getting so complicated that your laptop is getting hot, so you:
    A. Use Docker to create a 128-node compute cluster in the NeCTAR cloud, get some results, archive all the code, data and outputs with DOIs and go home early.
    B. Enrol in Intersect’s High Performance Computing (HPC) courses and learn how to run your job on shared infrastructure.
    C. Give it to one of the PhD students to sort out.
    D. We have you mixed up with someone else – your iPad never gets hot unless you watch too much YouTube in the sun.
  4. When archiving data you always:
    A. Take care to use standard file formats that are easily machine readable, and make sure all code, and as much provenance information as possible, are also archived.
    B. Fill in the metadata fields on the institutional data catalogue application as carefully as you can.
    C. Try to change the worksheet names on your Excel files from Sheet 1 to something more meaningful, if you get time.
    D. Use the shredder in the research office. It’s more fun than the old technique of scrunching up the envelope on which the data were written and trying to get it in the bin for a three-pointer.
  5. The best place to store research data during your project is:
    A. On a secure, backed-up, cloud storage server (with data held in an appropriate jurisdiction) which you can access from anywhere with an internet connection, and share with designated collaborators.
    B. On a secure, backed-up drive accessible only from your office.
    C. On a Dr Who USB stick.
    D. You delete your raw data after you’ve analysed it. Although, actually, sometimes raw data doesn’t agree with you; so you cook some up to better fit your conclusions.
  6. A data management plan is:
    A. An important tool which facilitates planning for the creation, storage, access and preservation of research data. Creating this at the start of a research project and referring to it as a living document informs the research workflow and specifies how data will be managed.
    B. Something to think about once you’ve collected some data.
    C. More paperwork to bog down the research process, like Ethics. Oh for the good old days when we used to be able to electrocute the students without filling out so many forms.
    D. Data management plan? I’m not even in management so don’t interrupt me, I’m enjoying my holidays.
  7. Collaborative research is:
    A. Enabled by eResearch technologies and supported by Open Access to published research data.
    B. Maximising the funding universities receive by sharing resources and equipment for a research project.
    C. Popping next door to ask a colleague a question.
    D. Not something you’re interested in. Your data will die with you.
  8. If you wanted to share your completed research dataset with others you would:
    A. Contact the Library or eResearch and discuss publishing the data and related methodology to the institutional data catalogue, which can then also be included in the Research Data Australia discovery portal. The data would be described using appropriate metadata, and linked to related collections, fields of research, people and facilities.
    B. Publish the data on your personal website and ask people to contact you via a hotmail address for more information.
    C. Email the file to colleagues you think would be interested.
    D. You told us before – your data will die with you.
Your score:

Mostly As – We’d love to talk to you about becoming an eResearch champion. You have embraced the benefits of eResearch technology and methodology and have put comprehensive plans in place for the use and re-use of your valuable data.

Mostly Bs – You understand that technology is a useful tool but you’re hesitant to rely on it for your research. Try putting aside your trust issues and play around with one new tool or habit this week – it might spark an idea or save you valuable time. There are lots of opportunities to attend training or do a self-paced online course to increase your comfort level.

Mostly Cs – It might be time to chat to the eResearch team about joining the 21st century. Although your existing research process may be valid, eResearch boosts the research process through opportunities to add computing power, streamline workflows, and collaborate with like-minded researchers from around the globe.

Mostly Ds – Bah, humbug.

Creative Commons License
How eResearch-y are you? An extremely serious quiz comprised of eight multiple-choice questions. by Peter Sefton & Katrina Trewin is licensed under a Creative Commons Attribution 4.0 International License.

Thanks Kim Heckenberg for your input and sorry Alf, we didn’t put in anything about multi-screen immersive visualization.

Who is DORA?

[Update 2014-09-11: fixed some grammatical misalignments]

Avid readers of this blog will have noticed that we’ve suddenly started talking about DORA quite a lot. So who, why and/or what is a DORA?

The simple answer is that it is a Digital Object Repository for Academe, which tells you that we came up with a snappy acronym but perhaps leaves you wanting a little more information.

As part of our mission of supporting (? encouraging, enabling, proselytizing, enforcing, …) eResearch at UWS, we’ve taken a step back from the coalface and tried to paint The Big Picture™ about what systems to support eResearch should look like. One output of this was the set of principles and practices, and another is a high-level architecture of an eResearch system, which looks like this:

Overview of an eResearch system, with DORA


A Little Mora [1]

The basic idea is that a DORA provides a good place in which to store research data while researchers are working on it. To this end there are a few key features any potential DORA must support, some of which come directly from our principles, but some of which are more process oriented. A DORA must:

  • safely store research data and its associated metadata in a way which keeps them linked. Conceptually there is a combined object in the DORA which contains both the data and metadata (No Data Without Metadata)
  • allow versioning of these combined data objects
  • allow search for data objects
  • support the Upload and Researcher APIs to allow scripted operations:
    • Upload API allows automated upload of research data and associated metadata
    • Researcher API allows searching for and downloading of objects, and then uploading of modified or processed versions of them
  • support the Publisher API to allow a clean transfer of information on data objects to an institutional data catalogue, as well as a possible transfer of the data to another repository (depending on the nature of the data, the repositories and an institution’s policies)
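
To make the list above a little more concrete, here is a minimal in-memory sketch in Python. It is only an illustration: the class name, method names and record shapes are all hypothetical, and nothing here is the API of any real repository product. It shows combined data-plus-metadata objects, versioning, search, and one method for each of the three APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    """One version of a combined data-plus-metadata object."""
    object_id: str
    version: int
    data: bytes
    metadata: dict


class InMemoryDora:
    """Toy DORA (hypothetical): keeps every version of every object in a dict."""

    def __init__(self):
        self._versions = {}  # object_id -> list of DataObject, oldest first

    def _store(self, object_id, data, metadata):
        # No Data Without Metadata: refuse to store bare data.
        if not metadata:
            raise ValueError("metadata is required")
        history = self._versions.setdefault(object_id, [])
        obj = DataObject(object_id, len(history) + 1, data, dict(metadata))
        history.append(obj)
        return obj

    # --- Upload API ---
    def upload(self, object_id, data, metadata):
        return self._store(object_id, data, metadata)

    # --- Researcher API ---
    def search(self, term):
        """Return ids of objects whose latest metadata values mention term."""
        term = term.lower()
        return sorted(
            oid for oid, history in self._versions.items()
            if any(term in str(v).lower() for v in history[-1].metadata.values())
        )

    def download(self, object_id, version=None):
        """Return the latest version, or a specific one (numbered from 1)."""
        history = self._versions[object_id]
        return history[-1] if version is None else history[version - 1]

    def upload_version(self, object_id, data, metadata):
        """Store a modified or processed version of an existing object."""
        if object_id not in self._versions:
            raise KeyError(object_id)
        return self._store(object_id, data, metadata)

    # --- Publisher API ---
    def publish_record(self, object_id):
        """Return information about the latest version for a data catalogue."""
        latest = self.download(object_id)
        return {"id": latest.object_id, "version": latest.version, **latest.metadata}


dora = InMemoryDora()
dora.upload("site-42", b"raw", {"title": "Field recordings"})
dora.upload_version("site-42", b"cleaned", {"title": "Field recordings (cleaned)"})
print(dora.search("recordings"))         # ['site-42']
print(dora.download("site-42").version)  # 2
```

Earlier versions stay retrievable (`dora.download("site-42", version=1)`), and an upload with empty metadata is rejected outright.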

In an ideal world, there would be one DORA which would do everything for everyone, but honestly that seems so unlikely that we have to acknowledge that we will end up with a small number of DORAe and for any given research project we will pick the most appropriate one. This is another place where the APIs come in – if all DORAe support the same APIs then they become drop-in functional replacements for each other. Additionally, behind these APIs there could be a small ecosystem of cooperating tools – a simple repository for storing, an indexer for searching, a preview generator, etc – further reducing the need to find One Perfect Tool which Does Everything Brilliantly. (Separation of Concerns)

The catch here is, of course, that it is unlikely that two different potential DORAe will come out of the box supporting exactly the same APIs, so there’s a good chance that we will have to write some code to adapt the out-of-the-box API to the generic one we design. One possible light in this particular darkness is how much we can use something like Sword 2 as an API.
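
The adapting code mentioned here could be quite thin. The Python sketch below assumes a made-up off-the-shelf product (`VendorRepository`, with a native `deposit` call) and a made-up generic Upload API; both are hypothetical and stand in for whatever a real candidate DORA and our eventual API design turn out to be. It does not use SWORD.

```python
from typing import Protocol


class UploadApi(Protocol):
    """Hypothetical generic Upload API that every DORA would expose."""

    def upload(self, object_id: str, data: bytes, metadata: dict) -> None: ...


class VendorRepository:
    """Stand-in for an off-the-shelf product with its own native API."""

    def __init__(self):
        self.items = {}

    def deposit(self, package: dict) -> None:
        self.items[package["identifier"]] = package


class VendorDoraAdapter:
    """Translates generic Upload API calls into the vendor's native calls."""

    def __init__(self, repo: VendorRepository):
        self._repo = repo

    def upload(self, object_id: str, data: bytes, metadata: dict) -> None:
        self._repo.deposit(
            {"identifier": object_id, "payload": data, "descriptive": metadata}
        )


def ingest(dora: UploadApi, object_id: str, data: bytes, metadata: dict) -> None:
    """Client code depends only on the generic API, not on any one product."""
    dora.upload(object_id, data, metadata)


repo = VendorRepository()
ingest(VendorDoraAdapter(repo), "site-42", b"raw", {"title": "Field recordings"})
print(sorted(repo.items))  # ['site-42']
```

Because `ingest` only knows about `upload`, swapping in a different DORA means writing another small adapter and leaving the calling scripts untouched.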


So how does a DORA work with our AAAA data management methodology? To our great relief, pretty well:

  • the getting of the data and metadata in the first place. This step is not really shown on the diagram, but essentially the output of the acquisition is the data and metadata on the filesystem at the bottom of the diagram.
  • the combining of the data and metadata and the uploading of them into the DORA, via the Upload API.
  • the stuff a researcher does to the data. The data is fetched via the Researcher API and updated versions are written to the DORA, also via the Researcher API. This is where the versioning capability of a DORA comes into play.
  • information about the data is packaged up and transferred into the Institutional Data Catalogue. Optionally, the data described in the catalogue may be transferred to the Institutional Data Store.
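
As a rough illustration of the combining and packaging steps described above, the Python sketch below reads a data file and its metadata from the filesystem, joins them into one object, and then derives the information-only record a catalogue would receive. The sidecar naming convention (`x.csv` described by `x.csv.json`), the checksum field and both function names are assumptions made for the example, not part of our design.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def package(data_path: Path) -> dict:
    """Combine a data file with its sidecar metadata file into one object.

    Assumes (hypothetically) that metadata for `x.csv` lives in `x.csv.json`.
    """
    meta_path = data_path.with_name(data_path.name + ".json")
    if not meta_path.exists():
        # No Data Without Metadata
        raise FileNotFoundError(f"no metadata for {data_path.name}")
    data = data_path.read_bytes()
    return {
        "name": data_path.name,
        "sha256": hashlib.sha256(data).hexdigest(),
        "data": data,
        "metadata": json.loads(meta_path.read_text(encoding="utf-8")),
    }


def catalogue_record(obj: dict) -> dict:
    """Information about the data (not the data itself) for the catalogue."""
    return {"name": obj["name"], "sha256": obj["sha256"], **obj["metadata"]}


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "observations.csv"
    data_file.write_bytes(b"site,count\n42,1\n")
    sidecar = Path(tmp) / "observations.csv.json"
    sidecar.write_text(json.dumps({"title": "Site 42 observations"}), encoding="utf-8")

    record = catalogue_record(package(data_file))
    print(record["title"])   # Site 42 observations
    print("data" in record)  # False
```

The catalogue record deliberately leaves the data out, which mirrors the optional, separate transfer of the data itself to the Institutional Data Store.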

As you can see, a DORA sits at the heart of this, and is pretty key to making it all work, which is why we might start to seem as if we’re banging on about DORAe rather.

Creative Commons Licence
Who is DORA? by David Clarke is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

  1. I should take a moment to apologise for having opened this particular PanDORA’s Box of punnery.

What should we be giving our researchers?

Creative Commons License
This work by Peter Sefton with assistance from Andrew Cheetham and Deborah Sweeney is licensed under a Creative Commons Attribution 3.0 Australia License.

I recently briefed our Campus Development Committee at the University of Western Sydney about eResearch. I asked the question, What Information and Communications Technology support do our researchers need?

The eResearch team has put together a roadmap which is not quite ready for release. Central to the roadmap is this overall diagram of how we see eResearch, spanning from basic shared infrastructure that supports admin, teaching, learning and research at the bottom via the three ‘pillars’ of eResearch to higher-end, more specific research infrastructure like the NeCTAR-funded Virtual Laboratory for Human Communications Science that we’ve recently started, led by the MARCS Institute.

Making it topical

To put eResearch in context I referred to something that’s happening in the learning side of the enterprise. At the University of Western Sydney we have a bold new initiative in place. The headline reads “UWS Deploys iPads to support IT-enhanced learning”.

From the website:

This bold move – believed to be the largest rollout of its kind ever carried out in an Australian university – marks the start of the University’s major longer-term strategy to engage students in new ways of learning and interacting with all that UWS has to offer across its campuses and online.

“With digital technology revolutionising how we connect and interact with the world, university study should be no different,” says Professor Kerri-Lee Krause, UWS Pro Vice-Chancellor (Education).

“This initiative will not only readily equip our students and academic staff with mobile tools to enhance learning, it will also help them to engage with an ever-increasing online world.”

This is a multi-million dollar play – designed to support the phased rollout of the ‘blended learning’ initiative throughout the undergraduate programs as well as to attract and retain students and to show our commitment to providing a modern, responsive educational environment that spans the online and face-to-face worlds.

So, what’s the eResearch equivalent? What would be the headline? What would be the initiative that would put UWS ahead of the curve?

UWS deploys <insert-initiative-here> to support IT-enhanced research

<insert photo of happy researchers>

Question is, what can we do that demonstrates a similar commitment to our research community? We’re always open to new ideas on this, but the eResearch team has consulted extensively with research communities, IT and administrators in the University to come up with a Roadmap which will be published soon, subject to any further comments from the eResearch committee.

eResearch activities

  • Specific support:

  • Generic support:

    1. Research Data Repository project (including two other projects funded by the Australian National Data Service)

    2. Research computing project*

    3. Research collaboration project*

  • eResearch Readiness Program*

*These are yet to be funded and established.

All this is in the service of the research objectives of the university:

  • Objective 1 – Increase external research income to the University

  • Objective 2 – Increase the number of fields of research at UWS operating above or well above world standard

  • Objective 3 – Increase the number and concentration of funded research partnerships

  • Objective 4 – Ensure UWS attracts and graduates high quality HDR students to its areas of research strength.

It’s worth looking at where we sit in the research landscape. Here’s a chart put together by Andrew Cheetham, our Deputy Vice Chancellor Research, that shows the distribution of Australian Research Council discovery grant allocations for 2013. This shows how well our young university performs in attracting Australian Competitive Grant income compared to the cohort of Dawkins Universities formed post 1988. Note that UWS (in red) is sitting just behind Griffith.



Figure 1 © Andrew Cheetham, University of Western Sydney, used with permission

There are 21 people listed in Griffith’s eResearch team.

At UWS the core ongoing team numbers two at the moment, increasing to three soon, plus a couple of extras (present total 4.5).

With such a small team we need to try to find a big lever to have an impact on the eResearch culture at UWS. Even though we were unable to secure funding in the latest budget allocations for a dedicated trainer/disseminator to coordinate eResearch education for staff and students, we plan to set up a series of interest groups where we can reach multiple people at once and help create peer-support networks. These will be organised around various kernels of community that we have found around the place:

  • High Performance Computing (HPC) users,

  • Statistical computing (in particular those using the R language),

  • Reproducible Research and

  • Digital Humanities.

One of the other things we need to do is encourage an eResearch culture. This means asking not just ‘what services do we need to provide to researchers?’ but also ‘what are the attributes of an effective IT-enabled researcher?’ We have started work on mapping out what kind of support we need to give our research community to improve eResearch readiness and capability.