Examples of the growing eResearch Infrastructure at UWS

This work by Toby O’Hara, Peter Bugeia & Peter Sefton is licensed under a Creative Commons Attribution 3.0 Unported License.

In this post, we list some of the basic eResearch infrastructure that is already set up and running at the University of Western Sydney. You’ll notice a few themes, such as the use of virtual computing environments, the use of shared storage including the nascent UWS Research Data Repository (RDR), and consultation and coordination. Some of these are stop-gap solutions until the RDR and Research Computing Environment become fully operational as part of the continuing RDR project.

These examples, drawn from a number of disciplines across our flagship institutes and schools, all show the importance of dedicated research computing support, consulting and hardware infrastructure. None of the research teams we talk about below would be able to perform their research without services that go well beyond the standard ITS offerings available for administrative computing at UWS.

Centre for Complementary Medicine Research (CompleMed)

This group had started collecting data about plant samples, and needed a way to store their data in a central location where it could be shared and reused amongst the team and with other researchers. eResearch were able to set up a virtual server with attached storage from the Research Data Store and install an FTP service on the server, so that at any time a person with a login could download raw data and upload analysed data. The system was set up so that authorised users are managed directly, and people inside and outside the university can gain access. The service is still running and has the potential to grow. There is interest from other research teams as well. To support continued growth within CompleMed, more storage will be needed, along with more services to allow global collaboration.

In the near future the FTP service will be seamlessly migrated to the Research Compute Environment, which is planned infrastructure designed for this purpose (and other needs like it). It is also a good candidate for local cloud storage, another component of the RDR, which is planned for implementation next year.

Centre for Positive Psychology in Education (CPPE)

The research involved CPPE creating, distributing and marking paper surveys, which then needed to be scanned and stored. The scans then needed to be read by an image reader and translated into data files that could subsequently be analysed. The team needed a central storage point where the scans could be automatically deposited. They also needed computing capability that could run the conversion software. The resulting analysable raw data would also be stored centrally, so that an authorised person could then access it. To meet all of these needs, a portion of the Research Data Store was allocated for storage of the scanned images and the related data. A virtual server was also created to host the software that converts the scanned images into data sets. This virtual server is not yet part of the Research Compute Environment, and would be a good candidate for it. This setup is still going strong. eResearch are aware of additional tools that would make the conversion and sharing of data more seamless, and are working to make this possible. Providing these tools would reduce cost and processing time for the researcher.
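The deposit-then-convert workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software CPPE actually runs: the folder layout, the list of scan file types, the CSV output and the convert callable (which stands in for the image-reading software) are all hypothetical.

```python
from pathlib import Path
from typing import Callable, List

# Assumed scan formats; the real scanner output is not stated in this post.
SCAN_SUFFIXES = {".tif", ".tiff", ".png", ".pdf"}


def find_unconverted(scans_dir: Path, data_dir: Path) -> List[Path]:
    """Return scans that do not yet have a matching .csv file in data_dir."""
    pending = []
    for scan in sorted(scans_dir.iterdir()):
        if not scan.is_file() or scan.suffix.lower() not in SCAN_SUFFIXES:
            continue  # skip sub-folders and non-scan files
        if not (data_dir / (scan.stem + ".csv")).exists():
            pending.append(scan)
    return pending


def convert_pending(
    scans_dir: Path, data_dir: Path, convert: Callable[[Path], str]
) -> List[Path]:
    """Run the (hypothetical) converter on each new scan and store the result.

    `convert` takes the path of a scanned image and returns the extracted
    survey data as CSV text. Returns the list of data files written.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for scan in find_unconverted(scans_dir, data_dir):
        out_path = data_dir / (scan.stem + ".csv")
        out_path.write_text(convert(scan), encoding="utf-8")
        written.append(out_path)
    return written
```

Run on a schedule on the virtual server, a script along these lines would pick up newly deposited scans from the shared storage and leave already-converted ones alone, because the presence of the output file is used as the "done" marker.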

HIE – Hawkesbury Forest Experiment

The Whole Tree Chamber component of the Hawkesbury Forest Experiment collects data about carbon/tree interactions. The sensor measurement data needs to be stored centrally, so that multiple members of the team, including researchers who are in the team but outside the university, can analyse and query the data for various research projects. Our implemented solution was a virtual server running a Secure FTP (SFTP) application with secure logins, attached to the Research Data Store. When the Research Computing Environment is ready, this SFTP service is planned to be migrated across. HIE will also benefit from a new data capture capability: an integrated system for capturing and storing data together with its metadata in an organised format that makes access and reuse easier. The application will be run from the Research Compute Environment, and will utilise the Research Data Store.

EucFACE is another tree data component of the Hawkesbury Forest Experiment. Intersect, the eResearch consortium in NSW, in consultation with UWS eResearch, assisted in creating a new research data management plan, which contains direction and instruction regarding all aspects of managing the data, including data capture, storage, documentation, retention, reuse, disposal and archiving.

This data management plan has many elements that can be used across HIE. The next step for us will be to collaboratively implement a cross-institute plan, as a guideline for data held in common as well as for individual research projects. It is a goal of eResearch to create a research data management plan for institutes and schools, tailored to their research methods and data.
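As a rough illustration of how the aspects listed above could be turned into a reusable template, here is a minimal Python sketch. It is not the format of the EucFACE plan itself: the class, the field names and the example entries are assumptions made for illustration, and only the seven aspect names come from the plan described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The aspects named for the EucFACE plan; everything else here is hypothetical.
REQUIRED_ASPECTS = (
    "capture",
    "storage",
    "documentation",
    "retention",
    "reuse",
    "disposal",
    "archiving",
)


@dataclass
class DataManagementPlan:
    """A plan is a project name plus free-text guidance for each aspect."""

    project: str
    sections: Dict[str, str] = field(default_factory=dict)

    def missing_aspects(self) -> List[str]:
        """List required aspects that have no guidance written yet."""
        return [
            aspect
            for aspect in REQUIRED_ASPECTS
            if not self.sections.get(aspect, "").strip()
        ]

    def is_complete(self) -> bool:
        return not self.missing_aspects()


# Hypothetical draft with only two aspects filled in so far.
draft = DataManagementPlan(
    project="Example institute project",
    sections={
        "capture": "Sensor readings are uploaded to the shared store.",
        "storage": "Data lives on the Research Data Store.",
    },
)
print(draft.missing_aspects())
```

A shared checklist like this is one way a plan written for a single project could be adapted for an institute or school: the required aspects stay the same while the guidance text is tailored to each group's methods and data.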

HIE have been a strong supporter of eResearch. Many of their researchers are enthusiastic about what we are working on, such as centralised collaborative data sharing, easy-to-access collaboration tools, and structured archiving and reuse of data.

HIE – Genomic Life Sciences

This particular example is a large project involving multiple streams within the Genomic Life Sciences group. The stream we assisted with was the collection, storage and retrieval of genetic data. eResearch and ITS consulted with the research team and developed specifications for equipment, which was then procured and implemented. The solution consisted of two large servers, networked together, with virtual workstations configured for retrieving and analysing data. A number of hard disks were also procured, large enough to hold the copious amounts of data being produced.

This is a good example of a group that would benefit from the Research Data Store and the Research Computing Environment. Having research storage and servers available upon request, as a service to researchers, would provide for projects such as this one quickly and conveniently.

This is also an example of the consultative nature of eResearch: researchers often have requirements that could be met more effectively by advances in technology of which they might not be aware, and recommendations and advice are crucial in those cases.

Centre for Health Research

eResearch has worked with the Centre for Health Research on a number of projects. Some examples include:

  • Setting up an environment that can store and work with a large amount of data, available as a Public Health Database, to be used by several researchers on different projects. This included a central repository large enough to hold the data and the working copies being used by the different teams. Several servers were also purchased and installed, then carved into workstations for retrieving and analysing the data. This is still ongoing. Last we heard, it was very popular, and we will be going back to the team soon to devise the best approach to increasing the capacity.

  • The Centre are also the custodians of another set of data, made available through the Department of Families, Housing, Community Services and Indigenous Affairs. This data is available to researchers inside and outside the School of Medicine. Very soon this data will reside on the Research Data Store, with shared drive access, well-defined access control and rules for use. eResearch has not only facilitated the technical solution, but also the development of a data management plan for this data. The plan was created in consultation with the appointed data manager and other users of the data, and clearly defines how the data is to be stored, used, safeguarded and governed.

CHR have been a strong supporter of eResearch as well, and are very interested in the possibilities of collaborative tools and of sharing locally and globally.

The Australian Speech Corpus (AusTalk) or the Big ASC

AusTalk is a LIEF-funded project, run by the MARCS Institute, intended to collect audio and video samples of people all over Australia speaking and responding to a series of standardised interview questions. A great deal of work went into assembling a kit of hardware and software, with contributions from the ITS group. eResearch also assisted the AusTalk project manager in creating a data management plan that spells out the requirements for storing, securing and accessing the data.

Nanoscale Organisation and Dynamics

This research within the Nanoscale Organisation and Dynamics group consists of collecting a large number of nano-scale images and storing them for comparison and analysis. eResearch and ITS determined that a large amount of data storage was required, as well as virtual servers and workstations. The data has been added to the Research Data Store, and the first steps have been taken to set up the virtual servers and workstations, which were procured through the research funds.

Smaller requests for assistance

eResearch have also been able to facilitate a number of smaller requests for assistance from researchers. These include:

  • Archival storage

  • A persistent identifier for their data, to be used to reference the data

  • Advice on good data management practice

  • Consultation in choosing the best technology solution