Climate Change and Energy Research – Intersect eResearch Project Summary
[This document was prepared by Intersect for the DC21 project, run by the University of Western Sydney and funded by the Australian National Data Service. We’re posting it here as part of the story of how UWS is building a Research Data Repository.]
Dr Luc Small | 20 February 2012 | 4.1
Intersect is developing and deploying technology aimed at assisting research into climate change and energy by enhancing the management of environmental sensor data. This document describes the core functionality of the proposed project and is targeted at researchers and their support teams who may wish to join the project as collaborators.
Background and Context
There has been a significant rise in the number of sensors and sensor networks used in environmental research in recent years. This growth has brought with it the challenge of managing sensor infrastructure and the data produced by the increasing numbers of deployed sensors.
Three classes of instruments are targeted for this project, from which data and meta-data will be collected:
Eddy Flux Towers – Collect meteorological and flux data (e.g. surface-atmosphere exchanges of CO2, water vapour and energy).
Whole tree chambers – Collect meteorological data regarding the environmental impact on a tree wholly encapsulated within the chamber.
Weather stations – Collect meteorological data.
While these instruments are the current focus of the project, the project aims to be sensor/infrastructure agnostic and therefore more generally applicable to sensor data management.
Insufficient sensor infrastructure and data management affects researchers, data technicians and infrastructure managers. Its impact includes:
lost or misplaced sensor data.
inadequate recording of how and where data was collected.
inadequate recording of quality assurance, gap filling and other post-processing done to the data, and of the assumptions made by the data technician during post-processing.
scientific conclusions based on less than ideally managed source data that is prone to error.
A successful solution would:
store data in a secure, backed-up, centralised location.
record rich metadata about how and where data was collected.
record rich metadata about the post-processing done to the data.
provide an intuitive means by which researchers can access data and be fully informed about its nature by consulting its associated metadata.
Infrastructure management: The ability to keep track of sensor infrastructure (for example, flux towers and weather stations) and individual sensors, and changes to the sensors and/or the infrastructure.
Raw sensor data acquisition: Manual and automated data acquisition from files generated by sensors and/or their data-loggers.
Versioned data storage: Permanent retention of raw sensor data. When datasets are quality assured, gap filled, or transformed, new versions of the datasets are created, time-stamped, and related back to the original raw sensor datasets. Data is stored in a centralised fashion that can be easily backed up.
Data sharing: Data can be downloaded by those within the research group. Data comes with detailed meta-data describing the sensor and infrastructure used to acquire the dataset and any transformations that may have been done to it. Meta-data can be made available to Research Data Australia to make the research data more readily discoverable by other scientists.
Data upload: As noted above, new versions of a dataset can be uploaded to the system and linked to the original raw dataset. This allows the process of data transformation to be tracked and ensures that it is a non-destructive process because all datasets created are retained.
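As an illustration only, the versioned storage and upload behaviour described above could be sketched as an append-only store in which every derived dataset points back to the version it was made from. This is not the project's actual design: the class names, method names and sample readings below are hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable version of a dataset (hypothetical structure)."""
    version_id: int
    data: tuple                # readings, frozen so a stored version cannot change
    description: str           # what post-processing produced this version
    parent_id: Optional[int]   # None for raw data, else the version it derives from
    created_at: datetime       # time stamp recorded when the version is stored


class DatasetStore:
    """Append-only store: uploads add versions, nothing is overwritten."""

    def __init__(self):
        self._versions = {}

    def add_raw(self, data, description="raw sensor data"):
        return self._add(data, description, parent_id=None)

    def add_derived(self, parent_id, data, description):
        if parent_id not in self._versions:
            raise KeyError(f"unknown parent version {parent_id}")
        return self._add(data, description, parent_id)

    def _add(self, data, description, parent_id):
        version_id = len(self._versions) + 1
        self._versions[version_id] = DatasetVersion(
            version_id, tuple(data), description, parent_id,
            datetime.now(timezone.utc))
        return version_id

    def get(self, version_id):
        return self._versions[version_id]

    def raw_ancestor(self, version_id):
        """Follow parent links back to the original raw dataset."""
        version = self._versions[version_id]
        while version.parent_id is not None:
            version = self._versions[version.parent_id]
        return version


# Illustrative usage with made-up readings.
store = DatasetStore()
raw_id = store.add_raw([20.1, 999.0, 20.4])
filled_id = store.add_derived(
    raw_id, [20.1, 20.25, 20.4], "gap filled: replaced one out-of-band reading")
print(store.raw_ancestor(filled_id).data)
```

Because versions are only ever added, uploading a transformed dataset cannot destroy the raw data it came from, which is the non-destructive property the list above describes.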
If you face a similar problem and would like to join as a collaborator, or you face a related problem and are interested in understanding more about this project and potentially reusing components of it, please contact your local eResearch Analyst or email firstname.lastname@example.org.
ANDS – Data Capture Program – $200k
University of Western Sydney
Hawkesbury Institute of the Environment
Prof. Ian Anderson
Development commenced in December 2011 and the system will go live in the latter half of 2012.
TERN/OzFlux: The present project is best regarded as supporting the precursor activities that enable the delivery of quality assured data to a facility such as OzFlux.
A Day in the Life…
A new sensor is installed on a flux tower. Data files are retrieved from the associated datalogger once a day and placed on networked storage. The infrastructure manager:
Adds the sensor to the catalogue of sensors associated with the flux tower.
Associates the sensor data files with the sensor record.
Provides detailed meta-data about the sensor, such as its make, model, position on the flux tower, etc.
Removes the sensor record for the faulty sensor that this new model has replaced.
Over the days that follow, data starts flowing in from the new sensor. The data technician:
Downloads the raw sensor data that has been collected.
Gap fills part of the data where the sensor has recorded readings that lie outside the band of expected values.
Uploads the gap filled data along with an explanation of the post-processing applied to the data.
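The gap-filling step can be illustrated with a small sketch. The document does not say which gap-filling method the data technician uses, so the linear interpolation below is an assumption made for the example, as are the function name `gap_fill`, the band limits and the sample readings. It also assumes evenly spaced samples.

```python
def gap_fill(readings, low, high):
    """Replace readings outside [low, high] by linear interpolation.

    Hypothetical helper: returns (filled, replaced_indices) so that the
    technician can describe exactly which samples were altered.
    Readings at the start or end of the series, which have no in-band
    neighbour on one side, take the value of the nearest in-band reading.
    """
    valid = [i for i, r in enumerate(readings) if low <= r <= high]
    if not valid:
        raise ValueError("no in-band readings to interpolate from")

    filled = list(readings)   # work on a copy; the raw data is left untouched
    replaced = []
    for i, r in enumerate(readings):
        if low <= r <= high:
            continue
        before = [j for j in valid if j < i]
        after = [j for j in valid if j > i]
        if before and after:
            a, b = before[-1], after[0]
            frac = (i - a) / (b - a)
            filled[i] = readings[a] + frac * (readings[b] - readings[a])
        elif before:
            filled[i] = readings[before[-1]]
        else:
            filled[i] = readings[after[0]]
        replaced.append(i)
    return filled, replaced


# Made-up temperature readings with one spurious value.
filled, replaced = gap_fill([20.0, 999.0, 22.0], low=0.0, high=50.0)
print(filled, replaced)
```

Returning the indices of the replaced samples alongside the filled series gives the technician the material for the explanation that accompanies the upload.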
The system stores the gap filled data as a new version. This new version is automatically associated with the original raw sensor data. The raw data remains available, unmodified, for future reference. The researcher:
Explores the sensors available on the flux tower and selects the one she’s interested in.
Browses the data available for the sensor and takes note of the data technician’s comments about any post-processing steps that have been performed.
Selects the gap filled version of the data since it is most appropriate in this instance.
Downloads the gap filled sensor data and commences analysis.
Is aided in analysis and write-up by having the full details of the flux tower, sensor, and post-processing step at hand.
Finds anomalies in the gap filled data and isolates the post-processing as the cause by looking at the raw sensor data.
Having decided this is “the” dataset, the researcher asks the system to archive a copy and mint a new DOI so it can be cited like an article and retrieved from the UWS Research Data Repository.
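To illustrate how keeping the raw data alongside the gap filled version helps the researcher isolate post-processing as the cause of an anomaly, here is a minimal sketch that lists the samples where two versions differ. The function name `changed_readings` and the sample values are hypothetical, and the sketch assumes both versions cover the same samples in the same order.

```python
import math


def changed_readings(raw, processed):
    """Return (index, raw_value, processed_value) for each differing sample.

    Two NaN values are treated as equal, so a missing reading that stays
    missing is not reported as a change.
    """
    if len(raw) != len(processed):
        raise ValueError("versions must cover the same samples")
    return [
        (i, r, p)
        for i, (r, p) in enumerate(zip(raw, processed))
        if not (r == p or (math.isnan(r) and math.isnan(p)))
    ]


# Made-up example: only the second sample was altered by post-processing.
print(changed_readings([20.0, 999.0, 22.0], [20.0, 21.0, 22.0]))
```

If an anomaly in the analysis lines up with one of the reported indices, the post-processing step is the likely source; if it does not, the anomaly was already present in the raw sensor data.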
Please refer to the diagram below for an indication of how other stakeholders will interact with this project.
This document by Intersect Australia is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.