National Research Data Programme – NRDP

Data Dynamics

Data is a useful umbrella term across many aspects of our lives; however, not all data is equal. In research, data comes in many shapes and forms. Unsurprisingly, different research disciplines require different types of data, and each takes a different approach to metadata. Data can be created through computation, collected iteratively over time, streamed constantly from a sensor or network of sensors, or developed from observations. Research data is not just generated by researchers; central and local government, environmental agencies, and the public – clinicians, teachers, and farmers – all develop research data. Our New Zealand research system is actually very good at creating and collecting data; however, until quite recently, technological constraints on storage and connectivity meant that most of that data could not be effectively retained or shared. In an increasingly digital society and economy, we find ourselves relatively inexperienced at managing our data, and under-resourced if we’re to sustain this stock of knowledge.

Open Data

Open Data means that the existence of particular data can be discovered, and access to that data can be negotiated. It does not mean that access will be free; data is expensive to collect and valuable to employ. It also does not mean that all data will be made available – there are many reasons why data might be confidential or private, and knowing it exists doesn’t automatically convey a right to see it. The London-based Open Data Institute uses a helpful diagram (Figure 7) to differentiate data by access.



In principle (and possibly in law), publicly funded research data is supposed to be Open. New Zealand research institutions will need some help to achieve this, but it’s a goal worth striving for. A well-functioning, fully implemented open research data policy not only increases the impact and efficiency of research; we think it will also encourage trusted data bridges to form between research, industry and government. These bridges have the potential to transfer knowledge as effectively as journal publishing, and possibly a lot faster.

On the Data Spectrum in Figure 7, some data is secret: its very existence is restricted knowledge. There will always be cases where this is necessary, and the technical solutions to manage it are well understood, provided the data management capability exists in the sector in the first place. As we don’t yet have a strong national research data capability, our institutions are forced towards imprecise or broad-brush secrecy, which keeps knowledge from being shared for longer than might actually be necessary.

Active Data versus Passive Data

We think it’s important to distinguish between active and passive data in research. Active data is generated through research activity every year; it is the data our researchers are working with and developing outcomes from right now. Each year the Government invests about NZD 1.4bn in research, of which approximately 40% is spent directly on collecting, creating, analysing and interpreting active research data. Passive data, on the other hand, is data that isn’t currently being used, or that has been collected but not analysed. This is not dead data, simply data that needs to be accessible in case it could be used again in the future. In some cases, “accessible” may mean preservation in a national collection or archive; in many cases, however, accessibility of passive data simply means it’s easy to find out who (which institution, faculty or department) holds the data you are interested in.

Access versus Storage

National collections and discipline-specific repositories have a role to play in co-locating data from many institutions, increasing the efficiency of research activities by making data discoverable, and preserving data for the long term. The Royal Society of New Zealand’s recent report on National Taxonomic Collections suggests there is scope to increase the effectiveness of our national capabilities in preservation and storage of important collections. That said, storage of research data in most cases remains the responsibility of our research institutions. Our research institutions are the employers of our researchers and the holders of research contracts. As such, these institutions are also the owners of research outputs such as data, and often have a statutory duty of care to ensure that publicly funded data is a value-adding asset for the long term.

It’s troubling, therefore, that many of our research institutions do not feel they have the resources to make data accessible, and that some are actively deleting research data because they lack data management resource. In 2016, the technology for storing data is inexpensive, but the means and processes for managing data are complex and relatively costly. Without support for data management, our institutions cannot be sure that value created by research funded this year will be accessible for future use. We know from international studies that the expensive, value-adding activities in data management are data acquisition and data sharing.

Quite literally, the “access” and “openness” of research data make up 85% of the costs of research data management (storage and preservation make up the remaining 15%).

Policy support for research data can also make a big difference to our research institutions. For example, carefully managing and depreciating (relatively inexpensive) storage hardware assets, while fully expensing (relatively costly) data acquisition and sharing each year, seems to put the cart before the horse when locating the source of economic value.