MB: I am a statistician by training but a lot of the work I do involves the analysis of genomic data, usually focused on human disease. A major part of that is the use of existing data, usually in the public domain, so there’s a large computational aspect to that. Some researchers do this in isolation, but the groups I work with are collaborating to use these data in an exploratory way to identify novel relationships. We also use these data to confirm research findings in the lab.
I talk about our goals as being “augmenting research” – how can we help researchers augment their existing research with data that has already been generated elsewhere? A lot of data is analysed once, published, and then never revisited. The push is to create people who can ask clever questions with existing data and existing tools. The important change inherent in this approach is that you don’t necessarily need to have your own “wet lab” to do genomics anymore. This change is being heralded by a number of prominent researchers, most notably bioinformatics guru Atul Butte. The idea is to generate ideas and questions from existing data, but outsource all the validation work to labs run by service providers, and then publish the proven outcomes.
MB: At this level, science is focused on asking the questions – but the data and the tools, the experiments, are often sourced from service providers or other research groups. Economies of scale and specialised technicians mean work can be done faster, more cost-efficiently, and to professional standards. This really challenges existing science and science training, because in universities we’re still doing all this ourselves and training our PhDs to be hands-on. This doesn’t mean the skills aren’t important – the scientist still needs to understand the process and be able to interpret the data.
What training is duplicated at the post-graduate level? Is there opportunity for a major national infrastructure centre, and what are the risks of centralising these activities? One of the dangers of centralisation is that you detach the knowledge from the individual universities and science communities, and then the technology evolves beyond scientists’ ability to keep up and apply it. That said, when it comes to science and scientists, there are powerful incentives to keep up with technology – perhaps the shift we need is for more scientists to lobby for capital investment in national infrastructure, rather than build business cases for local instrument purchases.
MB: We now have the ability to generate ridiculous amounts of genomic data. We can profile individuals deeply, but the challenge is to convert this to clinically actionable information. The expectation is that we’re moving towards using genomics information more and more in the clinical domain. We are starting to see genomics having a clinically meaningful impact in a number of settings, but this will happen unevenly. For example, we’re yet to be able to use whole genome sequence data to personalise cancer treatments on a routine basis, simply because we still don’t fully understand the links between molecular characteristics and clinical outcomes. Conversely, genomic technologies have had a major impact in cases of undiagnosed genetic disorders. This could be considered an “early adopter” segment of the clinical world, where genomics provides a powerful tool for finding the underlying genetic cause of a disease.
MB: This is also not an argument for putting gene sequencers and HPCs in every hospital – it’s going to be about clever outsourcing to service providers, much as we use diagnostic labs in the health system today. Running patients through genomic screening processes for diagnosis is prohibitively expensive simply because of the strict regulatory framework in this domain, which is certainly a good thing in terms of patient safety. In some centres in the US this challenge has been overcome by placing all clinical genomic analysis into a research framework. When a patient needs to be screened, they are enrolled in a research study that does the broad genomics work – if something “actionable” shows up, then they are formally tested for that particular anomaly using an approved diagnostic method. The high cost of using genomics in a clinical setting means the outsourced genomics lab is already becoming a commercially viable operation. At some point not too far in the future, genomic sequencing will be similar to sending blood samples to the lab for testing.
MB: While I agree with Cristin Print that the scale of genomic data is huge, what we currently need to report as providers to clinicians is likely to be quite targeted and actionable information. An interesting question is what the lab or service provider should do with the background data, i.e., the rest of the individual’s genome sequence. Once sequenced, genomics information is a valuable resource, as it can be reused many times throughout the lifetime of an individual; but the questions remain: what do we keep, do we keep everything, and do we sequence everyone?
MB: Individuals are more and more comfortable about personal information being in the cloud or shared with people they may not know. People are actually giving out a lot of information through social media and are very open to sharing information publicly – often with little comprehension of how that information might be used against them. Researchers, however, are still very conscious of protecting the details of research subjects, as the loss of ethically sensitive data is a potentially career-ending move – particularly with genomic data.
MB: There’s a growing realisation that non-institutional infrastructure, such as Google, may be able to do a better job of sharing data and driving collaboration than your department or institution can. Researcher IT needs are so diverse that they’re hard to satisfy within the bounds of institutional internal IT, which means researchers go to many places to get the resources they need. In many cases institutional IT support can only just scratch the surface of research requirements (as an example, the University of Otago provides hundreds of terabytes of centralised storage for research use, yet the Department of Biochemistry also maintains a separate large-scale storage infrastructure, as its own research data stack is close to the 100TB mark). It’s hard to meet this challenge at a single institution or at a national level, as identifying researcher needs into the future is almost impossible. To date, we can see two main approaches to this challenge:
– Build it and they will come: very much the driver behind NeSI and NZGL when first conceived;
– Tell us what you need: the way REANNZ operates now.
MB: One of the issues in trying to forecast researcher IT and instrument needs out into the future is the New Zealand funding cycle for research. In a three-year funding cycle such as Marsden and the HRC (Health Research Council) operate, there’s a very good chance any one particular researcher could be working on something fundamentally different in three years’ time. In this situation, it’s hard to know what your IT needs will be in the longer term, and therefore very hard to forecast infrastructure requirements. It’s also difficult to see this cycle changing anytime soon.
MB: I like Keith Gordon’s sport analogy to science funding – our science funding approach applied to rugby would mean we would make a list of all the players who played well this year, select a group randomly from these, and then call them the All Blacks. Unfortunately we will always have many more excellent project proposals than we can afford to fund, but this funding uncertainty, coupled with relatively short funding cycles, makes it very difficult to build and sustain long-term research programmes.
MB: I think we’re probably better off with flexible, well-monitored investment over a reasonable timeframe (at least five years) when funding technology-driven research infrastructure like HPC, in contrast to the more rigid time scales and schedules associated with physical infrastructure such as buildings and roads.
MB: NZGL was initially envisioned not as a corporate entity, but as more of a collaborative initiative, along the lines of NeSI or the CoREs. However, autonomy was also seen to be an important ingredient, and as a result, NZGL was created as a company, with clearly defined governance and management structures that were sufficiently independent from the academic institutions. One issue this raises is the danger of divorcing the new organisation from the institutions and the researchers. If the academics are not intrinsically involved, they can easily become disconnected from both the day-to-day operations and the long-term goals of the organisation. NeSI and NZGL management have worked hard to have academic involvement on a daily and weekly basis, trying to maintain high degrees of engagement – but this is challenging, and researchers can also find it difficult to contribute, no matter how supportive they are of the goals. This can be an over-commitment issue – academic time is often offered up in lieu of cash “co-investment” contributions to national initiatives like NZGL and NeSI, but if additional funding is not being made available, then this simply equates to the universities being asked to do more with their existing budgets.
MB: Considering our existing infrastructure, I think that NeSI really benefitted from following NZGL, as considerable groundwork had already been done in terms of negotiating the establishment of a national infrastructure – NeSI were then able to look at what had been set up, and alter the various components to suit their needs. One thing NeSI did differently was to require an up-front cash investment from the partner institutions, whereas NZGL gets much of its co-investment from internal and in-kind contributions – on the one hand this gives NZGL less flexibility in terms of ongoing investment in infrastructure, but on the other it means that the collaborator institutions remain actively engaged with NZGL in investment discussions. Both NeSI and NZGL have made valuable investments in vital physical infrastructure, and at the same time have built internationally competitive teams to support NZ HPC and genomics research. Looking ahead, it will be critical to maintain the collective expertise that has been put in place. Buying HPC and genomics hardware is relatively quick, but growing an effective team is a long-term proposition – I sincerely hope that both initiatives are given the chance to reach their full potential, in an increasingly cooperative fashion, as we move forward.