Thursday 4 October 2018
Concurrent Session 3B
High Performance Computing On Demand
This session describes the use of Azure for capacity-on-demand high performance computing as an alternative to building an on-premise cluster supported in house. The approach takes advantage of new cloud capabilities to process more datasets faster, producing more effective and cost-efficient results.
First and foremost, this approach supports Lincoln’s “Cloud First” strategy. The alternative was to replace the previous computing resource with an on-premise cluster managed in house, which would age rapidly and require significant management effort over its expected lifespan. Against that, the attractions of an externally managed, always up-to-date platform were obvious. Scalability was also a significant consideration. With a traditional solution, processing capability is limited by the maximum size of the machines that can be purchased, even though they will operate at 100% for only a tiny fraction of their lifetime (and often remain almost entirely inactive for long periods). With a capacity-on-demand model, huge processing power can be commissioned and decommissioned rapidly, allowing analyses to be completed more quickly and making it possible to take on unexpected work at minimal notice. Once the costs of capacity on demand are understood, the model will give much clearer insight into the computing costs of research projects, which can then be built into research proposals as an integral part of the funding request. Up-to-date capacity on demand will mean that the ability to take on additional research is no longer constrained by out-of-date or undersized computing capability.
Lincoln’s Wine, Food and Molecular Biology department (WFMB) had used New Zealand Genomics Limited (NZGL) for its HPC needs. When MBIE withdrew funding from NZGL in late 2017, NZGL was no longer able to continue trading. Unable to find an appropriate substitute, WFMB proposed to build its own on-premise HPC capability and called in IT Services (ITS) to facilitate this.
ITS had been investigating the use of Azure and believed that it was potentially a good solution to WFMB’s needs: it could provide essentially unlimited processing power on demand, which could be switched off when not in use. In concept, Azure was expected to provide a far more powerful platform than could be purchased and run in house, whilst at the same time costing less overall.
On this venture Lincoln ITS decided to work with one of their existing partners, Inde, who already had experience of setting up and supporting Azure environments.
An initial trial VM was created. Although it worked well from a processing perspective, data transfer speeds were unacceptably slow.
Inde then worked with Microsoft to investigate FDT (Fast Data Transfer), which produced a significant leap in data transfer speeds, sufficient to support the work of the researchers.
Another hurdle was that Lincoln researchers were not familiar with setting up their work in an Azure environment. Their view was that setting up VMs required a significant amount of work, so Inde developed a number of scripts that would create and deploy the VMs, perform the analysis, store the results and then close the Azure instance.
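The create, analyse, store and close sequence can be sketched in a few lines of Python. This is a minimal illustration of the pattern only, not Inde's scripts: the function `run_on_demand` and the four callables passed to it are hypothetical stand-ins, and no real Azure calls are made. The point it shows is that the close-down step is placed so that it runs even when the analysis fails, so a machine is never left running and chargeable by accident.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_on_demand(
    provision: Callable[[], str],
    analyse: Callable[[str], T],
    store: Callable[[T], None],
    teardown: Callable[[str], None],
) -> T:
    """Provision a machine, run the analysis, store the result, then tear down.

    All four callables are hypothetical placeholders for whatever actually
    creates the VM, runs the job, saves the output and closes the instance.
    """
    vm_id = provision()
    try:
        result = analyse(vm_id)
        store(result)
        return result
    finally:
        # Runs whether or not the analysis or storage step raised an error.
        teardown(vm_id)


# Stand-in steps that only record the order in which they were called.
events: List[str] = []


def fake_provision() -> str:
    events.append("provision")
    return "vm-1"  # hypothetical identifier


def fake_analyse(vm_id: str) -> str:
    events.append("analyse")
    return "results from " + vm_id


def fake_store(result: str) -> None:
    events.append("store")


def fake_teardown(vm_id: str) -> None:
    events.append("teardown")


outcome = run_on_demand(fake_provision, fake_analyse, fake_store, fake_teardown)
```

In a real deployment each stand-in would be replaced by the corresponding provisioning, job submission, storage and deallocation step.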
There were reservations about the overall cost of running the workstreams in Azure, but these were partially allayed by using the far cheaper (but potentially less enduring) low-cost storage. Long-term budgeting is still uncertain, however, since the amount of time machines will need to be available (and hence chargeable) is not yet sufficiently clear.
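The budgeting uncertainty comes down to simple arithmetic: the charge depends mainly on how many hours machines are available, plus storage. The sketch below makes that explicit. The function name, its parameters and every figure in it are illustrative assumptions, not Azure prices or Lincoln usage data.

```python
def estimate_monthly_cost(
    vm_hours: float,
    hourly_rate: float,
    storage_gb: float,
    storage_rate_per_gb: float,
) -> float:
    """Return compute cost plus storage cost for one month.

    A deliberately simple model: it ignores data transfer charges,
    discounts and any other items a real bill might include.
    """
    return vm_hours * hourly_rate + storage_gb * storage_rate_per_gb


# Illustrative figures only (assumptions, not real prices or usage).
HOURLY_RATE = 2.0          # cost per machine-hour
STORAGE_RATE = 0.01        # cost per GB per month
STORAGE_GB = 500.0

# Machine switched on only for 40 hours of analysis in the month.
on_demand = estimate_monthly_cost(40, HOURLY_RATE, STORAGE_GB, STORAGE_RATE)

# Same machine left running for the whole month (about 730 hours).
always_on = estimate_monthly_cost(730, HOURLY_RATE, STORAGE_GB, STORAGE_RATE)
```

With these made-up inputs the gap between the two totals is driven almost entirely by machine-hours, which is why the unknown amount of machine availability dominates the long-term budget.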
Other concerns relate to the amount of active management required to ensure machines are available only whilst they are needed, and to the level of persistence of the low-cost provisioning option. Only time will tell how these issues resolve, but by October we will know one way or another whether Azure has been proven as a viable platform for High Performance Computing in an environment such as Lincoln.
Dr Darrell Lizamore
Darrell is a postdoctoral fellow at Lincoln University. His research focuses on understanding how plant genetic information changes in response to environmental stress events and on harnessing this potential to produce new horticultural varieties. This involves designing strategies that use high-performance computing to compare entire plant genomes. In addition to his own research, Darrell provides bioinformatics services to the Bioprotection Research Centre and is the founder of Zebra Biotech Ltd., which provides DNA testing and analysis services for the NZ horticultural industry.