Virtual Machines and Types of Service for TeraGrid Computing

Foundational capabilities we provide in TeraGrid, such as "roaming" access and a "coordinated" software environment, open new possibilities, both for more specialized services and for letting TeraGrid, as a system, respond to supply and demand. For example, a resource provider might raise the "price" of a queue to reduce demand and so improve turnaround time, or lower the price to increase demand (and thus utilization).

We are also looking at ways to support on-demand services for urgent computing, through projects like Pete Beckman's Spruce work. The tricky part is servicing an on-demand job when keeping supercomputers on hot standby is not a viable option! We have considered, for example, offering a 'preemptible' service on a particular resource, where users are charged a lower rate in exchange for accepting that their jobs may be killed to make room for an on-demand job.
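To make the trade-off concrete, here is a minimal sketch of how such a preemptible service might work. Everything in it is a hypothetical illustration, not actual TeraGrid or Spruce policy: the `Job` record, the `charge` and `make_room` functions, the 50% discount, and the rule of preempting the largest preemptible jobs first are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical rates in service units per CPU-hour (assumed, not real TeraGrid policy).
NORMAL_RATE = 1.0
PREEMPTIBLE_RATE = 0.5


@dataclass
class Job:
    name: str
    cpus: int
    hours: float
    preemptible: bool


def charge(job: Job) -> float:
    """Bill a job; preemptible jobs pay the discounted rate."""
    rate = PREEMPTIBLE_RATE if job.preemptible else NORMAL_RATE
    return job.cpus * job.hours * rate


def make_room(running: list[Job], free_cpus: int, needed: int) -> list[Job]:
    """Pick preemptible jobs to kill so an on-demand job needing `needed` CPUs can start.

    Returns the jobs to preempt (empty if enough CPUs are already free).
    Raises RuntimeError if even killing every preemptible job is not enough.
    """
    victims: list[Job] = []
    # Largest first: this reaches the CPU target while killing the fewest jobs.
    candidates = sorted(
        (j for j in running if j.preemptible), key=lambda j: j.cpus, reverse=True
    )
    for job in candidates:
        if free_cpus >= needed:
            break
        victims.append(job)
        free_cpus += job.cpus
    if free_cpus < needed:
        raise RuntimeError("not enough preemptible capacity for the on-demand job")
    return victims


if __name__ == "__main__":
    running = [
        Job("a", 64, 10, True),
        Job("b", 256, 5, False),
        Job("c", 128, 2, True),
    ]
    print([j.name for j in make_room(running, free_cpus=32, needed=150)])
    print(charge(running[0]), charge(running[1]))
```

With 32 CPUs free and an urgent job needing 150, only job "c" is preempted. Its owner accepted that risk in exchange for paying half the normal rate. Non-preemptible jobs such as "b" are never touched.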

It is worth considering virtual machine technology to provide an even better 'preemptible' service, or even to migrate running jobs elsewhere when an on-demand request arrives. One might even consider migrating the jobs to a commercial service such as Amazon's EC2!

Many people have demonstrated moving virtual machine images around with virtually no disruption to the application. The TeraGyroid collaboration between TeraGrid and the UK RealityGrid project is one example, and at iGrid2005 Franco Travostino and others demonstrated job migration.

Of course, the applications best suited to a virtual machine service are ensembles of single-processor jobs without large data requirements. But a large number of our users have applications that fit exactly this profile, so such a service is worth investigating. Further down the road we will want to support message-passing (multiple-processor parallel) jobs, as well as the data-staging needs of data-intensive applications. Not being able to solve those problems just yet shouldn't stop us from looking at services that handle the simpler cases!

(at OGF/GlobusWorld)

