Trying out EC2

About halfway through a Monday I decided to stretch my legs a bit and see how long it would take to get EC2 up and running (I got into the beta). Pete Beckman and I sat down in a cafe and found a very nice Exploring Amazon EC2 primer by Jesse Andrews, which made it quite easy. The Amazon docs are also pretty good.

Once you sign up for the service (you have to already be an S3 customer, which I am) you generate your X.509 keypair and download it to your Linux system (which in my case is a Mac). Amazon provides a set of command line API tools that you download and unzip. You set up some environment variables and voilà, you can use the command line utilities. The first thing we did was query for pre-made system images (rather than trying to build our own, which would take longer). You generate a keypair to control your "instance" (virtual machine), select one of these pre-made images, and fire it up.
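For the record, the sequence we followed looked roughly like this using Amazon's EC2 command line tools (the key filenames, AMI ID, and hostname below are placeholders, and I'm reconstructing from memory rather than pasting an exact transcript):

```shell
# Point the tools at the X.509 credentials downloaded from Amazon
export EC2_HOME=~/ec2-api-tools
export EC2_PRIVATE_KEY=~/.ec2/pk-XXXXXXXX.pem
export EC2_CERT=~/.ec2/cert-XXXXXXXX.pem
export PATH=$PATH:$EC2_HOME/bin

# List the pre-made public system images (AMIs)
ec2-describe-images -o amazon

# Generate a keypair to control instances; save the private half
ec2-add-keypair gsg-keypair > ~/.ec2/id_rsa-gsg-keypair
chmod 600 ~/.ec2/id_rsa-gsg-keypair

# Fire up an instance of a chosen image and watch it come up
ec2-run-instances ami-XXXXXXXX -k gsg-keypair
ec2-describe-instances

# Once it reports "running", SSH in using the hostname from describe-instances
ssh -i ~/.ec2/id_rsa-gsg-keypair root@ec2-XX-XX-XX-XX.compute-1.amazonaws.com
```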

We found that within 3-4 minutes the instance was up and running, and we could SSH into it and begin to play. This is not particularly impressive as hacking code goes, but it seems to me that being able to sign up for a web services based computation service and get something up and running within an hour is quite nice!

So I just fired up an instance and I'm running a web server on it. It's costing me $2.40 a day so I'm not gonna leave it up for long, but if you are reading this post within a few days of the timestamp try out my Amazon EC2 virtual webserver! (too late- I left it up for about a month but it's gone now!)


Compute Services - Now we're talking!

A while back IBM started an On Demand business that seems ideally tuned to companies that periodically need a cluster (even a large one) for surge capacity. It's aimed at businesses, and those kinds of arrangements involve a fair amount of initial investment of time and energy to get the paperwork set up, etc. To make this, and the cost, worth doing you need to have a fair amount of computing surge (and plenty of companies do!).

More recently, Sun took this idea of a grid-based compute service further, launching what they called a "Utility Computing" service called Sun Grid. Nice. An individual can sign up and get an account, and in short order start uploading and running (Solaris, x64) applications from their portal. No lengthy setup. Currently Sun charges $1 per CPU hour.

Amazon's rolling out an even more flexible grid computing service. No lengthy setup process, and forget the portal- use web services. Yesterday Amazon announced a beta version of Elastic Compute Cloud, or EC2. Once available beyond beta, one will be able to just sign up and start computing via a SOAP API. It’s coupled with the S3 service, where you store your system images (which include your entire application software environment for your virtual machines) and data. Since it’s a virtual machine, you get root access and it's up to you what software you want to run. The virtual machines are the equivalent of a 1.75 GHz Xeon with 1.75 GB of memory, 160 GB of local disk, and network bandwidth of 250 Mbit/s. Amazon’s pricing - $0.10 per CPU hour.
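The pricing arithmetic is easy to check. A quick sketch, using the rates above and assuming a single instance left running around the clock (the 30-day month is my simplification):

```python
EC2_RATE = 0.10   # dollars per instance-hour (Amazon's announced beta price)
SUN_RATE = 1.00   # dollars per CPU-hour (Sun Grid, for comparison)

hours_per_day = 24
per_day = EC2_RATE * hours_per_day        # cost of one always-on instance per day
per_month = per_day * 30                  # and over a 30-day month

print(f"EC2: ${per_day:.2f}/day, ${per_month:.2f}/30 days")
# -> EC2: $2.40/day, $72.00/30 days
print(f"Sun Grid at the same usage: ${SUN_RATE * hours_per_day:.2f}/day")
```

That $2.40/day is where the figure in my follow-up post about actually running an instance comes from.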

As I mentioned on Monday (talking about Sharing Data) it seems to me that these commercial grid resources and services are worth examining in terms of support for scientific computing. I’m hoping the TeraGrid Science Gateway partners will jump on these and check them out!

Non-Sequitur of the Week: The World's Oldest Botanical Garden.
Earlier this summer we were in Italy on vacation and one of our stops was Padua, near Venice. Besides being the site of one of the world's oldest universities (the University of Padova, founded in 1222), it's a nice base of operations for exploring the area around and including Venice. Strolling around the city we chanced upon the Orto Botanico di Padova (Padua Botanical Gardens), which was magnificent. What we didn't know at the time was that it is the world's oldest, having been established in 1545! If you don't happen to be in Italy there are of course other fine botanical gardens we'd recommend, including the UC Berkeley Botanical Gardens, the Royal Botanic Garden in Edinburgh, Scotland, and of course the Chicago Botanic Garden! (if you have recommendations please post them in comments)

Have you joined Facebook yet?


Social Networks...

Last week I joined Facebook, which is an amazing and exploding social network for university and high-school students, staff, and faculty. I've been trying to figure out how to support and catalyze scientific communities for several years, including designing and launching the GridForge for the Global Grid Forum (now called Open Grid Forum, or OGF). GridForge uses the same underlying platform as Sourceforge.net, Slashdot, and a good number of other sites. But while this seems to be a great platform for supporting individual groups developing things (like software or documents) together, it is not geared toward finding and mapping social networks (i.e. finding colleagues or discovering potential collaborators with overlapping interests, etc.).

As I have explored Facebook I've kept the question in the back of my mind - could this be a useful tool for the scientific community as well? With some minor enhancements I think it very well could be. Already it is useful to see connections between people, find people, and even manage events.

I've searched for colleagues, and it seems many of us have become geezers with respect to new technology like this - most of the 7 million participants are college students or younger. I did find that some of us do not, in fact, believe that it's hip to be square. I was happy to see that Larry Smarr, Ian Foster, and Beth Plale were already in the mix.

It would be great if more people from our community (TeraGrid staff, users, collaborators...) joined Facebook to see if it might be a useful tool for our community.

See you there!

Charlie Catlett's Facebook profile


Sharing data

I've been having lots of discussions with people about data. We typically focus on big data in TeraGrid but it's also important to note that many scientists have "small" volume and performance requirements (relative to supercomputers). Because their requirements are more closely matched to "commodity" solutions, we have more options with which we might help them. For example, users with only a small amount of data (measured in hundreds of MBytes, roughly speaking) could take advantage of a web services approach such as using Amazon's S3.
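To give a flavor of what a "web services approach" to storage means in practice: S3's REST interface authenticates each request by signing a canonical string with HMAC-SHA1 and your secret key. A minimal sketch of just that signing step (the credentials, bucket, and object names here are made up, and this omits the amz- headers, the HTTP request itself, and error handling):

```python
import base64
import hashlib
import hmac

def s3_signature(secret_key, method, date, resource,
                 content_md5="", content_type=""):
    """Sign an S3 REST request: base64(HMAC-SHA1(secret, string-to-sign))."""
    # Canonical string per the S3 REST authentication scheme:
    # Method, Content-MD5, Content-Type, Date, CanonicalizedResource
    string_to_sign = "\n".join(
        [method, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Hypothetical request: store a small results file in a bucket
sig = s3_signature("dummy-secret-key", "PUT",
                   "Mon, 21 Aug 2006 12:00:00 +0000",
                   "/my-science-bucket/results.dat")
print(sig)  # goes into the Authorization: AWS <key-id>:<sig> header
```

The point is that this is all ordinary HTTP plus a little crypto - exactly the sort of thing a science gateway can call from any language, with no special filesystem client.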

A web services approach also offers some advantage over solutions geared toward ultra-large data and ultra-high bandwidth, in that it becomes easier to share the data, which is increasingly important to scientific communities. Our science gateways partners, I believe, would be much more eager to leverage a web services storage resource than one that requires them to integrate with a traditional (in our circles) high-performance filesystem or storage system.

We certainly need to continue to provide, and enhance the high-performance solutions! (we are also pushing toward petaflops computing, which will stretch these systems considerably). Work we are doing, for instance, with the GPFS-WAN environment and with high-performance data movement tools such as GridFTP is very critical. But it's equally important to look at solutions that allow us to better integrate with "web 2.0" and emerging commercial services. As I've spoken over the past weeks to Jim Gray from Microsoft (8/16 post), Vint Cerf from Google (8/7 post), and today to Burton Smith from Microsoft it's quite clear that we should be collaborating in these areas with some amount of enthusiasm. These companies are moving fast, and in the right direction with respect to support for scientific communities. We have a tremendous amount of synergy with companies like Microsoft, Google, Amazon, and others where our customer base overlaps considerably with theirs.

On other fronts...

Ian Foster's blog notes that today is the 10th anniversary of the Globus software project: on this date in 1996, DARPA awarded funding to develop the open source toolkit.

While we're being nostalgic.... Twenty years ago this year we deployed the NSFNET backbone, which used six supercomputer centers as connecting points for regional and campus networks. Five of those centers (NCSA, PSC, NCAR, SDSC, and Cornell Theory Center) are connected today with 1 or more 10 Gb/s links as part of the TeraGrid network. Much improved over the 56 Kbit/s links from 1986...


Science Gateways - Web Services We'd Like to See!

I've been talking with Nancy Wilkins-Diehr, who heads up the TeraGrid Science Gateways initiative, about the common requirements we are seeing bubble to the surface as well over 20 gateways (most of them could be called "eScience Portals") work with us to plug grid resources into these gateways. Nancy's list is very interesting, and it would be good to figure out which of these might already exist (and could be adopted), which we should commission to be developed, etc.

* Resource Status Service (both polling and pub/sub)
* Job Submission Interface (the gateways expect this to be provided by Globus WS-GRAM)
* Job Tracking Interface (both polling and pub/sub)
* File/Data Staging Interface
* Retrieve Usage Information
* Retrieve Inca Info
* Advanced Reservation Interface
* Retrieve user information for a job
* Retrieve accounting information/statistics
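As a concrete (and entirely hypothetical) sketch of the first item on that list, a Resource Status Service supporting both modes might look like the following, where a gateway either polls for a resource's current state or subscribes to have state changes pushed to it. All names here are invented for illustration:

```python
class ResourceStatusService:
    """Toy status service supporting both polling and pub/sub."""

    def __init__(self):
        self._status = {}        # resource name -> last reported state
        self._subscribers = []   # callbacks invoked on every state change

    def publish(self, resource, state):
        """Record a state change and push it to all subscribers."""
        self._status[resource] = state
        for callback in self._subscribers:
            callback(resource, state)

    def poll(self, resource):
        """Polling interface: return the last known state."""
        return self._status.get(resource, "unknown")

    def subscribe(self, callback):
        """Pub/sub interface: register a callback for state changes."""
        self._subscribers.append(callback)

svc = ResourceStatusService()
events = []
svc.subscribe(lambda resource, state: events.append((resource, state)))
svc.publish("hypothetical-cluster", "up")
print(svc.poll("hypothetical-cluster"))  # -> up
```

In practice the pub/sub side would ride on a standard notification mechanism (e.g. WS-Notification) rather than in-process callbacks, but the two access patterns are the essential requirement.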

These are the computation-centric services, but there are many other science gateway services, particularly collaboration and social networking tools, that many gateways are providing, and we would do well to look at what we could adopt from commercial providers of "Web 2.0" capabilities! (some of our gateway partners are doing this already)

Non-sequitur of the week: I just bought four tickets to see one of my favorite bands, Jars of Clay, at the House of Blues in October. I especially like this band's leadership in solving real-world problems through initiatives like the 1000 Wells project, which is in its second year with partners like Africare and has catalyzed lots of local efforts - for example at UC Davis.


Conversations about Data Analysis

Monday I caught up with Jim Gray from Microsoft Bay Area Research Center to talk about web services, storage, and data. He's been working with the NVO and Sloan Digital Sky Survey groups, who obviously have very large data requirements. One angle they are looking at is providing analysis tools at the data archive, rather than the traditional mode of downloading data and analyzing it locally. An example of this kind of service is the Catalog Archive Server.

On a similar topic I met with the Interagency Modelling Analysis Group (IMAG) yesterday to talk about their requirements and how they might use TeraGrid. A message I heard from this group is that there are many scientists who are able to do their computational work locally, and may or may not need to scale their work to use TeraGrid. However, where TeraGrid could really help them would be in providing analysis and visualization services. I mentioned to them that we have a variety of these types of services at the TeraGrid resource provider sites, and that the TACC and UC/ANL resource providers have deployed dedicated resources on which they are operating visualization services for the user community. As a next step we talked about having someone brief IMAG on TeraGrid visualization services and how these compare to the needs of this group of scientists.


CTSS V4 - Draft spec for the Workflow "kit"

As mentioned earlier, we are moving toward an architecture where TeraGrid resources support (a) a core set of services, plus (b) one or more "kits" that provide specific functionality and services. One of the first kits we are discussing is the set of services and software necessary to support workflow on a resource. Lee Liming provides the following overview of the draft:

The purpose of defining this kit is not to redesign TeraGrid's capabilities in this area. Instead, the purpose is to make sure that we understand and document what the current capabilities are so that we have a solid baseline for subsequent changes. (Along the way, we can use it to improve the user documentation for the existing capabilities.) The "Future directions" section outlines what we currently know about likely subsequent changes. If there are any simple, non-controversial, beneficial, easy-to-implement enhancements that can be identified now, this would be a reasonable time to make them, so please speak up if you see something like this. Anything else (non-trivial, controversial, difficult to implement) will probably be put off until the next design cycle. HOWEVER, this would be a reasonable time to identify those things so that we can begin planning for them.

The draft is available for download from repo.teragrid.org and we are eager to see discussion as this moves forward! (note that the online draft is currently available only to TeraGrid participants, but I'd be happy to email it to you if you're interested but not a participant)



A Roadmap to Attribute-Based Authorization

At the end of August we're holding an internal TeraGrid workshop to look at a number of important issues related to authorization, audit, accounting, and security. One of the things we will be talking about (as I've mentioned in a few other posts) is attribute-based authorization. Von Welch (GridShib project) has been editing a paper, Scaling TeraGrid Access: A Roadmap for Attribute-based Authorization for a Large Cyberinfrastructure, with several of us helping as co-authors. It is worth looking at even in draft form. Comments more than welcome!
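The core idea, very roughly: the authorization decision keys off attributes asserted by a trusted identity provider (the Shibboleth/GridShib model) rather than off per-user identity lists maintained by every resource. A toy sketch - the policy structure, attribute names, and project ID below are all invented for illustration, not anything from the paper:

```python
# Toy policy: which gateway actions are allowed, keyed on asserted
# attributes (e.g. from a campus identity provider), not on individual DNs.
POLICY = {
    "submit-job": {
        "affiliation": {"faculty", "staff", "student"},
        "project": {"TG-EXAMPLE1"},
    },
}

def authorize(action, attributes):
    """Grant iff every required attribute has an allowed value."""
    required = POLICY.get(action)
    if required is None:
        return False  # default deny for unknown actions
    return all(attributes.get(name) in allowed
               for name, allowed in required.items())

print(authorize("submit-job",
                {"affiliation": "faculty", "project": "TG-EXAMPLE1"}))  # -> True
```

The attractive property is that when a student graduates or a project ends, the campus or allocation system changes the attribute assertion once, and every relying resource's decisions update automatically.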



Non-sequitur of the week: One of the best places to rock-climb in the (OK, flat) midwest is Upper Limits. They've converted some old grain silos (it's the midwest...) in Bloomington, Illinois into a rock climbing gym (they have another gym in St. Louis). A few dozen 60 foot top-roped routes inside several silos, good bouldering inside and outside, and a couple of 100 foot routes up the side of the building outside. (I'd provide a Google Earth pointer but it's under a cloud....)


Dividing and Conquering - CTSS V4

In the first instantiation of TeraGrid (nearly 5 years ago) we defined a set of software and services that would be run on all TeraGrid platforms. The purposes were to (a) provide users with a consistent experience on all TeraGrid systems and (b) integrate those systems with important functions such as single-signon, remote job submission, system-wide accounting, etc.

Affectionately named after the early Cray Time Sharing System (operating system), the Coordinated TeraGrid Software and Services (CTSS) v1 had many more components than necessary, as we determined after users began to arrive, so CTSS v2 was more streamlined. A few months ago we moved to an even more streamlined CTSS v3, and we're still doing a post-mortem on how that transition went and how we can continue to smooth the evolution of the TeraGrid software environment. But the challenge of this software stack approach grows with the number of resources, and it becomes significantly harder as those resources become more diverse.

As we have been developing plans for CTSS v4 we are evolving our approach. V4 will consist of a set of required core services (authorization, information services, verification and validation, and accounting/audit) along with an optional set of "kits" that provide specific functionality (workflow support, program development, job execution, etc.). Each TeraGrid resource provider will run the core components and will select the set of kits they will support on each of their systems. Some may elect only to support job execution and workflow. Others may also support the common program development environment. We are initially looking at a total of eight discrete kits, and these will be fleshed out over the next few months.
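One way to picture the v4 model: every resource runs the core, advertises a subset of kits, and a gateway or workflow matches its requirements against what each resource supports. A hypothetical sketch (the kit names paraphrase the ones above; the resource names are invented):

```python
# Required core services: every TeraGrid resource provider runs these.
CORE = {"authorization", "information-services",
        "verification-validation", "accounting-audit"}

# Optional kits advertised per resource (illustrative choices only).
KITS_BY_RESOURCE = {
    "hypothetical-cluster-a": {"job-execution", "workflow"},
    "hypothetical-cluster-b": {"job-execution", "workflow",
                               "program-development"},
}

def capabilities(resource):
    """Full capability set for a resource: the core plus its kits."""
    return CORE | KITS_BY_RESOURCE.get(resource, set())

def candidates(needed):
    """Resources whose advertised capabilities cover the needed set."""
    return [r for r in KITS_BY_RESOURCE if set(needed) <= capabilities(r)]

print(candidates({"workflow"}))             # both clusters qualify
print(candidates({"program-development"}))  # only cluster b qualifies
```

The interesting engineering is of course in keeping those advertisements honest - which is exactly why verification and validation sits in the required core.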

This approach moves us toward what I believe is an achievable, though challenging, goal. I'd like for CTSS V5 to be a set of service descriptions, leaving the implementation of those services at the discretion of the resource provider. Certainly we will continue to integrate and package software that can be used to implement TeraGrid CTSS, but the use of our implementations should not be necessary for a resource to interoperate within TeraGrid.

This is an approach that I think can work where resources reside in different Grid facilities as well. My view is that a science gateway user should be able to put together a workflow that harnesses resources in TeraGrid or other Grids, with the only key consideration being that the user is authorized to use those resources.

CTSS v4 will be an important move for the TeraGrid community, as the benefits to this modularization can only be harnessed by rigorous change management and a commitment to providing the tools that users and science gateway partners need in order to navigate resources with different combinations of services. We're starting with a careful evaluation of how the CTSS v3 transition has gone, and we will be engaging both resource providers and end users as we begin to specify the first set of kits.

If you're interested in helping out, please contact Lee Liming, who is leading our CTSS v4 planning efforts.



Conversation with Vint Cerf

I had breakfast with Vint Cerf this morning, and we spent about 90 minutes talking about common interests Google and TeraGrid might have. There are many! We talked about Web Services and the Science Gateway initiative as well as how people are using TeraGrid today. We'll probably follow up with me giving a talk about TeraGrid out at Google and some introductions to the relevant technical people at Google to pursue further discussions.

Vint also gave the keynote today at the EDUCAUSE Snowmass meeting. His talks (which lately he gives without slides) are fast-moving intellectual tours through the spaces where technology, policy, applications, and humans mix. He covered a variety of very interesting topics, for example offering insight into issues including net neutrality, the need for improved security in applications and operating systems, and some of the challenges to effective search (of both Internet and other content). He closed, of course, with an update on the Interplanetary Internet. As always, a very fun and stimulating talk!



Campus Partnerships

This morning I gave the opening talk at the Campus Cyberinfrastructure Workshop here in Snowmass, Colorado. It was a good opportunity to talk with leaders from the roughly 40 universities here about how TeraGrid can partner with campuses on key issues of Cyberinfrastructure. We've been looking at the following programs, each of which we want to flesh out with campus partners over the next several months:

- Cooperation in authorization infrastructure (wouldn't it be great for a campus user, assuming strong identity management and authentication on campus, to use his or her campus credentials to access TeraGrid...)

- Cooperative computational and storage/data management services (integration of capability resources such as TeraGrid with capacity resources on campuses)

- Federation of digital assets (data collections, etc.)

- An "affiliates" program for outreach beyond R1 institutions and for coordinating support for TeraGrid users, enabling campus staff to work with TeraGrid staff to support those users.

- Education and Training partnership to develop and harvest best-of-breed curricula and programs aimed not only at the next generation workforce (K-20) but at continuing education for today's workforce.

Gary Bertoline (bertoline@purdue.edu) and Scott Lathrop (lathrop@mcs.anl.gov) are organizing these discussions - and always looking for more participants in those discussions.

(my location via Google Earth)

NSF draft Cyberinfrastructure Plans (v7)

I was thrilled to see that NSF posted a new version of their Cyberinfrastructure Vision document. The last draft (v5) was posted in January, but this new draft (v7) is more complete and incorporates suggestions and changes from the community.

This is really worth downloading and reading, and it's still a draft- they are eager for input from the community so give it to them (they are listening!).

You can find the report at the NSF Office of Cyberinfrastructure website. The document is called "CI Vision" version 7.1. While you're at the OCI website you might also check out the "Report of Blue-Ribbon Advisory Panel on Cyberinfrastructure" (also called the Atkins report) which is still quite relevant 3.5 years after it was published.

(my location via Google Earth)


PKI and Multi-level Authentication

I'm at an Educause confab in Snowmass this weekend. Today was the first day of a Public Key Infrastructure summit of the Identity Management Working Group. The PKI meeting today consisted of some interesting case studies from several universities and companies. Universities, particularly large ones (with tens of thousands of students, faculty, and staff), have a significant challenge with scale (and thus cost!).

I was particularly impressed with the work done at the University of Wisconsin in evaluating various approaches to PKI and to devices, such as USB keys, that hold credentials (this was recently outlined in detail in an Educause online seminar). They looked at the initial and 10-year costs of developing their own system from open source, developing their own system with commercial pieces, or partnering with key vendors to provide them with a solution. It turned out, when taking staff investment into account, that the vendor-partner solution was the most cost effective both up front and over ten years... and involved the shortest time to deploy a solution. The UW PKI Lab, as well as collaborators at Dartmouth, have done a good amount of investigation in this area over the past few years.

Given the difficulty of getting staff, faculty, and students to buy into an extra device to carry around, it was suggested that ideally the PKI vendors might consider developing toward the use of cell phones and iPods to hold credentials. (I thought this was a cool idea)

There were also overviews of Apple's approach to PKI and a presentation from Aladdin on the scale of the problem of identity theft in universities and labs. One of the innovative tools we are looking at within TeraGrid was discussed - the myVocs system from the UAB Advanced Technology Lab.

myVocs is a good example, along with Shibboleth and the GridShib project, of technology that TeraGrid can leverage in a two-way partnership with campuses. This is, after all, where most of our users live, and so we are seeking ways to lower barriers to their use of TeraGrid while (just as importantly) improving the security of our systems.

Tomorrow is a workshop where about 40 universities will be talking about how to work together to create cyberinfrastructure. More on that later this weekend.

(my location via Google Earth)


Outreach to Educators

This week TeraGrid partnered with NCSA and the Shodor Education Foundation to hold a 5-day workshop at Chicago State University. There were about 30 faculty from Chicago State as well as local community colleges and local K-12 schools. The hands-on workshop exposed these faculty to interactive tools they can use to teach students about computational science. In addition to high-performance resources such as those provided by TeraGrid, the participants learned about a range of computational science education resources available through the National Science Digital Library.

This is a neat example of the kind of workshops that are happening around the country with TeraGrid resource providers and other partners - as you can see at the TeraGrid Education, Outreach, and Training website. We are leveraging the annual high-performance computing SCxx conferences to further strengthen the nation's ability to prepare the next generation of our workforce. Purdue is leading the SC06 education program, and TeraGrid will be coordinating the education programs at SC07 and SC08. Last week Scott Lathrop, TeraGrid's director of education, kicked off a planning meeting held at Argonne National Laboratory with 40 education leaders from almost as many organizations.

(my location via Google Earth)



If you are involved in the National Science Foundation's TeraGrid project you know that we are just about a year into "bootstrapping" a virtual organization to operate and enhance the virtual facility we built over the past few years, serving a set of communities comprising several thousand scientists and educators. We have many mechanisms for communication, but most of it is rather one-way, which has its place of course.

Here I'd like to have a dialog about issues of urgency and/or importance to the TeraGrid community, including "internal" staff as well as our user community.

During TeraGrid'06 last month I talked about our overall mission and our strategies, and I'd like to note them here as a way of kicking off this blog.

What are we trying to accomplish with the TeraGrid facility? Our mission is to "create integrated, persistent, and pioneering computational resources that will significantly improve our nation's ability and capacity to gain new insights into our most challenging research questions and societal problems." In order to pursue this mission, we take an integrated approach to the scientific workflow, including obtaining access, application development and execution, data analysis, collaboration, and data management.

That integrated approach currently involves three organizing principles, or areas of focus:

DEEP- To ensure that scientists can exploit the enormous power of the TeraGrid resources (currently well over 100 TF in aggregate) as an integrated system, we have staff at resource provider sites who work shoulder-to-shoulder with science teams. This program, "Advanced Support for TeraGrid Applications" (ASTA), assigns 1/4 to 1/2 of a support person to a scientific application team for 4 to 12 months, working with that team to harness TeraGrid capabilities of particular importance to their scientific goals. The ASTA program supports roughly a dozen teams at a time, with a goal of 20-25 teams supported per year. We are exceeding this goal, and many of their stories are featured at the main TeraGrid website.

WIDE- For over 2 decades the NSF high performance computing program, including TeraGrid, has served several thousand users very effectively. However, NSF alone funds tens of thousands of scientists, most of whom have computational requirements that do not frequently require supercomputers. The TeraGrid Science Gateways program is a set of partnerships with discipline-specific teams who are providing computational infrastructure (in most cases either a specialized web portal or a community-organized grid) for their science communities. The partnerships involve integrating TeraGrid as a computational and data management “service provider” embedded in the science-community cyberinfrastructure. We have over twenty such gateway partners working with us today.

OPEN- TeraGrid began (in 2001) as an infrastructure involving four partner sites, grew to nine sites, and is currently organized as a set of resource providers and a core "grid infrastructure group" (GIG) that coordinates and provides common software services and support as well as planning, architecture, management, and operations. The TeraGrid software and services architecture is service-oriented, stressing open standards as deployed in key software such as the Globus Toolkit GT4, Condor, and other tools.

The TeraGrid organization is also open in that we anticipate more resource providers over time. In order to partner with the broader community of universities and other service and resource providers, we are working with colleagues in Educause, Internet2, EPIC, and other communities to design, together, a set of “campus partnerships” during 2006. The goal of these partnership programs is to work with campuses (where most TeraGrid users reside) to improve and streamline TeraGrid access from campus systems, as well as to work with campuses to develop a set of frameworks that can be used to create national-scale cyberinfrastructure for computation and for data federation. These programs will explicitly reach beyond the R1 institutions to include the broader R&E community.

That's a lot for one blog post... I'm interested in your comments on any of these topics and will be diving into them over the next few weeks in a more focused way, taking one or two projects at a time.

(my location via Google Earth)