Sharing data

I've been having lots of discussions with people about data. We typically focus on big data in TeraGrid, but it's also important to note that many scientists have "small" volume and performance requirements (relative to supercomputers). Because their requirements are more closely matched to "commodity" solutions, we have more options with which we might help them. For example, users with only a small amount of data (measured in hundreds of MBytes, roughly speaking) could take advantage of a web services approach, such as Amazon's S3.
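To make the idea concrete, here is a minimal sketch of what talking to S3 over its REST interface looks like, using the classic HMAC-SHA1 request-signing scheme (AWS Signature Version 2). The bucket name, object key, and credentials below are placeholders I've invented for illustration, not real resources.

```python
# Sketch: signing a simple GET request for a small object in Amazon S3.
# Uses only the Python standard library; credentials are placeholders.
import base64
import hashlib
import hmac
from email.utils import formatdate

ACCESS_KEY = "AKIDEXAMPLE"            # placeholder access key ID
SECRET_KEY = "wJalrXUtnFEMI/K7MDENG"  # placeholder secret key


def sign_s3_request(verb, bucket, key, date=None):
    """Build the Date and Authorization headers for a basic S3 request."""
    date = date or formatdate(usegmt=True)
    # String-to-sign: Verb, Content-MD5, Content-Type, Date, then resource.
    string_to_sign = f"{verb}\n\n\n{date}\n/{bucket}/{key}"
    digest = hmac.new(SECRET_KEY.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    return date, f"AWS {ACCESS_KEY}:{signature}"


date, auth = sign_s3_request("GET", "my-science-data", "results/run42.dat")
print(auth)
# An HTTP GET to http://my-science-data.s3.amazonaws.com/results/run42.dat
# carrying "Date: <date>" and "Authorization: <auth>" headers would then
# fetch the object (given a real bucket and real credentials).
```

The point is how little machinery is involved compared with deploying a parallel filesystem client: any group that can issue signed HTTP requests can read and write shared data.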

A web services approach also offers some advantage over solutions geared toward ultra-large data and ultra-high bandwidth, in that it becomes easier to share the data, which is increasingly important to scientific communities. Our science gateways partners, I believe, would be much more eager to leverage a web services storage resource than one that requires them to integrate with a traditional (in our circles) high-performance filesystem or storage system.

We certainly need to continue to provide, and enhance, the high-performance solutions! (We are also pushing toward petaflops computing, which will stretch these systems considerably.) Work we are doing, for instance, with the GPFS-WAN environment and with high-performance data movement tools such as GridFTP is very critical. But it's equally important to look at solutions that allow us to better integrate with "web 2.0" and emerging commercial services. Having spoken over the past weeks with Jim Gray from Microsoft (8/16 post) and Vint Cerf from Google (8/7 post), and today with Burton Smith from Microsoft, it's quite clear that we should be collaborating in these areas with some amount of enthusiasm. These companies are moving fast, and in the right direction with respect to support for scientific communities. We have a tremendous amount of synergy with companies like Microsoft, Google, Amazon, and others, whose customer bases overlap considerably with ours.

On other fronts...

Ian Foster's blog notes that today is the 10th anniversary of the Globus software project: on this date in 1996, DARPA awarded the funding to develop the open source toolkit.

While we're being nostalgic... Twenty years ago this year we deployed the NSFNET backbone, which used six supercomputer centers as connecting points for regional and campus networks. Five of those centers (NCSA, PSC, NCAR, SDSC, and the Cornell Theory Center) are connected today with one or more 10 Gb/s links as part of the TeraGrid network. Much improved over the 56 Kbit/s links of 1986...
