These are just a few of the terms and concepts that are becoming prominent in more and more of the research in computer science and particularly in the […]. They are part of an emerging shift towards research that involves large amounts of computing power but additionally depends on the analysis of massive amounts of data to drive scientific discovery. Fields such as astrophysics, high-energy particle physics, biology, oceanography, geoscience, and environmental science are already building instruments that are capable of creating petabytes of data per day. And in computer science we are beginning to see practical approaches to machine learning, language translation, and image processing that improve almost linearly with the amount of computing power and data available.
It would be reasonable to ask at this point why today’s supercomputing centers don’t meet these needs. Of course, to some extent they do. But the shift is not just in raw computing power. It is in the need to interact with and maintain large amounts of data, and in the potential need to interact dynamically with both users (researchers) and instruments. An analogy can be made to web search engines, which need to provide interactive access to web data for millions of users while simultaneously maintaining and dynamically updating the database. This is in stark contrast to today’s supercomputers, which are essentially (very powerful) batch processors.
What is both exciting and daunting is that there are so many basic computer science questions on how to reason with very large numbers of processors on highly data-intensive problems. The fact that industry is also becoming data-intensive makes this all the more interesting, because there is now the possibility that industrial computing infrastructure might also be useful for research computing.
For more than a year we have been in discussions with Google on ways to provide our faculty and students with access to Google-scale computing facilities for research in a wide range of areas, including machine learning, language translation, parallel and distributed algorithms, image processing, proteomics, and many others. Other top departments have been doing the same, most notably the University of Washington, which has also staked out a leadership position on this. In October […] “to give hardware software and services to augment university curricula and grow research horizons.” Besides CMU and UW, the other departments involved in this announcement included MIT, Berkeley, Stanford, and Maryland. The […] explaining that “the nation’s elite universities do not provide the technical training needed for the kind of powerful and highly complex computing Google is famous for,” in part due to the lack of large-scale computing infrastructure.
We are, of course, thrilled and impressed that Google and […] have been so proactive in pushing the cloud computing concept, and happy to play a part in its creation. Google and IBM have also taken some of the steps needed to coordinate with the […], an important step toward figuring out how to coordinate community access to this resource. I know that there are a lot of us ready to make big use of the “cloud” as soon as it becomes available. We will almost certainly learn a lot, and I suspect the CS curriculum will be affected in some pretty fundamental ways.
“Yahoo!’s program is intended to supplement its leadership in […], an open source distributed computing sub-project of the […], to enable researchers to modify and evaluate the systems software running on a 4,000 processor supercomputer… Called the M45,… it [is] among the top 50 fastest supercomputers in the world.
Carnegie Mellon University will be the first institution to take advantage of Yahoo!’s M45. Leading systems software researchers […] and … will instrument the system and evaluate its performance.”
The press release goes on to say that […] and […] will also be among the first to use M45 to enable their cycle-and-data hungry research applications to make much faster progress than was previously possible. The fact that this can take place using open-source Hadoop software, including newer Yahoo! language developments such as […] (developed by our own Chris Olston), is a boon for us and the field.
Coinciding with these developments is the appointment of […] as the new director of the […]. Dave has a long history of research in high-performance computing, with notable contributions in […] and […]. As the new lablet director, Dave is promoting a “big data” initiative to investigate the core problems in data-intensive computing. This will undoubtedly have a big impact on what we do, given the history of very close collaborations between our department and the lablet and the traditional research interests we have had in high-performance storage systems and information-retrieval problems.
A great deal of credit must be given to […], who was the key person in negotiating the agreement with Yahoo!. Also impressive is the impact of Randy’s leadership in developing the DISC ([…]) concept. While it would be incorrect to say that DISC sparked the Google/IBM and Yahoo! initiatives, it is certainly the case that DISC has been extremely important in raising awareness of the need for data-intensive computing and of the possibilities in alternative large-scale computing platforms for research. I’ve played a role too, by spearheading the university’s Next-Generation Computing Initiative to ensure that the basic research needed for data-intensive research is made available here.
Well, enough bragging about CMU Computer Science. :-) These developments are certainly not exclusive to CMU. All of the top departments are moving rapidly in this direction and making important contributions. In fact, I see this as a major trend for the whole field.
Related article:
http://www.csdhead.cs.cmu.edu/blog/2007/11/14/big-cycles-big-data-the-next-generation-of-computing/