CSE Distinguished Lecture
Speaker: Kathy Yelick
Professor, Department of Electrical Engineering and Computer Sciences, UC Berkeley
Associate Laboratory Director, Computing Sciences, Lawrence Berkeley National Laboratory
Date: Tuesday, March 27
Location: Klaus Advanced Computing Building
In the same way that the Internet has combined with web content and search engines to revolutionize every aspect of our lives, the scientific process is poised to undergo a radical transformation based on the ability to access, analyze, and merge complex data sets. Scientists will be able to combine their own data with that of other scientists, validating models, interpreting experiments, re-using and re-analyzing data, and making use of sophisticated mathematical analyses and simulations to drive the discovery of relationships across data sets. This “scientific web” will yield higher quality science, more insights per experiment, a higher impact from major investments in scientific instruments, and an increased democratization of science, allowing people from a wide variety of backgrounds to participate in the scientific process.
What does this “big science data” view of the world have to do with exascale computing, which has primarily targeted modeling and simulation? Scientists have always demanded some of the fastest computers for simulations, and while this demand has not abated, there is a new driver for computer performance: the need to analyze large experimental and observational data sets. With exponential growth rates in detectors, sequencers, and other observational technology, data sets across many science disciplines are outstripping the storage, computing, and algorithmic techniques available to individual scientists.
In this talk I will describe some examples of how science disciplines such as biology, materials science, and cosmology are changing in the face of their own data explosions, and how this will lead to a set of open questions for computer scientists due to the scale of the data sets, the data rates, inherent noise and complexity, and the need to “fuse” disparate data sets. What is really needed for data-driven science workloads in terms of hardware, systems software, algorithms, and programming environments, and how well can those be supported on systems that also run simulation codes? How will the imminent hardware disruptions affect the ability to perform data analysis computations on future systems?
Katherine (Kathy) Yelick is a Professor of Electrical Engineering and Computer Sciences at UC Berkeley and the Associate Laboratory Director (ALD) for Computing Sciences at Lawrence Berkeley National Laboratory (LBNL). Her research is in high performance computing, programming languages, compilers, parallel algorithms, and automatic performance tuning. She currently leads the Berkeley UPC project and co-leads the Berkeley Benchmarking and Optimization group. As ALD for Computing Sciences at LBNL, she oversees the National Energy Research Scientific Computing Center, the Energy Sciences Network, and the Computational Research Division, which covers applied math, computer science, data science, and computational science.
Host: Jason Riedy