Philadelphia, PA—Whether you’ve watched an elaborate weather forecast, made an online purchase, or received personalized news stories in your inbox in recent years, you’ve likely seen “big data” in action.
Big data is everywhere these days: in personalized ad targeting, weather and climate modeling, and flu trend analysis, to mention just a few examples.
Ever-increasing amounts of data are now available thanks to many modern realities: e-commerce and transaction-based information that has been stored over the years, data streaming in from growing social media activity and rising Web traffic, and sensor data from the increased use of digital sensors in industrial equipment, electrical meters, automobiles, and satellites, for example. With decreasing storage costs, archiving this data has also become easier than ever.
This information explosion gives rise to the need for better analysis and interpretation of data in order to make accurate predictions from it. This is where mathematics, computational science, and statistics devoted to extraction and analysis of big data come in.
At the “Big Data” panel at the SIAM Annual Meeting held in July, researchers from various fields gathered to discuss how large-scale data is playing an increasingly important role in fields as diverse as bioinformatics, high energy physics, and social science.
Sastry Pantula, Head of the Division of Mathematical Sciences at the NSF, points out the strengths that mathematicians and statisticians bring to big data analysis, specifically with regard to machine learning, pattern recognition, and spatio-temporal models.
Machine-learning software, for instance, can crawl computer systems or review users’ e-mail habits to discern patterns in user behavior, much as search engines crawl the Internet looking for websites and links. This can help determine how specific software applications are used, and which may be prone to success or failure. Pattern recognition also becomes important as we find our way around mountains of data and try to glean relevant information from them. Computers can find patterns in data and thus help users focus on the most significant content. Spatio-temporal models, in turn, help find patterns that vary with time and space. For example, consumer shopping patterns may have peaks during the holiday season or prior to weather-related events such as hurricanes.
Pantula also emphasizes the importance of multifaceted projects with regard to using big data. “I think there is strength in combining efforts from various agencies,” he says. Pantula says projects like the Obama administration’s “Big Data Research and Development Initiative” can bring a lot of awareness to the research and utilization of large-scale data. “Having that announcement brought a lot of visibility to big data issues, whether from companies or from academia. Such initiatives are a very good thing.”
Stanford University’s Gunnar Carlsson studies the shape of data, which involves measuring and mapping their depth, multidimensionality, and scale. Such analytical methods, which focus on the mathematics of shape recognition, are crucial for addressing the high-dimensional aspects of large data sets. Carlsson explains how statistical and mathematical tools can help glean as much information from data as possible. “We’ve made a lot of advances in storage of data,” he says. “Extracting that knowledge requires development of mathematical techniques, since math can do it more effectively.”
Emily Shuckburgh of the British Antarctic Survey uses theoretical studies and numerical modeling to simulate the dynamics of oceans and the atmosphere to improve predictions of future climate change. She works with data from various sources, including climate models, satellites, and direct observations of the environment, all of which go into projecting future climate patterns.
Shuckburgh explains two complementary approaches for analyzing data. “There are two different ways that you can look at scientific data. One is from a physical perspective: understanding the physics of what’s going on and the physics that’s governing that dataset. The other way is to look at it in more statistical ways, to try and use inference techniques to understand the patterns and structures in that data. Those are two complementary approaches, and I think one of the interesting challenges is how to combine these two.”
In addition to methods used to analyze big data, the panel also points out the need to be cautious with interpretations.
The source of the data should be taken into consideration, along with any pre-processing that may have occurred before the data reached the analyst. The sheer size and complexity of the available data make discerning valuable and significant information that much more challenging.
In addition, reading too much into data can lead to false interpretations and misguided discoveries. As panelist Tammy Kolda of Sandia National Laboratories says, “The scary thing about data analysis is if you stare at something long enough you’re going to find something.” William Harrod of the DOE’s Office of Science says that researchers should try to gather real data rather than data that is artificially generated.
At the same conference, a minisymposium devoted to large graph analytics reviewed various application areas for large data sets, such as bioinformatics, cybersecurity, social media, and the Web. Algorithms, tools, and technologies that are being used to study data were discussed by Ali Pinar of Sandia National Laboratories; David Bader of the Georgia Institute of Technology; and Nicholas Arcolano and Jeremy Kepner of MIT.
# # #
The Society for Industrial and Applied Mathematics (SIAM), headquartered in Philadelphia, Pennsylvania, is an international society of over 14,000 individual members, including applied and computational mathematicians and computer scientists, as well as other scientists and engineers. Members from 85 countries are researchers, educators, students, and practitioners in industry, government, laboratories, and academia. The Society, which also includes nearly 500 academic and corporate institutional members, serves and advances the disciplines of applied mathematics and computational science by publishing a variety of books and prestigious peer-reviewed research journals, by conducting conferences, and by hosting activity groups in various areas of mathematics. SIAM provides many opportunities for students including regional sections and student chapters. Further information is available at www.siam.org.
[Reporters are free to use this text as long as they acknowledge SIAM]