Machine Learning in Physics – The power and promise


Our ability to generate and analyze large data sets has increased dramatically over the last three decades. This “big data” revolution, spurred by an exponential increase in computing power and memory, has been accompanied by disruptive technologies such as Machine Learning (ML) and Artificial Intelligence (AI) for analyzing and learning from large datasets.

The prodigious rise of ML-based techniques impacts many industrial applications today, including autonomous driving, healthcare, finance, manufacturing, energy harvesting, and more. The goal of these techniques is to recognize patterns in data and to generalize from them to unseen problems.

For example, in a highly complex system such as a self-driving car, vast amounts of sensor data must be turned into driving decisions by a computer that has “learned” to recognize patterns such as “danger.”

In parallel to the rise of ML techniques in similar applications, scientists have become increasingly interested in the potential of machine learning for fundamental research, and physics is no exception.

Machine Learning and Physics

To some extent, this is not too surprising, since ML and physics share some of their methods and goals. Both disciplines are concerned with gathering and analyzing data in order to design models that can predict the behavior of complex systems. Modern machine learning, like physics, prioritizes empirical results and intuition over the more formal approaches found in statistics, computer science, and mathematics.

Many of the core concepts and techniques in machine learning have their roots in physics, such as Monte Carlo methods, simulated annealing, and variational methods. Furthermore, many deep learning methods rely on “energy-based models” inspired by statistical physics. As a result, many aspects of modern ML will be familiar to physicists.
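These shared roots can be illustrated with the Metropolis algorithm, a Monte Carlo method developed in statistical physics that draws samples from a Boltzmann distribution and underlies many sampling techniques now used in ML. Below is a minimal sketch; the harmonic energy function, step size, and sample count are illustrative choices, not taken from any particular study:

```python
import math
import random

def metropolis_sample(energy, n_steps=50000, step_size=1.0, seed=0):
    """Sample from the Boltzmann distribution p(x) ∝ exp(-energy(x))
    using the Metropolis algorithm."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        # Propose a small random move.
        x_new = x + rng.uniform(-step_size, step_size)
        # Accept with probability min(1, exp(-(E_new - E_old))).
        if rng.random() < math.exp(min(0.0, energy(x) - energy(x_new))):
            x = x_new
        samples.append(x)
    return samples

# A harmonic "energy" E(x) = x^2 / 2 corresponds to a standard normal distribution.
samples = metropolis_sample(lambda x: 0.5 * x * x)
mean = sum(samples) / len(samples)
# Second moment; since the mean is ≈ 0, this approximates the variance (≈ 1).
var = sum(s * s for s in samples) / len(samples)
```

The same accept/reject rule, with an annealing schedule on a temperature parameter, is the core of simulated annealing.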

The use of “big data” has been pioneered by physicists and astronomers. Experiments like CMS and ATLAS at the LHC, for example, generate petabytes of data each year. Hundreds of terabytes of data measuring the properties of nearly a billion stars and galaxies are routinely analyzed and released by astronomy projects like the Sloan Digital Sky Survey (SDSS). Researchers in these fields are increasingly incorporating recent advances in ML and data science to gain insight from large datasets.

Machine learning has become indispensable for pattern discovery in large data sets. It has an astonishing array of practical applications in modeling, prediction, classification, visualization, and planning. Machine learning includes many robust methods that can transform raw data into structured information using learning algorithms. It is, therefore, not surprising that these learning algorithms have found their way into physics applications.

1. Data collection from research

The collection of experimental and observational data has always been at the heart of the scientific endeavor. As computing power has grown and new technological capabilities have expanded the scale of data collection in research, the volume and variety of data available to researchers have increased as well. In some areas of science (such as astronomy, particle physics, and genomics), the volumes of data routinely generated in scientific studies are huge. For example, the Square Kilometre Array – a powerful new telescope that will be used to survey the night sky – has the potential to generate more data each second than the internet itself does. As noted earlier, machine learning techniques can directly help the researchers who generate this data to analyze and interpret it.

2. Detecting new particles in physics

In July 2012, physicists from the Large Hadron Collider (LHC) at CERN announced that they had discovered the Higgs Boson, an elementary particle of critical importance to the Standard Model of particle physics that plays a role in giving matter mass.

The Higgs Boson can be created when particles collide at high energy, as happens in the LHC. Once created, the Higgs Boson is extremely short-lived: it decays within about 10⁻²² seconds into other particles, such as pairs of (gamma) photons. Therefore, finding this particle required detecting a specific pattern of decay amidst the other particle collisions and activity in the LHC. Machine learning played a role in helping to detect this pattern.

Using simulations of what the decay pattern of the Higgs Boson would look like, a machine learning system was trained to pick out this pattern from other activities. Having learned what the presence of the Higgs Boson would look like, the system was put to use on data from the LHC, thereby contributing to the discovery.
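The training setup described above can be sketched in miniature. Assuming scikit-learn is available, the toy example below trains a classifier to separate simulated “signal” events (a mass peak near 125 GeV) from a smooth simulated background; the single feature, distributions, and sample sizes are illustrative stand-ins for the far richer data and analyses used at the LHC:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000

# Toy simulation: signal events cluster in invariant mass near 125 GeV...
signal_mass = rng.normal(125.0, 2.0, n)
# ...while background events form a smooth spectrum over the same range.
background_mass = rng.uniform(100.0, 150.0, n)

X = np.concatenate([signal_mass, background_mass]).reshape(-1, 1)
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = signal, 0 = background

# Train on part of the simulated data, evaluate on a held-out part.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier().fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```

Having learned the signal pattern from simulation, such a classifier can then be applied to real detector data, which is the essential workflow the LHC analyses followed on a vastly larger scale.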

Machine learning techniques are used today in many analyses in particle physics, at levels ranging from correctly reconstructing the signals left by individual particles in detectors and distinguishing between particle types, to discriminating signal events from background noise. These techniques help optimize the potential of today’s experiments by increasing the sensitivity of analyses.

3. Finding patterns in astronomical data

Astronomical research generates a lot of data. The Large Synoptic Survey Telescope (LSST), for example, is expected to generate over 15 terabytes of astronomical data each night once it is operational. A key challenge for astronomy in analyzing this data is to separate interesting features or signals from the noise and assign them to the appropriate category or phenomenon. Machine learning can assist by preparing the data for use and detecting features within it.

For example, the Kepler mission seeks to discover Earth-sized planets orbiting other stars, collecting data from observations of the Orion Spur that could indicate the presence of stars or planets. However, not all of this data is useful: measurements can be distorted by onboard thrusters, variations in stellar activity, or other systematic trends. Before the data can be analyzed, these so-called instrumental artifacts must be removed. To do this, researchers developed a machine learning system that identifies such artifacts and removes them before further analysis.
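One simple, widely used approach to flagging such artifacts is iterative sigma clipping, which repeatedly discards points that lie far from the median of a light curve. The sketch below is a generic illustration on synthetic data, not Kepler’s actual (and considerably more sophisticated) pipeline; the noise levels and injected spikes are invented for the example:

```python
import numpy as np

def sigma_clip(flux, n_sigma=3.0, n_iter=5):
    """Iteratively flag points more than n_sigma standard deviations
    from the median as artifacts; return a boolean mask of good points."""
    good = np.ones(flux.shape, dtype=bool)
    for _ in range(n_iter):
        med = np.median(flux[good])
        std = np.std(flux[good])
        good = np.abs(flux - med) < n_sigma * std
    return good

rng = np.random.default_rng(1)
flux = rng.normal(1.0, 0.01, 500)   # a quiet synthetic stellar light curve
flux[[50, 200, 350]] += 0.5         # inject three thruster-like spikes

mask = sigma_clip(flux)
n_flagged = np.count_nonzero(~mask)
```

After cleaning, the remaining points (`flux[mask]`) can be searched for the periodic dips that signal a transiting planet.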

Machine learning is now used to identify new astronomical features, for example:

  • finding new pulsars in existing data sets;
  • identifying the properties of stars and supernovae;
  • correctly classifying galaxies.

As a tool for analyzing these large datasets, detecting previously unforeseen patterns, and extracting unexpected insights, machine learning has become a key enabler for a range of scientific fields, including particle physics, astronomy, the life sciences, and the social sciences, pushing forward the boundaries of science. Continued progress in machine learning research will also shape the social acceptability of its applications and the public’s confidence and trust in them.