HPC and Cloud Operations at CERN
Clouds and Distributed Computing
Time: Monday, 18 November 2019, 10:30am - 11:10am
Description: CERN was established in 1954 with the mission of advancing science for peace and exploring fundamental physics questions, primarily through elementary particle research. The Large Hadron Collider (LHC) at CERN is the world's most powerful particle accelerator, colliding bunches of protons 40 million times every second. This extremely high rate of collisions makes it possible to identify rare phenomena and to declare new discoveries such as the Higgs boson in 2012. The high-energy physics (HEP) community has long been a driver in processing enormous scientific datasets and in managing the largest-scale high-throughput computing centres. Today, the Worldwide LHC Computing Grid is a collaboration of more than 170 computing centres in 42 countries, spread across five continents. Recently, demonstrations at scale on both commercial cloud providers and HPC centres have been performed.
In 2026 we will launch the High Luminosity LHC (HL-LHC), which will represent a true exa-scale computing challenge. The processing capacity required by the experiments is expected to be 50 to 100 times greater than today, with storage needs expected to be on the order of exabytes. Neither the rate of technology improvement nor the computing budget will increase fast enough to satisfy these needs, so new sources of computing and new ways of working will be needed to fully exploit the physics potential of this challenging accelerator. The growth of commercial clouds and HPC centres into the exa-scale represents a huge opportunity to increase the potential total resource pool, but even together this ecosystem may not be sufficient to satisfy the needs of our scientific workflows. The total computing required is pushing us to investigate alternative architectures and alternative methods of processing and analysis. In this presentation we will discuss the R&D activities to utilize HPC and cloud providers. We will summarize our progress and challenges in operating on dedicated resources and on shared and purchased allocations on HPC and cloud. We will outline the biggest impedance mismatches in interoperating these facilities, which often have similar challenges for data handling and scale but very different challenges in flexibility and operations. We will close by addressing forward-looking projects with industry partners that use techniques like machine learning and optimized hardware to fundamentally change how many resources are needed to extract science from the datasets.