WHAT YOUR NEXT EXPERIMENT'S DATA WILL LOOK LIKE: EVENT STORES IN THE LARGE HADRON COLLIDER ERA
Each new generation of collider experiments confronts the challenge of delivering an event store with at least the performance and functionality of current-generation stores, in the presence of an order of magnitude more data and new computing paradigms (object orientation just a few years ago; grid and service-based computing today). The ATLAS experiment at the Large Hadron Collider, for example, will produce 1.6-megabyte events at 200 Hz, an annual raw data volume of 3.2 petabytes. With derived and simulated data, the total volume may approach 10 petabytes per year. Scale, however, is not the only challenge. In the Large Hadron Collider (LHC) experiments, the preponderance of computing power will come from outside the host laboratory. More significantly, no single site will host a complete copy of the event store: data will be distributed, not simply replicated for convenience, and many physics analyses will routinely require distributed (grid) computing. This paper uses the emerging ATLAS computing model to provide a glimpse of how next-generation event stores are taking shape, touching on key issues in navigation, distribution, scale, coherence, data models and representation, metadata infrastructure, and the role(s) of databases in event store management.
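The quoted figures can be checked with a back-of-envelope calculation. The sketch below assumes roughly 10^7 live seconds of data-taking per year, a common rule of thumb for collider running that is not stated in the text itself:

```python
# Back-of-envelope check of the ATLAS raw-data rate quoted above.
# ASSUMPTION: ~1e7 live seconds of data-taking per year (a standard
# high-energy-physics rule of thumb, not taken from the text).

EVENT_SIZE_MB = 1.6          # megabytes per event
TRIGGER_RATE_HZ = 200        # events written per second
LIVE_SECONDS_PER_YEAR = 1e7  # assumed accelerator live time

# Sustained raw-data rate out of the trigger system.
rate_mb_per_s = EVENT_SIZE_MB * TRIGGER_RATE_HZ          # 320 MB/s

# Annual raw volume, converting megabytes to petabytes (1 PB = 1e9 MB).
annual_pb = rate_mb_per_s * LIVE_SECONDS_PER_YEAR / 1e9  # 3.2 PB

print(f"{rate_mb_per_s:.0f} MB/s sustained, {annual_pb:.1f} PB/year raw")
```

Under that live-time assumption the numbers reproduce the 3.2-petabyte annual raw volume cited for ATLAS.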