Architecting a 35 PB distributed parallel file system for science
Alberto Chiusole, HPC DevOps Engineer @ Seqera Labs
Perlmutter is the newest supercomputer at Berkeley Lab, California, and features a whopping 35 PB all-flash Lustre file system. Let’s dive into its architecture and look at some early performance figures, drawn from low-level Lustre tests that achieve over 90% of the theoretical bandwidth of the SSDs, to showcase how Perlmutter combines the performance of a burst buffer with the resilience of a scratch file system. Lastly, we will cover performance considerations unique to an all-flash Lustre file system, along with tips on how better I/O patterns can make the most of such a powerful architecture.