Parallel Processing of Large Seismic Data Sets With MsPASS
Description:
The size and accessibility of seismology data have undergone a transformative change due to large-scale projects like the USArray component of EarthScope. This has ushered in an era of unprecedented data resources for the seismology community. Collaborative endeavors like SCOPED (Seismic COmputational Platform for Empowering Discovery) aim to provide data and computation as a service, with MsPASS (Massive Parallel Analysis System for Seismologists) as a pivotal component within this platform. MsPASS is open-source, fully containerized, operates on parallel schedulers, and utilizes a NoSQL database for data management. These features make it the only framework in existence today for generic seismic processing on systems of any scale. The newest release of MsPASS (v2.0) has major enhancements in parallel IO capabilities, prototype methods for working on cloud computing systems, and improvements to documentation. Current work is focused on adding interoperability with existing machine learning packages, such as SeisBench and PhaseNet. We will demonstrate current capabilities with results from processing a large dataset of all broadband channels located within the contiguous United States that operated during the USArray recording period. MsPASS can also leverage AWS Lambda functions for preprocessing data hosted on the cloud. We will demonstrate the improvement in throughput performance with a workflow processing the SCEDC dataset on AWS. Another notable development is the seamless integration of MsPASS into large HPC systems through the SCOPED Gateway, a science gateway facilitating seismic data processing as a service over the web. This not only boosts the accessibility and flexibility of MsPASS but also significantly lowers the intellectual entry barrier often encountered with new software systems.
With these new developments, we aim to transform MsPASS into a community tool for large-scale seismic data processing, thereby enhancing capabilities, fostering collaboration, and advancing data mining within the seismology community.
Session: Leveraging Cutting-Edge Cyberinfrastructure for Large Scale Data Analysis and Education - I
Type: Oral
Date: 5/2/2024
Presentation Time: 04:45 PM (local time)
Presenting Author: Chenxiao Wang
Student Presenter: Yes
Invited Presentation:
Authors
Chenxiao Wang (Presenting Author, Corresponding Author), chenxiaowang@utexas.edu, University of Texas at Austin
Yinzhi Wang, iwang@tacc.utexas.edu, University of Texas at Austin
Gary Pavlis, pavlis@indiana.edu, Indiana University
Sasmita Mohapatra, sasmita@utdallas.edu, University of Texas at Dallas
Jinxing Ma, jinxin.ma@utexas.edu, University of Texas at Austin
Category
Leveraging Cutting-Edge Cyberinfrastructure for Large Scale Data Analysis and Education