MsPASS: A Parallel Processing Framework for Seismology
Session: Applications and Technologies in Large-Scale Seismic Analysis
Type: Oral
Date: 4/28/2020
Time: 08:30 AM
Room: 120 + 130
Description:
Over the past decade, the success of large-scale projects such as the USArray component of EarthScope has massively increased the data volume available to the seismology community. We assert that the software infrastructure of the field has not kept pace with parallel developments in the ‘big data’ sciences. As a step toward making research at the extreme scale accessible to more of the seismology community, we are developing a new framework for seismic data processing and management that we call the Massive Parallel Analysis System for Seismologists (MsPASS). MsPASS leverages several existing technologies: (1) a scalable parallel processing framework based on a dataflow computation model (Spark), (2) a NoSQL database system centered on a document store (MongoDB), and (3) a container-based virtualization environment (Docker and Singularity). The system builds on the widely accepted ObsPy toolkit, with extensions built on a rewrite of the SEISPP package currently in Antelope contrib. The synthesis of these components promises the flexibility to adapt to a wide range of data processing workflows. The container technology has proven invaluable for deployment on a wide range of computing resources. The intrinsic parallelism made possible by Spark and MongoDB yields significant performance improvements. Using Python as the language to drive the parallel processing workflows promises to reduce code development effort and provides a mechanism to parallelize many existing algorithms in ObsPy and SEISPP. We are striving to build a simple API that makes the package approachable by mere mortals.
Presenting Author: Yinzhi Wang
Authors
Yinzhi Wang, iwang@tacc.utexas.edu, University of Texas at Austin, Austin, Texas, United States (Presenting Author, Corresponding Author)
Gary Pavlis, pavlis@indiana.edu, Indiana University, Bloomington, Indiana, United States
Category
Applications and Technologies in Large-Scale Seismic Analysis