Caltech/USGS Southern California Earthquake Data Available in the Amazon Cloud (AWS)
Session: Applications and Technologies in Large-Scale Seismic Analysis
Type: Oral
Date: 4/23/2021
Presentation Time: 02:45 PM Pacific
Description:
The Southern California Earthquake Data Center (SCEDC) has made the Caltech/USGS Southern California Seismic Network (SCSN) data archive available in the cloud as part of the Amazon Open Dataset Program. The AWS bucket name is s3://scedc-pds and it is hosted in the us-west-2 (Oregon) region.
We describe the contents of this dataset and show that cloud-based archives reduce the time and effort needed to complete research, compared with gathering data from a data center and processing it locally. We also present the reasoning behind design decisions such as archive organization and data formats, and discuss what a cloud archive means for community standards in software and software APIs.
The main contents of the SCEDC/SCSN public dataset are:
1. The SCSN event catalog (1932-present) and phase picks for these events in ASCII format.
2. Continuously recorded waveforms (1999-present) from 603 SCSN seismic stations. Each file contains one channel-day in miniSEED format.
3. Event-windowed waveforms (1977-present) in miniSEED format.
4. Metadata from CI stations in FDSN StationXML format.
Users who process large volumes of data, for example in ambient noise correlation, template matching, and machine-learning studies, will find that I/O time is considerably reduced when the processing is done in the cloud in the same AWS region (us-west-2). There is no charge for I/O from the AWS public dataset. We have posted simple scripts and examples at https://github.com/SCEDC/cloud that can be used as templates for getting started with data processing. The presentation will also give cost estimates for a variety of research activities, so that users have an idea of the processing speed, the ease of operating in the cloud, and the costs incurred when working with a cloud archive. These costs can be compared with the cost of purchasing a computer server and a disk array, and with the weeks or months spent downloading and processing data.
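As a rough illustration of how the archive might be browsed without AWS credentials, the sketch below builds an anonymous S3 ListObjectsV2 request for the bucket and region named above and parses the XML response. It uses only the Python standard library. The helper names (list_url, parse_listing, fetch_listing) are hypothetical, and the sketch assumes the bucket permits unauthenticated listing; it makes no assumptions about how keys are organized inside the bucket. The scripts in the SCEDC/cloud repository remain the authoritative starting point.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

BUCKET = "scedc-pds"   # bucket name given in the abstract
REGION = "us-west-2"   # region given in the abstract
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"  # S3 XML namespace


def list_url(prefix="", max_keys=20):
    """Build a virtual-hosted-style ListObjectsV2 URL (hypothetical helper)."""
    query = urlencode({
        "list-type": "2",      # selects the ListObjectsV2 API
        "delimiter": "/",      # group keys by "directory" level
        "prefix": prefix,      # empty prefix lists the top level
        "max-keys": str(max_keys),
    })
    return f"https://{BUCKET}.s3.{REGION}.amazonaws.com/?{query}"


def parse_listing(xml_data):
    """Return (sub_prefixes, object_keys) from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_data)
    sub_prefixes = [
        e.text for e in root.findall(f"{S3_NS}CommonPrefixes/{S3_NS}Prefix")
    ]
    object_keys = [
        e.text for e in root.findall(f"{S3_NS}Contents/{S3_NS}Key")
    ]
    return sub_prefixes, object_keys


def fetch_listing(prefix="", max_keys=20):
    """Request one page of the listing over HTTPS (needs network access).

    Assumes the bucket allows anonymous listing; otherwise S3 answers
    with an HTTP 403 error and urlopen raises an exception.
    """
    with urlopen(list_url(prefix, max_keys)) as resp:
        return parse_listing(resp.read())
```

Calling fetch_listing("") would return the top-level prefixes of the archive, and passing one of those prefixes back in descends one level. For bulk processing in us-west-2, an SDK such as boto3 with unsigned requests is likely a better fit than raw HTTPS calls.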
Presenting Author: Ellen Yu
Student Presenter: No
Authors
Ellen Yu (Presenting Author, Corresponding Author), eyu@caltech.edu, Caltech
Aparna Bhaskaran, aparnab@caltech.edu, Caltech
Shang-Lin Chen, schen@caltech.edu, Caltech
Rayomand Bhadha, rayo@caltech.edu, Caltech
Zachary Ross, zross@caltech.edu, Caltech
Egill Hauksson, hauksson@caltech.edu, Caltech
Robert Clayton, clay@gps.caltech.edu, Caltech
Category: Applications and Technologies in Large-Scale Seismic Analysis