We present an extensive, quality-controlled dataset of earthquake waveforms recorded at regional distances. Each waveform is 5 minutes long and comes with labeled arrivals for the P, Pg, Pn, S, Sn and Sg phases, as well as event and station metadata. Every example in the dataset is required to have at least one of the {P, Pg, Pn} arrivals and at least one of the {S, Sg, Sn} arrivals. All arrivals are recorded on three-component instruments at source-receiver distances between 1 and 20 degrees. After initially collecting over 3 million waveforms, we quality-controlled the data using an ensemble of machine learning (ML) models. First, we trained a recurrent neural network (RNN) that distinguishes between earthquake signals and synthetic noise. This model allows us to flag examples that have labeled arrivals but whose waveforms show no distinguishable earthquake signal. Second, because a 5-minute window is long enough to contain several earthquakes, we used a fine-tuned version of the RNN to flag examples that contain multiple earthquakes, since only one of them is labeled. Finally, we show preliminary ML models for seismic phase picking trained on the dataset.
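The inclusion criteria stated in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Example` record, its field names, and the `meets_criteria` helper are hypothetical, and treating the 1 and 20 degree bounds as inclusive is an assumption.

```python
from dataclasses import dataclass, field

# Phase groups named in the abstract.
P_PHASES = {"P", "Pg", "Pn"}
S_PHASES = {"S", "Sg", "Sn"}

# Distance limits from the abstract; inclusive bounds are an assumption.
MIN_DIST_DEG = 1.0
MAX_DIST_DEG = 20.0


@dataclass
class Example:
    """Hypothetical record for one waveform example (not the dataset's schema)."""
    distance_deg: float              # source-receiver distance in degrees
    n_components: int                # number of instrument components
    phases: set = field(default_factory=set)  # labels of the picked arrivals


def meets_criteria(ex: Example) -> bool:
    """Return True if the example satisfies the stated inclusion criteria."""
    has_p = bool(ex.phases & P_PHASES)   # at least one of P, Pg, Pn
    has_s = bool(ex.phases & S_PHASES)   # at least one of S, Sg, Sn
    in_range = MIN_DIST_DEG <= ex.distance_deg <= MAX_DIST_DEG
    return has_p and has_s and in_range and ex.n_components == 3


candidates = [
    Example(5.2, 3, {"Pn", "Sn"}),    # kept
    Example(5.2, 3, {"Pn", "Pg"}),    # dropped: no S-type arrival
    Example(25.0, 3, {"P", "S"}),     # dropped: beyond 20 degrees
    Example(3.0, 1, {"Pg", "Sg"}),    # dropped: single-component instrument
]
kept = [ex for ex in candidates if meets_criteria(ex)]
```

In this sketch only the first candidate passes. The ML-based quality control described above (the noise and multiple-event flags) is a separate step and is not represented here.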
Session: Opportunities and Challenges for Machine Learning Applications in Seismology [Poster]
Type: Poster
Date: 4/19/2023
Presentation Time: 08:00 AM (local time)
Presenting Author: Albert L. Aguilar
Student Presenter: Yes
Authors
Albert Aguilar (Presenting Author, Corresponding Author), aguilars@stanford.edu, Stanford University
Gregory Beroza, beroza@stanford.edu, Stanford University
A Dataset of Regional Earthquake Waveforms