Leveraging Cloud Services for the EarthScope Data Repositories
Description:
EarthScope Data Service staff are designing and developing a cloud-based platform to replace the on-premises systems historically operated by the UNAVCO and IRIS facilities under the SAGE and GAGE awards. Our goals for this Common Cloud Platform (CCP) project include using cloud services to deliver operational improvements such as on-demand scalability, increased robustness, and improved data accessibility. To date, our work has focused on three main areas of development.
First, we are replacing the core of our existing real-time data ingestion and distribution system with AWS-managed Kafka streams. Kafka's publish/subscribe (Pub/Sub) model lets us capture data once and have it consumed by multiple downstream processes, such as archiving, derivative product generation, and stream exports. Joining these streams allows us to modify combined streams dynamically as we add, delete, or modify stations. We use container-based consumers, managed in ECS, together with topic partitions to scale computationally expensive operations horizontally. Additionally, Kafka's append-only design lets us continue operating without interruption during updates.
Second, we are developing new Analysis Ready, Cloud Optimized (ARCO) geophysical data containers based on TileDB multidimensional arrays. We will use these containers to store geophysical datasets in normalized formats, making it easier to export to multiple target formats (e.g., RINEX 2, 3, and 4, SEG-Y, and miniSEED). TileDB's support for multidimensional slicing, compression, and optimized access in object storage makes it an attractive option for storing and efficiently accessing complex data.
Third, we are replacing our primary dataflow system with a fully serverless, event-based design. The serverless architecture scales dynamically to accommodate transient spikes in batch dataflow submissions, and we will also use it to Extract, Transform, and Load (ETL) our existing data assets into the new system.
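To make the Pub/Sub and partitioning ideas concrete, the following is a minimal in-memory sketch of a Kafka-style partitioned, append-only log. It is not the Kafka client API and not the CCP implementation. The names `ToyTopic`, `produce`, and `poll`, the crc32 partitioner (a deterministic stand-in for Kafka's own partitioner), the round-robin partition assignment, and the station codes are all hypothetical. It shows two properties: each consumer group (e.g., archiving vs. derivative products) reads every record independently through its own offsets, and members within one group split the partitions between them, which is how adding consumer containers scales work horizontally.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    partition: int
    offset: int
    key: str
    value: bytes


class ToyTopic:
    """Hypothetical in-memory, append-only, partitioned log (illustrative only, not Kafka)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: list[list[Record]] = [[] for _ in range(num_partitions)]
        # Next offset to read per (consumer group, partition); groups never share offsets.
        self.offsets: dict[tuple[str, int], int] = defaultdict(int)

    def produce(self, key: str, value: bytes) -> Record:
        # Same key -> same partition, so one station's packets stay in order.
        # crc32 is an assumed stand-in for Kafka's real key partitioner.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        log = self.partitions[p]
        record = Record(p, len(log), key, value)
        log.append(record)  # records are only appended, never modified in place
        return record

    def assigned(self, member: int, group_size: int) -> list[int]:
        # Hypothetical round-robin assignment of partitions to members of one group.
        return [p for p in range(len(self.partitions)) if p % group_size == member]

    def poll(self, group: str, member: int, group_size: int) -> list[Record]:
        out: list[Record] = []
        for p in self.assigned(member, group_size):
            start = self.offsets[(group, p)]
            new = self.partitions[p][start:]
            out.extend(new)
            self.offsets[(group, p)] = start + len(new)  # advance this group's offset
        return out


topic = ToyTopic(num_partitions=4)
for station in ["STA01", "STA02", "STA03", "STA04", "STA05", "STA06"]:  # hypothetical stations
    topic.produce(station, b"packet")

# Fan-out: a single-member "archive" group sees every record.
archived = topic.poll("archive", member=0, group_size=1)

# Horizontal scaling: two "derivatives" workers split the partitions between them.
worker_a = topic.poll("derivatives", member=0, group_size=2)
worker_b = topic.poll("derivatives", member=1, group_size=2)
print(len(archived), len(worker_a) + len(worker_b))  # 6 6
```

In a real deployment, the broker's group coordinator handles partition assignment and rebalancing when consumers join or leave; the sketch hard-codes a static assignment only to keep the mechanism visible.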
We are excited to present our successes and challenges as we transition from on-premises hosting to a cloud-native environment.
Session: Geophysical Data Analysis in Cloud Computing Environments [Poster]
Type: Poster
Date: 4/18/2023
Presentation Time: 08:00 AM (local time)
Presenting Author: Henry Berglund
Student Presenter: No
Invited Presentation:
Authors
Chad Trabant, chad.trabant@earthscope.org, EarthScope Consortium
Henry Berglund (Presenting Author, Corresponding Author), henry.berglund@earthscope.org, EarthScope Consortium
David Mencin, david.mencin@earthscope.org, EarthScope Consortium
Jerry Carter, jerry.carter@earthscope.org, EarthScope Consortium
Rob Casey, rob.casey@earthscope.org, EarthScope Consortium
Charlie Sievers, charlie.sievers@earthscope.org, EarthScope Consortium
Gillian Sharer, gillian.sharer@earthscope.org, EarthScope Consortium
Category: Geophysical Data Analysis in Cloud Computing Environments