John West, 05/21/2013 04:03 PM


EMERALD

EMERALD is a tool for requesting, downloading, reviewing, and managing seismic event data. We use the term "event data", as opposed to "continuous data", for time series of finite length, usually associated with the arrival of particular seismic phases from a given event at a given station.

The acronym stands for Explore, Manage, Edit, Reduce, & Analyze Large Datasets.

Project Goals

EMERALD was created to help seismologists address three major issues facing those who work with large quantities of seismic event data:

(1) Rapid review of seismograms

We are of the opinion that seismology is best done by looking at seismograms. While automated methods are being created to handle some aspects of seismology, creating high-quality datasets still requires visual review by researchers. Yet, with the advent of large deployments of hundreds or thousands of seismometers, the sheer number of available seismograms has made some traditional methods of review cumbersome. EMERALD is designed to let seismologists review large numbers of seismic traces quickly and choose which ones to include in the dataset for further processing. EMERALD is accessed through a web browser, so reviewers can switch between desktop, laptop, and mobile devices.

(2) Metadata changes

Station metadata such as location, seismometer orientation, and instrument response characteristics are subject to change; they are also occasionally entered incorrectly and corrected at a later date. For some investigations, errors or changes in metadata can make significant differences in the results. EMERALD is unique in including a service that periodically polls for updated metadata, compares it to the stored values, and alerts the user to changes by e-mail. The user can then decide how best to handle the changes.
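The comparison step of such a service can be sketched in a few lines of Python. This is not EMERALD's code: it assumes each metadata snapshot is a dict keyed by a hypothetical channel identifier in NET.STA.LOC.CHA style, with illustrative field names such as `latitude` and `azimuth`, and it only produces the list of changes that an e-mail alert would report.

```python
import math
from typing import Any

# Hypothetical layout: channel id -> {field name: value}. Not EMERALD's schema.
Snapshot = dict[str, dict[str, Any]]


def _differs(old: Any, new: Any, rel_tol: float) -> bool:
    # Compare floats with a tolerance so harmless rounding is not reported.
    if isinstance(old, float) and isinstance(new, float):
        return not math.isclose(old, new, rel_tol=rel_tol)
    return old != new


def diff_metadata(stored: Snapshot, fetched: Snapshot, rel_tol: float = 1e-9) -> list[str]:
    """Return one message per change between stored and freshly fetched metadata."""
    changes = []
    for chan in sorted(stored.keys() | fetched.keys()):
        old, new = stored.get(chan), fetched.get(chan)
        if old is None:
            changes.append(f"{chan}: new channel")
        elif new is None:
            changes.append(f"{chan}: no longer reported")
        else:
            for name in sorted(old.keys() | new.keys()):
                if _differs(old.get(name), new.get(name), rel_tol):
                    changes.append(f"{chan}: {name} {old.get(name)!r} -> {new.get(name)!r}")
    return changes


stored = {"XX.STA1..BHE": {"latitude": 44.5, "azimuth": 90.0}}
fetched = {"XX.STA1..BHE": {"latitude": 44.5, "azimuth": 92.0}}
print(diff_metadata(stored, fetched))  # ['XX.STA1..BHE: azimuth 90.0 -> 92.0']
```

Floating-point fields are compared with a relative tolerance (`rel_tol`, a parameter chosen for this sketch) so that harmless rounding in a re-served value is not reported as a change; a non-empty result is the signal to notify the user.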

(3) Sharing of processing methods and codes

Individual investigators have typically developed processing codes to meet their own needs, usually operating on sets of seismic files in SAC, SEED, or some other format. File and directory naming schemes are usually specific to the researcher, and often to the project; furthermore, file paths and naming schemes are often hard-coded into the codes. This makes sharing methods between researchers difficult and leads to wasted effort reinventing existing methods. EMERALD is designed as a framework around a common data storage system (a PostgreSQL relational database), so methods can be easily shared between users. A remote updating feature allows user-developed methods to reside in a common repository for access by any EMERALD user.

Using EMERALD

This is a short overview of the organization, methods, and terminology within EMERALD.

Datasets

The main organizational units of seismic data and metadata are Datasets. In general, a Dataset is analogous to a research project. Each user can have multiple Datasets, each with its own seismic data and processing methods. Notifications to the user (by e-mail or text message), seismic phases of interest, and metadata updates are all managed at the Dataset level and can differ between projects.

Within each Dataset are multiple, user-created Subsets. Subsets can act as checkpoints, i.e., a user can create a new Subset, apply processes to the data in that Subset, decide that method is a dead end, delete that Subset and return to the previous one to try again. The initial download or import into a Dataset is always into Subset 0: Raw Data; these traces cannot be modified by the user except by moving them into a new Subset and thus are always available for reprocessing. Subsets can also be branch points, where a user decides, for example, to filter the raw data into multiple frequency bands giving each its own Subset.
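The checkpoint and branching behaviour of Subsets can be modelled as a small tree. The sketch below is hypothetical: the `Dataset`, `Subset`, `branch`, `apply`, and `delete` names are ours, a trace is just a list of samples, and deleting a Subset is assumed to also delete any Subsets branched from it.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Trace = list[float]  # in this sketch a trace is just a list of samples


@dataclass
class Subset:
    number: int
    name: str
    parent: Optional["Subset"]
    traces: list[Trace] = field(default_factory=list)
    read_only: bool = False


class Dataset:
    """Hypothetical model of Subsets as checkpoints and branch points."""

    def __init__(self, raw_traces: list[Trace]):
        raw = Subset(0, "Raw Data", None, list(raw_traces), read_only=True)
        self.subsets: dict[int, Subset] = {0: raw}
        self._next_number = 1

    def branch(self, parent_number: int, name: str) -> Subset:
        # A new Subset starts from a copy of its parent's trace list.
        parent = self.subsets[parent_number]
        child = Subset(self._next_number, name, parent, list(parent.traces))
        self.subsets[child.number] = child
        self._next_number += 1
        return child

    def apply(self, number: int, process: Callable[[Trace], Trace]) -> None:
        subset = self.subsets[number]
        if subset.read_only:
            raise PermissionError("Raw Data is read-only; branch a new Subset first")
        # Assumes `process` returns a new trace rather than mutating its input,
        # so the parent Subset's traces are left unchanged.
        subset.traces = [process(tr) for tr in subset.traces]

    def delete(self, number: int) -> Optional[Subset]:
        """Delete a Subset and everything branched from it; return its parent."""
        target = self.subsets[number]
        if target.read_only:
            raise PermissionError("Raw Data cannot be deleted")
        for n in [n for n, s in self.subsets.items() if self._descends_from(s, target)]:
            del self.subsets[n]
        return target.parent

    @staticmethod
    def _descends_from(subset: Optional[Subset], ancestor: Subset) -> bool:
        while subset is not None:
            if subset is ancestor:
                return True
            subset = subset.parent
        return False


ds = Dataset([[1.0, -2.0, 3.0]])
trial = ds.branch(0, "gain test")
ds.apply(trial.number, lambda tr: [2.0 * x for x in tr])  # stand-in for a real Process
print(trial.traces)          # [[2.0, -4.0, 6.0]]
print(ds.subsets[0].traces)  # [[1.0, -2.0, 3.0]]
ds.delete(trial.number)      # dead end: drop it and fall back to Raw Data
```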

Reviewing Seismic Traces

Users can review seismic traces organized by event, by station, or by event/station combination. Any station, event, event/station combination, or individual trace can be flagged as rejected. This does not delete the trace(s), but traces flagged as rejected are not passed on to subsequent Subsets.
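A minimal sketch of how rejection flags at these four levels could filter what moves on to the next Subset, assuming traces are identified by hypothetical event, station, and channel strings. The `Trace`, `Rejections`, and `carry_forward` names are illustrative, not part of EMERALD.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Trace:
    event: str    # hypothetical event identifier
    station: str  # hypothetical station code
    channel: str


@dataclass
class Rejections:
    """Rejection flags at the four levels a reviewer can flag."""
    events: set[str] = field(default_factory=set)
    stations: set[str] = field(default_factory=set)
    pairs: set[tuple[str, str]] = field(default_factory=set)  # (event, station)
    traces: set[Trace] = field(default_factory=set)

    def is_rejected(self, t: Trace) -> bool:
        return (t.event in self.events
                or t.station in self.stations
                or (t.event, t.station) in self.pairs
                or t in self.traces)


def carry_forward(traces: list[Trace], rejections: Rejections) -> list[Trace]:
    # Flags never delete anything: the input list is left untouched, and only
    # unrejected traces are copied into the next Subset.
    return [t for t in traces if not rejections.is_rejected(t)]


traces = [Trace("ev1", "STA1", "BHZ"), Trace("ev1", "STA2", "BHZ"), Trace("ev2", "STA2", "BHZ")]
kept = carry_forward(traces, Rejections(pairs={("ev1", "STA2")}))  # drops only ev1/STA2
```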

Calculations and Processes

Methods applied to seismic data are broadly categorized as Calculations or Processes. Calculations derive values from the seismic data; examples of Calculations would be phase arrival times, signal-to-noise ratios, or source-to-station great circle distances.
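As an example of a Calculation, source-to-station great circle distance can be computed with the haversine formula. This sketch assumes a spherical Earth of radius 6371 km (production codes often use an ellipsoid instead), and `great_circle_distance` is a name chosen for illustration, not EMERALD's implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def great_circle_distance(lat1: float, lon1: float,
                          lat2: float, lon2: float) -> tuple[float, float]:
    """Return (distance in degrees, distance in km) between two points on a sphere.

    Coordinates are in decimal degrees. The haversine formula stays accurate
    even for short source-to-station distances.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against rounding pushing h slightly above 1.
    angle = 2.0 * math.asin(math.sqrt(min(1.0, h)))
    return math.degrees(angle), angle * EARTH_RADIUS_KM


deg, km = great_circle_distance(0.0, 0.0, 0.0, 90.0)
print(round(deg, 6), round(km, 1))  # 90.0 10007.5
```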

Processes actually modify the seismic time series data. Examples of Processes would be filtering, selecting a time window around a particular phase, and rotating 3-channel data to radial and transverse components. When applying Processes, the user can decide whether to create a new Subset. Processes applied to the current Subset conserve disk space but are not easily reversed, whereas Processes that create a new Subset can be undone simply by deleting that Subset.
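Rotation to radial and transverse components is a good example of a Process, because it changes the samples themselves. The sketch below is one common formulation, not EMERALD's code; it assumes the north and east traces are already aligned sample by sample, that the back azimuth (station to event, clockwise from north) is known, and that radial is positive pointing away from the event. `rotate_ne_to_rt` is a hypothetical name.

```python
import math


def rotate_ne_to_rt(north: list[float], east: list[float],
                    back_azimuth_deg: float) -> tuple[list[float], list[float]]:
    """Rotate horizontal north/east samples into radial/transverse components.

    back_azimuth_deg: azimuth from the station toward the event, in degrees
    clockwise from north. Assumed convention: radial is positive pointing away
    from the event; transverse is positive 90 degrees clockwise from radial,
    seen from above.
    """
    if len(north) != len(east):
        raise ValueError("north and east traces must have the same length")
    ba = math.radians(back_azimuth_deg)
    sin_ba, cos_ba = math.sin(ba), math.cos(ba)
    radial = [-n * cos_ba - e * sin_ba for n, e in zip(north, east)]
    transverse = [n * sin_ba - e * cos_ba for n, e in zip(north, east)]
    return radial, transverse


# Event due east of the station: purely east-west motion is all radial.
r, t = rotate_ne_to_rt([0.0, 0.0], [1.0, -1.0], 90.0)
print(r)  # [-1.0, 1.0]; t is zero to within rounding
```

Because the rotation is orthogonal, R^2 + T^2 equals N^2 + E^2 at every sample, which makes a quick sanity check on any implementation.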

Any set of Calculations and Processes can be grouped into an Automation Batch, allowing frequently used combinations to be saved and reused.
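One way to picture an Automation Batch is as an ordered list of steps, where Calculations record values and Processes replace the traces seen by later steps. This is a hypothetical sketch (`Calculation`, `Process`, and `run_batch` are our names), and it assumes steps run in the order listed, which is not specified above.

```python
from dataclasses import dataclass
from typing import Callable, Union

Trace = list[float]


@dataclass
class Calculation:
    """Derives one value per trace; the traces themselves are unchanged."""
    name: str
    func: Callable[[Trace], float]


@dataclass
class Process:
    """Returns a modified copy of each trace."""
    name: str
    func: Callable[[Trace], Trace]


Step = Union[Calculation, Process]


def run_batch(steps: list[Step],
              traces: list[Trace]) -> tuple[list[Trace], dict[str, list[float]]]:
    """Apply the steps in order; later steps see the output of earlier Processes."""
    values: dict[str, list[float]] = {}
    for step in steps:
        if isinstance(step, Process):
            traces = [step.func(tr) for tr in traces]
        else:
            values[step.name] = [step.func(tr) for tr in traces]
    return traces, values


def demean(tr: Trace) -> Trace:
    mean = sum(tr) / len(tr)
    return [x - mean for x in tr]


batch = [Process("demean", demean),
         Calculation("peak amplitude", lambda tr: max(abs(x) for x in tr))]
out, values = run_batch(batch, [[1.0, 2.0, 3.0]])
print(out, values)  # [[-1.0, 0.0, 1.0]] {'peak amplitude': [1.0]}
```

Because the steps run in order, placing the same Calculation before or after the `demean` Process gives different values, which is one reason saving a batch as a unit is useful.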

Exporting

The selected, processed seismic traces can be exported as SAC files for subsequent processing and analysis. The user controls the file and directory naming conventions and structure for export, to maximize compatibility with existing external codes. Alternatively, users may choose to develop new methods directly in EMERALD.
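Export naming conventions of this kind are often expressed as templates. The sketch below assumes a Python `str.format`-style template with hypothetical placeholder names (`event`, `net`, `sta`, `cha`); EMERALD's actual export options may look quite different, and `export_path` is an illustrative name.

```python
import string
from pathlib import PurePosixPath


def template_fields(template: str) -> set[str]:
    """Names of the {placeholders} used in a naming template."""
    return {name for _, name, _, _ in string.Formatter().parse(template) if name}


def export_path(template: str, trace_info: dict[str, str]) -> PurePosixPath:
    """Build a relative export path for one trace from a naming template."""
    missing = template_fields(template) - set(trace_info)
    if missing:
        raise KeyError(f"template uses unknown fields: {sorted(missing)}")
    return PurePosixPath(template.format(**trace_info))


info = {"event": "ev001", "net": "XX", "sta": "STA1", "cha": "BHZ"}
print(export_path("{event}/{net}.{sta}.{cha}.SAC", info))  # ev001/XX.STA1.BHZ.SAC
print(export_path("{sta}/{event}_{cha}.sac", info))        # STA1/ev001_BHZ.sac
```

Checking the placeholders up front means a typo in the template fails once with a clear message, rather than partway through writing thousands of files.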

Getting Started

The following guides are available to help get you started with using EMERALD and downloading data.

Quick-Start Guide
How To Request Data
EMERALD Hints & Tips