seismic.xcorqc package

Submodules

seismic.xcorqc.analytic_plot_utils module

Utility functions supporting plotting for cross-correlation visualizations.

seismic.xcorqc.analytic_plot_utils.distance(origin, destination)[source]

Compute the distance in km between origin coordinates and destination coordinates. The coordinates are (latitude, longitude) couplets in units of degrees.

Parameters
Returns

Epicentral distance between origin and destination in kilometres

Return type

float

seismic.xcorqc.analytic_plot_utils.drawBBox(min_lon, min_lat, max_lon, max_lat, base_map, **kwargs)[source]

Draw bounding box on a basemap

Parameters
  • min_lon (float) – Minimum longitude

  • min_lat (float) – Minimum latitude

  • max_lon (float) – Maximum longitude

  • max_lat (float) – Maximum latitude

  • base_map (mpl_toolkits.basemap.Basemap) – Basemap on which to draw the bounding box

seismic.xcorqc.analytic_plot_utils.timestamps_to_plottable_datetimes(time_series)[source]

Convert a series of float (or equivalent) timestamp values to matplotlib plottable datetimes.

Parameters

time_series (iterable container) – Series of timestamps

Returns

Equivalent series of plottable timestamps

Return type

numpy.array(‘datetime64[ms]’) with millisecond resolution

seismic.xcorqc.client_data module

seismic.xcorqc.client_data.main()[source]

seismic.xcorqc.correlator module

Description:

Generates cross-correlations for data from staion-pairs in parallel

References:

CreationDate: 11/07/18

Developer: rakib.hassan@ga.gov.au

Revision History:

LastUpdate: 11/07/18 RH LastUpdate: dd/mm/yyyy Who Optional description

class seismic.xcorqc.correlator.Dataset(asdf_file_name, netsta_list='*')[source]

Bases: object

get_closest_stations(netsta, other_dataset, nn=1)[source]
get_unique_station_pairs(other_dataset, nn=1)[source]
seismic.xcorqc.correlator.process(data_source1, data_source2, output_path, interval_seconds, window_seconds, window_overlap, window_buffer_length, resample_rate=None, taper_length=0.05, nearest_neighbours=1, fmin=None, fmax=None, netsta_list1='*', netsta_list2='*', pairs_to_compute=None, start_time='1970-01-01T00:00:00', end_time='2100-01-01T00:00:00', instrument_response_inventory=None, instrument_response_output='vel', water_level=50, clip_to_2std=False, whitening=False, whitening_window_frequency=0, one_bit_normalize=False, read_buffer_size=10, ds1_zchan=None, ds1_nchan=None, ds1_echan=None, ds2_zchan=None, ds2_nchan=None, ds2_echan=None, corr_chan=None, envelope_normalize=False, ensemble_stack=False, restart=False, dry_run=False, no_tracking_tag=False)[source]

DATA_SOURCE1: Text file containing paths to ASDF files

DATA_SOURCE2: Text file containing paths to ASDF files

OUTPUT_PATH: Output folder

INTERVAL_SECONDS: Length of time window (s) over which to compute cross-correlations; e.g. 86400 for 1 day

WINDOW_SECONDS: Length of stacking window (s); e.g 3600 for an hour. INTERVAL_SECONDS must be a multiple of

WINDOW_SECONDS; no stacking is performed if they are of the same size.

seismic.xcorqc.fft module

seismic.xcorqc.fft.ndflip(a)[source]

Inverts an n-dimensional array along each of its axes

seismic.xcorqc.generate_dispersion_curves module

Description:

Runs Rhys Hawkins’ code in parallel to generate dispersion curves based on cross-correlations of station-pairs. Note that this script call shell scripts that are expected to be in the current working directory.

todo: remove dependence on shell scripts.

References:

CreationDate: 10/01/20

Developer: rakib.hassan@ga.gov.au

Revision History:

LastUpdate: 10/01/20 RH LastUpdate: dd/mm/yyyy Who Optional description

seismic.xcorqc.generate_dispersion_curves.kill(proc_pid)[source]
seismic.xcorqc.generate_dispersion_curves.runprocess(cmd, get_results=False)[source]
seismic.xcorqc.generate_dispersion_curves.split_list(lst, npartitions)[source]

seismic.xcorqc.generate_test_data module

seismic.xcorqc.generate_test_data.generateStationTestData(sta)[source]

seismic.xcorqc.utils module

class seismic.xcorqc.utils.ProgressTracker(output_folder, restart_mode=False)[source]

Bases: object

increment()[source]
seismic.xcorqc.utils.drop_bogus_traces(st, sampling_rate_cutoff=1)[source]

Removes spurious traces with suspect sampling rates. :param st: Obspy Stream :param sampling_rate_cutoff: sampling rate threshold :return: Input stream is updated inplace

seismic.xcorqc.utils.getStationInventory(master_inventory, inventory_cache, netsta)[source]
seismic.xcorqc.utils.get_stream(fds, net, sta, cha, start_time, end_time, baz=None, trace_count_threshold=200, logger=None, verbose=1)[source]
seismic.xcorqc.utils.rtp2xyz(r, theta, phi)[source]
seismic.xcorqc.utils.split_list(lst, npartitions)[source]
seismic.xcorqc.utils.xyz2rtp(x, y, z)[source]

seismic.xcorqc.validate_xcorr_setup module

seismic.xcorqc.validate_xcorr_setup.test_setup()[source]

seismic.xcorqc.xcorqc module

Description:

Cross-correlation functionality

References:

CreationDate: 29/06/17

Developer: laurence.davies@ga.gov.au

Revision History:

LastUpdate: 29/06/17 LD First commit of xcor code. LastUpdate: 13/07/17 LD Fixed xcor filtering issue when traces have different sample rates. LastUpdate: 11/08/17 RH Implement ASDF-based cross-correlation workflow LastUpdate: 11/07/18 RH Implemented parallel cross-correlator LastUpdate: 19/07/18 RH Implemented cross-correlation approaches described in Habel et al. 2018

LastUpdate: dd/mm/yyyy Who Optional description

seismic.xcorqc.xcorqc.IntervalStackXCorr(refds, tempds, start_time, end_time, ref_net_sta, temp_net_sta, ref_sta_inv, temp_sta_inv, instrument_response_output, water_level, ref_cha, temp_cha, baz_ref_net_sta, baz_temp_net_sta, resample_rate=None, taper_length=0.05, buffer_seconds=864000, interval_seconds=86400, window_seconds=3600, window_overlap=0.1, window_buffer_length=0, flo=None, fhi=None, clip_to_2std=False, whitening=False, whitening_window_frequency=0, one_bit_normalize=False, envelope_normalize=False, ensemble_stack=False, outputPath='/tmp', verbose=1, tracking_tag='')[source]

This function rolls through two ASDF data sets, over a given time-range and cross-correlates waveforms from all possible station-pairs from the two data sets. To allow efficient, random data access asdf data sources, an instance of a SeisDB object, instantiated from the corresponding Json database is passed in (tempds_db) – although this parameter is not mandatory, data-access from large ASDF files will be slow without it.

Station-ids to be processed from the two data-sources can be specified as lists of strings, while wildcards can be used to process all stations. Data is fetched from the sources in chunks to limit memory usage and data-windows with gaps are discarded.

Cross-correlation results are written out for each station-pair, in the specified folder, as NETCDF4 files. Panoply (https://www.giss.nasa.gov/tools/panoply/), already installed on the NCI VDIs can be used to interrogate these results.

Parameters
  • refds (FederatedASDFDataSet) – FederatedASDFDataSet containing reference-station data

  • tempds (FederatedASDFDataSet) – FederatedASDFDataSet containing temporary-stations data

  • ref_net_sta (str) – Network.Station for the reference Dataset.

  • temp_net_sta (str) – Network.Station for the temporary Dataset.

  • ref_sta_inv (Inventory) – Inventory containing instrument response for station

  • temp_sta_inv (Inventory) – Inventory containing instrument response for station

  • instrument_response_output (str) – Output of instrument response correction; can be either ‘vel’ or ‘disp’

  • water_level (float) – Water-level used during instrument response correction

  • ref_cha (str) – Channel name for the reference Dataset

  • temp_cha (str) – Channel name for the temporary Dataset

  • baz_ref_net_sta (float) – Back-azimuth of ref station from temp station in degrees

  • baz_temp_net_sta (float) – Back-azimuth of temp station from ref station in degrees

  • resample_rate (float) – Resampling rate (Hz). Applies to both data-sets

  • taper_length (float) – Taper length as a fraction of window length

  • buffer_seconds (int) – The amount of data to be fetched per call from the ASDFDataSets, because we may not be able to fetch all the data (from start_time to end_time) at once. The default is set to 10 days and should be a multiple of interval_seconds.

  • interval_seconds (int) – The interval in seconds, over which cross-correlation windows are stacked. Default is 1 day.

  • window_seconds (int) – Length of cross-correlation window in seconds. Default is 1 hr.

  • window_overlap (float) – Window overlap fraction. Default is 0.1.

  • window_buffer_length (float) – Buffer length as a fraction of ‘window-seconds’ around actual data windows of interest. This helps exclude effects of tapering and other edge artefacts from data windows before cross-correlation. Default is 0

  • flo (float) – Lower frequency for Butterworth bandpass filter

  • fhi (float) – Upper frequency for Butterworth bandpass filter

  • clip_to_2std (bool) – Clip data in each window to +/- 2 standard deviations

  • whitening (bool) – Apply spectral whitening

  • whitening_window_frequency (float) – Window frequency (Hz) used to determine length of averaging window for smoothing spectral amplitude

  • one_bit_normalize (bool) – Apply one-bit normalization to data in each window

  • envelope_normalize (bool) – Envelope via Hilbert transforms and normalize

  • ensemble_stack (bool) – Outputs a single CC function stacked over all data for a given station-pair

  • verbose (int) – Verbosity of printouts. Default is 1; maximum is 3.

  • tracking_tag (str) – File tag to be added to output file names so runtime settings can be tracked

  • outputPath (str) – Folder to write results to

Param

start_time: Start-time (UTCDateTime format) for data to be used in cross-correlation

Param

end_time: End-time (UTCDateTime format) for data to be used in cross-correlation

Returns

1: 1d np.array with time samples spanning [-window_samples+dt:window_samples-dt] 2: A dictionary of 2d np.arrays containing cross-correlation results for each station-pair. Rows in each 2d array represent number of interval_seconds processed and columns represent stacked samples of length window_seconds. 3: A dictionary of 1d np.arrays containing number of windows processed, within each interval_seconds period, for each station-pair. These Window-counts could be helpful in assessing robustness of results.

seismic.xcorqc.xcorqc.setup_logger(name, log_file, level=20)[source]

Function to setup a logger; adapted from stackoverflow

seismic.xcorqc.xcorqc.taper(tr, taperlen)[source]
seismic.xcorqc.xcorqc.whiten(a, sampling_rate, window_freq=0)[source]

Applies spectral whitening to trace samples. When window_freq=0, all frequency bins are normalized by their amplitudes, i.e. all frequency bins end up with an amplitude of 1. When window_freq is nonzero, a smoothed amplitude spectrum (smoothing window length is as computed below) is used to normalize the frequency bins.

Parameters
  • a – trace samples

  • sampling_rate – sampling rate

  • window_freq – smoothing window length (Hz)

Returns

spectrally whitened samples

seismic.xcorqc.xcorqc.xcorr2(tr1, tr2, sta1_inv=None, sta2_inv=None, instrument_response_output='vel', water_level=50.0, window_seconds=3600, window_overlap=0.1, window_buffer_length=0, interval_seconds=86400, taper_length=0.05, resample_rate=None, flo=None, fhi=None, clip_to_2std=False, whitening=False, whitening_window_frequency=0, one_bit_normalize=False, envelope_normalize=False, verbose=1, logger=None)[source]
seismic.xcorqc.xcorqc.zeropad(tr, padlen)[source]
seismic.xcorqc.xcorqc.zeropad_ba(tr, padlen)[source]

seismic.xcorqc.xcorr_station_clock_analysis module

Functions for computing estimated GPS clock corrections based on station pair cross-correlation and plotting in standard layout.

class seismic.xcorqc.xcorr_station_clock_analysis.XcorrClockAnalyzer(src_file, time_window, snr_threshold, pcf_cutoff_threshold)[source]

Bases: object

Helper class for bundling of preprocessed cross-correlation data before plotting or subsequent processing.

do_clustering(coeffs)[source]

Do DBSCAN clustering on the corrections.

Parameters

coeffs (tuple(float, float, float)) – Triplet of distance coefficients, corresponding to the sensitivity of the clustering to point separation along 1) x-axis (time), 2) y-axis (correction) and 3) slope (drift rate)

Returns

Results of sklearn.cluster.dbscan (refer to third party documentation)

do_spline_regression(group_ids, regression_degree)[source]

Do univariate spline regression on each cluster of points.

Parameters
  • group_ids – Cluster IDS generated from do_clustering()

  • regression_degree – Desired degree of curve fit for each cluster, one for each non-negative cluster ID

Returns

dict of regressors that can be applied to arbitrary time values

do_spline_resampling(group_ids, regressors, sampling_period_seconds)[source]

Using pre-computed regressors, resample every cluster at a prescribed frequency

Parameters
  • group_ids

  • regressors

  • sampling_period_seconds

Returns

get_corrections_time_series()[source]
plot_clusters(ax, ids, coeffs, stn_code='')[source]

Plot the distinct clusters color coded by cluster ID, with underlying corrections shown in gray.

Parameters
  • ax

  • ids

  • coeffs

Returns

plot_regressors(ax, ids, regressors, stn_code='')[source]

Plot regressor functions on top of original data

Parameters
  • ax

  • ids

  • regressors

  • stn_code

Returns

plot_resampled_clusters(ax, ids, resampled_corrections, stn_code='')[source]

Plot resampled regressor functions per cluster on top of original data

Parameters
  • ax

  • ids

  • regressors

  • stn_code

Returns

seismic.xcorqc.xcorr_station_clock_analysis.batch_process_folder(folder_name, dataset, time_window, snr_threshold, pearson_cutoff_factor=0.5, save_plots=True)[source]

Process all the .nc files in a given folder into graphical visualizations.

Parameters
  • folder_name (str) – Path to process containing .nc files

  • dataset (FederatedASDFDataset) – Dataset to be used to ascertain the distance between stations.

  • time_window (float) – Lag time window to plot (plus or minus this value in seconds)

  • snr_threshold (float) – Minimum signal to noise ratio for samples to be included into the clock lag estimate

  • save_plots – Whether to save plots to file, defaults to True

  • save_plots – bool, optional

seismic.xcorqc.xcorr_station_clock_analysis.batch_process_xcorr(src_files, dataset, time_window, snr_threshold, pearson_cutoff_factor=0.5, save_plots=True, underlay_rcf_xcorr=False, force_save=False)[source]

Process a batch of .nc files to generate standard visualization graphics. PNG files are output alongside the source .nc file. To suppress file output, set save_plots=False.

Parameters
  • src_files (Iterable of str) – List of files to process

  • dataset (FederatedASDFDataset) – Dataset to be used to ascertain the distance between stations.

  • time_window (float) – Lag time window to plot (plus or minus this value in seconds)

  • snr_threshold (float) – Minimum signal to noise ratio for samples to be included into the clock lag estimate

  • save_plots – Whether to save plots to file, defaults to True

  • save_plots – bool, optional

  • underlay_rcf_xcorr – Show the individual correlation of row sample with RCF beneath the computed time lag, defaults to False

  • underlay_rcf_xcorr – bool, optional

Returns

List of files for which processing failed, and associated error.

Return type

list(tuple(str, str))

seismic.xcorqc.xcorr_station_clock_analysis.plot_estimated_timeshift(ax, x_lag, y_times, correction, annotation=None, row_rcf_crosscorr=None)[source]
seismic.xcorqc.xcorr_station_clock_analysis.plot_pearson_corr_coeff(ax, rcf, ccf_masked, y_times)[source]
seismic.xcorqc.xcorr_station_clock_analysis.plot_reference_correlation_function(ax, x_lag, rcf, rcf_corrected, snr_threshold)[source]
seismic.xcorqc.xcorr_station_clock_analysis.plot_snr_histogram(ax, snr, time_window, nbins=10)[source]
seismic.xcorqc.xcorr_station_clock_analysis.plot_stacked_window_count(ax, x_nsw, y_times)[source]
seismic.xcorqc.xcorr_station_clock_analysis.plot_xcorr_file_clock_analysis(src_file, asdf_dataset, time_window, snr_threshold, pearson_correlation_factor, show=True, underlay_rcf_xcorr=False, pdf_file=None, png_file=None, title_tag='', settings=None)[source]
seismic.xcorqc.xcorr_station_clock_analysis.plot_xcorr_time_series(ax, x_lag, y_times, xcorr_data, use_formatter=False)[source]
seismic.xcorqc.xcorr_station_clock_analysis.read_correlator_config(nc_file)[source]

Read the correlator settings used for given nc file.

Parameters

nc_file (str) – File name of the .nc file containing the cross-correlation data.

Returns

Pandas Series with named fields whose values are the runtime settings used for the .nc file

Return type

pandas.Series

seismic.xcorqc.xcorr_station_clock_analysis.station_codes(filename)[source]

Convert a netCDF4 .nc filename generated by correlator to the corresponding station codes in the format NETWORK.STATION

Assumed format: NET1.STAT1.NET2.STA2.*.nc

Parameters

filename (str) – The .nc filename from which to extract the station and network codes

Returns

Station code for each station (code1, code2)

Return type

tuple(str, str)

seismic.xcorqc.xcorr_station_clock_analysis.station_distance(federated_ds, code1, code2)[source]

Determine the distance in km between a pair of stations at a given time.

Parameters
  • federated_ds (seismic.ASDFdatabase.FederatedASDFDataSet) – Federated dataset to query for station coordinates

  • code1 (str) – Station and network code in the format NETWORK.STATION for first station

  • code2 (str) – Station and network code in the format NETWORK.STATION for second station

Returns

Distance between stations in kilometers

Return type

float

Module contents