Averaging#
Routines for averaging visibility data.
Time and Channel Averaging#
The routines in this section average row-based samples by:
- Averaging samples of consecutive time values into bins defined by a period of time_bin_secs seconds.
- Averaging channel data into equally sized bins of chan_bin_size.
In order to achieve this, a baseline x time ordering is established over the input data where baseline corresponds to the unique (ANTENNA1, ANTENNA2) pairs and time corresponds to the unique, monotonically increasing TIME values associated with the rows of a Measurement Set.
Baseline | T0 | T1 | T2 | T3 | T4 |
---|---|---|---|---|---|
(0, 0) | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
(0, 1) | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
(0, 2) | 0.1 | 0.2 | X | 0.4 | 0.5 |
(1, 1) | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
(1, 2) | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
(2, 2) | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
It is possible for times or baselines to be missing. In the above example, T2 is missing for baseline (0, 2).
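As a rough illustration of such a grid, the following sketch builds a baseline x time lookup of row indices with NumPy, marking missing samples with -1. The input arrays and the `grid` layout are assumptions made for this example and do not reflect how the library stores the ordering internally.

```python
import numpy as np

# Hypothetical rows: baseline (0, 2) has no sample at time 0.3
time = np.array([0.1, 0.1, 0.2, 0.2, 0.3])
ant1 = np.array([0, 0, 0, 0, 0])
ant2 = np.array([1, 2, 1, 2, 1])

# Unique, increasing times and the time index of every row
utime, time_idx = np.unique(time, return_inverse=True)

# Unique (ANTENNA1, ANTENNA2) pairs and the baseline index of every row
baselines = np.stack([ant1, ant2], axis=1)
ubl, bl_idx = np.unique(baselines, axis=0, return_inverse=True)
bl_idx = bl_idx.reshape(-1)  # flatten; the inverse shape differs across NumPy versions

# grid[b, t] holds the input row for baseline b at time t, or -1 if missing
grid = np.full((ubl.shape[0], utime.shape[0]), -1, dtype=np.int64)
grid[bl_idx, time_idx] = np.arange(time.shape[0])
```

Here `grid` has one row per baseline and one column per unique time, and the missing sample shows up as -1 in the last column of the second baseline.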
Warning
The above requires unique lexicographical combinations of (TIME, ANTENNA1, ANTENNA2). This can usually be achieved by suitably partitioning input data on indexing rows, DATA_DESC_ID and SCAN_NUMBER in particular.
For each baseline, adjacent times are assigned to a bin if \(h_c - h_e/2 - (l_c - l_e/2) <\) time_bin_secs, where \(h_c\) and \(l_c\) are the upper and lower times and \(h_e\) and \(l_e\) are the upper and lower intervals, taken from the INTERVAL column.
Note that no distinction is made between flagged and unflagged data when establishing the endpoints of the bin.
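The binning condition can be sketched for a single baseline as follows. This is a minimal illustration, not the library's implementation: `assign_time_bins` is a hypothetical helper, it assumes `time` is sorted in increasing order, and it reads the "lower" sample as the first sample in the current bin and the "upper" sample as the candidate being added.

```python
import numpy as np

def assign_time_bins(time, interval, time_bin_secs):
    """Return a bin index per sample for one baseline (hypothetical helper)."""
    bins = np.empty(time.shape[0], dtype=np.int64)
    if time.shape[0] == 0:
        return bins

    current_bin = 0
    low_edge = time[0] - interval[0] / 2          # l_c - l_e/2

    for r in range(time.shape[0]):
        high_edge = time[r] - interval[r] / 2     # h_c - h_e/2
        if high_edge - low_edge >= time_bin_secs:
            # The condition fails, so this sample starts a new bin
            current_bin += 1
            low_edge = high_edge
        bins[r] = current_bin

    return bins

# Six one-second samples averaged into two-second bins
time = np.arange(6, dtype=np.float64)
interval = np.ones(6)
bins = assign_time_bins(time, interval, time_bin_secs=2.0)
```

Flags play no part in this sketch, which matches the statement above that flagged and unflagged samples are treated alike when establishing bin endpoints.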
The reason for this is that the Measurement Set v2.0 Specification specifies that TIME and INTERVAL columns are defined as containing the nominal time and period at which the visibility was sampled. This means that their values include valid, flagged and missing data. Thus, averaging a regular high-resolution baseline x htime grid should produce a regular low-resolution baseline x ltime grid (htime > ltime), even in the presence of bad data.
By contrast, other columns such as TIME_CENTROID and EXPOSURE contain the effective time and period, as they exclude missing and bad data. Their increased accuracy, and therefore variability, means that they are unsuitable for establishing a grid over the data.
To summarise, the averaged times in each bin establish a map:
- from possibly unordered input rows
- to a reduced set of output rows ordered by averaged (TIME, ANTENNA1, ANTENNA2).
Flagged Data Handling#
Both FLAG_ROW and FLAG columns may be supplied to the averager, but they should be consistent with each other. The averager will throw an exception if this is not the case, rather than making an assumption as to which is correct.
When provided with flags, the averager will output averages for bins that are completely flagged.
Part of the reason for this is that the Measurement Set v2.0 Specification specifies that the TIME and INTERVAL columns represent the nominal time and interval values. This means that they should represent valid as well as flagged or missing data in their computation.
By contrast, most other columns, such as TIME_CENTROID and EXPOSURE, contain the effective values and should only include valid, unflagged data.
To support this:
- TIME and INTERVAL are averaged using both flagged and unflagged samples.
- Other columns, such as TIME_CENTROID, are handled as follows:
  - If the bin contains some unflagged data, only this data is used to calculate the average.
  - If the bin is completely flagged, the average of all samples (which are all flagged) will be used.
- In both cases, a completely flagged bin will have its flag set.
- To support the two cases, twice the memory of the output array is required to track both averages, but only one array of merged values is returned.
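The distinction between nominal and effective averaging can be sketched for a single bin as below. `average_bin` is a hypothetical helper written for this illustration. It covers only TIME and TIME_CENTROID and does not model the memory layout described above.

```python
import numpy as np

def average_bin(time, time_centroid, flag_row):
    """Average one bin: nominal TIME, effective TIME_CENTROID (illustrative)."""
    # Nominal: every sample contributes, flagged or not
    nominal_time = time.mean()

    valid = ~flag_row
    if valid.any():
        # Effective: only unflagged samples contribute
        effective_centroid = time_centroid[valid].mean()
        bin_flagged = False
    else:
        # Completely flagged bin: fall back to all (flagged) samples
        effective_centroid = time_centroid.mean()
        bin_flagged = True

    return nominal_time, effective_centroid, bin_flagged

time = np.array([1.0, 2.0, 3.0])
time_centroid = np.array([1.0, 2.5, 3.0])

partly = average_bin(time, time_centroid, np.array([False, True, False]))
fully = average_bin(time, time_centroid, np.array([True, True, True]))
```

In the partly flagged case the flagged centroid of 2.5 is ignored, while in the fully flagged case all three centroids are averaged and the bin is marked as flagged.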
Guarantees#
Averaged output data will be lexicographically ordered by (TIME, ANTENNA1, ANTENNA2).
TIME and INTERVAL columns always contain the nominal average and sum, and therefore include valid as well as flagged or missing data.
Other columns will contain the effective average and will contain only valid data except when all data in the bin is flagged.
Completely flagged bins will be set as flagged in both the nominal and effective case.
Certain columns are averaged, while others are summed, or simply assigned to the last value in the bin in the case of antenna indices.
Visibility data is averaged by multiplying and dividing by WEIGHT_SPECTRUM or WEIGHT or natural weighting, in order of priority.
\[\frac{\sum v_i w_i}{\sum w_i}\]
SIGMA_SPECTRUM is averaged by multiplying and dividing by WEIGHT_SPECTRUM or WEIGHT or natural weighting, in order of priority and availability.
SIGMA is only averaged with WEIGHT or natural weighting.
\[\sqrt{\frac{\sum w_i^2 \sigma_i^2}{(\sum w_i)^2}}\]
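The two formulas above can be written directly in NumPy. The sketch below is an illustration for the samples of one bin stacked along the first axis. The function names are hypothetical, and bins whose weights sum to zero are not handled.

```python
import numpy as np

def weighted_vis_mean(vis, weight):
    """sum(v_i * w_i) / sum(w_i) over the samples in a bin."""
    return (vis * weight).sum(axis=0) / weight.sum(axis=0)

def averaged_sigma(sigma, weight):
    """sqrt(sum(w_i^2 * sigma_i^2) / sum(w_i)^2) over the samples in a bin."""
    return np.sqrt((weight**2 * sigma**2).sum(axis=0) / weight.sum(axis=0) ** 2)

# Two samples in one bin
vis = np.array([1 + 1j, 3 - 1j])
weight = np.array([1.0, 3.0])
sigma = np.array([2.0, 2.0])

avg_vis = weighted_vis_mean(vis, weight)
avg_sigma = averaged_sigma(sigma, weight)
```

Passing an array of ones as `weight` reduces the first function to a plain mean, which corresponds to the natural weighting fallback mentioned above.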
The following table summarizes the handling of each column in the main Measurement Set table:
Column | Unflagged/Flagged sample handling | Aggregation Method | Required |
---|---|---|---|
TIME | Nominal | Mean | Yes |
INTERVAL | Nominal | Sum | Yes |
ANTENNA1 | Nominal | Assigned to Last Input | Yes |
ANTENNA2 | Nominal | Assigned to Last Input | Yes |
TIME_CENTROID | Effective | Mean | No |
EXPOSURE | Effective | Sum | No |
FLAG_ROW | Effective | Set if All Inputs Flagged | No |
UVW | Effective | Mean | No |
WEIGHT | Effective | Sum | No |
SIGMA | Effective | Weighted Mean | No |
DATA (vis) | Effective | Weighted Mean | No |
FLAG | Effective | Set if All Inputs Flagged | No |
WEIGHT_SPECTRUM | Effective | Sum | No |
SIGMA_SPECTRUM | Effective | Weighted Mean | No |
The following SPECTRAL_WINDOW sub-table columns are averaged as follows:
Column | Aggregation Method |
---|---|
CHAN_FREQ | Mean |
CHAN_WIDTH | Sum |
EFFECTIVE_BW | Sum |
RESOLUTION | Sum |
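A short NumPy sketch of these aggregation methods for CHAN_FREQ and CHAN_WIDTH follows. `average_channels` is a hypothetical helper, and letting a trailing partial bin hold fewer channels is an assumption of this example rather than documented library behaviour.

```python
import numpy as np

def average_channels(chan_freq, chan_width, chan_bin_size):
    """Mean of CHAN_FREQ and sum of CHAN_WIDTH per channel bin (illustrative)."""
    nchan = chan_freq.shape[0]
    # First channel of every bin
    starts = np.arange(0, nchan, chan_bin_size)
    # Number of channels in every bin; the last bin may be smaller
    counts = np.diff(np.append(starts, nchan))

    avg_freq = np.add.reduceat(chan_freq, starts) / counts
    sum_width = np.add.reduceat(chan_width, starts)
    return avg_freq, sum_width

chan_freq = np.array([100.0, 101.0, 102.0, 103.0, 104.0])
chan_width = np.ones(5)
avg_freq, sum_width = average_channels(chan_freq, chan_width, chan_bin_size=2)
```

EFFECTIVE_BW and RESOLUTION would be summed in the same way as `chan_width` here.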
Dask Implementation#
The dask implementation chunks data up by row and channel and averages each chunk independently of values in other chunks. This should be kept in mind if one wishes to maintain a particular ordering in the output dask arrays.
Typically, Measurement Set data is monotonically ordered in time. To maintain this guarantee in output dask arrays, the chunks will need to be separated by distinct time values. Practically speaking, this means that the first and second chunk should not both contain the time value 0.1, for example.
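One way to derive row chunks that respect this constraint is to count the rows belonging to each unique time and group those counts. The sketch below uses only NumPy. `time_aligned_row_chunks` is a hypothetical helper, not part of the library, and it assumes the TIME column is already sorted.

```python
import numpy as np

def time_aligned_row_chunks(time, utimes_per_chunk):
    """Row chunk sizes such that no time value spans two chunks (illustrative).

    Assumes `time` is sorted in increasing order.
    """
    # Number of rows for every unique time value
    _, counts = np.unique(time, return_counts=True)
    # Sum the row counts over groups of `utimes_per_chunk` unique times
    starts = np.arange(0, counts.shape[0], utimes_per_chunk)
    return tuple(int(c) for c in np.add.reduceat(counts, starts))

time = np.array([0.1, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.3, 0.4])
row_chunks = time_aligned_row_chunks(time, utimes_per_chunk=2)
```

The resulting tuple could then be used as the row chunking of the dask arrays passed to the averager, so that every chunk boundary falls between two distinct time values.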
Numpy#
Function | Description |
---|---|
africanus.averaging.time_and_channel | Averages in time and channel. |
africanus.averaging.bda | Averages in time and channel, dependent on baseline length. |
- africanus.averaging.time_and_channel(time, interval, antenna1, antenna2, time_centroid=None, exposure=None, flag_row=None, uvw=None, weight=None, sigma=None, chan_freq=None, chan_width=None, effective_bw=None, resolution=None, visibilities=None, flag=None, weight_spectrum=None, sigma_spectrum=None, time_bin_secs=1.0, chan_bin_size=1)[source]#
Averages in time and channel.
- Parameters:
  - time: numpy.ndarray
    Time values of shape (row,).
  - interval: numpy.ndarray
    Interval values of shape (row,).
  - antenna1: numpy.ndarray
    First antenna indices of shape (row,).
  - antenna2: numpy.ndarray
    Second antenna indices of shape (row,).
  - time_centroid: numpy.ndarray, optional
    Time centroid values of shape (row,).
  - exposure: numpy.ndarray, optional
    Exposure values of shape (row,).
  - flag_row: numpy.ndarray, optional
    Flagged rows of shape (row,).
  - uvw: numpy.ndarray, optional
    UVW coordinates of shape (row, 3).
  - weight: numpy.ndarray, optional
    Weight values of shape (row, corr).
  - sigma: numpy.ndarray, optional
    Sigma values of shape (row, corr).
  - chan_freq: numpy.ndarray, optional
    Channel frequencies of shape (chan,).
  - chan_width: numpy.ndarray, optional
    Channel widths of shape (chan,).
  - effective_bw: numpy.ndarray, optional
    Effective channel bandwidth of shape (chan,).
  - resolution: numpy.ndarray, optional
    Effective channel resolution of shape (chan,).
  - visibilities: numpy.ndarray or tuple of numpy.ndarray, optional
    Visibility data of shape (row, chan, corr). Tuples of visibilities arrays may be supplied, in which case tuples will be output.
  - flag: numpy.ndarray, optional
    Flag data of shape (row, chan, corr).
  - weight_spectrum: numpy.ndarray, optional
    Weight spectrum of shape (row, chan, corr).
  - sigma_spectrum: numpy.ndarray, optional
    Sigma spectrum of shape (row, chan, corr).
  - time_bin_secs: float, optional
    Maximum summed interval in seconds to include within a bin. Defaults to 1.0.
  - chan_bin_size: int, optional
    Number of channels to average together. Defaults to 1.
- Returns:
  - namedtuple
    A namedtuple whose entries correspond to the input arrays. Output arrays will be None if the inputs were None.
Notes
The implementation currently requires unique lexicographical combinations of (TIME, ANTENNA1, ANTENNA2). This can usually be achieved by suitably partitioning input data on indexing rows, DATA_DESC_ID and SCAN_NUMBER in particular.
- africanus.averaging.bda(time, interval, antenna1, antenna2, time_centroid=None, exposure=None, flag_row=None, uvw=None, weight=None, sigma=None, chan_freq=None, chan_width=None, effective_bw=None, resolution=None, visibilities=None, flag=None, weight_spectrum=None, sigma_spectrum=None, max_uvw_dist=None, max_fov=3.0, decorrelation=0.98, time_bin_secs=None, min_nchan=1)[source]#
Averages in time and channel, dependent on baseline length.
- Parameters:
  - time: numpy.ndarray
    Time values of shape (row,).
  - interval: numpy.ndarray
    Interval values of shape (row,).
  - antenna1: numpy.ndarray
    First antenna indices of shape (row,).
  - antenna2: numpy.ndarray
    Second antenna indices of shape (row,).
  - time_centroid: numpy.ndarray, optional
    Time centroid values of shape (row,).
  - exposure: numpy.ndarray, optional
    Exposure values of shape (row,).
  - flag_row: numpy.ndarray, optional
    Flagged rows of shape (row,).
  - uvw: numpy.ndarray, optional
    UVW coordinates of shape (row, 3).
  - weight: numpy.ndarray, optional
    Weight values of shape (row, corr).
  - sigma: numpy.ndarray, optional
    Sigma values of shape (row, corr).
  - chan_freq: numpy.ndarray, optional
    Channel frequencies of shape (chan,).
  - chan_width: numpy.ndarray, optional
    Channel widths of shape (chan,).
  - effective_bw: numpy.ndarray, optional
    Effective channel bandwidth of shape (chan,).
  - resolution: numpy.ndarray, optional
    Effective channel resolution of shape (chan,).
  - visibilities: numpy.ndarray or tuple of numpy.ndarray, optional
    Visibility data of shape (row, chan, corr). Tuples of visibilities arrays may be supplied, in which case tuples will be output.
  - flag: numpy.ndarray, optional
    Flag data of shape (row, chan, corr).
  - weight_spectrum: numpy.ndarray, optional
    Weight spectrum of shape (row, chan, corr).
  - sigma_spectrum: numpy.ndarray, optional
    Sigma spectrum of shape (row, chan, corr).
  - max_uvw_dist: float, optional
    Maximum UVW distance. Will be inferred from the UVW coordinates if not supplied.
  - max_fov: float
    Maximum Field of View Radius. Defaults to 3 degrees.
  - decorrelation: float
    Acceptable amount of decorrelation. This is a floating point value between 0.0 and 1.0.
  - time_bin_secs: float, optional
    Maximum number of seconds worth of data that can be aggregated into a bin. Defaults to None in which case the value is only bounded by the decorrelation factor and the field of view.
  - min_nchan: int, optional
    Minimum number of channels in an averaged sample. Useful in cases where imagers expect at least min_nchan channels. Defaults to 1.
- Returns:
  - namedtuple
    A namedtuple whose entries correspond to the input arrays. Output arrays will be None if the inputs were None. See the Notes for an explanation of the output formats.
Notes
In all cases, arrays starting with (row, chan) and (row,) dimensions are respectively averaged and expanded into a (rowchan,) dimension, as the number of channels varies per output row.
The output namedtuple contains an offsets array of shape (out_rows + 1,) encoding the starting offsets of each output row, as well as a single entry at the end such that np.diff(offsets) produces the number of channels for each output row.

    avg = bda(...)
    time = avg.time[avg.offsets[:-1]]
    out_chans = np.diff(avg.offsets)

The implementation currently requires unique lexicographical combinations of (TIME, ANTENNA1, ANTENNA2). This can usually be achieved by suitably partitioning input data on indexing rows, DATA_DESC_ID and SCAN_NUMBER in particular.
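To make the flattened (rowchan,) layout more concrete, the following sketch works through a small hand-made example. The `offsets` and `time` values are invented for illustration and stand in for the corresponding fields of the output namedtuple. No averaging routine is called.

```python
import numpy as np

# Hypothetical flattened output: 3 output rows holding 4, 2 and 1 channels
offsets = np.array([0, 4, 6, 7])                      # shape (out_rows + 1,)
time = np.array([10.0] * 4 + [20.0] * 2 + [30.0])     # shape (rowchan,)

# Number of channels in each output row
out_chans = np.diff(offsets)

# One time value per output row, taken at each row's starting offset
row_time = time[offsets[:-1]]

# Output row index of every (rowchan,) entry
row_index = np.repeat(np.arange(out_chans.shape[0]), out_chans)
```

An array such as `row_index` is one way to map each flattened entry back to its output row, for example when regrouping the averaged visibilities per row.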
Dask#
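Function | Description |
---|---|
africanus.averaging.dask.time_and_channel | Averages in time and channel. |
africanus.averaging.dask.bda | Averages in time and channel, dependent on baseline length. |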
- africanus.averaging.dask.time_and_channel(time, interval, antenna1, antenna2, time_centroid=None, exposure=None, flag_row=None, uvw=None, weight=None, sigma=None, chan_freq=None, chan_width=None, effective_bw=None, resolution=None, visibilities=None, flag=None, weight_spectrum=None, sigma_spectrum=None, time_bin_secs=1.0, chan_bin_size=1)[source]#
Averages in time and channel.
- Parameters:
  - time: dask.array.Array
    Time values of shape (row,).
  - interval: dask.array.Array
    Interval values of shape (row,).
  - antenna1: dask.array.Array
    First antenna indices of shape (row,).
  - antenna2: dask.array.Array
    Second antenna indices of shape (row,).
  - time_centroid: dask.array.Array, optional
    Time centroid values of shape (row,).
  - exposure: dask.array.Array, optional
    Exposure values of shape (row,).
  - flag_row: dask.array.Array, optional
    Flagged rows of shape (row,).
  - uvw: dask.array.Array, optional
    UVW coordinates of shape (row, 3).
  - weight: dask.array.Array, optional
    Weight values of shape (row, corr).
  - sigma: dask.array.Array, optional
    Sigma values of shape (row, corr).
  - chan_freq: dask.array.Array, optional
    Channel frequencies of shape (chan,).
  - chan_width: dask.array.Array, optional
    Channel widths of shape (chan,).
  - effective_bw: dask.array.Array, optional
    Effective channel bandwidth of shape (chan,).
  - resolution: dask.array.Array, optional
    Effective channel resolution of shape (chan,).
  - visibilities: dask.array.Array or tuple of dask.array.Array, optional
    Visibility data of shape (row, chan, corr). Tuples of visibilities arrays may be supplied, in which case tuples will be output.
  - flag: dask.array.Array, optional
    Flag data of shape (row, chan, corr).
  - weight_spectrum: dask.array.Array, optional
    Weight spectrum of shape (row, chan, corr).
  - sigma_spectrum: dask.array.Array, optional
    Sigma spectrum of shape (row, chan, corr).
  - time_bin_secs: float, optional
    Maximum summed interval in seconds to include within a bin. Defaults to 1.0.
  - chan_bin_size: int, optional
    Number of channels to average together. Defaults to 1.
- Returns:
  - namedtuple
    A namedtuple whose entries correspond to the input arrays. Output arrays will be None if the inputs were None.
Notes
The implementation currently requires unique lexicographical combinations of (TIME, ANTENNA1, ANTENNA2). This can usually be achieved by suitably partitioning input data on indexing rows, DATA_DESC_ID and SCAN_NUMBER in particular.
- africanus.averaging.dask.bda(time, interval, antenna1, antenna2, time_centroid=None, exposure=None, flag_row=None, uvw=None, weight=None, sigma=None, chan_freq=None, chan_width=None, effective_bw=None, resolution=None, visibilities=None, flag=None, weight_spectrum=None, sigma_spectrum=None, max_uvw_dist=None, max_fov=3.0, decorrelation=0.98, time_bin_secs=None, min_nchan=1, format='flat')[source]#
Averages in time and channel, dependent on baseline length.
- Parameters:
  - time: dask.array.Array
    Time values of shape (row,).
  - interval: dask.array.Array
    Interval values of shape (row,).
  - antenna1: dask.array.Array
    First antenna indices of shape (row,).
  - antenna2: dask.array.Array
    Second antenna indices of shape (row,).
  - time_centroid: dask.array.Array, optional
    Time centroid values of shape (row,).
  - exposure: dask.array.Array, optional
    Exposure values of shape (row,).
  - flag_row: dask.array.Array, optional
    Flagged rows of shape (row,).
  - uvw: dask.array.Array, optional
    UVW coordinates of shape (row, 3).
  - weight: dask.array.Array, optional
    Weight values of shape (row, corr).
  - sigma: dask.array.Array, optional
    Sigma values of shape (row, corr).
  - chan_freq: dask.array.Array, optional
    Channel frequencies of shape (chan,).
  - chan_width: dask.array.Array, optional
    Channel widths of shape (chan,).
  - effective_bw: dask.array.Array, optional
    Effective channel bandwidth of shape (chan,).
  - resolution: dask.array.Array, optional
    Effective channel resolution of shape (chan,).
  - visibilities: dask.array.Array or tuple of dask.array.Array, optional
    Visibility data of shape (row, chan, corr). Tuples of visibilities arrays may be supplied, in which case tuples will be output.
  - flag: dask.array.Array, optional
    Flag data of shape (row, chan, corr).
  - weight_spectrum: dask.array.Array, optional
    Weight spectrum of shape (row, chan, corr).
  - sigma_spectrum: dask.array.Array, optional
    Sigma spectrum of shape (row, chan, corr).
  - max_uvw_dist: float, optional
    Maximum UVW distance. Will be inferred from the UVW coordinates if not supplied.
  - max_fov: float
    Maximum Field of View Radius. Defaults to 3 degrees.
  - decorrelation: float
    Acceptable amount of decorrelation. This is a floating point value between 0.0 and 1.0.
  - time_bin_secs: float, optional
    Maximum number of seconds worth of data that can be aggregated into a bin. Defaults to None in which case the value is only bounded by the decorrelation factor and the field of view.
  - min_nchan: int, optional
    Minimum number of channels in an averaged sample. Useful in cases where imagers expect at least min_nchan channels. Defaults to 1.
- Returns:
  - namedtuple
    A namedtuple whose entries correspond to the input arrays. Output arrays will be None if the inputs were None. See the Notes for an explanation of the output formats.
Notes
In all cases, arrays starting with (row, chan) and (row,) dimensions are respectively averaged and expanded into a (rowchan,) dimension, as the number of channels varies per output row.
The output namedtuple contains an offsets array of shape (out_rows + 1,) encoding the starting offsets of each output row, as well as a single entry at the end such that np.diff(offsets) produces the number of channels for each output row.

    avg = bda(...)
    time = avg.time[avg.offsets[:-1]]
    out_chans = np.diff(avg.offsets)

The implementation currently requires unique lexicographical combinations of (TIME, ANTENNA1, ANTENNA2). This can usually be achieved by suitably partitioning input data on indexing rows, DATA_DESC_ID and SCAN_NUMBER in particular.