src.utils#

Module Contents#

Functions#

load_config(name)

Load a config .yml file for a specified dataset

composite_function(function_dict)

Return a composite function of all functions and kwargs specified in a

extract_lon_lat_box(ds, box, weighted_average[, ...])

Return a region specified by a range of longitudes and latitudes.

calculate_nino34(sst_anom[, sst_name])

Calculate the NINO3.4 index. The NINO3.4 index is calculated as the spatial average

calculate_dmi(sst_anom[, sst_name])

Calculate the Dipole Mode Index (DMI) for the Indian Ocean Dipole. The DMI is

calculate_sam(slp, clim_period[, groupby_dim, ...])

Calculate the Southern Annular Mode index from monthly data as defined by Gong, D.

calculate_nao(slp, clim_period[, groupby_dim, ...])

Calculate the North Atlantic Oscillation index from monthly data as defined by

calculate_amv(sst_anom[, sst_name])

Calculate the Atlantic Multi-decadal Variability (AMV)--also known as the Atlantic

calculate_ipo(sst_anom[, sst_name])

Calculate the tripolar pacific index for the Interdecadal Pacific Oscillation (IPO)

calculate_ohc300(temp[, depth_dim, temp_name])

Calculate the ocean heat content above 300m

calculate_wind_speed(u_v, u_name, v_name[, lon_dim, ...])

Calculate the wind speed

calculate_tmean_from_tmin_tmax(ds[, tmin_name, ...])

Estimate tmean as the average of tmin and tmax

calculate_ffdi(ds, clim_period, wind_from_components)

Returns the McArthur Forest Fire Danger Index following the formula provided

calculate_EHF(T[, T_p95_file, T_p95_period, ...])

Calculate the Excess Heat Factor (EHF) index, defined as:

calculate_EHF_severity(T[, T_p95_file, EHF_p85_file, ...])

Calculate the severity of the Excess Heat Factor index, defined as:

ensemble_mean(ds[, ensemble_dim])

Return the ensemble mean of the input array

greater_than(ds, value)

Return a boolean array with True where elements > value

where_greater_than(ds, value)

Return array with elements <= value masked to nan

add_CAFE_grid_info(ds)

Add CAFE grid info to a CAFE dataset that doesn't already have it

normalise_by_days_in_month(ds)

Normalise input array by the number of days in each month

convert_time_to_lead(ds[, time_dim, time_freq, ...])

Return provided array with time dimension converted to lead time dimension

truncate_latitudes(ds[, dp, lat_dim])

Return provided array with latitudes truncated to specified dp.

convert_calendar(ds, calendar[, time_dim])

Convert calendar, dropping invalid/surplus dates or inserting missing dates

rechunk(ds, **chunks)

Rechunk a dataset

select(ds, **selection)

Returns a new dataset with each array indexed by tick labels along the

add_attrs(ds, attrs[, variable])

Add attributes to a dataset

rename(ds, **names)

Rename all variables etc that have an entry in names

convert(ds, **conversion)

Convert variables in a dataset according to provided dictionary

keep_period(ds, period)

Keep only times within a specified period

_get_groupby_and_reduce_dims(ds, frequency)

Get the groupby and reduction dimensions for performing operations like

anomalise(ds, clim_period[, frequency])

Returns the anomalies of ds relative to its climatology over clim_period.

calculate_percentile_thresholds(ds, percentile, ...[, ...])

Returns the percentile values of ds over a provided period.

over_percentile_threshold(ds, percentile, ...[, ...])

Find which values in the input array are over a specified percentile

under_percentile_threshold(ds, percentile, ...[, ...])

Find which values in the input array are under a specified percentile

correct_bias(ds, obsv_file, period, frequency, method)

Correct the mean bias of ds relative to observations over a provided period

interpolate_to_grid_from_file(ds, file[, add_area, ...])

round_to_start_of_day(ds, dim)

Return provided array with specified time dimension rounded to the start of

round_to_start_of_month(ds, dim)

Return provided array with specified time dimension rounded to the start of

coarsen(ds, window_size[, start_points, dim])

Coarsen data, applying 'max' to all relevant coords and optionally starting

rolling_mean(ds, window_size[, start_points, dim])

Apply a rolling mean to the data, applying 'max' to all relevant coords and

resample(ds, freq[, start_points, min_samples, dim])

Resample data to a different temporal frequency by taking the mean

get_region_masks_from_shp(ds, shapefile, header)

Extract region masks according to a shapefile

average_over_NRM_super_clusters(ds)

Average the provided array over the NRM super cluster regions

mask_CAFEf6_reduced_dt(ds)

Mask out the ensemble members of CAFE-f6 that were run with a reduced timestep

gridarea_cdo(ds)

Returns the area weights computed using cdo's gridarea function

add_area_using_cdo_gridarea(ds[, lon_dim, lat_dim])

Add an area coordinate to the provided dataset containing the cell areas

max_chunk_size_MB(ds)

Get the max chunk size in a dataset

Attributes#

PROJECT_DIR

src.utils.PROJECT_DIR#

src.utils.load_config(name)#

Load a config .yml file for a specified dataset

Parameters
namestr

The path to the config file to load

src.utils.composite_function(function_dict)#

Return a composite function of all functions and kwargs specified in a provided dictionary

Parameters
function_dictdict

Dictionary with functions in this module to composite as keys and kwargs as values
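
The compositing idea can be sketched in plain Python. This is an illustration, not the module's implementation: it assumes the dictionary keys are function names looked up in a registry and that the functions are applied in dictionary order. `REGISTRY`, `scale` and `offset` are hypothetical stand-ins.

```python
from functools import reduce


# Hypothetical stand-ins for functions in this module; each takes the data
# as its first argument plus keyword arguments.
def scale(data, factor=1.0):
    return [x * factor for x in data]


def offset(data, amount=0.0):
    return [x + amount for x in data]


REGISTRY = {"scale": scale, "offset": offset}


def composite_function(function_dict):
    """Return one function that applies each named function in dict order."""
    steps = [(REGISTRY[name], kwargs or {}) for name, kwargs in function_dict.items()]

    def composed(data):
        # Feed the output of each step into the next one
        return reduce(lambda acc, step: step[0](acc, **step[1]), steps, data)

    return composed


pipeline = composite_function({"scale": {"factor": 2.0}, "offset": {"amount": 1.0}})
result = pipeline([1.0, 2.0, 3.0])
```

Because the steps run in order, swapping the two entries would add the offset before scaling and give a different result.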

src.utils.extract_lon_lat_box(ds, box, weighted_average, lon_dim='lon', lat_dim='lat')#

Return a region specified by a range of longitudes and latitudes.

Parameters
dsxarray Dataset or DataArray

The data to subset and average. Assumed to include an “area” Variable

boxiterable

Iterable with the following elements in this order: [lon_lower, lon_upper, lat_lower, lat_upper] where longitudes are specified between 0 and 360 deg E and latitudes are specified between -90 and 90 deg N

weighted_averageboolean

If True, return the area weighted average over the region, otherwise return the region

lon_dimstr, optional

The name of the longitude dimension

lat_dimstr, optional

The name of the latitude dimension
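
A minimal sketch of the box selection and area-weighted average, using a plain list of grid cells rather than an xarray object. The cell layout, the inclusive bounds and the sample values are assumptions made for illustration, and the sketch does not handle boxes that cross 0 deg E.

```python
def extract_lon_lat_box(cells, box, weighted_average):
    """cells: list of dicts with 'lon', 'lat', 'area' and 'value' keys."""
    lon_lower, lon_upper, lat_lower, lat_upper = box
    region = [
        c for c in cells
        if lon_lower <= c["lon"] <= lon_upper and lat_lower <= c["lat"] <= lat_upper
    ]
    if not weighted_average:
        return region
    # Weight each value by its cell area
    total_area = sum(c["area"] for c in region)
    return sum(c["value"] * c["area"] for c in region) / total_area


cells = [
    {"lon": 200.0, "lat": 0.0, "area": 1.0, "value": 1.0},
    {"lon": 230.0, "lat": 4.0, "area": 3.0, "value": 2.0},
    {"lon": 100.0, "lat": 0.0, "area": 5.0, "value": 9.0},  # outside the box
]
# 170-120 deg W expressed in 0-360 deg E is 190-240 deg E
nino34_box = [190.0, 240.0, -5.0, 5.0]
box_mean = extract_lon_lat_box(cells, nino34_box, weighted_average=True)
```

The box here is the NINO3.4 region written in the 0 to 360 deg E convention that the `box` parameter expects.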

src.utils.calculate_nino34(sst_anom, sst_name='sst')#

Calculate the NINO3.4 index. The NINO3.4 index is calculated as the spatial average of SST anomalies over the tropical Pacific region (5∘S–5∘N and 170–120∘ W).

Parameters
sst_anomxarray Dataset

Array of sst anomalies

sst_namestr, optional

The name of the sst variable in sst_anom

src.utils.calculate_dmi(sst_anom, sst_name='sst')#

Calculate the Dipole Mode Index (DMI) for the Indian Ocean Dipole. The DMI is calculated as the difference between the spatial averages of SST anomalies over two regions of the tropical Indian Ocean: (10°S-10°N and 50°E-70°E) and (10°S-0°S and 90°E-110°E).

Parameters
sst_anomxarray Dataset

Array of sst anomalies

sst_namestr, optional

The name of the sst variable in sst_anom

src.utils.calculate_sam(slp, clim_period, groupby_dim='time', slp_name='slp', lon_dim='lon', lat_dim='lat')#

Calculate the Southern Annular Mode index from monthly data as defined by Gong, D. and Wang, S., 1999. The SAM index is defined as the difference between the normalized monthly zonal mean sea level pressure at 40∘S and 65∘S.

Parameters
slpxarray Dataset

Array of sea level pressures

clim_perioditerable

Size 2 iterable containing strings indicating the start and end dates of the climatological period used to normalise the SAM index

groupby_dimstr

The dimension to compute the normalisation over

slp_namestr, optional

The name of the slp variable in the input slp Dataset

lon_dimstr, optional

The name of the longitude dimension

lat_dimstr, optional

The name of the latitude dimension
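
The definition above can be sketched in plain Python. The sketch assumes the inputs are already zonal means for a single calendar month, that the whole series is the climatological period, and that the population standard deviation is used. All values are made up.

```python
from statistics import mean, pstdev


def normalise(series, clim_slice):
    """Standardise a series using the mean and std over the climatological slice."""
    clim = series[clim_slice]
    mu, sigma = mean(clim), pstdev(clim)
    return [(x - mu) / sigma for x in series]


def sam_index(slp_40s, slp_65s, clim_slice):
    """Difference of the normalised zonal-mean SLP at 40S and 65S."""
    p40 = normalise(slp_40s, clim_slice)
    p65 = normalise(slp_65s, clim_slice)
    return [a - b for a, b in zip(p40, p65)]


# Made-up zonal-mean SLP values (hPa) for one calendar month over four years
slp_40s = [1015.0, 1017.0, 1016.0, 1018.0]
slp_65s = [985.0, 983.0, 984.0, 982.0]
sam = sam_index(slp_40s, slp_65s, slice(0, 4))
```

Because each series is normalised over the same period it is evaluated on, the index averages to zero over that period.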

src.utils.calculate_nao(slp, clim_period, groupby_dim='time', slp_name='slp', lon_dim='lon', lat_dim='lat')#

Calculate the North Atlantic Oscillation index from monthly data as defined by Jianping, L. & Wang, J. X. L. (2003). The NAO index is defined as the difference between the normalized monthly mean sea level pressure at 35∘N and 65∘N, averaged over the zonal band spanning 80∘W–30∘E.

Parameters
slpxarray Dataset

Array of sea level pressures

clim_perioditerable

Size 2 iterable containing strings indicating the start and end dates of the climatological period used to normalise the NAO index

groupby_dimstr

The dimension to compute the normalisation over

slp_namestr, optional

The name of the slp variable in the input slp Dataset

lon_dimstr, optional

The name of the longitude dimension

lat_dimstr, optional

The name of the latitude dimension

src.utils.calculate_amv(sst_anom, sst_name='sst')#

Calculate the Atlantic Multi-decadal Variability (AMV)–also known as the Atlantic Multi-decadal Oscillation (AMO)–according to Trenberth and Shea (2006). The AMV is calculated as the spatial average of SST anomalies over the North Atlantic (Equator–60∘ N and 80–0∘ W) minus the spatial average of SST anomalies averaged from 60∘ S to 60∘ N.

Note that the SST anomalies are typically smoothed in time using a 10-year moving average (Goldenberg et al., 2001; Enfield et al., 2001), a low-pass filter (Trenberth and Shea, 2006) or a 4-year temporal average (Bilbao et al., 2021).

Parameters
sst_anomxarray Dataset

Array of sst anomalies

sst_namestr, optional

The name of the sst variable in sst_anom

src.utils.calculate_ipo(sst_anom, sst_name='sst')#

Calculate the tripolar pacific index for the Interdecadal Pacific Oscillation (IPO) following Henley et al (2015). The IPO is calculated as the average of SST anomalies over the central equatorial Pacific (region 2: 10∘ S–10∘ N, 170∘ E–90∘ W) minus the average of the SST anomalies in the northwestern (region 1: 25–45∘ N, 140∘ E–145∘ W) and southwestern Pacific (region 3: 50–15∘ S, 150∘ E–160∘ W).

Note that the IPO index is typically smoothed in time using a 13-year Chebyshev low-pass filter (Henley et al., 2015) or by first applying a 4-year temporal average to the SST anomalies (Bilbao et al., 2021).

src.utils.calculate_ohc300(temp, depth_dim='depth', temp_name='temp')#

Calculate the ocean heat content above 300m

The input DataArray or Dataset is assumed to be in Kelvin

Parameters
tempxarray Dataset

Array of temperature values in Kelvin

depth_dimstr, optional

The name of the depth dimension

temp_namestr, optional

The name of the temperature variable in temp

src.utils.calculate_wind_speed(u_v, u_name, v_name, lon_dim='lon', lat_dim='lat')#

Calculate the wind speed

Parameters
u_vxarray Dataset

Dataset containing the longitudinal and latitudinal components of the wind

u_namestr

The name of the u-velocity variable in u_v

v_namestr

The name of the v-velocity variable in u_v

lon_dimstr, optional

The name of the longitude dimension for u and v

lat_dimstr, optional

The name of the latitude dimension for u and v

src.utils.calculate_tmean_from_tmin_tmax(ds, tmin_name='tmin', tmax_name='tmax', tmean_name='tmean')#

Estimate tmean as the average of tmin and tmax

Parameters
dsxarray Dataset

Dataset containing tmin and tmax variables

tmin_namestr

The name of the tmin variable

tmax_namestr

The name of the tmax variable

tmean_namestr

The name of the output tmean variable

src.utils.calculate_ffdi(ds, clim_period, wind_from_components, precip_name='precip', rh_name='rh', tmax_name='t_ref_max', wmax_name='V_ref_max', u_name='u_ref', v_name='v_ref')#

Returns the McArthur Forest Fire Danger Index following the formula provided in Dowdy (2018): FFDI = D ** 0.987 * exp (0.0338 * T - 0.0345 * H + 0.0234 * W + 0.243147)

Parameters
dsxarray Dataset

Dataset containing the following variables:

  • precip: Daily total precipitation [mm]. This is used to estimate the drought factor, D, as the 20-day accumulated rainfall scaled to lie between 0 and 10, with larger values indicating less precipitation (see Richardson et al. (2021) and Squire et al. (2021)). The drought factor is used as D in the above equation.

  • tmax: Daily max 2 m temperature [deg C]. This is used as T in the above equation.

  • rh: Daily max relative humidity at 2 m [%] (or similar, depending on data availability). Richardson et al. (2021) uses mid-afternoon relative humidity at 2 m, Squire et al. (2021) uses daily mean relative humidity at 1000 hPa. This is used as H in the above equation.

  • wmax: Daily max 10 m wind speed [km/h] (or similar, depending on data availability). Squire et al. (2021) uses daily mean wind speed. This is used as W in the above equation.

clim_perioditerable

Size 2 iterable containing strings indicating the start and end dates of the climatological period used to calculate the drought factor

wind_from_componentsboolean

Whether to calculate the wmax estimate from the provided individual components of wind or to use a provided max estimate. If True, variables with names matching those provided as parameters ‘u_name’ and ‘v_name’ must exist in ds. If False, the variable named by the wmax_name parameter is used for wmax.

precip_namestr, optional

The name of the precip variable

rh_namestr, optional

The name of the rh variable

tmax_namestr, optional

The name of the tmax variable

wmax_namestr, optional

The name of the wmax variable. This is only used if wind_from_components=False. Otherwise an estimate of wmax is calculated from the variables u_name and v_name

u_namestr, optional

The name of the u-component of wind variable to use to estimate wmax when wind_from_components=True. Not used if wind_from_components=False.

v_namestr, optional

The name of the v-component of wind variable to use to estimate wmax when wind_from_components=True. Not used if wind_from_components=False.

References

Dowdy, A. J. (2018). “Climatological Variability of Fire Weather in Australia”. Journal of Applied Meteorology and Climatology 57.2, pp. 221–234. issn: 1558-8424. doi: 10.1175/JAMC-D-17-0167.1.
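
The FFDI formula quoted above can be evaluated directly. The sketch below covers only that final formula and takes the drought factor as given; it does not estimate D from the 20-day accumulated rainfall. The function name, argument names and input values are illustrative assumptions.

```python
import math


def ffdi(drought_factor, tmax_c, rh_pct, wind_kmh):
    """McArthur FFDI from the formula above (Dowdy, 2018)."""
    return drought_factor ** 0.987 * math.exp(
        0.0338 * tmax_c - 0.0345 * rh_pct + 0.0234 * wind_kmh + 0.243147
    )


# Made-up inputs: a hot, dry, windy day versus a mild, humid, calm day
hot_dry = ffdi(drought_factor=10.0, tmax_c=40.0, rh_pct=10.0, wind_kmh=40.0)
mild_humid = ffdi(drought_factor=5.0, tmax_c=20.0, rh_pct=70.0, wind_kmh=10.0)
```

The index rises with the drought factor, temperature and wind speed, and falls with relative humidity, which follows from the signs of the coefficients.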

src.utils.calculate_EHF(T, T_p95_file=None, T_p95_period=None, T_p95_dim=None, rolling_dim='time', T_name='t_ref')#

Calculate the Excess Heat Factor (EHF) index, defined as:

EHF = max(0, EHI_sig) * max(1, EHI_accl)

with

EHI_sig = (T_i + T_{i+1} + T_{i+2}) / 3 - T_p95

EHI_accl = (T_i + T_{i+1} + T_{i+2}) / 3 - (T_{i-1} + … + T_{i-30}) / 30

T is the daily mean temperature (commonly calculated as the mean of the daily minimum and maximum temperatures, where the daily maximum typically precedes the daily minimum and the two observations relate to the same 9am-to-9am 24-h period) and T_p95 is the 95th percentile of T using all days in the year.

Parameters
Txarray DataArray

Array of daily mean temperature

T_p95_filestr, optional

Path to a file with the 95th percentiles of T using all days in the year. This should be relative to the project directory. If not provided, T_p95_period and T_p95_dim must be provided

T_p95_periodlist of str, optional

Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate T_p95. Only used if T_p95_file is None

T_p95_dimstr or list of str, optional

The dimension(s) over which to calculate T_p95. Only used if T_p95_file is None

rolling_dimstr, optional

The dimension over which to compute the rolling averages in the definition of EHF

T_namestr, optional

The name of the temperature variable in T
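
A minimal sketch of the EHF definition above for a plain list of daily mean temperatures. It assumes T_p95 is supplied as a single number and only returns values for days that have 30 preceding days and 2 following days. The function name and the temperature series are made up.

```python
def excess_heat_factor(temps, t_p95):
    """Return EHF for each day i that has 30 prior days and 2 following days."""
    ehf = {}
    for i in range(30, len(temps) - 2):
        three_day_mean = sum(temps[i:i + 3]) / 3
        prior_30_day_mean = sum(temps[i - 30:i]) / 30
        ehi_sig = three_day_mean - t_p95  # excess over the climatological threshold
        ehi_accl = three_day_mean - prior_30_day_mean  # excess over recent conditions
        ehf[i] = max(0.0, ehi_sig) * max(1.0, ehi_accl)
    return ehf


# Made-up series: 30 days at 20 degC followed by a 3-day spell at 30 degC
temps = [20.0] * 30 + [30.0] * 3
ehf = excess_heat_factor(temps, t_p95=25.0)
```

For day 30 the three-day mean is 30, so EHI_sig is 5 and EHI_accl is 10, giving an EHF of 50.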

References

Nairn et al. 2015: https://doi.org/10.3390/ijerph120100227

src.utils.calculate_EHF_severity(T, T_p95_file=None, EHF_p85_file=None, T_p95_period=None, T_p95_dim=None, EHF_p85_period=None, EHF_p85_dim=None, rolling_dim='time', T_name='t_ref')#

Calculate the severity of the Excess Heat Factor index, defined as:

EHF_severity = EHF / EHF_p85

where “_p85” denotes the 85th percentile of all positive values using all days in the year and the Excess Heat Factor (EHF) is defined as:

EHF = max(0, EHI_sig) * max(1, EHI_accl)

with

EHI_sig = (T_i + T_{i+1} + T_{i+2}) / 3 - T_p95

EHI_accl = (T_i + T_{i+1} + T_{i+2}) / 3 - (T_{i-1} + … + T_{i-30}) / 30

T is the daily mean temperature (commonly calculated as the mean of the daily minimum and maximum temperatures, where the daily maximum typically precedes the daily minimum and the two observations relate to the same 9am-to-9am 24-h period) and T_p95 is the 95th percentile of T using all days in the year.

Parameters
Txarray DataArray

Array of daily mean temperature

T_p95_filestr, optional

Path to a file with the 95th percentiles of T using all days in the year. This should be relative to the project directory. If not provided, T_p95_period and T_p95_dim must be provided

EHF_p85_filestr, optional

Path to a file with the 85th percentiles of positive EHF using all days in the year. This should be relative to the project directory. If not provided, EHF_p85_period and EHF_p85_dim must be provided

T_p95_periodlist of str, optional

Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate T_p95. Only used if T_p95_file is None

T_p95_dimstr or list of str, optional

The dimension(s) over which to calculate T_p95. Only used if T_p95_file is None

EHF_p85_periodlist of str, optional

Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate EHF_p85. Only used if EHF_p85_file is None

EHF_p85_dimstr or list of str, optional

The dimension(s) over which to calculate EHF_p85. Only used if EHF_p85_file is None

rolling_dimstr, optional

The dimension over which to compute the rolling averages in the definition of EHF

T_namestr, optional

The name of the temperature variable in T

References

Nairn et al. 2015: https://doi.org/10.3390/ijerph120100227
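
The severity scaling can be sketched as follows, assuming the 85th percentile is taken over the positive values of the same series being scaled and using linear interpolation between sorted values. Both choices are assumptions for illustration; the helper names and EHF values are made up.

```python
def percentile(values, q):
    """Percentile with linear interpolation between sorted values (0 <= q <= 100)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def ehf_severity(ehf_values):
    """Scale EHF by the 85th percentile of its positive values."""
    positive = [v for v in ehf_values if v > 0]
    ehf_p85 = percentile(positive, 85)
    return [v / ehf_p85 for v in ehf_values]


# Made-up EHF values; zeros are excluded when computing the percentile
ehf_values = [0.0, 0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
severity = ehf_severity(ehf_values)
```

Days with zero EHF keep a severity of zero, while days above the 85th percentile of positive EHF get a severity greater than one.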

src.utils.ensemble_mean(ds, ensemble_dim='member')#

Return the ensemble mean of the input array

Parameters
dsxarray Dataset

Array to take the ensemble mean of

ensemble_dimstr, optional

The name of the ensemble dimension

src.utils.greater_than(ds, value)#

Return a boolean array with True where elements > value

Parameters
ds: xarray Dataset

The array to mask

value: float, xarray Dataset

The value(s) to use to mask ds

src.utils.where_greater_than(ds, value)#

Return array with elements <= value masked to nan

Parameters
ds: xarray Dataset

The array to mask

value: float, xarray Dataset

The value(s) to use to mask ds
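
The difference between the two masking functions can be shown on a plain list. This is an illustrative sketch only and assumes a scalar threshold.

```python
import math


def greater_than(values, threshold):
    """Boolean list: True where the element is strictly greater than threshold."""
    return [v > threshold for v in values]


def where_greater_than(values, threshold):
    """Keep elements greater than threshold; replace the rest with NaN."""
    return [v if v > threshold else math.nan for v in values]


values = [0.5, 1.0, 2.5]
mask = greater_than(values, 1.0)
masked = where_greater_than(values, 1.0)
```

An element equal to the threshold is not greater than it, so it is False in the boolean mask and NaN in the masked output.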

src.utils.add_CAFE_grid_info(ds)#

Add CAFE grid info to a CAFE dataset that doesn’t already have it

Parameters
dsxarray Dataset

The dataset to add grid info to

src.utils.normalise_by_days_in_month(ds)#

Normalise input array by the number of days in each month

Parameters
dsxarray Dataset

The array to normalise

src.utils.convert_time_to_lead(ds, time_dim='time', time_freq=None, init_dim='init', lead_dim='lead')#

Return provided array with time dimension converted to lead time dimension and time added as additional coordinate

Parameters
dsxarray Dataset

A dataset with a time dimension

time_dimstr, optional

The name of the time dimension

time_freqstr, optional

The frequency of the time dimension. If not provided, will try to use xr.infer_freq to determine the frequency. This is only used to add a freq attr to the lead time coordinate

init_dimstr, optional

The name of the initial date dimension in the output

lead_dimstr, optional

The name of the lead time dimension in the output

src.utils.truncate_latitudes(ds, dp=10, lat_dim='lat')#

Return provided array with latitudes truncated to specified dp.

This is necessary due to precision differences from running forecasts on different systems

Parameters
dsxarray Dataset

A dataset with a latitude dimension

dpint, optional

The number of decimal places to truncate at

lat_dimstr, optional

The name of the latitude dimension

src.utils.convert_calendar(ds, calendar, time_dim='time')#

Convert calendar, dropping invalid/surplus dates or inserting missing dates

Parameters
dsxarray Dataset

A dataset with a time dimension

time_dimstr, optional

The name of the time dimension

src.utils.rechunk(ds, **chunks)#

Rechunk a dataset

Parameters
dsxarray Dataset

A dataset to be rechunked

chunksdict

Dictionary of {dim: chunksize}

src.utils.select(ds, **selection)#

Returns a new dataset with each array indexed by tick labels along the specified dimension(s)

Parameters
dsxarray Dataset

A dataset to select from

selectiondict

A dict with keys matching dimensions and values given by scalars, slices or arrays of tick labels

src.utils.add_attrs(ds, attrs, variable=None)#

Add attributes to a dataset

Parameters
dsxarray Dataset

The data to add attributes to

attrsdict

The attributes to add

variablestr, optional

The name of the variable or coordinate to add the attributes to. If None, the attributes will be added as global attributes

src.utils.rename(ds, **names)#

Rename all variables etc that have an entry in names

Parameters
dsxarray Dataset

A dataset to be renamed

namesdict

Dictionary of {old_name: new_name}

src.utils.convert(ds, **conversion)#

Convert variables in a dataset according to provided dictionary

Parameters
dsxarray Dataset

A dataset to be converted

conversiondict

Dictionary of {variable: oper} where oper is a dictionary specifying the operation and the value. Current possible operations are ‘multiply_by’ and ‘add’.
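
A sketch of how such a conversion dictionary might be applied, using a plain dict of lists in place of a dataset. It assumes that operations within one variable are applied in dictionary order; the variable names and values are made up.

```python
def convert(data, **conversion):
    """Apply 'multiply_by' and 'add' operations to the named variables."""
    converted = dict(data)  # shallow copy so the input mapping is not modified
    for variable, operations in conversion.items():
        values = converted[variable]
        for operation, amount in operations.items():
            if operation == "multiply_by":
                values = [v * amount for v in values]
            elif operation == "add":
                values = [v + amount for v in values]
            else:
                raise ValueError(f"Unsupported operation: {operation}")
        converted[variable] = values
    return converted


# Kelvin to degrees Celsius, and metres to millimetres (made-up values)
data = {"temp": [273.15, 300.0], "precip": [0.001, 0.02]}
out = convert(data, temp={"add": -273.15}, precip={"multiply_by": 1000.0})
```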

src.utils.keep_period(ds, period)#

Keep only times within a specified period

Parameters
dsxarray Dataset

The data to mask

perioditerable

Size 2 iterable containing strings indicating the start and end dates of the period to retain

src.utils._get_groupby_and_reduce_dims(ds, frequency)#

Get the groupby and reduction dimensions for performing operations like calculating anomalies and percentile thresholds

src.utils.anomalise(ds, clim_period, frequency=None)#

Returns the anomalies of ds relative to its climatology over clim_period.

Uses a shortcut for calculating hindcast climatologies that will not work for hindcasts initialised more frequently than monthly.

Parameters
dsxarray Dataset

The data to anomalise

clim_perioditerable

Size 2 iterable containing strings indicating the start and end dates of the climatological period

frequencystr, optional

The frequency at which to bin the climatology, e.g. per month. Must be an available attribute of the datetime accessor. Specify “None” to indicate no frequency (climatology calculated by averaging all times). Note, setting to “None” for hindcast data can be dangerous, since only certain times may be available at each lead.
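
The binning behaviour can be sketched with standard-library dates in place of an xarray time coordinate. The sketch handles a simple timeseries only, not hindcasts with lead times, and the signature and values are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def anomalise(times, values, clim_period, frequency="month"):
    """Subtract the climatology over clim_period, binned by a date attribute."""
    start, end = (date.fromisoformat(d) for d in clim_period)
    bins = defaultdict(list)
    for t, v in zip(times, values):
        if start <= t <= end:
            key = getattr(t, frequency) if frequency else None
            bins[key].append(v)
    climatology = {key: sum(vals) / len(vals) for key, vals in bins.items()}
    return [
        v - climatology[getattr(t, frequency) if frequency else None]
        for t, v in zip(times, values)
    ]


# Made-up monthly values for January and July over three years
times = [date(y, m, 1) for y in (2000, 2001, 2002) for m in (1, 7)]
values = [10.0, 20.0, 12.0, 22.0, 17.0, 27.0]
anoms = anomalise(times, values, ("2000-01-01", "2001-12-31"), frequency="month")
```

With `frequency="month"` the January and July climatologies are computed separately, so the seasonal cycle is removed. With `frequency=None` a single mean over all times in the period is subtracted instead.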

src.utils.calculate_percentile_thresholds(ds, percentile, percentile_period, percentile_dim=None, frequency=None)#

Returns the percentile values of ds over a provided period.

Parameters
dsxarray Dataset

The data to calculate the percentiles of

percentilefloat

The percentile to calculate

percentile_perioditerable

Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate the percentile thresholds

percentile_dimstr or list of str, optional

The dimension(s) over which to compute the percentile thresholds. If None, these will be determined automatically based on the type of input data:

  • timeseries: percentile_dim = “time”

  • forecasts: percentile_dim = “init” [, “member”]

frequencystr, optional

The frequency at which to bin the percentiles, e.g. per month. Must be an available attribute of the datetime accessor. Specify “None” to indicate no frequency (percentiles calculated over all times). Note, setting to “None” for hindcast data can be dangerous, since only certain times may be available at each lead.

src.utils.over_percentile_threshold(ds, percentile, percentile_period, percentile_dim=None, frequency=None)#

Find which values in the input array are over a specified percentile calculated over a specified period. Returns a boolean array with True where values are over the specified percentile and False elsewhere.

Parameters
dsxarray Dataset

The data to threshold based on its percentiles

percentilefloat

The percentile used to threshold the data

percentile_perioditerable

Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate the percentile thresholds

frequencystr, optional

The frequency at which to bin the percentiles, e.g. per month. Must be an available attribute of the datetime accessor. Specify “None” to indicate no frequency (percentiles calculated over all times). Note, setting to “None” for hindcast data can be dangerous, since only certain times may be available at each lead.
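
A sketch of thresholding against a percentile computed over a reference period, without any frequency binning. The linear-interpolation percentile rule and the simplified signature are assumptions for illustration; the values are made up.

```python
from datetime import date


def percentile(values, q):
    """Percentile with linear interpolation between sorted values (0 <= q <= 100)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def over_percentile_threshold(times, values, q, percentile_period):
    """True where a value exceeds the q-th percentile of the reference period."""
    start, end = (date.fromisoformat(d) for d in percentile_period)
    in_period = [v for t, v in zip(times, values) if start <= t <= end]
    threshold = percentile(in_period, q)
    return [v > threshold for v in values]


# Made-up annual values; the last year lies outside the reference period
times = [date(2000 + i, 1, 1) for i in range(6)]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
flags = over_percentile_threshold(times, values, 50, ("2000-01-01", "2004-12-31"))
```

The threshold is derived only from values inside the reference period but is applied to every value, including those outside it.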

src.utils.under_percentile_threshold(ds, percentile, percentile_period, percentile_dim=None, frequency=None)#

Find which values in the input array are under a specified percentile calculated over a specified period. Returns a boolean array with True where values are under the specified percentile and False elsewhere.

Parameters
dsxarray Dataset

The data to threshold based on its percentiles

percentilefloat

The percentile used to threshold the data

percentile_perioditerable

Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate the percentile thresholds

frequencystr, optional

The frequency at which to bin the percentiles, e.g. per month. Must be an available attribute of the datetime accessor. Specify “None” to indicate no frequency (percentiles calculated over all times). Note, setting to “None” for hindcast data can be dangerous, since only certain times may be available at each lead.

src.utils.correct_bias(ds, obsv_file, period, frequency, method)#

Correct the mean bias of ds relative to observations over a provided period

Will not work for hindcasts initialised more frequently than monthly.

Parameters
dsxarray Dataset

The hindcast data to correct

obsv_filestr

Path to a file with the appropriate observation data to correct to. This should be relative to the project directory

perioditerable

Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate the biases

frequencystr

The frequency at which to bin the biases, e.g. per month. Must be an available attribute of the datetime accessor. Specify “None” to indicate no frequency (climatology calculated by averaging all times). Note, setting to “None” can be dangerous, since only certain times may be available at each lead and there is no check that the same times are available between the observations and forecasts.

methodstr

The method to use to correct the mean bias. Options are:

  • “additive”: the difference between the ds and obsv climatologies is subtracted from ds

  • “multiplicative”: ds is divided by the ratio of the ds and obsv climatologies
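
The two correction methods can be sketched as below, assuming the climatologies have already been computed and are single numbers. The simplified signature and the values are made up for illustration.

```python
def correct_bias(hindcast, hindcast_clim, obsv_clim, method):
    """Correct the mean bias using precomputed climatologies."""
    if method == "additive":
        bias = hindcast_clim - obsv_clim
        return [x - bias for x in hindcast]
    if method == "multiplicative":
        ratio = hindcast_clim / obsv_clim
        return [x / ratio for x in hindcast]
    raise ValueError(f"Unknown method: {method}")


hindcast = [12.0, 14.0, 16.0]
hindcast_clim = sum(hindcast) / len(hindcast)
obsv_clim = 7.0
additive = correct_bias(hindcast, hindcast_clim, obsv_clim, "additive")
multiplicative = correct_bias(hindcast, hindcast_clim, obsv_clim, "multiplicative")
```

Both methods bring the hindcast mean to the observed climatology. The additive method preserves the spread of the hindcast, while the multiplicative method rescales it.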

src.utils.interpolate_to_grid_from_file(ds, file, add_area=True, ignore_degenerate=True)#

src.utils.round_to_start_of_day(ds, dim)#

Return provided array with specified time dimension rounded to the start of the day

Parameters
dsxarray Dataset

The dataset with a dimension(s) to round

dimstr

The name of the dimensions to round

src.utils.round_to_start_of_month(ds, dim)#

Return provided array with specified time dimension rounded to the start of the month

Parameters
dsxarray Dataset

The dataset with a dimension(s) to round

dimstr

The name of the dimensions to round

src.utils.coarsen(ds, window_size, start_points=None, dim='time')#

Coarsen data, applying ‘max’ to all relevant coords and optionally starting at a particular time point in the array

Parameters
dsxarray Dataset

The dataset to coarsen

start_pointslist

Value(s) of coordinate dim to start the coarsening from. If these fall outside the range of the coordinate, coarsening starts at the beginning of the array

dimstr, optional

The name of the dimension to coarsen along

src.utils.rolling_mean(ds, window_size, start_points=None, dim='time')#

Apply a rolling mean to the data, applying ‘max’ to all relevant coords and optionally starting at a particular time point in the array

Parameters
dsxarray Dataset

The dataset to apply the rolling mean to

start_pointsstr or list of str

Value(s) of coordinate dim to start the rolling mean from. If these fall outside the range of the coordinate, the rolling mean starts at the beginning of the array

dimstr, optional

The name of the dimension to apply the rolling mean along

src.utils.resample(ds, freq, start_points=None, min_samples=None, dim='time')#

Resample data to a different temporal frequency by taking the mean over all values at the downsampled frequency and optionally starting at a particular time point in the array

Parameters
dsxarray Dataset

The dataset to resample

freqstr

Resample frequency expressed using pandas offset alias

start_pointsstr or list of str

Value(s) of coordinate dim to start the resampling from. If these fall outside the range of the coordinate, resampling starts at the beginning of the array

min_samplesint, optional

The minimum number of samples that must occur within a resampled group. If there are fewer samples, NaN is assigned.

dimstr, optional

The name of the time dimension to resample along

src.utils.get_region_masks_from_shp(ds, shapefile, header)#

Extract region masks according to a shapefile

Parameters
dsxarray Dataset

The array with the grid to build the masks for

shapefilestr

The path to the shapefile to use

headerstr

Name of the shapefile column to use to name the regions

src.utils.average_over_NRM_super_clusters(ds)#

Average the provided array over the NRM super cluster regions

Parameters
dsxarray Dataset

The array to average over the NRM super cluster regions

src.utils.mask_CAFEf6_reduced_dt(ds)#

Mask out the ensemble members of CAFE-f6 that were run with a reduced timestep since reducing the timestep was found to produce a different model equilibrium

Parameters
dsxarray Dataset

The CAFE-f6 data to mask

src.utils.gridarea_cdo(ds)#

Returns the area weights computed using cdo’s gridarea function.

Note that this function writes ds to disk, so strip ds back to only what is needed.

Parameters
dsxarray Dataset

The dataset to pass to cdo gridarea

src.utils.add_area_using_cdo_gridarea(ds, lon_dim='lon', lat_dim='lat')#

Add an area coordinate to the provided dataset containing the cell areas estimated by cdo’s gridarea function

Parameters
dsxarray Dataset

The data to use to estimate the cell areas

lon_dimstr, optional

The name of the longitude dimension on ds

lat_dimstr, optional

The name of the latitude dimension on ds

src.utils.max_chunk_size_MB(ds)#

Get the max chunk size in a dataset