`src.utils`#

Module Contents#

Functions#

`load_config`(name)	Load a config .yml file for a specified dataset
`composite_function`(function_dict)	Return a composite function of all functions and kwargs specified in a provided dictionary
`extract_lon_lat_box`(ds, box, weighted_average[, ...])	Return a region specified by a range of longitudes and latitudes.
`calculate_nino34`(sst_anom[, sst_name])	Calculate the NINO3.4 index.
`calculate_dmi`(sst_anom[, sst_name])	Calculate the Dipole Mode Index (DMI) for the Indian Ocean Dipole.
`calculate_sam`(slp, clim_period[, groupby_dim, ...])	Calculate the Southern Annular Mode index from monthly data as defined by Gong and Wang (1999).
`calculate_nao`(slp, clim_period[, groupby_dim, ...])	Calculate the North Atlantic Oscillation index from monthly data as defined by Li and Wang (2003).
`calculate_amv`(sst_anom[, sst_name])	Calculate the Atlantic Multi-decadal Variability (AMV), also known as the Atlantic Multi-decadal Oscillation (AMO).
`calculate_ipo`(sst_anom[, sst_name])	Calculate the tripolar Pacific index for the Interdecadal Pacific Oscillation (IPO).
`calculate_ohc300`(temp[, depth_dim, temp_name])	Calculate the ocean heat content above 300m
`calculate_wind_speed`(u_v, u_name, v_name[, lon_dim, ...])	Calculate the wind speed
`calculate_tmean_from_tmin_tmax`(ds[, tmin_name, ...])	Estimate tmean as the average of tmin and tmax
`calculate_ffdi`(ds, clim_period, wind_from_components)	Returns the McArthur Forest Fire Danger Index following the formula provided in Dowdy (2018)
`calculate_EHF`(T[, T_p95_file, T_p95_period, ...])	Calculate the Excess Heat Factor (EHF) index
`calculate_EHF_severity`(T[, T_p95_file, EHF_p85_file, ...])	Calculate the severity of the Excess Heat Factor index
`ensemble_mean`(ds[, ensemble_dim])	Return the ensemble mean of the input array
`greater_than`(ds, value)	Return a boolean array with True where elements > value
`where_greater_than`(ds, value)	Return array with elements <= value masked to nan
`add_CAFE_grid_info`(ds)	Add CAFE grid info to a CAFE dataset that doesn't already have it
`normalise_by_days_in_month`(ds)	Normalise input array by the number of days in each month
`convert_time_to_lead`(ds[, time_dim, time_freq, ...])	Return provided array with time dimension converted to lead time dimension
`truncate_latitudes`(ds[, dp, lat_dim])	Return provided array with latitudes truncated to specified dp.
`convert_calendar`(ds, calendar[, time_dim])	Convert calendar, dropping invalid/surplus dates or inserting missing dates
`rechunk`(ds, **chunks)	Rechunk a dataset
`select`(ds, **selection)	Returns a new dataset with each array indexed by tick labels along the specified dimension(s)
`add_attrs`(ds, attrs[, variable])	Add attributes to a dataset
`rename`(ds, **names)	Rename all variables etc that have an entry in names
`convert`(ds, **conversion)	Convert variables in a dataset according to provided dictionary
`keep_period`(ds, period)	Keep only times within a specified period
`_get_groupby_and_reduce_dims`(ds, frequency)	Get the groupby and reduction dimensions for performing operations like calculating anomalies and percentile thresholds
`anomalise`(ds, clim_period[, frequency])	Returns the anomalies of ds relative to its climatology over clim_period.
`calculate_percentile_thresholds`(ds, percentile, ...[, ...])	Returns the percentile values of ds over a provided period.
`over_percentile_threshold`(ds, percentile, ...[, ...])	Find which values in the input array are over a specified percentile
`under_percentile_threshold`(ds, percentile, ...[, ...])	Find which values in the input array are under a specified percentile
`correct_bias`(ds, obsv_file, period, frequency, method)	Correct the mean bias of ds relative to observations over a provided period
`interpolate_to_grid_from_file`(ds, file[, add_area, ...])
`round_to_start_of_day`(ds, dim)	Return provided array with specified time dimension rounded to the start of the day
`round_to_start_of_month`(ds, dim)	Return provided array with specified time dimension rounded to the start of the month
`coarsen`(ds, window_size[, start_points, dim])	Coarsen data, applying 'max' to all relevant coords and optionally starting at a particular time point in the array
`rolling_mean`(ds, window_size[, start_points, dim])	Apply a rolling mean to the data, applying 'max' to all relevant coords and optionally starting at a particular time point in the array
`resample`(ds, freq[, start_points, min_samples, dim])	Resample data to a different temporal frequency by taking the mean
`get_region_masks_from_shp`(ds, shapefile, header)	Extract region masks according to a shapefile
`average_over_NRM_super_clusters`(ds)	Average the provided array over the NRM super cluster regions
`mask_CAFEf6_reduced_dt`(ds)	Mask out the ensemble members of CAFE-f6 that were run with a reduced timestep
`gridarea_cdo`(ds)	Returns the area weights computed using cdo's gridarea function
`add_area_using_cdo_gridarea`(ds[, lon_dim, lat_dim])	Add an area coordinate to the provided dataset containing the cell areas
`max_chunk_size_MB`(ds)	Get the max chunk size in a dataset

Attributes#

PROJECT_DIR

src.utils.PROJECT_DIR#

src.utils.load_config(name)#

Load a config .yml file for a specified dataset

Parameters

name (str): The path to the config file to load

src.utils.composite_function(function_dict)#

Return a composite function of all functions and kwargs specified in a provided dictionary

Parameters

function_dict (dict): Dictionary with functions in this module to composite as keys and kwargs as values
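
The composition described above can be sketched as follows. This is only an illustration under stated assumptions: the keys are taken to be callables that accept the data as their first argument, and they are applied in dictionary order. The names `compose_from_dict`, `scale` and `shift` are hypothetical and are not part of `src.utils`.

```python
from functools import reduce


def compose_from_dict(function_dict):
    """Return a function that applies each (function, kwargs) pair in order."""

    def composite(data):
        # Feed the output of each function into the next one
        return reduce(
            lambda result, item: item[0](result, **item[1]),
            function_dict.items(),
            data,
        )

    return composite


# Hypothetical stand-ins for functions in the module
def scale(x, factor=1.0):
    return x * factor


def shift(x, offset=0.0):
    return x + offset


pipeline = compose_from_dict({scale: {"factor": 2.0}, shift: {"offset": 1.0}})
result = pipeline(3.0)  # scale is applied first, then shift
```

Because Python dictionaries preserve insertion order, the order of the entries determines the order in which the steps run.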

src.utils.extract_lon_lat_box(ds, box, weighted_average, lon_dim='lon', lat_dim='lat')#

Return a region specified by a range of longitudes and latitudes.

Parameters

ds (xarray Dataset or DataArray): The data to subset and average. Assumed to include an “area” variable
box (iterable): Iterable with the following elements in this order: [lon_lower, lon_upper, lat_lower, lat_upper] where longitudes are specified between 0 and 360 deg E and latitudes are specified between -90 and 90 deg N
weighted_average (boolean): If True, return the area-weighted average over the region; otherwise return the region
lon_dim (str, optional): The name of the longitude dimension
lat_dim (str, optional): The name of the latitude dimension
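
As a rough illustration of the area-weighted box average, the sketch below uses plain NumPy arrays rather than xarray objects. The function name `box_weighted_mean` and the synthetic grid are assumptions made for this example, and boxes that wrap across 0/360 deg E are not handled.

```python
import numpy as np


def box_weighted_mean(data, area, lon, lat, box):
    """Area-weighted mean of data[lat, lon] inside [lon_lower, lon_upper, lat_lower, lat_upper]."""
    lon_lower, lon_upper, lat_lower, lat_upper = box
    lat_mask = (lat >= lat_lower) & (lat <= lat_upper)
    lon_mask = (lon >= lon_lower) & (lon <= lon_upper)
    # 2D mask that is True only where both the latitude and longitude are in the box
    mask = np.outer(lat_mask, lon_mask)
    return np.sum(data[mask] * area[mask]) / np.sum(area[mask])


lat = np.array([-10.0, 0.0, 10.0])
lon = np.array([180.0, 200.0, 220.0, 240.0])
data = np.arange(12, dtype=float).reshape(3, 4)
# Hypothetical cell areas that shrink with the cosine of latitude
area = np.cos(np.deg2rad(lat))[:, None] * np.ones((1, 4))

box_mean = box_weighted_mean(data, area, lon, lat, [190.0, 240.0, -5.0, 5.0])
```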

src.utils.calculate_nino34(sst_anom, sst_name='sst')#

Calculate the NINO3.4 index. The NINO3.4 index is calculated as the spatial average of SST anomalies over the tropical Pacific region (5°S–5°N, 170°W–120°W).

Parameters

sst_anom (xarray Dataset): Array of sst anomalies
sst_name (str, optional): The name of the sst variable in sst_anom

src.utils.calculate_dmi(sst_anom, sst_name='sst')#

Calculate the Dipole Mode Index (DMI) for the Indian Ocean Dipole. The DMI is calculated as the difference between the spatial averages of SST anomalies over two regions of the tropical Indian Ocean: a western region (10°S–10°N, 50°E–70°E) and a south-eastern region (10°S–0°, 90°E–110°E).

Parameters

sst_anom (xarray Dataset): Array of sst anomalies
sst_name (str, optional): The name of the sst variable in sst_anom

src.utils.calculate_sam(slp, clim_period, groupby_dim='time', slp_name='slp', lon_dim='lon', lat_dim='lat')#

Calculate the Southern Annular Mode index from monthly data as defined by Gong, D. and Wang, S. (1999). The SAM index is defined as the difference between the normalized monthly zonal mean sea level pressure at 40°S and 65°S.

Parameters

slp (xarray Dataset): Array of sea level pressures
clim_period (iterable): Size 2 iterable containing strings indicating the start and end dates of the climatological period used to normalise the SAM index
groupby_dim (str, optional): The dimension to compute the normalisation over
slp_name (str, optional): The name of the slp variable in the input slp Dataset
lon_dim (str, optional): The name of the longitude dimension
lat_dim (str, optional): The name of the latitude dimension
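
The definition above can be sketched with NumPy as follows. This is a simplified illustration, not the module's implementation: it assumes the zonal means at 40°S and 65°S have already been computed, and it normalises with a single mean and standard deviation over the climatological period rather than grouping by calendar month. The name `sam_like_index` and the input values are hypothetical.

```python
import numpy as np


def sam_like_index(slp_40s, slp_65s, clim_slice):
    """Difference of normalised zonal-mean SLP series at 40S and 65S."""

    def normalise(series):
        # Mean and standard deviation are taken over the climatological period only
        clim = series[clim_slice]
        return (series - clim.mean()) / clim.std()

    return normalise(slp_40s) - normalise(slp_65s)


# Synthetic zonal-mean SLP series in hPa, for illustration only
slp_40s = np.array([1015.0, 1017.0, 1013.0, 1019.0])
slp_65s = np.array([985.0, 983.0, 987.0, 981.0])
sam = sam_like_index(slp_40s, slp_65s, slice(0, 4))
```

Positive values indicate anomalously high pressure at 40°S relative to 65°S.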

src.utils.calculate_nao(slp, clim_period, groupby_dim='time', slp_name='slp', lon_dim='lon', lat_dim='lat')#

Calculate the North Atlantic Oscillation index from monthly data as defined by Li, J. and Wang, J. X. L. (2003). The NAO index is defined as the difference between the normalized monthly mean sea level pressure at 35°N and 65°N, averaged over the zonal band spanning 80°W–30°E.

Parameters

slp (xarray Dataset): Array of sea level pressures
clim_period (iterable): Size 2 iterable containing strings indicating the start and end dates of the climatological period used to normalise the NAO index
groupby_dim (str, optional): The dimension to compute the normalisation over
slp_name (str, optional): The name of the slp variable in the input slp Dataset
lon_dim (str, optional): The name of the longitude dimension
lat_dim (str, optional): The name of the latitude dimension

src.utils.calculate_amv(sst_anom, sst_name='sst')#

Calculate the Atlantic Multi-decadal Variability (AMV), also known as the Atlantic Multi-decadal Oscillation (AMO), according to Trenberth and Shea (2006). The AMV is calculated as the spatial average of SST anomalies over the North Atlantic (0°–60°N, 80°W–0°) minus the spatial average of SST anomalies from 60°S to 60°N.

Note that the SST anomalies are typically smoothed in time using a 10-year moving average (Goldenberg et al., 2001; Enfield et al., 2001), a low-pass filter (Trenberth and Shea, 2006) or a 4-year temporal average (Bilbao et al., 2021).

Parameters

sst_anom (xarray Dataset): Array of sst anomalies
sst_name (str, optional): The name of the sst variable in sst_anom

src.utils.calculate_ipo(sst_anom, sst_name='sst')#

Calculate the tripolar Pacific index for the Interdecadal Pacific Oscillation (IPO) following Henley et al. (2015). The IPO is calculated as the average of SST anomalies over the central equatorial Pacific (region 2: 10°S–10°N, 170°E–90°W) minus the average of the SST anomalies in the northwestern (region 1: 25°N–45°N, 140°E–145°W) and southwestern Pacific (region 3: 50°S–15°S, 150°E–160°W).

Note that the IPO index is typically smoothed in time using a 13-year Chebyshev low-pass filter (Henley et al., 2015) or by first applying a 4-year temporal average to the sst anomalies (Bilbao et al., 2021).
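
Once the three box averages are available, the combination step might look like the sketch below. The function name `tripole_index` and the input values are hypothetical, and the temporal smoothing mentioned above is left out.

```python
def tripole_index(region1, region2, region3):
    """Central equatorial Pacific anomaly minus the mean of the two off-equatorial regions."""
    return region2 - 0.5 * (region1 + region3)


# Hypothetical box-averaged SST anomalies in kelvin
tpi = tripole_index(region1=-0.2, region2=0.5, region3=-0.4)
```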

src.utils.calculate_ohc300(temp, depth_dim='depth', temp_name='temp')#

Calculate the ocean heat content above 300m

The input DataArray or Dataset is assumed to be in Kelvin

Parameters

temp (xarray Dataset): Array of temperature values in Kelvin
depth_dim (str, optional): The name of the depth dimension
temp_name (str, optional): The name of the temperature variable in temp

src.utils.calculate_wind_speed(u_v, u_name, v_name, lon_dim='lon', lat_dim='lat')#

Calculate the wind speed

Parameters

u_v (xarray Dataset): Dataset containing the longitudinal and latitudinal components of the wind
u_name (str): The name of the u-velocity variable in u_v
v_name (str): The name of the v-velocity variable in u_v
lon_dim (str, optional): The name of the longitude dimension in u_v
lat_dim (str, optional): The name of the latitude dimension in u_v

src.utils.calculate_tmean_from_tmin_tmax(ds, tmin_name='tmin', tmax_name='tmax', tmean_name='tmean')#

Estimate tmean as the average of tmin and tmax

Parameters

ds (xarray Dataset): Dataset containing tmin and tmax variables
tmin_name (str, optional): The name of the tmin variable
tmax_name (str, optional): The name of the tmax variable
tmean_name (str, optional): The name of the output tmean variable

src.utils.calculate_ffdi(ds, clim_period, wind_from_components, precip_name='precip', rh_name='rh', tmax_name='t_ref_max', wmax_name='V_ref_max', u_name='u_ref', v_name='v_ref')#

Returns the McArthur Forest Fire Danger Index following the formula provided in Dowdy (2018): FFDI = D ** 0.987 * exp (0.0338 * T - 0.0345 * H + 0.0234 * W + 0.243147)

Parameters

ds (xarray Dataset): Dataset containing the following variables:
- precip: Daily total precipitation [mm]. This is used to estimate the drought factor, D, as the 20-day accumulated rainfall scaled to lie between 0 and 10, with larger values indicating less precipitation (see Richardson et al. (2021) and Squire et al. (2021)). The drought factor is used as D in the above equation.
- tmax: Daily max 2 m temperature [deg C]. This is used as T in the above equation.
- rh: Daily max relative humidity at 2 m [%] (or similar, depending on data availability). Richardson et al. (2021) uses mid-afternoon relative humidity at 2 m, Squire et al. (2021) uses daily mean relative humidity at 1000 hPa. This is used as H in the above equation.
- wmax: Daily max 10 m wind speed [km/h] (or similar, depending on data availability). Squire et al. (2021) uses daily mean wind speed. This is used as W in the above equation.
clim_period (iterable): Size 2 iterable containing strings indicating the start and end dates of the climatological period used to calculate the drought factor
wind_from_components (boolean): Whether to calculate the wmax estimate from the provided individual components of wind or to use a provided max estimate. If True, variables with names matching those provided as parameters ‘u_name’ and ‘v_name’ must exist in ds. If False, the variable named by the wmax_name parameter is used for wmax.
precip_name (str, optional): The name of the precip variable
rh_name (str, optional): The name of the rh variable
tmax_name (str, optional): The name of the tmax variable
wmax_name (str, optional): The name of the wmax variable. This is only used if wind_from_components=False. Otherwise an estimate of wmax is calculated from the variables u_name and v_name
u_name (str, optional): The name of the u-component of wind variable to use to estimate wmax when wind_from_components=True. Not used if wind_from_components=False.
v_name (str, optional): The name of the v-component of wind variable to use to estimate wmax when wind_from_components=True. Not used if wind_from_components=False.

References

Dowdy, A. J. (2018). “Climatological Variability of Fire Weather in Australia”. Journal of Applied Meteorology and Climatology 57.2, pp. 221–234. issn: 1558-8424. doi: 10.1175/JAMC-D-17-0167.1.
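
The FFDI formula quoted above can be evaluated directly once D, T, H and W are known. The sketch below covers only that final step for scalar inputs; the function name `ffdi` and the example values are assumptions for illustration, and estimating the drought factor from precipitation is not shown.

```python
import math


def ffdi(drought_factor, tmax, rh, wind):
    """FFDI = D ** 0.987 * exp(0.0338 * T - 0.0345 * H + 0.0234 * W + 0.243147)."""
    return drought_factor**0.987 * math.exp(
        0.0338 * tmax - 0.0345 * rh + 0.0234 * wind + 0.243147
    )


# Hypothetical hot, dry, windy day: D = 10, T = 40 deg C, H = 10 %, W = 40 km/h
value = ffdi(10.0, 40.0, 10.0, 40.0)
```

The index increases with temperature, wind speed and drought factor, and decreases with relative humidity.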

src.utils.calculate_EHF(T, T_p95_file=None, T_p95_period=None, T_p95_dim=None, rolling_dim='time', T_name='t_ref')#

Calculate the Excess Heat Factor (EHF) index, defined as:

EHF = max(0, EHI_sig) * max(1, EHI_accl)

with

EHI_sig = (T_i + T_{i+1} + T_{i+2}) / 3 - T_p95

EHI_accl = (T_i + T_{i+1} + T_{i+2}) / 3 - (T_{i-1} + ... + T_{i-30}) / 30

T is the daily mean temperature, commonly calculated as the mean of the daily minimum and maximum temperatures, where the daily maximum typically precedes the daily minimum and both observations relate to the same 9am-to-9am 24-hour period. T_p95 is the 95th percentile of T using all days in the year.

Parameters

T (xarray DataArray): Array of daily mean temperature
T_p95_file (str, optional): Path to a file with the 95th percentiles of T using all days in the year. This should be relative to the project directory. If not provided, T_p95_period and T_p95_dim must be provided
T_p95_period (list of str, optional): Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate T_p95. Only used if T_p95_file is None
T_p95_dim (str or list of str, optional): The dimension(s) over which to calculate T_p95. Only used if T_p95_file is None
rolling_dim (str, optional): The dimension over which to compute the rolling averages in the definition of EHF
T_name (str, optional): The name of the temperature variable in T

References

Nairn et al. 2015: https://doi.org/10.3390/ijerph120100227
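
A minimal NumPy sketch of the EHF definition for a single daily time series is given below. It assumes T_p95 is already known and leaves days without a full 3-day forward window or 30-day prior window as NaN; the function name `excess_heat_factor` and the synthetic record are hypothetical.

```python
import numpy as np


def excess_heat_factor(T, T_p95):
    """EHF for days i where both the 3-day and the previous 30-day windows exist."""
    T = np.asarray(T, dtype=float)
    ehf = np.full(T.shape, np.nan)
    for i in range(30, len(T) - 2):
        three_day = T[i : i + 3].mean()  # mean of days i, i+1, i+2
        prior_30 = T[i - 30 : i].mean()  # mean of days i-30 ... i-1
        ehi_sig = three_day - T_p95
        ehi_accl = three_day - prior_30
        ehf[i] = max(0.0, ehi_sig) * max(1.0, ehi_accl)
    return ehf


# Synthetic record: 30 days at 20 degrees followed by a 3-day spell at 30 degrees
T = np.concatenate([np.full(30, 20.0), np.full(3, 30.0)])
ehf = excess_heat_factor(T, T_p95=25.0)
```

In this record only day 30 has both windows available: there the 3-day mean exceeds T_p95 by 5 degrees and the prior 30-day mean by 10 degrees.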

src.utils.calculate_EHF_severity(T, T_p95_file=None, EHF_p85_file=None, T_p95_period=None, T_p95_dim=None, EHF_p85_period=None, EHF_p85_dim=None, rolling_dim='time', T_name='t_ref')#

Calculate the severity of the Excess Heat Factor index, defined as:

EHF_severity = EHF / EHF_p85

where “_p85” denotes the 85th percentile of all positive values using all days in the year and the Excess Heat Factor (EHF) is defined as:

EHF = max(0, EHI_sig) * max(1, EHI_accl)

with

EHI_sig = (T_i + T_{i+1} + T_{i+2}) / 3 - T_p95

EHI_accl = (T_i + T_{i+1} + T_{i+2}) / 3 - (T_{i-1} + ... + T_{i-30}) / 30

Parameters

T (xarray DataArray): Array of daily mean temperature
T_p95_file (str, optional): Path to a file with the 95th percentiles of T using all days in the year. This should be relative to the project directory. If not provided, T_p95_period and T_p95_dim must be provided
EHF_p85_file (str, optional): Path to a file with the 85th percentiles of positive EHF using all days in the year. This should be relative to the project directory. If not provided, EHF_p85_period and EHF_p85_dim must be provided
T_p95_period (list of str, optional): Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate T_p95. Only used if T_p95_file is None
T_p95_dim (str or list of str, optional): The dimension(s) over which to calculate T_p95. Only used if T_p95_file is None
EHF_p85_period (list of str, optional): Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate EHF_p85. Only used if EHF_p85_file is None
EHF_p85_dim (str or list of str, optional): The dimension(s) over which to calculate EHF_p85. Only used if EHF_p85_file is None
rolling_dim (str, optional): The dimension over which to compute the rolling averages in the definition of EHF
T_name (str, optional): The name of the temperature variable in T

References

Nairn et al. 2015: https://doi.org/10.3390/ijerph120100227

src.utils.ensemble_mean(ds, ensemble_dim='member')#

Return the ensemble mean of the input array

Parameters

ds (xarray Dataset): Array to take the ensemble mean of
ensemble_dim (str, optional): The name of the ensemble dimension

src.utils.greater_than(ds, value)#

Return a boolean array with True where elements > value

Parameters

ds (xarray Dataset): The array to mask
value (float or xarray Dataset): The value(s) to use to mask ds

src.utils.where_greater_than(ds, value)#

Return array with elements <= value masked to nan

Parameters

ds (xarray Dataset): The array to mask
value (float or xarray Dataset): The value(s) to use to mask ds

src.utils.add_CAFE_grid_info(ds)#

Add CAFE grid info to a CAFE dataset that doesn’t already have it

Parameters

ds (xarray Dataset): The dataset to add grid info to

src.utils.normalise_by_days_in_month(ds)#

Normalise input array by the number of days in each month

Parameters

ds (xarray Dataset): The array to normalise
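
As a small illustration, and assuming that "normalise" here means dividing a monthly value by the number of days in its month, the operation could be sketched with the standard library. The function name `per_day_rate` is hypothetical, and `calendar.monthrange` uses the Gregorian calendar, which may differ from a model's calendar (for example a no-leap calendar).

```python
import calendar


def per_day_rate(monthly_total, year, month):
    """Divide a monthly total by the number of days in that month."""
    days = calendar.monthrange(year, month)[1]
    return monthly_total / days


feb_leap = per_day_rate(58.0, 2020, 2)  # February 2020 has 29 days
jan = per_day_rate(62.0, 2021, 1)  # January has 31 days
```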

src.utils.convert_time_to_lead(ds, time_dim='time', time_freq=None, init_dim='init', lead_dim='lead')#

Return provided array with time dimension converted to lead time dimension and time added as additional coordinate

Parameters

ds (xarray Dataset): A dataset with a time dimension
time_dim (str, optional): The name of the time dimension
time_freq (str, optional): The frequency of the time dimension. If not provided, will try to use xr.infer_freq to determine the frequency. This is only used to add a freq attr to the lead time coordinate
init_dim (str, optional): The name of the initial date dimension in the output
lead_dim (str, optional): The name of the lead time dimension in the output

src.utils.truncate_latitudes(ds, dp=10, lat_dim='lat')#

Return provided array with latitudes truncated to specified dp.

This is necessary due to precision differences from running forecasts on different systems

Parameters

ds (xarray Dataset): A dataset with a latitude dimension
dp (int, optional): The number of decimal places to truncate at
lat_dim (str, optional): The name of the latitude dimension

src.utils.convert_calendar(ds, calendar, time_dim='time')#

Convert calendar, dropping invalid/surplus dates or inserting missing dates

Parameters

ds (xarray Dataset): A dataset with a time dimension
time_dim (str, optional): The name of the time dimension

src.utils.rechunk(ds, **chunks)#

Rechunk a dataset

Parameters

ds (xarray Dataset): A dataset to be rechunked
chunks (dict): Dictionary of {dim: chunksize}

src.utils.select(ds, **selection)#

Returns a new dataset with each array indexed by tick labels along the specified dimension(s)

Parameters

ds (xarray Dataset): A dataset to select from
selection (dict): A dict with keys matching dimensions and values given by scalars, slices or arrays of tick labels

src.utils.add_attrs(ds, attrs, variable=None)#

Add attributes to a dataset

Parameters

ds (xarray Dataset): The data to add attributes to
attrs (dict): The attributes to add
variable (str, optional): The name of the variable or coordinate to add the attributes to. If None, the attributes will be added as global attributes

src.utils.rename(ds, **names)#

Rename all variables etc that have an entry in names

Parameters

ds (xarray Dataset): A dataset to be renamed
names (dict): Dictionary of {old_name: new_name}

src.utils.convert(ds, **conversion)#

Convert variables in a dataset according to provided dictionary

Parameters

ds (xarray Dataset): A dataset to be converted
conversion (dict): Dictionary of {variable: oper} where oper is a dictionary specifying the operation and the value. Current possible operations are ‘multiply_by’ and ‘add’.
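
The shape of the conversion dictionary can be illustrated with a plain-Python sketch that works on a dict of numbers instead of an xarray Dataset. The function name `apply_conversion`, the variable names and the unit conversions are assumptions chosen for this example.

```python
def apply_conversion(data, **conversion):
    """Apply {'multiply_by': x} and/or {'add': y} to the named variables."""
    converted = dict(data)  # shallow copy so the input is left untouched
    for variable, oper in conversion.items():
        for name, value in oper.items():
            if name == "multiply_by":
                converted[variable] = converted[variable] * value
            elif name == "add":
                converted[variable] = converted[variable] + value
            else:
                raise ValueError(f"Unsupported operation: {name}")
    return converted


# Hypothetical example: precipitation from m/day to mm/day, temperature from K to deg C
out = apply_conversion(
    {"precip": 0.002, "t_ref": 300.0},
    precip={"multiply_by": 1000.0},
    t_ref={"add": -273.15},
)
```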

src.utils.keep_period(ds, period)#

Keep only times within a specified period

Parameters

ds (xarray Dataset): The data to mask
period (iterable): Size 2 iterable containing strings indicating the start and end dates of the period to retain

src.utils._get_groupby_and_reduce_dims(ds, frequency)#

Get the groupby and reduction dimensions for performing operations like calculating anomalies and percentile thresholds

src.utils.anomalise(ds, clim_period, frequency=None)#

Returns the anomalies of ds relative to its climatology over clim_period.

Uses a shortcut for calculating hindcast climatologies that will not work for hindcasts that are initialised more frequently than monthly.

Parameters

ds (xarray Dataset): The data to anomalise
clim_period (iterable): Size 2 iterable containing strings indicating the start and end dates of the climatological period
frequency (str, optional): The frequency at which to bin the climatology, e.g. per month. Must be an available attribute of the datetime accessor. Specify “None” to indicate no frequency (climatology calculated by averaging all times). Note, setting to “None” for hindcast data can be dangerous, since only certain times may be available at each lead.
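
For a simple time series binned by month, the anomaly calculation can be sketched with NumPy as below. This is an illustration only: the function name `monthly_anomalies` and its boolean `in_clim_period` argument are hypothetical, and the hindcast shortcut mentioned above is not reproduced.

```python
import numpy as np


def monthly_anomalies(values, months, in_clim_period):
    """Subtract each calendar month's mean over the climatological period."""
    values = np.asarray(values, dtype=float)
    anomalies = np.empty_like(values)
    for month in np.unique(months):
        is_month = months == month
        # Climatology uses only times that fall inside the climatological period
        clim = values[is_month & in_clim_period].mean()
        anomalies[is_month] = values[is_month] - clim
    return anomalies


# Three years of synthetic January and July values
months = np.array([1, 7, 1, 7, 1, 7])
values = np.array([10.0, 20.0, 12.0, 22.0, 17.0, 27.0])
in_clim_period = np.array([True, True, True, True, False, False])
anoms = monthly_anomalies(values, months, in_clim_period)
```

Times outside the climatological period are still anomalised, but they do not contribute to the climatology itself.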

src.utils.calculate_percentile_thresholds(ds, percentile, percentile_period, percentile_dim=None, frequency=None)#

Returns the percentile values of ds over a provided period.

Parameters

ds (xarray Dataset): The data to calculate the percentiles of
percentile (float): The percentile to calculate
percentile_period (iterable): Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate the percentile thresholds
percentile_dim (str or list of str, optional): The dimension(s) over which to compute the percentile thresholds. If None, these will be determined automatically based on the type of input data:
- timeseries: percentile_dim = “time”
- forecasts: percentile_dim = “init” [, “member”]
frequency (str, optional): The frequency at which to bin the percentiles, e.g. per month. Must be an available attribute of the datetime accessor. Specify “None” to indicate no frequency (percentiles calculated over all times). Note, setting to “None” for hindcast data can be dangerous, since only certain times may be available at each lead.
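
The idea of pooling dimensions to compute a threshold, and then flagging values against it, can be sketched with NumPy. The array layout (init, member), the random data and the choice of reference period are assumptions for this example; no frequency binning is applied.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(40, 5))  # hypothetical (init, member) sample
in_period = np.arange(40) < 30  # first 30 inits form the reference period

# Pool the "init" and "member" axes of the reference period into one sample
threshold = np.percentile(data[in_period], 90)

# Boolean array with True where values exceed the threshold
over = data > threshold
```

The threshold is computed from the reference period only, but it is applied to every value in the array.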

src.utils.over_percentile_threshold(ds, percentile, percentile_period, percentile_dim=None, frequency=None)#

Find which values in the input array are over a specified percentile calculated over a specified period. Returns a boolean array with True where values are over the specified percentile and False elsewhere.

Parameters

ds (xarray Dataset): The data to threshold based on its percentiles
percentile (float): The percentile used to threshold the data
percentile_period (iterable): Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate the percentile thresholds
frequency (str, optional): The frequency at which to bin the percentiles, e.g. per month. Must be an available attribute of the datetime accessor. Specify “None” to indicate no frequency (percentiles calculated over all times). Note, setting to “None” for hindcast data can be dangerous, since only certain times may be available at each lead.

src.utils.under_percentile_threshold(ds, percentile, percentile_period, percentile_dim=None, frequency=None)#

Find which values in the input array are under a specified percentile calculated over a specified period. Returns a boolean array with True where values are under the specified percentile and False elsewhere.

Parameters

ds (xarray Dataset): The data to threshold based on its percentiles
percentile (float): The percentile used to threshold the data
percentile_period (iterable): Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate the percentile thresholds
frequency (str, optional): The frequency at which to bin the percentiles, e.g. per month. Must be an available attribute of the datetime accessor. Specify “None” to indicate no frequency (percentiles calculated over all times). Note, setting to “None” for hindcast data can be dangerous, since only certain times may be available at each lead.

src.utils.correct_bias(ds, obsv_file, period, frequency, method)#

Correct the mean bias of ds relative to observations over a provided period

Will not work for hindcasts that are initialised more frequently than monthly.

Parameters

ds (xarray Dataset): The hindcast data to correct
obsv_file (str): Path to a file with the appropriate observation data to correct to. This should be relative to the project directory
period (iterable): Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate the biases
frequency (str): The frequency at which to bin the biases, e.g. per month. Must be an available attribute of the datetime accessor. Specify “None” to indicate no frequency (climatology calculated by averaging all times). Note, setting to “None” can be dangerous, since only certain times may be available at each lead and there is no check that the same times are available between the observations and forecasts.
method (str): The method to use to correct the mean bias. Options are:
- “additive”: the difference between the ds and obsv climatologies is subtracted from ds
- “multiplicative”: ds is divided by the ratio of the ds and obsv climatologies
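
The two correction methods can be compared with a short NumPy sketch. It assumes the forecast and observed climatologies have already been computed over the chosen period; the function name `correct_mean_bias` and the numbers are hypothetical.

```python
import numpy as np


def correct_mean_bias(fcst, fcst_clim, obsv_clim, method):
    """Remove the mean bias of fcst given forecast and observed climatologies."""
    if method == "additive":
        # Subtract the difference between the two climatologies
        return fcst - (fcst_clim - obsv_clim)
    if method == "multiplicative":
        # Divide by the ratio of the two climatologies
        return fcst / (fcst_clim / obsv_clim)
    raise ValueError(f"Unrecognised method: {method}")


fcst = np.array([12.0, 15.0, 18.0])
additive = correct_mean_bias(fcst, fcst_clim=15.0, obsv_clim=10.0, method="additive")
multiplicative = correct_mean_bias(
    fcst, fcst_clim=15.0, obsv_clim=10.0, method="multiplicative"
)
```

Both methods bring the forecast mean to the observed climatology, but the multiplicative form also rescales the spread, which tends to suit strictly positive quantities such as precipitation.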

src.utils.interpolate_to_grid_from_file(ds, file, add_area=True, ignore_degenerate=True)#

src.utils.round_to_start_of_day(ds, dim)#

Return provided array with specified time dimension rounded to the start of the day

Parameters

ds (xarray Dataset): The dataset with the dimension(s) to round
dim (str): The name of the dimension(s) to round

src.utils.round_to_start_of_month(ds, dim)#

Return provided array with specified time dimension rounded to the start of the month

Parameters

ds (xarray Dataset): The dataset with the dimension(s) to round
dim (str): The name of the dimension(s) to round

src.utils.coarsen(ds, window_size, start_points=None, dim='time')#

Coarsen data, applying ‘max’ to all relevant coords and optionally starting at a particular time point in the array

Parameters

ds (xarray Dataset): The dataset to coarsen
start_points (list, optional): Value(s) of coordinate dim to start the coarsening from. If these fall outside the range of the coordinate, coarsening starts at the beginning of the array
dim (str, optional): The name of the dimension to coarsen along

src.utils.rolling_mean(ds, window_size, start_points=None, dim='time')#

Apply a rolling mean to the data, applying ‘max’ to all relevant coords and optionally starting at a particular time point in the array

Parameters

ds (xarray Dataset): The dataset to apply the rolling mean to
start_points (str or list of str, optional): Value(s) of coordinate dim to start the rolling mean from. If these fall outside the range of the coordinate, the rolling mean starts at the beginning of the array
dim (str, optional): The name of the dimension to apply the rolling mean along

src.utils.resample(ds, freq, start_points=None, min_samples=None, dim='time')#

Resample data to a different temporal frequency by taking the mean over all values at the downsampled frequency and optionally starting at a particular time point in the array

Parameters

ds (xarray Dataset): The dataset to resample
freq (str): Resample frequency expressed using a pandas offset alias
start_points (str or list of str, optional): Value(s) of coordinate dim to start the resampling from. If these fall outside the range of the coordinate, resampling starts at the beginning of the array
min_samples (int, optional): The minimum number of samples that must occur within a resampled group. If there are fewer samples, NaN is assigned.
dim (str, optional): The name of the time dimension to resample along

src.utils.get_region_masks_from_shp(ds, shapefile, header)#

Extract region masks according to a shapefile

Parameters

ds (xarray Dataset): The array with the grid to build the masks for
shapefile (str): The path to the shapefile to use
header (str): Name of the shapefile column to use to name the regions

src.utils.average_over_NRM_super_clusters(ds)#

Average the provided array over the NRM super cluster regions

Parameters

ds (xarray Dataset): The array to average over the NRM super cluster regions

src.utils.mask_CAFEf6_reduced_dt(ds)#

Mask out the ensemble members of CAFE-f6 that were run with a reduced timestep since reducing the timestep was found to produce a different model equilibrium

Parameters

ds (xarray Dataset): The CAFE-f6 data to mask

src.utils.gridarea_cdo(ds)#

Returns the area weights computed using cdo’s gridarea function. Note that this function writes ds to disk, so strip ds back to only what is needed.

Parameters

ds (xarray Dataset): The dataset to pass to cdo gridarea

src.utils.add_area_using_cdo_gridarea(ds, lon_dim='lon', lat_dim='lat')#

Add an area coordinate to the provided dataset containing the cell areas estimated by cdo’s gridarea function

Parameters

ds (xarray Dataset): The data to use to estimate the cell areas
lon_dim (str, optional): The name of the longitude dimension on ds
lat_dim (str, optional): The name of the latitude dimension on ds

src.utils.max_chunk_size_MB(ds)#

Get the max chunk size in a dataset
