:py:mod:`src.utils` =================== .. py:module:: src.utils Module Contents --------------- Functions ~~~~~~~~~ .. autoapisummary:: src.utils.load_config src.utils.composite_function src.utils.extract_lon_lat_box src.utils.calculate_nino34 src.utils.calculate_dmi src.utils.calculate_sam src.utils.calculate_nao src.utils.calculate_amv src.utils.calculate_ipo src.utils.calculate_ohc300 src.utils.calculate_wind_speed src.utils.calculate_tmean_from_tmin_tmax src.utils.calculate_ffdi src.utils.calculate_EHF src.utils.calculate_EHF_severity src.utils.ensemble_mean src.utils.greater_than src.utils.where_greater_than src.utils.add_CAFE_grid_info src.utils.normalise_by_days_in_month src.utils.convert_time_to_lead src.utils.truncate_latitudes src.utils.convert_calendar src.utils.rechunk src.utils.select src.utils.add_attrs src.utils.rename src.utils.convert src.utils.keep_period src.utils._get_groupby_and_reduce_dims src.utils.anomalise src.utils.calculate_percentile_thresholds src.utils.over_percentile_threshold src.utils.under_percentile_threshold src.utils.correct_bias src.utils.interpolate_to_grid_from_file src.utils.round_to_start_of_day src.utils.round_to_start_of_month src.utils.coarsen src.utils.rolling_mean src.utils.resample src.utils.get_region_masks_from_shp src.utils.average_over_NRM_super_clusters src.utils.mask_CAFEf6_reduced_dt src.utils.gridarea_cdo src.utils.add_area_using_cdo_gridarea src.utils.max_chunk_size_MB Attributes ~~~~~~~~~~ .. autoapisummary:: src.utils.PROJECT_DIR .. py:data:: PROJECT_DIR .. py:function:: load_config(name) Load a config .yml file for a specified dataset :Parameters: **name** : str The path to the config file to load .. !! processed by numpydoc !! .. py:function:: composite_function(function_dict) Return a composite function of all functions and kwargs specified in a provided dictionary :Parameters: **function_dict** : dict Dictionary with functions in this module to composite as keys and kwargs as values .. !! 
processed by numpydoc !! .. py:function:: extract_lon_lat_box(ds, box, weighted_average, lon_dim='lon', lat_dim='lat') Return a region specified by a range of longitudes and latitudes. :Parameters: **ds** : xarray Dataset or DataArray The data to subset and average. Assumed to include an "area" Variable **box** : iterable Iterable with the following elements in this order: [lon_lower, lon_upper, lat_lower, lat_upper] where longitudes are specified between 0 and 360 deg E and latitudes are specified between -90 and 90 deg N **weighted_average** : boolean If True, return the area-weighted average over the region, otherwise return the region **lon_dim** : str, optional The name of the longitude dimension **lat_dim** : str, optional The name of the latitude dimension .. !! processed by numpydoc !! .. py:function:: calculate_nino34(sst_anom, sst_name='sst') Calculate the NINO3.4 index. The NINO3.4 index is calculated as the spatial average of SST anomalies over the tropical Pacific region (5°S–5°N and 170°W–120°W). :Parameters: **sst_anom** : xarray Dataset Array of sst anomalies **sst_name** : str, optional The name of the sst variable in sst_anom .. !! processed by numpydoc !! .. py:function:: calculate_dmi(sst_anom, sst_name='sst') Calculate the Dipole Mode Index (DMI) for the Indian Ocean Dipole. The DMI is calculated as the difference between the spatial averages of SST anomalies over two regions of the tropical Indian Ocean: (10°S–10°N and 50°E–70°E) and (10°S–0° and 90°E–110°E). :Parameters: **sst_anom** : xarray Dataset Array of sst anomalies **sst_name** : str, optional The name of the sst variable in sst_anom .. !! processed by numpydoc !! .. py:function:: calculate_sam(slp, clim_period, groupby_dim='time', slp_name='slp', lon_dim='lon', lat_dim='lat') Calculate the Southern Annular Mode index from monthly data as defined by Gong, D. and Wang, S., 1999. 
The SAM index is defined as the difference between the normalized monthly zonal mean sea level pressure at 40°S and 65°S. :Parameters: **slp** : xarray Dataset Array of sea level pressures **clim_period** : iterable Size 2 iterable containing strings indicating the start and end dates of the climatological period used to normalise the SAM index **groupby_dim** : str, optional The dimension to compute the normalisation over **slp_name** : str, optional The name of the slp variable in the input slp Dataset **lon_dim** : str, optional The name of the longitude dimension **lat_dim** : str, optional The name of the latitude dimension .. !! processed by numpydoc !! .. py:function:: calculate_nao(slp, clim_period, groupby_dim='time', slp_name='slp', lon_dim='lon', lat_dim='lat') Calculate the North Atlantic Oscillation index from monthly data as defined by Jianping, L. & Wang, J. X. L. (2003). The NAO index is defined as the difference between the normalized monthly mean sea level pressure at 35°N and 65°N, averaged over the zonal band spanning 80°W–30°E. :Parameters: **slp** : xarray Dataset Array of sea level pressures **clim_period** : iterable Size 2 iterable containing strings indicating the start and end dates of the climatological period used to normalise the NAO index **groupby_dim** : str, optional The dimension to compute the normalisation over **slp_name** : str, optional The name of the slp variable in the input slp Dataset **lon_dim** : str, optional The name of the longitude dimension **lat_dim** : str, optional The name of the latitude dimension .. !! processed by numpydoc !! .. py:function:: calculate_amv(sst_anom, sst_name='sst') Calculate the Atlantic Multi-decadal Variability (AMV), also known as the Atlantic Multi-decadal Oscillation (AMO), according to Trenberth and Shea (2006). The AMV is calculated as the spatial average of SST anomalies over the North Atlantic (Equator–60°N and 80°W–0°) minus the spatial average of SST anomalies averaged from 60°S to 60°N. 
Note that the SST anomalies are typically smoothed in time using a 10-year moving average (Goldenberg et al., 2001; Enfield et al., 2001), a low-pass filter (Trenberth and Shea, 2006) or a 4-year temporal average (Bilbao et al., 2021). :Parameters: **sst_anom** : xarray Dataset Array of sst anomalies **sst_name** : str, optional The name of the sst variable in sst_anom .. !! processed by numpydoc !! .. py:function:: calculate_ipo(sst_anom, sst_name='sst') Calculate the tripolar Pacific index for the Interdecadal Pacific Oscillation (IPO) following Henley et al. (2015). The IPO is calculated as the average of SST anomalies over the central equatorial Pacific (region 2: 10°S–10°N, 170°E–90°W) minus the average of the SST anomalies in the northwestern (region 1: 25°N–45°N, 140°E–145°W) and southwestern Pacific (region 3: 50°S–15°S, 150°E–160°W). Note that the IPO index is typically smoothed in time using a 13-year Chebyshev low-pass filter (Henley et al., 2015) or by first applying a 4-year temporal average to the sst anomalies (Bilbao et al., 2021). .. !! processed by numpydoc !! .. py:function:: calculate_ohc300(temp, depth_dim='depth', temp_name='temp') Calculate the ocean heat content above 300 m. The input DataArray or Dataset is assumed to be in Kelvin. :Parameters: **temp** : xarray Dataset Array of temperature values in Kelvin **depth_dim** : str, optional The name of the depth dimension **temp_name** : str, optional The name of the temperature variable in temp .. !! processed by numpydoc !! .. 
py:function:: calculate_wind_speed(u_v, u_name, v_name, lon_dim='lon', lat_dim='lat') Calculate the wind speed :Parameters: **u_v** : xarray Dataset Dataset containing the zonal (u) and meridional (v) components of the wind **u_name** : str The name of the u-velocity variable in u_v **v_name** : str The name of the v-velocity variable in u_v **lon_dim** : str, optional The name of the longitude dimension for u and v **lat_dim** : str, optional The name of the latitude dimension for u and v .. !! processed by numpydoc !! .. py:function:: calculate_tmean_from_tmin_tmax(ds, tmin_name='tmin', tmax_name='tmax', tmean_name='tmean') Estimate tmean as the average of tmin and tmax :Parameters: **ds** : xarray Dataset Dataset containing tmin and tmax variables **tmin_name** : str, optional The name of the tmin variable **tmax_name** : str, optional The name of the tmax variable **tmean_name** : str, optional The name of the output tmean variable .. !! processed by numpydoc !! .. py:function:: calculate_ffdi(ds, clim_period, wind_from_components, precip_name='precip', rh_name='rh', tmax_name='t_ref_max', wmax_name='V_ref_max', u_name='u_ref', v_name='v_ref') Returns the McArthur Forest Fire Danger Index following the formula provided in Dowdy (2018): FFDI = D ** 0.987 * exp (0.0338 * T - 0.0345 * H + 0.0234 * W + 0.243147) :Parameters: **ds** : xarray Dataset Dataset containing the following variables - precip; Daily total precipitation [mm]. This is used to estimate the drought factor, D, as the 20-day accumulated rainfall scaled to lie between 0 and 10, with larger values indicating less precipitation (see Richardson et al. (2021) and Squire et al. (2021)). The drought factor is used as D in the above equation. - tmax; Daily max 2 m temperature [deg C]. This is used as T in the above equation. - rh; Daily max relative humidity at 2 m [%] (or similar, depending on data availability). Richardson et al. (2021) use mid-afternoon relative humidity at 2 m, Squire et al. 
(2021) use daily mean relative humidity at 1000 hPa. This is used as H in the above equation. - wmax; Daily max 10 m wind speed [km/h] (or similar, depending on data availability). Squire et al. (2021) use daily mean wind speed. This is used as W in the above equation. **clim_period** : iterable Size 2 iterable containing strings indicating the start and end dates of the climatological period used to calculate the drought factor **wind_from_components** : boolean Whether to calculate the wmax estimate from provided individual components of wind or whether to use a provided max estimate. If True, variables with names matching those provided as parameters 'u_name' and 'v_name' must exist in ds. If False, the variable named by the `wmax_name` parameter is used for wmax. **precip_name** : str, optional The name of the precip variable **rh_name** : str, optional The name of the rh variable **tmax_name** : str, optional The name of the tmax variable **wmax_name** : str, optional The name of the wmax variable. This is only used if wind_from_components=False. Otherwise an estimate of wmax is calculated from the variables u_name and v_name **u_name** : str, optional The name of the u-component of wind variable to use to estimate wmax when wind_from_components=True. Not used if wind_from_components=False. **v_name** : str, optional The name of the v-component of wind variable to use to estimate wmax when wind_from_components=True. Not used if wind_from_components=False. .. rubric:: References Dowdy, A. J. (2018). “Climatological Variability of Fire Weather in Australia”. Journal of Applied Meteorology and Climatology 57.2, pp. 221–234. issn: 1558-8424. doi: 10.1175/JAMC-D-17-0167.1. .. only:: latex .. !! processed by numpydoc !! .. 
py:function:: calculate_EHF(T, T_p95_file=None, T_p95_period=None, T_p95_dim=None, rolling_dim='time', T_name='t_ref') Calculate the Excess Heat Factor (EHF) index, defined as: EHF = max(0, EHI_sig) * max(1, EHI_accl) with EHI_sig = (T_i + T_i+1 + T_i+2) / 3 - T_p95 and EHI_accl = (T_i + T_i+1 + T_i+2) / 3 - (T_i-1 + ... + T_i-30) / 30. T is the daily mean temperature (commonly calculated as the mean of the min and max daily temperatures, with the daily maximum typically preceding the daily minimum and both observations relating to the same 9am-to-9am 24-h period) and T_p95 is the 95th percentile of T using all days in the year. :Parameters: **T** : xarray DataArray Array of daily mean temperature **T_p95_file** : str, optional Path to a file with the 95th percentiles of T using all days in the year. This should be relative to the project directory. If not provided, T_p95_period and T_p95_dim must be provided **T_p95_period** : list of str, optional Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate T_p95. Only used if T_p95_file is None **T_p95_dim** : str or list of str, optional The dimension(s) over which to calculate T_p95. Only used if T_p95_file is None **rolling_dim** : str, optional The dimension over which to compute the rolling averages in the definition of EHF **T_name** : str, optional The name of the temperature variable in T .. rubric:: References Nairn et al. 2015: https://doi.org/10.3390/ijerph120100227 .. !! processed by numpydoc !! .. 
py:function:: calculate_EHF_severity(T, T_p95_file=None, EHF_p85_file=None, T_p95_period=None, T_p95_dim=None, EHF_p85_period=None, EHF_p85_dim=None, rolling_dim='time', T_name='t_ref') Calculate the severity of the Excess Heat Factor index, defined as: EHF_severity = EHF / EHF_p85 where "_p85" denotes the 85th percentile of all positive values using all days in the year and the Excess Heat Factor (EHF) is defined as: EHF = max(0, EHI_sig) * max(1, EHI_accl) with EHI_sig = (T_i + T_i+1 + T_i+2) / 3 - T_p95 and EHI_accl = (T_i + T_i+1 + T_i+2) / 3 - (T_i-1 + ... + T_i-30) / 30. T is the daily mean temperature (commonly calculated as the mean of the min and max daily temperatures, with the daily maximum typically preceding the daily minimum and both observations relating to the same 9am-to-9am 24-h period) and T_p95 is the 95th percentile of T using all days in the year. :Parameters: **T** : xarray DataArray Array of daily mean temperature **T_p95_file** : str, optional Path to a file with the 95th percentiles of T using all days in the year. This should be relative to the project directory. If not provided, T_p95_period and T_p95_dim must be provided **EHF_p85_file** : str, optional Path to a file with the 85th percentiles of positive EHF using all days in the year. This should be relative to the project directory. If not provided, EHF_p85_period and EHF_p85_dim must be provided **T_p95_period** : list of str, optional Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate T_p95. Only used if T_p95_file is None **T_p95_dim** : str or list of str, optional The dimension(s) over which to calculate T_p95. Only used if T_p95_file is None **EHF_p85_period** : list of str, optional Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate EHF_p85. 
Only used if EHF_p85_file is None **EHF_p85_dim** : str or list of str, optional The dimension(s) over which to calculate EHF_p85. Only used if EHF_p85_file is None **rolling_dim** : str, optional The dimension over which to compute the rolling averages in the definition of EHF **T_name** : str, optional The name of the temperature variable in T .. rubric:: References Nairn et al. 2015: https://doi.org/10.3390/ijerph120100227 .. only:: latex .. !! processed by numpydoc !! .. py:function:: ensemble_mean(ds, ensemble_dim='member') Return the ensemble mean of the input array :Parameters: **ds** : xarray Dataset Array to take the ensemble mean of **ensemble_dim** : str, optional The name of the ensemble dimension .. !! processed by numpydoc !! .. py:function:: greater_than(ds, value) Return a boolean array with True where elements > value :Parameters: **ds** : xarray Dataset The array to mask **value** : float or xarray Dataset The value(s) to use to mask ds .. !! processed by numpydoc !! .. py:function:: where_greater_than(ds, value) Return array with elements <= value masked to nan :Parameters: **ds** : xarray Dataset The array to mask **value** : float or xarray Dataset The value(s) to use to mask ds .. !! processed by numpydoc !! .. py:function:: add_CAFE_grid_info(ds) Add CAFE grid info to a CAFE dataset that doesn't already have it :Parameters: **ds** : xarray Dataset The dataset to add grid info to .. !! processed by numpydoc !! .. py:function:: normalise_by_days_in_month(ds) Normalise input array by the number of days in each month :Parameters: **ds** : xarray Dataset The array to normalise .. !! processed by numpydoc !! .. 
py:function:: convert_time_to_lead(ds, time_dim='time', time_freq=None, init_dim='init', lead_dim='lead') Return provided array with time dimension converted to lead time dimension and time added as additional coordinate :Parameters: **ds** : xarray Dataset A dataset with a time dimension **time_dim** : str, optional The name of the time dimension **time_freq** : str, optional The frequency of the time dimension. If not provided, will try to use xr.infer_freq to determine the frequency. This is only used to add a freq attr to the lead time coordinate **init_dim** : str, optional The name of the initial date dimension in the output **lead_dim** : str, optional The name of the lead time dimension in the output .. !! processed by numpydoc !! .. py:function:: truncate_latitudes(ds, dp=10, lat_dim='lat') Return provided array with latitudes truncated to specified dp. This is necessary due to precision differences from running forecasts on different systems :Parameters: **ds** : xarray Dataset A dataset with a latitude dimension **dp** : int, optional The number of decimal places to truncate at **lat_dim** : str, optional The name of the latitude dimension .. !! processed by numpydoc !! .. py:function:: convert_calendar(ds, calendar, time_dim='time') Convert calendar, dropping invalid/surplus dates or inserting missing dates :Parameters: **ds** : xarray Dataset A dataset with a time dimension **time_dim** : str, optional The name of the time dimension .. !! processed by numpydoc !! .. py:function:: rechunk(ds, **chunks) Rechunk a dataset :Parameters: **ds** : xarray Dataset A dataset to be rechunked **chunks** : dict Dictionary of {dim: chunksize} .. !! processed by numpydoc !! .. 
py:function:: select(ds, **selection) Returns a new dataset with each array indexed by tick labels along the specified dimension(s) :Parameters: **ds** : xarray Dataset A dataset to select from **selection** : dict A dict with keys matching dimensions and values given by scalars, slices or arrays of tick labels .. !! processed by numpydoc !! .. py:function:: add_attrs(ds, attrs, variable=None) Add attributes to a dataset :Parameters: **ds** : xarray Dataset The data to add attributes to **attrs** : dict The attributes to add **variable** : str, optional The name of the variable or coordinate to add the attributes to. If None, the attributes will be added as global attributes .. !! processed by numpydoc !! .. py:function:: rename(ds, **names) Rename all variables, coordinates and dimensions that have an entry in names :Parameters: **ds** : xarray Dataset A dataset to be renamed **names** : dict Dictionary of {old_name: new_name} .. !! processed by numpydoc !! .. py:function:: convert(ds, **conversion) Convert variables in a dataset according to provided dictionary :Parameters: **ds** : xarray Dataset A dataset to be converted **conversion** : dict Dictionary of {variable: oper} where oper is a dictionary specifying the operation and the value. Current possible operations are 'multiply_by' and 'add'. .. !! processed by numpydoc !! .. py:function:: keep_period(ds, period) Keep only times within a specified period :Parameters: **ds** : xarray Dataset The data to mask **period** : iterable Size 2 iterable containing strings indicating the start and end dates of the period to retain .. !! processed by numpydoc !! .. py:function:: _get_groupby_and_reduce_dims(ds, frequency) Get the groupby and reduction dimensions for performing operations like calculating anomalies and percentile thresholds .. !! processed by numpydoc !! .. py:function:: anomalise(ds, clim_period, frequency=None) Returns the anomalies of ds relative to its climatology over clim_period. 
Uses a shortcut for calculating hindcast climatologies that will not work for hindcasts with initialisation frequencies more frequent than monthly. :Parameters: **ds** : xarray Dataset The data to anomalise **clim_period** : iterable Size 2 iterable containing strings indicating the start and end dates of the climatological period **frequency** : str, optional The frequency at which to bin the climatology, e.g. per month. Must be an available attribute of the datetime accessor. Specify "None" to indicate no frequency (climatology calculated by averaging all times). Note, setting to "None" for hindcast data can be dangerous, since only certain times may be available at each lead. .. !! processed by numpydoc !! .. py:function:: calculate_percentile_thresholds(ds, percentile, percentile_period, percentile_dim=None, frequency=None) Returns the percentile values of ds over a provided period. :Parameters: **ds** : xarray Dataset The data to calculate the percentiles of **percentile** : float The percentile to calculate **percentile_period** : iterable Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate the percentile thresholds **percentile_dim** : str or list of str, optional The dimension(s) over which to compute the percentile thresholds. If None, these will be determined automatically based on the type of input data: - timeseries : percentile_dim = "time" - forecasts : percentile_dim = "init" [, "member"] **frequency** : str, optional The frequency at which to bin the percentiles, e.g. per month. Must be an available attribute of the datetime accessor. Specify "None" to indicate no frequency (percentiles calculated over all times). Note, setting to "None" for hindcast data can be dangerous, since only certain times may be available at each lead. .. !! processed by numpydoc !! .. 
py:function:: over_percentile_threshold(ds, percentile, percentile_period, percentile_dim=None, frequency=None) Find which values in the input array are over a specified percentile calculated over a specified period. Returns a boolean array with True where values are over the specified percentile and False elsewhere. :Parameters: **ds** : xarray Dataset The data to threshold based on its percentiles **percentile** : float The percentile used to threshold the data **percentile_period** : iterable Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate the percentile thresholds **frequency** : str, optional The frequency at which to bin the percentiles, e.g. per month. Must be an available attribute of the datetime accessor. Specify "None" to indicate no frequency (percentiles calculated over all times). Note, setting to "None" for hindcast data can be dangerous, since only certain times may be available at each lead. .. !! processed by numpydoc !! .. py:function:: under_percentile_threshold(ds, percentile, percentile_period, percentile_dim=None, frequency=None) Find which values in the input array are under a specified percentile calculated over a specified period. Returns a boolean array with True where values are under the specified percentile and False elsewhere. :Parameters: **ds** : xarray Dataset The data to threshold based on its percentiles **percentile** : float The percentile used to threshold the data **percentile_period** : iterable Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate the percentile thresholds **frequency** : str, optional The frequency at which to bin the percentiles, e.g. per month. Must be an available attribute of the datetime accessor. Specify "None" to indicate no frequency (percentiles calculated over all times). 
Note, setting to "None" for hindcast data can be dangerous, since only certain times may be available at each lead. .. !! processed by numpydoc !! .. py:function:: correct_bias(ds, obsv_file, period, frequency, method) Correct the mean bias of ds relative to observations over a provided period. Will not work for hindcasts with initialisation frequencies more frequent than monthly. :Parameters: **ds** : xarray Dataset The hindcast data to correct **obsv_file** : str Path to a file with the appropriate observation data to correct to. This should be relative to the project directory **period** : iterable Size 2 iterable containing strings indicating the start and end dates of the period over which to calculate the biases **frequency** : str The frequency at which to bin the biases, e.g. per month. Must be an available attribute of the datetime accessor. Specify "None" to indicate no frequency (climatology calculated by averaging all times). Note, setting to "None" can be dangerous, since only certain times may be available at each lead and there is no check that the same times are available between the observations and forecasts. **method** : str The method to use to correct the mean bias. Options are: - "additive": the difference between the ds and obsv climatology is subtracted from ds - "multiplicative": ds is divided by the ratio of the ds and obsv climatologies .. !! processed by numpydoc !! .. py:function:: interpolate_to_grid_from_file(ds, file, add_area=True, ignore_degenerate=True) .. py:function:: round_to_start_of_day(ds, dim) Return provided array with specified time dimension rounded to the start of the day :Parameters: **ds** : xarray Dataset The dataset with the dimension to round **dim** : str The name of the dimension to round .. !! processed by numpydoc !! .. 
py:function:: round_to_start_of_month(ds, dim) Return provided array with specified time dimension rounded to the start of the month :Parameters: **ds** : xarray Dataset The dataset with the dimension to round **dim** : str The name of the dimension to round .. !! processed by numpydoc !! .. py:function:: coarsen(ds, window_size, start_points=None, dim='time') Coarsen data, applying 'max' to all relevant coords and optionally starting at a particular time point in the array :Parameters: **ds** : xarray Dataset The dataset to coarsen **start_points** : list, optional Value(s) of coordinate `dim` to start the coarsening from. If these fall outside the range of the coordinate, coarsening starts at the beginning of the array **dim** : str, optional The name of the dimension to coarsen along .. !! processed by numpydoc !! .. py:function:: rolling_mean(ds, window_size, start_points=None, dim='time') Apply a rolling mean to the data, applying 'max' to all relevant coords and optionally starting at a particular time point in the array :Parameters: **ds** : xarray Dataset The dataset to apply the rolling mean to **start_points** : str or list of str, optional Value(s) of coordinate `dim` to start the rolling mean from. If these fall outside the range of the coordinate, the rolling mean starts at the beginning of the array **dim** : str, optional The name of the dimension to apply the rolling mean along .. !! processed by numpydoc !! .. py:function:: resample(ds, freq, start_points=None, min_samples=None, dim='time') Resample data to a different temporal frequency by taking the mean over all values at the downsampled frequency and optionally starting at a particular time point in the array :Parameters: **ds** : xarray Dataset The dataset to resample **freq** : str Resample frequency expressed using pandas offset alias **start_points** : str or list of str, optional Value(s) of coordinate `dim` to start the resampling from. 
If these fall outside the range of the coordinate, resampling starts at the beginning of the array **min_samples** : int, optional The minimum number of samples that must occur within a resampled group. If there are fewer samples, a nan will be assigned. **dim** : str, optional The name of the time dimension to resample along .. !! processed by numpydoc !! .. py:function:: get_region_masks_from_shp(ds, shapefile, header) Extract region masks according to a shapefile :Parameters: **ds** : xarray Dataset The array with the grid to build the masks for **shapefile** : str The path to the shapefile to use **header** : str Name of the shapefile column to use to name the regions .. !! processed by numpydoc !! .. py:function:: average_over_NRM_super_clusters(ds) Average the provided array over the NRM super cluster regions :Parameters: **ds** : xarray Dataset The array to average over the NRM super cluster regions .. !! processed by numpydoc !! .. py:function:: mask_CAFEf6_reduced_dt(ds) Mask out the ensemble members of CAFE-f6 that were run with a reduced timestep, since reducing the timestep was found to produce a different model equilibrium :Parameters: **ds** : xarray Dataset The CAFE-f6 data to mask .. !! processed by numpydoc !! .. py:function:: gridarea_cdo(ds) Returns the area weights computed using cdo's gridarea function. Note, this function writes ds to disk, so strip back ds to only what is needed :Parameters: **ds** : xarray Dataset The dataset to pass to cdo gridarea .. !! processed by numpydoc !! .. py:function:: add_area_using_cdo_gridarea(ds, lon_dim='lon', lat_dim='lat') Add an area coordinate to the provided dataset containing the cell areas estimated by cdo's gridarea function :Parameters: **ds** : xarray Dataset The data to use to estimate the cell areas **lon_dim** : str, optional The name of the longitude dimension on ds **lat_dim** : str, optional The name of the latitude dimension on ds .. !! processed by numpydoc !! .. 
py:function:: max_chunk_size_MB(ds) Get the max chunk size in a dataset .. !! processed by numpydoc !!
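
The Excess Heat Factor definition given for ``calculate_EHF`` can be illustrated with a minimal pure-Python sketch. This is not the ``src.utils`` implementation: the helper name ``excess_heat_factor``, the scalar ``t_p95`` threshold and the choice to leave days without a full window as NaN are all assumptions made for illustration.

```python
import math


def excess_heat_factor(t, t_p95):
    """Hypothetical helper illustrating the EHF formula (not src.utils code).

    t is a sequence of daily mean temperatures; t_p95 is assumed to be a
    single scalar 95th-percentile threshold.
    """
    n = len(t)
    ehf = [math.nan] * n
    # Day i needs the 30 preceding days and the 2 following days
    for i in range(30, n - 2):
        t3 = sum(t[i:i + 3]) / 3       # (T_i + T_i+1 + T_i+2) / 3
        t30 = sum(t[i - 30:i]) / 30    # (T_i-1 + ... + T_i-30) / 30
        ehi_sig = t3 - t_p95           # excess over the long-term threshold
        ehi_accl = t3 - t30            # excess over the recent 30 days
        ehf[i] = max(0.0, ehi_sig) * max(1.0, ehi_accl)
    return ehf


# Synthetic series: 30 days at 20 degC followed by 5 days at 30 degC
temps = [20.0] * 30 + [30.0] * 5
ehf = excess_heat_factor(temps, t_p95=25.0)
```

In this synthetic series the first day of the warm spell has EHI_sig = 5 and EHI_accl = 10, so its EHF is 50. The value then decreases on the following days as the warm days enter the 30-day acclimatisation window. How ``calculate_EHF`` treats days without a full window is not stated in this reference.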