:py:mod:`src.utils`
===================

.. py:module:: src.utils


Module Contents
---------------


Functions
~~~~~~~~~

.. autoapisummary::

   src.utils.load_config
   src.utils.composite_function
   src.utils.extract_lon_lat_box
   src.utils.calculate_nino34
   src.utils.calculate_dmi
   src.utils.calculate_sam
   src.utils.calculate_nao
   src.utils.calculate_amv
   src.utils.calculate_ipo
   src.utils.calculate_ohc300
   src.utils.calculate_wind_speed
   src.utils.calculate_tmean_from_tmin_tmax
   src.utils.calculate_ffdi
   src.utils.calculate_EHF
   src.utils.calculate_EHF_severity
   src.utils.ensemble_mean
   src.utils.greater_than
   src.utils.where_greater_than
   src.utils.add_CAFE_grid_info
   src.utils.normalise_by_days_in_month
   src.utils.convert_time_to_lead
   src.utils.truncate_latitudes
   src.utils.convert_calendar
   src.utils.rechunk
   src.utils.select
   src.utils.add_attrs
   src.utils.rename
   src.utils.convert
   src.utils.keep_period
   src.utils._get_groupby_and_reduce_dims
   src.utils.anomalise
   src.utils.calculate_percentile_thresholds
   src.utils.over_percentile_threshold
   src.utils.under_percentile_threshold
   src.utils.correct_bias
   src.utils.interpolate_to_grid_from_file
   src.utils.round_to_start_of_day
   src.utils.round_to_start_of_month
   src.utils.coarsen
   src.utils.rolling_mean
   src.utils.resample
   src.utils.get_region_masks_from_shp
   src.utils.average_over_NRM_super_clusters
   src.utils.mask_CAFEf6_reduced_dt
   src.utils.gridarea_cdo
   src.utils.add_area_using_cdo_gridarea
   src.utils.max_chunk_size_MB


Attributes
~~~~~~~~~~

.. autoapisummary::

   src.utils.PROJECT_DIR


.. py:data:: PROJECT_DIR
   

.. py:function:: load_config(name)

   
   Load a config .yml file for a specified dataset


   :Parameters:

       **name** : str
           The path to the config file to load


   ..
       !! processed by numpydoc !!

.. py:function:: composite_function(function_dict)

   
   Return a composite function of all functions and kwargs specified in a
   provided dictionary


   :Parameters:

       **function_dict** : dict
           Dictionary with functions in this module to composite as keys and
           kwargs as values
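
   As a rough illustration of the idea (not this module's implementation), the
   sketch below chains callables from a dictionary so that each one is applied to
   the output of the previous one. It assumes that the dictionary maps callables
   directly to their keyword arguments and that they are applied in insertion
   order; ``compose_from_dict``, ``scale`` and ``offset`` are hypothetical names.

   .. code-block:: python

      from functools import reduce


      def compose_from_dict(function_dict):
          """Chain the callables in function_dict, applied in insertion order."""

          def composite(data):
              # Feed the output of each step into the next one
              return reduce(
                  lambda result, item: item[0](result, **item[1]),
                  function_dict.items(),
                  data,
              )

          return composite


      def scale(x, factor=1.0):
          return x * factor


      def offset(x, amount=0.0):
          return x + amount


      pipeline = compose_from_dict({scale: {"factor": 2.0}, offset: {"amount": 1.0}})
      result = pipeline(3.0)  # scale is applied first, then offset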


   ..
       !! processed by numpydoc !!

.. py:function:: extract_lon_lat_box(ds, box, weighted_average, lon_dim='lon', lat_dim='lat')

   
   Return a region specified by a range of longitudes and latitudes.


   :Parameters:

       **ds** : xarray Dataset or DataArray
           The data to subset and average. Assumed to include an "area" Variable

       **box** : iterable
           Iterable with the following elements in this order:
           [lon_lower, lon_upper, lat_lower, lat_upper]
           where longitudes are specified between 0 and 360 deg E and latitudes
           are specified between -90 and 90 deg N

       **weighted_average** : boolean
           If True, return the area-weighted average over the region, otherwise
           return the region

       **lon_dim** : str, optional
           The name of the longitude dimension

       **lat_dim** : str, optional
           The name of the latitude dimension
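
   The area-weighted average can be sketched in plain Python as below. This is
   only an illustration under stated assumptions, not the function's actual
   code: grid cells are represented as hypothetical ``(lon, lat, area, value)``
   tuples, the box bounds are treated as inclusive, and boxes that wrap across
   0 deg E are not handled.

   .. code-block:: python

      def box_weighted_mean(cells, box):
          """Area-weighted mean of the cells whose centres fall inside box.

          cells: iterable of (lon, lat, area, value) tuples
          box: [lon_lower, lon_upper, lat_lower, lat_upper]
          """
          lon_lower, lon_upper, lat_lower, lat_upper = box
          inside = [
              (area, value)
              for lon, lat, area, value in cells
              if lon_lower <= lon <= lon_upper and lat_lower <= lat <= lat_upper
          ]
          total_area = sum(area for area, _ in inside)
          return sum(area * value for area, value in inside) / total_area


      cells = [
          (195.0, -2.5, 2.0, 1.0),
          (205.0, 2.5, 1.0, 4.0),
          (250.0, 2.5, 5.0, 9.0),  # outside the box, so ignored
      ]
      box_mean = box_weighted_mean(cells, [190, 240, -5, 5])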


   ..
       !! processed by numpydoc !!

.. py:function:: calculate_nino34(sst_anom, sst_name='sst')

   
   Calculate the NINO3.4 index. The NINO3.4 index is calculated as the spatial average
   of SST anomalies over the tropical Pacific region (5°S–5°N and 170°W–120°W).


   :Parameters:

       **sst_anom** : xarray Dataset
           Array of sst anomalies

       **sst_name** : str, optional
           The name of the sst variable in sst_anom


   ..
       !! processed by numpydoc !!

.. py:function:: calculate_dmi(sst_anom, sst_name='sst')

   
   Calculate the Dipole Mode Index (DMI) for the Indian Ocean Dipole. The DMI is
   calculated as the difference between the spatial averages of SST anomalies over
   two regions of the tropical Indian Ocean: (10°S-10°N and 50°E-70°E) and
   (10°S-0°S and 90°E-110°E).


   :Parameters:

       **sst_anom** : xarray Dataset
           Array of sst anomalies

       **sst_name** : str, optional
           The name of the sst variable in sst_anom


   ..
       !! processed by numpydoc !!

.. py:function:: calculate_sam(slp, clim_period, groupby_dim='time', slp_name='slp', lon_dim='lon', lat_dim='lat')

   
   Calculate the Southern Annular Mode (SAM) index from monthly data as defined by
   Gong and Wang (1999). The SAM index is defined as the difference between the
   normalized monthly zonal mean sea level pressure at 40°S and 65°S.


   :Parameters:

       **slp** : xarray Dataset
           Array of sea level pressures

       **clim_period** : iterable
           Size 2 iterable containing strings indicating the start and end dates of the
           climatological period used to normalise the SAM index

       **groupby_dim** : str, optional
           The dimension to compute the normalisation over

       **slp_name** : str, optional
           The name of the slp variable in the input slp Dataset

       **lon_dim** : str, optional
           The name of the longitude dimension

       **lat_dim** : str, optional
           The name of the latitude dimension
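
   The definition above can be sketched as follows. This is a simplified
   illustration, not the function's implementation: it assumes the zonal means
   at the two latitudes are already available as plain lists, it normalises with
   the population standard deviation, and it ignores any per-month grouping.
   ``normalise`` and ``sam_index`` are hypothetical names.

   .. code-block:: python

      from statistics import mean, pstdev


      def normalise(series, clim):
          """Standardise series using the mean and standard deviation of clim."""
          mu, sigma = mean(clim), pstdev(clim)
          return [(x - mu) / sigma for x in series]


      def sam_index(slp_40s, slp_65s, clim_slice):
          """Difference of the normalised zonal-mean SLP at 40°S and 65°S."""
          p40 = normalise(slp_40s, slp_40s[clim_slice])
          p65 = normalise(slp_65s, slp_65s[clim_slice])
          return [a - b for a, b in zip(p40, p65)]


      slp_40s = [1016.0, 1018.0, 1020.0, 1018.0]  # zonal-mean SLP [hPa]
      slp_65s = [986.0, 984.0, 982.0, 984.0]
      sam = sam_index(slp_40s, slp_65s, slice(0, 4))  # climatology: all 4 times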


   ..
       !! processed by numpydoc !!

.. py:function:: calculate_nao(slp, clim_period, groupby_dim='time', slp_name='slp', lon_dim='lon', lat_dim='lat')

   
   Calculate the North Atlantic Oscillation (NAO) index from monthly data as defined by
   Jianping, L. & Wang, J. X. L. (2003). The NAO index is defined as the difference
   between the normalized monthly mean sea level pressure at 35°N and 65°N, averaged
   over the zonal band spanning 80°W–30°E.


   :Parameters:

       **slp** : xarray Dataset
           Array of sea level pressures

       **clim_period** : iterable
           Size 2 iterable containing strings indicating the start and end dates of the
           climatological period used to normalise the NAO index

       **groupby_dim** : str, optional
           The dimension to compute the normalisation over

       **slp_name** : str, optional
           The name of the slp variable in the input slp Dataset

       **lon_dim** : str, optional
           The name of the longitude dimension

       **lat_dim** : str, optional
           The name of the latitude dimension


   ..
       !! processed by numpydoc !!

.. py:function:: calculate_amv(sst_anom, sst_name='sst')

   
   Calculate the Atlantic Multi-decadal Variability (AMV), also known as the Atlantic
   Multi-decadal Oscillation (AMO), according to Trenberth and Shea (2006). The AMV
   is calculated as the spatial average of SST anomalies over the North Atlantic
   (Equator–60°N and 80°W–0°) minus the spatial average of SST anomalies from
   60°S to 60°N.

   Note that the SST anomalies are typically smoothed in time using a 10-year moving
   average (Goldenberg et al., 2001; Enfield et al., 2001), a low-pass filter (Trenberth
   and Shea, 2006) or a 4-year temporal average (Bilbao et al., 2021).

   :Parameters:

       **sst_anom** : xarray Dataset
           Array of sst anomalies

       **sst_name** : str, optional
           The name of the sst variable in sst_anom


   ..
       !! processed by numpydoc !!

.. py:function:: calculate_ipo(sst_anom, sst_name='sst')

   
   Calculate the Tripole Index (TPI) for the Interdecadal Pacific Oscillation (IPO)
   following Henley et al. (2015). The IPO is calculated as the average of SST anomalies
   over the central equatorial Pacific (region 2: 10°S–10°N, 170°E–90°W) minus the
   average of the SST anomalies in the northwestern (region 1: 25°N–45°N, 140°E–145°W)
   and southwestern Pacific (region 3: 50°S–15°S, 150°E–160°W).

   Note that the IPO index is typically smoothed in time using a 13-year Chebyshev
   low-pass filter (Henley et al., 2015) or by first applying a 4-year temporal average
   to the SST anomalies (Bilbao et al., 2021).
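
   Once the three regional averages are available, the index reduces to a single
   line of arithmetic. The sketch below assumes the regional mean SST anomalies
   have already been computed as plain lists (one value per time step) and omits
   any temporal smoothing; ``tripole_index`` is a hypothetical name and this is
   not the function's implementation.

   .. code-block:: python

      def tripole_index(region_1, region_2, region_3):
          """Region 2 minus the mean of regions 1 and 3, per time step."""
          return [
              r2 - (r1 + r3) / 2
              for r1, r2, r3 in zip(region_1, region_2, region_3)
          ]


      north_west = [0.2, -0.1]  # region 1 mean SST anomalies
      equatorial = [0.8, -0.5]  # region 2 mean SST anomalies
      south_west = [0.4, -0.3]  # region 3 mean SST anomalies
      tpi = tripole_index(north_west, equatorial, south_west)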


   ..
       !! processed by numpydoc !!

.. py:function:: calculate_ohc300(temp, depth_dim='depth', temp_name='temp')

   
   Calculate the ocean heat content above 300m

   The input DataArray or Dataset is assumed to be in Kelvin

   :Parameters:

       **temp** : xarray Dataset
           Array of temperature values in Kelvin

       **depth_dim** : str, optional
           The name of the depth dimension

       **temp_name** : str, optional
           The name of the temperature variable in temp


   ..
       !! processed by numpydoc !!

.. py:function:: calculate_wind_speed(u_v, u_name, v_name, lon_dim='lon', lat_dim='lat')

   
   Calculate the wind speed


   :Parameters:

       **u_v** : xarray Dataset
           Dataset containing the zonal (u) and meridional (v) components of the wind

       **u_name** : str
           The name of the u-velocity variable in u_v

       **v_name** : str
           The name of the v-velocity variable in u_v

       **lon_dim** : str, optional
           The name of the longitude dimension for u and v

       **lat_dim** : str, optional
           The name of the latitude dimension for u and v


   ..
       !! processed by numpydoc !!

.. py:function:: calculate_tmean_from_tmin_tmax(ds, tmin_name='tmin', tmax_name='tmax', tmean_name='tmean')

   
   Estimate tmean as the average of tmin and tmax


   :Parameters:

       **ds** : xarray Dataset
           Dataset containing tmin and tmax variables

       **tmin_name** : str, optional
           The name of the tmin variable

       **tmax_name** : str, optional
           The name of the tmax variable

       **tmean_name** : str, optional
           The name of the output tmean variable


   ..
       !! processed by numpydoc !!

.. py:function:: calculate_ffdi(ds, clim_period, wind_from_components, precip_name='precip', rh_name='rh', tmax_name='t_ref_max', wmax_name='V_ref_max', u_name='u_ref', v_name='v_ref')

   
   Returns the McArthur Forest Fire Danger Index following the formula provided
   in Dowdy (2018)::

       FFDI = D ** 0.987 * exp(0.0338 * T - 0.0345 * H + 0.0234 * W + 0.243147)


   :Parameters:

       **ds** : xarray Dataset
           Dataset containing the following variables:

           - precip; Daily total precipitation [mm]. This is used to estimate the
             drought factor, D, as the 20-day accumulated rainfall scaled to lie between
             0 and 10, with larger values indicating less precipitation (see Richardson
             et al. (2021) and Squire et al. (2021)). The drought factor is used as D in
             the above equation.
           - tmax; Daily max 2 m temperature [deg C]. This is used as T in the above
             equation.
           - rh; Daily max relative humidity at 2 m [%] (or similar, depending on data
             availability). Richardson et al. (2021) uses mid-afternoon relative humidity
             at 2 m, Squire et al. (2021) uses daily mean relative humidity at 1000 hPa.
             This is used as H in the above equation.
           - wmax; Daily max 10 m wind speed [km/h] (or similar, depending on data
             availability). Squire et al. (2021) uses daily mean wind speed. This is used
             as W in the above equation.

       **clim_period** : iterable
           Size 2 iterable containing strings indicating the start and end dates of the
           climatological period used to calculate the drought factor

       **wind_from_components** : boolean
           Whether to calculate the wmax estimate from provided individual components of
           wind or whether to use a provided max estimate. If True, variables with names
           matching those provided as parameters 'u_name' and 'v_name' must exist in ds.
           If False, uses for wmax the variable name provided as the `wmax_name`
           parameter.

       **precip_name** : str, optional
           The name of the precip variable

       **rh_name** : str, optional
           The name of the rh variable

       **tmax_name** : str, optional
           The name of the tmax variable

       **wmax_name** : str, optional
           The name of the wmax variable. This is only used if wind_from_components=False.
           Otherwise, an estimate of wmax is calculated from the variables u_name and
           v_name

       **u_name** : str, optional
           The name of the u-component of wind variable to use to estimate wmax when
           wind_from_components=True. Not used if wind_from_components=False.

       **v_name** : str, optional
           The name of the v-component of wind variable to use to estimate wmax when
           wind_from_components=True. Not used if wind_from_components=False.
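
   The formula itself can be evaluated directly once the drought factor is
   known. The sketch below is only an illustration of the equation quoted
   above: it takes D as a given input rather than deriving it from 20-day
   accumulated rainfall, and ``ffdi`` is a hypothetical helper, not part of
   this module.

   .. code-block:: python

      import math


      def ffdi(drought_factor, tmax, rh, wind):
          """FFDI from the drought factor D, T [deg C], H [%] and W [km/h]."""
          return drought_factor**0.987 * math.exp(
              0.0338 * tmax - 0.0345 * rh + 0.0234 * wind + 0.243147
          )


      # A hot, dry, windy day with the maximum drought factor
      value = ffdi(drought_factor=10.0, tmax=35.0, rh=20.0, wind=30.0)

   Because D enters as a power-law factor, a drought factor of 0 gives an FFDI
   of 0 regardless of the weather terms.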


   .. rubric:: References

   Dowdy, A. J. (2018). “Climatological Variability of Fire Weather in Australia”.
   Journal of Applied Meteorology and Climatology 57.2, pp. 221–234. issn:
   1558-8424. doi: 10.1175/JAMC-D-17-0167.1.

   .. only:: latex


   ..
       !! processed by numpydoc !!

.. py:function:: calculate_EHF(T, T_p95_file=None, T_p95_period=None, T_p95_dim=None, rolling_dim='time', T_name='t_ref')

   
   Calculate the Excess Heat Factor (EHF) index, defined as::

       EHF = max(0, EHI_sig) * max(1, EHI_accl)

   with::

       EHI_sig = (T_i + T_{i+1} + T_{i+2}) / 3 - T_p95
       EHI_accl = (T_i + T_{i+1} + T_{i+2}) / 3 - (T_{i-1} + ... + T_{i-30}) / 30

   T is the daily mean temperature, commonly calculated as the mean of the daily minimum
   and maximum temperatures, where the daily maximum typically precedes the daily minimum
   and both observations relate to the same 9am-to-9am 24-hour period. T_p95 is the 95th
   percentile of T using all days in the year.

   :Parameters:

       **T** : xarray DataArray
           Array of daily mean temperature

       **T_p95_file** : str, optional
           Path to a file with the 95th percentiles of T using all days in the year. This should be
           relative to the project directory. If not provided, T_p95_period and T_p95_dim must be
           provided

       **T_p95_period** : list of str, optional
           Size 2 iterable containing strings indicating the start and end dates of the period over
           which to calculate T_p95. Only used if T_p95_file is None

       **T_p95_dim** : str or list of str, optional
           The dimension(s) over which to calculate T_p95. Only used if T_p95_file is None

       **rolling_dim** : str, optional
           The dimension over which to compute the rolling averages in the definition of EHF

       **T_name** : str, optional
           The name of the temperature variable in T
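
   A minimal sketch of the EHF definition on a plain list of daily mean
   temperatures is given below. It is not the function's implementation:
   T_p95 is passed in as a number, and days without a complete 3-day or
   30-day window are returned as ``None``, which is an assumption about edge
   handling. ``excess_heat_factor`` is a hypothetical name.

   .. code-block:: python

      def excess_heat_factor(temps, t_p95):
          """EHF per day; None where the 3-day or 30-day window is incomplete."""
          ehf = []
          for i in range(len(temps)):
              if i < 30 or i + 2 >= len(temps):
                  ehf.append(None)
                  continue
              three_day = sum(temps[i : i + 3]) / 3  # days i, i+1, i+2
              previous_30 = sum(temps[i - 30 : i]) / 30  # days i-30 ... i-1
              ehi_sig = three_day - t_p95
              ehi_accl = three_day - previous_30
              ehf.append(max(0.0, ehi_sig) * max(1.0, ehi_accl))
          return ehf


      temps = [20.0] * 30 + [30.0] * 3  # 30 mild days, then a 3-day hot spell
      ehf = excess_heat_factor(temps, t_p95=26.0)

   In this example only day 30 has both windows available, so it is the only
   entry that is not ``None``.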

   .. rubric:: References

   Nairn et al. 2015: https://doi.org/10.3390/ijerph120100227


   ..
       !! processed by numpydoc !!

.. py:function:: calculate_EHF_severity(T, T_p95_file=None, EHF_p85_file=None, T_p95_period=None, T_p95_dim=None, EHF_p85_period=None, EHF_p85_dim=None, rolling_dim='time', T_name='t_ref')

   
   Calculate the severity of the Excess Heat Factor index, defined as::

       EHF_severity = EHF / EHF_p85

   where "_p85" denotes the 85th percentile of all positive values using all days in the
   year and the Excess Heat Factor (EHF) is defined as::

       EHF = max(0, EHI_sig) * max(1, EHI_accl)

   with::

       EHI_sig = (T_i + T_{i+1} + T_{i+2}) / 3 - T_p95
       EHI_accl = (T_i + T_{i+1} + T_{i+2}) / 3 - (T_{i-1} + ... + T_{i-30}) / 30

   T is the daily mean temperature, commonly calculated as the mean of the daily minimum
   and maximum temperatures, where the daily maximum typically precedes the daily minimum
   and both observations relate to the same 9am-to-9am 24-hour period. T_p95 is the 95th
   percentile of T using all days in the year.

   :Parameters:

       **T** : xarray DataArray
           Array of daily mean temperature

       **T_p95_file** : str, optional
           Path to a file with the 95th percentiles of T using all days in the year. This should be
           relative to the project directory. If not provided, T_p95_period and T_p95_dim must be
           provided

       **EHF_p85_file** : str, optional
           Path to a file with the 85th percentiles of positive EHF using all days in the year. This
           should be relative to the project directory. If not provided, EHF_p85_period and
           EHF_p85_dim must be provided

       **T_p95_period** : list of str, optional
           Size 2 iterable containing strings indicating the start and end dates of the period over
           which to calculate T_p95. Only used if T_p95_file is None

       **T_p95_dim** : str or list of str, optional
           The dimension(s) over which to calculate T_p95. Only used if T_p95_file is None

       **EHF_p85_period** : list of str, optional
           Size 2 iterable containing strings indicating the start and end dates of the period over
           which to calculate EHF_p85. Only used if EHF_p85_file is None

       **EHF_p85_dim** : str or list of str, optional
           The dimension(s) over which to calculate EHF_p85. Only used if EHF_p85_file is None

       **rolling_dim** : str, optional
           The dimension over which to compute the rolling averages in the definition of EHF

       **T_name** : str, optional
           The name of the temperature variable in T


   .. rubric:: References

   Nairn et al. 2015: https://doi.org/10.3390/ijerph120100227

   .. only:: latex


   ..
       !! processed by numpydoc !!

.. py:function:: ensemble_mean(ds, ensemble_dim='member')

   
   Return the ensemble mean of the input array


   :Parameters:

       **ds** : xarray Dataset
           Array to take the ensemble mean of

       **ensemble_dim** : str, optional
           The name of the ensemble dimension


   ..
       !! processed by numpydoc !!

.. py:function:: greater_than(ds, value)

   
   Return a boolean array with True where elements > value


   :Parameters:

       **ds** : xarray Dataset
           The array to mask

       **value** : float or xarray Dataset
           The value(s) to use to mask ds


   ..
       !! processed by numpydoc !!

.. py:function:: where_greater_than(ds, value)

   
   Return array with elements <= value masked to nan


   :Parameters:

       **ds** : xarray Dataset
           The array to mask

       **value** : float or xarray Dataset
           The value(s) to use to mask ds


   ..
       !! processed by numpydoc !!

.. py:function:: add_CAFE_grid_info(ds)

   
   Add CAFE grid info to a CAFE dataset that doesn't already have it


   :Parameters:

       **ds** : xarray Dataset
           The dataset to add grid info to


   ..
       !! processed by numpydoc !!

.. py:function:: normalise_by_days_in_month(ds)

   
   Normalise input array by the number of days in each month


   :Parameters:

       **ds** : xarray Dataset
           The array to normalise


   ..
       !! processed by numpydoc !!

.. py:function:: convert_time_to_lead(ds, time_dim='time', time_freq=None, init_dim='init', lead_dim='lead')

   
   Return provided array with time dimension converted to lead time dimension
   and time added as additional coordinate


   :Parameters:

       **ds** : xarray Dataset
           A dataset with a time dimension

       **time_dim** : str, optional
           The name of the time dimension

       **time_freq** : str, optional
           The frequency of the time dimension. If not provided, will try to use
           xr.infer_freq to determine the frequency. This is only used to add a
           freq attr to the lead time coordinate

       **init_dim** : str, optional
           The name of the initial date dimension in the output

       **lead_dim** : str, optional
           The name of the lead time dimension in the output


   ..
       !! processed by numpydoc !!

.. py:function:: truncate_latitudes(ds, dp=10, lat_dim='lat')

   
   Return provided array with latitudes truncated to a specified number of decimal places.

   This is necessary due to precision differences from running forecasts on
   different systems

   :Parameters:

       **ds** : xarray Dataset
           A dataset with a latitude dimension

       **dp** : int, optional
           The number of decimal places to truncate at

       **lat_dim** : str, optional
           The name of the latitude dimension


   ..
       !! processed by numpydoc !!

.. py:function:: convert_calendar(ds, calendar, time_dim='time')

   
   Convert calendar, dropping invalid/surplus dates or inserting missing dates


   :Parameters:

       **ds** : xarray Dataset
           A dataset with a time dimension

       **calendar** : str
           The calendar to convert to

       **time_dim** : str, optional
           The name of the time dimension


   ..
       !! processed by numpydoc !!

.. py:function:: rechunk(ds, **chunks)

   
   Rechunk a dataset


   :Parameters:

       **ds** : xarray Dataset
           A dataset to be rechunked

       **chunks** : dict
           Dictionary of {dim: chunksize}


   ..
       !! processed by numpydoc !!

.. py:function:: select(ds, **selection)

   
   Returns a new dataset with each array indexed by tick labels along the
   specified dimension(s)


   :Parameters:

       **ds** : xarray Dataset
           A dataset to select from

       **selection** : dict
           A dict with keys matching dimensions and values given by scalars,
           slices or arrays of tick labels


   ..
       !! processed by numpydoc !!

.. py:function:: add_attrs(ds, attrs, variable=None)

   
   Add attributes to a dataset


   :Parameters:

       **ds** : xarray Dataset
           The data to add attributes to

       **attrs** : dict
           The attributes to add

       **variable** : str, optional
           The name of the variable or coordinate to add the attributes to.
           If None, the attributes will be added as global attributes


   ..
       !! processed by numpydoc !!

.. py:function:: rename(ds, **names)

   
   Rename all variables etc that have an entry in names


   :Parameters:

       **ds** : xarray Dataset
           A dataset to be renamed

       **names** : dict
           Dictionary of {old_name: new_name}


   ..
       !! processed by numpydoc !!

.. py:function:: convert(ds, **conversion)

   
   Convert variables in a dataset according to provided dictionary


   :Parameters:

       **ds** : xarray Dataset
           A dataset to be converted

       **conversion** : dict
           Dictionary of {variable: oper} where oper is a dictionary
           specifying the operation and the value. Current possible
           operations are 'multiply_by' and 'add'.
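
   The shape of the ``conversion`` dictionary can be illustrated with the sketch
   below, which applies the two documented operations to a plain dict of lists
   standing in for a dataset. ``convert_sketch`` is a hypothetical stand-in, not
   this function's implementation, and the variable names and unit conversions
   are examples only.

   .. code-block:: python

      def convert_sketch(data, **conversion):
          """Apply {variable: {operation: value}} conversions to a dict of lists."""
          converted = dict(data)
          for variable, operations in conversion.items():
              for operation, value in operations.items():
                  if operation == "multiply_by":
                      converted[variable] = [x * value for x in converted[variable]]
                  elif operation == "add":
                      converted[variable] = [x + value for x in converted[variable]]
                  else:
                      raise ValueError(f"Unsupported operation: {operation}")
          return converted


      data = {"precip": [0.001, 0.002], "t_ref": [300.0, 295.5]}
      converted = convert_sketch(
          data,
          precip={"multiply_by": 1000.0},  # e.g. m -> mm
          t_ref={"add": -273.15},  # e.g. K -> deg C
      )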


   ..
       !! processed by numpydoc !!

.. py:function:: keep_period(ds, period)

   
   Keep only times within a specified period


   :Parameters:

       **ds** : xarray Dataset
           The data to mask

       **period** : iterable
           Size 2 iterable containing strings indicating the start and end dates
           of the period to retain


   ..
       !! processed by numpydoc !!

.. py:function:: _get_groupby_and_reduce_dims(ds, frequency)

   
   Get the groupby and reduction dimensions for performing operations like
   calculating anomalies and percentile thresholds


   ..
       !! processed by numpydoc !!

.. py:function:: anomalise(ds, clim_period, frequency=None)

   
   Returns the anomalies of ds relative to its climatology over clim_period.

   Uses a shortcut for calculating hindcast climatologies that will not work
   for hindcasts with initialisation frequencies more frequent than monthly.

   :Parameters:

       **ds** : xarray Dataset
           The data to anomalise

       **clim_period** : iterable
           Size 2 iterable containing strings indicating the start and end dates
           of the climatological period

       **frequency** : str, optional
           The frequency at which to bin the climatology, e.g. per month. Must be
           an available attribute of the datetime accessor. Specify "None" to
           indicate no frequency (climatology calculated by averaging all times).
           Note, setting to "None" for hindcast data can be dangerous, since only
           certain times may be available at each lead.
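
   For a simple timeseries with a monthly ``frequency``, the calculation can be
   sketched as below. This is an illustration in plain Python, not the
   function's implementation: it does not cover the hindcast shortcut mentioned
   above, and ``monthly_anomalies`` and its arguments are hypothetical.

   .. code-block:: python

      from collections import defaultdict


      def monthly_anomalies(months, values, in_clim_period):
          """Subtract each calendar month's climatological mean from values."""
          grouped = defaultdict(list)
          for month, value, in_period in zip(months, values, in_clim_period):
              if in_period:  # only these times contribute to the climatology
                  grouped[month].append(value)
          climatology = {month: sum(v) / len(v) for month, v in grouped.items()}
          return [value - climatology[month] for month, value in zip(months, values)]


      months = [1, 2, 1, 2, 1, 2]
      values = [10.0, 20.0, 12.0, 22.0, 17.0, 27.0]
      in_clim_period = [True, True, True, True, False, False]
      anomalies = monthly_anomalies(months, values, in_clim_period)

   Times outside the climatological period are still anomalised; they just do
   not contribute to the climatology.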


   ..
       !! processed by numpydoc !!

.. py:function:: calculate_percentile_thresholds(ds, percentile, percentile_period, percentile_dim=None, frequency=None)

   
   Returns the percentile values of ds over a provided period.


   :Parameters:

       **ds** : xarray Dataset
           The data to calculate the percentiles of

       **percentile** : float
           The percentile to calculate

       **percentile_period** : iterable
           Size 2 iterable containing strings indicating the start and end dates
           of the period over which to calculate the percentile thresholds

       **percentile_dim** : str or list of str, optional
           The dimension(s) over which to compute the percentile thresholds. If None,
           these will be determined automatically based on the type of input data:

           - timeseries : percentile_dim = "time"
           - forecasts : percentile_dim = "init" [, "member"]

       **frequency** : str, optional
           The frequency at which to bin the percentiles, e.g. per month.
           Must be an available attribute of the datetime accessor. Specify "None" to
           indicate no frequency (percentiles calculated over all times). Note, setting
           to "None" for hindcast data can be dangerous, since only certain times may
           be available at each lead.


   ..
       !! processed by numpydoc !!

.. py:function:: over_percentile_threshold(ds, percentile, percentile_period, percentile_dim=None, frequency=None)

   
   Find which values in the input array are over a specified percentile
   calculated over a specified period. Returns a boolean array with True
   where values are over the specified percentile and False elsewhere.


   :Parameters:

       **ds** : xarray Dataset
           The data to threshold based on its percentiles

       **percentile** : float
           The percentile used to threshold the data

       **percentile_period** : iterable
           Size 2 iterable containing strings indicating the start and end dates
           of the period over which to calculate the percentile thresholds

       **frequency** : str, optional
           The frequency at which to bin the percentiles, e.g. per month.
           Must be an available attribute of the datetime accessor. Specify "None" to
           indicate no frequency (percentiles calculated over all times). Note, setting
           to "None" for hindcast data can be dangerous, since only certain times may
           be available at each lead.
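
   The thresholding step can be sketched as follows for a single group of
   values. The percentile here uses linear interpolation between ranks and a
   strict ``>`` comparison; both are assumptions, as is the helper naming, and
   the sketch is not this function's implementation.

   .. code-block:: python

      import math


      def percentile(values, q):
          """q-th percentile (0-100) using linear interpolation between ranks."""
          ordered = sorted(values)
          position = (len(ordered) - 1) * q / 100
          lower = math.floor(position)
          upper = math.ceil(position)
          fraction = position - lower
          return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


      def over_threshold(values, reference, q):
          """True where values exceed the q-th percentile of reference."""
          threshold = percentile(reference, q)
          return [value > threshold for value in values]


      reference = [1.0, 2.0, 3.0, 4.0, 5.0]  # values within percentile_period
      mask = over_threshold([2.0, 4.5, 5.0, 6.0], reference, q=90)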


   ..
       !! processed by numpydoc !!

.. py:function:: under_percentile_threshold(ds, percentile, percentile_period, percentile_dim=None, frequency=None)

   
   Find which values in the input array are under a specified percentile
   calculated over a specified period. Returns a boolean array with True
   where values are under the specified percentile and False elsewhere.


   :Parameters:

       **ds** : xarray Dataset
           The data to threshold based on its percentiles

       **percentile** : float
           The percentile used to threshold the data
           The percentile use to threshold the data

       **percentile_period** : iterable
           Size 2 iterable containing strings indicating the start and end dates
           of the period over which to calculate the percentile thresholds

       **frequency** : str, optional
           The frequency at which to bin the percentiles, e.g. per month.
           Must be an available attribute of the datetime accessor. Specify "None" to
           indicate no frequency (percentiles calculated over all times). Note, setting
           to "None" for hindcast data can be dangerous, since only certain times may
           be available at each lead.


   ..
       !! processed by numpydoc !!

.. py:function:: correct_bias(ds, obsv_file, period, frequency, method)

   
   Correct the mean bias of ds relative to observations over a provided period

   Will not work for hindcasts with initialisation frequencies more frequent
   than monthly.

   :Parameters:

       **ds** : xarray Dataset
           The hindcast data to correct

       **obsv_file** : str
           Path to a file with the appropriate observation data to correct to.
           This should be relative to the project directory

       **period** : iterable
           Size 2 iterable containing strings indicating period over which to
           calculate the biases

       **frequency** : str
           The frequency at which to bin the biases, e.g. per month. Must be an
           available attribute of the datetime accessor. Specify "None" to indicate
           no frequency (climatology calculated by averaging all times). Note,
           setting to "None" can be dangerous, since only certain times may be
           available at each lead and there is no check that the same times are
           available between the observations and forecasts.

       **method** : str
           The method to use to correct the mean bias. Options are:

           - "additive": the difference between the ds and obsv climatologies is
             subtracted from ds
           - "multiplicative": ds is divided by the ratio of the ds and obsv
             climatologies
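
   The two methods differ only in how the climatologies are combined. The
   sketch below shows both for a single climatological bin, assuming the
   forecast and observed climatologies have already been computed as scalars;
   ``correct_mean_bias`` is a hypothetical helper and not this function's
   implementation.

   .. code-block:: python

      def correct_mean_bias(forecast, forecast_clim, obsv_clim, method):
          """Correct forecast values given forecast and observed climatologies."""
          if method == "additive":
              bias = forecast_clim - obsv_clim
              return [value - bias for value in forecast]
          if method == "multiplicative":
              ratio = forecast_clim / obsv_clim
              return [value / ratio for value in forecast]
          raise ValueError(f"Unrecognised method: {method}")


      forecast = [12.0, 18.0]
      additive = correct_mean_bias(forecast, 15.0, 10.0, "additive")
      multiplicative = correct_mean_bias(forecast, 15.0, 10.0, "multiplicative")

   The multiplicative form is usually the more natural choice for quantities
   bounded at zero, since it cannot turn a positive value negative.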


   ..
       !! processed by numpydoc !!

.. py:function:: interpolate_to_grid_from_file(ds, file, add_area=True, ignore_degenerate=True)


.. py:function:: round_to_start_of_day(ds, dim)

   
   Return provided array with specified time dimension rounded to the start of
   the day


   :Parameters:

       **ds** : xarray Dataset
           The dataset with the dimension to round

       **dim** : str
           The name of the dimension to round


   ..
       !! processed by numpydoc !!

.. py:function:: round_to_start_of_month(ds, dim)

   
   Return provided array with specified time dimension rounded to the start of
   the month


   :Parameters:

       **ds** : xarray Dataset
           The dataset with the dimension to round

       **dim** : str
           The name of the dimension to round


   ..
       !! processed by numpydoc !!

.. py:function:: coarsen(ds, window_size, start_points=None, dim='time')

   
   Coarsen data, applying 'max' to all relevant coords and optionally starting
   at a particular time point in the array


   :Parameters:

       **ds** : xarray Dataset
           The dataset to coarsen

       **start_points** : list, optional
           Value(s) of coordinate `dim` to start the coarsening from. If these fall
           outside the range of the coordinate, coarsening starts at the beginning
           of the array

       **dim** : str, optional
           The name of the dimension to coarsen along


   ..
       !! processed by numpydoc !!

.. py:function:: rolling_mean(ds, window_size, start_points=None, dim='time')

   
   Apply a rolling mean to the data, applying 'max' to all relevant coords and
   optionally starting at a particular time point in the array


   :Parameters:

       **ds** : xarray Dataset
           The dataset to apply the rolling mean to

       **start_points** : str or list of str, optional
           Value(s) of coordinate `dim` to start the rolling mean from. If these fall
           outside the range of the coordinate, the rolling mean starts at the beginning
           of the array

       **dim** : str, optional
           The name of the dimension to apply the rolling mean along


   ..
       !! processed by numpydoc !!

.. py:function:: resample(ds, freq, start_points=None, min_samples=None, dim='time')

   
   Resample data to a different temporal frequency by taking the mean
   over all values at the downsampled frequency and optionally starting
   at a particular time point in the array


   :Parameters:

       **ds** : xarray Dataset
           The dataset to resample

       **freq** : str
           Resample frequency expressed using pandas offset alias

       **start_points** : str or list of str, optional
           Value(s) of coordinate `dim` to start the resampling from. If these fall
           outside the range of the coordinate, resampling starts at the beginning
           of the array

       **min_samples** : int, optional
           The minimum number of samples that must occur within a resampled group. If
           there are fewer samples, NaN is assigned.

       **dim** : str, optional
           The name of the time dimension to resample along


   ..
       !! processed by numpydoc !!

.. py:function:: get_region_masks_from_shp(ds, shapefile, header)

   
   Extract region masks according to a shapefile


   :Parameters:

       **ds** : xarray Dataset
           The array with the grid to build the masks for

       **shapefile** : str
           The path to the shapefile to use

       **header** : str
           Name of the shapefile column to use to name the regions


   ..
       !! processed by numpydoc !!

.. py:function:: average_over_NRM_super_clusters(ds)

   
   Average the provided array over the NRM super cluster regions


   :Parameters:

       **ds** : xarray Dataset
           The array to average over the NRM super cluster regions


   ..
       !! processed by numpydoc !!

.. py:function:: mask_CAFEf6_reduced_dt(ds)

   
   Mask out the ensemble members of CAFE-f6 that were run with a reduced timestep
   since reducing the timestep was found to produce a different model equilibrium


   :Parameters:

       **ds** : xarray Dataset
           The CAFE-f6 data to mask


   ..
       !! processed by numpydoc !!

.. py:function:: gridarea_cdo(ds)

   
   Returns the area weights computed using cdo's gridarea function.
   Note that this function writes ds to disk, so strip ds back to only what is needed.


   :Parameters:

       **ds** : xarray Dataset
           The dataset to pass to cdo gridarea


   ..
       !! processed by numpydoc !!

.. py:function:: add_area_using_cdo_gridarea(ds, lon_dim='lon', lat_dim='lat')

   
   Add an area coordinate to the provided dataset containing the cell areas
   estimated by cdo's gridarea function


   :Parameters:

       **ds** : xarray Dataset
           The data to use to estimate the cell areas

       **lon_dim** : str, optional
           The name of the longitude dimension on ds

       **lat_dim** : str, optional
           The name of the latitude dimension on ds


   ..
       !! processed by numpydoc !!

.. py:function:: max_chunk_size_MB(ds)

   
   Get the max chunk size in a dataset


   ..
       !! processed by numpydoc !!