rwgen.RainfallModel.preprocess

RainfallModel.preprocess(calculation_period='full_record', completeness_threshold=0.0, outlier_method=None, maximum_relative_difference=2.0, maximum_alterations=5, amax_durations=None, amax_window_type='sliding', output_filenames='default', use_pooling=True, dayfirst=False)

Prepare reference statistics, weights and scale factors for use in model fitting, simulation and evaluation.

Updates self.reference_statistics and self.phi attributes and writes a reference_statistics output file.

Parameters:
  • calculation_period (str or list of int) – Start year and end year of calculation period as list. If string 'full_record' is passed (default) then all available data will be used.

  • completeness_threshold (float) – Minimum percentage completeness required for a month or season to be included in statistics calculations. Default is 0.0, i.e. any level of completeness (any amount of missing data) is acceptable.

  • outlier_method (str) – Flag indicating which method (if any) should be used to reduce the influence of outliers. Options are None (default), 'trim' (remove outliers) or 'clip' (Winsorise). See Notes.

  • maximum_relative_difference (float) – Maximum relative difference to allow between the two largest values in a timeseries. Used only if outlier_method is not None.

  • maximum_alterations (int) – Maximum number of trimming or clipping alterations permitted. Used only if outlier_method is not None.

  • amax_durations (int or list of int) – Durations (in hours) for which annual maxima (AMAX) should be identified (default is None).

  • amax_window_type (str) – Use a 'sliding' (default) or 'fixed' window in AMAX extraction.

  • output_filenames (str or dict) – Either a dict of key/value pairs giving output file names, 'default' to use {'statistics': 'reference_statistics.csv', 'amax': 'reference_amax.csv'}, or None to indicate that no output files should be written.

  • use_pooling (bool) – Indicates whether to pool (scaled) point series for calculating statistics for a spatial model. If True (default), cross-correlations are also “averaged” for a set of separation distance bins.

  • dayfirst (bool) – Whether the DateTime values in the input files are day-first, i.e. dd/mm/yyyy hh:mm. Default False (i.e. yyyy-mm-dd hh:mm).
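
The difference between the 'sliding' and 'fixed' options of amax_window_type can be illustrated with a small standard-library sketch. This is not rwgen's implementation: the helper names window_totals and window_maximum are hypothetical, the series is assumed to be complete and hourly, and fixed windows are assumed to start at the first timestep. It only shows why a sliding window can give a larger maximum than a fixed one for the same duration.

```python
def window_totals(values, duration, window_type="sliding"):
    """Return rainfall totals over windows of `duration` timesteps.

    Hypothetical helper for illustration only (not part of rwgen).
    'sliding' moves the window one timestep at a time; 'fixed' uses
    non-overlapping blocks that start at the first timestep (an assumption).
    """
    if window_type == "sliding":
        step = 1
    elif window_type == "fixed":
        step = duration
    else:
        raise ValueError("window_type must be 'sliding' or 'fixed'")
    starts = range(0, len(values) - duration + 1, step)
    return [sum(values[i:i + duration]) for i in starts]


def window_maximum(values, duration, window_type="sliding"):
    """Return the largest window total, or None if the series is too short."""
    totals = window_totals(values, duration, window_type)
    return max(totals) if totals else None


# Six hourly values (mm/timestep) standing in for one year of data.
hourly = [0.0, 4.0, 5.0, 0.5, 0.0, 1.0]

# The 2-hour sliding window can straddle the two wettest hours.
sliding_max = window_maximum(hourly, 2, "sliding")

# The 2-hour fixed blocks split the two wettest hours between them.
fixed_max = window_maximum(hourly, 2, "fixed")

print(sliding_max, fixed_max)
```

In this toy series the two wettest hours fall in different fixed blocks, so the fixed-window maximum is smaller than the sliding-window maximum.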

Notes

Currently .csv files are used for time series inputs. These files are expected to contain a DateTime column using dd/mm/yyyy hh:mm format ('%d/%m/%Y %H:%M') or yyyy-mm-dd hh:mm format ('%Y-%m-%d %H:%M'). They should also contain a Value column using units of mm/timestep.
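
As a rough illustration of the expected file layout, the sketch below parses the two DateTime formats with the Python standard library. It does not show how rwgen itself reads files: the read_timeseries helper and the sample rows are hypothetical, and the mapping from dayfirst to a format string is inferred from the parameter description above.

```python
import csv
import io
from datetime import datetime

# Format strings quoted in the Notes, keyed by a dayfirst-style flag.
FORMATS = {
    True: "%d/%m/%Y %H:%M",   # dd/mm/yyyy hh:mm
    False: "%Y-%m-%d %H:%M",  # yyyy-mm-dd hh:mm
}


def read_timeseries(text, dayfirst=False):
    """Parse CSV text with DateTime and Value columns.

    Hypothetical helper for illustration only (not part of rwgen).
    Returns a list of (datetime, value in mm/timestep) tuples.
    """
    fmt = FORMATS[dayfirst]
    reader = csv.DictReader(io.StringIO(text))
    return [
        (datetime.strptime(row["DateTime"], fmt), float(row["Value"]))
        for row in reader
    ]


# The same two made-up records written in each supported format.
day_first_csv = "DateTime,Value\n01/02/2020 00:00,0.2\n01/02/2020 01:00,1.4\n"
iso_csv = "DateTime,Value\n2020-02-01 00:00,0.2\n2020-02-01 01:00,1.4\n"

records_dayfirst = read_timeseries(day_first_csv, dayfirst=True)
records_iso = read_timeseries(iso_csv, dayfirst=False)

# Both files describe 1 February 2020, not 2 January 2020.
print(records_dayfirst == records_iso)
```

Parsing a day-first file with the wrong format string raises a ValueError here, which is one reason to set the flag explicitly rather than rely on guessing.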