rwgen.RainfallModel.preprocess
- RainfallModel.preprocess(calculation_period='full_record', completeness_threshold=0.0, outlier_method=None, maximum_relative_difference=2.0, maximum_alterations=5, amax_durations=None, amax_window_type='sliding', output_filenames='default', use_pooling=True, dayfirst=False)
Prepare reference statistics, weights and scale factors for use in model fitting, simulation and evaluation.
Updates the self.reference_statistics and self.phi attributes and writes a reference_statistics output file.
Parameters:
calculation_period (str or list of int) – Start year and end year of the calculation period as a list. If the string 'full_record' is passed (default), all available data will be used.
completeness_threshold (float) – Percentage completeness required for a month or season to be included in statistics calculations. Default is 0.0, i.e. any completeness (or missing data) percentage is acceptable.
outlier_method (str) – Flag indicating which (if any) method should be used to reduce the influence of outliers. Options are None (default), 'trim' (remove outliers) or 'clip' (Winsorise). See Notes.
maximum_relative_difference (float) – Maximum relative difference to allow between the two largest values in a timeseries. Used only if outlier_method is not None.
maximum_alterations (int) – Maximum number of trimming or clipping alterations permitted. Used only if outlier_method is not None.
amax_durations (int or list of int) – Durations (in hours) for which annual maxima (AMAX) should be identified (default is None).
amax_window_type (str) – Use a 'sliding' (default) or 'fixed' window in AMAX extraction.
output_filenames (str or dict) – Either key/value pairs indicating output file names, 'default' to use {'statistics': 'reference_statistics.csv', 'amax': 'reference_amax.csv'} or None to indicate that no output files should be written.
use_pooling (bool) – Indicates whether to pool (scaled) point series for calculating statistics for a spatial model. If True (default), cross-correlations are also “averaged” for a set of separation distance bins.
dayfirst (bool) – Set to True if input dates are written day first (dd/mm/yyyy hh:mm). Default False (i.e. yyyy-mm-dd hh:mm).
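The difference between a 'sliding' and a 'fixed' AMAX window can be illustrated with a small standalone sketch. This is not rwgen's implementation: the helper name annual_maxima is hypothetical, the series is assumed to be hourly and gap-free, fixed windows are assumed to be non-overlapping blocks counted from the start of the series, and each window is assigned to the year of its first timestep.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def annual_maxima(times, values, duration, window_type="sliding"):
    """Return {year: largest depth accumulated over `duration` consecutive steps}.

    Hypothetical helper: assumes an hourly, gap-free series, so `duration`
    timesteps correspond to `duration` hours.
    """
    if window_type == "sliding":
        step = 1  # the window start advances one timestep at a time
    elif window_type == "fixed":
        step = duration  # non-overlapping blocks from the start of the series
    else:
        raise ValueError("window_type must be 'sliding' or 'fixed'")

    amax = defaultdict(float)  # rainfall depths are non-negative, so 0.0 is a safe floor
    for start in range(0, len(values) - duration + 1, step):
        total = sum(values[start:start + duration])
        year = times[start].year  # assumption: window belongs to year of its first step
        amax[year] = max(amax[year], total)
    return dict(amax)


first = datetime(2000, 1, 1)
times = [first + timedelta(hours=i) for i in range(6)]
values = [0.0, 4.0, 3.0, 1.0, 0.0, 0.5]  # mm per hourly timestep

sliding = annual_maxima(times, values, duration=2, window_type="sliding")
fixed = annual_maxima(times, values, duration=2, window_type="fixed")
print(sliding, fixed)
```

Here the wettest two hours (4.0 + 3.0 mm) straddle a block boundary, so the sliding window captures 7.0 mm while the fixed window reports only 4.0 mm. Fixed-window maxima can never exceed sliding-window maxima for the same series and duration.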
Notes
Currently .csv files are used for time series inputs. These files are expected to contain a DateTime column using dd/mm/yyyy hh:mm format ('%d/%m/%Y %H:%M') or yyyy-mm-dd hh:mm ('%Y-%m-%d %H:%M'). They should also contain a Value column using units of mm/timestep.
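As a rough check that an input file matches the expected layout, the two format strings above can be tried with the standard library before calling preprocess. The sketch below is illustrative only: parse_timestamp is a hypothetical helper, the sample rows are made up, and the way the flag selects a format is an assumption about how dayfirst relates to the two layouts rather than a description of rwgen's parser.

```python
import csv
import io
from datetime import datetime

DAY_FIRST_FORMAT = "%d/%m/%Y %H:%M"   # dd/mm/yyyy hh:mm
YEAR_FIRST_FORMAT = "%Y-%m-%d %H:%M"  # yyyy-mm-dd hh:mm


def parse_timestamp(text, dayfirst=False):
    """Parse one DateTime cell using one of the two documented layouts."""
    fmt = DAY_FIRST_FORMAT if dayfirst else YEAR_FIRST_FORMAT
    return datetime.strptime(text, fmt)


# Made-up sample with the documented DateTime and Value columns (mm/timestep).
sample = (
    "DateTime,Value\n"
    "01/02/2000 00:00,0.2\n"
    "01/02/2000 01:00,1.4\n"
)

rows = [
    (parse_timestamp(record["DateTime"], dayfirst=True), float(record["Value"]))
    for record in csv.DictReader(io.StringIO(sample))
]

same_instant = parse_timestamp("2000-02-01 00:00")
print(rows[0][0] == same_instant)
```

With day-first parsing, 01/02/2000 is read as 1 February 2000. A month-first reading would silently give 2 January, which is why the flag and the file layout need to agree.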