rwgen.RainfallModel.preprocess
- RainfallModel.preprocess(calculation_period='full_record', completeness_threshold=0.0, outlier_method=None, maximum_relative_difference=2.0, maximum_alterations=5, amax_durations=None, amax_window_type='sliding', output_filenames='default', use_pooling=True, dayfirst=False)
Prepare reference statistics, weights and scale factors for use in model fitting, simulation and evaluation.
Updates the self.reference_statistics and self.phi attributes and writes a reference_statistics output file.
Parameters:
calculation_period (str or list of int) – Start year and end year of the calculation period, given as a list. If the string 'full_record' is passed (default), all available data will be used.
completeness_threshold (float) – Percentage completeness for a month or season to be included in statistics calculations. Default is 0.0, i.e. any completeness (or missing data) percentage is acceptable.
outlier_method (str) – Flag indicating which (if any) method should be used to reduce the influence of outliers. Options are None (default), 'trim' (remove outliers) or 'clip' (Winsorise). See Notes.
maximum_relative_difference (float) – Maximum relative difference to allow between the two largest values in a timeseries. Used only if outlier_method is not None.
maximum_alterations (int) – Maximum number of trimming or clipping alterations permitted. Used only if outlier_method is not None.
amax_durations (int or list of int) – Durations (in hours) for which annual maxima (AMAX) should be identified (default is None).
amax_window_type (str) – Use a 'sliding' (default) or 'fixed' window in AMAX extraction.
output_filenames (str or dict) – Either key/value pairs indicating output file names, 'default' to use {'statistics': 'reference_statistics.csv', 'amax': 'reference_amax.csv'}, or None to indicate that no output files should be written.
use_pooling (bool) – Indicates whether to pool (scaled) point series for calculating statistics for a spatial model. If True (default), cross-correlations are also “averaged” for a set of separation distance bins.
dayfirst (bool) – Set to True if dates are given day first, i.e. dd/mm/yyyy [hh:mm]. Default False (i.e. yyyy-mm-dd hh:mm).
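A call using several of these parameters might look like the sketch below. It assumes model is an already constructed RainfallModel (the constructor is not covered in this section). The helper name preprocess_with_clipping, the years, the threshold and the output file names are illustrative assumptions only; the keyword names and option strings are taken from the signature above.

```python
def preprocess_with_clipping(model):
    """Call preprocess() on an existing RainfallModel-like object.

    `model` is assumed to expose the preprocess() signature documented
    above. All argument values here are illustrative, not recommendations.
    """
    model.preprocess(
        calculation_period=[1981, 2010],   # [start year, end year]
        completeness_threshold=80.0,       # percentage completeness
        outlier_method='clip',             # Winsorise rather than remove
        maximum_relative_difference=2.0,   # only used because outlier_method is set
        maximum_alterations=5,             # only used because outlier_method is set
        amax_durations=[1, 24],            # durations in hours
        amax_window_type='sliding',
        output_filenames={                 # hypothetical file names
            'statistics': 'reference_statistics_1981_2010.csv',
            'amax': 'reference_amax_1981_2010.csv',
        },
    )
```

Parameters left out of the call (here use_pooling and dayfirst) keep the defaults shown in the signature.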
Notes
Currently .csv files are used for time series inputs. These files are expected to contain a DateTime column using dd/mm/yyyy hh:mm format ('%d/%m/%Y %H:%M') or yyyy-mm-dd hh:mm ('%Y-%m-%d %H:%M'). They should also contain a Value column using units of mm/timestep.
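To check an input file against this layout before passing it to the model, the two DateTime formats can be parsed with the Python standard library. This is a minimal sketch, not rwgen's own reader: read_timeseries is a hypothetical helper, and its dayfirst argument simply selects between the two format strings quoted above.

```python
import csv
import io
from datetime import datetime

# The two format strings quoted in the Notes.
DAYFIRST_FORMAT = '%d/%m/%Y %H:%M'
YEARFIRST_FORMAT = '%Y-%m-%d %H:%M'


def read_timeseries(handle, dayfirst=False):
    """Return a list of (datetime, value) pairs from a DateTime/Value csv.

    Hypothetical helper for illustration; raises ValueError if a DateTime
    entry does not match the selected format or a Value is not numeric.
    """
    fmt = DAYFIRST_FORMAT if dayfirst else YEARFIRST_FORMAT
    reader = csv.DictReader(handle)
    return [
        (datetime.strptime(row['DateTime'], fmt), float(row['Value']))
        for row in reader
    ]


# Small in-memory example with day-first dates and values in mm/timestep.
sample = io.StringIO(
    "DateTime,Value\n"
    "01/02/2000 00:00,0.0\n"
    "01/02/2000 01:00,1.2\n"
)
records = read_timeseries(sample, dayfirst=True)
```

Parsing with an explicit format string makes a mismatch between the file and the expected layout fail immediately rather than silently swapping day and month.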