caeli package¶
Submodules¶
caeli.distributions module¶
-
caeli.distributions.
pwm
(x, n=4)¶ Return a list with the first n probability weighted moments (\(b_r\)) of the sample x, where \(x_{(i)}\) denotes the i-th smallest value of x.
\[b_r = \frac{\sum_{i=1}^{n_s} x_{(i)} {i - 1 \choose r}}{n_s {n_s - 1\choose r}}\]where:
\(n_s\) — size of the sample x
See, for example, Diana Bilkova (2014) (eq. 17)
- Parameters
x (list or numpy.array) – sample values
n (int) – number of returned probability weighted moments (\(b_r\))
- Returns
(list) probability weighted moments (\(b_r\))
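The estimator above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not caeli's implementation: the name `pwm_sketch` is hypothetical, the sample is sorted in ascending order first, and the loop index is 0-based, so `comb(i, r)` plays the role of the binomial weight for the value with 1-based rank `i + 1`.

```python
from math import comb


def pwm_sketch(x, n=4):
    """Sample probability weighted moments b_0 .. b_{n-1} (hypothetical helper)."""
    xs = sorted(x)   # order statistics, ascending
    ns = len(xs)     # sample size; must be at least n to avoid a zero denominator
    moments = []
    for r in range(n):
        denom = ns * comb(ns - 1, r)
        # enumerate() counts from 0, so comb(i, r) is the weight of rank i + 1
        total = sum(value * comb(i, r) for i, value in enumerate(xs))
        moments.append(total / denom)
    return moments


b = pwm_sketch([2.0, 1.0, 4.0, 3.0], n=2)
print(b)  # b[0] is the sample mean
```

Note that `b[0]` always equals the arithmetic mean of the sample, which is a quick sanity check for any implementation.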
-
caeli.distributions.
lmoments
(x, n=4, ratio=True, lcv=False)¶ Return a list with the first n L-moments of the sample x.
\[\lambda_{r + 1} = \sum_{k=0}^{r} (-1)^{r - k} {r \choose k} {r + k \choose k} b_k\]with:
\(0 \leq r \leq n - 1\)
where:
\(b_k\) – probability weighted moments (see pwm())
See, for example, Diana Bilkova (2014) (eq. 26)
If ratio is True, replace \(\lambda_r\) with \(\lambda_r/\lambda_2\) for \(r \geq 3\), where \(\lambda_3/\lambda_2\) is the L-skewness and \(\lambda_4/\lambda_2\) is the L-kurtosis.
If lcv is True, replace \(\lambda_2\) with the coefficient of L-variation \(\lambda_2/\lambda_1\). For a non-negative random variable, this lies in the interval (0,1) and is identical to the Gini coefficient (see https://en.wikipedia.org/wiki/L-moment).
- Parameters
x (list or numpy.array) – sample values
n (int) – number of returned L-moments (\(\lambda_r\))
ratio (bool) – if True, replace \(\lambda_r\) with \(\lambda_r/\lambda_2\) for \(r \geq 3\). Default \(ratio = True\)
lcv (bool) – if True, replace \(\lambda_2\) with \(\lambda_2/\lambda_1\)
- Returns
(list) L-moments of the sample x
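The linear combination of probability weighted moments, together with the ratio and lcv options, might look as follows. This is a minimal sketch, assuming a 0-based PWM helper; `lmoments_sketch` and `_pwm` are hypothetical names and the code is not taken from caeli.

```python
from math import comb


def _pwm(x, n):
    """Probability weighted moments of the ascending-sorted sample (hypothetical helper)."""
    xs = sorted(x)
    ns = len(xs)
    return [sum(v * comb(i, r) for i, v in enumerate(xs)) / (ns * comb(ns - 1, r))
            for r in range(n)]


def lmoments_sketch(x, n=4, ratio=True, lcv=False):
    """First n sample L-moments, optionally as L-moment ratios."""
    b = _pwm(x, n)
    lam = []
    for r in range(n):
        # lambda_{r+1} = sum_k (-1)^(r-k) C(r, k) C(r+k, k) b_k
        lam.append(sum((-1) ** (r - k) * comb(r, k) * comb(r + k, k) * b[k]
                       for k in range(r + 1)))
    if n >= 2:
        l1, l2 = lam[0], lam[1]
        if ratio:
            # L-skewness, L-kurtosis, ... relative to lambda_2
            lam[2:] = [value / l2 for value in lam[2:]]
        if lcv:
            # coefficient of L-variation
            lam[1] = l2 / l1
    return lam


lm = lmoments_sketch([1.0, 2.0, 3.0, 4.0])
print(lm)
```

For the symmetric sample used here the L-skewness is zero up to rounding, as expected.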
-
caeli.distributions.
lmoments_parameter_estimation_generalized_logistic
(lambda1, lambda2, tau)¶ Return the location, scale and shape of the generalized logistic distribution.
Based on SUBROUTINE PELGLO of the LMOMENTS Fortran package version 3.04, July 2005
- Parameters
lambda1 – L-moment-1
lambda2 – L-moment-2
tau – L-moment-3 / L-moment-2
- Returns
(float) location, scale and shape
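As an illustration of this kind of estimation, the sketch below uses the L-moment relations for the generalized logistic distribution as published by Hosking and Wallis: shape \(\kappa = -\tau_3\), scale \(\alpha = \lambda_2 \sin(\kappa\pi)/(\kappa\pi)\), and location \(\xi = \lambda_1 - \alpha(1/\kappa - \pi/\sin(\kappa\pi))\). The function name `glo_from_lmoments` and the threshold `small` are assumptions for this example; caeli's own code may differ in details.

```python
import math


def glo_from_lmoments(lambda1, lambda2, tau, small=1e-6):
    """Location, scale and shape of a generalized logistic fit (hypothetical helper)."""
    shape = -tau
    if lambda2 <= 0.0 or abs(shape) >= 1.0:
        raise ValueError("L-moments are not valid for this distribution")
    if abs(shape) <= small:
        # Limiting case: the ordinary logistic distribution
        return lambda1, lambda2, 0.0
    gg = shape * math.pi / math.sin(shape * math.pi)
    scale = lambda2 / gg
    loc = lambda1 - scale * (1.0 - gg) / shape
    return loc, scale, shape


loc, scale, shape = glo_from_lmoments(10.0, 2.0, -0.5)
print(loc, scale, shape)
```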
-
caeli.distributions.
lmoments_parameter_estimation_gamma
(lambda1, lambda2)¶ Return the location and scale of the gamma distribution.
Based on SUBROUTINE PELGAM of the LMOMENTS Fortran package version 3.04, July 2005
- Parameters
lambda1 (float) – L-moment-1 (\(\lambda_1\))
lambda2 – L-moment-2 (\(\lambda_2\))
- Returns
(float) location and scale
-
caeli.distributions.
genloglogistic_cdf
(x, loc, scale, shape)¶ Return the cumulative distribution function of the generalized logistic distribution
Based on SUBROUTINE CDFGLO of the LMOMENTS Fortran package version 3.04, July 2005
- Parameters
x (numpy.array) – sample values
loc (float) – location parameter (\(\mu\))
scale (float) – scale parameter (\(\sigma\) > 0)
shape (float) – shape parameter (\(\kappa\))
- Returns
(numpy.array) cdf
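A minimal sketch of such a cumulative distribution function, written with the standard library only so it runs without numpy. It assumes the Hosking parameterization \(F(x) = 1/(1 + e^{-y})\) with \(y = -\ln(1 - \kappa (x - \mu)/\sigma)/\kappa\) for \(\kappa \neq 0\) and \(y = (x - \mu)/\sigma\) otherwise. The name `glo_cdf_sketch` is hypothetical, and the real function works on numpy arrays rather than lists.

```python
import math


def glo_cdf_sketch(x, loc, scale, shape):
    """Generalized logistic CDF for each value in x (hypothetical helper)."""
    result = []
    for value in x:
        y = (value - loc) / scale
        if shape != 0.0:
            arg = 1.0 - shape * y
            if arg <= 0.0:
                # Outside the support: below it for shape < 0, above it for shape > 0
                result.append(0.0 if shape < 0.0 else 1.0)
                continue
            y = -math.log(arg) / shape
        # 1 / (1 + exp(-y)) written with tanh to avoid overflow for large |y|
        result.append(0.5 * (1.0 + math.tanh(0.5 * y)))
    return result


cdf = glo_cdf_sketch([-1.0, 0.0, 1.0, 3.0], loc=0.0, scale=1.0, shape=0.5)
print(cdf)
```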
caeli.drought_indices module¶
-
caeli.drought_indices.
spi
(precipitation)¶ - Parameters
precipitation (list or np.array) – precipitation values
- Returns
list of spi
- Return type
list
-
caeli.drought_indices.
spi_monthly
(sr, months=range(1, 13), aggregation=1, start_at=None, closed_left=True, closed_right=False, label='left', is_sorted=False, prefix='P', min_years=20)¶ - Parameters
sr – Series with precipitation depth
months – list of months, e.g. [11, 12, 1, 2, 3]
aggregation – number of months to aggregate: 1, 2, 3, 4, or 6. Default: aggregation=1
start_at –
closed_left –
closed_right –
label –
is_sorted –
prefix –
min_years –
- Returns
pandas.DataFrame with precipitations ‘Prec’ and drought indices ‘Spi’ for each year and month
-
caeli.drought_indices.
spei
(values)¶ Calculate SPEI from given values.
Values are the differences between precipitation and potential evapotranspiration.
For example if you want to calculate spei from January in the
- Parameters
values (list, numpy array) – list or numpy array of values
- Returns
- Return type
-
caeli.drought_indices.
spei_monthly
(sr, months=range(1, 13), aggregation=1, start_at=None, closed_left=True, closed_right=False, label='left', is_sorted=False, prefix='P', min_years=20)¶ - Parameters
sr (pandas series) – Series with precipitation depth minus potential evapotranspiration (\(P - ETo\)) as values and pandas Timestamp as index. The series frequencies can be, for example, minutely, hourly, daily, or monthly.
months (list) – list of months, e.g. [11, 12, 1, 2, 3]
aggregation – number of months to aggregate: 1, 2, 3, 4, or 6. Default: aggregation=1
start_at –
closed_left –
closed_right –
label –
is_sorted –
prefix –
min_years –
- Returns
caeli.time_series module¶
-
caeli.time_series.
replace_year
(dt, year)¶ Replace the year in dt by year. If dt is the last day of its month, the result is also the last day of that month, which matters for February in leap years.
- Parameters
dt –
year –
- Returns
- Return type
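The last-day rule can be illustrated with the standard library. This is a sketch under assumptions: `replace_year_sketch` is a hypothetical name, it works on `datetime.datetime` objects only, and clamping a day that does not exist in the target year is a choice made for this example.

```python
import calendar
from datetime import datetime


def replace_year_sketch(dt, year):
    """Replace the year, keeping 'last day of the month' semantics (hypothetical helper)."""
    last_old = calendar.monthrange(dt.year, dt.month)[1]
    last_new = calendar.monthrange(year, dt.month)[1]
    if dt.day == last_old:
        day = last_new               # stay on the last day of the month
    else:
        day = min(dt.day, last_new)  # otherwise keep the day, clamped if needed
    return dt.replace(year=year, day=day)


print(replace_year_sketch(datetime(2019, 2, 28), 2020))
print(replace_year_sketch(datetime(2020, 2, 29), 2019))
```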
-
caeli.time_series.
is_leap_day
(dt)¶ Check whether dt is 29 February (the leap day).
- Parameters
dt (datetime, pd.Timestamp, np.datetime64) – datetime
- Returns
True/False
- Return type
bool
-
caeli.time_series.
last_day_of_month
(dt)¶
-
caeli.time_series.
is_last_day_of_month
(dt)¶ Check whether the day in dt is the last day of the month.
- Parameters
dt (datetime, pd.Timestamp, np.datetime64) – datetime
- Returns
True/False
- Return type
bool
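Both calendar checks reduce to a comparison against the month length. The sketch below assumes objects with `year`, `month` and `day` attributes (`datetime.datetime` or `pd.Timestamp`; a `np.datetime64` would need converting first), and the function names are hypothetical.

```python
import calendar
from datetime import datetime


def is_leap_day_sketch(dt):
    """True if dt falls on 29 February."""
    return dt.month == 2 and dt.day == 29


def is_last_day_of_month_sketch(dt):
    """True if dt falls on the last day of its month."""
    # monthrange returns (weekday of the first day, number of days in the month)
    return dt.day == calendar.monthrange(dt.year, dt.month)[1]


print(is_leap_day_sketch(datetime(2020, 2, 29)))
print(is_last_day_of_month_sketch(datetime(2019, 2, 28)))
```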
-
caeli.time_series.
increment_months
(dt, months=1, microseconds=0)¶ Increment dt by months. The default is to increment by one month. Return a pd.Timestamp.
- Parameters
dt (datetime, pd.Timestamp, np.datetime64) – timestamp
months (int) – number of months to increment. Negative values are allowed. Default months = 1
microseconds (int) – microseconds to add to the right end of the interval: 0 for a closed interval, -1 for a right-open interval
- Returns
dt incremented by months
- Return type
pd.Timestamp
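The month arithmetic and the microsecond offset can be sketched with the standard library. This is an illustration only: `increment_months_sketch` is a hypothetical name, it returns a `datetime.datetime` rather than a `pd.Timestamp`, and clamping the day to the length of the target month is an assumption about how overflow is handled.

```python
import calendar
from datetime import datetime, timedelta


def increment_months_sketch(dt, months=1, microseconds=0):
    """Shift dt by a number of months, then add a microsecond offset."""
    # Count months from year 0 so that negative increments work with divmod
    total = dt.year * 12 + (dt.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day, e.g. 31 January + 1 month -> 28 or 29 February
    day = min(dt.day, calendar.monthrange(year, month)[1])
    shifted = dt.replace(year=year, month=month, day=day)
    return shifted + timedelta(microseconds=microseconds)


# Right-open end of a one-month interval starting on 1990-01-01 07:30
print(increment_months_sketch(datetime(1990, 1, 1, 7, 30), 1, -1))
```

With `microseconds=-1` the result is the last representable instant before the next interval starts, which is the form of the right-open interval ends shown in the monthly_intervals examples.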
-
caeli.time_series.
monthly_intervals
(indices, months=None, aggregation=1, start_at='beg', closed_left=True, closed_right=True)¶ Return a list of tuples [from, to], where the intervals correspond to the beginning and end of aggregated months (the default aggregation=1 means monthly intervals). The aggregation may also be negative.
- Parameters
indices (pd.DatetimeIndex, list) – sorted list of timestamps
months (None or list) – output months for the intervals
aggregation (int) – number of aggregated months. Default 1 (monthly)
start_at (datetime.datetime, str) – date and time to start. Only the day and time are used; the year and month are placeholders and are discarded. start_at='end' for the end of the first month in the time series. start_at='beg' for the first day of the month at 00:00:00. start_at=None is equivalent to start_at='beg'
closed_left (bool) – if True, the interval is closed on the left
closed_right (bool) – if True, the interval is closed on the right
- Returns
list of intervals [[begin0, end0], [begin1, end1], …, [beginN, endN]]
- Return type
list of [pd.Timestamp, pd.Timestamp]
For the examples below the following indices will be used:
>>> import numpy as np
>>> import pandas as pd
>>> from caeli.time_series import monthly_intervals
>>> index = pd.date_range('1990-01-01 07:30', '2020-01-01 07:30', freq='1d')
>>> index
DatetimeIndex(['1990-01-01 07:30:00', '1990-01-02 07:30:00',
               '1990-01-03 07:30:00', '1990-01-04 07:30:00',
               '1990-01-05 07:30:00', '1990-01-06 07:30:00',
               '1990-01-07 07:30:00', '1990-01-08 07:30:00',
               '1990-01-09 07:30:00', '1990-01-10 07:30:00',
               ...
               '2019-12-23 07:30:00', '2019-12-24 07:30:00',
               '2019-12-25 07:30:00', '2019-12-26 07:30:00',
               '2019-12-27 07:30:00', '2019-12-28 07:30:00',
               '2019-12-29 07:30:00', '2019-12-30 07:30:00',
               '2019-12-31 07:30:00', '2020-01-01 07:30:00'],
              dtype='datetime64[ns]', length=10958, freq='D')
Examples:
Using default values. Note that the time series starts at 07:30 but as per default the month starts at 00:00. Therefore, the first month is ignored.
>>> itv = monthly_intervals(index, months=None, aggregation=1, start_at='beg', closed_left=True, closed_right=True)
>>> print('{}, ..., {}'.format(itv[0], itv[-1]))
[Timestamp('1990-02-01 00:00:00'), Timestamp('1990-03-01 00:00:00')], ..., [Timestamp('2019-12-01 00:00:00'), Timestamp('2020-01-01 00:00:00')]
Setting start_at='1999-01-01 07:30'. The YYYY-MM part ('1999-01') is a placeholder.
>>> itv = monthly_intervals(index, months=None, aggregation=1, start_at='1999-01-01 07:30', closed_left=True, closed_right=True)
>>> print('{}, ..., {}'.format(itv[0], itv[-1]))
[Timestamp('1990-01-01 07:30:00'), Timestamp('1990-02-01 07:30:00')], ..., [Timestamp('2019-12-01 07:30:00'), Timestamp('2020-01-01 07:30:00')]
closed_right=False.
>>> itv = monthly_intervals(index, months=None, aggregation=1, start_at='1999-01-01 07:30', closed_left=True, closed_right=False)
>>> print('{}, ..., {}'.format(itv[0], itv[-1]))
[Timestamp('1990-01-01 07:30:00'), Timestamp('1990-02-01 07:29:59.999999')], ..., [Timestamp('2019-12-01 07:30:00'), Timestamp('2020-01-01 07:29:59.999999')]
aggregation=2.
>>> itv = monthly_intervals(index, months=None, aggregation=2, start_at='1999-01-01 07:30', closed_left=True, closed_right=False)
>>> print('{}, ..., {}'.format(itv[0], itv[-1]))
[Timestamp('1990-01-01 07:30:00'), Timestamp('1990-03-01 07:29:59.999999')], ..., [Timestamp('2019-11-01 07:30:00'), Timestamp('2020-01-01 07:29:59.999999')]
months=[1, 4, 7, 10].
>>> itv = monthly_intervals(index, months=[1, 4, 7, 10], aggregation=3, start_at='1999-01-01 07:30', closed_left=True, closed_right=False)
>>> itv[:5]
[[Timestamp('1990-01-01 07:30:00'), Timestamp('1990-04-01 07:29:59.999999')],
 [Timestamp('1990-04-01 07:30:00'), Timestamp('1990-07-01 07:29:59.999999')],
 [Timestamp('1990-07-01 07:30:00'), Timestamp('1990-10-01 07:29:59.999999')],
 [Timestamp('1990-10-01 07:30:00'), Timestamp('1991-01-01 07:29:59.999999')],
 [Timestamp('1991-01-01 07:30:00'), Timestamp('1991-04-01 07:29:59.999999')]]
Negative aggregation (aggregation=-3). Note that the first aggregation [1989-12-31 07:30:00, 1990-02-01 07:29:59.999999] is ignored because the time series starts at 1990-01-01 07:30:00.
>>> itv = monthly_intervals(index, months=[2, 5, 8, 11], aggregation=-3, start_at='1999-01-01 07:30', closed_left=True, closed_right=False)
>>> itv[:5]
[[Timestamp('1990-02-01 07:30:00'), Timestamp('1990-05-01 07:29:59.999999')],
 [Timestamp('1990-05-01 07:30:00'), Timestamp('1990-08-01 07:29:59.999999')],
 [Timestamp('1990-08-01 07:30:00'), Timestamp('1990-11-01 07:29:59.999999')],
 [Timestamp('1990-11-01 07:30:00'), Timestamp('1991-02-01 07:29:59.999999')],
 [Timestamp('1991-02-01 07:30:00'), Timestamp('1991-05-01 07:29:59.999999')]]
-
caeli.time_series.
monthly_series
(sr, rule='sum', months=None, aggregation=1, start_at=None, closed_left=True, closed_right=False, label='right', is_sorted=False, time_format='d')¶ Return the series resampled to the months listed in months, taking aggregation adjacent months. The default resampling rule is sum.
- Parameters
sr (pandas.Series, pandas.DataFrame) – pandas.Series with DateTimeIndex as index. The series at any frequency will be aggregated to month(s)
rule (str) – resample rule. Default rule='sum'
months – see monthly_intervals()
aggregation – see monthly_intervals()
start_at – see monthly_intervals()
closed_left – see monthly_intervals()
closed_right – see monthly_intervals()
label (str) – 'right' to set the index at the end and 'left' to set the index at the beginning of the interval in the time series. Default label='right'
is_sorted (bool) – True if the input time series is already sorted, otherwise False. Default is_sorted=False
time_format (str, None) – 'd' (day/date): round hour, minute, second, and milliseconds to 0; 'h' (hour): round minute, second, and milliseconds to 0; 'm' (minute): round second and milliseconds to 0; 's' (second): round milliseconds to 0; None: do not round anything
- Returns
(pandas.DataFrame, pandas.Series): monthly time series
For the examples below the following time series will be used:
>>> import numpy as np
>>> import pandas as pd
>>> from caeli.time_series import monthly_series
>>> np.random.seed(1)
>>> index = pd.date_range('1990-01-01 07:30', '2020-01-01 07:30', freq='1d')
>>> p = np.random.normal(2, 0.1, size=len(index))
>>> p[p < 0.0] = 0.0
>>> sr_daily = pd.Series(p, index=index)
>>> sr_daily
1990-01-01 07:30:00    2.162435
1990-01-02 07:30:00    1.938824
                         ...
2019-12-31 07:30:00    1.937972
2020-01-01 07:30:00    2.081355
Freq: D, Length: 10958, dtype: float64
Right labeled, showing date only:
>>> sr_monthly = monthly_series(sr_daily, aggregation=2, start_at='1999-01-01 07:30')
>>> sr_monthly
1990-03-01    117.961372
1990-04-01    118.789945
                 ...
2019-12-01    122.096353
2020-01-01    123.361334
Length: 359, dtype: float64
Right labeled, showing the full date/time:
>>> sr_monthly = monthly_series(sr_daily, aggregation=2, start_at='1999-01-01 07:30', time_format=None)
>>> sr_monthly
1990-03-01 07:29:59.999999    117.961372
1990-04-01 07:29:59.999999    118.789945
                                 ...
2019-12-01 07:29:59.999999    122.096353
2020-01-01 07:29:59.999999    123.361334
Length: 359, dtype: float64
Left labeled, showing date only:
>>> sr_monthly = monthly_series(sr_daily, aggregation=2, start_at='1999-01-01 07:30', label='left')
>>> sr_monthly
1990-01-01    117.961372
1990-02-01    118.789945
                 ...
2019-10-01    122.096353
2019-11-01    123.361334
Length: 359, dtype: float64
Left labeled, showing the full date/time:
>>> sr_monthly = monthly_series(sr_daily, aggregation=2, start_at='1999-01-01 07:30', label='left', time_format=None)
>>> sr_monthly
1990-01-01 07:30:00    117.961372
1990-02-01 07:30:00    118.789945
                          ...
2019-10-01 07:30:00    122.096353
2019-11-01 07:30:00    123.361334
Length: 359, dtype: float64
-
caeli.time_series.
months_split_annually
(sr, rule='sum', months=None, aggregation=1, start_at=None, closed_left=True, closed_right=False, label='left', is_sorted=False, time_format='d', prefix='M')¶ Return a pandas.DataFrame with aggregated months as columns and year as index.
- Parameters
sr (pandas.Series or pandas.DataFrame) – pandas.Series with DateTimeIndex as index
rule – see monthly_series()
months – see monthly_intervals()
aggregation – see monthly_intervals()
start_at – see monthly_intervals()
closed_left – see monthly_intervals()
closed_right – see monthly_intervals()
label – see monthly_series()
is_sorted – see monthly_series()
time_format – see monthly_series()
prefix (str) – prefix for column names. Default prefix='M'
- Returns
(pandas.DataFrame) with aggregated months as columns and year as index
For the examples below the following time series will be used:
>>> import numpy as np
>>> import pandas as pd
>>> from caeli.time_series import months_split_annually
>>> np.random.seed(1)
>>> index = pd.date_range('1990-01-01 07:30', '2020-01-01 07:30', freq='1d')
>>> p = np.random.normal(2, 0.1, size=len(index))
>>> p[p < 0.0] = 0.0
>>> sr_daily = pd.Series(p, index=index)
>>> sr_daily
1990-01-01 07:30:00    2.162435
1990-01-02 07:30:00    1.938824
                         ...
2019-12-31 07:30:00    1.937972
2020-01-01 07:30:00    2.081355
Freq: D, Length: 10958, dtype: float64
>>> spy = months_split_annually(sr_daily, aggregation=2, start_at='1999-01-01 07:30')
>>> print(spy)
          M01-02      M02-03  ...      M11-12      M12-01
year                          ...
1990  117.961372  118.789945  ...  121.819112  123.028979
1991  117.760247  118.953375  ...  121.958717  123.601324
...          ...         ...  ...         ...         ...
2018  117.549323  117.780231  ...  121.336530  122.549497
2019  116.797959  117.721573  ...  123.361334         NaN

[30 rows x 12 columns]
-
caeli.time_series.
slice_by_timestamp
(df, beg_timestamp=Timestamp('1677-09-21 00:12:43.145225'), end_timestamp=Timestamp('2262-04-11 23:47:16.854775807'))¶ Slice the data frame from index starting at beg_timestamp to end_timestamp, including the latter.
- Parameters
df (pandas.DataFrame or pandas.Series) – data frame
beg_timestamp (datetime.datetime, pandas.Timestamp, or numpy.datetime64) – beginning of the slice (inclusive)
end_timestamp (datetime.datetime, pandas.Timestamp, or numpy.datetime64) – end of the slice (inclusive)
- Returns
(pandas.DataFrame or pandas.Series) sliced data frame
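Inclusive slicing of this kind corresponds to label-based slicing in pandas, where both end points of a `.loc` slice on a sorted DatetimeIndex are included. The sketch below is an illustration, not caeli's code: `slice_by_timestamp_sketch` is a hypothetical name, and using `pd.Timestamp.min` and `pd.Timestamp.max` as defaults is an assumption based on the default values shown in the signature.

```python
import pandas as pd


def slice_by_timestamp_sketch(df, beg_timestamp=pd.Timestamp.min,
                              end_timestamp=pd.Timestamp.max):
    """Return the rows between two timestamps, both ends included."""
    beg = pd.Timestamp(beg_timestamp)
    end = pd.Timestamp(end_timestamp)
    # .loc slicing by label on a sorted DatetimeIndex is inclusive on both sides
    return df.sort_index().loc[beg:end]


index = pd.date_range('1990-01-01 07:30', periods=5, freq='D')
sr = pd.Series(range(5), index=index)
part = slice_by_timestamp_sketch(sr, '1990-01-02 07:30', '1990-01-04 07:30')
print(part)
```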