caeli package

Submodules

caeli.distributions module

caeli.distributions.pwm(x, n=4)

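Return a list with the first n probability weighted moments (\(b_r\)), computed from the sample values sorted in ascending order, \(x_{(1)} \leq \dots \leq x_{(n_s)}\).

\[b_r = \frac{\sum_{i=1}^{n_s} x_{(i)} {i - 1 \choose r}}{n_s {n_s - 1\choose r}}\]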

where:

\(n_s\) — size of the sample x

See, for example, Diana Bilkova (2014) (eq. 17)

Parameters

x (list or numpy.array) – sample values
n (int) – number of probability weighted moments (\(b_r\)) to return

Returns

(list) probability weighted moments (\(b_r\))
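
The estimator above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the caeli implementation: the name pwm_sketch is hypothetical, and it assumes the sample is sorted in ascending order before the weights are applied and that the sample holds more than n - 1 values.

```python
from math import comb


def pwm_sketch(x, n=4):
    """Hypothetical helper: first n sample probability weighted moments.

    Assumes len(x) > n - 1 so that the denominator is never zero.
    """
    xs = sorted(x)                      # order statistics x_(1) <= ... <= x_(ns)
    ns = len(xs)
    moments = []
    for r in range(n):
        # enumerate() yields the zero-based rank i - 1, so comb(rank, r)
        # is the binomial coefficient C(i - 1, r) of the formula.
        numerator = sum(comb(rank, r) * value for rank, value in enumerate(xs))
        moments.append(numerator / (ns * comb(ns - 1, r)))
    return moments


b = pwm_sketch([4.0, 1.0, 3.0, 2.0])
```

For r = 0 every weight equals 1, so \(b_0\) is the sample mean.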

caeli.distributions.lmoments(x, n=4, ratio=True, lcv=False)

Return a list with the first n L-moments of the sample x.

\[\lambda_{r + 1} = \sum_{k=0}^{r} (-1)^{r - k} {r \choose k} {r + k \choose k} b_k\]

with:

\(0 \leq r \leq n - 1\)

where:

\(b_k\) — first probability weighted moments (see pwm())

See, for example, Diana Bilkova (2014) (eq. 26)

If ratio is True, replace \(\lambda_r\) with \(\lambda_r/\lambda_2\) for \(r \geq 3\), where \(\lambda_3/\lambda_2\) is the L-skewness and \(\lambda_4/\lambda_2\) is the L-kurtosis.

If lcv is True, replace \(\lambda_2\) with the coefficient of L-variation \(\lambda_2/\lambda_1\). For a non-negative random variable, this lies in the interval (0,1) and is identical to the Gini coefficient (see https://en.wikipedia.org/wiki/L-moment).

Parameters

x (list or numpy.array) – sample values
n (int) – number of L-moments (\(\lambda_r\)) to return
ratio (bool) – if True, replace \(\lambda_r\) with \(\lambda_r/\lambda_2\) for \(r \geq 3\). Default \(ratio = True\)
lcv (bool) – if True, replace \(\lambda_2\) with \(\lambda_2/\lambda_1\). Default \(lcv = False\)

Returns

(list) L-moments of the sample x
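
The linear combination above, together with the ratio and lcv options, can be sketched as follows. The function name lmoments_from_pwm is hypothetical and takes the probability weighted moments \(b_0, b_1, \dots\) as input; it is a sketch of the formula as documented here, not the caeli source.

```python
from math import comb


def lmoments_from_pwm(b, ratio=True, lcv=False):
    """Hypothetical helper: L-moments from a list of PWMs b[0], b[1], ..."""
    lam = []
    for r in range(len(b)):
        # lambda_{r+1} = sum_k (-1)^(r-k) C(r, k) C(r+k, k) b_k
        lam.append(sum((-1) ** (r - k) * comb(r, k) * comb(r + k, k) * b[k]
                       for k in range(r + 1)))
    out = list(lam)
    if ratio:
        # lambda_3, lambda_4, ... become L-skewness, L-kurtosis, ...
        for i in range(2, len(lam)):
            out[i] = lam[i] / lam[1]
    if lcv and len(lam) >= 2:
        # lambda_2 becomes the coefficient of L-variation
        out[1] = lam[1] / lam[0]
    return out


# PWMs of the sample [1, 2, 3, 4]
lam = lmoments_from_pwm([2.5, 20.0 / 12.0, 1.25, 1.0])
```

For this symmetric sample the L-skewness is zero, as expected.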

caeli.distributions.lmoments_parameter_estimation_generalized_logistic(lambda1, lambda2, tau)

Return the location, scale and shape of the generalized logistic distribution.

Based on SUBROUTINE PELGLO of the LMOMENTS Fortran package version 3.04, July 2005

Parameters

lambda1 – L-moment-1
lambda2 – L-moment-2
tau – L-moment-3 / L-moment-2

Returns

(float) location, scale and shape
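
As a rough guide to what such an estimator computes, the sketch below follows the closed-form relations used by the PELGLO routine: shape \(\kappa = -\tau\), scale \(\lambda_2 \sin(\kappa\pi)/(\kappa\pi)\), and location \(\lambda_1 - \text{scale}\,(1/\kappa - \pi/\sin(\kappa\pi))\). The function name, the tolerance constant, and the error handling are assumptions made for illustration; the caeli implementation may differ.

```python
import math

SMALL = 1e-6  # assumed tolerance below which the shape is treated as zero


def glo_parameters_sketch(lambda1, lambda2, tau):
    """Hypothetical helper: (location, scale, shape) from L-moments."""
    if lambda2 <= 0.0 or abs(tau) >= 1.0:
        raise ValueError("invalid L-moments")
    shape = -tau
    if abs(shape) <= SMALL:
        # Limiting case: the ordinary logistic distribution
        return lambda1, lambda2, 0.0
    gai = shape * math.pi / math.sin(shape * math.pi)
    scale = lambda2 / gai
    location = lambda1 - scale * (1.0 - gai) / shape
    return location, scale, shape


loc, scale, shape = glo_parameters_sketch(0.0, 1.0, -0.5)
```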

caeli.distributions.lmoments_parameter_estimation_gamma(lambda1, lambda2)

Return the location and scale of the gamma distribution.

Based on SUBROUTINE PELGAM of the LMOMENTS Fortran package version 3.04, July 2005

Parameters

lambda1 (float) – L-moment-1 (\(\lambda_1\))
lambda2 – L-moment-2 (\(\lambda_2\))

Returns

(float) location and scale

caeli.distributions.genloglogistic_cdf(x, loc, scale, shape)

Return the cumulative distribution function of the generalized logistic distribution

Based on SUBROUTINE CDFGLO of the LMOMENTS Fortran package version 3.04, July 2005

Parameters

x (numpy.array) – sample values
loc (float) – location parameter (\(\mu\))
scale (float) – scale parameter (\(\sigma\) > 0)
shape (float) – shape parameter (\(\kappa\))

Returns

(numpy.array) cdf
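
For orientation, the generalized logistic CDF in the parameterization used by CDFGLO can be sketched for a single value as below. The function name is hypothetical, and the sketch works on scalars with the standard library only, whereas the documented function accepts a numpy.array; treat it as an illustration rather than the library code.

```python
import math


def glo_cdf_sketch(x, loc, scale, shape):
    """Hypothetical helper: generalized logistic CDF for a scalar x."""
    y = (x - loc) / scale
    if shape != 0.0:
        arg = 1.0 - shape * y
        if arg <= 0.0:
            # x lies outside the support: above the upper bound when
            # shape > 0, below the lower bound when shape < 0
            return 1.0 if shape > 0.0 else 0.0
        y = -math.log(arg) / shape
    return 1.0 / (1.0 + math.exp(-y))


median_probability = glo_cdf_sketch(10.0, loc=10.0, scale=2.0, shape=0.2)
```

At x = loc the transformed variable is zero, so the CDF equals 0.5 for any shape.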

caeli.drought_indices module

caeli.drought_indices.spi(precipitation)

Parameters: precipitation (list or np.array) – precipitation values
Returns: list of SPI values
Return type: list

caeli.drought_indices.spi_monthly(sr, months=range(1, 13), aggregation=1, start_at=None, closed_left=True, closed_right=False, label='left', is_sorted=False, prefix='P', min_years=20)

Parameters

sr – Series with precipitation depth
months – list of months, e.g. [11, 12, 1, 2, 3]
aggregation – number of months to aggregate: 1, 2, 3, 4, or 6. Default: aggregation=1
start_at –
closed_left –
closed_right –
label –
is_sorted –
prefix –
min_years –

Returns

pandas.DataFrame with precipitations ‘Prec’ and drought indices ‘Spi’ for each year and month

caeli.drought_indices.spei(values)

Calculate SPEI from given values.

Values are the differences between precipitation and potential evapotranspiration.

For example if you want to calculate spei from January in the

Parameters: values (list, numpy array) – list or numpy array of values
Returns
Return type

caeli.drought_indices.spei_monthly(sr, months=range(1, 13), aggregation=1, start_at=None, closed_left=True, closed_right=False, label='left', is_sorted=False, prefix='P', min_years=20)

Parameters

sr (pandas.Series) – Series with precipitation depth minus potential evapotranspiration (\(P - ETo\)) as values and pandas.Timestamp as index. The series frequency can be, for example, minutely, hourly, daily, or monthly.
months (list) – list of months, e.g. [11, 12, 1, 2, 3]
aggregation – number of months to aggregate: 1, 2, 3, 4, or 6. Default: aggregation=1
start_at –
closed_left –
closed_right –
label –
is_sorted –
prefix –
min_years –

Returns

caeli.time_series module

caeli.time_series.replace_year(dt, year)

Replace the year in dt with year. If dt falls on the last day of its month, the result also falls on the last day of that month, so 28 and 29 February map onto each other between leap and non-leap years.

Parameters

dt –
year –

Returns

Return type
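
The last-day rule described for replace_year can be illustrated with the standard library. The helper name replace_year_sketch is hypothetical and works on datetime.datetime only; it is a sketch of the documented behaviour, not the caeli code.

```python
import calendar
from datetime import datetime


def replace_year_sketch(dt, year):
    """Hypothetical helper: swap the year, keeping 'last day of month'."""
    last_day = calendar.monthrange(dt.year, dt.month)[1]
    if dt.day == last_day:
        # Stay on the last day of the same month in the target year,
        # which turns 28 February into 29 February and vice versa.
        new_last_day = calendar.monthrange(year, dt.month)[1]
        return dt.replace(year=year, day=new_last_day)
    return dt.replace(year=year)


leap = replace_year_sketch(datetime(2019, 2, 28), 2020)
```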

caeli.time_series.is_leap_day(dt)

Check whether dt falls on 29 February.

Parameters: dt (datetime, pd.Timestamp, np.datetime64) – datetime
Returns: True/False
Return type: bool

caeli.time_series.last_day_of_month(dt)

caeli.time_series.is_last_day_of_month(dt)

Check whether the day in dt is the last day of the month.

Parameters: dt (datetime, pd.Timestamp, np.datetime64) – datetime
Returns: True/False
Return type: bool

caeli.time_series.increment_months(dt, months=1, microseconds=0)

Increment dt by months. Default is to increment one month. Return a pd.Timestamp.

Parameters

dt (datetime, pd.Timestamp, np.datetime64) – timestamp
months (int) – number of months to increment. Negative values are allowed. Default months = 1
microseconds (int) – microseconds to add to the incremented timestamp when it is used as the right end of an interval: 0 for a closed interval, -1 for a right-open interval

Returns

dt incremented by months

Return type

pd.Timestamp
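
A minimal sketch of month arithmetic with the microseconds adjustment is given below. It uses datetime.datetime instead of pd.Timestamp, the name increment_months_sketch is hypothetical, and clamping the day to the length of the target month is an assumption made here; the documentation does not state how caeli treats days such as the 31st.

```python
import calendar
from datetime import datetime, timedelta


def increment_months_sketch(dt, months=1, microseconds=0):
    """Hypothetical helper: shift dt by whole months, then by microseconds."""
    total = dt.year * 12 + (dt.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Assumption: clamp the day when the target month is shorter
    day = min(dt.day, calendar.monthrange(year, month)[1])
    shifted = dt.replace(year=year, month=month, day=day)
    return shifted + timedelta(microseconds=microseconds)


right_open_end = increment_months_sketch(datetime(1990, 1, 1, 7, 30), 1, -1)
```

With microseconds=-1 the result falls one microsecond before the start of the next interval, which is how a right-open interval end is represented.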

caeli.time_series.monthly_intervals(indices, months=None, aggregation=1, start_at='beg', closed_left=True, closed_right=True)

Return a list of [from, to] pairs marking the beginning and end of each aggregated period of months (the default aggregation=1 gives monthly intervals). The aggregation may also be negative.

Parameters

indices (pd.DatetimeIndex, list) – sorted list of timestamps
months (None or list) – output months for the intervals
aggregation (int) – number of aggregated months. Default 1 (monthly)
start_at (datetime.datetime, str) – date and time at which the intervals start. Only the day and time are used; the year and month are placeholders and are discarded. start_at='end' starts at the end of the first month in the time series, start_at='beg' starts on the first day of the month at 00:00:00, and start_at=None is equivalent to start_at='beg'
closed_left (bool) – if True, the interval is closed on the left
closed_right (bool) – if True, the interval is closed on the right

Returns

list of intervals [[begin0, end0], [begin1, end1], …, [beginN, endN]]

Return type

list of [pd.Timestamp, pd.Timestamp]

For the examples below the following indices will be used:

>>> import numpy as np
>>> import pandas as pd
>>> from caeli.time_series import monthly_intervals
>>> index = pd.date_range('1990-01-01 07:30', '2020-01-01 07:30', freq='1d')
>>> index
DatetimeIndex(['1990-01-01 07:30:00', '1990-01-02 07:30:00',
               '1990-01-03 07:30:00', '1990-01-04 07:30:00',
               '1990-01-05 07:30:00', '1990-01-06 07:30:00',
               '1990-01-07 07:30:00', '1990-01-08 07:30:00',
               '1990-01-09 07:30:00', '1990-01-10 07:30:00',
               ...
               '2019-12-23 07:30:00', '2019-12-24 07:30:00',
               '2019-12-25 07:30:00', '2019-12-26 07:30:00',
               '2019-12-27 07:30:00', '2019-12-28 07:30:00',
               '2019-12-29 07:30:00', '2019-12-30 07:30:00',
               '2019-12-31 07:30:00', '2020-01-01 07:30:00'],
              dtype='datetime64[ns]', length=10958, freq='D')

Examples:

Using default values. Note that the time series starts at 07:30, but by default each month starts at 00:00. Therefore, the first month is ignored.

>>> itv = monthly_intervals(index, months=None, aggregation=1, start_at='beg',
...                         closed_left=True, closed_right=True)
>>> print('{}, ..., {}'.format(itv[0], itv[-1]))
[Timestamp('1990-02-01 00:00:00'), Timestamp('1990-03-01 00:00:00')], ...,
[Timestamp('2019-12-01 00:00:00'), Timestamp('2020-01-01 00:00:00')]

Setting start_at='1999-01-01 07:30'. The year and month ('1999-01') are placeholders.

>>> itv = monthly_intervals(index, months=None, aggregation=1, start_at='1999-01-01 07:30',
...                         closed_left=True, closed_right=True)
>>> print('{}, ..., {}'.format(itv[0], itv[-1]))
[Timestamp('1990-01-01 07:30:00'), Timestamp('1990-02-01 07:30:00')], ...,
[Timestamp('2019-12-01 07:30:00'), Timestamp('2020-01-01 07:30:00')]

Setting closed_right=False makes each interval right-open: the end is moved back by one microsecond.

>>> itv = monthly_intervals(index, months=None, aggregation=1, start_at='1999-01-01 07:30',
...                         closed_left=True, closed_right=False)
>>> print('{}, ..., {}'.format(itv[0], itv[-1]))
[Timestamp('1990-01-01 07:30:00'), Timestamp('1990-02-01 07:29:59.999999')], ...,
[Timestamp('2019-12-01 07:30:00'), Timestamp('2020-01-01 07:29:59.999999')]

Setting aggregation=2 makes each interval span two adjacent months.

>>> itv = monthly_intervals(index, months=None, aggregation=2, start_at='1999-01-01 07:30',
...                         closed_left=True, closed_right=False)
>>> print('{}, ..., {}'.format(itv[0], itv[-1]))
[Timestamp('1990-01-01 07:30:00'), Timestamp('1990-03-01 07:29:59.999999')], ...,
[Timestamp('2019-11-01 07:30:00'), Timestamp('2020-01-01 07:29:59.999999')]

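Setting months=[1, 4, 7, 10] with aggregation=3 keeps only the intervals that start in January, April, July, and October.

>>> itv = monthly_intervals(index, months=[1, 4, 7, 10], aggregation=3,
...                         start_at='1999-01-01 07:30', closed_left=True, closed_right=False)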
>>> itv[:5]
[[Timestamp('1990-01-01 07:30:00'), Timestamp('1990-04-01 07:29:59.999999')],
[Timestamp('1990-04-01 07:30:00'), Timestamp('1990-07-01 07:29:59.999999')],
[Timestamp('1990-07-01 07:30:00'), Timestamp('1990-10-01 07:29:59.999999')],
[Timestamp('1990-10-01 07:30:00'), Timestamp('1991-01-01 07:29:59.999999')],
[Timestamp('1991-01-01 07:30:00'), Timestamp('1991-04-01 07:29:59.999999')]]

Negative aggregation (aggregation=-3). Note that the first aggregation [1989-12-31 07:30:00, 1990-02-01 07:29:59.999999] is ignored because the time series starts at 1990-01-01 07:30:00.

>>> itv = monthly_intervals(index, months=[2, 5, 8, 11], aggregation=-3,
...                         start_at='1999-01-01 07:30', closed_left=True, closed_right=False)
>>> itv[:5]
[[Timestamp('1990-02-01 07:30:00'), Timestamp('1990-05-01 07:29:59.999999')],
[Timestamp('1990-05-01 07:30:00'), Timestamp('1990-08-01 07:29:59.999999')],
[Timestamp('1990-08-01 07:30:00'), Timestamp('1990-11-01 07:29:59.999999')],
[Timestamp('1990-11-01 07:30:00'), Timestamp('1991-02-01 07:29:59.999999')],
[Timestamp('1991-02-01 07:30:00'), Timestamp('1991-05-01 07:29:59.999999')]]
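
The interval construction shown in the examples above can be approximated with the standard library. The sketch below is an assumption-laden illustration, not the caeli algorithm: the names add_months and monthly_intervals_sketch are hypothetical, only positive aggregation is handled, there is no months filter, and the caller supplies the first interval start directly instead of deriving it from start_at.

```python
import calendar
from datetime import datetime, timedelta


def add_months(dt, months):
    """Hypothetical helper: shift dt by whole months, clamping the day."""
    total = dt.year * 12 + (dt.month - 1) + months
    year, month_index = divmod(total, 12)
    day = min(dt.day, calendar.monthrange(year, month_index + 1)[1])
    return dt.replace(year=year, month=month_index + 1, day=day)


def monthly_intervals_sketch(first, last, aggregation=1, closed_right=True):
    """Intervals [begin, end] starting every month from `first`.

    An interval is kept only if it ends no later than `last`,
    the final timestamp of the series.
    """
    shift = timedelta(0) if closed_right else timedelta(microseconds=-1)
    intervals = []
    k = 0
    while True:
        begin = add_months(first, k)
        end = add_months(first, k + aggregation)
        if end > last:
            break
        intervals.append([begin, end + shift])
        k += 1
    return intervals


itv_sketch = monthly_intervals_sketch(datetime(1990, 1, 1, 7, 30),
                                      datetime(2020, 1, 1, 7, 30),
                                      aggregation=2, closed_right=False)
```

Each interval is computed from the first start instead of from the previous interval, which avoids accumulating day clamping across short months.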

caeli.time_series.monthly_series(sr, rule='sum', months=None, aggregation=1, start_at=None, closed_left=True, closed_right=False, label='right', is_sorted=False, time_format='d')

Return the series resampled to the months listed in months, aggregating over aggregation adjacent months. The default resampling rule is sum.

Parameters

sr (pandas.Series, pandas.DataFrame) – pandas.Series with DatetimeIndex as index. A series at any frequency will be aggregated to month(s)
rule (str) – resample rule. Default rule='sum'
months – see monthly_intervals()
aggregation – see monthly_intervals()
start_at – see monthly_intervals()
closed_left – see monthly_intervals()
closed_right – see monthly_intervals()
label (str) – 'right' for setting the index at the end and 'left' for setting the index at the beginning of the interval in the time series. Default label='right'
is_sorted (bool) – True if the input time series is already sorted, otherwise False. Default is_sorted=False
time_format (str, None) – 'd' (day/date): set hour, minute, second, and milliseconds to 0; 'h' (hour): set minute, second, and milliseconds to 0; 'm' (minute): set second and milliseconds to 0; 's' (second): set milliseconds to 0; None: leave the timestamp unchanged

Returns

(pandas.DataFrame, pandas.Series): monthly time series
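
The effect of time_format on an index label can be sketched as a truncation of the timestamp. The helper name truncate_timestamp is hypothetical, it operates on datetime.datetime, and it assumes that every sub-second part is dropped; it illustrates the option as documented here and is not the caeli code.

```python
from datetime import datetime


def truncate_timestamp(ts, time_format='d'):
    """Hypothetical helper: zero the parts of ts finer than time_format."""
    if time_format is None:
        return ts                                   # keep the full timestamp
    if time_format == 'd':
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if time_format == 'h':
        return ts.replace(minute=0, second=0, microsecond=0)
    if time_format == 'm':
        return ts.replace(second=0, microsecond=0)
    if time_format == 's':
        return ts.replace(microsecond=0)
    raise ValueError("time_format must be 'd', 'h', 'm', 's' or None")


right_label = truncate_timestamp(datetime(1990, 3, 1, 7, 29, 59, 999999), 'd')
```

A right label of 1990-03-01 07:29:59.999999 is therefore displayed as 1990-03-01 when time_format='d'.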

For the examples below the following time series will be used:

>>> import numpy as np
>>> import pandas as pd
>>> from caeli.time_series import monthly_series
>>> np.random.seed(1)
>>> index = pd.date_range('1990-01-01 07:30', '2020-01-01 07:30', freq='1d')
>>> p = np.random.normal(2, 0.1, size=len(index))
>>> p[p < 0.0] = 0.0
>>> sr_daily = pd.Series(p, index=index)
>>> sr_daily
1990-01-01 07:30:00    2.162435
1990-01-02 07:30:00    1.938824
                         ...
2019-12-31 07:30:00    1.937972
2020-01-01 07:30:00    2.081355
Freq: D, Length: 10958, dtype: float64

Right labeled, showing date only:

>>> sr_monthly = monthly_series(sr_daily, aggregation=2, start_at='1999-01-01 07:30')
>>> sr_monthly
1990-03-01    117.961372
1990-04-01    118.789945
                 ...
2019-12-01    122.096353
2020-01-01    123.361334
Length: 359, dtype: float64

Right labeled, showing the full date/time:

>>> sr_monthly = monthly_series(sr_daily, aggregation=2, start_at='1999-01-01 07:30',
...                             time_format=None)
>>> sr_monthly
1990-03-01 07:29:59.999999    117.961372
1990-04-01 07:29:59.999999    118.789945
                                 ...
2019-12-01 07:29:59.999999    122.096353
2020-01-01 07:29:59.999999    123.361334
Length: 359, dtype: float64

Left labeled, showing date only:

>>> sr_monthly = monthly_series(sr_daily, aggregation=2, start_at='1999-01-01 07:30',
...                             label='left')
>>> sr_monthly
1990-01-01    117.961372
1990-02-01    118.789945
                 ...
2019-10-01    122.096353
2019-11-01    123.361334
Length: 359, dtype: float64

Left labeled, showing the full date/time:

>>> sr_monthly = monthly_series(sr_daily, aggregation=2, start_at='1999-01-01 07:30',
...                             label='left', time_format=None)
>>> sr_monthly
1990-01-01 07:30:00    117.961372
1990-02-01 07:30:00    118.789945
                          ...
2019-10-01 07:30:00    122.096353
2019-11-01 07:30:00    123.361334
Length: 359, dtype: float64

caeli.time_series.months_split_annually(sr, rule='sum', months=None, aggregation=1, start_at=None, closed_left=True, closed_right=False, label='left', is_sorted=False, time_format='d', prefix='M')

Return a pandas.DataFrame with aggregated months as columns and year as index.

Parameters

sr (pandas.Series or pandas.DataFrame) – pandas.Series with DatetimeIndex as index
rule – see monthly_series()
months – see monthly_intervals()
aggregation – see monthly_intervals()
start_at – see monthly_intervals()
closed_left – see monthly_intervals()
closed_right – see monthly_intervals()
label – see monthly_series()
is_sorted – see monthly_series()
time_format – see monthly_series()
prefix (str) – Prefix for column names. Default prefix='M'

Returns

(pandas.DataFrame) with aggregated months as columns and year as index

For the examples below the following time series will be used:

>>> import numpy as np
>>> import pandas as pd
>>> from caeli.time_series import months_split_annually
>>> np.random.seed(1)
>>> index = pd.date_range('1990-01-01 07:30', '2020-01-01 07:30', freq='1d')
>>> p = np.random.normal(2, 0.1, size=len(index))
>>> p[p < 0.0] = 0.0
>>> sr_daily = pd.Series(p, index=index)
>>> sr_daily
1990-01-01 07:30:00    2.162435
1990-01-02 07:30:00    1.938824
                         ...
2019-12-31 07:30:00    1.937972
2020-01-01 07:30:00    2.081355
Freq: D, Length: 10958, dtype: float64

>>> spy = months_split_annually(sr_daily, aggregation=2, start_at='1999-01-01 07:30')
>>> print(spy)
          M01-02      M02-03  ...      M11-12      M12-01
year                          ...
1990  117.961372  118.789945  ...  121.819112  123.028979
1991  117.760247  118.953375  ...  121.958717  123.601324
...          ...         ...  ...         ...         ...
2018  117.549323  117.780231  ...  121.336530  122.549497
2019  116.797959  117.721573  ...  123.361334         NaN

[30 rows x 12 columns]
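
The table layout in the example above (one row per year, one column per aggregated period) can be sketched with plain dictionaries. The column label pattern, prefix followed by the two-digit first and last month, is inferred from the printed output for aggregation=2; the helper names are hypothetical, and how caeli labels columns for aggregation=1 is not shown in this documentation.

```python
from datetime import datetime


def column_label(month, aggregation=2, prefix='M'):
    """Hypothetical helper: e.g. month=12, aggregation=2 gives 'M12-01'."""
    last_month = (month + aggregation - 2) % 12 + 1
    return '{}{:02d}-{:02d}'.format(prefix, month, last_month)


def split_annually_sketch(records, aggregation=2, prefix='M'):
    """Group (interval_begin, value) pairs by the year of the left label."""
    table = {}
    for begin, value in records:
        row = table.setdefault(begin.year, {})
        row[column_label(begin.month, aggregation, prefix)] = value
    return table


records = [(datetime(1990, 1, 1), 117.96),
           (datetime(1990, 12, 1), 123.03),
           (datetime(1991, 1, 1), 117.76)]
table = split_annually_sketch(records)
```

A period that starts in December is assigned to the year in which it starts, which is consistent with the missing M12-01 value in the last row of the example.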

caeli.time_series.slice_by_timestamp(df, beg_timestamp=Timestamp('1677-09-21 00:12:43.145225'), end_timestamp=Timestamp('2262-04-11 23:47:16.854775807'))

Slice the data frame from index starting at beg_timestamp to end_timestamp, including the latter.

Parameters

df (pandas.DataFrame or pandas.Series) – data frame
beg_timestamp (datetime.datetime, pandas.Timestamp, or numpy.datetime64) – begin of slice
end_timestamp (datetime.datetime, pandas.Timestamp, or numpy.datetime64) – end of slice (inclusive)

Returns

(pandas.DataFrame or pandas.Series) sliced data frame