caeli package

Submodules

caeli.distributions module

caeli.distributions.pwm(x, n=4)

Return a list with the first n probability weighted moments (\(b_r\)) of the sample x.

\[b_r = \frac{\sum_{i=1}^{n_s} x_i {i - 1 \choose r}}{n_s {n_s - 1\choose r}}\]

where \(x_i\) is the i-th smallest value of the sample x and:

\(n_s\) — size of the sample x

See, for example, Diana Bilkova (2014) (eq. 17)

Parameters
  • x (list or numpy.array) – sample values

  • n (int) – number of probability weighted moments (\(b_r\)) to return. Default: n=4

Returns

(list) probability weighted moments (\(b_r\))
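The estimator can be illustrated with a short standard-library sketch. It is not the package's implementation: the name pwm_sketch is hypothetical, the sample is assumed to be sorted in ascending order before weighting, and the 0-based loop index i stands for the rank minus one.

```python
from math import comb


def pwm_sketch(x, n=4):
    """Return the first n sample probability weighted moments b_0..b_{n-1}.

    Hypothetical helper for illustration; requires len(x) >= n.
    """
    xs = sorted(x)          # order statistics, smallest first
    ns = len(xs)
    moments = []
    for r in range(n):
        # i is 0-based here, so comb(i, r) is C(rank - 1, r)
        weighted_sum = sum(comb(i, r) * xi for i, xi in enumerate(xs))
        moments.append(weighted_sum / (ns * comb(ns - 1, r)))
    return moments


b = pwm_sketch([4.0, 1.0, 3.0, 2.0])
print(b)
```

For r = 0 the weights are all 1, so the first moment is simply the sample mean.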

caeli.distributions.lmoments(x, n=4, ratio=True, lcv=False)

Return a list with the first n L-moments of the sample x.

\[\lambda_{r + 1} = \sum_{k=0}^{r} (-1)^{r - k} {r \choose k} {r + k \choose k} b_k\]

with:

\(0 \leq r \leq n - 1\)

where:

\(b_k\) — first probability weighted moments (see pwm())

See, for example, Diana Bilkova (2014) (eq. 26)

If ratio is True, replace \(\lambda_r\) with \(\lambda_r/\lambda_2\) for \(r \geq 3\), where \(\lambda_3/\lambda_2\) is the L-skewness and \(\lambda_4/\lambda_2\) is the L-kurtosis.

If lcv is True, replace \(\lambda_2\) with the coefficient of L-variation \(\lambda_2/\lambda_1\). For a non-negative random variable, this lies in the interval (0,1) and is identical to the Gini coefficient (see https://en.wikipedia.org/wiki/L-moment).

Parameters
  • x (list or numpy.array) – sample values

  • n (int) – number of returned L-moments (\(\lambda_r\))

  • ratio (bool) – if True, replace \(\lambda_r\) with \(\lambda_r/\lambda_2\) for \(r \geq 3\). Default: ratio=True

  • lcv (bool) – if True, replace \(\lambda_2\) with \(\lambda_2/\lambda_1\). Default: lcv=False

Returns

(list) L-moments of the sample x
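As an illustration of the formula and of the ratio and lcv options, the following standard-library sketch computes L-moments from probability weighted moments. The names pwm_sketch and lmoments_sketch are hypothetical, and the handling of the two flags is an assumption based on the description above, not the package's code.

```python
from math import comb


def pwm_sketch(x, n=4):
    """First n sample probability weighted moments (sample sorted ascending)."""
    xs = sorted(x)
    ns = len(xs)
    return [
        sum(comb(i, r) * xi for i, xi in enumerate(xs)) / (ns * comb(ns - 1, r))
        for r in range(n)
    ]


def lmoments_sketch(x, n=4, ratio=True, lcv=False):
    """First n sample L-moments; optionally L-moment ratios and L-CV."""
    b = pwm_sketch(x, n)
    # lambda_{r+1} = sum_k (-1)^(r-k) C(r, k) C(r+k, k) b_k, for r = 0..n-1
    lam = [
        sum((-1) ** (r - k) * comb(r, k) * comb(r + k, k) * b[k]
            for k in range(r + 1))
        for r in range(n)
    ]
    out = list(lam)
    if ratio:
        # replace lambda_3, lambda_4, ... with their ratios to lambda_2
        for r in range(2, n):
            out[r] = lam[r] / lam[1]
    if lcv and n >= 2:
        # replace lambda_2 with the coefficient of L-variation
        out[1] = lam[1] / lam[0]
    return out


lm = lmoments_sketch([1.0, 2.0, 3.0, 4.0])
lm_lcv = lmoments_sketch([1.0, 2.0, 3.0, 4.0], lcv=True)
print(lm, lm_lcv)
```

The evenly spaced sample is symmetric, so its L-skewness comes out as zero up to floating-point error.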

caeli.distributions.lmoments_parameter_estimation_generalized_logistic(lambda1, lambda2, tau)

Return the location, scale and shape of the generalized logistic distribution.

Based on SUBROUTINE PELGLO of the LMOMENTS Fortran package version 3.04, July 2005

Parameters
  • lambda1 – L-moment-1

  • lambda2 – L-moment-2

  • tau – L-skewness (L-moment-3 / L-moment-2)

Returns

(float) location, scale and shape
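The relations between the first L-moments and the parameters of the generalized logistic distribution in Hosking's parameterization can be sketched as follows. This is an illustration under stated assumptions rather than the package's code: the function name pelglo_sketch, the small threshold, and the error handling are hypothetical.

```python
import math


def pelglo_sketch(lambda1, lambda2, tau, small=1e-6):
    """Estimate (location, scale, shape) of the generalized logistic
    distribution from lambda_1, lambda_2 and the L-skewness tau.
    """
    if lambda2 <= 0.0 or abs(tau) >= 1.0:
        raise ValueError("L-moments are not valid for this distribution")
    shape = -tau                       # kappa = -tau_3
    if abs(shape) <= small:
        # limiting case kappa -> 0: ordinary logistic distribution
        return lambda1, lambda2, 0.0
    gg = shape * math.pi / math.sin(shape * math.pi)
    scale = lambda2 / gg               # alpha = lambda_2 sin(k pi) / (k pi)
    loc = lambda1 - scale * (1.0 - gg) / shape
    return loc, scale, shape


loc, scale, shape = pelglo_sketch(10.0, 2.0, 0.1)
print(loc, scale, shape)
```

The shape parameter is just the negated L-skewness, so a symmetric sample (tau near 0) reduces to the two-parameter logistic case.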

caeli.distributions.lmoments_parameter_estimation_gamma(lambda1, lambda2)

Return the location and scale of the gamma distribution.

Based on SUBROUTINE PELGAM of the LMOMENTS Fortran package version 3.04, July 2005

Parameters
  • lambda1 (float) – L-moment-1 (\(\lambda_1\))

  • lambda2 – L-moment-2 (\(\lambda_2\))

Returns

(float) location and scale

caeli.distributions.genloglogistic_cdf(x, loc, scale, shape)

Return the cumulative distribution function of the generalized logistic distribution

Based on SUBROUTINE CDFGLO of the LMOMENTS Fortran package version 3.04, July 2005

Parameters
  • x (numpy.array) – sample values

  • loc (float) – location parameter (\(\mu\))

  • scale (float) – scale parameter (\(\sigma\) > 0)

  • shape (float) – shape parameter (\(\kappa\))

Returns

(numpy.array) cdf
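A scalar sketch of the generalized logistic CDF in Hosking's parameterization is shown below. It only illustrates the formula: the name glo_cdf_sketch and the small threshold are hypothetical, and the documented function works on numpy arrays whereas this version takes a single float.

```python
import math


def glo_cdf_sketch(x, loc, scale, shape, small=1e-15):
    """CDF of the generalized logistic distribution at a single value x."""
    y = (x - loc) / scale
    if abs(shape) > small:
        arg = 1.0 - shape * y
        if arg <= small:
            # x lies outside the support: below it for shape < 0,
            # above it for shape > 0
            return 0.0 if shape < 0.0 else 1.0
        y = -math.log(arg) / shape
    # numerically stable logistic function 1 / (1 + exp(-y))
    if y >= 0.0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)


values = [glo_cdf_sketch(v, loc=0.0, scale=1.0, shape=0.0) for v in (-1.0, 0.0, 1.0)]
print(values)
```

With shape = 0 the function reduces to the ordinary logistic CDF, which equals 0.5 at the location parameter.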

caeli.drought_indices module

caeli.drought_indices.spi(precipitation)
Parameters

precipitation (list or np.array) – precipitation values

Returns

list of SPI values

Return type

list

caeli.drought_indices.spi_monthly(sr, months=range(1, 13), aggregation=1, start_at=None, closed_left=True, closed_right=False, label='left', is_sorted=False, prefix='P', min_years=20)
Parameters
  • sr – Series with precipitation depth

  • months – list of months, e.g. [11, 12, 1, 2, 3]

  • aggregation – number of months to aggregate: 1, 2, 3, 4, or 6. Default: aggregation=1

  • start_at

  • closed_left

  • closed_right

  • label

  • is_sorted

  • prefix

  • min_years

Returns

pandas.DataFrame with precipitations ‘Prec’ and drought indices ‘Spi’ for each year and month

caeli.drought_indices.spei(values)

Calculate SPEI from given values.

Values are the differences between precipitation and potential evapotranspiration.

For example if you want to calculate spei from January in the

Parameters

values (list, numpy array) – list or numpy array of values

Returns

Return type

caeli.drought_indices.spei_monthly(sr, months=range(1, 13), aggregation=1, start_at=None, closed_left=True, closed_right=False, label='left', is_sorted=False, prefix='P', min_years=20)
Parameters
  • sr (pandas series) – Series with precipitation depth minus potential evapotranspiration (\(P - ETo\)) as values and pandas TimeStamp as index. The series frequencies can be, for example, minutely, hourly, daily, or monthly.

  • months (list) – list of months, e.g. [11, 12, 1, 2, 3]

  • aggregation – number of months to aggregate: 1, 2, 3, 4, or 6. Default: aggregation=1

  • start_at

  • closed_left

  • closed_right

  • label

  • is_sorted

  • prefix

  • min_years

caeli.time_series module

caeli.time_series.replace_year(dt, year)

Replace the year in dt with year. If dt is the last day of its month, the result is also the last day of that month, which matters for February in leap years.
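One possible reading of this behaviour, sketched with the standard library only: the name replace_year_sketch is hypothetical, plain datetime objects stand in for pandas timestamps, and the end-of-month rule is an interpretation of the description above rather than the confirmed implementation.

```python
import calendar
from datetime import datetime


def replace_year_sketch(dt, year):
    """Return dt moved to the given year, keeping an end-of-month date
    at the end of the month (relevant for February)."""
    last_day = calendar.monthrange(dt.year, dt.month)[1]
    if dt.day == last_day:
        # dt is the last day of its month: use the last day in the new year
        new_last_day = calendar.monthrange(year, dt.month)[1]
        return dt.replace(year=year, day=new_last_day)
    return dt.replace(year=year)


print(replace_year_sketch(datetime(2020, 2, 29, 7, 30), 2021))
print(replace_year_sketch(datetime(2021, 2, 28), 2020))
```

Under this reading, 29 February maps to 28 February in a non-leap year, and 28 February of a non-leap year maps to 29 February in a leap year.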

Parameters
  • dt

  • year

Returns

Return type

caeli.time_series.is_leap_day(dt)

Check whether dt falls on 29 February (the leap day).

Parameters

dt (datetime, pd.Timestamp, np.datetime64) – datetime

Returns

True/False

Return type

bool

caeli.time_series.last_day_of_month(dt)
caeli.time_series.is_last_day_of_month(dt)

Check whether the day in dt is the last day of the month.

Parameters

dt (datetime, pd.Timestamp, np.datetime64) – datetime

Returns

True/False

Return type

bool
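Both date checks can be sketched with the standard library. The function names below are hypothetical and plain datetime objects are used instead of pandas or numpy timestamps.

```python
import calendar
from datetime import datetime


def is_leap_day_sketch(dt):
    """True if dt falls on 29 February."""
    return dt.month == 2 and dt.day == 29


def is_last_day_of_month_sketch(dt):
    """True if dt falls on the last day of its month."""
    # monthrange returns (weekday of first day, number of days in month)
    return dt.day == calendar.monthrange(dt.year, dt.month)[1]


print(is_leap_day_sketch(datetime(2020, 2, 29)))
print(is_last_day_of_month_sketch(datetime(2021, 2, 28)))
```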

caeli.time_series.increment_months(dt, months=1, microseconds=0)

Increment dt by months. Default is to increment one month. Return a pd.Timestamp.

Parameters
  • dt (datetime, pd.Timestamp, np.datetime64) – timestamp

  • months (int) – number of months to increment. Negative values are allowed. Default months = 1

  • microseconds (int) – microseconds to add to the incremented timestamp: 0 for a closed interval, -1 for a right-open interval

Returns

dt incremented by months

Return type

pd.Timestamp
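The month arithmetic can be illustrated with a standard-library sketch. The name increment_months_sketch is hypothetical, datetime objects replace pd.Timestamp, and clamping the day to the length of the target month is an assumption; the package may treat end-of-month dates differently.

```python
import calendar
from datetime import datetime, timedelta


def increment_months_sketch(dt, months=1, microseconds=0):
    """Shift dt by a number of months, then add a microsecond offset."""
    total = dt.year * 12 + (dt.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # assumption: clamp the day if the target month is shorter
    day = min(dt.day, calendar.monthrange(year, month)[1])
    shifted = dt.replace(year=year, month=month, day=day)
    return shifted + timedelta(microseconds=microseconds)


begin = datetime(1990, 1, 1, 7, 30)
print(increment_months_sketch(begin))                    # closed right end
print(increment_months_sketch(begin, microseconds=-1))   # right-open end
```

Passing microseconds=-1 yields the last representable instant before the next interval begins, which is how a right-open interval end can be expressed as a timestamp.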

caeli.time_series.monthly_intervals(indices, months=None, aggregation=1, start_at='beg', closed_left=True, closed_right=True)

Return a list of [from, to] pairs, where each pair marks the beginning and end of an interval of aggregated months (the default aggregation=1 gives monthly intervals). The aggregation may also be negative.

Parameters
  • indices (pd.DatetimeIndex, list) – sorted list of timestamps

  • months (None or list) – output months for the intervals

  • aggregation (int) – number of aggregated months. Default 1 (monthly)

  • start_at (datetime.datetime, str) – date and time to start. Only the day and time are used; the year and month are placeholders and are discarded. Use start_at='end' for the end of the first month in the time series and start_at='beg' for the first day of the month at 00:00:00. start_at=None is equivalent to start_at='beg'

  • closed_left (bool) – True for a left-closed interval

  • closed_right (bool) – True for a right-closed interval

Returns

list of intervals [[begin0, end0], [begin1, end1], …, [beginN, endN]]

Return type

list of [pd.Timestamp, pd.Timestamp]
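The interval construction can be pictured with a simplified standard-library sketch. It is an illustration only: the name monthly_intervals_sketch and its parameters (first, last, day, hour, minute) are hypothetical, it covers positive aggregation only, it assumes a start day of 28 or lower, and it assumes that months filters on the month in which an interval begins.

```python
from datetime import datetime, timedelta


def add_months(dt, months):
    """Shift dt by whole months (the day must exist in the target month)."""
    total = dt.year * 12 + (dt.month - 1) + months
    return dt.replace(year=total // 12, month=total % 12 + 1)


def monthly_intervals_sketch(first, last, day=1, hour=0, minute=0,
                             aggregation=1, months=None, closed_right=True):
    """List of [begin, end] pairs lying fully inside [first, last]."""
    begin = datetime(first.year, first.month, day, hour, minute)
    if begin < first:
        # the first month is incomplete, so start with the next one
        begin = add_months(begin, 1)
    offset = timedelta(0) if closed_right else timedelta(microseconds=1)
    intervals = []
    while True:
        end = add_months(begin, aggregation)
        if end > last:
            break
        if months is None or begin.month in months:
            intervals.append([begin, end - offset])
        begin = add_months(begin, 1)   # intervals advance month by month
    return intervals


first, last = datetime(1990, 1, 1, 7, 30), datetime(2020, 1, 1, 7, 30)
itv = monthly_intervals_sketch(first, last)
print(itv[0], itv[-1])
```

Because the series starts at 07:30 and the default month starts at 00:00, January 1990 is incomplete and the first interval returned begins on 1 February 1990.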

For the examples below the following indices will be used:

>>> import numpy as np
>>> import pandas as pd
>>> from caeli.time_series import monthly_intervals
>>> index = pd.date_range('1990-01-01 07:30', '2020-01-01 07:30', freq='1d')
>>> index
DatetimeIndex(['1990-01-01 07:30:00', '1990-01-02 07:30:00',
               '1990-01-03 07:30:00', '1990-01-04 07:30:00',
               '1990-01-05 07:30:00', '1990-01-06 07:30:00',
               '1990-01-07 07:30:00', '1990-01-08 07:30:00',
               '1990-01-09 07:30:00', '1990-01-10 07:30:00',
               ...
               '2019-12-23 07:30:00', '2019-12-24 07:30:00',
               '2019-12-25 07:30:00', '2019-12-26 07:30:00',
               '2019-12-27 07:30:00', '2019-12-28 07:30:00',
               '2019-12-29 07:30:00', '2019-12-30 07:30:00',
               '2019-12-31 07:30:00', '2020-01-01 07:30:00'],
              dtype='datetime64[ns]', length=10958, freq='D')

Examples:

Using default values. Note that the time series starts at 07:30, but by default the month starts at 00:00. The first month is therefore ignored.

>>> itv = monthly_intervals(index, months=None, aggregation=1, start_at='beg',
...                         closed_left=True, closed_right=True)
>>> print('{}, ..., {}'.format(itv[0], itv[-1]))
[Timestamp('1990-02-01 00:00:00'), Timestamp('1990-03-01 00:00:00')], ...,
[Timestamp('2019-12-01 00:00:00'), Timestamp('2020-01-01 00:00:00')]

Setting start_at='1999-01-01 07:30'. The YYYY-MM part ('1999-01') is a placeholder.

>>> itv = monthly_intervals(index, months=None, aggregation=1, start_at='1999-01-01 07:30',
...                         closed_left=True, closed_right=True)
>>> print('{}, ..., {}'.format(itv[0], itv[-1]))
[Timestamp('1990-01-01 07:30:00'), Timestamp('1990-02-01 07:30:00')], ...,
[Timestamp('2019-12-01 07:30:00'), Timestamp('2020-01-01 07:30:00')]

closed_right=False.

>>> itv = monthly_intervals(index, months=None, aggregation=1, start_at='1999-01-01 07:30',
...                         closed_left=True, closed_right=False)
>>> print('{}, ..., {}'.format(itv[0], itv[-1]))
[Timestamp('1990-01-01 07:30:00'), Timestamp('1990-02-01 07:29:59.999999')], ...,
[Timestamp('2019-12-01 07:30:00'), Timestamp('2020-01-01 07:29:59.999999')]

aggregation=2.

>>> itv = monthly_intervals(index, months=None, aggregation=2, start_at='1999-01-01 07:30',
...                         closed_left=True, closed_right=False)
>>> print('{}, ..., {}'.format(itv[0], itv[-1]))
[Timestamp('1990-01-01 07:30:00'), Timestamp('1990-03-01 07:29:59.999999')], ...,
[Timestamp('2019-11-01 07:30:00'), Timestamp('2020-01-01 07:29:59.999999')]

months=[1, 4, 7, 10].

>>> itv = monthly_intervals(index, months=[1, 4, 7, 10], aggregation=3,
...                         start_at='1999-01-01 07:30', closed_left=True, closed_right=False)
>>> itv[:5]
[[Timestamp('1990-01-01 07:30:00'), Timestamp('1990-04-01 07:29:59.999999')],
[Timestamp('1990-04-01 07:30:00'), Timestamp('1990-07-01 07:29:59.999999')],
[Timestamp('1990-07-01 07:30:00'), Timestamp('1990-10-01 07:29:59.999999')],
[Timestamp('1990-10-01 07:30:00'), Timestamp('1991-01-01 07:29:59.999999')],
[Timestamp('1991-01-01 07:30:00'), Timestamp('1991-04-01 07:29:59.999999')]]

Negative aggregation (aggregation=-3). Note that the first aggregation [1989-12-31 07:30:00, 1990-02-01 07:29:59.999999] is ignored because the time series starts at 1990-01-01 07:30:00.

>>> itv = monthly_intervals(index, months=[2, 5, 8, 11], aggregation=-3,
...                         start_at='1999-01-01 07:30', closed_left=True, closed_right=False)
>>> itv[:5]
[[Timestamp('1990-02-01 07:30:00'), Timestamp('1990-05-01 07:29:59.999999')],
[Timestamp('1990-05-01 07:30:00'), Timestamp('1990-08-01 07:29:59.999999')],
[Timestamp('1990-08-01 07:30:00'), Timestamp('1990-11-01 07:29:59.999999')],
[Timestamp('1990-11-01 07:30:00'), Timestamp('1991-02-01 07:29:59.999999')],
[Timestamp('1991-02-01 07:30:00'), Timestamp('1991-05-01 07:29:59.999999')]]

caeli.time_series.monthly_series(sr, rule='sum', months=None, aggregation=1, start_at=None, closed_left=True, closed_right=False, label='right', is_sorted=False, time_format='d')

Return the series resampled to the months listed in months, aggregating over aggregation adjacent months. The default resampling rule is sum.

Parameters
  • sr (pandas.Series, pandas.DataFrame) – pandas.Series or pandas.DataFrame with a DatetimeIndex as index. A series of any frequency is aggregated to month(s)

  • rule (str) – resample rule. Default rule='sum'

  • months – see monthly_intervals()

  • aggregation – see monthly_intervals()

  • start_at – see monthly_intervals()

  • closed_left – see monthly_intervals()

  • closed_right – see monthly_intervals()

  • label (str) – 'right' to set the index at the end and 'left' to set it at the beginning of each interval in the time series. Default label='right'

  • is_sorted (bool) – True if the input time series is already sorted, otherwise False. Default is_sorted=False

  • time_format (str, None) – 'd' (day/date): set hour, minute, second, and fractional seconds to 0; 'h' (hour): set minute, second, and fractional seconds to 0; 'm' (minute): set second and fractional seconds to 0; 's' (second): set fractional seconds to 0; None: do not round anything

Returns

(pandas.DataFrame, pandas.Series): monthly time series
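The effect of time_format on the index labels can be pictured with a small standard-library sketch. The name truncate_sketch is hypothetical, and truncation (rather than rounding to the nearest unit) is an assumption, chosen because a label such as 1990-03-01 07:29:59.999999 is shown as 1990-03-01 when time_format='d'.

```python
from datetime import datetime


def truncate_sketch(ts, time_format='d'):
    """Zero out the parts of ts that are finer than time_format."""
    if time_format is None:
        return ts
    if time_format == 'd':
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if time_format == 'h':
        return ts.replace(minute=0, second=0, microsecond=0)
    if time_format == 'm':
        return ts.replace(second=0, microsecond=0)
    if time_format == 's':
        return ts.replace(microsecond=0)
    raise ValueError("time_format must be 'd', 'h', 'm', 's' or None")


label = datetime(1990, 3, 1, 7, 29, 59, 999999)
print(truncate_sketch(label, 'd'))
print(truncate_sketch(label, None))
```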

For the examples below the following time series will be used:

>>> import numpy as np
>>> import pandas as pd
>>> from caeli.time_series import monthly_series
>>> np.random.seed(1)
>>> index = pd.date_range('1990-01-01 07:30', '2020-01-01 07:30', freq='1d')
>>> p = np.random.normal(2, 0.1, size=len(index))
>>> p[p < 0.0] = 0.0
>>> sr_daily = pd.Series(p, index=index)
>>> sr_daily
1990-01-01 07:30:00    2.162435
1990-01-02 07:30:00    1.938824
                         ...
2019-12-31 07:30:00    1.937972
2020-01-01 07:30:00    2.081355
Freq: D, Length: 10958, dtype: float64

Right labeled, showing date only:

>>> sr_monthly = monthly_series(sr_daily, aggregation=2, start_at='1999-01-01 07:30')
>>> sr_monthly
1990-03-01    117.961372
1990-04-01    118.789945
                 ...
2019-12-01    122.096353
2020-01-01    123.361334
Length: 359, dtype: float64

Right labeled, showing the full date/time:

>>> sr_monthly = monthly_series(sr_daily, aggregation=2, start_at='1999-01-01 07:30',
...                             time_format=None)
>>> sr_monthly
1990-03-01 07:29:59.999999    117.961372
1990-04-01 07:29:59.999999    118.789945
                                 ...
2019-12-01 07:29:59.999999    122.096353
2020-01-01 07:29:59.999999    123.361334
Length: 359, dtype: float64

Left labeled, showing date only:

>>> sr_monthly = monthly_series(sr_daily, aggregation=2, start_at='1999-01-01 07:30',
...                             label='left')
>>> sr_monthly
1990-01-01    117.961372
1990-02-01    118.789945
                 ...
2019-10-01    122.096353
2019-11-01    123.361334
Length: 359, dtype: float64

Left labeled, showing the full date/time:

>>> sr_monthly = monthly_series(sr_daily, aggregation=2, start_at='1999-01-01 07:30',
...                             label='left', time_format=None)
>>> sr_monthly
1990-01-01 07:30:00    117.961372
1990-02-01 07:30:00    118.789945
                          ...
2019-10-01 07:30:00    122.096353
2019-11-01 07:30:00    123.361334
Length: 359, dtype: float64

caeli.time_series.months_split_annually(sr, rule='sum', months=None, aggregation=1, start_at=None, closed_left=True, closed_right=False, label='left', is_sorted=False, time_format='d', prefix='M')

Return a pandas.DataFrame with aggregated months as columns and year as index.

Parameters
Returns

(pandas.DataFrame) with aggregated months as columns and year as index

For the examples below the following time series will be used:

>>> import numpy as np
>>> import pandas as pd
>>> from caeli.time_series import months_split_annually
>>> np.random.seed(1)
>>> index = pd.date_range('1990-01-01 07:30', '2020-01-01 07:30', freq='1d')
>>> p = np.random.normal(2, 0.1, size=len(index))
>>> p[p < 0.0] = 0.0
>>> sr_daily = pd.Series(p, index=index)
>>> sr_daily
1990-01-01 07:30:00    2.162435
1990-01-02 07:30:00    1.938824
                         ...
2019-12-31 07:30:00    1.937972
2020-01-01 07:30:00    2.081355
Freq: D, Length: 10958, dtype: float64
>>> spy = months_split_annually(sr_daily, aggregation=2, start_at='1999-01-01 07:30')
>>> print(spy)
          M01-02      M02-03  ...      M11-12      M12-01
year                          ...
1990  117.961372  118.789945  ...  121.819112  123.028979
1991  117.760247  118.953375  ...  121.958717  123.601324
...          ...         ...  ...         ...         ...
2018  117.549323  117.780231  ...  121.336530  122.549497
2019  116.797959  117.721573  ...  123.361334         NaN

[30 rows x 12 columns]

caeli.time_series.slice_by_timestamp(df, beg_timestamp=Timestamp('1677-09-21 00:12:43.145225'), end_timestamp=Timestamp('2262-04-11 23:47:16.854775807'))

Slice the data frame by its index, from beg_timestamp to end_timestamp, including the latter.

Parameters
  • df (pandas.DataFrame or pandas.Series) – data frame

  • beg_timestamp (datetime.datetime, pandas.Timestamp, or numpy.datetime64) – begin of slice

  • end_timestamp (datetime.datetime, pandas.Timestamp, or numpy.datetime64) – end of slice (inclusive)

Returns

(pandas.DataFrame or pandas.Series) sliced data frame
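An inclusive slice of this kind can be sketched with label-based indexing in pandas, where both ends of a .loc slice are included. The name slice_by_timestamp_sketch is hypothetical, and using None as an open bound is a simplification of the documented default timestamps; the sketch also assumes a sorted DatetimeIndex.

```python
import pandas as pd


def slice_by_timestamp_sketch(df, beg_timestamp=None, end_timestamp=None):
    """Return the rows whose index lies in [beg_timestamp, end_timestamp]."""
    # label-based slicing with .loc includes both end points
    return df.loc[beg_timestamp:end_timestamp]


index = pd.date_range('1990-01-01 07:30', periods=5, freq='D')
sr = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0], index=index)
part = slice_by_timestamp_sketch(sr,
                                 pd.Timestamp('1990-01-02 07:30'),
                                 pd.Timestamp('1990-01-04 07:30'))
print(part)
```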