Macroframework Forecasting Package API

class macroframe_forecast.MFF.MFF(df: DataFrame, forecaster: BaseForecaster = None, equality_constraints: list[str] = [], inequality_constraints: list[str] = [], parallelize: bool = True, n_forecast_error: int = 5, shrinkage_method: str = 'oas', default_lam: float = -1, max_lam: float = 129600)

Bases: object

A class for Macro-Framework Forecasting (MFF).

This class facilitates forecasting of single-frequency time series data using a two-step process. The first step generates unconstrained forecasts using the specified forecaster. The second step reconciles these forecasts so that they satisfy the supplied constraints while keeping the forecast paths smooth.

Parameters:

df: pd.DataFrame: Input dataframe containing time series data. Data should be in wide format, with each row containing data for one period and each column containing data for one variable.
forecaster: BaseForecaster, optional (default: None): sktime BaseForecaster descendant. If not defined, DefaultForecaster is used.
equality_constraints: list[str], optional (default: []): Constraints that hold with equality. A constraint may include a wildcard, in which case it is applied across all horizons, or it may be defined for specific time periods.
inequality_constraints: list[str], optional (default: []): Inequality constraints, specified in the same way as equality_constraints. A constraint may include a wildcard, in which case it is applied across all horizons, or it may be defined for specific time periods.
parallelize: bool, optional (default: True): Whether to parallelize generation of the first-step forecasts.
n_forecast_error: int, optional (default: 5): Number of windows used to split the data into training and testing sets when generating the matrix of forecast errors.
shrinkage_method: str, optional (default: ‘oas’): Method used to shrink the sample covariance matrix. The default is the Oracle Approximating Shrinkage estimator (‘oas’). Other options are identity and monotone_diagonal.
default_lam: float, optional (default: -1): Value of lambda used to calculate the smoothing parameter if the frequency of observations cannot be determined from the index names. If set to -1, lambda is calculated empirically.
max_lam: float, optional (default: 129600): Maximum value of lamstar used for smoothing forecasts when it is estimated empirically.

Methods

fit()

Fits the model and generates reconciled forecasts for the input dataframe subject to defined constraints.

Returns:

df2: pd.DataFrame: Output dataframe with all reconciled forecasts filled into the original input.
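
As a rough illustration of the two-step idea (not the package's internal code), the sketch below takes made-up first-step forecasts `y1` and projects them onto a linear equality constraint `C y = d` using identity weights. The variable names and numbers are assumptions for the example; the package additionally weights by forecast-error covariance and smooths the result.

```python
import numpy as np

# Hypothetical first-step forecasts for one period:
# gdp, consumption, investment. They do not add up yet.
y1 = np.array([100.0, 62.0, 35.0])

# Accounting identity gdp - consumption - investment = 0, written as C y = d.
C = np.array([[1.0, -1.0, -1.0]])
d = np.array([0.0])

# Least-squares projection of y1 onto {y : C y = d} (identity weights).
adjustment = C.T @ np.linalg.solve(C @ C.T, C @ y1 - d)
y2 = y1 - adjustment

print(y2)      # reconciled forecasts
print(C @ y2)  # constraint residual, numerically zero
```

The discrepancy of 3 is spread evenly across the three variables because every variable carries the same weight here.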


macroframe_forecast.utils.AddIslandsToConstraints(C: DataFrame, d: DataFrame, islands: Series) → tuple[DataFrame, DataFrame]

Add island values into the matrix-form equality constraints constructed by StringToMatrixConstraints.

Parameters:

C: pd.DataFrame: Dataframe containing the matrix of linear constraints on the left side of the equation Cy=d.
d: pd.DataFrame: Dataframe containing the matrix of linear constraints on the right side of the equation Cy=d.
islands: pd.Series: Series containing island values to be introduced into the linear equation.

Returns:

C_aug: pd.DataFrame: Dataframe containing the augmented C matrix, with island values incorporated.
d_aug: pd.DataFrame: Dataframe containing the augmented d vector, with island values incorporated.

Examples

>>> import numpy as np
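>>> import pandas as pd
>>> from macroframe_forecast.utils import AddIslandsToConstraints, OrganizeCells, StringToMatrixConstraints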
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
>>>                   columns=['a','b'],
>>>                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:-1,:1] = np.nan
>>> df0, all_cells, unknown_cells, known_cells, islands = OrganizeCells(df)
>>> df0_stacked = df0.T.stack()
>>> constraints_with_wildcard = ['a?+b?']
>>> C,d = StringToMatrixConstraints(df0_stacked,
>>>                                 all_cells,
>>>                                 unknown_cells,
>>>                                 known_cells,
>>>                                 constraints_with_wildcard)
>>> C,d = AddIslandsToConstraints(C,d,islands)
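
One plausible way to picture the augmentation is that each island becomes an extra equality row fixing that cell to its known value. The sketch below assumes flat string cell names and a single existing constraint; both are hypothetical simplifications, and the package may index cells differently.

```python
import pandas as pd

# Hypothetical stacked cell names and one existing constraint row.
cells = ["a_2028", "a_2029", "b_2028", "b_2029"]
C = pd.DataFrame([[1.0, 0.0, 1.0, 0.0]], columns=cells)  # a_2028 + b_2028 = 10
d = pd.DataFrame([[10.0]])

# A known future value (island): a_2029 is fixed at 4.2.
islands = pd.Series({"a_2029": 4.2})

# One extra row per island: coefficient 1 on that cell, island value on the right.
C_rows = pd.DataFrame(0.0, index=range(len(islands)), columns=cells)
for i, cell in enumerate(islands.index):
    C_rows.loc[i, cell] = 1.0
d_rows = pd.DataFrame(islands.to_numpy().reshape(-1, 1))

C_aug = pd.concat([C, C_rows], ignore_index=True)
d_aug = pd.concat([d, d_rows], ignore_index=True)
```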

macroframe_forecast.utils.BreakDataFrameIntoTimeSeriesList(df0: DataFrame, df1: DataFrame, pred: DataFrame, true: DataFrame) → tuple[list[DataFrame], list[DataFrame], list[DataFrame]]

Transform the relevant dataframes into lists for the ensuing reconciliation step.

Parameters:

df0: pd.DataFrame: Dataframe with all known and unknown values, without any islands.
df1: pd.DataFrame: Dataframe with unknown values as well as islands filled in with first-step forecasts.
pred: pd.DataFrame: Dataframe with in-sample predictions generated using pseudo-historical datasets, output from GenPredTrueData.
true: pd.DataFrame: Dataframe with actual values of the variable corresponding to the predicted values contained in pred.

Returns:

ts_list: list: List containing all first-step out-of-sample forecasts.
pred_list: list: List of dataframes, with each dataframe containing in-sample forecasts for one variable.
true_list: list: List of dataframes, with each dataframe containing the actual values for a variable corresponding to in-sample predictions stored in pred_list.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from sktime.forecasting.compose import YfromX
>>> from sklearn.linear_model import ElasticNetCV
>>> from macroframe_forecast.utils import (BreakDataFrameIntoTimeSeriesList,
>>>                                        FillAllEmptyCells, GenPredTrueData)
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
>>>                   columns=['a','b'],
>>>                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:,:1] = np.nan
>>> def DefaultForecaster():
>>>     return YfromX(ElasticNetCV(max_iter=5000))
>>> forecaster = DefaultForecaster()
>>> # df contains no islands here, so it is passed as df0 directly
>>> df1,df1_models = FillAllEmptyCells(df,forecaster)
>>> pred,true,model = GenPredTrueData(df,forecaster)
>>> ts_list,pred_list,true_list = BreakDataFrameIntoTimeSeriesList(df,df1,pred,true)

macroframe_forecast.utils.CheckTrainingSampleSize(df0: DataFrame, n_forecast_error: int = 5) → bool

Check the sample size available for the training window. Raise an exception if the number of observations available is too low.

Parameters:

df0: pd.DataFrame: Input dataframe with island values replaced by nan.
n_forecast_error: int: Number of training and testing sets to split data into for generating the matrix of forecast errors.

Returns:

small_sample: bool: Indicator for whether the sample of observations available for training is small.

macroframe_forecast.utils.CleanIslands(df: DataFrame) → tuple[DataFrame, Series]

Separate island values from input dataframe, replacing them with nan. Called by OrganizeCells.

Parameters:

df: pd.DataFrame: Input dataframe with raw data.

Returns:

df_no_islands: pd.DataFrame: Dataframe with island values replaced by nan.
islands: pd.Series: Series containing island values.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from macroframe_forecast.utils import CleanIslands
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
>>>                   columns=['a','b'],
>>>                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:-1,:1] = np.nan
>>> df0, islands = CleanIslands(df)
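
To make the notion of an island concrete, the sketch below assumes an island is any known value that appears after the first missing value in its column. The helper `split_islands` is hypothetical and only mimics the documented inputs and outputs; it is not the package's implementation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, 2.0, np.nan, np.nan, 5.0], "b": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=[2020, 2021, 2022, 2023, 2024],
)

def split_islands(frame):
    """Hypothetical helper: blank out known values that follow a gap."""
    no_islands = frame.copy()
    islands = {}
    for col in frame.columns:
        missing = frame[col].isna()
        # True from the first missing value onwards.
        seen_gap = missing.astype(int).cummax().astype(bool)
        after_gap = (seen_gap & ~missing).to_numpy()
        for row in frame.index[after_gap]:
            islands[(col, row)] = frame.loc[row, col]
            no_islands.loc[row, col] = np.nan
    return no_islands, pd.Series(islands, dtype=float)

df_no_islands, islands = split_islands(df)
```

Here the 2024 value of `a` is the only island: it is moved into `islands` and replaced by nan in `df_no_islands`, while column `b` is left untouched.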

macroframe_forecast.utils.DefaultForecaster(small_sample: bool = False) → BaseForecaster

Set up the forecasting pipeline, specifying the scaling (transformation) to be applied and the forecasting model to be used.

Parameters:

small_sample: bool: Indicator for whether the sample of observations available for training is small. The default is False.

Returns:

gscv: BaseForecaster: Instance of sktime’s grid search forecaster, derived from BaseForecaster, which is configured for hyperparameter tuning and model selection.

macroframe_forecast.utils.FillAllEmptyCells(df: DataFrame, forecaster: BaseForecaster, parallelize: bool = True) → tuple[DataFrame, DataFrame]

Generate forecasts for all unknown cells in the supplied dataframe. All forecasts are made independently of each other.

Parameters:

df: pd.DataFrame: Dataframe containing known values of all variables and nan for unknown values.
forecaster: BaseForecaster: sktime BaseForecaster descendant.
parallelize: bool: Whether to parallelize generation of the first-step forecasts. The default is True.

Examples

>>> from string import ascii_lowercase
>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.linear_model import ElasticNetCV
>>> from sktime.forecasting.compose import YfromX
>>> from macroframe_forecast.utils import FillAllEmptyCells
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
>>>                   columns=list(ascii_lowercase[:p]),
>>>                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:,:1] = np.nan
>>> def DefaultForecaster():
>>>     return YfromX(ElasticNetCV(max_iter=5000))
>>> df1,df1_models = FillAllEmptyCells(df,DefaultForecaster())

macroframe_forecast.utils.FillAnEmptyCell(df: DataFrame, row: int | str, col: int | str, forecaster: BaseForecaster) → tuple[float, BaseForecaster]

Generate a forecast for a given cell based on the latest known value for the given column (variable), using the predefined forecasting pipeline. Called by FillAllEmptyCells.

Parameters:

df: pd.DataFrame: Dataframe containing known values of all variables and nan for unknown values.
row: int | str: Row index of the cell to be forecasted.
col: int | str: Column index of the cell to be forecasted.
forecaster: BaseForecaster: sktime BaseForecaster descendant.

Returns:

y_pred: float: Forecasted value of the variable for the given horizon.
forecaster: BaseForecaster: sktime BaseForecaster descendant.

Examples

>>> from string import ascii_lowercase
>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.linear_model import ElasticNetCV
>>> from sktime.forecasting.compose import YfromX
>>> from macroframe_forecast.utils import FillAnEmptyCell
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
>>>                   columns=list(ascii_lowercase[:p]),
>>>                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:,:1] = np.nan
>>> row = df.index[-1]
>>> col = df.columns[0]
>>> forecaster = YfromX(ElasticNetCV())
>>> y_pred, forecaster = FillAnEmptyCell(df,row,col,forecaster)

macroframe_forecast.utils.GenLamstar(pred_list: list, true_list: list, default_lam: float = -1, max_lam: float = 129600) → Series

Calculate the smoothness parameter (lambda) associated with each variable being forecasted.

Parameters:

pred_list: list: List of dataframes, with each dataframe containing in-sample forecasts for one variable.
true_list: list: List of dataframes, with each dataframe containing the actual values for a variable corresponding to in-sample predictions stored in pred_list.
default_lam: float, optional (default: -1): Value of lambda used to calculate the smoothing parameter if the frequency of observations cannot be determined from the index names. If set to -1, lambda is calculated empirically.
max_lam: float, optional (default: 129600): Upper bound of the HP filter penalty term (lambda) searched by the scipy minimizer.

Returns:

lamstar: pd.Series: Series containing smoothing parameters to be used for each variable.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from macroframe_forecast.utils import GenLamstar
>>> def make_list():
>>>     return [pd.DataFrame(np.random.rand(5, 5),
>>>                          columns=pd.MultiIndex.from_product([[v], [f'Col{i+1}' for i in range(5)]]))
>>>             for v in ['A', 'B']]
>>> pred_list, true_list = make_list(), make_list()
>>> lamstar = GenLamstar(pred_list, true_list)
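
For orientation on the scale of lambda, the sketch below computes the conventional HP filter values under the Ravn-Uhlig frequency rule. The default `max_lam` of 129600 coincides with the conventional monthly value. The function name `hp_lambda` is hypothetical, and how the package maps index frequency to lambda is not stated in this reference, so treat this as background rather than as the package's rule.

```python
def hp_lambda(periods_per_year: int) -> float:
    """Ravn-Uhlig rule: scale the quarterly value 1600 by (frequency ratio)**4."""
    return 1600.0 * (periods_per_year / 4) ** 4

for name, freq in [("annual", 1), ("quarterly", 4), ("monthly", 12)]:
    print(name, hp_lambda(freq))
```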

macroframe_forecast.utils.GenPredTrueData(df: DataFrame, forecaster: BaseForecaster, n_forecast_error: int = 5, parallelize: bool = True) → tuple[DataFrame, DataFrame, DataFrame]

Generate in-sample forecasts from existing data by constructing pseudo-historical datasets.

Parameters:

df: pd.DataFrame: Dataframe with all known as well as unknown values.
forecaster: BaseForecaster: sktime BaseForecaster descendant.
n_forecast_error: int, optional: Number of horizons for which in-sample forecasts are generated. The default is 5.
parallelize: bool, optional: Whether parallelization should be used. The default is True.

Returns:

pred: pd.DataFrame: Dataframe with in-sample predictions generated using pseudo-historical datasets.
true: pd.DataFrame: Dataframe with actual values of the variable corresponding to predicted values contained in pred.
model: pd.DataFrame: Dataframe with information on the models used for generating each forecast.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from sktime.forecasting.compose import YfromX
>>> from sklearn.linear_model import ElasticNetCV
>>> from macroframe_forecast.utils import GenPredTrueData
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
>>>                   columns=['a','b'],
>>>                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:,:1] = np.nan
>>> def DefaultForecaster():
>>>     return YfromX(ElasticNetCV(max_iter=5000))
>>> pred,true,model = GenPredTrueData(df,DefaultForecaster(),parallelize=False)
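
The idea of pseudo-historical datasets can be sketched as a rolling-origin evaluation: truncate the known history at several cut-offs, forecast from each, and compare with what actually happened. The example below uses a naive last-value forecaster as a stand-in and made-up data; the cut-off scheme and column labels are assumptions, not the package's exact layout.

```python
import numpy as np
import pandas as pd

y = pd.Series(np.arange(10, dtype=float), index=range(2015, 2025))
horizon = 2            # forecast steps per pseudo-dataset
n_forecast_error = 3   # number of pseudo-historical cut-offs

pred_rows, true_rows = [], []
for k in range(n_forecast_error):
    cutoff = len(y) - horizon - k  # number of observations kept for training
    train, test = y.iloc[:cutoff], y.iloc[cutoff:cutoff + horizon]
    # Naive stand-in for the forecaster: repeat the last training value.
    pred_rows.append([train.iloc[-1]] * horizon)
    true_rows.append(test.to_list())

cols = [f"h{h + 1}" for h in range(horizon)]
pred = pd.DataFrame(pred_rows, columns=cols)
true = pd.DataFrame(true_rows, columns=cols)
errors = pred - true  # one row of forecast errors per cut-off
```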

macroframe_forecast.utils.GenSmoothingMatrix(W: DataFrame, lamstar: Series) → DataFrame

Generate a symmetric smoothing matrix using the optimal lambda and the weighting matrix.

Parameters:

W: pd.DataFrame: Dataframe containing the weighting matrix.
lamstar: pd.Series: Series containing smoothing parameters to be used for each variable.

Returns:

Phi: pd.DataFrame: Dataframe containing the smoothing matrix.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from macroframe_forecast.utils import GenLamstar, GenSmoothingMatrix, GenWeightMatrix
>>> pred_list_1 = [pd.DataFrame(np.random.rand(5, 5),
>>>                             columns=pd.MultiIndex.from_product([['A'], [f'Col{i+1}' for i in range(5)]])) if i == 0 else
>>>                pd.DataFrame(np.random.rand(5, 5),
>>>                             columns=pd.MultiIndex.from_product([['B'], [f'Col{i+1}' for i in range(5)]]))
>>>                for i in range(2)]
>>> true_list_1 = [pd.DataFrame(np.random.rand(5, 5),
>>>                             columns=pd.MultiIndex.from_product([['A'], [f'Col{i+1}' for i in range(5)]])) if i == 0 else
>>>                pd.DataFrame(np.random.rand(5, 5),
>>>                             columns=pd.MultiIndex.from_product([['B'], [f'Col{i+1}' for i in range(5)]]))
>>>                for i in range(2)]
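>>> smoothness = GenLamstar(pred_list_1,true_list_1)
>>> W,shrinkage = GenWeightMatrix(pred_list_1,true_list_1)
>>> Phi = GenSmoothingMatrix(W,smoothness)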

macroframe_forecast.utils.GenVecForecastWithIslands(ts_list: list[DataFrame], islands: list[Series]) → Series

Overwrite forecasted values for islands with the known island values.

Parameters:

ts_list: list: List of all first-step forecasted values.
islands: pd.Series: Series containing island values.

Returns:

y1: pd.Series: Series of forecasted values with island values incorporated.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from sktime.forecasting.compose import YfromX
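>>> from sklearn.linear_model import ElasticNetCV
>>> from macroframe_forecast.utils import FillAllEmptyCells, GenVecForecastWithIslands, OrganizeCells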
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
>>>                   columns=['a','b'],
>>>                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:-1,:1] = np.nan
>>> df0, all_cells, unknown_cells, known_cells, islands = OrganizeCells(df)
>>> def DefaultForecaster():
>>>     return YfromX(ElasticNetCV(max_iter=5000))
>>> df1,df1_models = FillAllEmptyCells(df,DefaultForecaster(),parallelize=False)
>>> ts_list = [df1[df0.isna()].loc[:,col:col].dropna().T.stack() for col in df0.columns[df.isna().any()]]
>>> y1 = GenVecForecastWithIslands(ts_list,islands)
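
The overwrite step itself is simple to picture with plain pandas. The sketch below uses a made-up forecast vector indexed by (variable, period) and lets a known island value replace the forecast for the matching cell; the index layout is an assumption for illustration.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([("a", 2026), ("a", 2027), ("a", 2028)])
y1 = pd.Series([1.1, 1.2, 1.3], index=idx)  # first-step forecasts
islands = pd.Series([9.9], index=pd.MultiIndex.from_tuples([("a", 2027)]))

y1_with_islands = y1.copy()
y1_with_islands.update(islands)  # known values replace forecasts, aligned on index
```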

macroframe_forecast.utils.GenWeightMatrix(pred_list: list[DataFrame], true_list: list[DataFrame], shrinkage_method: Literal['oas', 'oasd'] = 'oas') → tuple[DataFrame, float]

Generate the weighting matrix based on in-sample forecasts and actual values for the corresponding periods.

Parameters:

pred_list: list: List of dataframes, with each dataframe containing in-sample forecasts for one variable.
true_list: list: List of dataframes, with each dataframe containing the actual values for a variable corresponding to in-sample predictions stored in pred_list.
shrinkage_method: str, optional: Algorithm used to shrink the covariance matrix, with options identity, oas and oasd. The default is ‘oas’.

Returns:

W: pd.DataFrame: Weighting matrix to be used for reconciliation.
shrinkage: float: Shrinkage parameter associated with the weight. NaN if identity is selected as the method.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from macroframe_forecast.utils import GenWeightMatrix
>>> pred_list = [pd.DataFrame(np.random.rand(5, 5), columns=[f'Col{i+1}' for i in range(5)]) for _ in range(2)]
>>> true_list = [pd.DataFrame(np.random.rand(5, 5), columns=[f'Col{i+1}' for i in range(5)]) for _ in range(2)]
>>> W,shrinkage = GenWeightMatrix(pred_list, true_list)
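
The general shape of a shrunk forecast-error covariance can be sketched with numpy alone. The example below computes the sample covariance of made-up forecast errors and shrinks it toward a scaled identity with a hand-picked intensity `alpha`; estimators such as OAS choose that intensity from the data, so this is an illustration of the form rather than of the package's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
pred = rng.normal(size=(8, 3))                    # hypothetical in-sample forecasts
true = pred + rng.normal(scale=0.5, size=(8, 3))  # hypothetical realised values

errors = pred - true
S = np.cov(errors, rowvar=False)  # sample covariance of forecast errors

# Shrink toward a scaled identity; alpha is fixed by hand in this sketch.
alpha = 0.3
p = S.shape[0]
target = np.trace(S) / p * np.eye(p)
W = (1 - alpha) * S + alpha * target
```

Shrinking toward a scaled identity keeps the total variance (the trace) unchanged while pulling the eigenvalues together, which keeps the matrix well conditioned when few forecast errors are available.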

macroframe_forecast.utils.HP_matrix(size: int) → ndarray

Create the degenerate penta-diagonal matrix (the one used in the HP filter), with dimensions (size x size).

Parameters:

size: int: Number of rows for the square matrix.

Returns:

F: np.ndarray: Array containing the F matrix.
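
A minimal sketch of such a matrix, assuming the textbook HP filter construction F = DᵀD with D the second-difference operator. The function name `hp_penalty_matrix` is hypothetical, and the package's HP_matrix may build the same matrix differently.

```python
import numpy as np

def hp_penalty_matrix(size: int) -> np.ndarray:
    """Return D.T @ D, where D takes second differences of a length-`size` vector."""
    D = np.zeros((size - 2, size))
    for i in range(size - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D.T @ D

F = hp_penalty_matrix(5)
print(F)
```

The matrix is degenerate in the sense that it is singular: constants and linear trends have zero second differences, so they lie in its null space and carry no smoothness penalty.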

macroframe_forecast.utils.OrganizeCells(df: DataFrame) → tuple[DataFrame, Series, Series, Series]

Extract island values (if any) from the input dataframe, replacing them with nan values. This is useful for generating first-step forecasts, which disregard known island values for the prediction. Also builds separate pandas Series of cell names for known and unknown values in the input dataframe.

Parameters:

df: pd.DataFrame: Input dataframe with raw data.

Returns:

df0: pd.DataFrame: Dataframe with island values replaced by nan.
all_cells: pd.Series: Series containing cell names of all cells in the input dataframe.
unknown_cells: pd.Series: Series containing cell names of cells whose values are to be forecasted.
known_cells: pd.Series: Series containing cell names of cells whose values are known.
islands: pd.Series: Series containing island values.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from macroframe_forecast.utils import OrganizeCells
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
>>>                   columns=['a','b'],
>>>                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:-1,:1] = np.nan
>>> df0, all_cells, unknown_cells, known_cells, islands = OrganizeCells(df)

macroframe_forecast.utils.Reconciliation(y1: Series, W: DataFrame, Phi: DataFrame, C: DataFrame, d: DataFrame, C_ineq: DataFrame | None = None, d_ineq: DataFrame | None = None) → DataFrame

Reconcile first-step forecasts so that they satisfy the equality and inequality constraints, subject to smoothing.

Parameters:

y1: pd.Series: Series of all forecasted and island values.
W: pd.DataFrame: Dataframe containing the weighting matrix.
Phi: pd.DataFrame: Dataframe containing the smoothing matrix.
C: pd.DataFrame: Dataframe containing the matrix of linear constraints on the left side of the equality constraint Cy=d.
d: pd.DataFrame: Dataframe containing the matrix of linear constraints on the right side of the equality constraint Cy=d.
C_ineq: pd.DataFrame, optional: Dataframe containing the matrix of linear constraints on the left side of the inequality constraint Cy <= d. The default is None.
d_ineq: pd.DataFrame, optional: Dataframe containing the matrix of linear constraints on the right side of the inequality constraint Cy <= d. The default is None.

Returns:

y2: pd.DataFrame: Dataframe containing the final reconciled forecasts for all variables.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from sktime.forecasting.compose import YfromX
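>>> from sklearn.linear_model import ElasticNetCV
>>> from macroframe_forecast.utils import (BreakDataFrameIntoTimeSeriesList, FillAllEmptyCells,
>>>                                        GenLamstar, GenPredTrueData, GenSmoothingMatrix,
>>>                                        OrganizeCells, Reconciliation)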
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
>>>                   columns=['a','b'],
>>>                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:,:1] = np.nan
>>> df0, all_cells, unknown_cells, known_cells, islands = OrganizeCells(df)
>>> def DefaultForecaster():
>>>     return YfromX(ElasticNetCV(max_iter=5000))
>>> forecaster = DefaultForecaster()
>>> df1,df1_models = FillAllEmptyCells(df0,forecaster,parallelize=False)
>>> pred,true,model = GenPredTrueData(df0,forecaster,parallelize=False)
>>> ts_list,pred_list,true_list = BreakDataFrameIntoTimeSeriesList(df0,df1,pred,true)
>>> y1 = pd.concat(ts_list)
>>> C = pd.DataFrame(columns = y1.index).astype(float)
>>> d = pd.DataFrame().astype(float)
>>> W = pd.DataFrame(np.eye(len(y1)),index=y1.index,columns=y1.index)
>>> smoothness = GenLamstar(pred_list,true_list)
>>> Phi = GenSmoothingMatrix(W,smoothness)
>>> y2 = Reconciliation(y1,W,Phi,C,d)
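
To show how the pieces W, Phi, C and d could interact, the sketch below solves a small equality-constrained problem: minimise (y - y1)ᵀ W⁻¹ (y - y1) + yᵀ Phi y subject to C y = d, via its KKT linear system. This objective is an assumption chosen for illustration; the package's exact objective and its handling of inequality constraints are not specified in this reference.

```python
import numpy as np

y1 = np.array([3.0, 1.0, 4.0, 1.0])  # first-step forecasts for 4 periods
W = np.eye(4)                        # forecast-error weights (identity here)
D = np.diff(np.eye(4), n=2, axis=0)  # second-difference operator
Phi = 10.0 * D.T @ D                 # smoothness penalty
C = np.ones((1, 4))                  # constraint: the four values sum to 12
d = np.array([12.0])

# KKT system: [[W^-1 + Phi, C^T], [C, 0]] [y; nu] = [W^-1 y1; d]
W_inv = np.linalg.inv(W)
n, m = len(y1), len(d)
kkt = np.block([[W_inv + Phi, C.T], [C, np.zeros((m, m))]])
rhs = np.concatenate([W_inv @ y1, d])
y2 = np.linalg.solve(kkt, rhs)[:n]   # drop the multiplier, keep the forecasts
```

The result satisfies the constraint exactly and is much smoother than `y1`, because the penalty term discourages large second differences.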

macroframe_forecast.utils.StringToMatrixConstraints(df0_stacked: DataFrame, all_cells: Series, unknown_cells: Series, known_cells: Series, constraints_with_wildcard: list[str] | None = None, wildcard_string: str = '?') → tuple[DataFrame, DataFrame]

Convert equality constraints from list to matrix form for the horizons to be forecasted (Cy = d, where C and d are dataframes containing the linear constraints). The input dataframe should not be in standard wide format; instead, all columns should be stacked on one another. This is needed to handle mixed frequencies among observations. All island values in the input dataframe should be replaced by nan prior to this step.

Parameters:

df0_stacked: pd.Series: Stacked version of df0 (input dataframe with islands removed).
all_cells: pd.Series: Series containing cell names of all cells in the input dataframe.
unknown_cells: pd.Series: Series containing cell names of cells whose values are to be forecasted.
known_cells: pd.Series: Series containing cell names of cells whose values are known.
constraints_with_wildcard: list[str], optional: List of strings specifying the equality constraints that have to hold. The default is None.
wildcard_string: str, optional: String that is used as the wildcard identifier in a constraint. The default is ‘?’.

Returns:

C: pd.DataFrame: Dataframe containing the matrix of linear constraints on the left side of the equation Cy=d.
d: pd.DataFrame: Dataframe containing the matrix of linear constraints on the right side of the equation Cy=d.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from macroframe_forecast.utils import OrganizeCells, StringToMatrixConstraints
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
>>>                   columns=['a','b'],
>>>                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:-1,:1] = np.nan
>>> df0, all_cells, unknown_cells, known_cells, islands = OrganizeCells(df)
>>> df0_stacked = df0.T.stack()
>>> constraints_with_wildcard = ['a?+b?']
>>> C,d = StringToMatrixConstraints(df0_stacked,
>>>                                 all_cells,
>>>                                 unknown_cells,
>>>                                 known_cells,
>>>                                 constraints_with_wildcard)
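
To illustrate what converting a string constraint to matrix form means, the sketch below turns one linear constraint into a coefficient row and a right-hand-side value. The parser `constraint_to_row`, the flat cell names and the accepted syntax are all hypothetical; the package's own parsing and supported syntax may differ.

```python
import re

cells = ["a_2028", "a_2029", "b_2028", "b_2029"]

# Optional sign, optional numeric coefficient followed by '*', then a cell name.
TERM = re.compile(r"([+-]?)\s*(?:(\d+(?:\.\d+)?)\s*\*\s*)?([A-Za-z_]\w*)")

def constraint_to_row(constraint, cells):
    """Hypothetical parser for 'lhs = rhs'; a missing rhs is treated as 0."""
    lhs, _, rhs = constraint.partition("=")
    row = dict.fromkeys(cells, 0.0)
    for sign, coef, name in TERM.findall(lhs):
        value = float(coef) if coef else 1.0
        row[name] += -value if sign == "-" else value
    return [row[c] for c in cells], (float(rhs) if rhs.strip() else 0.0)

coeffs, d_value = constraint_to_row("a_2028 + 2*b_2028 - a_2029 = 5", cells)
print(coeffs, d_value)
```

Stacking one such row per constraint gives C, and the collected right-hand sides give d.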

macroframe_forecast.utils.expand_wildcard(constraints_with_alphabet_wildcard: list[str], var_list: Series, wildcard: str)

Expand constraints with a wildcard to all possible time periods. This is called within StringToMatrixConstraints, and the wildcard character has already been replaced by a random letter before this function is called.

Parameters:

constraints_with_alphabet_wildcard: list[str]: Linear equality constraints with the wildcard string replaced by a letter.
var_list: pd.Series: Series of names of all cells (known and unknown) in the raw dataframe.
wildcard: str: Letter that has replaced the wildcard string in the constraints.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from macroframe_forecast.utils import expand_wildcard
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
>>>                   columns=['a','b'],
>>>                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df0_stacked = df.T.stack()
>>> all_cells_index = df0_stacked.index
>>> var_list = pd.Series([f'{a}_{b}' for a, b in all_cells_index],
>>>                      index = all_cells_index)
>>> constraints_with_alphabet_wildcard = ['ax + bx']
>>> alphabet_wildcard = 'x'
>>> constraints = expand_wildcard(constraints_with_alphabet_wildcard,
>>>                               var_list = var_list,
>>>                               wildcard = alphabet_wildcard)
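
The effect of wildcard expansion can be pictured with plain string substitution. The sketch below assumes cell names of the form `variable_period`, as built for `var_list` in the example above, and simply substitutes each period suffix for the wildcard; the package derives the valid substitutions from `var_list` itself, which this sketch does not attempt.

```python
periods = [2028, 2029, 2030]
constraint = "a? + b?"
wildcard = "?"

# Replace the wildcard with each period suffix, matching names like 'a_2028'.
expanded = [constraint.replace(wildcard, f"_{p}") for p in periods]
print(expanded)
```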

macroframe_forecast.utils.find_permissible_wildcard(constraints_with_wildcard: list[str], _seed: int = 0) → str: Generate a random letter to be used in constraints.
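
One way such a letter could be chosen is to pick, reproducibly, a letter that does not already occur in any constraint, so that substituting it for the wildcard cannot collide with a variable name. The helper `pick_unused_letter` below is hypothetical and only sketches that idea; the package's selection rule is not described in this reference.

```python
import random
import string

def pick_unused_letter(constraints, seed=0):
    """Hypothetical helper: return a lowercase letter absent from every constraint."""
    used = set("".join(constraints))
    candidates = [c for c in string.ascii_lowercase if c not in used]
    if not candidates:
        raise ValueError("every lowercase letter already appears in the constraints")
    # A seeded generator makes the choice reproducible across calls.
    return random.Random(seed).choice(candidates)

letter = pick_unused_letter(["a? + b?", "gdp? - c?"])
```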

macroframe_forecast.utils.find_strings_to_replace_wildcard(constraint: str, var_list: Series, wildcard: str) → list[str]: Identify the list of strings to be substituted for the wildcard character.