Macroframework Forecasting Package API
- class macroframe_forecast.MFF.MFF(df: DataFrame, forecaster: BaseForecaster = None, equality_constraints: list[str] = [], inequality_constraints: list[str] = [], parallelize: bool = True, n_forecast_error: int = 5, shrinkage_method: str = 'oas', default_lam: float = -1, max_lam: float = 129600)
Bases: object
A class for Macro-Framework Forecasting (MFF).
This class forecasts single-frequency time series data in two steps. The first step generates unconstrained forecasts using the specified forecaster. The second step reconciles these forecasts so that they satisfy the supplied constraints while keeping the forecasts smooth.
- Parameters:
- dfpd.DataFrame
Input dataframe containing time series data. Data should be in wide format, with each row containing data for one period, and each column containing data for one variable.
- forecasterBaseForecaster, optional(default: None)
sktime BaseForecaster descendant. If not defined, then DefaultForecaster is used.
- equality_constraints: list[str], optional (default: [])
Constraints that hold with equality. A constraint may include a wildcard, in which case it is applied across all horizons, or it may be defined for specific time periods.
- inequality_constraints: list[str], optional (default: [])
Inequality constraints, comparable to equality_constraints. A constraint may include a wildcard, in which case it is applied across all horizons, or it may be defined for specific time periods.
- parallelize: boolean
Indicate whether parallelization should be employed for generating the first step forecasts. Default value is True.
- n_forecast_errorint
Number of windows to split data into training and testing sets for generating matrix of forecast errors. Default is 5.
- shrinkage_methodstr, optional(default: ‘oas’)
Method used to shrink the sample covariance matrix. Default is the Oracle Approximating Shrinkage estimator (‘oas’). Other options are identity and monotone_diagonal.
- default_lamfloat, optional(default: -1)
The value of lambda to be used for calculating smoothing parameter if frequency of observations cannot be determined from index names. If this is set to -1, lambda is calculated empirically. Default is -1.
- max_lamfloat, optional(default: 129600)
Maximum value of lamstar to be used for smoothing forecasts when being estimated empirically.
Methods
fit(): Fits the model and generates reconciled forecasts for the input dataframe subject to defined constraints.
- Returns:
- df2: pd.DataFrame
Output dataframe with all reconciled forecasts filled into the original input.
- fit()
Fits the model and generates reconciled forecasts for the input dataframe subject to defined constraints.
- macroframe_forecast.utils.AddIslandsToConstraints(C: DataFrame, d: DataFrame, islands: Series) -> tuple[DataFrame, DataFrame]
Add island values into the matrix-form equality constraints constructed by StringToMatrixConstraints.
- Parameters:
- Cpd.DataFrame
Dataframe containing matrix of the linear constraints on the left side of equation Cy=d.
- dpd.DataFrame
Dataframe containing matrix of the linear constraints on the right side of equation Cy=d.
- islandspd.Series
Series containing island values to be introduced into linear equation.
- Returns:
- C_augpd.DataFrame
Dataframe containing the augmented C matrix, with island values incorporated.
- d_augpd.DataFrame
Dataframe containing the augmented d vector, with island values incorporated.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
...                   columns=['a','b'],
...                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:-1,:1] = np.nan
>>> df0, all_cells, unknown_cells, known_cells, islands = OrganizeCells(df)
>>> df0_stacked = df0.T.stack()
>>> constraints_with_wildcard = ['a?+b?']
>>> C,d = StringToMatrixConstraints(df0_stacked,
...                                 all_cells,
...                                 unknown_cells,
...                                 known_cells,
...                                 constraints_with_wildcard)
>>> C,d = AddIslandsToConstraints(C,d,islands)
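One way to picture the augmentation is that each island contributes an extra equality row that pins the corresponding cell to its known value. The sketch below uses plain Python lists and hypothetical cell names; it illustrates the idea under that assumption and is not the package's implementation.

```python
# Illustrative sketch: each island becomes an extra equality row in C y = d.
cells = ["a_2027", "a_2028", "a_2029", "b_2027"]   # hypothetical cell names (columns of C)
C = [[1.0, 0.0, 0.0, 1.0]]                         # existing constraint: a_2027 + b_2027 = 0
d = [0.0]
islands = {"a_2029": 0.42}                         # known value inside the forecast horizon

C_aug = [row[:] for row in C]                      # copy so the inputs stay untouched
d_aug = d[:]
for cell, value in islands.items():
    # Row of zeros with a single 1 selecting the island cell
    C_aug.append([1.0 if name == cell else 0.0 for name in cells])
    d_aug.append(value)                            # forces y[cell] == value
```

After the loop, `C_aug` has one additional row per island and `d_aug` carries the matching known values.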
- macroframe_forecast.utils.BreakDataFrameIntoTimeSeriesList(df0: DataFrame, df1: DataFrame, pred: DataFrame, true: DataFrame) -> tuple[list[DataFrame], list[DataFrame], list[DataFrame]]
Transform relevant dataframes into lists for ensuing reconciliation step.
- Parameters:
- df0pd.DataFrame
Dataframe with all known and unknown values, without any islands.
- df1pd.DataFrame
Dataframe with unknown values as well as islands filled in with first step forecasts.
- pred: pd.DataFrame
Dataframe with in-sample predictions generated using pseudo-historical datasets, output from GenPredTrueData.
- true: pd.DataFrame
Dataframe with actual values of the variable corresponding to predicted values contained in pred.
- Returns:
- ts_listlist
List containing all first step out of sample forecasts.
- pred_listlist
List of dataframes, with each dataframe containing in-sample forecasts for one variable.
- true_listlist
List of dataframes, with each dataframe containing the actual values for a variable corresponding to in-sample predictions stored in pred_list.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from sktime.forecasting.compose import YfromX
>>> from sklearn.linear_model import ElasticNetCV
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
...                   columns=['a','b'],
...                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:,:1] = np.nan
>>> def DefaultForecaster():
...     return YfromX(ElasticNetCV(max_iter=5000))
>>> df1,df1_models = FillAllEmptyCells(df,DefaultForecaster())
>>> pred,true,model = GenPredTrueData(df,DefaultForecaster(),parallelize=False)
>>> ts_list,pred_list,true_list = BreakDataFrameIntoTimeSeriesList(df,df1,pred,true)
- macroframe_forecast.utils.CheckTrainingSampleSize(df0: DataFrame, n_forecast_error: int = 5) -> bool
Check sample size available for training window. Raise an exception if the number of observations available is too low.
- Parameters:
- df0pd.DataFrame
Input dataframe with island values replaced by nan.
- n_forecast_errorint
Number of training and testing sets to split data into for generating matrix of forecast errors.
- Returns:
- small_samplebool
Indicator for whether the sample of observations available for training is small.
- macroframe_forecast.utils.CleanIslands(df: DataFrame) -> tuple[DataFrame, Series]
Separate island values from input dataframe, replacing them with nan. Called by OrganizeCells.
- Parameters:
- dfpd.DataFrame
Input dataframe with raw data.
- Returns:
- df_no_islandspd.DataFrame
Dataframe with island values replaced by nan.
- islandspd.Series
Series containing island values.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
...                   columns=['a','b'],
...                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:-1,:1] = np.nan
>>> df0, islands = CleanIslands(df)
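In the example above, the last row of column a is left known while the four rows before it are set to nan, so that last value is an island. A minimal sketch of the separation for one column, assuming an island is any known value that appears after the first missing value (the list and its values are hypothetical):

```python
# Illustrative sketch: an "island" is taken here to be a known value that
# appears after the first missing value in a column (an assumption).
column = [0.3, 0.7, None, None, None, 0.9]   # hypothetical series; None marks a missing cell

first_gap = next(i for i, v in enumerate(column) if v is None)
# Known values located after the first gap are collected as islands
islands = {i: v for i, v in enumerate(column) if i > first_gap and v is not None}
# The same column with island positions blanked out
no_islands = [None if i in islands else v for i, v in enumerate(column)]
```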
- macroframe_forecast.utils.DefaultForecaster(small_sample: bool = False) -> BaseForecaster
Set up forecasting pipeline, specifying the scaling (transforming) to be applied and forecasting model to be used.
- Parameters:
- small_sampleboolean
Indicator for whether the sample of observations available for training is small. Defaults to False.
- Returns:
- gscvBaseForecaster
Instance of sktime’s Grid Search forecaster, derived from BaseForecaster, which is configured for hyperparameter tuning and model selection.
- macroframe_forecast.utils.FillAllEmptyCells(df: DataFrame, forecaster: BaseForecaster, parallelize: bool = True) -> tuple[DataFrame, DataFrame]
Generate forecasts for all unknown cells in the supplied dataframe. All forecasts are made independently from each other. (TBC)
- Parameters:
- df: pd.DataFrame
Dataframe containing known values of all variables and nan for unknown values.
- forecasterBaseForecaster
sktime BaseForecaster descendant
- parallelizeboolean
Indicate whether parallelization should be employed for generating the first step forecasts. Default value is True.
Examples
>>> from string import ascii_lowercase
>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.linear_model import ElasticNetCV
>>> from sktime.forecasting.compose import YfromX
>>> from macroframe_forecast.utils import FillAllEmptyCells
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
...                   columns=list(ascii_lowercase[:p]),
...                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:,:1] = np.nan
>>> def DefaultForecaster():
...     return YfromX(ElasticNetCV(max_iter=5000))
>>> df1,df1_models = FillAllEmptyCells(df,DefaultForecaster())
- macroframe_forecast.utils.FillAnEmptyCell(df: DataFrame, row: int | str, col: int | str, forecaster: BaseForecaster) -> tuple[float, BaseForecaster]
Generate a forecast for a given cell based on the latest known value for the given column (variable) and using the predefined forecasting pipeline. Called by FillAllEmptyCells.
- Parameters:
- dfpd.DataFrame
Dataframe containing known values of all variables and nan for unknown values.
- rowstr
Row index of cell to be forecasted.
- colstr
Column index of cell to be forecasted.
- forecasterBaseForecaster
- Returns:
- y_preddouble
Forecasted value of the variable for the given horizon.
- forecasterBaseForecaster
sktime BaseForecaster descendant
Examples
>>> from string import ascii_lowercase
>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.linear_model import ElasticNetCV
>>> from sktime.forecasting.compose import YfromX
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
...                   columns=list(ascii_lowercase[:p]),
...                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:,:1] = np.nan
>>> row = df.index[-1]
>>> col = df.columns[0]
>>> forecaster = YfromX(ElasticNetCV())
>>> y_pred, forecaster = FillAnEmptyCell(df,row,col,forecaster)
- macroframe_forecast.utils.GenLamstar(pred_list: list, true_list: list, default_lam: float = -1, max_lam: float = 129600) -> Series
Calculate the smoothness parameter (lambda) associated with each variable being forecasted.
- Parameters:
- pred_listlist
List of dataframes, with each dataframe containing in-sample forecasts for one variable.
- true_listlist
List of dataframes, with each dataframe containing the actual values for a variable corresponding to in-sample predictions stored in pred_list.
- default_lamfloat, optional(default: -1)
The value of lambda to be used for calculating smoothing parameter if frequency of observations cannot be determined from index names. If this is set to -1, lambda is calculated empirically. The default value is -1.
- max_lamfloat, optional
The upperbound of HP filter penalty term (lambda) searched by scipy minimizer. The default is 129600.
- Returns:
- lamstarpd.Series
Series containing smoothing parameters to be used for each variable.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> cols = [f'Col{i+1}' for i in range(5)]
>>> pred_list = [pd.DataFrame(np.random.rand(5, 5), columns=pd.MultiIndex.from_product([[v], cols])) for v in ['A', 'B']]
>>> true_list = [pd.DataFrame(np.random.rand(5, 5), columns=pd.MultiIndex.from_product([[v], cols])) for v in ['A', 'B']]
>>> lamstar = GenLamstar(pred_list, true_list)
- macroframe_forecast.utils.GenPredTrueData(df: DataFrame, forecaster: BaseForecaster, n_forecast_error: int = 5, parallelize: bool = True) -> tuple[DataFrame, DataFrame, DataFrame]
Generate in-sample forecasts from existing data by constructing pseudo-historical datasets.
- Parameters:
- dfpd.DataFrame
Dataframe with all known as well as unknown values.
- forecasterBaseForecaster
sktime BaseForecaster descendant.
- n_forecast_errorint, optional
Number of horizons for which in-sample forecasts are generated. The default is 5.
- parallelizeboolean, optional
Indicate whether parallelization should be used. The default is True.
- Returns:
- predpd.DataFrame
Dataframe with in-sample predictions generated using pseudo-historical datasets.
- truepd.DataFrame
Dataframe with actual values of the variable corresponding to predicted values contained in pred.
- modelpd.DataFrame
Dataframe with information on the models used for generating each forecast.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from sktime.forecasting.compose import YfromX
>>> from sklearn.linear_model import ElasticNetCV
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
...                   columns=['a','b'],
...                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:,:1] = np.nan
>>> def DefaultForecaster():
...     return YfromX(ElasticNetCV(max_iter=5000))
>>> pred,true,model = GenPredTrueData(df,DefaultForecaster(),parallelize=False)
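The notion of pseudo-historical datasets can be sketched as a rolling-origin exercise: the known history is truncated at successively earlier points, a forecast is produced from each truncated sample, and the result is compared with the values that were held back. The sketch below uses plain Python; the naive last-value forecaster, the series, and the variable names are stand-ins chosen for illustration, not the package's implementation.

```python
# Illustrative sketch of building forecast errors from pseudo-historical data.
series = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0, 29.0]   # hypothetical fully observed history
horizon = 2              # number of periods forecast in the real exercise
n_forecast_error = 3     # number of pseudo-historical windows


def naive_forecast(train, steps):
    """Stand-in forecaster: repeat the last observed value."""
    return [train[-1]] * steps


pred, true = [], []
for k in range(n_forecast_error):
    cutoff = len(series) - horizon - k            # pretend the data ended here
    train = series[:cutoff]
    actual = series[cutoff:cutoff + horizon]      # values held back for comparison
    pred.append(naive_forecast(train, horizon))
    true.append(actual)

# One row of forecast errors per pseudo-historical window
errors = [[p - t for p, t in zip(ps, ts)] for ps, ts in zip(pred, true)]
```

Each row of `errors` plays the role of one observation of the forecast-error vector from which a weighting matrix can later be estimated.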
- macroframe_forecast.utils.GenSmoothingMatrix(W: DataFrame, lamstar: Series) -> DataFrame
Generate symmetric smoothing matrix using optimal lambda and weighting matrix.
- Parameters:
- Wpd.DataFrame
Dataframe containing the weighting matrix.
- lamstarpd.Series
Series containing smoothing parameters to be used for each variable.
- Returns:
- Phipd.DataFrame
Dataframe containing the smoothing matrix.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> pred_list_1 = [pd.DataFrame(np.random.rand(5, 5),
...                columns=pd.MultiIndex.from_product([['A'], [f'Col{i+1}' for i in range(5)]])) if i == 0 else
...                pd.DataFrame(np.random.rand(5, 5),
...                columns=pd.MultiIndex.from_product([['B'], [f'Col{i+1}' for i in range(5)]]))
...                for i in range(2)]
>>> true_list_1 = [pd.DataFrame(np.random.rand(5, 5),
...                columns=pd.MultiIndex.from_product([['A'], [f'Col{i+1}' for i in range(5)]])) if i == 0 else
...                pd.DataFrame(np.random.rand(5, 5),
...                columns=pd.MultiIndex.from_product([['B'], [f'Col{i+1}' for i in range(5)]]))
...                for i in range(2)]
>>> smoothness = GenLamstar(pred_list_1,true_list_1)
- macroframe_forecast.utils.GenVecForecastWithIslands(ts_list: list[DataFrame], islands: list[Series]) -> Series
Overwrite forecasted values for islands with known island value.
- Parameters:
- ts_listlist
List of all first step forecasted values.
- islandspd.Series
Series containing island values.
- Returns:
- y1pd.Series
Series of forecasted values with island values incorporated.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from sktime.forecasting.compose import YfromX
>>> from sklearn.linear_model import ElasticNetCV
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
...                   columns=['a','b'],
...                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:-1,:1] = np.nan
>>> df0, all_cells, unknown_cells, known_cells, islands = OrganizeCells(df)
>>> def DefaultForecaster():
...     return YfromX(ElasticNetCV(max_iter=5000))
>>> df1,df1_models = FillAllEmptyCells(df,DefaultForecaster(),parallelize=False)
>>> ts_list = [df1[df0.isna()].loc[:,col:col].dropna().T.stack() for col in df0.columns[df.isna().any()]]
>>> y1 = GenVecForecastWithIslands(ts_list,islands)
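Conceptually, the overwrite amounts to letting known island values take precedence over the first-step forecasts for the same cells. A minimal sketch with plain dictionaries, where the cell names and numbers are hypothetical:

```python
# Illustrative sketch: known island values take precedence over first-step forecasts.
forecasts = {"a_2026": 0.51, "a_2027": 0.48, "a_2028": 0.50, "a_2029": 0.47}  # hypothetical
islands = {"a_2029": 0.90}                                                    # hypothetical

# Keep the forecast unless an island value exists for that cell
y1 = {cell: islands.get(cell, value) for cell, value in forecasts.items()}
```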
- macroframe_forecast.utils.GenWeightMatrix(pred_list: list[DataFrame], true_list: list[DataFrame], shrinkage_method: Literal['oas', 'oasd'] = 'oas') -> tuple[DataFrame, float]
Generate weighting matrix based on in-sample forecasts and actual values for the corresponding periods.
- Parameters:
- pred_listlist
List of dataframes, with each dataframe containing in-sample forecasts for one variable.
- true_listlist
List of dataframes, with each dataframe containing the actual values for a variable corresponding to in-sample predictions stored in pred_list.
- shrinkage_methodstr, optional
Type of algorithm to use for shrinking the covariance matrix, with options of identity, oas and oasd. The default is ‘oas’.
- Returns:
- Wpd.DataFrame
Weighting matrix to be used for reconciliation.
- shrinkage: float
Shrinkage parameter associated with the weight. NaN if identity is selected as the method.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> pred_list = [pd.DataFrame(np.random.rand(5, 5), columns=[f'Col{i+1}' for i in range(5)]) for _ in range(2)]
>>> true_list = [pd.DataFrame(np.random.rand(5, 5), columns=[f'Col{i+1}' for i in range(5)]) for _ in range(2)]
>>> W,shrinkage = GenWeightMatrix(pred_list, true_list)
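To make the role of the shrinkage parameter concrete, the sketch below shrinks a sample covariance matrix of forecast errors toward a scaled identity matrix. The shrinkage intensity is fixed by hand here, whereas OAS would choose it from the data; the error values and variable names are hypothetical, and the code is a plain-Python illustration rather than the package's routine.

```python
# Illustrative sketch: shrink the sample covariance of forecast errors toward
# a scaled identity. The intensity is an assumed value; OAS estimates it.
errors = [[-6.0, -13.0], [-5.0, -11.0], [-4.0, -9.0]]   # rows: windows, columns: cells (hypothetical)
n, p = len(errors), len(errors[0])

means = [sum(row[j] for row in errors) / n for j in range(p)]
# Sample covariance matrix (maximum-likelihood form, dividing by n)
S = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in errors) / n
      for j in range(p)] for i in range(p)]

shrinkage = 0.5                                   # assumed intensity between 0 and 1
mu = sum(S[i][i] for i in range(p)) / p           # average variance
# Convex combination of S and mu * I
W = [[(1 - shrinkage) * S[i][j] + (shrinkage * mu if i == j else 0.0)
      for j in range(p)] for i in range(p)]
```

Shrinking toward `mu * I` keeps the trace of the matrix unchanged while damping the off-diagonal terms, which helps when only a few error observations are available.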
- macroframe_forecast.utils.HP_matrix(size: int) -> ndarray
Create the degenerate penta-diagonal matrix (the one used in HP Filter), with dimensions (size x size).
- Parameters:
- sizeinteger
Number of rows for the square matrix.
- Returns:
- Fnp.array
Array containing the F matrix.
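The penta-diagonal matrix of the HP filter can be written as F = D'D, where D is the second-difference matrix with (size - 2) rows. The sketch below builds it with plain lists; the helper name `hp_matrix` is hypothetical, and whether the package constructs F in the same way is an assumption.

```python
def hp_matrix(size):
    """Return F = D'D, where D is the (size - 2) x size second-difference matrix."""
    D = [[0] * size for _ in range(size - 2)]
    for r in range(size - 2):
        D[r][r], D[r][r + 1], D[r][r + 2] = 1, -2, 1   # second difference y[t] - 2y[t+1] + y[t+2]
    return [[sum(D[r][i] * D[r][j] for r in range(size - 2)) for j in range(size)]
            for i in range(size)]


F = hp_matrix(5)
for row in F:
    print(row)
# [1, -2, 1, 0, 0]
# [-2, 5, -4, 1, 0]
# [1, -4, 6, -4, 1]
# [0, 1, -4, 5, -2]
# [0, 0, 1, -2, 1]
```

The matrix is degenerate in the sense that it has rank size - 2: constant and linear sequences are mapped to zero, so they incur no smoothness penalty.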
- macroframe_forecast.utils.OrganizeCells(df: DataFrame) -> tuple[DataFrame, Series, Series, Series]
Extract island values (if existing) from input dataframe, replacing them with nan values. This is useful for generating first step forecasts, which disregard known island values for the prediction. Also identifies separate Pandas series of names of cells for known and unknown values in the input dataframe.
- Parameters:
- dfpd.DataFrame
Input dataframe with raw data.
- Returns:
- df0pd.DataFrame
Dataframe with island values replaced by nan.
- all_cellspd.Series
Series containing cell names of all cells in the input dataframe.
- unknown_cellspd.Series
Series containing cell names of cells whose values are to be forecasted.
- known_cellspd.Series
Series containing cell names of cells whose values are known.
- islandspd.Series
Series containing island values.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
...                   columns=['a','b'],
...                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:-1,:1] = np.nan
>>> df0, all_cells, unknown_cells, known_cells, islands = OrganizeCells(df)
- macroframe_forecast.utils.Reconciliation(y1: Series, W: DataFrame, Phi: DataFrame, C: DataFrame, d: DataFrame, C_ineq: DataFrame | None = None, d_ineq: DataFrame | None = None) -> DataFrame
Reconcile first step forecasts to satisfy equality as well as inequality constraints, subject to smoothing.
- Parameters:
- y1pd.Series
Series of all forecasted and island values.
- Wpd.DataFrame
Dataframe containing the weighting matrix.
- Phipd.DataFrame
Dataframe containing the smoothing matrix.
- Cpd.DataFrame
Dataframe containing matrix of the linear constraints on the left side of the equality constraint Cy=d.
- dpd.DataFrame
Dataframe containing matrix of the linear constraints on the right side of the equality constraint Cy=d.
- C_ineq: pd.DataFrame, optional
Dataframe containing matrix of the linear constraints on the left side of the inequality constraint Cy <= d. The default is None.
- d_ineq: pd.DataFrame, optional
Dataframe containing matrix of the linear constraints on the right side of the inequality constraint Cy <= d. The default is None.
- Returns:
- y2pd.DataFrame
Dataframe containing the final reconciled forecasts for all variables.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from sktime.forecasting.compose import YfromX
>>> from sklearn.linear_model import ElasticNetCV
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
...                   columns=['a','b'],
...                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:,:1] = np.nan
>>> df0, all_cells, unknown_cells, known_cells, islands = OrganizeCells(df)
>>> def DefaultForecaster():
...     return YfromX(ElasticNetCV(max_iter=5000))
>>> df1,df1_models = FillAllEmptyCells(df0,DefaultForecaster(),parallelize=False)
>>> pred,true,model = GenPredTrueData(df0,DefaultForecaster(),parallelize=False)
>>> ts_list,pred_list,true_list = BreakDataFrameIntoTimeSeriesList(df0,df1,pred,true)
>>> y1 = pd.concat(ts_list)
>>> C = pd.DataFrame(columns = y1.index).astype(float)
>>> d = pd.DataFrame().astype(float)
>>> W = pd.DataFrame(np.eye(5),index=y1.index,columns=y1.index)
>>> smoothness = GenLamstar(pred_list,true_list)
>>> Phi = GenSmoothingMatrix(W,smoothness)
>>> y2 = Reconciliation(y1,W,Phi,C,d)
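Leaving aside the smoothing matrix Phi and any inequality constraints, the equality-constrained part of reconciliation has the familiar closed form y2 = y1 - W C' (C W C')^-1 (C y1 - d). The sketch below works this out for a single constraint and a diagonal W in plain Python; the numbers and variable names are hypothetical, and the simplification is an assumption made for illustration, not the package's full procedure.

```python
# Illustrative sketch: equality-constrained reconciliation with one constraint
# and a diagonal weight matrix. Smoothing (Phi) and inequalities are omitted.
y1 = [104.0, 63.0, 40.0]     # first-step forecasts (hypothetical): gdp, cons, inv
w = [4.0, 1.0, 1.0]          # diagonal of W: forecast-error variances (hypothetical)
c = [1.0, -1.0, -1.0]        # single row of C: gdp - cons - inv
d = 0.0                      # right-hand side of C y = d

residual = sum(ci * yi for ci, yi in zip(c, y1)) - d       # C y1 - d
cwc = sum(ci * wi * ci for ci, wi in zip(c, w))            # C W C'
# Each forecast moves in proportion to its error variance
y2 = [yi - wi * ci * residual / cwc for yi, wi, ci in zip(y1, w, c)]
```

Variables with larger forecast-error variance absorb more of the adjustment, so here the first forecast moves four times as far as each of the other two.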
- macroframe_forecast.utils.StringToMatrixConstraints(df0_stacked: DataFrame, all_cells: Series, unknown_cells: Series, known_cells: Series, constraints_with_wildcard: list[str] | None = None, wildcard_string: str = '?') -> tuple[DataFrame, DataFrame]
Convert equality constraints from list to matrix form for horizons to be forecasted (Cy = d, where C and d are dataframes containing the linear constraints). The input dataframe should not be in standard wide format; instead, all columns should be stacked on one another. This is needed to handle mixed frequencies among observations. All island values in the input dataframe should be replaced by nan prior to this step.
- Parameters:
- df0_stackedpd.Series
Stacked version of df0 (input dataframe with islands removed).
- all_cellspd.Series
Series containing cell names of all cells in the input dataframe.
- unknown_cellspd.Series
Series containing cell names of cells whose values are to be forecasted.
- known_cellspd.Series
Series containing cell names of cells whose values are known.
- constraints_with_wildcard: list[str], optional
List of strings specifying equality constraints that have to hold. The default is None.
- wildcard_stringstr, optional
String that is used as wildcard identifier in constraint. The default is ‘?’.
- Returns:
- C: pd.DataFrame
Dataframe containing matrix of the linear constraints on the left side of equation Cy=d.
- d: pd.DataFrame
Dataframe containing matrix of the linear constraints on the right side of equation Cy=d.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
...                   columns=['a','b'],
...                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df.iloc[-5:-1,:1] = np.nan
>>> df0, all_cells, unknown_cells, known_cells, islands = OrganizeCells(df)
>>> df0_stacked = df0.T.stack()
>>> constraints_with_wildcard = ['a?+b?']
>>> C,d = StringToMatrixConstraints(df0_stacked,
...                                 all_cells,
...                                 unknown_cells,
...                                 known_cells,
...                                 constraints_with_wildcard)
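To illustrate what converting a string constraint into matrix form involves, the sketch below reads one already-expanded constraint of the form "expression = 0" as a single row of Cy = d. Because the expression is linear, its coefficients can be recovered by evaluating it at zero and at each unit vector. The cell names, the constraint string, and the use of `eval` are assumptions made for this sketch; the package's own parser may work differently.

```python
# Illustrative sketch: read one expanded constraint "expr = 0" as a row of C y = d.
cells = ["a_2027", "b_2027", "a_2028", "b_2028"]   # hypothetical unknown cells
constraint = "a_2027 + 2*b_2027 - 10"              # hypothetical: a_2027 + 2*b_2027 - 10 = 0


def evaluate(expr, values):
    # eval is tolerable only for trusted, hand-written strings such as this one
    return eval(expr, {"__builtins__": {}}, dict(values))


zeros = {name: 0.0 for name in cells}
constant = evaluate(constraint, zeros)             # value of the expression at y = 0
# Coefficient of each cell = change in the expression when that cell goes from 0 to 1
C_row = [evaluate(constraint, {**zeros, name: 1.0}) - constant for name in cells]
d_value = -constant                                # move the constant to the right side
```

Here `C_row` is `[1.0, 2.0, 0.0, 0.0]` and `d_value` is `10.0`, which restates the constraint as a_2027 + 2*b_2027 = 10.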
- macroframe_forecast.utils.expand_wildcard(constraints_with_alphabet_wildcard: list[str], var_list: Series, wildcard: str)
Expand constraints with wildcard to all possible time periods. This is called within StringToMatrixConstraints, and the wildcard character has already been replaced by a random letter before this function is called.
- Parameters:
- constraints_with_alphabet_wildcard: list[str]
Linear equality constraints with the wildcard string replaced by a letter.
- var_list: pd.Series
Series of names of all cells (known and unknown) in raw dataframe.
- wildcard: string
Letter that has replaced the wildcard string in the constraints.
Examples
>>> import numpy as np
>>> import pandas as pd
>>> n = 30
>>> p = 2
>>> df = pd.DataFrame(np.random.sample([n,p]),
...                   columns=['a','b'],
...                   index=pd.date_range(start='2000',periods=n,freq='YE').year)
>>> df0_stacked = df.T.stack()
>>> all_cells_index = df0_stacked.index
>>> var_list = pd.Series([f'{a}_{b}' for a, b in all_cells_index],
...                      index = all_cells_index)
>>> constraints_with_alphabet_wildcard = ['ax + bx']
>>> alphabet_wildcard = 'x'
>>> constraints = expand_wildcard(constraints_with_alphabet_wildcard,
...                               var_list = var_list,
...                               wildcard = alphabet_wildcard)
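The expansion can be pictured as substituting each available period for the wildcard and keeping only the periods for which every referenced cell exists. The sketch below assumes cell names of the form `<variable>_<period>`, as in the example above; the cell list and the matching rule are illustrative assumptions rather than the package's exact logic.

```python
# Illustrative sketch: expand a wildcard constraint over every period that is
# available for all variables it mentions.
import re

cells = ["a_2027", "a_2028", "b_2027", "b_2028", "c_2027"]   # hypothetical cell names
constraint = "a? + b?"
wildcard = "?"

# Variable names are the word characters immediately before each wildcard
variables = re.findall(r"(\w+)" + re.escape(wildcard), constraint)
periods = sorted({c.split("_", 1)[1] for c in cells})
expanded = [
    constraint.replace(wildcard, "_" + t)
    for t in periods
    if all(f"{v}_{t}" in cells for v in variables)   # skip periods with missing cells
]
print(expanded)   # ['a_2027 + b_2027', 'a_2028 + b_2028']
```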
- macroframe_forecast.utils.find_permissible_wildcard(constraints_with_wildcard: list[str], _seed: int = 0) -> str
Generate random letter to be used in constraints.
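A plausible reading is that the letter must not already occur in any constraint, so that it cannot collide with a variable name once it stands in for the wildcard. The sketch below follows that assumption; the helper name `pick_placeholder` and the selection rule are hypothetical.

```python
import random
import string


def pick_placeholder(constraints, seed=0):
    """Return a lowercase letter that does not occur in any constraint string."""
    used = set("".join(constraints))
    candidates = [ch for ch in string.ascii_lowercase if ch not in used]
    # A seeded generator keeps the choice reproducible across runs
    return random.Random(seed).choice(candidates)


letter = pick_placeholder(["a? + b?", "gdp? - cons?"])
```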
- macroframe_forecast.utils.find_strings_to_replace_wildcard(constraint: str, var_list: Series, wildcard: str) -> list[str]
Identify list of strings to be substituted with the wildcard character.