More than one input
In this course we mainly used one time series to model and predict future steps
For better predictions one would usually include information from additional variables that potentially correlate with the output
- One can then check whether adding a specific feature improves the predictions or not
Black-box models and interpretability
- The weights in a neural network are difficult to link directly to feature importance and the structure of the dynamics, so the neural network is considered a black-box model
- In this case it is particularly important to have methods that make the model explainable after it has been fitted
- This is related but not equivalent to measures of cross-correlations and causality; in this chapter we will discuss several measures which all have different interpretations
Cross-correlation
Cross-covariance: \[ \langle (x_i-\langle x_i\rangle )(y_j-\langle y_j\rangle ) \rangle \]
Cross-correlation matrix: \[ \left(\begin{array}{c c c c} \langle x_1y_1\rangle & \langle x_1y_2\rangle & \dots & \langle x_1y_n\rangle \\ \langle x_2y_1\rangle & \langle x_2y_2\rangle & \dots & \langle x_2y_n\rangle \\ \vdots & \vdots & \ddots & \vdots\\ \langle x_my_1\rangle & \langle x_my_2\rangle & \dots & \langle x_my_n\rangle \end{array}\right) \]
Zero mean: for zero-mean series the cross-covariance reduces to \( \langle x_iy_j\rangle \), and the lagged averages can be computed as a convolution
Stationary series - time averages: \[ K_\Delta = \operatorname{mean}_t\, x_{t+\Delta}\, y_t \]
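A minimal NumPy sketch of this time-averaged estimator; the function name and the toy data are illustrative, not from the course material:

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """Time-averaged cross-correlation K_Delta = mean_t x_{t+Delta} * y_t
    for lags Delta = 0..max_lag, after subtracting the means."""
    x = x - x.mean()
    y = y - y.mean()
    K = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # average the overlap of x shifted by `lag` against y
        K[lag] = np.mean(x[lag:] * y[:len(y) - lag])
    return K

# toy example: x follows y with a delay of 3 steps
rng = np.random.default_rng(0)
y = rng.normal(size=1000)
x = np.roll(y, 3) + 0.1 * rng.normal(size=1000)
print(np.argmax(cross_correlation(x, y, max_lag=10)))  # -> 3
```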
Granger causality
"if X can predict (portions of) Y, X causes Y"
Compare two autoregressive models for predicting x: one using only past values of x, and one augmenting the prediction with a second time series y
\[ x_t = a_0 + \sum_i a_i x_{t-i} + \xi_t \]
\[ x_t = a_0 + \sum_i a_i x_{t-i} + \sum_j b_j y_{t-j} + \xi_t \]
Perform a hypothesis test with the null hypothesis that y does not Granger-cause x. If the model performs significantly better when including y, we have found Granger causality
One can use neural networks instead of autoregressive models
Neural Granger causality, A. Tank, I. Covert, N. Foti, A. Shojaie, E. B. Fox (2021)
- One can compare more than two inputs; in this case regularizing methods such as lasso regression can be used (a sketch of the basic two-series test follows below)
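A minimal sketch of the two-model comparison, assuming both models are fitted by ordinary least squares and compared with an F-test on the residual sums of squares; `granger_test` and its defaults are illustrative, not a library API:

```python
import numpy as np
from scipy import stats

def granger_test(x, y, p=5):
    """F-test of the null hypothesis 'y does not Granger-cause x',
    comparing an AR(p) model for x with one augmented by p lags of y."""
    n, rows = len(x), len(x) - p
    # lagged design matrices: column i holds the series shifted by lag i+1
    Xx = np.column_stack([x[p - i - 1 : n - i - 1] for i in range(p)])
    Xy = np.column_stack([y[p - i - 1 : n - i - 1] for i in range(p)])
    target = x[p:]
    ones = np.ones((rows, 1))
    A_r = np.hstack([ones, Xx])      # restricted model: past of x only
    A_f = np.hstack([ones, Xx, Xy])  # full model: past of x and y
    rss = lambda A: np.sum((target - A @ np.linalg.lstsq(A, target, rcond=None)[0]) ** 2)
    rss_r, rss_f = rss(A_r), rss(A_f)
    df1, df2 = p, rows - A_f.shape[1]
    F = (rss_r - rss_f) / df1 / (rss_f / df2)
    return F, stats.f.sf(F, df1, df2)

# toy example: x is driven by y with a lag of 2
rng = np.random.default_rng(1)
y = rng.normal(size=2000)
x = np.zeros(2000)
for t in range(2, 2000):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 2] + 0.1 * rng.normal()
print(granger_test(x, y))  # large F, tiny p-value: reject the null
```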
Counterfactuals
“x causes y” means that changing x alone changes y
- Find a new input x' such that a desired output y' is achieved with a minimally changed input
- Many different formulations exist
- Example of one definition: \[ \min_{x^\prime} \max_\lambda \left[ \lambda \bigl(f(x^\prime)-y^\prime\bigr)^2 + \sum_j \frac{|x_j - x_j^\prime|}{\operatorname{median}\bigl(|x_j-\operatorname{median}(x_j)|\bigr)} \right] \]
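A sketch of how such a counterfactual can be searched for with gradient descent, assuming a differentiable model f; for simplicity, λ is kept fixed here instead of performing the inner maximization, and all names are illustrative:

```python
import torch

def counterfactual(f, x, y_prime, mad, lam=1.0, steps=500, lr=0.05):
    """Search for x' close to x in MAD-weighted L1 distance whose
    prediction f(x') is close to the desired output y'.
    mad[j] is the median absolute deviation of feature j."""
    x_cf = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = lam * (f(x_cf) - y_prime) ** 2 \
               + torch.sum(torch.abs(x - x_cf) / mad)
        loss.backward()
        opt.step()
    return x_cf.detach()

# toy example: linear model, push the prediction from 1 towards 3
w = torch.tensor([2.0, -1.0])
f = lambda z: z @ w
x = torch.tensor([1.0, 1.0])
mad = torch.tensor([0.5, 0.5])
print(counterfactual(f, x, y_prime=3.0, mad=mad))
```

With a fixed λ the result trades off closeness to y' against the distance penalty; in the full formulation λ is increased until the prediction constraint is met.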
Full analysis of causality
- One can draw causal graphs from the full analysis
- Look for inconsistencies to find external influences
Permutation importance
- Randomly permute the values of one input feature and measure how much the model performance decreases (see the sketch below)
- Can be performed on training set and/or on test set
- Only if the model did not overfit will the results be the same for the training set and the test set
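A minimal sketch of the procedure; scikit-learn provides a ready-made version as sklearn.inspection.permutation_importance, while the model and metric below are placeholders, with higher metric values assumed to be better:

```python
import numpy as np

def permutation_importance(model, X, y, metric, n_repeats=10, seed=0):
    """Drop in score when one feature column is randomly shuffled."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # destroy the information in feature j
            scores.append(metric(y, model.predict(Xp)))
        importances[j] = baseline - np.mean(scores)
    return importances
```

Running this on both the training and the test set and comparing the two importance profiles is a quick check for overfitting, as noted above.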
Shapley values
- Concept from game theory
- Each feature is a player
- Players work together to achieve best possible prediction
- Calculate predictions and corresponding errors for all combinations of features (from no features = predict the mean, to the full model with all features) by setting the remaining features to random input
- Marginal contribution of feature i to a coalition S of the other inputs (total number of inputs: n; measure of prediction improvement: v): \[ v(S\cup \{i\})-v(S) \] The Shapley value of feature i is the weighted average over all coalitions S not containing i: \[ \phi_i = \sum_{S} \frac{(n-1-|S|)!\,|S|!}{n!}\bigl(v(S\cup \{i\})-v(S)\bigr) \]
- 'Fair', but computationally costly: the number of coalitions grows exponentially with n (see the brute-force sketch below)
- Example: General phase-structure relationship in polar rod-shaped liquid crystals: Importance of shape anisotropy and dipolar strength
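A brute-force sketch of the exact computation for a single instance, where, as one possible choice of v, the value of a coalition is the expected prediction when its features are fixed to the instance and the remaining features are drawn at random from background data; all names are illustrative and the cost grows exponentially with n:

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(predict, X, x, n_samples=200, seed=0):
    """Exact Shapley values of the features of instance x (a NumPy array)
    for a model `predict` that maps a batch of inputs to predictions.
    X is background data used to randomize the features outside S."""
    rng = np.random.default_rng(seed)
    n = len(x)

    def v(S):
        # fix the coalition's features to x, draw the rest from X
        Z = X[rng.integers(0, len(X), n_samples)]
        Z[:, list(S)] = x[list(S)]
        return predict(Z).mean()

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(n - 1 - k) * factorial(k) / factorial(n)
                phi[i] += weight * (v(S + (i,)) - v(S))
    return phi

# toy example: linear model; attributions recover the coefficients
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
model = lambda Z: 3 * Z[:, 0] + Z[:, 1]        # feature 2 is irrelevant
print(shapley_values(model, X, x=np.ones(3)))  # roughly [3, 1, 0]
```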