Datenanalyse und Stochastische Modellierung
12. Causality

More than one input

In this course we have mainly used a single time series to model the dynamics and predict future steps

For better predictions one would usually also use information from variables that potentially correlate with the output

  • One can then check whether adding a specific feature improves the predictions or not

Black-box models and interpretability

  • The weights in a neural network are difficult to link directly to feature importance and to the structure of the dynamics, so the neural network is considered a black-box model
  • In this case it is particularly important to have methods that make the model explainable after it has been fitted
  • This is related to, but not equivalent to, measures of cross-correlation and causality; in this chapter we discuss several measures, all of which have different interpretations

Cross-correlation

Cross-covariance: \[ \langle (x_i-\langle x_i\rangle )(y_j-\langle y_j\rangle ) \rangle \]

Cross-correlation matrix: \[ \left(\begin{array}{c c c c} \langle x_1y_1\rangle & \langle x_1y_2\rangle & ... & \langle x_1y_n\rangle \\ \langle x_2y_1\rangle & \langle x_2y_2\rangle & ... & \langle x_2y_n\rangle \\ \vdots & \vdots & ... & \vdots\\ \langle x_my_1\rangle & \langle x_my_2\rangle & ... & \langle x_my_n\rangle \end{array}\right) \]

Zero mean: the cross-covariance reduces to \( \langle x_i y_j \rangle \), which has the form of a convolution (a sliding sum of products)

Stationary series - time averages: \[ K_\Delta = \operatorname{mean}_t \left( x_{t+\Delta}\, y_t \right) \]
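As an illustration, here is a minimal sketch of estimating \(K_\Delta\) by time averaging; the coupled example series are made up, with x echoing y after 3 steps, and the lag range is an arbitrary choice. (For all lags at once, np.correlate(x, y, mode="full") evaluates the same sliding, convolution-type sums, without the normalization.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up coupled series: x echoes y with a delay of 3 steps plus noise.
n = 1000
y = rng.normal(size=n)
x = np.roll(y, 3) + 0.5 * rng.normal(size=n)

# Standardize so that the time average directly estimates the correlation.
x = (x - x.mean()) / x.std()
y = (y - y.mean()) / y.std()

def K(x, y, delta):
    """Time-averaged lagged correlation K_delta = mean_t x_{t+delta} y_t."""
    if delta >= 0:
        return np.mean(x[delta:] * y[:len(y) - delta])
    return np.mean(x[:delta] * y[-delta:])

for delta in range(-5, 6):
    print(f"K_{delta:+d} = {K(x, y, delta):+.2f}")   # peaks near delta = +3
```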

Correlation versus causality

Example: the purported benefit of eating chocolate. Per-capita chocolate consumption correlates with a country's number of Nobel laureates, yet neither plausibly causes the other; correlation alone does not establish causality.

Granger causality

"if X can predict (portions of) Y, X causes Y"

Compare two autoregressive models for predicting x: one using only past values of x, and one augmenting the prediction with a second time series y

\[ x_t = a_0 + \sum_i a_i x_{t-i} + \xi_t \] \[ x_t = a_0 + \sum_i a_i x_{t-i} + \sum_j b_j y_{t-j} + \xi_t \]

Perform a hypothesis test with the null hypothesis that y does not Granger-cause x. If the model performs significantly better when y is included, we have found Granger causality.
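A minimal sketch of this comparison, assuming made-up data in which y drives x with a one-step delay; the lag order p and the noise levels are arbitrary choices. (Ready-made implementations exist, e.g. grangercausalitytests in statsmodels.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Made-up example: y drives x with a one-step delay.
n = 500
y = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + 0.1 * rng.normal()

p = 2  # lag order (arbitrary choice)

def lagged_design(series_list):
    """Constant column plus lags 1..p of every series in the list."""
    cols = [s[p - k:n - k] for s in series_list for k in range(1, p + 1)]
    return np.column_stack([np.ones(n - p)] + cols)

def rss(X, t):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, t, rcond=None)
    r = t - X @ beta
    return r @ r

target = x[p:]
X_restricted = lagged_design([x])      # past of x only
X_full = lagged_design([x, y])         # past of x and past of y

rss_r, rss_f = rss(X_restricted, target), rss(X_full, target)

# F-test of H0: all y-lag coefficients are zero (y does not Granger-cause x).
df1 = p                                 # number of added y-lag coefficients
df2 = len(target) - X_full.shape[1]     # residual degrees of freedom
F = ((rss_r - rss_f) / df1) / (rss_f / df2)
print(f"F = {F:.2f}, p-value = {stats.f.sf(F, df1, df2):.3g}")
```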

One can use neural networks instead of autoregressive models

Neural Granger Causality, A. Tank, I. Covert, N. Foti, A. Shojaie, E. B. Fox (2021)
  • One can compare more than two inputs; in this case regularization methods such as lasso regression can be used (see the sketch below)
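A minimal sketch of this idea in the linear setting, with made-up data: x is regressed on the lags of all candidate driver series at once, and the L1 penalty pushes the coefficients of irrelevant series to zero. Only the first of three drivers is actually coupled to x, and the penalty strength alpha = 0.05 is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)

# Made-up data: x depends on its own past and on driver 0, not on 1 or 2.
n, p = 500, 3
z = rng.normal(size=(n, 3))
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.4 * x[t - 1] + 0.7 * z[t - 1, 0] + 0.1 * rng.normal()

def lags(s):
    """Columns s_{t-1}, ..., s_{t-p} of a single series."""
    return np.column_stack([s[p - k:n - k] for k in range(1, p + 1)])

# Design matrix: lags of x followed by lags of every candidate driver.
X = np.hstack([lags(x)] + [lags(z[:, j]) for j in range(z.shape[1])])
target = x[p:]

# The L1 penalty drives the lag coefficients of irrelevant series to zero.
model = Lasso(alpha=0.05).fit(X, target)
coef = model.coef_.reshape(4, p)   # rows: x, z0, z1, z2
print(np.round(coef, 2))           # only the x and z0 rows should survive
```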

Counterfactuals

“x causes y” means that changing x alone changes y
  • Find a minimally changed input x′ such that the model output becomes y′
  • Many different formulations exist
  • Example of one definition: \[ \min_{x^\prime} \max_\lambda \left[ \lambda \left(f(x^\prime)-y^\prime\right)^2 + \sum_j \frac{|x_j - x_j^\prime|}{\operatorname{median}(|x_j-\operatorname{median}(x_j)|)} \right] \]
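A minimal sketch of searching for a counterfactual under this type of objective. For simplicity λ is held fixed instead of being maximized, the fitted model f is a made-up linear predictor, and the median-absolute-deviation normalization is estimated from made-up training data; Nelder-Mead is used because the distance term is not differentiable.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Stand-in for a fitted model: here simply a linear predictor f(x) = w.x + b.
w, b = np.array([1.5, -2.0, 0.5]), 0.3
f = lambda x: x @ w + b

# Made-up training data, used only for the MAD scaling of the distance term.
X_train = rng.normal(size=(200, 3))
mad = np.median(np.abs(X_train - np.median(X_train, axis=0)), axis=0)

def counterfactual(x, y_target, lam=10.0):
    """Minimize lam*(f(x')-y')^2 + sum_j |x_j - x'_j| / MAD_j for fixed lam."""
    loss = lambda xp: lam * (f(xp) - y_target) ** 2 + np.sum(np.abs(x - xp) / mad)
    return minimize(loss, x, method="Nelder-Mead").x

x = np.array([0.2, 0.4, -0.1])
x_cf = counterfactual(x, y_target=2.0)
print("prediction:", f(x), "-> counterfactual prediction:", f(x_cf))
print("change in input:", x_cf - x)
```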

Full analysis of causality

  • From the full analysis one can draw a graph of the causal relations
  • Look for inconsistencies in this graph to detect external (unobserved) influences

Permutation importance

  • Randomly shuffle one input feature and measure how much the model performance decreases
  • Can be performed on training set and/or on test set
  • Only if the model did not overfit will the results be similar for the training and the test set
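A minimal sketch on made-up data in which only the first two of four features carry signal; the random-forest model and the R² score are arbitrary choices. (scikit-learn also provides a ready-made sklearn.inspection.permutation_importance.)

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)

# Made-up data: only features 0 and 1 influence the target.
X = rng.normal(size=(500, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
baseline = model.score(X_te, y_te)   # R^2 on the test set

# Shuffle one feature at a time and record the drop in performance.
for j in range(X.shape[1]):
    X_perm = X_te.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    print(f"feature {j}: importance = {baseline - model.score(X_perm, y_te):.3f}")
```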

Shapley values

  • Concept from game theory
  • Each feature is a player
  • Players work together to achieve best possible prediction
  • Calculate predictions and the corresponding errors for all combinations of features (from no features, i.e. predicting the mean, up to the full model with all features) by replacing the left-out features with random input
  • Contribution of feature i (coalition of other inputs: S; total number of inputs: n; measure of prediction improvement: v): \[ v(S\cup \{i\})-v(S) \] The Shapley value is \[ \phi_i = \sum_{S \subseteq \{1,\dots,n\}\setminus\{i\}} \frac{(n-1-|S|)!\,|S|!}{n!}\,\bigl(v(S\cup \{i\})-v(S)\bigr) \]
  • 'fair' but computationally costly; a small exact-computation sketch follows after this list
  • Example: General phase-structure relationship in polar rod-shaped liquid crystals: Importance of shape anisotropy and dipolar strength
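A minimal sketch of the exact Shapley computation for a small number of features, on made-up data. As the value v(S) of a coalition we use the negative mean-squared error of a least-squares fit restricted to the features in S, a simplification of the random-input scheme described above; the empty coalition predicts the mean.

```python
import numpy as np
from itertools import combinations
from math import factorial

rng = np.random.default_rng(5)

# Made-up data: the target depends on features 0 and 1 only.
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=500)
n = X.shape[1]

def v(S):
    """Value of coalition S: negative MSE of a least-squares fit using only
    the features in S; with no features we simply predict the mean."""
    if not S:
        return -np.mean((y - y.mean()) ** 2)
    A = np.column_stack([np.ones(len(y)), X[:, list(S)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return -np.mean((y - A @ beta) ** 2)

def shapley(i):
    """Weighted average of the marginal contributions of feature i
    over all coalitions S of the remaining features."""
    others = [j for j in range(n) if j != i]
    phi = 0.0
    for size in range(n):
        for S in combinations(others, size):
            weight = factorial(n - 1 - len(S)) * factorial(len(S)) / factorial(n)
            phi += weight * (v(set(S) | {i}) - v(set(S)))
    return phi

for i in range(n):
    print(f"feature {i}: Shapley value = {shapley(i):.3f}")
```

The exact sum runs over all 2^(n-1) coalitions per feature, which is why the method is fair but costly; in practice the sum is usually approximated by sampling coalitions.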