Datenanalyse und Stochastische Modellierung
12. Causality

More than one input

In this course we have mainly used a single time series to model the dynamics and predict future steps

For better predictions one would usually also use information from variables that potentially correlate with the output

  • One can then check whether adding a specific feature improves the predictions or not

Black-box models and interpretability

  • The weights in a neural network are difficult to link directly to feature importance and to the structure of the dynamics, so the neural network is considered a black-box model
  • In this case it is particularly important to have methods that make the model explainable after it has been fitted
  • This is related to, but not equivalent to, measures of cross-correlation and causality; in this chapter we discuss several measures, all of which have different interpretations

Cross-correlation

Cross-covariance: \[ \langle (x_i-\langle x_i\rangle )(y_j-\langle y_j\rangle ) \rangle \]

Cross-correlation matrix: \[ \left(\begin{array}{c c c c} \langle x_1y_1\rangle & \langle x_1y_2\rangle & ... & \langle x_1y_n\rangle \\ \langle x_2y_1\rangle & \langle x_2y_2\rangle & ... & \langle x_2y_n\rangle \\ \vdots & \vdots & ... & \vdots\\ \langle x_my_1\rangle & \langle x_my_2\rangle & ... & \langle x_my_n\rangle \end{array}\right) \]

Zero mean: the cross-covariance reduces to \( \langle x_i y_j \rangle \), which has the form of a convolution (a sliding sum of products)

Stationary series - time averages: \[ K_\Delta = \operatorname{mean}_t \left( x_{t+\Delta}\, y_t \right) \]
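As an illustration, here is a minimal sketch of estimating \(K_\Delta\) by time averaging; the coupled example series are made up, with x echoing y after 3 steps, and the lag range is an arbitrary choice. (For all lags at once, np.correlate(x, y, mode="full") evaluates the same sliding, convolution-type sums, without the normalization.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up coupled series: x echoes y with a delay of 3 steps plus noise.
n = 1000
y = rng.normal(size=n)
x = np.roll(y, 3) + 0.5 * rng.normal(size=n)

# Standardize so that the time average directly estimates the correlation.
x = (x - x.mean()) / x.std()
y = (y - y.mean()) / y.std()

def K(x, y, delta):
    """Time-averaged lagged correlation K_delta = mean_t x_{t+delta} y_t."""
    if delta >= 0:
        return np.mean(x[delta:] * y[:len(y) - delta])
    return np.mean(x[:delta] * y[-delta:])

for delta in range(-5, 6):
    print(f"K_{delta:+d} = {K(x, y, delta):+.2f}")   # peaks near delta = +3
```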

Correlation versus causality

Example: the purported benefit of eating chocolate. Per-capita chocolate consumption correlates with a country's number of Nobel laureates, yet neither plausibly causes the other; correlation alone does not establish causality.

Granger causality

"if X can predict (portions of) Y, X causes Y"

Compare two autoregressive models for predicting x: one using only past values of x, and one augmenting the prediction with a second time series y

\[ x_t = a_0 + \sum_i a_i x_{t-i} + \xi_t \] \[ x_t = a_0 + \sum_i a_i x_{t-i} + \sum_j b_j y_{t-j} + \xi_t \]

Perform a hypothesis test with the null hypothesis that y does not Granger-cause x. If the model performs significantly better when y is included, we have found Granger causality.
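A minimal sketch of this comparison, assuming made-up data in which y drives x with a one-step delay; the lag order p and the noise levels are arbitrary choices. (Ready-made implementations exist, e.g. grangercausalitytests in statsmodels.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Made-up example: y drives x with a one-step delay.
n = 500
y = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + 0.1 * rng.normal()

p = 2  # lag order (arbitrary choice)

def lagged_design(series_list):
    """Constant column plus lags 1..p of every series in the list."""
    cols = [s[p - k:n - k] for s in series_list for k in range(1, p + 1)]
    return np.column_stack([np.ones(n - p)] + cols)

def rss(X, t):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, t, rcond=None)
    r = t - X @ beta
    return r @ r

target = x[p:]
X_restricted = lagged_design([x])      # past of x only
X_full = lagged_design([x, y])         # past of x and past of y

rss_r, rss_f = rss(X_restricted, target), rss(X_full, target)

# F-test of H0: all y-lag coefficients are zero (y does not Granger-cause x).
df1 = p                                 # number of added y-lag coefficients
df2 = len(target) - X_full.shape[1]     # residual degrees of freedom
F = ((rss_r - rss_f) / df1) / (rss_f / df2)
print(f"F = {F:.2f}, p-value = {stats.f.sf(F, df1, df2):.3g}")
```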

One can use neural networks instead of autoregressive models

Neural Granger Causality, A. Tank, I. Covert, N. Foti, A. Shojaie, E. B. Fox (2021)
  • One can compare more than two inputs; in this case regularization methods such as lasso regression can be used (see the sketch below)
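A minimal sketch of this idea in the linear setting, with made-up data: x is regressed on the lags of all candidate driver series at once, and the L1 penalty pushes the coefficients of irrelevant series to zero. Only the first of three drivers is actually coupled to x, and the penalty strength alpha = 0.05 is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)

# Made-up data: x depends on its own past and on driver 0, not on 1 or 2.
n, p = 500, 3
z = rng.normal(size=(n, 3))
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.4 * x[t - 1] + 0.7 * z[t - 1, 0] + 0.1 * rng.normal()

def lags(s):
    """Columns s_{t-1}, ..., s_{t-p} of a single series."""
    return np.column_stack([s[p - k:n - k] for k in range(1, p + 1)])

# Design matrix: lags of x followed by lags of every candidate driver.
X = np.hstack([lags(x)] + [lags(z[:, j]) for j in range(z.shape[1])])
target = x[p:]

# The L1 penalty drives the lag coefficients of irrelevant series to zero.
model = Lasso(alpha=0.05).fit(X, target)
coef = model.coef_.reshape(4, p)   # rows: x, z0, z1, z2
print(np.round(coef, 2))           # only the x and z0 rows should survive
```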

Counterfactuals

“x causes y” means that changing x alone changes y
  • Find a minimally changed input x′ such that the model output becomes y′
  • Many different formulations exist
  • Example of one definition: \[ \min_{x^\prime} \max_\lambda \left[ \lambda \left(f(x^\prime)-y^\prime\right)^2 + \sum_j \frac{|x_j - x_j^\prime|}{\operatorname{median}(|x_j-\operatorname{median}(x_j)|)} \right] \]
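A minimal sketch of searching for a counterfactual under this type of objective. For simplicity λ is held fixed instead of being maximized, the fitted model f is a made-up linear predictor, and the median-absolute-deviation normalization is estimated from made-up training data; Nelder-Mead is used because the distance term is not differentiable.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Stand-in for a fitted model: here simply a linear predictor f(x) = w.x + b.
w, b = np.array([1.5, -2.0, 0.5]), 0.3
f = lambda x: x @ w + b

# Made-up training data, used only for the MAD scaling of the distance term.
X_train = rng.normal(size=(200, 3))
mad = np.median(np.abs(X_train - np.median(X_train, axis=0)), axis=0)

def counterfactual(x, y_target, lam=10.0):
    """Minimize lam*(f(x')-y')^2 + sum_j |x_j - x'_j| / MAD_j for fixed lam."""
    loss = lambda xp: lam * (f(xp) - y_target) ** 2 + np.sum(np.abs(x - xp) / mad)
    return minimize(loss, x, method="Nelder-Mead").x

x = np.array([0.2, 0.4, -0.1])
x_cf = counterfactual(x, y_target=2.0)
print("prediction:", f(x), "-> counterfactual prediction:", f(x_cf))
print("change in input:", x_cf - x)
```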

Full analysis of causality

  • From the full analysis one can draw a graph of the causal relations
  • Look for inconsistencies in this graph to detect external (unobserved) influences

Permutation importance

  • Randomly shuffle one input feature and measure how much the model performance decreases
  • Can be performed on training set and/or on test set
  • Only if the model did not overfit will the results be similar for the training and the test set
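A minimal sketch on made-up data in which only the first two of four features carry signal; the random-forest model and the R² score are arbitrary choices. (scikit-learn also provides a ready-made sklearn.inspection.permutation_importance.)

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)

# Made-up data: only features 0 and 1 influence the target.
X = rng.normal(size=(500, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
baseline = model.score(X_te, y_te)   # R^2 on the test set

# Shuffle one feature at a time and record the drop in performance.
for j in range(X.shape[1]):
    X_perm = X_te.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    print(f"feature {j}: importance = {baseline - model.score(X_perm, y_te):.3f}")
```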

Shapley values

  • Concept from game theory
  • Each feature is a player
  • Players work together to achieve best possible prediction
  • Calculate predictions and the corresponding errors for all combinations of features (from no features, i.e. predicting the mean, up to the full model with all features) by replacing the left-out features with random input
  • Contribution of feature i (coalition of other inputs: S; total number of inputs: n; measure of prediction improvement: v): \[ v(S\cup \{i\})-v(S) \] The Shapley value is \[ \phi_i = \sum_{S \subseteq \{1,\dots,n\}\setminus\{i\}} \frac{(n-1-|S|)!\,|S|!}{n!}\,\bigl(v(S\cup \{i\})-v(S)\bigr) \]
  • 'fair' but computationally costly; a small exact-computation sketch follows after this list
  • Example: General phase-structure relationship in polar rod-shaped liquid crystals: Importance of shape anisotropy and dipolar strength
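A minimal sketch of the exact Shapley computation for a small number of features, on made-up data. As the value v(S) of a coalition we use the negative mean-squared error of a least-squares fit restricted to the features in S, a simplification of the random-input scheme described above; the empty coalition predicts the mean.

```python
import numpy as np
from itertools import combinations
from math import factorial

rng = np.random.default_rng(5)

# Made-up data: the target depends on features 0 and 1 only.
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.normal(size=500)
n = X.shape[1]

def v(S):
    """Value of coalition S: negative MSE of a least-squares fit using only
    the features in S; with no features we simply predict the mean."""
    if not S:
        return -np.mean((y - y.mean()) ** 2)
    A = np.column_stack([np.ones(len(y)), X[:, list(S)]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return -np.mean((y - A @ beta) ** 2)

def shapley(i):
    """Weighted average of the marginal contributions of feature i
    over all coalitions S of the remaining features."""
    others = [j for j in range(n) if j != i]
    phi = 0.0
    for size in range(n):
        for S in combinations(others, size):
            weight = factorial(n - 1 - len(S)) * factorial(len(S)) / factorial(n)
            phi += weight * (v(set(S) | {i}) - v(set(S)))
    return phi

for i in range(n):
    print(f"feature {i}: Shapley value = {shapley(i):.3f}")
```

The exact sum runs over all 2^(n-1) coalitions per feature, which is why the method is fair but costly; in practice the sum is usually approximated by sampling coalitions.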