What if we want to estimate a parameter that is not constant?
Update
\[ x_{t|t} = x_{t|t-1} + \alpha ( z_t - x_{t|t-1} ) \] \[ \dot{x}_{t|t} = \dot{x}_{t|t-1} + \beta \left( \frac{z_t - x_{t|t-1}}{\Delta t} \right) \]
Predict
\[ x_{t+1|t} = x_{t|t} + \dot{x}_{t|t} \cdot \Delta t \] \[ \dot{x}_{t+1|t} = \dot{x}_{t|t} \]
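A minimal sketch of these two steps in Python (NumPy), assuming noisy position measurements z arriving at a fixed interval Δt; the gain values α and β are illustrative choices, not taken from the text.

```python
import numpy as np

def alpha_beta_filter(z, dt=1.0, alpha=0.85, beta=0.005):
    """Track position x and velocity xdot from noisy position measurements z."""
    x, xdot = z[0], 0.0                 # initialize from the first measurement
    estimates = []
    for z_t in z[1:]:
        # predict
        x_pred = x + xdot * dt
        xdot_pred = xdot
        # update
        residual = z_t - x_pred
        x = x_pred + alpha * residual
        xdot = xdot_pred + beta * residual / dt
        estimates.append((x, xdot))
    return np.array(estimates)

# illustrative example: constant-velocity target observed with noise
rng = np.random.default_rng(0)
true_x = 0.5 * np.arange(100)
z = true_x + rng.normal(0.0, 2.0, size=100)
est = alpha_beta_filter(z)
```

Tracking a random walker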
Similar to the alpha-beta filter, the Kalman filter makes estimates based on the previous estimate and the latest measurement z
\[ \hat{x}_{t} = Kz_t + (1-K) \hat{x}_{t-1} \]
Treating x as a stochastic variable, we can look at the variance
\[ p_{t} = K^2 \sigma_{z,t}^2 + (1-K)^2 p_{t-1} \]
Now we minimize p with respect to K
\[ \frac{\mathrm{d}p_{t}}{\mathrm{d}K} = 2K\sigma_{z,t}^2 - 2(1-K)p_{t-1} = 0 \] \[ K(\sigma_{z,t}^2 + p_{t-1}) - p_{t-1} = 0 \; \; \; \Rightarrow \; \; \; K = \frac{ p_{t-1} }{ p_{t-1} + \sigma_{z,t}^2 } \]
Writing r_t for the measurement variance and p_{t|t-1} for the prior estimate variance, this gives the algorithm
Update
\[ x_{t|t} = x_{t|t-1} + \frac{p_{t|t-1}}{p_{t|t-1}+r_t} ( z_t - x_{t|t-1} ) \] \[ p_{t|t} = \left( 1 - \frac{p_{t|t-1}}{p_{t|t-1}+r_t} \right) p_{t|t-1} \]
Predict
\[ x_{t+1|t} = x_{t|t} \] \[ p_{t+1|t} = p_{t|t} \]
A test that can be positive (p) or negative (n) is used by a person that can be infected (i) or healthy (h)
\[ \mbox{Likelihood: }\; P(p|i)=0.93 \;\;\;\;\;\; P(n|h)=0.99 \;\;\;\;\; \mbox{Prior:}\; P(i)=0.00148 \] \[ P(i\& p) = P(i)P(p|i) = 0.0013764 \;\;\;\;\; P(p)=P(h\& p)+P(i\& p) = 0.0113616 \] \[ P(i|p) = P(i\& p)/P(p) = 0.12 \]
Second test with prior P(i)=0.12
\[ P(i|pp) = \frac{P(i)P(pp|i)}{P(pp)} = \frac{P(i)P(pp|i)}{P(i)P(pp|i)+P(h)P(pp|h)} = \frac{0.12\cdot 0.93}{0.12\cdot 0.93 + (1-0.12)\cdot (1-0.99)} = 0.93 \]
In our case, Bayes' rule defines the update step in general as \[ P(x_t|z_{1...t}) = \frac{P(z_t|x_t) P(x_t|z_{1...t-1})}{P(z_t|z_{1...t-1})} = \frac{P(z_t|x_t) P(x_t|z_{1...t-1})}{\int P(z_t|x_t) P(x_t|z_{1...t-1})\mathrm{d}x_t} \]
\[ P(x_{t|t}) \propto P(z_t|x_t) P(x_{t|t-1}) \]
Find the derivation at https://people.bordeaux.inria.fr/pierre.delmoral/chen_bayesian.pdf
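The two-test calculation above can be checked with a few lines of Python; the helper function below is an illustrative sketch using the quoted likelihoods and prior.

```python
def posterior_infected(prior_i, p_pos_given_i=0.93, p_neg_given_h=0.99):
    """P(infected | positive test) via Bayes' rule."""
    p_pos_given_h = 1.0 - p_neg_given_h          # false-positive rate P(p|h)
    joint_i = prior_i * p_pos_given_i            # P(i & p)
    joint_h = (1.0 - prior_i) * p_pos_given_h    # P(h & p)
    return joint_i / (joint_i + joint_h)         # P(i | p)

p1 = posterior_infected(0.00148)   # first test:  ~0.12
p2 = posterior_infected(p1)        # second test: ~0.93 (the posterior becomes the new prior)
```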
Hidden Markov model
\[ x_{t} = F x_{t-1} + q_{t-1}\] \[ z_t = Hx_t +r_t \] where q is Gaussian white noise with covariance Q, r is Gaussian white noise with covariance R, x is a vector, and F and H are matrices
Accordingly, the algorithm consists of the update step: \[ K_t = p_{t|t-1} H^T \left( Hp_{t|t-1} H^T + R_t \right)^{-1} \] \[ x_{t|t} = x_{t|t-1} + K_t ( z_t-Hx_{t|t-1} ) \;\;\;\; p_{t|t} = (\mathbb{1}-K_t H) p_{t|t-1} (\mathbb{1} - K_t H)^T + K_t R_t K_t^T \] and the prediction step: \[ x_{t+1|t} = F x_{t|t} \;\;\;\;\; p_{t+1|t} = F p_{t|t} F^T + Q_t \]
Find a slower introduction at www.kalmanfilter.net
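A sketch of this algorithm in Python/NumPy; the constant-velocity matrices in the example are illustrative assumptions, and p is kept as a full covariance matrix as in the equations above.

```python
import numpy as np

def kalman_step(x, p, z, F, H, Q, R):
    """One Kalman filter cycle: update with measurement z, then predict."""
    # update
    S = H @ p @ H.T + R                                  # innovation covariance
    K = p @ H.T @ np.linalg.inv(S)                       # Kalman gain
    x = x + K @ (z - H @ x)
    I = np.eye(len(x))
    p = (I - K @ H) @ p @ (I - K @ H).T + K @ R @ K.T    # Joseph form
    # predict
    return F @ x, F @ p @ F.T + Q

# illustrative constant-velocity model (position, velocity), dt = 1
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[4.0]])
x, p = np.zeros(2), np.eye(2)
for z in [1.2, 2.1, 2.9, 4.2]:
    x, p = kalman_step(x, p, np.array([z]), F, H, Q, R)
```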
For nonlinear state and observation models, the extended Kalman filter linearizes them around the current estimate, where the derivatives are given by the Jacobian matrices
The update step with N×N matrices in the Kalman filter is computationally costly. The ensemble Kalman filter approximates the covariances by sample statistics over an ensemble of n state vectors and thereby reduces the dimensions to N×n with N>n
Such an approximation is called a Monte Carlo approximation
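A sketch of one ensemble update for a scalar observation, where the covariances entering the Kalman gain are replaced by sample (Monte Carlo) estimates over the n ensemble members; the perturbed-observation variant used here is one common choice, assumed for illustration.

```python
import numpy as np

def enkf_update(X, z, r, H=lambda x: x[0], rng=np.random.default_rng(1)):
    """X: (n, N) ensemble of state vectors; z: scalar measurement with variance r."""
    Y = np.array([H(x) for x in X])                          # predicted observations
    x_mean, y_mean = X.mean(axis=0), Y.mean()
    cov_xy = (X - x_mean).T @ (Y - y_mean) / (len(X) - 1)    # sample cross-covariance
    var_y = Y.var(ddof=1)                                    # sample observation variance
    K = cov_xy / (var_y + r)                                 # Kalman gain from sample statistics
    z_pert = z + rng.normal(0.0, np.sqrt(r), size=len(X))    # perturbed observations
    return X + np.outer(z_pert - Y, K)                       # updated ensemble
```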
Optimizing the parameters theta
\[ P( \Theta | D,M ) P(D|M) = P( \Theta,D|M ) = P( D|\Theta,M ) P(\Theta|M) \] \[ P( \Theta | D,M ) = \frac{P( D|\Theta,M ) P( \Theta|M ) }{ P(D|M) } \]
Calculating the Likelihood
\[ p( z_{1...T} | \Theta ) = \prod_{t=1}^T p( z_t | z_{1...t-1} , \Theta ) \]
with
\[ p( z_t, x_t | z_{1...t-1}, \Theta ) = p( z_t | x_t, z_{1...t-1}, \Theta ) p( x_t | z_{1...t-1}, \Theta ) = p( z_t | x_t, \Theta ) p( x_t | z_{1...t-1}, \Theta ) \] \[ p( z_t | z_{1...t-1}, \Theta ) = \int p( z_t | x_t, \Theta ) p( x_t | z_{1...t-1}, \Theta ) \mathrm{d}x_t \]
The update and prediction steps are performed with some Bayesian filter (e.g. the Kalman filter)
\[ p(x_t | z_{1...t-1},\Theta ) = \int p(x_t | x_{t-1},\Theta) p(x_{t-1}|z_{1...t-1},\Theta) \mathrm{d}x_{t-1} \] \[ p(x_t|z_{1...t},\Theta) = \frac{p(z_t|x_t,\Theta)p(x_t|z_{1...t-1},\Theta)}{p(z_t|z_{1...t-1},\Theta)} \]
Numerically more stable: the energy function
\[ \phi_T(\Theta) = -\log p(z_{1...T}|\Theta)-\log p(\Theta) \]
which can be computed recursively by
\[ \phi_0(\Theta) = -\log p(\Theta) \] \[ \phi_t(\Theta) = \phi_{t-1}(\Theta) - \log p(z_t|z_{1...t-1},\Theta) \]
So the optimal parameter is given by
\[ \operatorname{argmin}_\Theta [\phi_T(\Theta)] \]
SEIR model
With N=S+E+I+R constant, the steady state satisfies
\[ m = (m+bI^*)S^* \;\;\;\;\;\; bS^*I^*=(m+a)E^* \;\;\;\;\;\; aE^* = (m+g)I^* \;\;\;\;\;\; gI^* = mR^* \]
No birth and death: m → 0
Reproduction number
\[ r = \frac{1}{S^*} = \frac{ab}{(m+a)(m+g)} \rightarrow \frac{b}{g} \]
Some parameters can be estimated, e.g. g=1/3 per day and a=1/5.2 per day
Task: Find contact parameter b
With X=(S,E,I,R); Observable state: Y=I+R; corresponding measurements z
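A sketch of the model propagation in Python, using a simple Euler step with the rates quoted above (a = 1/5.2, g = 1/3 per day), N normalized to 1 and m = 0; the time step and initial condition are illustrative.

```python
import numpy as np

def seir_step(X, b, a=1/5.2, g=1/3, dt=1.0):
    """One Euler step of the SEIR model with N = 1 and no birth/death (m = 0)."""
    S, E, I, R = X
    dS = -b * S * I
    dE = b * S * I - a * E
    dI = a * E - g * I
    dR = g * I
    return X + dt * np.array([dS, dE, dI, dR])

def observe(X):
    """Observable state Y = I + R."""
    return X[2] + X[3]

# illustrative trajectory for a guessed contact parameter b
X = np.array([0.999, 0.0, 0.001, 0.0])
trajectory = [X]
for _ in range(100):
    X = seir_step(X, b=0.6)
    trajectory.append(X)
```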
Ensemble Kalman filter: \[ X_{t|t} = X_{t|t-1} - \frac{1}{2} K_{t} \left( Y_{t|t-1} + \langle Y_{t|t-1} \rangle - 2 z_t \right) \] Kalman gain \[ K_t = \frac{\langle (X_{t|t-1}-\langle X_{t|t-1}\rangle)[(I_{t|t-1}-\langle I_{t|t-1}\rangle)+(R_{t|t-1}-\langle R_{t|t-1}\rangle)] \rangle}{\langle (I-\langle I \rangle)^2 \rangle + 2\langle (R-\langle R\rangle)(I-\langle I\rangle ) \rangle + \langle (R-\langle R \rangle)^2 \rangle + r} \] with estimated observation error r=10
Approximate the negative log-likelihood as \[ L(t,b) = \frac{1}{2} \frac{|(\langle I \rangle-z_I)+(\langle R \rangle-z_R)|^2}{\langle (I-\langle I \rangle)^2 \rangle + 2\langle (R-\langle R \rangle)(I-\langle I\rangle) \rangle + \langle (R-\langle R \rangle)^2 \rangle + r} \] \[\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\; + \frac{1}{2} \log\left(\langle (I-\langle I \rangle)^2 \rangle + 2\langle (R-\langle R\rangle)(I-\langle I\rangle ) \rangle + \langle (R-\langle R \rangle)^2 \rangle + r\right), \]
which follows from assuming a Gaussian likelihood function \[ -\log\left( \frac{1}{\sigma_Y}e^{-\Delta Y^2/(2\sigma_Y^2)} \right) = \frac{1}{2}\log(\sigma_Y^2) + \frac{\Delta Y^2}{2\sigma_Y^2} \] Obtain b by minimizing \[ b^* = \operatorname{argmin}_b \sum_t L(t,b) \]
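The minimization over b can be done with a simple grid search; `filter_neg_loglik(b, z)` below stands for a routine, not given in the text, that runs the ensemble Kalman filter with a fixed b over the measurement series z and returns Σ_t L(t, b).

```python
import numpy as np

def estimate_b(z, filter_neg_loglik, b_grid=np.linspace(0.1, 1.5, 57)):
    """b* = argmin_b sum_t L(t, b), evaluated on a grid of candidate contact rates."""
    costs = np.array([filter_neg_loglik(b, z) for b in b_grid])
    return b_grid[np.argmin(costs)], costs

# usage sketch: b_star, costs = estimate_b(z_measurements, filter_neg_loglik)
```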
If we do not assume Gaussian distributions, observables like the mean and variance have to be calculated from the posterior. Markov chain Monte Carlo is a method to approximate expectations under a distribution
\[ \rho(x) \]
The requirement is to construct a series x_t of length T that has the probability distribution \[\rho(x)\]; then \[ \langle f(x) \rangle = \frac{1}{T} \sum_t f(x_t) \]
Detailed Balance
\[ p(\Theta_{t+1}|\Theta_t)p(\Theta_t) = p(\Theta_{t}|\Theta_{t+1})p(\Theta_{t+1}) \;\;\rightarrow \;\; \frac{p(\Theta_{t+1}|\Theta_t)}{p(\Theta_{t}|\Theta_{t+1})} = \frac{ p(\Theta_{t+1})}{p(\Theta_t)} \]Defining the proposal distribution Q and the acceptance distribution T \[ p(\Theta_{t+1}|\Theta_t)= Q(\Theta_{t+1}|\Theta_t) T(\Theta_{t+1}|\Theta_t) \]
So \[\frac{T(\Theta_{t+1}|\Theta_t)}{T(\Theta_{t}|\Theta_{t+1})} = \frac{p(\Theta_{t+1})}{p(\Theta_t)} \frac{Q(\Theta_{t}|\Theta_{t+1})}{Q(\Theta_{t+1}|\Theta_t)}\]
This is fulfilled by \[ T(\Theta_{t+1}|\Theta_t) = \min\left[1, \frac{p(\Theta_{t+1})}{p(\Theta_t)} \frac{Q(\Theta_{t}|\Theta_{t+1})}{Q(\Theta_{t+1}|\Theta_t)} \right] \]
At each step, draw a sample from Q and compute the corresponding acceptance probability T. Then generate a random number u between 0 and 1. If u is smaller than T, accept the move; otherwise reject it
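A minimal Metropolis-Hastings sketch in Python with a symmetric Gaussian proposal Q (so the Q-ratio cancels); `log_target` is assumed to return log p(Θ) up to an additive constant, e.g. -φ_T(Θ) from above.

```python
import numpy as np

def metropolis_hastings(log_target, theta0, n_steps=10000, step=0.1,
                        rng=np.random.default_rng(2)):
    """Sample from p(theta) proportional to exp(log_target(theta))."""
    theta = np.asarray(theta0, dtype=float)
    logp = log_target(theta)
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.normal(0.0, step, size=theta.shape)   # draw from Q
        logp_new = log_target(proposal)
        # acceptance probability T = min(1, p(new)/p(old)); the Q-ratio cancels
        if rng.uniform() < np.exp(min(0.0, logp_new - logp)):
            theta, logp = proposal, logp_new                         # accept the move
        samples.append(theta)            # record current state (old value if rejected)
    return np.array(samples)

# posterior expectations are then sample averages, e.g.
# samples = metropolis_hastings(lambda th: -phi_T(th), theta0=[0.5])
# theta_mean, theta_var = samples.mean(axis=0), samples.var(axis=0)
```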