Friday, February 16, 2018

least squares - What Is the Relationship Between a Kalman Filter and Polynomial Regression?


What is the relationship, if any, between Kalman filtering and (repeated, if necessary) least squares polynomial regression?



Answer



1. There is a difference in terms of optimality criteria



The Kalman filter is a linear estimator. More precisely, it is the optimal linear estimator: it infers the model parameters of interest from indirect, inaccurate, and uncertain observations.


But optimal in what sense? If all noise is Gaussian, the Kalman filter minimizes the mean squared error of the estimated parameters. This means that when the underlying noise is NOT Gaussian, that promise no longer holds. In the case of nonlinear dynamics, it is well known that the problem of state estimation becomes difficult, and in this context no filtering scheme clearly outperforms all other strategies. In such cases, nonlinear estimators may do better if they can model the system more faithfully with additional information. [See Ref 1-2]
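To make this concrete, here is a minimal sketch of a scalar Kalman filter estimating a constant level from noisy Gaussian measurements; the state model, noise values, and variable names are all illustrative assumptions, not taken from any reference:

```python
import numpy as np

# Minimal scalar Kalman filter: estimate a constant level x from noisy
# measurements z_k = x + v_k with v_k ~ N(0, R). All values are illustrative.
rng = np.random.default_rng(0)
x_true, R = 5.0, 0.25            # true level and measurement noise variance
z = x_true + np.sqrt(R) * rng.standard_normal(50)

x_hat, P = 0.0, 1e3              # initial estimate and its variance (vague prior)
Q = 0.0                          # process noise variance (state is static)
for z_k in z:
    P = P + Q                    # predict: static state, process noise inflates variance
    K = P / (P + R)              # Kalman gain weighs prediction vs. measurement
    x_hat = x_hat + K * (z_k - x_hat)   # update with the innovation
    P = (1 - K) * P              # posterior variance shrinks after each update

print(f"estimate {x_hat:.3f} vs true {x_true}")
```

With Gaussian noise this recursion gives the minimum mean square error estimate; with non-Gaussian noise it is still the best *linear* estimator, which is exactly the distinction drawn above.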


Polynomial regression is a form of linear regression in which the relationship between the independent variable x and the dependent variable y is modeled as an nth-degree polynomial. For a second-degree polynomial, for example:


$$ Y = a_0 + a_1x + a_2x^2 + \epsilon $$


Note that, while polynomial regression fits a nonlinear model to the data, these models are all linear from the point of view of estimation, since the regression function is linear in the unknown parameters $a_0, a_1, a_2$. If we treat $x$ and $x^2$ as different variables, polynomial regression can also be treated as multiple linear regression.
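A small sketch of that multiple-regression view: stack $1, x, x^2$ as columns of a design matrix and solve an ordinary linear least squares problem (the data here are made up for illustration):

```python
import numpy as np

# Polynomial regression is linear in the coefficients: treat 1, x, x^2 as
# separate regressors and solve a standard linear least squares problem.
rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 40)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.3 * rng.standard_normal(x.size)

X = np.column_stack([np.ones_like(x), x, x**2])  # design (Vandermonde) matrix
coef, *_ = np.linalg.lstsq(X, y, rcond=None)     # solves min ||X a - y||^2
print(coef)                                      # close to [1.0, 2.0, -0.5]
```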


Polynomial regression models are usually fit using the method of least squares, which likewise minimizes the mean squared error. Under the conditions of the Gauss–Markov theorem, the least-squares method minimizes the variance of the unbiased estimators of the coefficients. This theorem states that ordinary least squares (OLS), or linear least squares, is the Best Linear Unbiased Estimator (BLUE) under the following conditions:


a. the errors have zero expectation, i.e. $E(e_i) = 0$
b. they have equal, finite variances, i.e. $\operatorname{Var}(e_i) = \sigma^2 < \infty$
c. and they are uncorrelated, i.e. $\operatorname{cov}(e_i, e_j) = 0$ for $i \neq j$




NOTE: the errors here do not have to be Gaussian, nor do they need to be IID; they only need to be uncorrelated, with zero mean and equal variance.
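As a quick empirical check of that note (my own sketch, not from the original answer), OLS remains unbiased when the errors are zero-mean uniform rather than Gaussian:

```python
import numpy as np

# Monte Carlo check of the Gauss-Markov point: zero-mean, equal-variance,
# uncorrelated errors need not be Gaussian for OLS to be unbiased.
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
X = np.column_stack([np.ones_like(x), x])
a_true = np.array([1.0, 3.0])

estimates = []
for _ in range(2000):
    e = rng.uniform(-1, 1, x.size)       # uniform noise: not Gaussian, E(e) = 0
    y = X @ a_true + e
    a_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    estimates.append(a_hat)

print(np.mean(estimates, axis=0))        # averages out close to [1.0, 3.0]
```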



2. The Kalman filter is an evolution of least squares estimators


In 1970, H. W. Sorenson published an IEEE Spectrum article titled "Least-squares estimation: from Gauss to Kalman." [See Ref 3] It is a seminal paper that provides great insight into how Gauss' original idea of least squares evolved into modern estimators like the Kalman filter.


Gauss' work not only introduced the least squares framework, it was also one of the earliest works to take a probabilistic view. While least squares evolved into various regression methods, another critical line of work brought filter theory into use as an estimator.


The theory of filtering for stationary time series estimation was constructed by Norbert Wiener during the 1940s, in the course of WW-II, and published in 1949; it is now known as the Wiener filter. (The work was done much earlier but remained classified until well after the war.) The discrete-time equivalent of Wiener's work was derived independently by Kolmogorov and published in 1941, which is why the theory is often called the Wiener-Kolmogorov filtering theory.


Traditionally, filters are designed for a desired frequency response. The Wiener filter is different: it reduces the amount of noise present in a signal by comparison with an estimate of the desired noiseless signal, so the Wiener filter is actually an estimator. In an important paper, however, Levinson (1947) [See Ref 6] showed that in discrete time the entire theory could be reduced to least squares, and so was mathematically very simple. [See Ref 4]


Thus, we can see that Wiener's work gave a new approach to the estimation problem: an evolution from least squares to a well-established filter theory. The critical limitation, however, is that the Wiener filter assumes its inputs are stationary. The Kalman filter is the next step in this evolution, dropping the stationarity requirement: its state space model can be adapted dynamically to deal with the non-stationary nature of a signal or system.


Kalman filters are based on linear dynamical systems in the discrete time domain, so unlike the Wiener filter they can handle potentially time-varying signals. Sorenson's paper draws the parallel between Gauss' least squares and the Kalman filter as follows:




...therefore, one sees that the basic assumptions of Gauss and Kalman are identical except that the latter allows the state to change from one time to the next. The difference introduces a non-trivial modification to Gauss' problem, but one that can be treated within the least squares framework.



3. They are the same as far as the causality direction of prediction is concerned; they differ in implementation efficiency


It is sometimes perceived that the Kalman filter predicts future events from past data, whereas regression or least squares smooths between two end points. This is not really true: both estimators (and almost all estimators you can think of) can do either job. For example, the Kalman filter machinery can be run non-causally, which is known as Kalman smoothing.
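For instance, the Rauch-Tung-Striebel (RTS) smoother reuses the forward Kalman recursion and adds a backward pass. A minimal sketch on a 1D random-walk model (the model and noise values are assumptions chosen for illustration):

```python
import numpy as np

# Kalman filtering (forward) plus RTS smoothing (backward) on a 1D random walk:
# the same recursion supports both the causal and the non-causal use.
rng = np.random.default_rng(3)
n, Q, R = 100, 0.01, 0.25
x = np.cumsum(np.sqrt(Q) * rng.standard_normal(n))   # hidden random walk
z = x + np.sqrt(R) * rng.standard_normal(n)          # noisy measurements

m = np.zeros(n); P = np.zeros(n)                     # filtered mean / variance
m_pred = np.zeros(n); P_pred = np.zeros(n)           # one-step predictions
m_prev, P_prev = 0.0, 1.0
for k in range(n):
    m_pred[k], P_pred[k] = m_prev, P_prev + Q        # predict (transition F = 1)
    K = P_pred[k] / (P_pred[k] + R)                  # gain
    m[k] = m_pred[k] + K * (z[k] - m_pred[k])        # update
    P[k] = (1 - K) * P_pred[k]
    m_prev, P_prev = m[k], P[k]

ms, Ps = m.copy(), P.copy()                          # backward smoothing pass
for k in range(n - 2, -1, -1):
    G = P[k] / P_pred[k + 1]                         # smoother gain
    ms[k] = m[k] + G * (ms[k + 1] - m_pred[k + 1])
    Ps[k] = P[k] + G**2 * (Ps[k + 1] - P_pred[k + 1])

print(np.mean((ms - x)**2), np.mean((m - x)**2))     # smoothed MSE is typically lower
```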


Similarly, regression-based models can also be used for prediction. Given a training vector $X_t$ with responses $Y_t$, we fit the model parameters $a_0, \dots, a_K$; then, for another sample $X_k$, we can extrapolate $Y_k$ from the fitted model.
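A sketch of that causal use of regression: fit on past samples, then extrapolate to a new input (the data are illustrative):

```python
import numpy as np

# Fit a regression model on past data, then predict at a point outside it.
rng = np.random.default_rng(4)
x_train = np.arange(10.0)
y_train = 0.5 + 1.2 * x_train + 0.1 * rng.standard_normal(x_train.size)

coef = np.polyfit(x_train, y_train, deg=1)   # fit a line on past samples
x_new = 12.0                                 # sample beyond the training range
print(np.polyval(coef, x_new))               # extrapolated prediction, about 14.9
```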


Hence, both methods can be used for smoothing or fitting (the non-causal case) as well as for future prediction (the causal case). The critical difference, and it is significant, lies in the implementation. In the case of polynomial regression, the entire fitting process must be repeated whenever new data arrive, so while causal estimation is possible, it can be computationally expensive. (Recursive least squares is the standard way of making this iterative; see the sketch below.)
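Recursive least squares (RLS) updates the coefficients one sample at a time instead of refitting the whole batch, much like the Kalman recursion. A minimal sketch (textbook RLS with forgetting factor 1; the data are illustrative):

```python
import numpy as np

# Recursive least squares: update coefficients sample by sample instead of
# refitting the whole batch each time new data arrive.
def rls_update(a, Pm, phi, y):
    """One RLS step. a: coefficients, Pm: inverse-correlation matrix,
    phi: regressor vector of the new sample, y: new observation."""
    K = Pm @ phi / (1.0 + phi @ Pm @ phi)   # gain
    a = a + K * (y - phi @ a)               # correct by the prediction error
    Pm = Pm - np.outer(K, phi @ Pm)         # update the inverse correlation
    return a, Pm

rng = np.random.default_rng(5)
a_hat, Pm = np.zeros(3), 1e3 * np.eye(3)    # start with large uncertainty
for _ in range(200):
    x = rng.uniform(-1, 1)
    phi = np.array([1.0, x, x**2])          # same regressors as the polynomial fit
    y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.1 * rng.standard_normal()
    a_hat, Pm = rls_update(a_hat, Pm, phi, y)
print(a_hat)                                # approaches [1.0, 2.0, -0.5]
```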


The Kalman filter, on the other hand, is inherently recursive, so using it to predict the future from past data alone is very efficient.


Here is another good presentation that compares several methods. [See Ref 5]


References





  1. Dan Simon, "Kalman Filtering," Embedded Systems Programming, June 2001, p. 72. (A good introduction to the Kalman filter.)

  2. Lindsay Kleeman, "Understanding and Applying Kalman Filtering" (presentation).

  3. H. W. Sorenson, "Least-squares estimation: from Gauss to Kalman," IEEE Spectrum, July 1970, pp. 63-68.

  4. Lecture notes, MIT OpenCourseWare.

  5. Simo Särkkä, "From Linear Regression to Kalman Filter and Beyond" (presentation), Helsinki University of Technology.

  6. N. Levinson, "The Wiener RMS error criterion in filter design and prediction," J. Math. Phys., vol. 25, 1947, pp. 261-278.



