Thursday, December 26, 2019

estimation - Sequential Form of the Least Squares Estimator for Linear Least Squares Model


I'm currently working on a project in which I need to find the tilt of a surface. Let's assume I'm only concerned with tilt in a single dimension (i.e., the slope) to begin with.


I currently have the ability to calculate a least-squares slope when processing batch data. However, this requires me to batch all the data before performing the calculation.


Is there instead a recursive form for computing the least-squares linear fit, so that I'm not required to batch the data before performing the calculation? I would like to continuously update the least-squares slope as each new data point is received. The motivation for this is to avoid the saturation/overflow that can come with batching data.


If you still don't understand what I'm asking, refer to this web page, which outlines a recursive form for calculating the mean: http://www.heikohoffmann.de/htmlthesis/node134.html. The advantage of this method is that you don't need to keep any large running sums, thus avoiding potential overflows; instead, you simply weight the new data against the previously calculated mean.
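A minimal Python sketch of that incremental-mean update may make the request concrete (my own illustration; update_mean is a hypothetical helper, not from the linked page):

from random import random

def update_mean(mean, x, n):
    # Mean of n samples, given the mean of the first n-1 samples and the new sample x
    return mean + (x - mean)/n

mean = 0.0
for n in range(1, 10001):
    x = 2*random() - 1              # new sample as it arrives
    mean = update_mean(mean, x, n)  # no large running sum is ever stored
print(mean)  # close to 0, the true mean of the uniform distribution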



Answer





To summarize the question's problem, you want to calculate the slope based on all samples obtained thus far, and as new samples are obtained, update the slope without going through all the samples again.


The page you cite gives the equation for the slope $m_n$ that, together with $b_n$, minimizes the sum of squared errors when the first $n$ of the $y$-values are approximated by the linear function $f_n(x) = b_n + m_n x$. Converting the equation to zero-based array indexing for $x$ and $y$, we get:


$$m_n = \frac{\displaystyle\sum_{i=0}^{n-1}(x_i-\overline X)(y_i-\overline Y)}{\displaystyle\sum_{i=0}^{n-1}(x_i-\overline X)^2},\quad n \ge 2,\tag{1}$$


where $\overline X$ is the mean of the first $n$ $x$-values and $\overline Y$ is the mean of the first $n$ $y$-values. The problem with Eq. 1 is that the sums are actually nested sums:


$$m_n = \frac{\displaystyle\sum_{i=0}^{n-1}\left(x_i-\frac{\sum_{j=0}^{n-1} x_j}{n}\right)\left(y_i-\frac{\sum_{j=0}^{n-1} y_j}{n}\right)}{\displaystyle\sum_{i=0}^{n-1}\left(x_i-\frac{\sum_{j=0}^{n-1} x_j}{n}\right)^2}, \quad n \ge 2.\tag{2}$$


As $n$ is incremented, the inner sums can be updated recursively (iteratively), but by direct inspection the outer sums would need to be recalculated from scratch. Least squares is an old and well-studied problem, so rather than bang our heads against this, we look to known results. Converted to zero-based array indexing, Eqs. 16–21 of the Wolfram page on least squares fitting state:


$$\sum_{i=0}^{n-1}(x_i-\overline X)(y_i-\overline Y) = \left(\sum_{i=0}^{n-1} x_iy_i\right) - n\overline X\overline Y,\tag{3}$$


$$\sum_{i=0}^{n-1}(x_i-\overline X)^2 = \left(\sum_{i=0}^{n-1} x_i^2\right) - n\overline X^2.\tag{4}$$
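These identities are easy to verify numerically; a quick Python check (my own, with arbitrary example data):

xs = [0.5, 1.2, 3.3, 4.1]
ys = [2.0, 2.5, 4.0, 5.5]
n = len(xs)
X = sum(xs)/n  # mean of the x values
Y = sum(ys)/n  # mean of the y values
print(sum((x - X)*(y - Y) for x, y in zip(xs, ys)))  # left side of Eq. 3
print(sum(x*y for x, y in zip(xs, ys)) - n*X*Y)      # right side of Eq. 3
print(sum((x - X)**2 for x in xs))                   # left side of Eq. 4
print(sum(x*x for x in xs) - n*X**2)                 # right side of Eq. 4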


Eqs. 3 and 4 allow Eq. 1 to be rewritten as (also given as Eq. 15 on the Wolfram page):


$$m_n = \frac{\left(\displaystyle\sum_{i=0}^{n-1} x_iy_i\right) - n\overline X\overline Y}{\left(\displaystyle\sum_{i=0}^{n-1} x_i^2\right) - n\overline X^2}, \quad n \ge 2.\tag{5}$$



It is easy to update the sums as $n$ increases, and $\overline X$ and $\overline Y$ are equally easy to maintain: keep one accumulator holding the sum of all $x_i$ and another holding the sum of all $y_i$, and divide each by the current $n$. The factor $n$ in the numerator and the denominator of Eq. 5 cancels with one such division each, giving a formula for you to use:


$$m_n = \frac{\left(\displaystyle\sum_{i=0}^{n-1} x_iy_i\right) - \left(\displaystyle\sum_{i=0}^{n-1} x_i\right)\left(\displaystyle\sum_{i=0}^{n-1} y_i\right)/n}{\left(\displaystyle\sum_{i=0}^{n-1} x_i^2\right) - \left(\displaystyle\sum_{i=0}^{n-1} x_i\right)^2/n}, \quad n \ge 2.\tag{6}$$
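In code, Eq. 6 requires only four running scalars plus the count. A minimal Python sketch (my own; the class and method names are illustrative):

class RunningSlope:
    # Maintains the four sums of Eq. 6 and evaluates the slope on demand.
    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxy = self.sxx = 0.0

    def add(self, x, y):
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxy += x*y
        self.sxx += x*x

    def slope(self):
        # Eq. 6; valid for n >= 2 with non-constant x
        return (self.sxy - self.sx*self.sy/self.n) / (self.sxx - self.sx**2/self.n)

fit = RunningSlope()
for x, y in [(0, 1.0), (1, 3.1), (2, 4.9), (3, 7.2)]:
    fit.add(x, y)
print(fit.slope())  # 2.04 for this data

Note that the sums themselves still grow with $n$; this form removes the need to revisit old samples, not the need for accumulators.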


Uniform sampling


If the samples are uniformly spaced, then further optimizations are possible. If the $x$ values are successive ascending integers, for example:


$$x_i = i,\tag{7}$$


then we get:


$$m_n = \frac{12\left(\displaystyle\sum_{i=0}^{n-1} i y_i\right) - 6(n - 1)\displaystyle\sum_{i=0}^{n-1} y_i}{n^3 - n}, \quad n \ge 2.\tag{8}$$
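As a quick sanity check of Eq. 8 (my own example): for $y_i = 1 + 2i$ and $n = 4$, we have $\sum_{i=0}^{3} i y_i = 0 + 3 + 10 + 21 = 34$ and $\sum_{i=0}^{3} y_i = 16$, so $m_4 = (12\cdot 34 - 6\cdot 3\cdot 16)/(4^3 - 4) = 120/60 = 2$, which is exactly the slope of the data.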


Avoiding large sums


Based on your comment to the question and to @M529's answer, perhaps this is not enough and you want to only ever store normalized sums that do not grow indefinitely with increasing $n$, similarly to what was done here. We can rewrite Eq. 8 as:


$$\begin{gather}m_n = 12 A_n - 6 B_n,\\ A_n = \frac{\displaystyle\sum_{i=0}^{n-1} i y_i}{n^3 - n},\quad B_n = \frac{\displaystyle\sum_{i=0}^{n-1} y_i}{n^2 - n},\quad n \ge 2,\end{gather}\tag{9}$$



which has recursions for $n \ge 3$:


$$\begin{gather}A_n = A_{n-1} + \frac{y_{n-1}}{n^2 + n} - \frac{3 A_{n-1}}{n + 1},\\ B_n = B_{n-1} + \frac{y_{n-1}}{n^2 - n} - \frac{2 B_{n-1}}{n},\end{gather}\tag{10}$$


that were found by simplifying the boxed parts of:


$$\begin{gather}A_n = A_{n-1} + \boxed{\frac{A_{n-1}\left((n - 1)^3 - (n - 1)\right) + (n - 1)y_{n-1}}{n^3 - n} - A_{n-1}},\\ B_n = B_{n-1} + \boxed{\frac{B_{n-1}\left((n - 1)^2 - (n - 1)\right) + y_{n-1}}{n^2 - n} - B_{n-1}},\end{gather}\tag{11}$$


where:


$$\begin{gather}A_{n-1}\left((n - 1)^3 - (n - 1)\right) = \sum_{i=0}^{n-2} i y_i,\\ B_{n-1}\left((n - 1)^2 - (n - 1)\right) = \sum_{i=0}^{n-2} y_i.\end{gather}\tag{12}$$


The recursive (iterative) algorithm can be tested in Python against the direct algorithm by:


from random import random
N = 10000 # total number of samples
n = 1

y = 2*random()-1 # Input y_0
sum1 = 0 # Direct (also needed to init recursive)
sum2 = y # Direct (also needed to init recursive)
print("n="+str(n))
n = 2
y = 2*random()-1 # Input y_1
sum1 += y # Direct (also needed to init recursive)
sum2 += y # Direct (also needed to init recursive)
A = sum1/(n**3 - n) # Recursive A_2
B = sum2/(n**2 - n) # Recursive B_2

m = m_ref = 12*A - 6*B # Recursive and direct m_2
print("n="+str(n)+", m=m_ref="+str(m_ref))
for n in range(3, N + 1):
    y = 2*random()-1 # Input y_2 ... y_(N-1)
    A += y/(n**2 + n) - 3*A/(n + 1) # Recursive A_3 ... A_N
    B += y/(n**2 - n) - 2*B/n # Recursive B_3 ... B_N
    m = 12*A - 6*B # Recursive m_3 ... m_N
    sum1 += (n-1)*y # Direct (not needed for recursive)
    sum2 += y # Direct (not needed for recursive)
    m_ref = 12*sum1/(n**3 - n) - 6*sum2/(n**2 - n) # Direct m_3 ... m_N

print("n="+str(n)+", A="+str(A)+", B="+str(B)+", sum1="+str(sum1)+", sum2="+str(sum2)+", m="+str(m)+", m_ref="+str(m_ref))

The recursive algorithm seems stable, with the state variables A and B remaining small over a run of N=10000 independent random samples from the uniform distribution $-1 \le y < 1$. At the last sample the state variables and the output are:


n     = 10000
A     = 6.246471489186524e-07
B     = 5.20978725548091e-07
sum1  = 624647.1426721811
sum2  = 52.09266276755386
m     = 4.369893433735283e-06
m_ref = 4.369893433735271e-06


where m is $m_n$ calculated by Eq. 9 using the recursions of Eq. 10 for $n \ge 3$, and m_ref is the direct calculation of $m_n$ by Eq. 9. The two ways of calculating $m_n$ agree to 12 decimal digits. The direct sums sum1 and sum2 grow without bound. Of course, n also grows without bound, and I don't think anything can be done about that if this is what you want to calculate.


I did some testing against an arbitrary-precision implementation using Python's mpmath (data not shown) and found that the root-mean-square errors of the direct and the recursive methods did not differ significantly for $N = 10^5$ random samples. Personally, I would prefer the direct summation method, using a large (in number of bits) fixed-point accumulator if necessary. With direct summation, especially if it is not necessary to compute $m_n$ for every $n$, many slow division instructions are avoided compared to the recursive method.
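To give a rough sense of the headroom (my own back-of-envelope estimate, assuming samples quantized to 16-bit Q15): the largest sum in Eq. 8, $\sum i y_i$, is bounded in magnitude by $n^2 \cdot 2^{15}$, which stays below $2^{63}$ for all $n < 2^{24} \approx 1.7 \times 10^7$ samples, so a signed 64-bit accumulator is safe well beyond the $N = 10^4$ used above.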


Recursion without auxiliary state variables


For what it's worth, I also found a recursion that doesn't require auxiliary state variables (such as $A$ and $B$), but does require remembering the two previous slopes $m_{n-1}$ and $m_{n-2}$ and the previous input $y_{n-2}$:


$$m_n = \frac{2m_{n-1}(n - 2)}{n + 1} - \frac{m_{n-2}(n - 2)(n - 3)}{n^2 + n} + \frac{6y_{n-1}(n - 3)}{n^3 - n} - \frac{6y_{n-2}(n - 3)}{n^3-n}.\tag{13}$$
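Eq. 13 can be checked numerically against a direct evaluation of Eq. 9, for example with this Python snippet (my own test, not part of the original derivation):

from random import random
y = [2*random() - 1 for i in range(100)]

def m_direct(n):
    # Eq. 9 evaluated from scratch
    A = sum(i*y[i] for i in range(n))/(n**3 - n)
    B = sum(y[:n])/(n**2 - n)
    return 12*A - 6*B

m2, m1 = m_direct(2), m_direct(3)  # seed the recursion with m_2 and m_3
for n in range(4, 101):
    m = (2*m1*(n - 2)/(n + 1) - m2*(n - 2)*(n - 3)/(n**2 + n)
         + 6*y[n-1]*(n - 3)/(n**3 - n) - 6*y[n-2]*(n - 3)/(n**3 - n))
    assert abs(m - m_direct(n)) < 1e-9  # agreement to floating-point accuracy
    m2, m1 = m1, m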



Earlier I had thought that you wanted to calculate the slope based on the $N$ most recent samples. I will keep the following stub of an analysis on that, in case it is useful to someone.


For a least-squares-fit polynomial $f(x) = c_0 + c_1 x$ with $x=0$ at the center of the length-$N$ window, you can calculate the "intercept" $f(0) = c_0$ using a Savitzky–Golay filter, and the "slope" $f'(0) = c_1$ using a Savitzky–Golay differentiation filter, with both filters parameterized to a polynomial degree of 1. These are linear time-invariant filters, each described by its impulse response, which is easy to obtain for example using the function sgolay in Octave:


pkg load signal;

N=9; # Number of samples in the window
sg0 = sgolay(1, N, 0)((N+1)/2,:); # Savitzky-Golay filter
sg1 = sgolay(1, N, 1)((N+1)/2,:); # Savitzky-Golay differentiation filter

The filters obtained have the following coefficients (which when reversed give the impulse response):


sg0 = 0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111   0.11111
sg1 = -6.6667e-02 -5.0000e-02 -3.3333e-02 -1.6667e-02 3.5832e-18 1.6667e-02 3.3333e-02 5.0000e-02 6.6667e-02

Figure 1. Savitzky–Golay (blue crosses) and Savitzky–Golay differentiation filter (red o's) coefficients, with $N=9$ and polynomial order 1. Swap the sign of the horizontal axis for impulse responses.



The first filter is a moving average. The second filter has a truncated polynomial impulse response. I think such filters can be implemented recursively (see my question about that), as perhaps elaborated in this pay-walled paper: T. G. Campbell and Y. Neuvo, "Predictive FIR filters with low computational complexity," IEEE Transactions on Circuits and Systems, vol. 38, no. 9, Sep. 1991.

