
Least Squares

Recall our previous example, where we had a system of equations $\mathbb{A}\mathbb{X}=\mathbb{Y}$.
Let's take an example: suppose we have $3$ points (of the form $(a_1, a_2)$): $(1,1), (2,2), (3,2)$.
Our objective is to find the best possible linear function for $a_2$; call that function $f$.
Our function might not give the exact $a_2$ that corresponds to each $a_1$, but it will give us the best possible approximation of $a_2$.
The simplest linear function is $a_2 = f(a_1) = x_1 + a_1 x_2$.
Here $x_1, x_2$ are our parameters (unknowns).
Our observations say,

$f = x_1 + 1\,x_2 = 1,$

$f = x_1 + 2\,x_2 = 2,$

$f = x_1 + 3\,x_2 = 2.$
We can also write this as,

$$\underbrace{\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}}_{\mathbb{A}} \underbrace{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}}_{\mathbb{X}} = \underbrace{\begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}}_{\mathbb{Y}}$$

$$\mathbb{A}\mathbb{X}=\mathbb{Y}$$
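Before projecting, it is worth convincing ourselves that this system really has no exact solution. A quick numerical sketch (using NumPy; the array names simply mirror the symbols above): solving the first two equations exactly forces $x_1 = 0, x_2 = 1$, which the third equation then violates.

```python
import numpy as np

# The overdetermined system A X = Y built from the three observed points.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
Y = np.array([1.0, 2.0, 2.0])

# Solve the first two equations exactly: x1 + x2 = 1 and x1 + 2 x2 = 2.
x = np.linalg.solve(A[:2], Y[:2])   # -> x1 = 0, x2 = 1

# The third equation is then violated: x1 + 3 x2 = 3, but we observed 2.
print(A[2] @ x)   # 3.0, not 2.0
```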

So we want to find the linear combination of the column vectors of $\mathbb{A}$ that gives us $\mathbb{Y}$, but $\mathbb{Y}$ does not lie in the column space of $\mathbb{A}$.
So we will instead find the vector $\widehat{\mathbb{Y}}$ in the column space of $\mathbb{A}$ that is closest to $\mathbb{Y}$, where closeness is measured by the Euclidean distance between $\mathbb{Y}$ and $\widehat{\mathbb{Y}}$.
So we will solve,

$$\mathbb{A}\widehat{\mathbb{X}}=\widehat{\mathbb{Y}}$$

(Here $\widehat{\mathbb{X}}$ is just a way to indicate that our solution is an estimate of the exact solution.)
$\widehat{\mathbb{Y}}$ lives in the column space of $\mathbb{A}$, and $\mathbb{Y}$ lies outside that column space, so
the vector $\mathbb{Y} - \widehat{\mathbb{Y}}$ is perpendicular to the column space of $\mathbb{A}$.

$\Rightarrow \mathbb{Y} - \widehat{\mathbb{Y}}$ is in the null space of $\mathbb{A}^T$.

$\Rightarrow \mathbb{A}^T(\mathbb{Y} - \widehat{\mathbb{Y}}) = 0$, and we know that $\widehat{\mathbb{Y}} = \mathbb{A}\widehat{\mathbb{X}}$.

$\Rightarrow \mathbb{A}^T(\mathbb{Y} - \mathbb{A}\widehat{\mathbb{X}}) = 0$

$\Rightarrow \mathbb{A}^T\mathbb{A}\widehat{\mathbb{X}} = \mathbb{A}^T\mathbb{Y}$

$$\mathbb{A}^T\mathbb{A} = \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}, \quad \mathbb{A}^T\mathbb{Y} = \begin{bmatrix} 5 \\ 11 \end{bmatrix}$$

Now we have to solve,

$$\begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 5 \\ 11 \end{bmatrix}$$

We can write it as,

$3x_1 + 6x_2 = 5$

$6x_1 + 14x_2 = 11$

Solving, we get $x_1 = 2/3$ and $x_2 = 1/2$.
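The same normal equations can be solved numerically. A minimal NumPy sketch (variable names are my own), cross-checked against NumPy's built-in least-squares solver:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
Y = np.array([1.0, 2.0, 2.0])

# Form and solve the normal equations  A^T A x_hat = A^T Y.
AtA = A.T @ A                       # [[3, 6], [6, 14]]
AtY = A.T @ Y                       # [5, 11]
x_hat = np.linalg.solve(AtA, AtY)
print(x_hat)                        # x1 = 2/3, x2 = 1/2

# Cross-check with NumPy's built-in least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, Y, rcond=None)
```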

So our function $f(a_1) = x_1 + a_1 x_2$ becomes,

$$f(a_1) = \frac{2}{3} + \frac{1}{2}a_1$$

Let's take a look at our estimate for each of our $3$ data points and its error (which is $a_2 - \hat{a_2}$).

- For $(a_1, a_2) = (1, 1)$:

  $\hat{a_2} = \frac{2}{3} + \frac{1}{2}(1) = \frac{7}{6}$

  $e_1 = 1 - \frac{7}{6} = -\frac{1}{6}$

- For $(a_1, a_2) = (2, 2)$:

  $\hat{a_2} = \frac{2}{3} + \frac{1}{2}(2) = \frac{5}{3}$

  $e_2 = 2 - \frac{5}{3} = \frac{1}{3}$

- For $(a_1, a_2) = (3, 2)$:

  $\hat{a_2} = \frac{2}{3} + \frac{1}{2}(3) = \frac{13}{6}$

  $e_3 = 2 - \frac{13}{6} = -\frac{1}{6}$

Now let's represent our estimates and errors as vectors,

$$\widehat{\mathbb{Y}} = \begin{bmatrix} \frac{7}{6} \\[4pt] \frac{5}{3} \\[4pt] \frac{13}{6} \end{bmatrix}, \quad e = \mathbb{Y} - \widehat{\mathbb{Y}} = \begin{bmatrix} -\frac{1}{6} \\[4pt] \frac{1}{3} \\[4pt] -\frac{1}{6} \end{bmatrix}$$

As we discussed above, $\widehat{\mathbb{Y}}$ is in the column space of $\mathbb{A}$ and $\mathbb{Y} - \widehat{\mathbb{Y}}$ is perpendicular to that column space.
We can now see it in this example.
First notice that the dot product of $\widehat{\mathbb{Y}}$ and $\mathbb{Y} - \widehat{\mathbb{Y}}$ is $0$ (it has to be $\widehat{\mathbb{Y}}$ here, not $\mathbb{Y}$, since only $\widehat{\mathbb{Y}}$ lies in the column space).

$$\widehat{\mathbb{Y}} \cdot (\mathbb{Y} - \widehat{\mathbb{Y}}) = \widehat{\mathbb{Y}}^T(\mathbb{Y} - \widehat{\mathbb{Y}}) = 0$$

As we said, $\mathbb{Y} - \widehat{\mathbb{Y}}$ is perpendicular to the whole column space: you can take any linear combination of the columns of $\mathbb{A}$, and it will be perpendicular to $\mathbb{Y} - \widehat{\mathbb{Y}}$.
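This orthogonality is easy to verify numerically. A short sketch (names are my own) computing $\widehat{\mathbb{Y}}$ and $e$ from the fitted parameters and checking that $e$ is perpendicular to both columns of $\mathbb{A}$, and hence to $\widehat{\mathbb{Y}}$ itself:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
Y = np.array([1.0, 2.0, 2.0])

# Fit via the normal equations, then form the estimate and the error.
x_hat = np.linalg.solve(A.T @ A, A.T @ Y)
Y_hat = A @ x_hat                   # [7/6, 5/3, 13/6]
e = Y - Y_hat                       # [-1/6, 1/3, -1/6]

# e is perpendicular to every column of A, hence to any linear
# combination of them -- in particular to Y_hat itself.
print(A.T @ e)     # both entries ~0
print(Y_hat @ e)   # ~0
```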