
Projection

Vector Projection

Say that we have two vectors $\vec{v}\in\mathbb{R}^2$ and $\vec{w}\in\mathbb{R}^2$.
Now, in the space of $\vec{v}$, which vector is closest to $\vec{w}$?
But let's first ask: what is the space of $\vec{v}$?


info

We are asking for the space of a single vector; a single vector can only give us a one-dimensional space, which is a line.
So the span of $\vec{v}$ is $C\vec{v};\quad C\in\mathbb{R}$.

Now we have to find the vector closest to $\vec{w}$ in the vector space of $\vec{v}$.
Let's call this closest vector $\vec{p}$; this $\vec{p}$ is the projection of $\vec{w}$ on $\vec{v}$.
We know that $\vec{p}$ lives in the vector space of $\vec{v}$.
So $\vec{p}=x\vec{v};\quad x\in\mathbb{R}$.
Since $\vec{p}$ is the closest vector to $\vec{w}$, the vector joining $\vec{p}$ to $\vec{w}$ is perpendicular to $\vec{v}$.
Let the vector joining $\vec{p}$ to $\vec{w}$ be $\vec{e}$, with $\vec{e} = \vec{w}-\vec{p}$. So,

info

$$\vec{v}\cdot \vec{e} =0$$
$$\Rightarrow \vec{v}^T \vec{e} =0$$
$$\Rightarrow \vec{v}^T (\vec{w} -\vec{p})=0$$
$$\Rightarrow \vec{v}^T (\vec{w} -x\vec{v})=0$$
$$\Rightarrow \vec{v}^T \vec{w} -x\,\vec{v}^T\vec{v}=0$$

$$\Rightarrow x=\frac{\vec{v}^T \vec{w}}{\vec{v}^T\vec{v}} \in\mathbb{R}$$

The projection of $\vec{w}$ on $\vec{v}$ is $\vec{p}=\vec{v}x$.
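As a quick numerical check, here is a NumPy sketch of the formula above; the vectors `v` and `w` are arbitrary example values, not from the text:

```python
import numpy as np

# Two example vectors in R^2 (arbitrary choices for illustration)
v = np.array([3.0, 0.0])
w = np.array([2.0, 2.0])

# x = (v^T w) / (v^T v), the coefficient of the closest vector in span(v)
x = (v @ w) / (v @ v)

# p = x v, the projection of w onto v
p = x * v

# The error e = w - p should be perpendicular to v
e = w - p
print(p)       # [2. 0.]
print(v @ e)   # 0.0
```

Here $\vec{w}=(2,2)$ projects onto the $x$-axis direction $\vec{v}=(3,0)$, giving $\vec{p}=(2,0)$, and the error $\vec{e}=(0,2)$ is indeed perpendicular to $\vec{v}$.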

Projection Matrix of a vector

Say we have a vector $\vec{v}\in\mathbb{R}^n$, and we want a matrix that gives us the projection of any vector onto this vector $\vec{v}$; we call this matrix a projection matrix.
Let the projection of a vector $\vec{w}$ on $\vec{v}$ be $\vec{p}$.
As we discussed above,
$$\vec{p}=\vec{v}x$$
$$\Rightarrow \vec{p}= \vec{v}\, \frac{\vec{v}^T \vec{w}}{\vec{v}^T\vec{v}}$$
$$\Rightarrow \vec{p}= \frac{\vec{v}\, \vec{v}^T}{\vec{v}^T\vec{v}}\, \vec{w}$$

The projection matrix $(P)$ of a vector $\vec{v}$ is

danger
$$P=\frac{\vec{v}\,\vec{v}^T}{\vec{v}^T\vec{v}}$$
  • The columns of $P$ are linear combinations of $\vec{v}$, so
info

The column space of $P$ is the vector space of $\vec{v}$.

  • Now that we have our projection matrix $(P)$, we can take the projection of any vector $\vec{w}$ on $\vec{v}$ by evaluating $P\vec{w}$.
info

In $P\vec{w}$ we are taking a linear combination of $\vec{v}$, so the resultant projection of $\vec{w}$ on $\vec{v}$ lives in the vector space of $\vec{v}$.

  • What about the rank of $P$?
info

The rank of $P$ is $1$,
because every column of $P$ is a multiple of the single vector $\vec{v}$.

  • Is $P$ symmetric? $\displaystyle P= \frac{\vec{v}\, \vec{v}^T}{\vec{v}^T\vec{v}}$

$$\Rightarrow P^T = \frac{(\vec{v}\, \vec{v}^T)^T}{\vec{v}^T\vec{v}}$$

$$\Rightarrow P^T = \frac{\vec{v}^{T^T} \vec{v}^T}{\vec{v}^T\vec{v}}$$

$$\Rightarrow P^T = \frac{\vec{v}\, \vec{v}^T}{\vec{v}^T\vec{v}} = P$$
So $P$ is symmetric.

info
$$P=P^T$$
  • What if we apply the projection of a vector $\vec{w}$ on $\vec{v}$ twice? When we apply the projection the first time, we land in the vector space of $\vec{v}$.
    Let's say the projected vector is $\vec{p}_1$.
    If we apply the projection again, it projects $\vec{p}_1$ into the vector space of $\vec{v}$.
    But $\vec{p}_1$ is already in the vector space of $\vec{v}$, so the second projection does nothing.
    So we can say that
info
$$P^2=P$$
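The three properties above (rank $1$, symmetry, idempotence) can be sketched numerically; this NumPy snippet uses an arbitrary example vector:

```python
import numpy as np

v = np.array([[1.0], [2.0], [2.0]])   # an example column vector in R^3

# P = (v v^T) / (v^T v)
P = (v @ v.T) / (v.T @ v)

# Rank 1: every column of P is a multiple of v
print(np.linalg.matrix_rank(P))       # 1

# Symmetric: P == P^T
print(np.allclose(P, P.T))            # True

# Idempotent: P^2 == P
print(np.allclose(P @ P, P))          # True

# Projecting any w lands in span(v): the result is parallel to v
w = np.array([3.0, -1.0, 4.0])
p = P @ w
print(np.allclose(np.cross(p, v.ravel()), 0))  # True
```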

Why are we doing projection?

OK, we can find the projection of a vector on another vector, but why are we projecting the vector in the first place?
What is the benefit of projecting a vector?

Say a function $f=x_1 a+ x_2 b+ x_3 c$ is governed by 3 variables (say $a,b,c$), and we have $5$ noisy observations of $f$; now we want to predict $f$ for some set of $(a,b,c)$.
Because our $5$ observations are noisy, we cannot perfectly predict the outcome.
Our observations are something like,
$$f_1=x_1 a_1+ x_2 b_1+ x_3 c_1$$
$$f_2=x_1 a_2+ x_2 b_2+ x_3 c_2$$
$$f_3=x_1 a_3+ x_2 b_3+ x_3 c_3$$
$$f_4=x_1 a_4+ x_2 b_4+ x_3 c_4$$
$$f_5=x_1 a_5+ x_2 b_5+ x_3 c_5$$
We can write it as,

$$\begin{bmatrix} a_1 & b_1 & c_1\\ a_2 & b_2 & c_2\\ a_3 & b_3 & c_3\\ a_4 & b_4 & c_4\\ a_5 & b_5 & c_5 \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} = \begin{bmatrix} f_1\\ f_2\\ f_3\\ f_4\\ f_5 \end{bmatrix}$$

Here $x_1, x_2, x_3$ are our parameters (unknowns), so our parameter space is $\mathbb{R}^3$.
We want to find the parameters $x_1, x_2, x_3$ that best fit our noisy data.
Say

$$\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \end{bmatrix} = \vec{a},\quad \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{bmatrix} = \vec{b},\quad \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \\ c_5 \end{bmatrix} = \vec{c},\quad \begin{bmatrix} a_1 & b_1 & c_1\\ a_2 & b_2 & c_2\\ a_3 & b_3 & c_3\\ a_4 & b_4 & c_4\\ a_5 & b_5 & c_5 \end{bmatrix}= \mathbb{A}, \quad \begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix}=\mathbb{X} \ \ \text{and} \ \ \begin{bmatrix} f_1\\ f_2\\ f_3\\ f_4\\ f_5 \end{bmatrix} =\mathbb{Y}$$

So $\mathbb{A}\mathbb{X}=\mathbb{Y}$.
Our data is noisy.
$\text{Rank}(\mathbb{A})\leq 3$, $\mathbb{X}\in\mathbb{R}^3$, $\mathbb{Y}\in\mathbb{R}^5$.

So there is a high chance that $\mathbb{Y}$ does not live in the column space of $\mathbb{A}$.
So instead we find the solution of $\mathbb{A}\widehat{\mathbb{X}}=\widehat{\mathbb{Y}}$,
where $\widehat{\mathbb{Y}}$ lives in the column space of $\mathbb{A}$ and $\widehat{\mathbb{Y}}$ is closest to $\mathbb{Y}$.

info

We can get this $\widehat{\mathbb{Y}}$ by taking the projection of $\mathbb{Y}$ onto the column space of $\mathbb{A}$.

Matrix Projection

Say that we have an $m\times n$ matrix $\mathbb{A}\in\mathbb{R}^{m\times n}$ and a vector $\vec{v}\in\mathbb{R}^m$.
The projection of a vector $\vec{v}$ on a matrix $\mathbb{A}$ generally means the projection of $\vec{v}$ on the column space of the matrix $\mathbb{A}$.
We can also take the projection of a vector $\vec{v}$ on the null space of the matrix $\mathbb{A}^T$.
And as we know, the null space of $\mathbb{A}^T$ is perpendicular to the column space of $\mathbb{A}$.
Then,

info

The projections of a vector $\vec{v}$ on the,

  • null space of matrix $\mathbb{A}^T$, and
  • column space of matrix $\mathbb{A}$

when added together, give us back our vector $\vec{v}$.
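A minimal NumPy sketch of this decomposition, using an arbitrary example matrix and the column-space projection matrix $P=\mathbb{A}(\mathbb{A}^T\mathbb{A})^{-1}\mathbb{A}^T$ derived later in this article (the projection onto $N(\mathbb{A}^T)$ is then $I-P$):

```python
import numpy as np

# An arbitrary 3x2 example matrix with independent columns
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Projection onto the column space of A
P_col = A @ np.linalg.inv(A.T @ A) @ A.T

# Projection onto N(A^T), the orthogonal complement of the column space
P_null = np.eye(3) - P_col

v = np.array([1.0, 2.0, 3.0])

# The two projections add back up to v
print(np.allclose(P_col @ v + P_null @ v, v))   # True

# The N(A^T) component is indeed killed by A^T
print(np.allclose(A.T @ (P_null @ v), 0))       # True
```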

Projection onto the column space of a matrix

Recall our example above: we have a system of equations $\mathbb{A}\mathbb{X}=\mathbb{Y}$ that has no solution.
The problem is that there is a high chance that $\mathbb{Y}$ does not live in the column space of $\mathbb{A}$, so the system has no solution.
So we go for the closest solution,
$$\mathbb{A}\widehat{\mathbb{X}}=\widehat{\mathbb{Y}}$$
where $\widehat{\mathbb{Y}}$ lives in the column space of $\mathbb{A}$, so,

info

$\widehat{\mathbb{Y}}$ is a linear combination of the columns of $\mathbb{A}$.

The vector $\widehat{\mathbb{Y}}$ lives in the column space of $\mathbb{A}$ and the vector $\mathbb{Y}$ is at some angle to the column space of $\mathbb{A}$, so

info

The vector $\mathbb{Y} - \widehat{\mathbb{Y}}$ is perpendicular to the column space of $\mathbb{A}$.

$\Rightarrow \mathbb{Y} - \widehat{\mathbb{Y}}$ is also perpendicular to the column vectors of $\mathbb{A}$

$\Rightarrow \vec{a}^T (\mathbb{Y} - \widehat{\mathbb{Y}})=0$, $\vec{b}^T (\mathbb{Y} - \widehat{\mathbb{Y}})=0$ and $\vec{c}^T (\mathbb{Y} - \widehat{\mathbb{Y}})=0$, where $\vec{a}$, $\vec{b}$ and $\vec{c}$ are the column vectors of $\mathbb{A}$.
We can write it as,

danger
$$\mathbb{A}^T(\mathbb{Y} - \widehat{\mathbb{Y}})=0$$

Notice that $\mathbb{Y} - \widehat{\mathbb{Y}}$ lives in $N(\mathbb{A}^T)$.

And the null space of $\mathbb{A}^T$ is perpendicular to the column space of $\mathbb{A}$.

info

So $\mathbb{Y} - \widehat{\mathbb{Y}}$ is perpendicular to the column space of $\mathbb{A}$.

$\mathbb{A}^T(\mathbb{Y} - \widehat{\mathbb{Y}})=0$ and $\widehat{\mathbb{Y}}=\mathbb{A}\widehat{\mathbb{X}}$
$$\Rightarrow \mathbb{A}^T(\mathbb{Y} - \mathbb{A}\widehat{\mathbb{X}})=0$$
$$\Rightarrow \mathbb{A}^T \mathbb{Y} - \mathbb{A}^T\mathbb{A}\widehat{\mathbb{X}}=0$$
$$\Rightarrow \mathbb{A}^T\mathbb{A}\widehat{\mathbb{X}}=\mathbb{A}^T \mathbb{Y}$$

Assuming the columns of $\mathbb{A}$ are independent, so that $\mathbb{A}^T\mathbb{A}$ is invertible,

danger
$$\widehat{\mathbb{X}}=\left(\mathbb{A}^T\mathbb{A}\right)^{-1}\mathbb{A}^T \mathbb{Y}$$

Our projection of $\mathbb{Y}$ on the column space of $\mathbb{A}$ is $\widehat{\mathbb{Y}}$.

$\widehat{\mathbb{Y}}=\mathbb{A}\widehat{\mathbb{X}}$ and $\widehat{\mathbb{X}}=(\mathbb{A}^T\mathbb{A})^{-1}\mathbb{A}^T \mathbb{Y}$, so

  • The projection $(\widehat{\mathbb{Y}})$ of $\mathbb{Y}$ on the column space of $\mathbb{A}$:
danger
$$\widehat{\mathbb{Y}}=\mathbb{A}\left(\mathbb{A}^T\mathbb{A}\right)^{-1}\mathbb{A}^T \mathbb{Y}$$

So our Projection Matrix is,

danger
$$P=\mathbb{A}\left(\mathbb{A}^T\mathbb{A}\right)^{-1}\mathbb{A}^T$$
  • Is $P$ symmetric? $\displaystyle P= \mathbb{A}(\mathbb{A}^T\mathbb{A})^{-1}\mathbb{A}^T$
    $$\Rightarrow P^T = \left(\mathbb{A}(\mathbb{A}^T\mathbb{A})^{-1}\mathbb{A}^T\right)^T$$
    $$\Rightarrow P^T = \mathbb{A}^{T^T}\left((\mathbb{A}^T\mathbb{A})^{-1}\right)^T\mathbb{A}^T$$
    $$\Rightarrow P^T = \mathbb{A}^{T^T}\left((\mathbb{A}^T\mathbb{A})^T\right)^{-1}\mathbb{A}^T$$
    $$\Rightarrow P^T = \mathbb{A}^{T^T}(\mathbb{A}^T\mathbb{A}^{T^T})^{-1}\mathbb{A}^T$$
    $$\Rightarrow P^T = \mathbb{A}(\mathbb{A}^T\mathbb{A})^{-1}\mathbb{A}^T = P$$
    So $P$ is symmetric.
info
$$P=P^T$$
  • What if we project the vector $\mathbb{Y}$ on the column space of a matrix $\mathbb{A}$ twice? When we apply the projection the first time, we land in the column space of $\mathbb{A}$.
    Let's say the projected vector is $\widehat{\mathbb{Y}}$.
    If we again project $\widehat{\mathbb{Y}}$ on the column space of $\mathbb{A}$, the resultant vector will be $\widehat{\mathbb{Y}}$,
    because $\widehat{\mathbb{Y}}$ is already in the column space of $\mathbb{A}$, so the second projection does nothing.
    So we can say that
info
$$P^2=P$$
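The whole least-squares story above can be sketched numerically in NumPy; the data here is synthetic, generated only to illustrate the $5$-observation, $3$-parameter example:

```python
import numpy as np

rng = np.random.default_rng(0)

# 5 noisy observations of f = x1*a + x2*b + x3*c (synthetic data)
A = rng.normal(size=(5, 3))                # the matrix A (columns a, b, c)
x_true = np.array([1.0, -2.0, 0.5])        # hypothetical true parameters
Y = A @ x_true + 0.1 * rng.normal(size=5)  # noisy Y, generally not in C(A)

# X_hat = (A^T A)^{-1} A^T Y  -- the least-squares solution
X_hat = np.linalg.inv(A.T @ A) @ A.T @ Y

# P = A (A^T A)^{-1} A^T  -- projection onto the column space of A
P = A @ np.linalg.inv(A.T @ A) @ A.T
Y_hat = P @ Y

print(np.allclose(Y_hat, A @ X_hat))      # True: Y_hat = A X_hat
print(np.allclose(P, P.T))                # True: P is symmetric
print(np.allclose(P @ P, P))              # True: P is idempotent
print(np.allclose(A.T @ (Y - Y_hat), 0))  # True: residual lives in N(A^T)
```

In practice one would use `np.linalg.lstsq(A, Y)` rather than forming $(\mathbb{A}^T\mathbb{A})^{-1}$ explicitly; the explicit form is shown here only to mirror the derivation.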

Let's look at the extreme cases.
Suppose we want to take the projection of a vector $\vec{b}$ on the column space of a matrix $\mathbb{A}$.

$1.$ What if the vector $\vec{b}$ is perpendicular to the column space of the matrix $\mathbb{A}$?

note

First, let's think about the space of vectors that are perpendicular to the column space of the matrix $\mathbb{A}$.
The space perpendicular to the column space is the null space of $\mathbb{A}^T$.
So $\vec{b}$ lives in the null space of $\mathbb{A}^T$.
So,

$$\mathbb{A}^T\vec{b}=\vec{0}$$

The projection matrix $(P)$ of the column space of the matrix $\mathbb{A}$ is $P=\mathbb{A}(\mathbb{A}^T\mathbb{A})^{-1}\mathbb{A}^T$, and the projection of $\vec{b}$ on the column space of $\mathbb{A}$ is $P\vec{b}$:

$$P\vec{b}= \mathbb{A}(\mathbb{A}^T\mathbb{A})^{-1}\underbrace{\mathbb{A}^T\vec{b}}_{=\vec{0}}=\vec{0}$$
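A quick NumPy check of this case, using an arbitrary example matrix whose column space is the $xy$-plane in $\mathbb{R}^3$:

```python
import numpy as np

# Column space of A is the xy-plane in R^3
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

# b points along the z-axis, perpendicular to the column space: b is in N(A^T)
b = np.array([0.0, 0.0, 5.0])
print(np.allclose(A.T @ b, 0))   # True: A^T b = 0

P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(P @ b, 0))     # True: the projection is the zero vector
```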

$2.$ What if the vector $\vec{b}$ is in the column space of the $m\times n$ matrix $\mathbb{A}$?

note

Vectors in the column space of $\mathbb{A}$ are linear combinations of the column vectors of the matrix $\mathbb{A}$.
We can write all these linear combinations as $\mathbb{A}\vec{x};\quad\vec{x}\in\mathbb{R}^n$.
And we are saying that $\vec{b}$ is in the column space of the matrix $\mathbb{A}$, so,

$$\vec{b}=\mathbb{A}\vec{x};\quad\vec{x}\in\mathbb{R}^n$$

So the projection of $\vec{b}$ on the column space of the matrix $\mathbb{A}$ is $P\vec{b}$:
$$P\vec{b}= \mathbb{A}(\mathbb{A}^T\mathbb{A})^{-1}\mathbb{A}^T\vec{b}$$
$$P\vec{b}= \mathbb{A}\underbrace{(\mathbb{A}^T\mathbb{A})^{-1}\mathbb{A}^T\mathbb{A}}_{\mathcal{I}\ (\text{identity})}\vec{x}$$
$$P\vec{b}= \mathbb{A}\vec{x}$$
$$P\vec{b}= \vec{b}$$
So projecting $\vec{b}$ (which is already in the column space of $\mathbb{A}$) on the column space of $\mathbb{A}$ changes nothing.
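A quick NumPy check of this second case, with an arbitrary example matrix and coefficient vector:

```python
import numpy as np

# An arbitrary 3x2 example matrix with independent columns
A = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])

# b = A x is, by construction, in the column space of A
x = np.array([2.0, -1.0])
b = A @ x

P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(P @ b, b))   # True: projecting b changes nothing
```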