
Orthonormal Vectors

Say we have an $m\times n$ matrix $Q$, whose $i$-th column is denoted $q_i$.

$$Q=\begin{bmatrix} \vdots & \vdots & & \vdots \\ q_1 & q_2 & \cdots & q_n \\ \vdots & \vdots & & \vdots \\ \end{bmatrix}_{m\times n}$$

All column vectors are perpendicular to each other, and each column vector has unit length:

$$q_i^Tq_j=\left\{\begin{matrix} 0 & \text{if } i\neq j\\ 1 & \text{if } i=j \\ \end{matrix}\right.$$

Equivalently, we can say that $Q^TQ=\mathcal{I}_{n\times n}$:

$$Q^TQ= \begin{bmatrix} \cdots & q_1^T & \cdots\\ \cdots & q_2^T & \cdots\\ & \vdots & \\ \cdots & q_n^T & \cdots\\ \end{bmatrix} \begin{bmatrix} \vdots & \vdots & & \vdots \\ q_1 & q_2 & \cdots & q_n \\ \vdots & \vdots & & \vdots \end{bmatrix}$$
$$\Rightarrow Q^TQ= \begin{bmatrix} q_1^Tq_1 & q_1^Tq_2 & \cdots & q_1^Tq_n \\ q_2^Tq_1 & q_2^Tq_2 & \cdots & q_2^Tq_n \\ \vdots & \vdots & \ddots & \vdots \\ q_n^Tq_1 & q_n^Tq_2 & \cdots & q_n^Tq_n \\ \end{bmatrix}$$
$$\Rightarrow Q^TQ= \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ \end{bmatrix}$$
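This can be checked numerically. Here is a minimal sketch in plain Python, using a hypothetical $3\times 2$ matrix $Q$ whose two columns are unit vectors perpendicular to each other:

```python
# Sketch: verify Q^T Q = I for a matrix with orthonormal columns.
# The 3x2 matrix Q below is a hypothetical example.
import math

s = 1 / math.sqrt(2)
Q = [[s,  s],
     [s, -s],
     [0,  0]]          # columns q1, q2 are orthonormal

n = len(Q[0])
# entry (i, j) of Q^T Q is the dot product q_i^T q_j
QtQ = [[sum(Q[k][i] * Q[k][j] for k in range(len(Q)))
        for j in range(n)] for i in range(n)]

print(QtQ)  # ~identity: 1 on the diagonal, 0 elsewhere
```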
info

$Q$ is an **orthogonal matrix** if $Q$ is a square matrix.

So if $Q$ is an orthogonal matrix, then $Q^TQ=\mathcal{I}$ tells us that $Q^T = Q^{-1}$. So,

  • If $Q$ is an orthogonal matrix then
danger
$$Q^T = Q^{-1}$$

Examples of matrices with orthonormal columns:

note
$$Q=\begin{bmatrix} 0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0\\ \end{bmatrix}$$

Here all columns are orthogonal and of unit length, and $Q$ is a $3\times 3$ square matrix, so it is an orthogonal matrix.

$$Q=\frac{1}{3}\begin{bmatrix} 1 & -2 & 2\\ 2 & -1 & -2\\ 2 & 2 & 1\\ \end{bmatrix}$$

Here all columns are orthogonal, the factor $\frac{1}{3}$ makes them unit length, and $Q$ is a $3\times 3$ square matrix, so it is an orthogonal matrix.
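As a quick sanity check (a sketch, using the permutation matrix from the first example), we can confirm numerically that $Q^TQ=\mathcal{I}$, i.e. that $Q^T$ acts as $Q^{-1}$:

```python
# Sketch: for a square orthogonal matrix, Q^T plays the role of Q^{-1}.
Q = [[0, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]

Qt = [list(row) for row in zip(*Q)]  # transpose

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

print(matmul(Qt, Q))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]] -- the identity
```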

What is the benefit of orthonormal column vectors?

Say we have an $m\times n$ matrix $Q$ with orthonormal column vectors.
Now say we want the projection of a vector (say $\vec{v}$) onto the column space of $Q$.
Our projection matrix $(P)$ is:

$$P=Q(Q^TQ)^{-1}Q^T$$

And the projection of $\vec{v}$ onto the column space of $Q$ is $P\vec{v}$.
Because $Q$ has orthonormal column vectors, $Q^TQ=\mathcal{I}$.
So our projection matrix $(P)$ becomes $P=Q\mathcal{I}Q^T$, so:

  • Projection Matrix $(P)$:
danger
$$P=QQ^T$$

(So now we don't have to take the inverse of $Q^TQ$.)
We can see a direct benefit of having a matrix with orthonormal column vectors in least squares.
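A small sketch of this shortcut, with a hypothetical $3\times 2$ matrix $Q$ whose columns span the $xy$-plane: instead of forming $P$ explicitly, we can compute $P\vec{v}=Q(Q^T\vec{v})$.

```python
# Sketch: project v onto the column space of Q via P v = Q (Q^T v).
# This hypothetical Q has orthonormal columns spanning the xy-plane.
Q = [[1, 0],
     [0, 1],
     [0, 0]]
v = [3, 4, 5]

coords = [sum(Q[k][j] * v[k] for k in range(3)) for j in range(2)]       # Q^T v
proj = [sum(Q[i][j] * coords[j] for j in range(2)) for i in range(3)]    # Q (Q^T v)

print(proj)  # [3, 4, 0] -- the component outside the plane is removed
```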

info

In least squares we have an equation of the form $A^TA\widehat{\mathbb{X}}=A^T\vec{v}$, and if $A$ has orthonormal column vectors then $A^TA=\mathcal{I}$, so our equation becomes $\widehat{\mathbb{X}}=A^T\vec{v}$.
No need to compute $(A^TA)^{-1}$.
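A minimal sketch of this least-squares shortcut (the matrix $A$ and vector $\vec{v}$ below are hypothetical):

```python
# Sketch: with orthonormal columns, x-hat = A^T v directly,
# no (A^T A)^{-1} needed.
import math

s = 1 / math.sqrt(2)
A = [[s,  s],
     [s, -s],
     [0,  0]]          # orthonormal columns
v = [3, 1, 7]

# x-hat_j = a_j^T v
x_hat = [sum(A[k][j] * v[k] for k in range(3)) for j in range(2)]
print(x_hat)  # ~[2.828, 1.414], i.e. [4/sqrt(2), 2/sqrt(2)]
```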

Now let's verify the properties of a projection matrix:

  • $P=P^T$

    $P=QQ^T$
    $P^T=(QQ^T)^T = {Q^T}^T Q^T = QQ^T=P$

  • $P=P^2$

    $P=QQ^T$
    $P^2=Q(Q^TQ)Q^T=Q\mathcal{I}Q^T=QQ^T=P$
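Both properties can also be checked numerically; a sketch with a hypothetical $Q$:

```python
# Sketch: check P = P^T and P^2 = P for P = Q Q^T,
# with a hypothetical 3x2 Q whose columns are orthonormal.
Q = [[1, 0],
     [0, 1],
     [0, 0]]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

Qt = [list(r) for r in zip(*Q)]
P = matmul(Q, Qt)
Pt = [list(r) for r in zip(*P)]

print(P == Pt)            # True: P is symmetric
print(matmul(P, P) == P)  # True: projecting twice changes nothing
```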

Gram Schmidt

OK, now we know that matrices with orthonormal column vectors are important.
But if our matrix (with independent column vectors) does not have orthonormal columns, how do we make them orthonormal?
This is where Gram-Schmidt comes into the picture.

First let's look at the smaller picture.
Say we have $2$ vectors $\vec{a}\in\mathbb{R}^n$ and $\vec{b}\in\mathbb{R}^n$.
We have just $2$ (non-parallel) vectors, and the span of $2$ (non-parallel) vectors is just a $2$-dimensional plane.

What we want is two orthonormal vectors (say $\vec{q}_a$ and $\vec{q}_b$) in this $2$-dimensional plane.
Let's take $\vec{a}$ as our first vector and simply normalize it; it's easy because it's only one vector.
So

info
$$\vec{q}_a=\frac{\vec{a}}{\|\vec{a}\|}$$

Now let's find the second orthonormal vector.
The IDEA is to take $\vec{b}$ and remove its component along $\vec{q}_a$.
So first take the projection of $\vec{b}$ onto the line spanned by $\vec{q}_a$; call this projection $\vec{b}_p$.
Now the vector joining $\vec{b}$ and $\vec{b}_p$ is orthogonal to $\vec{q}_a$; call this vector $\vec{b}_o$.
And $\vec{b}_p + \vec{b}_o = \vec{b}$, so
$\vec{b}_o = \vec{b}-\vec{b}_p$.
Recall our projection matrix $(P)$ for projecting onto a single vector $\vec{v}$:

$$P=\frac{\vec{v} \vec{v}^T}{\vec{v}^T\vec{v}}$$

So the projection of $\vec{b}$ onto the line spanned by $\vec{q}_a$ is $\vec{b}_p=P\vec{b}$:

$$P\vec{b}=\frac{\vec{q}_a \vec{q}_a^T}{\vec{q}_a^T\vec{q}_a}\vec{b}$$

$$\Rightarrow \vec{b}_p =\frac{\vec{q}_a \vec{q}_a^T}{\vec{q}_a^T\vec{q}_a}\vec{b}\quad (\text{and } \vec{q}_a^T\vec{q}_a=1)$$

$$\Rightarrow \vec{b}_p =\vec{q}_a \vec{q}_a^T\vec{b}$$

info
$$\vec{b}_o = \vec{b}-(\vec{q}_a\cdot\vec{b})\vec{q}_a$$

And

$$\vec{q}_b=\frac{\vec{b}_o}{\|\vec{b}_o\|}$$

And we know that $\vec{b}_o$ is perpendicular to $\vec{q}_a$; if we think of $\vec{q}_a$ as a matrix with one column, then $\vec{b}_o$ is perpendicular to the column space of $\vec{q}_a$.
So $\vec{b}_o$ is in the null space of $\vec{q}_a^T$, i.e.
$\vec{q}_a^T \vec{b}_o=0$. Let's verify it:

$$\vec{q}_a^T \left( \vec{b}-\vec{q}_a \vec{q}_a^T\vec{b} \right) = 0$$
$$\Rightarrow \vec{q}_a^T \vec{b}- (\vec{q}_a^T \vec{q}_a) \vec{q}_a^T\vec{b} = 0$$
$$\Rightarrow \vec{q}_a^T \vec{b}- \vec{q}_a^T \vec{b} = 0\quad \color{green}{✓}$$
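The two-vector step above can be sketched in plain Python (the vectors $\vec{a}$ and $\vec{b}$ below are hypothetical):

```python
# Sketch: build q_a, q_b from two independent vectors a, b.
import math

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

a = [1, 1, 0]
b = [1, 0, 1]

qa = unit(a)                                       # q_a = a / ||a||
b_o = [x - dot(qa, b) * q for x, q in zip(b, qa)]  # b - (q_a . b) q_a
qb = unit(b_o)

print(abs(dot(qa, qb)) < 1e-9)  # True: q_a and q_b are orthogonal
```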

Now let's push ourselves a little further and add another vector $\vec{c}$.
We have our two orthonormal vectors $\vec{q}_a$ and $\vec{q}_b$, and this vector $\vec{c}$ is not in the span of $\vec{q}_a$ and $\vec{q}_b$; in other words, $\vec{c}$ is out of the plane spanned by $\vec{q}_a$ and $\vec{q}_b$.
So $\vec{c}$ gives us access to the $3$rd dimension; using $\vec{c}$ we need to find a vector orthogonal to the span of $\vec{q}_a$ and $\vec{q}_b$.
IDEA: from $\vec{c}$, first remove its component along $\vec{q}_a$ and then remove its component along $\vec{q}_b$.

  • First remove its component along $\vec{q}_a$; let's call the result $\vec{c}_{o_{a}}$.
    $$\vec{c}_{o_{a}} = \vec{c} - \frac{\vec{q}_a \vec{q}_a^T}{\vec{q}_a^T\vec{q}_a}\vec{c}\quad (\vec{q}_a^T\vec{q}_a=1)$$
    $$\Rightarrow \vec{c}_{o_{a}} = \vec{c} - \vec{q}_a \vec{q}_a^T \vec{c} = \vec{c} - (\vec{q}_a\cdot\vec{c})\vec{q}_a$$

  • Now remove its component along $\vec{q}_b$; this gives our vector $\vec{c}_{o}$. (Projecting $\vec{c}$ instead of $\vec{c}_{o_{a}}$ onto $\vec{q}_b$ gives the same result, since $\vec{q}_b\perp\vec{q}_a$.)
    $$\vec{c}_{o} = \left(\vec{c} - (\vec{q}_a\cdot\vec{c})\vec{q}_a\right) - \frac{\vec{q}_b \vec{q}_b^T}{\vec{q}_b^T\vec{q}_b}\vec{c}\quad (\vec{q}_b^T\vec{q}_b=1)$$
    $$\Rightarrow \vec{c}_{o} = \left( \vec{c} - (\vec{q}_a\cdot\vec{c})\vec{q}_a\right) - (\vec{q}_b\cdot\vec{c})\vec{q}_b$$
    $$\vec{q}_c=\frac{\vec{c}_o}{\|\vec{c}_o\|}$$

Now we can see a pattern here.
Say we have $n$ independent vectors $( a_1, a_2, \cdots, a_n )$ and we have to find $n$ orthonormal vectors $( q_1, q_2, \cdots, q_n )$ using them.
From the above we can deduce the pattern:

$$\vec{a}_{i_o}=\vec{a}_i-\sum_{k=1}^{i-1} (\vec{q}_k\cdot\vec{a}_i)\vec{q}_k$$
$$\vec{q}_i=\frac{\vec{a}_{i_o}}{\|\vec{a}_{i_o}\|}$$

So now we can find orthonormal vectors for any set of independent vectors.
This is the Gram-Schmidt process.
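The pattern above can be sketched as a small pure-Python routine (a sketch, not a numerically robust implementation; classical Gram-Schmidt can lose orthogonality in floating point):

```python
# Sketch: given independent vectors a_1..a_n (rows of `vectors`),
# produce orthonormal vectors q_1..q_n.
import math

def gram_schmidt(vectors):
    qs = []
    for a in vectors:
        # subtract the component of a_i along every earlier q_k
        for q in qs:
            c = sum(x * y for x, y in zip(q, a))   # q_k . a_i
            a = [x - c * y for x, y in zip(a, q)]
        norm = math.sqrt(sum(x * x for x in a))
        qs.append([x / norm for x in a])           # q_i = a_io / ||a_io||
    return qs

qs = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
# every pair q_i . q_j is ~0 and every q_i has unit length
```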