%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% %%%% LaTeX file for demonstration of various ``texniques''. %%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \documentstyle{article} %--- I prefer to use more of the page than is allowed by defaut. Hence, ... \textwidth=6.5in \textheight=9.0in \voffset=-0.75in \hoffset=-0.75in %------------------- \begin{document} %--------- Some of my favorite macros ------------------ \newtheorem{thm}{Theorem} \newtheorem{cor}{Corollary} \newtheorem{lm}{Lemma} % \newcommand{\be}{\begin{equation}} \newcommand{\ee}{\end{equation}} \newcommand{\bear}{\begin{eqnarray}} \newcommand{\bears}{\begin{eqnarray*}} \newcommand{\eear}{\end{eqnarray}} \newcommand{\eears}{\end{eqnarray*}} \newcommand{\eqdef}{\stackrel{\rm def}{=}} \newcommand{\QED}{\hfill\rule{2mm}{2mm}\vspace{3mm}} % \newcommand{\sP}{{\cal P}} \newcommand{\sR}{{\cal R}} % ------------------------------------------------------ \title{Convergence properties of the \\ Conjugate Gradient Method} \author{J. G. Wade \thanks{ \today } } % --------------------------------------------------- \maketitle \begin{abstract} The Conjugate Gradient method for quadratic minimization enjoys widespread popularity because it is relatively easy to implement and, at the same time, posseses good convergence properties. One of the best-known properties is that if the problem has $n$ degrees of freedom then convergence is obtained in at most $n$ steps (under exact arithmetic). This result certainly has some practical value. However, for very large $n$, it may be infeasible or undesirable to carry out $n$ iterations. The good news is that it is often unnecessary also. In this talk we carry out a rather detailed convergence analysis, with two goals. First, we want to study the {\em rate} of convergence, and second, we want to see why, under some circumstances, we can guanantee convergence in fewer than $n$ steps. \end{abstract} % -------------------------------------------------- \section{Review of the method} \setcounter{equation}{0} The notation and analysis here comes largely from \cite[\S 8.7]{SB}. See also \cite{L_blue} for a similar analysis. For the c.g. method in infinite dimensions, see \cite{L_red}. For excellent, ready-to-code algorithms and some additional enlightening analysis, see \cite{GV}. The Conjugate gradient method is a non-linear iterative method for \[ Ax=b, \] where $A$ is an $n\times n$ matrix, $A^T=A>0$, and $b\in {\bf R}^n$ is given. An equivalent formulation of the problem is that of minimizing the function \bear \label{Fdef} F(z) &\eqdef& {1\over2}(b-Az)^TA^{-1}(b-Az) \\ &=& {1\over2}z^TAz - b^Tz + {1\over2}b^TA^{-1}b \label{Fmin} \eear Since $A^T=A>0$, from (\ref{Fmin}) we easily see that $x=\arg\min F(z)$ iff $Ax=b$. \noindent\underline{Algorithm:} \begin{tabbing} Given \= $x_0$, generate $x_k, k=1,2,\dots$ by: \\ \> $r_0 = b - Ax_0$, \\ \> $p_0 = r_0$. \\ \> For \= $k=0,1,2,\dots:$ \\ \> \> If \= $p_0\neq 0$, then: \\ \> \> \> (i) $a_k = (r_k^Tr_k)/(p_k^TAp_k)$ \\ \> \> \> (ii) $x_{k+1} = x_k + a_kp_k$ \\ \> \> \> (iii) $r_{k+1} = r_k - a_kAp_k$ \\ \> \> \> (iv) $b_k = (r_{k+1}^Tr_{k+1})/(r_k^Tr_k)$ \\ \> \> \> (v) $p_{k+1} = r_{k+1} + b_kp_k.$ \\ \> \> End If. \\ \> End While. \\ End. \end{tabbing} The cg method follows the pattern of a wide class of ``descent methods'' for minimization. At the $k^{th}$ step, we (a) compute a ``descent direction'' $p_k$, and then (b) minimize $F(x_k+sp_k)$ over $s>0$ to obtain $x_{k+1}$. One of the main features of the cg method is that the directions $p_k$ turn out to be mutually ``A-orthogonal'', that is, orthogonal in the inner product \[ \langle x,y \rangle_A \eqdef x^TAy. \] Another main feature is this: \begin{thm} If all arithmetic is carried out in infinite precision, then $x_k=x$ for some $k\leq n$. \end{thm} This result certainly has some practical value. However, for very large $n$, it may be infeasible or undesirable to carry out $n$ iterations. The good news is that it is often unnecessary also. Pursuant to this comment, we want to look at two refinements of the analysis. First, we want to study the {\em rate} at which $\Vert x_k-x\Vert_A$ converges to zero. Second, we want to see why, under some circumstances, we can guanantee $x_k=x$ for some $k\leq m$ where $m