Minimizing weighted overlap orthonormalization
Formulation of the problem
Suppose we have a set of \(K\) linearly independent vectors \(\require{physics} \set{\ket{i} ; i=1 \cdots K}\), belonging to some \(N\)-dimensional vector space \(V\), with \(K \leq N\). We will collect their overlaps (inner products) in the overlap matrix \(\vb{S}\): \[\begin{equation} S_{ij} = \braket{i}{j} \thinspace . \end{equation}\]
The problem is to find a new set of \(K\) vectors \(\set{\ket{\varphi_i} ; i=1 \cdots K}\) that is orthonormal and minimizes \[\begin{equation} \label{eq:to_minimize} \sum_i^K w_i \braket{\Delta_i} \thinspace , \end{equation}\] where \(\set{w_i}\) is a set of weights and \[\begin{equation} \label{eq:Delta_vector} \ket{\Delta_i} = \ket{\varphi_i} - \ket{i} \thinspace . \end{equation}\]
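Before solving the problem, it may help to make the setup concrete. The following minimal sketch (in Python, assuming NumPy; the dimensions, the array `V` and the helper `objective` are illustrative choices, not part of the derivation) sets up \(K\) linearly independent vectors, their overlap matrix \(\vb{S}\), and the weighted objective:

```python
import numpy as np

# K linearly independent vectors |i>, stored as the columns of an
# (N x K) array V with respect to some orthonormal basis of V.
rng = np.random.default_rng(42)
N, K = 6, 3
V = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))

# Overlap matrix S_ij = <i|j>.
S = V.conj().T @ V

# Linear independence is equivalent to S being (Hermitian) positive definite.
assert np.all(np.linalg.eigvalsh(S) > 0)

def objective(Phi, V, w):
    """Weighted objective sum_i w_i <Delta_i|Delta_i>, with |Delta_i> = |phi_i> - |i>
    and the candidate vectors |phi_i> stored as the columns of Phi."""
    Delta = Phi - V
    return np.sum(w * np.linalg.norm(Delta, axis=0) ** 2)
```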
Solving the problem
In the literature, we can find a generalization of this problem (Carlson and Keller 1957). Instead of minimizing equation \(\eqref{eq:to_minimize}\), we can immediately turn to the more general problem of minimizing \[\begin{equation} \label{eq:to_minimize_Carlson} \sum_{ij}^K W_{ji} \braket{\Delta_i}{\Delta_j} \thinspace, \end{equation}\] where \(\vb{W}\) is an invertible, Hermitian \((K \times K)\)-matrix such that \((\vb{W} \vb{S} \vb{W})^{1/2}\) is positive definite.
Since \(V\) is an \(N\)-dimensional vector space, it admits an \(N\)-dimensional orthonormal basis, which we will denote by \[\begin{equation} \set{\ket{\varphi_\alpha} ; \alpha=1 \cdots N} \thinspace , \end{equation}\] and whose first \(K\) members we choose to be the sought orthonormal vectors \(\ket{\varphi_i}\). The label \(\alpha\) therefore runs through and past \(K\): \[\begin{equation} \alpha = 1, \dots, K, K+1, \dots, N \thinspace . \end{equation}\]
We can now calculate the inner products of every \(\ket{i}\) and \(\ket{\varphi_i}\) with the members of this basis. Since the basis is orthonormal, we immediately find that \[\begin{equation} \braket{\varphi_\alpha}{\varphi_i} = \delta_{\alpha i} \thinspace . \end{equation}\] Let us gather these matrix elements in an \((N \times K)\)-matrix \(\vb{E}\), in which the first \(K\) rows and columns form the identity matrix of dimension \(K\) and in which the remaining \((N-K)\) rows are zero: \[\begin{equation} \label{eq:E_elements} E_{\alpha i} = \begin{cases} \braket{\varphi_\alpha}{\varphi_i} = \delta_{\alpha i} &\qif \alpha \leq K \\ 0 &\qif \alpha > K \end{cases} \thinspace , \end{equation}\] from which it follows that \[\begin{equation} \vb{E}^\dagger \vb{E} = \vb{I} \thinspace . \end{equation}\] Proceeding with the inner products of every \(\ket{i}\) with respect to the orthonormal basis, we construct the matrix \(\vb{F}\) with elements \[\begin{equation} \label{eq:F_elements} F_{\alpha i} = \braket{\varphi_\alpha}{i} \thinspace , \end{equation}\] from which it follows that \[\begin{equation} \vb{F}^\dagger \vb{F} = \vb{S} \thinspace . \end{equation}\]
Turning our attention back to equation \(\eqref{eq:to_minimize_Carlson}\), we can immediately write \[\begin{align} \sum_{ij}^K W_{ji} \braket{\Delta_i}{\Delta_j} &= \sum_{ij}^K W_{ji} \qty( \sum_\alpha^N \braket{\varphi_\alpha}{\Delta_i}^* \braket{\varphi_\alpha}{\Delta_j} ) \\ &= \sum_{ij}^K W_{ji} \qty( \sum_\alpha^N (E_{\alpha i} - F_{\alpha i})^* (E_{\alpha j} - F_{\alpha j}) ) \\ &= \sum_{ij}^K W_{ji} \qty( (\vb{E} - \vb{F})^\dagger (\vb{E} - \vb{F}) )_{ij} \end{align}\] by inserting a resolution of the identity and subsequently recognizing equations \(\eqref{eq:E_elements}\), \(\eqref{eq:F_elements}\) and \(\eqref{eq:Delta_vector}\). We can work this out a little more, leading to \[\begin{align} \sum_{ij}^K W_{ji} \braket{\Delta_i}{\Delta_j} &= \tr(\vb{W} (\vb{E} - \vb{F})^\dagger (\vb{E} - \vb{F}) ) \\ &= \tr(\vb{W} (\vb{E}^\dagger \vb{E} + \vb{F}^\dagger \vb{F} - \vb{E}^\dagger \vb{F} - \vb{F}^\dagger \vb{E})) \\ &= \tr(\vb{W} (\vb{I} + \vb{S})) - \tr(\vb{W} \vb{E}^\dagger \vb{F}) - \tr(\vb{W} \vb{F}^\dagger \vb{E}) \thinspace . \end{align}\]
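This identity holds for any orthonormal set \(\set{\ket{\varphi_i}}\), not only for the minimizer, so it can be spot-checked numerically at an arbitrary point. Continuing the sketch above (the QR-based basis completion and the matrix `W0` are illustrative choices):

```python
# Take an arbitrary orthonormal set Phi0 spanning span(V) (via QR), and
# complete it to a full orthonormal basis B of the N-dimensional space.
B, _ = np.linalg.qr(np.hstack([V, rng.standard_normal((N, N - K))]))
Phi0 = B[:, :K]

E = B.conj().T @ Phi0            # E_{alpha i} = <phi_alpha|phi_i>, i.e. [I; 0]
F = B.conj().T @ V               # F_{alpha i} = <phi_alpha|i>
W0 = np.diag([3.0, 2.0, 1.0])    # any Hermitian W

lhs = np.trace(W0 @ (Phi0 - V).conj().T @ (Phi0 - V))
rhs = (np.trace(W0 @ (np.eye(K) + S))
       - np.trace(W0 @ E.conj().T @ F)
       - np.trace(W0 @ F.conj().T @ E))
assert np.isclose(lhs, rhs)
```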
We will now perform a small leap of faith by constructing an auxiliary matrix \(\vb{G}\) as \[\begin{equation} \vb{G} = \vb{F} \vb{W} \vb{R}^{-1} - \vb{E} \vb{R} \thinspace , \end{equation}\] in which the Hermitian, positive-definite matrix \(\vb{R}\) is the square root of \((\vb{W} \vb{S} \vb{W})^{1/2}\), i.e. \[\begin{equation} \vb{R}^2 = (\vb{W} \vb{S} \vb{W})^{1/2} \thinspace . \end{equation}\] We can now work out the following trace: \[\begin{align} \tr(\vb{G}^\dagger \vb{G}) &= \tr(\vb{R}^{-1} \vb{W} \vb{F}^\dagger \vb{F} \vb{W} \vb{R}^{-1} + \vb{R} \vb{E}^\dagger \vb{E} \vb{R}) \nonumber \\ &\phantom{=} - \tr( \vb{R}^{-1} \vb{W} \vb{F}^\dagger \vb{E} \vb{R} + \vb{R} \vb{E}^\dagger \vb{F} \vb{W} \vb{R}^{-1} ) \\ &= 2 \tr( (\vb{W} \vb{S} \vb{W})^{1/2} ) - \tr(\vb{W} \vb{F}^\dagger \vb{E}) - \tr(\vb{E}^\dagger \vb{F} \vb{W}) \\ &= 2 \tr( (\vb{W} \vb{S} \vb{W})^{1/2} ) - \tr(\vb{W} \vb{F}^\dagger \vb{E}) - \tr(\vb{W} \vb{E}^\dagger \vb{F}) \thinspace , \end{align}\] from which it follows that \[\begin{equation} \sum_{ij}^K W_{ji} \braket{\Delta_i}{\Delta_j} = \tr(\vb{W} (\vb{I} + \vb{S})) + \tr(\vb{G}^\dagger \vb{G}) - 2 \tr( (\vb{W} \vb{S} \vb{W})^{1/2} ) \thinspace . \end{equation}\] Since \(\vb{G}^\dagger \vb{G}\) is always positive semi-definite, this expression reaches its minimum when \[\begin{equation} \tr(\vb{G}^\dagger \vb{G}) = 0 \thinspace . \end{equation}\] We can therefore choose \(\vb{G} = \vb{0}\), leading to \[\begin{equation} \vb{F} = \vb{E} (\vb{W} \vb{S} \vb{W})^{1/2} \vb{W}^{-1} \thinspace . \end{equation}\] Therefore, \(\vb{F}\) is an \((N \times K)\)-matrix in which the first \(K\) rows and columns form the matrix \((\vb{W} \vb{S} \vb{W})^{1/2} \vb{W}^{-1}\), and the remaining rows are zero: \[\begin{equation} F_{\alpha i} = \begin{cases} ((\vb{W} \vb{S} \vb{W})^{1/2} \vb{W}^{-1})_{\alpha i} &\qif \alpha \leq K \\ 0 &\qif \alpha > K \end{cases} \thinspace . \end{equation}\] This means that \[\begin{align} \ket{i} &= \sum_\alpha^N F_{\alpha i} \ket{\varphi_\alpha} \\ &= \sum_{\alpha = 1}^K F_{\alpha i} \ket{\varphi_\alpha} + \sum_{\alpha = K+1}^N F_{\alpha i} \ket{\varphi_\alpha} \\ &= \sum_j^K ((\vb{W} \vb{S} \vb{W})^{1/2} \vb{W}^{-1})_{ji} \ket{\varphi_j} \thinspace , \end{align}\] and inverting this relation yields the sought set of orthonormalized vectors in this \(K\)-dimensional subspace: \[\begin{equation} \label{eq:orthonormalized_vectors} \ket{\varphi_i} = \sum_j^K (\vb{W} (\vb{W} \vb{S} \vb{W})^{-1/2})_{ji} \ket{j} \thinspace . \end{equation}\]
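The closed-form solution is easy to verify numerically. Continuing the sketch above (and assuming SciPy for the matrix powers; a positive-definite \(\vb{W}\) is chosen for concreteness, which guarantees that \((\vb{W} \vb{S} \vb{W})^{1/2}\) is positive definite):

```python
from scipy.linalg import fractional_matrix_power, sqrtm

# An arbitrary Hermitian, positive-definite (hence invertible) W.
A = rng.standard_normal((K, K))
W = A @ A.T + K * np.eye(K)

# |phi_i> = sum_j (W (W S W)^{-1/2})_{ji} |j>, i.e. Phi = V @ T.
T = W @ fractional_matrix_power(W @ S @ W, -0.5)
Phi = V @ T

# The new vectors are orthonormal ...
assert np.allclose(Phi.conj().T @ Phi, np.eye(K))

# ... and the objective attains the minimum tr(W (I + S)) - 2 tr((W S W)^{1/2}).
D = (Phi - V).conj().T @ (Phi - V)
minimum = np.trace(W @ (np.eye(K) + S)) - 2 * np.trace(sqrtm(W @ S @ W))
assert np.isclose(np.trace(W @ D), minimum)
```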
Let us now return to our original problem, as is done in (Maksic 1981) in a little more detail than in (Carlson and Keller 1957). Letting \(\vb{W}\) be a diagonal matrix, i.e. \[\begin{equation} W_{ij} = \delta_{ij} w_i \thinspace , \end{equation}\] reduces equation \(\eqref{eq:to_minimize_Carlson}\) to equation \(\eqref{eq:to_minimize}\), and we can still use equation \(\eqref{eq:orthonormalized_vectors}\) to calculate the orthonormalized vectors.
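Specializing the sketch above to diagonal \(\vb{W}\) makes the role of the weights visible: the more heavily a vector is weighted, the less it tends to be displaced by the orthonormalization. (The particular weights below are arbitrary.)

```python
# Diagonal W: one weight per vector.
w = np.array([10.0, 1.0, 0.1])
W = np.diag(w)
Phi = V @ W @ fractional_matrix_power(W @ S @ W, -0.5)

assert np.allclose(Phi.conj().T @ Phi, np.eye(K))
print(np.linalg.norm(Phi - V, axis=0) ** 2)  # displacements <Delta_i|Delta_i>
print(objective(Phi, V, w))                  # the minimized weighted objective
```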
One important note is that, since \(\vb{W}\) must always be invertible, the diagonal elements must all be different from zero. The weights \(w_i\) thus describe the importance of each vector \(\ket{i}\): the larger the weight, the more the minimization penalizes displacing that vector. A vector that should not contribute to the minimization problem cannot simply be given a weight of zero; it has to be excluded from this type of orthogonalization altogether, after which it can be orthonormalized against the result using, for example, a Gram-Schmidt approach.
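A sketch of that two-step procedure, continuing the snippets above (the index `k` of the excluded vector is an arbitrary choice): the remaining vectors are orthonormalized with their weights, after which the excluded vector is projected out of their span and normalized.

```python
# Exclude vector k from the weighted orthonormalization ...
k = 2
keep = [i for i in range(K) if i != k]
Wk = np.diag(w[keep])
Sk = V[:, keep].conj().T @ V[:, keep]
Phi_kept = V[:, keep] @ Wk @ fractional_matrix_power(Wk @ Sk @ Wk, -0.5)

# ... then orthonormalize it against the rest with one Gram-Schmidt step.
v = V[:, k] - Phi_kept @ (Phi_kept.conj().T @ V[:, k])
v /= np.linalg.norm(v)
Phi = np.column_stack([Phi_kept, v])
assert np.allclose(Phi.conj().T @ Phi, np.eye(K))
```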
A generalization
We can try to generalize this result. Let us take the quantity \[\begin{equation} \tr(\vb{W}^n \vb{D}) = \sum_{ij}^K \qty(W^n)_{ji} \braket{\Delta_i}{\Delta_j} \thinspace , \end{equation}\] where \(n \in \mathbb{R}\), \(\vb{W}\) is still an invertible, Hermitian \((K \times K)\)-matrix, \(\vb{D}\) is the \((K \times K)\)-matrix with elements \(D_{ij} = \braket{\Delta_i}{\Delta_j}\), and \(\ket{\Delta_i}\) was already defined in equation \(\eqref{eq:Delta_vector}\). By taking the auxiliary matrix \(\vb{G}\) as \[\begin{equation} \vb{G} = \vb{F} \vb{W}^n \vb{R}^{-1} - \vb{E} \vb{R} \thinspace , \end{equation}\] in which \[\begin{equation} \vb{R}^2 = (\vb{W}^n \vb{S} \vb{W}^n)^{1/2} \thinspace , \end{equation}\] we can prove, completely analogously, that the set of orthonormal vectors \[\begin{equation} \ket{\varphi_i} = \sum_j^K (\vb{W}^n (\vb{W}^n \vb{S} \vb{W}^n)^{-1/2})_{ji} \ket{j} \end{equation}\] minimizes the quantity \(\tr(\vb{W}^n \vb{D})\). If \(\vb{W}\) is diagonal, then \[\begin{equation} \tr(\vb{W}^n \vb{D}) = \sum_i^K w_i^n \braket{\Delta_i} \thinspace , \end{equation}\] and we end up with a generalization of the weights in equation \(\eqref{eq:to_minimize}\).
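Numerically, nothing changes except that \(\vb{W}\) is replaced by \(\vb{W}^n\); a one-line extension of the sketches above (with \(n = 2\) as an arbitrary illustration):

```python
# The generalized transformation with W^n.
n = 2
Wn = fractional_matrix_power(W, n)
Phi_n = V @ Wn @ fractional_matrix_power(Wn @ S @ Wn, -0.5)
assert np.allclose(Phi_n.conj().T @ Phi_n, np.eye(K))
```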