Restricted orbital optimization

Often, we are working in a space that doesn’t span the whole FCI Fock space, so it makes sense to look for the best spatial orbitals, i.e. those that give the lowest energy. In previous sections, we have already established that the rotated energy is the expectation value of the old Hamiltonian in the rotated state: \[\begin{equation} \require{physics} E(\boldsymbol{\kappa}) % = \ev{% \exp(\hat{\kappa}) % \hat{\mathcal{H}} % \exp(-\hat{\kappa}) % }{\Psi} % \thinspace , \end{equation}\] in which \(\hat{\kappa}\) is the generator of real singlet rotations: \[\begin{equation} \hat{\kappa} = % \sum_{p>q}^K \kappa_{pq} \hat{E}^-_{pq} % \tag{\ref{eq:generator_real_singlet_rotations}} \thinspace . \end{equation}\] We can then use the BCH expansion, and write the energy as a function of the parameters \(\boldsymbol{\kappa}\): \[\begin{align} E(\boldsymbol{\kappa}) % &= \ev{% \hat{\mathcal{H}}% }{\Psi} % + \ev*{ % \comm*{ \hat{\kappa} }{ \hat{\mathcal{H}} } % }{\Psi} % + \frac{1}{2} \ev*{ % \comm*{ \hat{\kappa} }{ % \comm*{ \hat{\kappa} }{ \hat{\mathcal{H}}} % } % }{\Psi} + \cdots \\ & \approx E(\vb{0}) % + \sum_{p>q}^K \kappa_{pq} % \ev*{ % \comm*{ \hat{E}^-_{pq} }{ \hat{\mathcal{H}} } % }{\Psi} % + \frac{1}{2} \sum_{p>q}^K \sum_{r>s}^K \kappa_{pq} \kappa_{rs} % \ev*{ % \comm*{ \hat{E}^-_{pq} }{ % \comm*{ \hat{E}^-_{rs} }{ \hat{\mathcal{H}} } % } % }{\Psi} \end{align}\] up to second order. Taking derivatives with respect to these free, real parameters \(\kappa_{pq}\) (with \(p>q\)) at \(\boldsymbol{\kappa}=\vb{0}\), we define the orbital gradient \(\vb{E}^{(1)}\) as: \[\begin{equation} \vb{E}^{(1)}_{pq} = \eval{\pdv{E(\boldsymbol{\kappa})}{\kappa_{pq}}}_{\boldsymbol{\kappa}=\vb{0}} \end{equation}\] and the orbital Hessian \(\vb{E}^{(2)}\) as: \[\begin{equation} \vb{E}^{(2)}_{pqrs} = \eval{\pdv{E(\boldsymbol{\kappa})}{\kappa_{pq}}{\kappa_{rs}}}_{\boldsymbol{\kappa}=\vb{0}} \thinspace . \end{equation}\]

Using the Fockians and super-Fockians that were introduced in section \(\ref{sec:fockians_spin_separated}\), we can work out the orbital gradient as: \[\begin{align} \vb{E}^{(1)}_{pq} % &= % \eval{ % \pdv{ % E(\boldsymbol{\kappa}) % }{ \kappa_{pq} } % }_{ \boldsymbol{\kappa} = \vb{0} } \\ &= % \ev{ % \comm*{ \hat{E}^-_{pq} }{ \hat{\mathcal{H}} }% }{\Psi} \\ &= 2 ( % \mathscr{F}_{pq}( \hat{\mathcal{H}} ) % - \mathscr{F}_{qp}( \hat{\mathcal{H}}) ) % \thinspace , \end{align}\] in which the Fockian matrix of the Hamiltonian is called the Fock matrix: \[\begin{equation} \mathscr{F}_{pq}( \hat{\mathcal{H}} ) % = \sum_{r}^K h_{qr} D_{pr} % + \sum_{rst}^K g_{qrst} d_{prst} % \thinspace . \end{equation}\] The orbital Hessian can then be calculated as \[\begin{align} \vb{E}^{(2)}_{pqrs} % = \eval{ % \pdv{ % E(\boldsymbol{\kappa}) % }{ \kappa_{pq} }{ \kappa_{rs}} % }_{ \boldsymbol{\kappa} = \vb{0} } % &= \frac{1}{2} \qty( % \ev{ % \comm*{ \hat{E}^-_{pq} }{ % \comm*{ \hat{E}^-_{rs} }{ \hat{\mathcal{H}}} % } % }{\Psi} % + \ev{ % \comm*{ \hat{E}^-_{rs} }{ % \comm*{ \hat{E}^-_{pq} }{ \hat{\mathcal{H}}} % } % }{\Psi}) \\ & = \frac{1}{2} (1 + P_{pq,rs}) % \ev{ % \comm*{ \hat{E}^-_{pq} }{ % \comm*{ \hat{E}^-_{rs} }{ \hat{\mathcal{H}}} % } % }{\Psi} % \thinspace , \end{align}\] in which \[\begin{equation} P_{pq,rs} = P_{pr} P_{qs} \end{equation}\] permutes the indices \(pr\) with each other and \(rs\) with each other. Working out the commutator, we find: \[\begin{align} \vb{E}^{(2)}_{pqrs} % &= \frac{1}{2} (1 + P_{pq,rs}) (1 - P_{pq}) (1 - P_{rs}) % (1 + P_{pr,qs}) \mathscr{G}_{pqrs}( \hat{\mathcal{H}} ) \\ &= (1 + P_{pq,rs}) (1 - P_{pq}) (1 - P_{rs}) % \mathscr{G}_{pqrs}( \hat{\mathcal{H}} ) \\ &= \mathscr{G}_{pqrs}( \hat{\mathcal{H}} ) % - \mathscr{G}_{pqsr}( \hat{\mathcal{H}} ) % - \mathscr{G}_{qprs}( \hat{\mathcal{H}} ) % + \mathscr{G}_{qpsr}( \hat{\mathcal{H}} ) % + \mathscr{G}_{rspq}( \hat{\mathcal{H}} ) % - \mathscr{G}_{rsqp}( \hat{\mathcal{H}} ) \notag \\ & \qquad - \mathscr{G}_{srpq}( \hat{\mathcal{H}} ) % + \mathscr{G}_{srqp}( \hat{\mathcal{H}} ) % \thinspace , \end{align}\] in which the super-Fock matrix of the Hamiltonian is \[\begin{equation} \mathscr{G}_{pqrs}( \hat{\mathcal{H}} ) % = \delta_{qr} \mathscr{F}_{ps}( \hat{\mathcal{H}} ) % - h_{qr} D_{ps} + Z_{pqrs}(\hat{g}) % \thinspace . \end{equation}\] Furthermore, like Helgaker does in chapter 10.8.6, we can play with the permutation operators to finally obtain a less elaborate expression for the orbital Hessian: \[\begin{equation} \vb{E}^{(2)}_{pqrs} % = (1 - P_{pq}) (1 - P_{rs}) \qty[ % 2 h_{qs} D_{pr} % - \delta_{qs} ( % \mathscr{F}_{pr}( \hat{\mathcal{H}} ) % + \mathscr{F}_{rp}( \hat{\mathcal{H}} ) % ) % + 2 Z_{pqrs} % ] % \thinspace . \end{equation}\]

By looking at the formulas for the electronic gradient and electronic Hessian, we can see that the electronic gradient (in matrix form) is anti-symmetric: \[\begin{equation} \vb{E}^{(1)}_{pq} = - \vb{E}^{(1)}_{qp} % \thinspace , \end{equation}\] and the electronic Hessian (in rank-4 tensor form) is symmetric by an interchange of the pair indices \(pq\) and \(rs\): \[\begin{equation} \vb{E}^{(2)}_{pqrs} = \vb{E}^{(2)}_{rspq} % \thinspace , \end{equation}\] in addition to the following symmetries related to an antisymmetric interchange of \(pq\) or \(rs\): \[\begin{equation} \vb{E}^{(2)}_{pqrs} % = -\vb{E}^{(2)}_{qprs} % = - \vb{E}^{(2)}_{pqsr} % = \vb{E}^{(2)}_{qpsr} % \thinspace . \end{equation}\]

As we can see from the formulas above, this gradient-and-Hessian approach for orbital rotations (and subsequently orbital optimization) requires the 1- and 2-DMs. In this section, the goal was to derive formulas for the electronic gradient and electronic Hessian, in order to be able to use numerical optimization techniques to optimize the orbitals with respect to their resulting electronic energy. Once we find, by for example a Newton algorithm, or steepest descent, … a set of parameters \(\boldsymbol{\kappa}\) that minimize the energy, we rotate the one- and two-electron integrals and subsequently check if the energy has converged. This is a procedure consistent with the fact that we are evaluating the electronic gradient and Hessian at \(\boldsymbol{\kappa} = \vb{0}\). Written concisely, we have the following algorithm:

  1. At iteration \(k\), calculate the 1- and 2-DMs;

1.Calculate the electronic gradient and Hessian at the current orbitals

  1. Find the set of orbital rotation parameters \(\boldsymbol{\kappa}_k\) that minimize the electronic energy, for example by using a Newton-step-based algorithm, yielding \(E_k^\text{min}\);

  2. Use the found orbital rotations parameters \(\boldsymbol{\kappa}_k\) to rotate the one- and two-electron integrals (i.e. make sure in the next iteration, we are starting from a \(\boldsymbol{\kappa} = \vb{0}\) situation again) using the unitary matrix \(\vb{U}_k = \exp(-\boldsymbol{\kappa}_k)\);

  3. Terminate if, for example, the change in energy is smaller than a certain threshold \(\epsilon\): \[\begin{equation} |E_k^\text{min} - E_{k-1}^\text{min}| < \epsilon % \thinspace , \end{equation}\] and accept \(E_k^\text{min}\) as the orbital-optimized energy.

We must be cautious however, when employing the previous formulas for the electronic gradient and Hessian, as the electronic gradient is written as a matrix, and the electronic Hessian as a rank-4 tensor. When we would like to perform for example a Newton step, we should use the gradient in column vector form, and Hessian in matrix form. We can put the elements of the gradient matrix (i.e. indices \(pq\)) into a gradient vector (e.g. indices \(i\)) by doing the following (column-major) transformation: \[\begin{equation} i = p + K q \thinspace , \end{equation}\] in which \(K\) is the number of spatial orbitals. For the Hessian (i.e. tensor indices \(pqrs\) and matrix indices \(ij\)), we have \[\begin{align} & i = p + K q \\ & j = r + K s \thinspace . \end{align}\]

Since working with the algebra of the elementary second quantization operators is just another (albeit a much simpler) way of doing electronic structure calculations, we should be able to find the same formulas doing things the old-fashioned way, as in (Siegbahn et al. 1981). We can write the energy after rotation also as \[\begin{align} & E(\vb{U}) % = \sum_{pq}^K h_{pq}(\vb{U}) D_{pq} % + \frac{1}{2} \sum_{pqrs}^K g_{pqrs}(\vb{U}) d_{pqrs} \\ % & E(\boldsymbol{\kappa}) % = \sum_{pq}^K h_{pq}(\boldsymbol{\kappa}) D_{pq} % + \frac{1}{2} \sum_{pqrs}^K g_{pqrs}(\boldsymbol{\kappa}) d_{pqrs} % \thinspace . \end{align}\] By introducing the electronic gradient and Hessian in terms of the parameters \(\boldsymbol{\kappa}\) as \[\begin{equation} \vb{E}^{(1)}_{pq} % = \eval{ % \pdv{ % E(\boldsymbol{\kappa}) % }{ \kappa_{pq} } % }_{ \boldsymbol{\kappa} = \vb{0} } % = \sum_{mn}^K % \eval{ % \pdv{ E(\vb{U}) }{ U_{mn} } % }_{\vb{U} = \vb{I}} % \qty[ % \eval{ % \pdv{ \vb{U} }{ \kappa_{pq}} % }_{ \boldsymbol{\kappa} = \vb{0} } % ]_{mn} \end{equation}\] and \[\begin{equation} \label{eq:hessian_kappa_via_U} \begin{split} \vb{E}^{(2)}_{pqrs} % = \eval{ % \pdv{ % E(\boldsymbol{\kappa}) % }{ \kappa_{pq} }{ \kappa_{rs} } % }_{ \boldsymbol{\kappa} = \vb{0} } % = &\sum_{mn}^K % \eval{ % \pdv{ E(\vb{U}) }{ U_{mn}} % }_{ \vb{U} = \vb{I} } % \qty[ % \eval{ % \pdv{ \vb{U} }{ \kappa_{pq} }{ \kappa_{rs}} % }_{ \boldsymbol{\kappa} = \vb{0} } % ]_{mn} \\ &+ \sum_{klmn}^K % \eval{ % \pdv{ E(\vb{U}) } {U_{kl} }{ U_{mn}} % }_{ \vb{U} = \vb{I} } % \qty[ % \eval{ % \pdv{ \vb{U} }{ \kappa_{pq} } % }_{ \boldsymbol{\kappa} = \vb{0} } % ]_{kl} % \qty[ % \eval{ % \pdv{ \vb{U} }{ \kappa_{rs} } % }_{ \boldsymbol{\kappa} = \vb{0} } % ]_{mn} \thinspace , \end{split} \end{equation}\] respectively (in which we still require \(p>q\) and \(r>s\)), we can write the energy up to second order as \[\begin{equation} E(\boldsymbol{\kappa}) % = E(\vb{0}) % + \boldsymbol{\kappa}^{\text{T}} \thinspace \vb{E}^{(1)} % + \frac{1}{2} % \boldsymbol{\kappa}^{\text{T}} % \thinspace \vb{E}^{(2)} % \boldsymbol{\kappa} % \thinspace . \end{equation}\] For the first derivatives, we can then calculate \[\begin{equation} \begin{split} \pdv{ E(\vb{U}) }{ U_{mn} } = % & 2 \sum_{pq}^K U_{qp} h_{mq} D_{np} \\ & + \frac{1}{2} % \sum_{ pqr,stu } % }^K U_{sp} U_{tq} U_{ur} \big[ % g_{mstu} (d_{npqr} + d_{nprq}) % + g_{must} (d_{pqnr} + d_{pqrn}) % \big] % \thinspace , \end{split} \end{equation}\] which leads to, when evaluating at \(\vb{U} = \vb{I}\): \[\begin{align} \pdv{ E(\vb{U}) }{ U_{mn} } % \eval_{ \vb{U} = \vb{I} } % &= 2 \sum_p^K h_{mp} D_{np} % + 2 \sum_{pqr}^K g_{mpqr} d_{npqr} \\ &= 2 \thinspace \mathscr{F}_{nm}( \hat{\mathcal{H}} ) % \thinspace . \end{align}\] As a side note, we are finding exactly the the transposed result as in (Siegbahn et al. 1981), but that is because they define the orbital transformation by columns and not (see their paper, under equation (2)), like we do, by rows (cfr. equation \(\eqref{eq:spinor_transformation_matrix_expression}\)). The gradient (expressed with respect to the anti-Hermitian parameters \(\boldsymbol{\kappa}\)) then becomes \[\begin{equation} \vb{E}^{(1)}_{pq} % = 2 ( % \mathscr{F}_{pq}( \hat{\mathcal{H}} ) % - \mathscr{F}_{qp}( \hat{\mathcal{H}} ) % ) % \thinspace . \end{equation}\] For the second derivatives, we have \[\begin{equation} \begin{split} \pdv{ E(\vb{U}) }{ U_{kl} }{ U_{mn} } % = 2 h_{mk} D_{nl} % + \sum_{pqrs}^K U_{rp} U_{sq} \Big[ % & g_{mkrs} (d_{nlpq} + d_{nlqp}) % + g_{mskr} (d_{lpnq} + d_{lpqn}) \\ &+ g_{mrks}(d_{nplq} + d_{npql}) % \Big] % \thinspace , \end{split} \end{equation}\] which for \(\vb{U} = \vb{I}\) leads to \[\begin{equation} \pdv{ E(\vb{U}) }{ U_{kl} }{ U_{mn} } % \eval_{ \vb{U} = \vb{I} } % = 2 h_{mk} D_{nl} % + 2 \sum_{pq}^K \qty[ % g_{mkpq} d_{nlpq} % + g_{mpkq} (d_{nplq} + d_{npql}) % ] \thinspace . \end{equation}\] Using equation \(\eqref{eq:hessian_kappa_via_U}\), we subsequently find for the Hessian \[\begin{equation} \begin{split} \vb{E}^{(2)}_{pqrs} % = & \delta_{qr} ( % \mathscr{F}_{sp}( \hat{\mathcal{H}} ) % + \mathscr{F}_{ps}( \hat{\mathcal{H}} ) % ) % - \delta_{pr} ( % \mathscr{F}_{sq}( \hat{\mathcal{H}} ) + \mathscr{F}_{qs}( \hat{\mathcal{H}} ) % ) \\ & - \delta_{qs} ( % \mathscr{F}_{rp}( \hat{\mathcal{H}} ) % + \mathscr{F}_{pr}( \hat{\mathcal{H}} ) % ) % + \delta_{ps} ( % \mathscr{F}_{rq}( \hat{\mathcal{H}} ) % + \mathscr{F}_{qr}( \hat{\mathcal{H}} ) % ) \\ &+ 2 h_{sq} D_{rp} % + 2 \sum_{tu}^K ( % g_{sqtu} d_{rptu} + g_{stqu} (d_{rtpu} + d_{rtup}) % ) \\ &- 2 h_{sp} D_{rq} % - 2 \sum_{tu}^K ( % g_{sptu} d_{rqtu} + g_{stpu} (d_{rtqu} + d_{rtuq}) % ) \\ &- 2 h_{rq} D_{sp} % - 2 \sum_{tu}^K ( % g_{rqtu} d_{sptu} + g_{rtqu} (d_{stpu} + d_{stup}) % ) \\ &+ 2 h_{rp} D_{sq} % + 2 \sum_{tu}^K ( % g_{rptu} d_{sqtu} + g_{rtpu} (d_{stqu} + d_{stuq}) % ) \thinspace , \end{split} \end{equation}\] which we confirm to be equal to the expression that we derived by using the Fockians and super-Fockians.

References

Siegbahn, Per E. M., Jan Almlöf, Anders Heiberg, and Björn O. Roos. 1981. The complete active space SCF (CASSCF) method in a Newton-Raphson formulation with application to the HNO molecule.” The Journal of Chemical Physics 74 (4): 2384–96. https://doi.org/10.1063/1.441359.