Floating Point Operations in Matrix-Vector Calculus
(Version 1.3)


Raphael Hunger


Technical Report
2007


Technische Universität München
Associate Institute for Signal Processing
Univ.-Prof. Dr.-Ing. Wolfgang Utschick


History


Version 1.00: October 2005
    - Initial version
Version 1.01: 2006
    - Rewrite of sesquilinear form with a reduced amount of FLOPs
    - Several typos fixed concerning the number of FLOPs required for the Cholesky decomposition
Version 1.2: November 2006
    - Conditions for the existence of the standard $LL^H$ Cholesky decomposition specified (positive definiteness)
    - Outer product version of $LL^H$ Cholesky decomposition removed
    - FLOPs required in Gaxpy version of $LL^H$ Cholesky decomposition updated
    - $L_1 D L_1^H$ Cholesky decomposition added
    - Matrix-matrix product $LC$ added with $L$ triangular
    - Matrix-matrix product $L^{-1}C$ added with $L$ triangular and $L^{-1}$ not known a priori
    - Inverse $L_1^{-1}$ of a lower triangular matrix with ones on the main diagonal added
Version 1.3: September 2007
    - First globally accessible document version
ToDo: (unknown when)
    - QR-Decomposition
    - LR-Decomposition

Please report any bug and suggestion to hunger@tum.de


Contents


1. Introduction

2. Flop Counting
   2.1 Matrix Products
       2.1.1 Scalar-Vector Multiplication $\alpha a$
       2.1.2 Scalar-Matrix Multiplication $\alpha A$
       2.1.3 Inner Product $a^H b$ of Two Vectors
       2.1.4 Outer Product $a c^H$ of Two Vectors
       2.1.5 Matrix-Vector Product $Ab$
       2.1.6 Matrix-Matrix Product $AC$
       2.1.7 Matrix Diagonal Matrix Product $AD$
       2.1.8 Matrix-Matrix Product $LD$
       2.1.9 Matrix-Matrix Product $L_1 D$
       2.1.10 Matrix-Matrix Product $LC$ with $L$ Lower Triangular
       2.1.11 Gram $A^H A$ of $A$
       2.1.12 Squared Frobenius Norm $\|A\|_F^2 = \mathrm{tr}(A^H A)$
       2.1.13 Sesquilinear Form $c^H A b$
       2.1.14 Hermitian Form $a^H R a$
       2.1.15 Gram $L^H L$ of a Lower Triangular Matrix $L$
   2.2 Decompositions
       2.2.1 Cholesky Decomposition $R = LL^H$ (Gaxpy Version)
       2.2.2 Cholesky Decomposition $R = L_1 D L_1^H$
   2.3 Inverses of Matrices
       2.3.1 Inverse $L^{-1}$ of a Lower Triangular Matrix $L$
       2.3.2 Inverse $L_1^{-1}$ of a Lower Triangular Matrix $L_1$ with Ones on the Main Diagonal
       2.3.3 Inverse $R^{-1}$ of a Positive Definite Matrix $R$
   2.4 Solving Systems of Equations
       2.4.1 Product $L^{-1}C$ with $L^{-1}$ not known a priori

3. Overview

   Appendix

   Bibliography


1. Introduction


For the design of efficient and low-complexity algorithms in many signal-processing tasks, a detailed analysis of the required number of floating-point operations (FLOPs) is often inevitable. Most frequently, matrix operations are involved, such as matrix-matrix products and inverses of matrices. Structures like Hermiteness or triangularity, for example, can be exploited to reduce the number of needed FLOPs and will be discussed here. In this technical report, we derive expressions for the number of multiplications and summations that a majority of signal processing algorithms in mobile communications bring with them.


Acknowledgments:
The author would like to thank Dipl.-Ing. David A. Schmidt and Dipl.-Ing. Guido Dietl for the fruitful discussions on this topic.


2. Flop Counting


In this chapter, we offer expressions for the number of complex multiplications and summations required for several matrix-vector operations. A floating-point operation (FLOP) is assumed to be either a complex multiplication or a complex summation here, despite the fact that a complex multiplication requires 4 real multiplications and 2 real summations whereas a complex summation consists of only 2 real summations, making a multiplication more expensive than a summation. However, we count each operation as one FLOP.
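The real-arithmetic cost quoted above can be made concrete with a minimal sketch. The function names `complex_multiply` and `complex_add` are illustrative only and do not appear in the report; complex numbers are passed as separate real and imaginary parts so that every real operation is visible.

```python
def complex_multiply(x_re, x_im, y_re, y_im):
    """Return (x_re + j*x_im) * (y_re + j*y_im) as a (real, imag) pair."""
    re = x_re * y_re - x_im * y_im  # 2 real multiplications, 1 real summation
    im = x_re * y_im + x_im * y_re  # 2 real multiplications, 1 real summation
    return re, im


def complex_add(x_re, x_im, y_re, y_im):
    """Return (x_re + j*x_im) + (y_re + j*y_im) as a (real, imag) pair."""
    return x_re + y_re, x_im + y_im  # 2 real summations
```

In the remainder of the report, each of these two routines would be counted as exactly one FLOP.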
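Throughout this report, we assume $\alpha \in \mathbb{C}$ to be a scalar, the vectors $a \in \mathbb{C}^N$, $b \in \mathbb{C}^N$, and $c \in \mathbb{C}^M$ to have dimension $N$, $N$, and $M$, respectively. The matrices $A \in \mathbb{C}^{M \times N}$, $B \in \mathbb{C}^{N \times N}$, and $C \in \mathbb{C}^{N \times L}$ are assumed to have no special structure, whereas $R = R^H \in \mathbb{C}^{N \times N}$ is Hermitian and $D = \mathrm{diag}\{d_\ell\}_{\ell=1}^{N} \in \mathbb{C}^{N \times N}$ is diagonal. $L$ is a lower triangular $N \times N$ matrix, $e_n$ denotes the unit vector with a 1 in the $n$-th row and zeros elsewhere. Its dimensionality is chosen such that the respective matrix-vector product exists. Finally, $[A]_{a,b}$ denotes the element in the $a$-th row and $b$-th column of $A$, $[A]_{a:b,c:d}$ selects the submatrix of $A$ consisting of rows $a$ to $b$ and columns $c$ to $d$. $0_{a \times b}$ is the $a \times b$ zero matrix. Transposition, Hermitian transposition, conjugate, and real-part operator are denoted by $(\cdot)^T$, $(\cdot)^H$, $(\cdot)^*$, and $\Re\{\cdot\}$, respectively, and require no FLOP.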


2.1 Matrix Products

Frequently arising matrix products and the amount of FLOPs required for their computation will be discussed in this section.


2.1.1 Scalar-Vector Multiplication $\alpha a$
A simple multiplication $\alpha a$ of a vector $a$ with a scalar $\alpha$ requires $N$ multiplications and no summation.


2.1.2 Scalar-Matrix Multiplication $\alpha A$
Extending the result from Subsection 2.1.1 to a scalar-matrix multiplication $\alpha A$ requires $NM$ multiplications and again no summation.


2.1.3 Inner Product $a^H b$ of Two Vectors
An inner product $a^H b$ requires $N$ multiplications and $N-1$ summations, i.e., $2N-1$ FLOPs.


2.1.4 Outer Product $a c^H$ of Two Vectors
An outer product $a c^H$ requires $NM$ multiplications and no summation.


2.1.5 Matrix-Vector Product $Ab$
Computing $Ab$ corresponds to applying the inner product rule $a_i^H b$ from Subsection 2.1.3 $M$ times. Obviously, $1 \le i \le M$ and $a_i^H$ represents the $i$-th row of $A$. Hence, its computation costs $MN$ multiplications and $M(N-1)$ summations, i.e., $2MN - M$ FLOPs.

2.1.6 Matrix-Matrix Product $AC$
Repeated application of the matrix-vector rule $A c_i$ from Subsection 2.1.5 with $c_i$ being the $i$-th column of $C$ yields the overall matrix-matrix product $AC$. Since $1 \le i \le L$, the matrix-matrix product has the $L$-fold complexity of the matrix-vector product. Thus, it needs $MNL$ multiplications and $ML(N-1)$ summations, altogether $2MNL - ML$ FLOPs.

2.1.7 Matrix Diagonal Matrix Product $AD$
If the right hand side matrix $D$ of the matrix product $AD$ is diagonal, the computational load reduces to $M$ multiplications for each of the $N$ columns of $A$, since the $n$-th column of $A$ is scaled by the $n$-th main diagonal element of $D$. Thus, $MN$ multiplications in total are required for the computation of $AD$, no summations are needed.

2.1.8 Matrix-Matrix Product $LD$
When multiplying a lower triangular matrix $L$ by a diagonal matrix $D$, column $n$ of the matrix product requires $N-n+1$ multiplications and no summations. With $n = 1, \ldots, N$, we get $\tfrac{1}{2}N^2 + \tfrac{1}{2}N$ multiplications.

2.1.9 Matrix-Matrix Product $L_1 D$
When multiplying a lower triangular matrix $L_1$ with ones on the main diagonal by a diagonal matrix $D$, column $n$ of the matrix product requires $N-n$ multiplications and no summations. With $n = 1, \ldots, N$, we get $\tfrac{1}{2}N^2 - \tfrac{1}{2}N$ multiplications.

2.1.10 Matrix-Matrix Product $LC$ with $L$ Lower Triangular
Computing the product of a lower triangular matrix $L \in \mathbb{C}^{N \times N}$ and $C \in \mathbb{C}^{N \times L}$ is done column-wise. The $n$-th element in each column of $LC$ requires $n$ multiplications and $n-1$ summations, so the complete column needs $\sum_{n=1}^{N} n = \tfrac{N^2}{2} + \tfrac{N}{2}$ multiplications and $\sum_{n=1}^{N} (n-1) = \tfrac{N^2}{2} - \tfrac{N}{2}$ summations. The complete matrix-matrix product is obtained from computing $L$ columns. We have $\tfrac{N^2 L}{2} + \tfrac{NL}{2}$ multiplications and $\tfrac{N^2 L}{2} - \tfrac{NL}{2}$ summations, yielding a total amount of $N^2 L$ FLOPs.
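The saving from the triangular structure can be illustrated by skipping the known zeros of $L$. The following is a sketch under the assumption of nested-list storage; `lower_times_matrix` is a hypothetical name chosen for this example.

```python
def lower_times_matrix(Lm, C):
    """Compute P = Lm C for lower triangular Lm, touching only its nonzero part."""
    N, cols = len(Lm), len(C[0])
    mults = adds = 0
    P = [[0] * cols for _ in range(N)]
    for j in range(cols):
        for n in range(N):
            # Element n (0-based) of column j uses entries 0..n of row n of Lm:
            # n + 1 multiplications and n summations.
            acc = Lm[n][0] * C[0][j]
            mults += 1
            for k in range(1, n + 1):
                acc += Lm[n][k] * C[k][j]
                mults += 1
                adds += 1
            P[n][j] = acc
    return P, mults, adds
```

With $N = 3$ and two columns in $C$, the routine should report 12 multiplications and 6 summations, which adds up to $N^2 L = 18$ FLOPs.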

2.1.11 Gram $A^H A$ of $A$
In contrast to the general matrix product from Subsection 2.1.6, we can make use of the Hermitian structure of the product $A^H A \in \mathbb{C}^{N \times N}$. Hence, the strictly lower triangular part of $A^H A$ need not be computed, since it corresponds to the Hermitian of the strictly upper triangular part. For this reason, we have to compute only the $N$ main diagonal entries of $A^H A$ and the $\tfrac{N^2 - N}{2}$ upper off-diagonal elements, so only $\tfrac{N^2 + N}{2}$ different entries have to be evaluated. Each element requires an inner product step from Subsection 2.1.3 costing $M$ multiplications and $M-1$ summations. Therefore, $\tfrac{1}{2}MN(N+1)$ multiplications and $\tfrac{1}{2}(M-1)N(N+1)$ summations are needed, making up a total amount of $MN^2 + MN - \tfrac{N^2}{2} - \tfrac{N}{2}$ FLOPs.
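A sketch of this idea in plain Python is given below: only the upper triangle is computed, and the lower triangle is filled in by conjugation, which costs no FLOP. The name `gram` and the nested-list layout are assumptions made for this example.

```python
def gram(A):
    """Compute G = A^H A, evaluating only the N(N+1)/2 upper-triangular entries."""
    M, N = len(A), len(A[0])
    mults = adds = 0
    G = [[0] * N for _ in range(N)]
    for a in range(N):
        for b in range(a, N):
            # Inner product of columns a and b: M multiplications, M - 1 summations.
            acc = A[0][a].conjugate() * A[0][b]
            mults += 1
            for m in range(1, M):
                acc += A[m][a].conjugate() * A[m][b]
                mults += 1
                adds += 1
            G[a][b] = acc
            if a != b:
                G[b][a] = acc.conjugate()  # Hermitian symmetry, no FLOP counted
    return G, mults, adds
```

For a $2 \times 2$ matrix $A$, the counters should give 6 multiplications and 3 summations, matching $MN^2 + MN - \tfrac{N^2}{2} - \tfrac{N}{2} = 9$ FLOPs.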

2.1.12 Squared Frobenius Norm $\|A\|_F^2 = \mathrm{tr}(A^H A)$
The squared Hilbert-Schmidt norm $\|A\|_F^2$ follows from summing up the $MN$ squared entries from $A$. We therefore have $MN$ multiplications and $MN-1$ summations, yielding a total of $2MN-1$ FLOPs.

2.1.13 Sesquilinear Form $c^H A b$
The sesquilinear form $c^H A b$ should be evaluated by computing the matrix-vector product $Ab$ in a first step and then multiplying with the row vector $c^H$ from the left hand side. The matrix-vector product requires $MN$ multiplications and $M(N-1)$ summations, whereas the inner product needs $M$ multiplications and $M-1$ summations. Altogether, $M(N+1)$ multiplications and $MN-1$ summations have to be computed for the sesquilinear form $c^H A b$, yielding a total number of $2MN + M - 1$ FLOPs.

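2.1.14 Hermitian Form $a^H R a$
With the Hermitian matrix $R = R^H$, the product $a^H R a$ can be expressed as

$$
\begin{aligned}
a^H R a &= \sum_{m=1}^{N} \sum_{n=1}^{N} a^H e_m e_m^T R\, e_n e_n^T a \\
        &= \sum_{m=1}^{N} \sum_{n=1}^{N} a_m^* a_n r_{m,n} \\
        &= \sum_{m=1}^{N} |a_m|^2 r_{m,m} + 2 \sum_{m=1}^{N-1} \sum_{n=m+1}^{N} \Re\{a_m^* a_n r_{m,n}\},
\end{aligned}
\tag{2.1}
$$

with $a_m = [a]_{m,1}$, and $r_{m,n} = [R]_{m,n}$. The first sum accumulates the weighted main diagonal entries and requires $2N$ multiplications and $N-1$ summations (footnote 1). The second part of (2.1) accumulates all weighted off-diagonal entries from $R$. The last two summations sum up $\tfrac{N(N-1)}{2}$ terms (footnote 2). Consequently, the second part of (2.1) requires $\tfrac{N(N-1)}{2} - 1$ summations and $N(N-1)$ products (footnote 3). Finally, the two parts have to be added, accounting for an additional summation and yielding an overall amount of $N^2 + N$ products and $\tfrac{1}{2}N^2 + \tfrac{1}{2}N - 1$ summations, corresponding to $\tfrac{3}{2}N^2 + \tfrac{3}{2}N - 1$ FLOPs (footnote 4).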
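The evaluation order of (2.1) can be sketched as follows. The name `hermitian_form` is hypothetical, inputs are assumed to be Python lists of numbers, and, in line with the footnotes, the factor 2 and the real-part operator are not counted as FLOPs.

```python
def hermitian_form(a, R):
    """Evaluate a^H R a via the split into diagonal and upper off-diagonal terms."""
    N = len(a)
    mults = adds = 0
    # First sum of (2.1): |a_m|^2 r_mm, two multiplications per term.
    diag = None
    for m in range(N):
        term = (a[m].conjugate() * a[m]) * R[m][m]
        mults += 2
        if diag is None:
            diag = term
        else:
            diag += term
            adds += 1
    # Second sum of (2.1): Re{a_m^* a_n r_mn} for n > m, two multiplications per term.
    off = None
    for m in range(N - 1):
        for n in range(m + 1, N):
            term = (a[m].conjugate() * a[n] * R[m][n]).real
            mults += 2
            if off is None:
                off = term
            else:
                off += term
                adds += 1
    if off is None:           # N = 1: no off-diagonal part exists
        return diag, mults, adds
    total = diag + 2 * off    # scaling by 2 is not counted as a FLOP
    adds += 1
    return total, mults, adds
```

For $N = 2$, the counters should show $N^2 + N = 6$ products and $\tfrac{1}{2}N^2 + \tfrac{1}{2}N - 1 = 2$ summations.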

Footnote 1: We do not exploit the fact that only real-valued summands are accumulated as we only account for complex flops.
Footnote 2: $\sum_{m=1}^{N-1} \sum_{n=m+1}^{N} 1 = \sum_{m=1}^{N-1} (N-m) = N(N-1) - \sum_{m=1}^{N-1} m = N(N-1) - \tfrac{N(N-1)}{2} = \tfrac{N(N-1)}{2}$. We made use of (A1) in the Appendix for the computation of the last sum accumulating subsequent integers.
Footnote 3: The scaling with the factor 2 does not require a FLOP, as it can be implemented by a simple bit shift.
Footnote 4: Clearly, if $N = 1$, we have to subtract one summation from the calculation since no off-diagonal entries exist.

2.1.15 Gram $L^H L$ of a Lower Triangular Matrix $L$
During the computation of the inverse of a positive definite matrix, the Gram matrix of a lower triangular matrix occurs when Cholesky decomposition is applied. Again, we make use of the Hermitian structure of the Gram $L^H L$, so only the main diagonal entries and the upper right off-diagonal entries of the product have to be evaluated. The $a$-th main-diagonal entry can be expressed as

$$
[L^H L]_{a,a} = \sum_{n=a}^{N} |\ell_{n,a}|^2,
\tag{2.2}
$$

with $\ell_{n,a} = [L]_{n,a}$, requiring $N-a+1$ multiplications and $N-a$ summations. Hence, all main diagonal elements need $\sum_{n=1}^{N} (N-n+1) = \tfrac{1}{2}N^2 + \tfrac{1}{2}N$ multiplications and $\sum_{n=1}^{N} (N-n) = \tfrac{1}{2}N^2 - \tfrac{1}{2}N$ summations.
The upper right off-diagonal entry $[L^H L]_{a,b}$ in row $a$ and column $b$ with $a < b$ reads as

$$
[L^H L]_{a,b} = \sum_{n=b}^{N} \ell_{n,a}^* \ell_{n,b},
\tag{2.3}
$$

again accounting for $N-b+1$ multiplications and $N-b$ summations. These two expressions have to be summed up over all $1 \le a \le N-1$ and $a+1 \le b \le N$, and for the number of multiplications, we find

$$
\begin{aligned}
\sum_{a=1}^{N-1} \sum_{b=a+1}^{N} (N-b+1)
 &= \sum_{a=1}^{N-1} \left[ (N-a)(N+1) - \sum_{b=a+1}^{N} b \right] \\
 &= \sum_{a=1}^{N-1} \left[ N^2 + N - a(N+1) - \frac{N(N+1) - a(a+1)}{2} \right] \\
 &= \sum_{a=1}^{N-1} \left[ \frac{N^2+N}{2} + \frac{a^2}{2} - a\left(N + \frac{1}{2}\right) \right] \\
 &= \frac{(N-1)(N+1)N}{2} + \frac{1}{2}\cdot\frac{(N-1)N(2N-1)}{6} - \left(N + \frac{1}{2}\right)\frac{N(N-1)}{2} \\
 &= \frac{1}{6}N^3 - \frac{1}{6}N.
\end{aligned}
\tag{2.4}
$$

Again, we made use of (A1) for the sum of subsequent integers and (A2) for the sum of subsequent squared integers. For the number of summations, we evaluate

$$
\sum_{a=1}^{N-1} \sum_{b=a+1}^{N} (N-b) = \frac{1}{6}N^3 - \frac{1}{2}N^2 + \frac{1}{3}N.
\tag{2.5}
$$

Computing all necessary elements of the Gram $L^H L$ thereby requires $\tfrac{1}{6}N^3 + \tfrac{1}{2}N^2 + \tfrac{1}{3}N$ multiplications and $\tfrac{1}{6}N^3 - \tfrac{1}{6}N$ summations. Altogether, $\tfrac{1}{3}N^3 + \tfrac{1}{2}N^2 + \tfrac{1}{6}N$ FLOPs result. The same result of course holds for the Gram of two upper triangular matrices.
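Expressions (2.2) and (2.3) translate into a loop whose inner sum starts at row $b$ instead of row 1. The sketch below assumes nested-list storage; `gram_lower` is an illustrative name only.

```python
def gram_lower(Lm):
    """Compute G = Lm^H Lm for lower triangular Lm using (2.2) and (2.3)."""
    N = len(Lm)
    G = [[0] * N for _ in range(N)]
    mults = adds = 0
    for a in range(N):
        for b in range(a, N):
            # Rows above b contribute nothing because Lm[n][b] = 0 for n < b.
            acc = Lm[b][a].conjugate() * Lm[b][b]
            mults += 1
            for n in range(b + 1, N):
                acc += Lm[n][a].conjugate() * Lm[n][b]
                mults += 1
                adds += 1
            G[a][b] = acc
            if a != b:
                G[b][a] = acc.conjugate()  # Hermitian symmetry, no FLOP counted
    return G, mults, adds
```

For $N = 3$, the counters should yield 10 multiplications and 4 summations, in agreement with $\tfrac{1}{3}N^3 + \tfrac{1}{2}N^2 + \tfrac{1}{6}N = 14$ FLOPs.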


2.2 Decompositions

2.2.1 Cholesky Decomposition $R = LL^H$ (Gaxpy Version)
Instead of computing the inverse of a positive definite matrix $R$ directly, it is more efficient to start with the Cholesky decomposition $R = LL^H$ and then invert the lower triangular matrix $L$ and compute its Gram. In this section, we count the number of FLOPs necessary for the Cholesky decomposition.

The implementation of the Generalized Ax plus y (Gaxpy) version of the Cholesky decomposition, which overwrites the lower triangular part of the positive definite matrix $R$, is listed in Algorithm 2.1, see [1]. Note that $R$ needs to be positive definite for the $LL^H$ decomposition!

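Algorithm 2.1 Algorithm for the Gaxpy version of the Cholesky decomposition.
 1: $[R]_{1:N,1} = [R]_{1:N,1} / \sqrt{[R]_{1,1}}$   {vector in $\mathbb{C}^{N}$}
 2: for $n = 2$ to $N$ do
 3:   $[R]_{n:N,n} = [R]_{n:N,n} - [R]_{n:N,1:n-1}\,[R]_{n,1:n-1}^H$   {$[R]_{n:N,n} \in \mathbb{C}^{N-n+1}$, $[R]_{n:N,1:n-1} \in \mathbb{C}^{(N-n+1)\times(n-1)}$, $[R]_{n,1:n-1}^H \in \mathbb{C}^{n-1}$}
 4:   $[R]_{n:N,n} = [R]_{n:N,n} / \sqrt{[R]_{n,n}}$   {vector in $\mathbb{C}^{N-n+1}$}
 5: end for
 6: $L = \mathrm{tril}(R)$   {lower triangular part of overwritten $R$}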
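Algorithm 2.1 can be sketched in plain Python with operation counters attached, so that the totals derived in this subsection can be checked numerically. The function name `cholesky_gaxpy`, the nested-list storage, and the use of 0-based indices are assumptions of this sketch; divisions are counted as multiplications and the diagonal element is set directly to the square root, following the counting convention of the report.

```python
import math


def cholesky_gaxpy(R):
    """Return (L, mults, adds, roots) with R = L L^H for positive definite R."""
    N = len(R)
    R = [row[:] for row in R]          # work on a copy
    mults = adds = roots = 0
    for n in range(N):                 # column n (0-based)
        if n > 0:
            # Line 3: subtract the matrix-vector product from column n.
            for i in range(n, N):
                acc = R[i][0] * R[n][0].conjugate()
                mults += 1
                for k in range(1, n):
                    acc += R[i][k] * R[n][k].conjugate()
                    mults += 1
                    adds += 1
                R[i][n] = R[i][n] - acc
                adds += 1
        # Lines 1 and 4: scale by the square root of the diagonal element.
        d = math.sqrt(R[n][n].real)
        roots += 1
        R[n][n] = d                    # division result equals the root, no FLOP
        for i in range(n + 1, N):
            R[i][n] = R[i][n] / d
            mults += 1
    L = [[R[i][j] if j <= i else 0 for j in range(N)] for i in range(N)]
    return L, mults, adds, roots
```

As a usage example, the matrix with entries $\min(i,j)$ for $i,j = 1,\ldots,4$ is positive definite and has the all-ones lower triangular matrix as its Cholesky factor; the counters should then read 16 multiplications, 10 summations, and 4 square roots.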

The computation of the first column of $L$ in Line 1 of Algorithm 2.1 requires $N-1$ multiplications (footnote 5), a single square-root operation, and no summations. Column $n > 1$ takes a matrix-vector product of dimension $(N-n+1) \times (n-1)$ which is subtracted from another $(N-n+1)$-dimensional vector involving $N-n+1$ summations, see Line 3. Finally, $N-n$ multiplications (footnote 6) and a single square-root operation are necessary in Line 4. In short, row $n$ with $1 < n \le N$ needs $-n^2 + n(N+1) - 1$ multiplications, $-n^2 + n(N+2) - N - 1$ summations (see Subsection 2.1.5), and one square root operation, which we classify as an additional FLOP. Summing up the multiplications for rows $2 \le n \le N$, we obtain

$$
\begin{aligned}
\sum_{n=2}^{N} \left(-n^2 + n(N+1) - 1\right)
 &= (N+1)\frac{N(N+1)-2}{2} - \frac{N(N+1)(2N+1)-6}{6} - (N-1) \\
 &= \frac{N^3 + 2N^2 - N - 2}{2} - \frac{2N^3 + 3N^2 + N - 6}{6} - (N-1) \\
 &= \frac{1}{6}N^3 + \frac{1}{2}N^2 - \frac{5}{3}N + 1.
\end{aligned}
\tag{2.6}
$$

The number of summations for rows $2 \le n \le N$ reads as

$$
\begin{aligned}
\sum_{n=2}^{N} \left(-n^2 + n(N+2) - N - 1\right)
 &= -(N+1)(N-1) + (N+2)\frac{N(N+1)-2}{2} - \frac{N(N+1)(2N+1)-6}{6} \\
 &= -N^2 + 1 + \frac{N^3 + 3N^2 - 4}{2} - \frac{2N^3 + 3N^2 + N - 6}{6} \\
 &= \frac{1}{6}N^3 - \frac{1}{6}N,
\end{aligned}
\tag{2.7}
$$

and finally, $N-1$ square-root operations are needed for the $N-1$ rows. Including the $N-1$ multiplications for column $n = 1$ and the additional square root operation, $\tfrac{1}{6}N^3 + \tfrac{1}{2}N^2 - \tfrac{2}{3}N$ multiplications, $\tfrac{1}{6}N^3 - \tfrac{1}{6}N$ summations, and $N$ square-root operations occur, $\tfrac{1}{3}N^3 + \tfrac{1}{2}N^2 + \tfrac{1}{6}N$ FLOPs in total.

Footnote 5: The first element need not be computed twice, since the result of the division is the square root of the denominator.
Footnote 6: Again, the first element need not be computed twice, since the result of the division is the square root of the denominator.

2.2.2 Cholesky Decomposition $R = L_1 D L_1^H$
The main advantage of the $L_1 D L_1^H$ decomposition compared to the standard $LL^H$ decomposition is that no square root operations are needed, which may require more than one FLOP depending on the given hardware platform. Another benefit of the $L_1 D L_1^H$ decomposition is that it does not require a positive definite matrix $R$; the only two conditions for the unique existence are that $R$ is Hermitian and that all but the last principal minor (i.e., the determinant) of $R$ are different from zero [2]. Hence, $R$ may also be rank deficient to a certain degree. If $R$ is not positive semidefinite, then $D$ may contain negative main diagonal entries.

The outcome of the decomposition is a lower triangular matrix $L_1$ with ones on the main diagonal and a diagonal matrix $D$.

Algorithm 2.2 overwrites the strictly lower left part of the matrix $R$ with the strictly lower part of $L_1$ (i.e., without the ones on the main diagonal) and overwrites the main diagonal of $R$ with the main diagonal of $D$. It is taken from [1] and slightly modified, such that it is also applicable to complex matrices (see the conjugate in Line 4) and no existing scalar should be re-computed (see case distinction in Line 4 for $i = 1$).

Algorithm 2.2 Algorithm for the Cholesky decomposition $L_1 D L_1^H$.
 1: $[R]_{2:N,1} = [R]_{2:N,1} / [R]_{1,1}$   {vector in $\mathbb{C}^{N-1}$}
 2: for $n = 2$ to $N$ do
 3:   for $i = 1$ to $n-1$ do
 4:     $[v]_i = [R]_{1,n}$ if $i = 1$, and $[v]_i = [R]_{i,i}\,[R]_{n,i}^*$ if $i \ne 1$
 5:   end for
 6:   $[v]_n = [R]_{n,n} - [R]_{n,1:n-1}\,[v]_{1:n-1}$   {$[R]_{n,1:n-1} \in \mathbb{C}^{1 \times (n-1)}$, $[v]_{1:n-1} \in \mathbb{C}^{n-1}$}
 7:   $[R]_{n,n} = [v]_n$
 8:   $[R]_{n+1:N,n} = \left([R]_{n+1:N,n} - [R]_{n+1:N,1:n-1}\,[v]_{1:n-1}\right) / [v]_n$   {$[R]_{n+1:N,n} \in \mathbb{C}^{N-n}$, $[R]_{n+1:N,1:n-1} \in \mathbb{C}^{(N-n)\times(n-1)}$, $[v]_{1:n-1} \in \mathbb{C}^{n-1}$}
 9: end for
10: $D = \mathrm{diag}(\mathrm{diag}(R))$   {return diagonal $D$}
11: $L_1 = \mathrm{tril}(R)$ with ones on the main diagonal
          Line1needsN 1multiplications.P  Lines3to5requiren 2multiplicationsandareexe-
       cutedforn=2;:::;N,yielding  N (n 2)=N2  3N+2 multiplications.Line6takesn 1n=2          2           P multiplicationsandn 1summations,againwithn=2;:::;N,yielding  N (n 1)=N2  N
                                                           n=2         2 multiplicationsandthesameamountofsummations.Line7doesnotrequireanyFLOP.InLine8,
       thematrix-vectorproductneeds(N n)(n 1)multiplications,andadditionalN nmultiplica-                                                     2.3InversesofMatrices   11

         tionsarisewhenthecompletenumeratorisdiPvidedbythedenominator.Hence,wehaveNn n2
         multiplications.Forn=2;:::;N,weget  N (Nn n2 )=1 N3  7 N+1multiplications. n=2          6    6 ThenumberofsummationsinLine8is(N n)(n 2)forthematrixvectorproductandN n
         forthesubtractioninthePnumerator.Together,wehave n2 +n(N+1) Nsummations.With
         n=2;:::;N,weget  N [ n2 +n(N+1) N)]=1 N3  1 N2 +1 Nsummations. n=2                  6    2    3
            Summingup,thisalgorithmrequires 1 N3 +N2  13 N+1multiplications,and 1 N3  1 N6         6                   6    6 summations,yieldingatotalamountof 1 N3 +N2  7 N+1FLOPs.(Notethatthisformulais 3        3 alsovalidforN=1.)
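The line-by-line counts can be reproduced with a plain-Python sketch of Algorithm 2.2. The name `ldl_decomposition`, the nested-list storage, and the 0-based indexing are assumptions of this example; divisions are again counted as multiplications, as in the text.

```python
def ldl_decomposition(R):
    """Return (L1, D, mults, adds) with R = L1 diag(D) L1^H for Hermitian R."""
    N = len(R)
    R = [row[:] for row in R]              # work on a copy
    mults = adds = 0
    for i in range(1, N):                  # Line 1
        R[i][0] = R[i][0] / R[0][0]
        mults += 1
    for n in range(1, N):                  # Line 2 (0-based column index)
        v = [0] * (n + 1)
        v[0] = R[0][n]                     # Line 4, i = 1: upper part is untouched, no FLOP
        for i in range(1, n):              # Line 4, i != 1
            v[i] = R[i][i] * R[n][i].conjugate()
            mults += 1
        acc = R[n][0] * v[0]               # Line 6
        mults += 1
        for i in range(1, n):
            acc += R[n][i] * v[i]
            mults += 1
            adds += 1
        v[n] = R[n][n] - acc
        adds += 1
        R[n][n] = v[n]                     # Line 7
        for m in range(n + 1, N):          # Line 8
            acc = R[m][0] * v[0]
            mults += 1
            for i in range(1, n):
                acc += R[m][i] * v[i]
                mults += 1
                adds += 1
            R[m][n] = (R[m][n] - acc) / v[n]
            adds += 1
            mults += 1
    D = [R[i][i] for i in range(N)]
    L1 = [[1 if i == j else (R[i][j] if j < i else 0) for j in range(N)]
          for i in range(N)]
    return L1, D, mults, adds
```

For the $4 \times 4$ matrix with entries $\min(i,j)$, the sketch should return the all-ones lower triangular $L_1$, $D = I$, 19 multiplications, and 10 summations, i.e., $\tfrac{1}{3}N^3 + N^2 - \tfrac{7}{3}N + 1 = 29$ FLOPs.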


2.3 Inverses of Matrices

2.3.1 Inverse $L^{-1}$ of a Lower Triangular Matrix $L$

Let $X = [x_1, \ldots, x_N] = L^{-1}$ denote the inverse of a lower triangular matrix $L$. Then, $X$ is again lower triangular, which means that $[X]_{b,n} = 0$ for $b < n$. The following equation holds:

$$
L x_n = e_n.
\tag{2.8}
$$

Via forward substitution, the above system can easily be solved. Row $b$ ($n \le b \le N$) from (2.8) can be expressed as

$$
\sum_{a=n}^{b} \ell_{b,a} x_{a,n} = \delta_{b,n},
\tag{2.9}
$$

with $\delta_{b,n}$ denoting the Kronecker delta which vanishes for $b \ne n$, and $x_{a,n} = [X]_{a,n} = [x_n]_{a,1}$. Starting from $b = n$, the $x_{b,n}$ are computed successively, and we find

$$
x_{b,n} = -\frac{1}{\ell_{b,b}} \left[ \sum_{a=n}^{b-1} \ell_{b,a} x_{a,n} - \delta_{b,n} \right],
\tag{2.10}
$$

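with all $x_{a,n}$, $n \le a \le b-1$, having been computed in previous steps. Hence, if $n = b$, $x_{n,n} = \tfrac{1}{\ell_{n,n}}$ and a single multiplication (footnote 7) is required, no summations are needed. For $b > n$, $b-n+1$ multiplications and $b-n-1$ summations are required, as the Kronecker delta vanishes. All main diagonal entries can be computed by means of $N$ multiplications. The lower left off-diagonal entries require

$$
\begin{aligned}
\sum_{n=1}^{N-1} \sum_{b=n+1}^{N} (b-n+1)
 &= \sum_{n=1}^{N-1} \left[ (1-n)(N-n) + \sum_{b=n+1}^{N} b \right] \\
 &= \sum_{n=1}^{N-1} \left[ N + n^2 - n(N+1) + \frac{N^2 + N - n^2 - n}{2} \right] \\
 &= \sum_{n=1}^{N-1} \left[ \frac{N^2}{2} + \frac{3N}{2} + \frac{n^2}{2} - n\left(N + \frac{3}{2}\right) \right] \\
 &= (N-1)\frac{N}{2}(N+3) + \frac{1}{2}\cdot\frac{(N-1)N(2N-1)}{6} - \frac{N(N-1)}{2}\left(N + \frac{3}{2}\right) \\
 &= \frac{1}{6}N^3 + \frac{1}{2}N^2 - \frac{2}{3}N
\end{aligned}
\tag{2.11}
$$

multiplications, and

$$
\sum_{n=1}^{N-1} \sum_{b=n+1}^{N} (b-n-1) = \frac{1}{6}N^3 - \frac{1}{2}N^2 + \frac{1}{3}N
\tag{2.12}
$$

summations. Including the $N$ multiplications for the main-diagonal entries, $\tfrac{1}{6}N^3 + \tfrac{1}{2}N^2 + \tfrac{1}{3}N$ multiplications and $\tfrac{1}{6}N^3 - \tfrac{1}{2}N^2 + \tfrac{1}{3}N$ summations have to be implemented, yielding a total amount of $\tfrac{1}{3}N^3 + \tfrac{2}{3}N$ FLOPs.

Footnote 7: Actually, it is a division rather than a multiplication.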
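The column-wise forward substitution of (2.10) can be sketched as below. The name `invert_lower` is hypothetical, storage is a nested list, and the sign change in (2.10) is assumed to cost no FLOP, which is consistent with the counts stated above.

```python
def invert_lower(Lm):
    """Return (X, mults, adds) with X the inverse of the lower triangular Lm."""
    N = len(Lm)
    X = [[0] * N for _ in range(N)]
    mults = adds = 0
    for n in range(N):                     # column n of the inverse
        X[n][n] = 1 / Lm[n][n]             # one division, counted as a multiplication
        mults += 1
        for b in range(n + 1, N):
            # Sum over a = n..b-1 of Lm[b][a] * X[a][n]:
            # b - n products and b - n - 1 summations.
            acc = Lm[b][n] * X[n][n]
            mults += 1
            for a in range(n + 1, b):
                acc += Lm[b][a] * X[a][n]
                mults += 1
                adds += 1
            X[b][n] = -acc / Lm[b][b]      # one more division; sign change is free
            mults += 1
    return X, mults, adds
```

For the $4 \times 4$ all-ones lower triangular matrix, the inverse has ones on the diagonal and $-1$ on the first subdiagonal, and the counters should read 20 multiplications and 4 summations, matching $\tfrac{1}{3}N^3 + \tfrac{2}{3}N = 24$ FLOPs.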

       2.3.2InverseL 1 ofaLowerTriangularMatrixL1                        1 withOnesontheMainDiagonal
       TheinverseofalowertriangularmatrixL1 turnsouttorequireN2 FLOPslessthantheinverse
       ofLwitharbitrarynonzerodiagonalelements.LetXdenotetheinverseofL1 .Clearly,Xis
       againalowertriangularmatrixwithonesonthemaindiagonal.Wecanexploitthisfactinorder
       tocomputeonlytheunknownentries.
          ThemthrowandnthcolumnofthesystemofequationsL1 X=IN withm n+1readsas 8
                                  m X 1
                             lm;n +    lm;i xi;n +xm;n =0;
                                  i=n+1
                                  i m 1
       or,equivalently,                 2             3
                                   6     m X 1     7xm;n = 4lm;n +    lm;i xi;n 5:
                                         i=n+1
                                         i m 1
       Hence,Xiscomputedviaforwardsubstitution.Tocomputexm;n ,weneedm n 1multipli-
       cationsandm n 1summations.Rememberthatm n+1.Thetotalnumberofmultiplica-
       tions/summationsisobtainedfrom
                        N X 1 XN            1    1    1 (m n 1)= N3   N2 + N:            (2.13) 6    2    3n=1m=n+1
         8 Weonlyhavetoconsiderm n+1,sincetheequationsresultingfromm<n+1areautomaticallyfulﬁlled
       duetothestructureofL1 andX.                                               2.4SolvingSystemsofEquations   13

         Summingup, 1 N3  N2 +2 NFLOPsareneeded. 3        3

         2.3.3InverseR 1 ofaPositiveDeﬁniteMatrixR
         TheinverseofamatrixcanforexamplebecomputedviaGaussian-elimination[1].However,this
         approachiscomputationallyexpensiveanddoesnotexploittheHermitianstructureofR.Instead,
         itismoreefﬁcienttostartwiththeCholeskydecompositionofR=LL H (seeSubsection2.2.1),
         invertthelowertriangularmatrixL(seeSubsection2.3.1),andthenbuildtheGramL H L 1
         ofL 1 (seeSubsection2.1.15).Summinguptherespectivenumberofoperations,thisprocedure
         requires 1 N3 +3 N2 multiplications, 1 N3  1 N2 summations,andNsquare-rootoperations,which 2   2             2   2 yieldsatotalamountofN3 +N2 +NFLOPs.
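For reference, the three contributions from Subsections 2.2.1, 2.3.1, and 2.1.15 add up as follows; this is only a restatement of counts already given in those subsections.

```latex
\begin{align*}
\text{multiplications:}\quad
 & \left(\tfrac{1}{6}N^3 + \tfrac{1}{2}N^2 - \tfrac{2}{3}N\right)
 + \left(\tfrac{1}{6}N^3 + \tfrac{1}{2}N^2 + \tfrac{1}{3}N\right)
 + \left(\tfrac{1}{6}N^3 + \tfrac{1}{2}N^2 + \tfrac{1}{3}N\right)
 = \tfrac{1}{2}N^3 + \tfrac{3}{2}N^2, \\
\text{summations:}\quad
 & \left(\tfrac{1}{6}N^3 - \tfrac{1}{6}N\right)
 + \left(\tfrac{1}{6}N^3 - \tfrac{1}{2}N^2 + \tfrac{1}{3}N\right)
 + \left(\tfrac{1}{6}N^3 - \tfrac{1}{6}N\right)
 = \tfrac{1}{2}N^3 - \tfrac{1}{2}N^2, \\
\text{total:}\quad
 & \tfrac{1}{2}N^3 + \tfrac{3}{2}N^2 + \tfrac{1}{2}N^3 - \tfrac{1}{2}N^2 + N
 = N^3 + N^2 + N \quad (\text{including the } N \text{ square roots}).
\end{align*}
```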


2.4 Solving Systems of Equations

2.4.1 Product $L^{-1}C$ with $L^{-1}$ not known a priori
A naive way of computing the solution $X = L^{-1}C$ of the equation $LX = C$ is to find $L^{-1}$ first and afterwards multiply it by $C$. This approach needs $N^2\left(L + \tfrac{1}{3}N\right) + \tfrac{2}{3}N$ FLOPs as shown in Sections 2.3.1 and 2.1.10. However, doing so is very expensive since we are not interested in the inverse of $L$ in general. Hence, there must be a computationally cheaper variant. Again, forward substitution plays a key role.

It is easy to see that $X$ can be computed column-wise. Let $x_{b,a} = [X]_{b,a}$, $\ell_{b,a} = [L]_{b,a}$, and $c_{b,a} = [C]_{b,a}$. Then, from $LX = C$, we get for the element $x_{b,a}$ in row $b$ and column $a$ of $X$:

$$
x_{b,a} = -\frac{1}{\ell_{b,b}} \left[ \sum_{i=1}^{b-1} \ell_{b,i} x_{i,a} - c_{b,a} \right].
\tag{2.14}
$$
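Equation (2.14) can be sketched as a column-wise forward substitution. The name `forward_substitution` and the nested-list storage are assumptions of this example; the division by $\ell_{b,b}$ is counted as a multiplication, so row $b$ costs $b$ multiplications and $b-1$ summations.

```python
def forward_substitution(Lm, C):
    """Solve Lm X = C for lower triangular Lm without forming its inverse."""
    N, cols = len(Lm), len(C[0])
    X = [[0] * cols for _ in range(N)]
    mults = adds = 0
    for a in range(cols):                  # column of X
        for b in range(N):                 # row of X, top to bottom
            acc = C[b][a]
            for i in range(b):
                acc = acc - Lm[b][i] * X[i][a]
                mults += 1
                adds += 1
            X[b][a] = acc / Lm[b][b]       # division counted as a multiplication
            mults += 1
    return X, mults, adds
```

For a $2 \times 2$ matrix $L$ and two right-hand-side columns, the counters should show 6 multiplications and 2 summations, i.e., $N^2 L = 8$ FLOPs.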
The computation of $x_{b,a}$ requires $b$ multiplications and $b-1$ summations. A complete column of $X$ can therefore be computed with $\sum_{b=1}^{N} b = \tfrac{N^2}{2} + \tfrac{N}{2}$ multiplications and $\sum_{b=1}^{N} (b-1) = \tfrac{N^2}{2} - \tfrac{N}{2}$ summations. The complete matrix $X$ with $L$ columns thus needs $N^2 L$ FLOPs, so the forward substitution saves $\tfrac{1}{3}N^3 + \tfrac{2}{3}N$ FLOPs compared to the direct inversion of $L$ and a subsequent matrix-matrix product. Interestingly, computing $L^{-1}C$ with $L^{-1}$ unknown is as expensive as computing $LC$, see Section 2.1.10.


3. Overview


       A2CM N ,B2CN N ,andC2CN L arearbitrarymatrices.D2CN N isadiagonalmatrix,
       L2CN N islowertriangular,L1 2CN N islowertriangularwithonesonthemaindiagonal,
       a;b2CN ,c2CM ,andR2CN N ispositivedeﬁnite.


Expression   Description                        Products                    Summations                  FLOPs
αa           Vector Scaling                     N                           0                           N
αA           Matrix Scaling                     MN                          0                           MN
a^H b        Inner Product                      N                           N - 1                       2N - 1
a c^H        Outer Product                      MN                          0                           MN
Ab           Matrix Vector Prod.                MN                          M(N - 1)                    2MN - M
AC           Matrix Matrix Prod.                MNL                         ML(N - 1)                   2MNL - ML
AD           Diagonal Matrix Prod.              MN                          0                           MN
LD           Matrix-Matrix Prod.                N^2/2 + N/2                 0                           N^2/2 + N/2
L_1 D        Matrix-Matrix Prod.                N^2/2 - N/2                 0                           N^2/2 - N/2
LC           Matrix Product                     N^2 L/2 + NL/2              N^2 L/2 - NL/2              N^2 L
A^H A        Gram                               MN(N+1)/2                   (M-1)N(N+1)/2               MN^2 + N(M - N/2) - N/2
||A||_F^2    Frobenius Norm                     MN                          MN - 1                      2MN - 1
c^H A b      Sesquilinear Form                  M(N+1)                      MN - 1                      2MN + M - 1
a^H R a      Hermitian Form                     N^2 + N                     N^2/2 + N/2 - 1             3N^2/2 + 3N/2 - 1
L^H L        Gram of Triangular                 N^3/6 + N^2/2 + N/3         N^3/6 - N/6                 N^3/3 + N^2/2 + N/6
L            Cholesky R = L L^H                 N^3/6 + N^2/2 - 2N/3        N^3/6 - N/6                 N^3/3 + N^2/2 + N/6
             (Gaxpy version)                                                                            (N roots included)
L_1, D       Cholesky R = L_1 D L_1^H           N^3/6 + N^2 - 13N/6 + 1     N^3/6 - N/6                 N^3/3 + N^2 - 7N/3 + 1
L^-1         Inverse of Triangular              N^3/6 + N^2/2 + N/3         N^3/6 - N^2/2 + N/3         N^3/3 + 2N/3
L_1^-1       Inverse of Triangular              N^3/6 - N^2/2 + N/3         N^3/6 - N^2/2 + N/3         N^3/3 - N^2 + 2N/3
             with ones on main diag.
R^-1         Inverse of Pos. Definite           N^3/2 + 3N^2/2              N^3/2 - N^2/2               N^3 + N^2 + N
                                                                                                        (N roots included)
L^-1 C       L^-1 unknown                       N^2 L/2 + NL/2              N^2 L/2 - NL/2              N^2 L


Appendix


A frequently occurring summation in FLOP counting is the sum of subsequent integers. By complete induction, we find

$$
\sum_{n=1}^{N} n = \frac{N(N+1)}{2}.
\tag{A1}
$$

The above result can easily be verified by recognizing that the sum of the $n$-th and the $(N-n+1)$-th summand is equal to $N+1$, and we have $\tfrac{N}{2}$ such pairs.

Another sum of relevance is the sum of subsequent squared integers. Again, via complete induction, we find

$$
\sum_{n=1}^{N} n^2 = \frac{N(N+1)(2N+1)}{6}.
\tag{A2}
$$


Bibliography


[1] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, 1991.
[2] Kh. D. Ikramov and N. V. Savel'eva, "Conditionally Definite Matrices," Journal of Mathematical Sciences, vol. 98, no. 1, pp. 1–50, 2000.