
        Floating Point Operations in Matrix-Vector Calculus
                                (Version 1.3)


                            Raphael Hunger


                            Technical Report
                                 2007


                       Technische Universität München
                       Associate Institute for Signal Processing
                       Univ.-Prof. Dr.-Ing. Wolfgang Utschick


History


       Version 1.00: October 2005
           - Initial version
       Version 1.01: 2006
           - Rewrite of sesquilinear form with a reduced amount of FLOPs
           - Several typos fixed concerning the number of FLOPs required for the Cholesky decomposition
       Version 1.2: November 2006
           - Conditions for the existence of the standard $\mathbf{L}\mathbf{L}^H$ Cholesky decomposition specified (positive definiteness)
           - Outer product version of $\mathbf{L}\mathbf{L}^H$ Cholesky decomposition removed
           - FLOPs required in Gaxpy version of $\mathbf{L}\mathbf{L}^H$ Cholesky decomposition updated
           - $\mathbf{L}_1\mathbf{D}\mathbf{L}_1^H$ Cholesky decomposition added
           - Matrix-matrix product $\mathbf{L}\mathbf{C}$ added with $\mathbf{L}$ triangular
           - Matrix-matrix product $\mathbf{L}^{-1}\mathbf{C}$ added with $\mathbf{L}$ triangular and $\mathbf{L}^{-1}$ not known a priori
           - Inverse $\mathbf{L}_1^{-1}$ of a lower triangular matrix with ones on the main diagonal added
       Version 1.3: September 2007
           - First globally accessible document version
       To do (unknown when):
           - QR decomposition
           - LR decomposition


          Please report any bug and suggestion to hunger@tum.de


Contents


1. Introduction

2. Flop Counting
   2.1 Matrix Products
       2.1.1 Scalar-Vector Multiplication $\alpha\mathbf{a}$
       2.1.2 Scalar-Matrix Multiplication $\alpha\mathbf{A}$
       2.1.3 Inner Product $\mathbf{a}^H\mathbf{b}$ of Two Vectors
       2.1.4 Outer Product $\mathbf{a}\mathbf{c}^H$ of Two Vectors
       2.1.5 Matrix-Vector Product $\mathbf{A}\mathbf{b}$
       2.1.6 Matrix-Matrix Product $\mathbf{A}\mathbf{C}$
       2.1.7 Matrix Diagonal Matrix Product $\mathbf{A}\mathbf{D}$
       2.1.8 Matrix-Matrix Product $\mathbf{L}\mathbf{D}$
       2.1.9 Matrix-Matrix Product $\mathbf{L}_1\mathbf{D}$
       2.1.10 Matrix-Matrix Product $\mathbf{L}\mathbf{C}$ with $\mathbf{L}$ Lower Triangular
       2.1.11 Gram $\mathbf{A}^H\mathbf{A}$ of $\mathbf{A}$
       2.1.12 Squared Frobenius Norm $\|\mathbf{A}\|_F^2 = \mathrm{tr}(\mathbf{A}^H\mathbf{A})$
       2.1.13 Sesquilinear Form $\mathbf{c}^H\mathbf{A}\mathbf{b}$
       2.1.14 Hermitian Form $\mathbf{a}^H\mathbf{R}\mathbf{a}$
       2.1.15 Gram $\mathbf{L}^H\mathbf{L}$ of a Lower Triangular Matrix $\mathbf{L}$
   2.2 Decompositions
       2.2.1 Cholesky Decomposition $\mathbf{R} = \mathbf{L}\mathbf{L}^H$ (Gaxpy Version)
       2.2.2 Cholesky Decomposition $\mathbf{R} = \mathbf{L}_1\mathbf{D}\mathbf{L}_1^H$
   2.3 Inverses of Matrices
       2.3.1 Inverse $\mathbf{L}^{-1}$ of a Lower Triangular Matrix $\mathbf{L}$
       2.3.2 Inverse $\mathbf{L}_1^{-1}$ of a Lower Triangular Matrix $\mathbf{L}_1$ with Ones on the Main Diagonal
       2.3.3 Inverse $\mathbf{R}^{-1}$ of a Positive Definite Matrix $\mathbf{R}$
   2.4 Solving Systems of Equations
       2.4.1 Product $\mathbf{L}^{-1}\mathbf{C}$ with $\mathbf{L}^{-1}$ not known a priori

3. Overview

   Appendix

   Bibliography


1. Introduction


For the design of efficient and low-complexity algorithms in many signal-processing tasks, a
detailed analysis of the required number of floating-point operations (FLOPs) is often inevitable.
Most frequently, matrix operations are involved, such as matrix-matrix products and inverses of
matrices. Structures such as Hermiteness or triangularity can be exploited to reduce the
number of needed FLOPs and will be discussed here. In this technical report, we derive expressions
for the number of multiplications and summations that a majority of signal processing algorithms
in mobile communications bring with them.


Acknowledgments:
The author would like to thank Dipl.-Ing. David A. Schmidt and Dipl.-Ing. Guido Dietl for the
fruitful discussions on this topic.


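2. Flop Counting


In this chapter, we offer expressions for the number of complex multiplications and summations
required for several matrix-vector operations. A floating-point operation (FLOP) is assumed to be
either a complex multiplication or a complex summation here, despite the fact that a complex
multiplication requires 4 real multiplications and 2 real summations whereas a complex summation
consists of only 2 real summations, making a multiplication more expensive than a summation.
However, we count each operation as one FLOP.

Throughout this report, we assume $\alpha \in \mathbb{C}$ to be a scalar, and the vectors
$\mathbf{a} \in \mathbb{C}^N$, $\mathbf{b} \in \mathbb{C}^N$, and $\mathbf{c} \in \mathbb{C}^M$ to have
dimension $N$, $N$, and $M$, respectively. The matrices $\mathbf{A} \in \mathbb{C}^{M \times N}$,
$\mathbf{B} \in \mathbb{C}^{N \times N}$, and $\mathbf{C} \in \mathbb{C}^{N \times L}$ are assumed to
have no special structure, whereas $\mathbf{R} = \mathbf{R}^H \in \mathbb{C}^{N \times N}$ is Hermitian
and $\mathbf{D} = \mathrm{diag}\{d_\ell\}_{\ell=1}^{N} \in \mathbb{C}^{N \times N}$ is diagonal.
$\mathbf{L}$ is a lower triangular $N \times N$ matrix, and $\mathbf{e}_n$ denotes the unit vector with
a 1 in the $n$-th row and zeros elsewhere. Its dimensionality is chosen such that the respective
matrix-vector product exists. Finally, $[\mathbf{A}]_{a,b}$ denotes the element in the $a$-th row and
$b$-th column of $\mathbf{A}$, and $[\mathbf{A}]_{a:b,c:d}$ selects the submatrix of $\mathbf{A}$
consisting of rows $a$ to $b$ and columns $c$ to $d$. $\mathbf{0}_{a \times b}$ is the $a \times b$ zero
matrix. Transposition, Hermitian transposition, conjugate, and real-part operator are denoted by
$(\cdot)^T$, $(\cdot)^H$, $(\cdot)^*$, and $\Re\{\cdot\}$, respectively, and require no FLOP.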


2.1 Matrix Products

Frequently arising matrix products and the amount of FLOPs required for their computation will
be discussed in this section.


2.1.1 Scalar-Vector Multiplication $\alpha\mathbf{a}$
A simple multiplication $\alpha\mathbf{a}$ of a vector $\mathbf{a}$ with a scalar $\alpha$ requires $N$
multiplications and no summation.


2.1.2 Scalar-Matrix Multiplication $\alpha\mathbf{A}$
Extending the result from Subsection 2.1.1 to a scalar matrix multiplication $\alpha\mathbf{A}$ requires
$NM$ multiplications and again no summation.


2.1.3 Inner Product $\mathbf{a}^H\mathbf{b}$ of Two Vectors
An inner product $\mathbf{a}^H\mathbf{b}$ requires $N$ multiplications and $N-1$ summations, i.e.,
$2N-1$ FLOPs.


2.1.4 Outer Product $\mathbf{a}\mathbf{c}^H$ of Two Vectors
An outer product $\mathbf{a}\mathbf{c}^H$ requires $NM$ multiplications and no summation.

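2.1.5 Matrix-Vector Product $\mathbf{A}\mathbf{b}$
Computing $\mathbf{A}\mathbf{b}$ corresponds to applying the inner product rule $\mathbf{a}_i^H\mathbf{b}$
from Subsection 2.1.3 $M$ times. Obviously, $1 \le i \le M$ and $\mathbf{a}_i^H$ represents the $i$-th
row of $\mathbf{A}$. Hence, its computation costs $MN$ multiplications and $M(N-1)$ summations, i.e.,
$2MN - M$ FLOPs.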

2.1.6 Matrix-Matrix Product $\mathbf{A}\mathbf{C}$
Repeated application of the matrix-vector rule $\mathbf{A}\mathbf{c}_i$ from Subsection 2.1.5, with
$\mathbf{c}_i$ being the $i$-th column of $\mathbf{C}$, yields the overall matrix-matrix product
$\mathbf{A}\mathbf{C}$. Since $1 \le i \le L$, the matrix-matrix product has the $L$-fold complexity of
the matrix-vector product. Thus, it needs $MNL$ multiplications and $ML(N-1)$ summations, altogether
$2MNL - ML$ FLOPs.
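The count above can be checked empirically. The following is a minimal sketch, not part of the original report: the function name `matmul_with_counts` and the example sizes are assumptions chosen for illustration. It forms the product entry by entry and tallies every complex multiplication and summation as one FLOP.

```python
def matmul_with_counts(A, C):
    """Naive product of A (M x N) and C (N x L), both given as lists of rows.

    Returns the product and the number of multiplications and summations,
    each counted as one complex FLOP as in the text.
    """
    M, N, L = len(A), len(C), len(C[0])
    mults = adds = 0
    P = [[0j] * L for _ in range(M)]
    for i in range(M):
        for j in range(L):
            acc = A[i][0] * C[0][j]       # first term: one multiplication
            mults += 1
            for k in range(1, N):
                acc += A[i][k] * C[k][j]  # each further term: one product, one sum
                mults += 1
                adds += 1
            P[i][j] = acc
    return P, mults, adds


# Hypothetical example sizes and entries, chosen only for illustration.
M, N, L = 3, 4, 2
A = [[complex(i + 1, k) for k in range(N)] for i in range(M)]
C = [[complex(k, j + 1) for j in range(L)] for k in range(N)]
P, mults, adds = matmul_with_counts(A, C)
print(mults, adds, mults + adds)  # 24 18 42
```

For these sizes, $MNL = 24$, $ML(N-1) = 18$, and $2MNL - ML = 42$, in agreement with the tallies.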

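2.1.7 Matrix Diagonal Matrix Product $\mathbf{A}\mathbf{D}$
If the right-hand side matrix $\mathbf{D}$ of the matrix product $\mathbf{A}\mathbf{D}$ is diagonal, the
computational load reduces to $M$ multiplications for each of the $N$ columns of $\mathbf{A}$, since the
$n$-th column of $\mathbf{A}$ is scaled by the $n$-th main diagonal element of $\mathbf{D}$. Thus, $MN$
multiplications in total are required for the computation of $\mathbf{A}\mathbf{D}$, and no summations
are needed.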

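2.1.8 Matrix-Matrix Product $\mathbf{L}\mathbf{D}$
When multiplying a lower triangular matrix $\mathbf{L}$ by a diagonal matrix $\mathbf{D}$, column $n$ of
the matrix product requires $N-n+1$ multiplications and no summations. With $n = 1, \ldots, N$, we get
$\frac{1}{2}N^2 + \frac{1}{2}N$ multiplications.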

2.1.9 Matrix-Matrix Product $\mathbf{L}_1\mathbf{D}$
When multiplying a lower triangular matrix $\mathbf{L}_1$ with ones on the main diagonal by a diagonal
matrix $\mathbf{D}$, column $n$ of the matrix product requires $N-n$ multiplications and no summations.
With $n = 1, \ldots, N$, we get $\frac{1}{2}N^2 - \frac{1}{2}N$ multiplications.

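2.1.10 Matrix-Matrix Product $\mathbf{L}\mathbf{C}$ with $\mathbf{L}$ Lower Triangular
Computing the product of a lower triangular matrix $\mathbf{L} \in \mathbb{C}^{N \times N}$ and
$\mathbf{C} \in \mathbb{C}^{N \times L}$ is done column-wise. The $n$-th element in each column of
$\mathbf{L}\mathbf{C}$ requires $n$ multiplications and $n-1$ summations, so the complete column needs
$\sum_{n=1}^{N} n = \frac{N^2}{2} + \frac{N}{2}$ multiplications and
$\sum_{n=1}^{N} (n-1) = \frac{N^2}{2} - \frac{N}{2}$ summations. The complete matrix-matrix product is
obtained from computing $L$ columns. We have $\frac{N^2 L}{2} + \frac{NL}{2}$ multiplications and
$\frac{N^2 L}{2} - \frac{NL}{2}$ summations, yielding a total amount of $N^2 L$ FLOPs.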

2.1.11 Gram $\mathbf{A}^H\mathbf{A}$ of $\mathbf{A}$
In contrast to the general matrix product from Subsection 2.1.6, we can make use of the Hermitian
structure of the product $\mathbf{A}^H\mathbf{A} \in \mathbb{C}^{N \times N}$. Hence, the strictly lower
triangular part of $\mathbf{A}^H\mathbf{A}$ need not be computed, since it corresponds to the Hermitian
of the strictly upper triangular part. For this reason, we have to compute only the $N$ main diagonal
entries of $\mathbf{A}^H\mathbf{A}$ and the $\frac{N^2 - N}{2}$ upper off-diagonal elements, so only
$\frac{N^2 + N}{2}$ different entries have to be evaluated. Each element requires an inner product step
from Subsection 2.1.3 costing $M$ multiplications and $M-1$ summations. Therefore,
$\frac{1}{2}MN(N+1)$ multiplications and $\frac{1}{2}(M-1)N(N+1)$ summations are needed, making up a
total amount of $MN^2 + MN - \frac{N^2}{2} - \frac{N}{2}$ FLOPs.
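As an illustration of the saving, the sketch below computes only the main diagonal and the upper off-diagonal entries and fills the lower part by conjugation, which costs no FLOP. The helper name `gram_with_counts` and the test matrix are assumptions made for this example only.

```python
def gram_with_counts(A):
    """Gram matrix A^H A of an M x N matrix A (list of rows).

    Only the upper triangle is computed by inner products; the strictly
    lower part is obtained by conjugation and is not counted.
    """
    M, N = len(A), len(A[0])
    mults = adds = 0
    G = [[0j] * N for _ in range(N)]
    for i in range(N):
        for j in range(i, N):
            acc = A[0][i].conjugate() * A[0][j]
            mults += 1
            for m in range(1, M):
                acc += A[m][i].conjugate() * A[m][j]
                mults += 1
                adds += 1
            G[i][j] = acc
            if j != i:
                G[j][i] = acc.conjugate()  # Hermitian symmetry, no FLOP
    return G, mults, adds


# Hypothetical 3 x 4 example matrix.
M, N = 3, 4
A = [[complex(m + 1, n - 1) for n in range(N)] for m in range(M)]
G, mults, adds = gram_with_counts(A)
print(mults, adds, mults + adds)  # 30 20 50
```

With $M = 3$ and $N = 4$, the formulas give $\frac{1}{2}MN(N+1) = 30$ multiplications and $\frac{1}{2}(M-1)N(N+1) = 20$ summations, compared with $2MN^2 - N^2 = 80$ FLOPs for the general product rule of Subsection 2.1.6.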

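2.1.12 Squared Frobenius Norm $\|\mathbf{A}\|_F^2 = \mathrm{tr}(\mathbf{A}^H\mathbf{A})$
The squared Hilbert-Schmidt norm $\|\mathbf{A}\|_F^2$ follows from summing up the $MN$ squared entries
from $\mathbf{A}$. We therefore have $MN$ multiplications and $MN-1$ summations, yielding a total of
$2MN-1$ FLOPs.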

2.1.13 Sesquilinear Form $\mathbf{c}^H\mathbf{A}\mathbf{b}$
The sesquilinear form $\mathbf{c}^H\mathbf{A}\mathbf{b}$ should be evaluated by computing the
matrix-vector product $\mathbf{A}\mathbf{b}$ in a first step and then multiplying with the row vector
$\mathbf{c}^H$ from the left-hand side. The matrix-vector product requires $MN$ multiplications and
$M(N-1)$ summations, whereas the inner product needs $M$ multiplications and $M-1$ summations.
Altogether, $M(N+1)$ multiplications and $MN-1$ summations have to be computed for the sesquilinear
form $\mathbf{c}^H\mathbf{A}\mathbf{b}$, yielding a total number of $2MN+M-1$ FLOPs.

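2.1.14 Hermitian Form $\mathbf{a}^H\mathbf{R}\mathbf{a}$
With the Hermitian matrix $\mathbf{R} = \mathbf{R}^H$, the product $\mathbf{a}^H\mathbf{R}\mathbf{a}$ can
be expressed as

$$
\begin{aligned}
\mathbf{a}^H\mathbf{R}\mathbf{a}
  &= \sum_{m=1}^{N}\sum_{n=1}^{N} \mathbf{a}^H\mathbf{e}_m\mathbf{e}_m^T\mathbf{R}\mathbf{e}_n\mathbf{e}_n^T\mathbf{a} \\
  &= \sum_{m=1}^{N}\sum_{n=1}^{N} a_m^* a_n r_{m,n} \\
  &= \sum_{m=1}^{N} |a_m|^2 r_{m,m} + 2\sum_{m=1}^{N-1}\sum_{n=m+1}^{N} \Re\{a_m^* a_n r_{m,n}\},
\end{aligned}
\tag{2.1}
$$

with $a_m = [\mathbf{a}]_{m,1}$ and $r_{m,n} = [\mathbf{R}]_{m,n}$. The first sum accumulates the weighted
main diagonal entries and requires $2N$ multiplications and $N-1$ summations.¹ The second part of (2.1)
accumulates all weighted off-diagonal entries from $\mathbf{R}$. The last two summations sum up
$\frac{N(N-1)}{2}$ terms.² Consequently, the second part of (2.1) requires $\frac{N(N-1)}{2} - 1$
summations and $N(N-1)$ products.³ Finally, the two parts have to be added, accounting for an additional
summation and yielding an overall amount of $N^2 + N$ products and $\frac{1}{2}N^2 + \frac{1}{2}N - 1$
summations, corresponding to $\frac{3}{2}N^2 + \frac{3}{2}N - 1$ FLOPs.⁴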
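The split in (2.1) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `hermitian_form` and the example data are hypothetical, the scaling by 2 and the real-part operator are treated as free, and only the main diagonal and upper triangle of $\mathbf{R}$ are read.

```python
def hermitian_form(a, R):
    """Evaluate a^H R a via the last line of (2.1).

    Returns the (real) value and the numbers of products and summations.
    """
    N = len(a)
    mults = adds = 0
    # First part: weighted main diagonal entries, two products per term.
    diag = (a[0].conjugate() * a[0]) * R[0][0]
    mults += 2
    for m in range(1, N):
        diag += (a[m].conjugate() * a[m]) * R[m][m]
        mults += 2
        adds += 1
    # Second part: real parts of the weighted upper off-diagonal entries.
    off = None
    for m in range(N - 1):
        for n in range(m + 1, N):
            term = ((a[m].conjugate() * a[n]) * R[m][n]).real
            mults += 2
            if off is None:
                off = term
            else:
                off += term
                adds += 1
    if off is None:          # N = 1: there are no off-diagonal entries
        return diag.real, mults, adds
    adds += 1                # adding the two parts
    return (diag + 2 * off).real, mults, adds


# Hypothetical example data; R is Hermitian by construction.
a = [1 + 2j, 3 - 1j, -2 + 1j]
R = [[2 + 0j, 1 + 1j, -1j],
     [1 - 1j, 3 + 0j, 2 + 2j],
     [1j, 2 - 2j, 5 + 0j]]
value, mults, adds = hermitian_form(a, R)
print(mults, adds)  # 12 5
```

For $N = 3$, the text predicts $N^2 + N = 12$ products and $\frac{1}{2}N^2 + \frac{1}{2}N - 1 = 5$ summations.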

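¹ We do not exploit the fact that only real-valued summands are accumulated as we only account for complex flops.
² $\sum_{m=1}^{N-1}\sum_{n=m+1}^{N} 1 = \sum_{m=1}^{N-1}(N-m) = N(N-1) - \sum_{m=1}^{N-1} m = N(N-1) - \frac{N(N-1)}{2} = \frac{N(N-1)}{2}$. We made use of (A1) in the Appendix for the computation of the last sum accumulating subsequent integers.
³ The scaling with the factor 2 does not require a FLOP, as it can be implemented by a simple bit shift.
⁴ Clearly, if $N = 1$, we have to subtract one summation from the calculation since no off-diagonal entries exist.

2.1.15 Gram $\mathbf{L}^H\mathbf{L}$ of a Lower Triangular Matrix $\mathbf{L}$
During the computation of the inverse of a positive definite matrix, the Gram matrix of a lower
triangular matrix occurs when Cholesky decomposition is applied. Again, we make use of the
Hermitian structure of the Gram $\mathbf{L}^H\mathbf{L}$, so only the main diagonal entries and the upper
right off-diagonal entries of the product have to be evaluated. The $a$-th main-diagonal entry can be
expressed as

$$
[\mathbf{L}^H\mathbf{L}]_{a,a} = \sum_{n=a}^{N} |\ell_{n,a}|^2, \tag{2.2}
$$

with $\ell_{n,a} = [\mathbf{L}]_{n,a}$, requiring $N-a+1$ multiplications and $N-a$ summations. Hence, all
main diagonal elements need $\sum_{n=1}^{N}(N-n+1) = \frac{1}{2}N^2 + \frac{1}{2}N$ multiplications and
$\sum_{n=1}^{N}(N-n) = \frac{1}{2}N^2 - \frac{1}{2}N$ summations.

The upper right off-diagonal entry $[\mathbf{L}^H\mathbf{L}]_{a,b}$ in row $a$ and column $b$ with $a < b$
reads as

$$
[\mathbf{L}^H\mathbf{L}]_{a,b} = \sum_{n=b}^{N} \ell_{n,a}^* \ell_{n,b}, \tag{2.3}
$$

again accounting for $N-b+1$ multiplications and $N-b$ summations. These two expressions have to be
summed up over all $1 \le a \le N-1$ and $a+1 \le b \le N$, and for the number of multiplications, we find

$$
\begin{aligned}
\sum_{a=1}^{N-1}\sum_{b=a+1}^{N}(N-b+1)
  &= \sum_{a=1}^{N-1}\left[(N-a)(N+1) - \sum_{b=a+1}^{N} b\right] \\
  &= \sum_{a=1}^{N-1}\left[N^2 + N - a(N+1) - \frac{N(N+1) - a(a+1)}{2}\right] \\
  &= \sum_{a=1}^{N-1}\left[\frac{N^2+N}{2} + \frac{a^2}{2} - a\left(N + \frac{1}{2}\right)\right] \\
  &= \frac{(N-1)(N+1)N}{2} + \frac{(N-1)N(2N-1)}{2 \cdot 6} - \frac{1}{2}N(N-1)\left(N + \frac{1}{2}\right) \\
  &= \frac{1}{6}N^3 - \frac{1}{6}N.
\end{aligned}
\tag{2.4}
$$

Again, we made use of (A1) for the sum of subsequent integers and (A2) for the sum of subsequent
squared integers. For the number of summations, we evaluate

$$
\sum_{a=1}^{N-1}\sum_{b=a+1}^{N}(N-b) = \frac{1}{6}N^3 - \frac{1}{2}N^2 + \frac{1}{3}N. \tag{2.5}
$$

Computing all necessary elements of the Gram $\mathbf{L}^H\mathbf{L}$ thereby requires
$\frac{1}{6}N^3 + \frac{1}{2}N^2 + \frac{1}{3}N$ multiplications and $\frac{1}{6}N^3 - \frac{1}{6}N$
summations. Altogether, $\frac{1}{3}N^3 + \frac{1}{2}N^2 + \frac{1}{6}N$ FLOPs result. The same result
of course holds for the Gram of two upper triangular matrices.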
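A short sketch of (2.2) and (2.3) with operation tallies is given below. The function name `gram_lower_with_counts` and the example matrix are assumptions for illustration; indices are 0-based, so the sums start at row `b` rather than at row $b$ of the 1-based text.

```python
def gram_lower_with_counts(Lmat):
    """Gram L^H L of a lower triangular matrix, upper triangle only.

    For a <= b the sum runs over rows n >= b, the only rows in which both
    l_{n,a} and l_{n,b} can be nonzero.
    """
    N = len(Lmat)
    mults = adds = 0
    G = [[0j] * N for _ in range(N)]
    for a in range(N):
        for b in range(a, N):
            acc = Lmat[b][a].conjugate() * Lmat[b][b]
            mults += 1
            for n in range(b + 1, N):
                acc += Lmat[n][a].conjugate() * Lmat[n][b]
                mults += 1
                adds += 1
            G[a][b] = acc
            if b != a:
                G[b][a] = acc.conjugate()  # Hermitian symmetry, no FLOP
    return G, mults, adds


# Hypothetical 4 x 4 lower triangular example.
N = 4
Lmat = [[complex(r + 1, c) if c <= r else 0j for c in range(N)] for r in range(N)]
G, mults, adds = gram_lower_with_counts(Lmat)
print(mults, adds, mults + adds)  # 20 10 30
```

For $N = 4$, the closed forms give $\frac{1}{6}N^3 + \frac{1}{2}N^2 + \frac{1}{3}N = 20$ multiplications and $\frac{1}{6}N^3 - \frac{1}{6}N = 10$ summations.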


2.2 Decompositions
2.2.1 Cholesky Decomposition $\mathbf{R} = \mathbf{L}\mathbf{L}^H$ (Gaxpy Version)
Instead of computing the inverse of a positive definite matrix $\mathbf{R}$ directly, it is more
efficient to start with the Cholesky decomposition $\mathbf{R} = \mathbf{L}\mathbf{L}^H$ and then invert
the lower triangular matrix $\mathbf{L}$ and compute its Gram. In this section, we count the number of
FLOPs necessary for the Cholesky decomposition.

The implementation of the Generalized Ax plus y (Gaxpy) version of the Cholesky decomposition, which
overwrites the lower triangular part of the positive definite matrix $\mathbf{R}$, is listed in
Algorithm 2.1, see [1]. Note that $\mathbf{R}$ needs to be positive definite for the
$\mathbf{L}\mathbf{L}^H$ decomposition!

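Algorithm 2.1 Algorithm for the Gaxpy version of the Cholesky decomposition.
 1: $[\mathbf{R}]_{1:N,1} = [\mathbf{R}]_{1:N,1} / \sqrt{[\mathbf{R}]_{1,1}}$   {vector in $\mathbb{C}^{N}$}
 2: for $n = 2$ to $N$ do
 3:   $[\mathbf{R}]_{n:N,n} = [\mathbf{R}]_{n:N,n} - [\mathbf{R}]_{n:N,1:n-1} [\mathbf{R}]^H_{n,1:n-1}$   {dimensions $\mathbb{C}^{N-n+1}$, $\mathbb{C}^{(N-n+1) \times (n-1)}$, $\mathbb{C}^{n-1}$}
 4:   $[\mathbf{R}]_{n:N,n} = [\mathbf{R}]_{n:N,n} / \sqrt{[\mathbf{R}]_{n,n}}$   {vector in $\mathbb{C}^{N-n+1}$}
 5: end for
 6: $\mathbf{L} = \mathrm{tril}(\mathbf{R})$   {lower triangular part of overwritten $\mathbf{R}$}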
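Algorithm 2.1 can be transcribed into a few lines of code. The following is a minimal sketch, assuming a 0-based index convention and a list-of-rows matrix representation; the function name `cholesky_gaxpy` and the test matrix are hypothetical, and the input is copied rather than overwritten.

```python
import math


def cholesky_gaxpy(R):
    """Return lower triangular L with R = L L^H (sketch of Algorithm 2.1).

    R must be Hermitian positive definite; a non-positive pivot raises
    ValueError.
    """
    N = len(R)
    W = [[complex(x) for x in row] for row in R]   # working copy of R
    for n in range(N):
        # Line 3: subtract the matrix-vector product from column n
        # (the loop over k is empty for the first column).
        for r in range(n, N):
            for k in range(n):
                W[r][n] -= W[r][k] * W[n][k].conjugate()
        # Lines 1 and 4: scale the column by the square root of its pivot.
        pivot = W[n][n].real
        if pivot <= 0.0:
            raise ValueError("matrix is not positive definite")
        root = math.sqrt(pivot)
        for r in range(n, N):
            W[r][n] /= root
    # Line 6: keep the lower triangular part only.
    return [[W[r][c] if c <= r else 0j for c in range(N)] for r in range(N)]


# Hypothetical test: build R from a known factor L0 and recover it.
L0 = [[2, 0, 0], [1j, 1, 0], [1 - 1j, 2, 3]]
R = [[sum(complex(L0[i][k]) * complex(L0[j][k]).conjugate() for k in range(3))
      for j in range(3)] for i in range(3)]
Lc = cholesky_gaxpy(R)
```

Since the Cholesky factor with positive main diagonal is unique, `Lc` should agree with `L0` up to rounding.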

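The computation of the first column of $\mathbf{L}$ in Line 1 of Algorithm 2.1 requires $N-1$
multiplications⁵, a single square-root operation, and no summations. Column $n > 1$ takes a
matrix-vector product of dimension $(N-n+1) \times (n-1)$ which is subtracted from another
$(N-n+1)$-dimensional vector involving $N-n+1$ summations, see Line 3. Finally, $N-n$ multiplications⁶
and a single square-root operation are necessary in Line 4. In short, column $n$ with $1 < n \le N$ needs
$-n^2 + n(N+1) - 1$ multiplications, $-n^2 + n(N+2) - N - 1$ summations (see Subsection 2.1.5), and one
square-root operation, which we classify as an additional FLOP. Summing up the multiplications for
columns $2 \le n \le N$, we obtain

$$
\begin{aligned}
\sum_{n=2}^{N}\left(-n^2 + n(N+1) - 1\right)
  &= (N+1)\frac{N(N+1) - 2}{2} - \frac{N(N+1)(2N+1) - 6}{6} - (N-1) \\
  &= \frac{N^3 + 2N^2 - N - 2}{2} - \frac{2N^3 + 3N^2 + N - 6}{6} - (N-1) \\
  &= \frac{1}{6}N^3 + \frac{1}{2}N^2 - \frac{5}{3}N + 1.
\end{aligned}
\tag{2.6}
$$

The number of summations for columns $2 \le n \le N$ reads as

$$
\begin{aligned}
\sum_{n=2}^{N}\left(-n^2 + n(N+2) - N - 1\right)
  &= -(N+1)(N-1) + (N+2)\frac{N(N+1) - 2}{2} - \frac{N(N+1)(2N+1) - 6}{6} \\
  &= -N^2 + 1 + \frac{N^3 + 3N^2 - 4}{2} - \frac{2N^3 + 3N^2 + N - 6}{6} \\
  &= \frac{1}{6}N^3 - \frac{1}{6}N,
\end{aligned}
\tag{2.7}
$$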
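and finally, $N-1$ square-root operations are needed for the $N-1$ columns. Including the $N-1$
multiplications for column $n = 1$ and the additional square-root operation,
$\frac{1}{6}N^3 + \frac{1}{2}N^2 - \frac{2}{3}N$ multiplications, $\frac{1}{6}N^3 - \frac{1}{6}N$
summations, and $N$ square-root operations occur, $\frac{1}{3}N^3 + \frac{1}{2}N^2 + \frac{1}{6}N$
FLOPs in total.

⁵ The first element need not be computed twice, since the result of the division is the square root of the denominator.
⁶ Again, the first element need not be computed twice, since the result of the division is the square root of the denominator.

Algorithm 2.2 Algorithm for the Cholesky decomposition $\mathbf{L}_1\mathbf{D}\mathbf{L}_1^H$.
 1: $[\mathbf{R}]_{2:N,1} = [\mathbf{R}]_{2:N,1} / [\mathbf{R}]_{1,1}$   {vector in $\mathbb{C}^{N-1}$}
 2: for $n = 2$ to $N$ do
 3:   for $i = 1$ to $n-1$ do
 4:     $[\mathbf{v}]_i = \begin{cases} [\mathbf{R}]_{1,n} & \text{if } i = 1 \\ [\mathbf{R}]_{i,i} [\mathbf{R}]^*_{n,i} & \text{if } i \neq 1 \end{cases}$
 5:   end for
 6:   $[\mathbf{v}]_n = [\mathbf{R}]_{n,n} - [\mathbf{R}]_{n,1:n-1} [\mathbf{v}]_{1:n-1}$   {dimensions $\mathbb{C}^{1 \times (n-1)}$, $\mathbb{C}^{n-1}$}
 7:   $[\mathbf{R}]_{n,n} = [\mathbf{v}]_n$
 8:   $[\mathbf{R}]_{n+1:N,n} = \left([\mathbf{R}]_{n+1:N,n} - [\mathbf{R}]_{n+1:N,1:n-1} [\mathbf{v}]_{1:n-1}\right) / [\mathbf{v}]_n$   {dimensions $\mathbb{C}^{N-n}$, $\mathbb{C}^{(N-n) \times (n-1)}$, $\mathbb{C}^{n-1}$}
 9: end for
10: $\mathbf{D} = \mathrm{diag}(\mathrm{diag}(\mathbf{R}))$   {return diagonal $\mathbf{D}$}
11: $\mathbf{L}_1 = \mathrm{tril}(\mathbf{R})$ with ones on the main diagonal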

2.2.2 Cholesky Decomposition $\mathbf{R} = \mathbf{L}_1\mathbf{D}\mathbf{L}_1^H$
The main advantage of the $\mathbf{L}_1\mathbf{D}\mathbf{L}_1^H$ decomposition compared to the standard
$\mathbf{L}\mathbf{L}^H$ decomposition is that no square-root operations are needed, which may require
more than one FLOP depending on the given hardware platform. Another benefit of the
$\mathbf{L}_1\mathbf{D}\mathbf{L}_1^H$ decomposition is that it does not require a positive definite
matrix $\mathbf{R}$; the only two conditions for the unique existence are that $\mathbf{R}$ is Hermitian
and that all but the last principal minor (i.e., the determinant) of $\mathbf{R}$ are different from
zero [2]. Hence, $\mathbf{R}$ may also be rank deficient to a certain degree. If $\mathbf{R}$ is not
positive semidefinite, then $\mathbf{D}$ may contain negative main diagonal entries.

The outcome of the decomposition is a lower triangular matrix $\mathbf{L}_1$ with ones on the main
diagonal and a diagonal matrix $\mathbf{D}$.

Algorithm 2.2 overwrites the strictly lower left part of the matrix $\mathbf{R}$ with the strictly lower
part of $\mathbf{L}_1$ (i.e., without the ones on the main diagonal) and overwrites the main diagonal of
$\mathbf{R}$ with the main diagonal of $\mathbf{D}$. It is taken from [1] and slightly modified, such
that it is also applicable to complex matrices (see the conjugate in Line 4) and no existing scalar has
to be re-computed (see the case distinction in Line 4 for $i = 1$).
Line 1 needs $N-1$ multiplications. Lines 3 to 5 require $n-2$ multiplications and are executed for
$n = 2, \ldots, N$, yielding $\sum_{n=2}^{N}(n-2) = \frac{N^2 - 3N + 2}{2}$ multiplications. Line 6
takes $n-1$ multiplications and $n-1$ summations, again with $n = 2, \ldots, N$, yielding
$\sum_{n=2}^{N}(n-1) = \frac{N^2 - N}{2}$ multiplications and the same amount of summations. Line 7 does
not require any FLOP. In Line 8, the matrix-vector product needs $(N-n)(n-1)$ multiplications, and
additional $N-n$ multiplications arise when the complete numerator is divided by the denominator.
Hence, we have $Nn - n^2$ multiplications. For $n = 2, \ldots, N$, we get
$\sum_{n=2}^{N}(Nn - n^2) = \frac{1}{6}N^3 - \frac{7}{6}N + 1$ multiplications. The number of summations
in Line 8 is $(N-n)(n-2)$ for the matrix-vector product and $N-n$ for the subtraction in the numerator.
Together, we have $-n^2 + n(N+1) - N$ summations. With $n = 2, \ldots, N$, we get
$\sum_{n=2}^{N}\left[-n^2 + n(N+1) - N\right] = \frac{1}{6}N^3 - \frac{1}{2}N^2 + \frac{1}{3}N$
summations.

Summing up, this algorithm requires $\frac{1}{6}N^3 + N^2 - \frac{13}{6}N + 1$ multiplications and
$\frac{1}{6}N^3 - \frac{1}{6}N$ summations, yielding a total amount of
$\frac{1}{3}N^3 + N^2 - \frac{7}{3}N + 1$ FLOPs. (Note that this formula is also valid for $N = 1$.)
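The line-by-line counts for Algorithm 2.2 can be reproduced with an instrumented transcription. This is a sketch under stated assumptions: the name `ldl_with_counts`, the 0-based indexing, and the example matrix are all introduced here for illustration, and divisions are tallied as multiplications as in the text.

```python
def ldl_with_counts(R):
    """Sketch of Algorithm 2.2: R = L1 D L1^H for Hermitian R.

    Returns (L1, d, mults, adds) with L1 unit lower triangular and d the
    main diagonal of D.
    """
    N = len(R)
    W = [[complex(x) for x in row] for row in R]    # working copy of R
    mults = adds = 0
    for r in range(1, N):                           # Line 1
        W[r][0] /= W[0][0]
        mults += 1
    for n in range(1, N):                           # Line 2 (0-based column n)
        v = [0j] * (n + 1)
        v[0] = W[0][n]                              # Line 4, i = 1: reuse, no FLOP
        for i in range(1, n):                       # Line 4, i != 1
            v[i] = W[i][i] * W[n][i].conjugate()
            mults += 1
        acc = W[n][0] * v[0]                        # Line 6
        mults += 1
        for i in range(1, n):
            acc += W[n][i] * v[i]
            mults += 1
            adds += 1
        v[n] = W[n][n] - acc
        adds += 1
        W[n][n] = v[n]                              # Line 7
        for r in range(n + 1, N):                   # Line 8
            acc = W[r][0] * v[0]
            mults += 1
            for i in range(1, n):
                acc += W[r][i] * v[i]
                mults += 1
                adds += 1
            W[r][n] = (W[r][n] - acc) / v[n]
            adds += 1
            mults += 1
    L1 = [[W[r][c] if c < r else (1 + 0j if c == r else 0j) for c in range(N)]
          for r in range(N)]
    d = [W[n][n] for n in range(N)]
    return L1, d, mults, adds


# Hypothetical indefinite example: R built from known factors, D has a negative entry.
L1_ref = [[1, 0, 0], [2j, 1, 0], [1 - 1j, 3, 1]]
d_ref = [2, -1, 4]
R = [[sum(complex(L1_ref[i][k]) * d_ref[k] * complex(L1_ref[j][k]).conjugate()
          for k in range(3)) for j in range(3)] for i in range(3)]
L1, d, mults, adds = ldl_with_counts(R)
print(mults, adds, mults + adds)  # 8 4 12
```

For $N = 3$, the formulas above give $\frac{1}{6}N^3 + N^2 - \frac{13}{6}N + 1 = 8$ multiplications and $\frac{1}{6}N^3 - \frac{1}{6}N = 4$ summations. The example matrix is not positive definite, which illustrates that the decomposition does not need definiteness.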


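2.3 Inverses of Matrices

2.3.1 Inverse $\mathbf{L}^{-1}$ of a Lower Triangular Matrix $\mathbf{L}$

Let $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N] = \mathbf{L}^{-1}$ denote the inverse of a lower
triangular matrix $\mathbf{L}$. Then, $\mathbf{X}$ is again lower triangular, which means that
$[\mathbf{X}]_{b,n} = 0$ for $b < n$. The following equation holds:

$$
\mathbf{L}\mathbf{x}_n = \mathbf{e}_n. \tag{2.8}
$$

Via forward substitution, the above system can easily be solved. Row $b$ ($n \le b \le N$) from (2.8)
can be expressed as

$$
\sum_{a=n}^{b} \ell_{b,a} x_{a,n} = \delta_{b,n}, \tag{2.9}
$$

with $\delta_{b,n}$ denoting the Kronecker delta, which vanishes for $b \neq n$, and
$x_{a,n} = [\mathbf{X}]_{a,n} = [\mathbf{x}_n]_{a,1}$. Starting from $b = n$, the $x_{b,n}$ are computed
successively, and we find

$$
x_{b,n} = \frac{1}{\ell_{b,b}}\left[-\sum_{a=n}^{b-1} \ell_{b,a} x_{a,n} + \delta_{b,n}\right], \tag{2.10}
$$

with all $x_{a,n}$, $n \le a \le b-1$, having been computed in previous steps. Hence, if $n = b$,
$x_{n,n} = \frac{1}{\ell_{n,n}}$ and a single multiplication⁷ is required; no summations are needed. For
$b > n$, $b-n+1$ multiplications and $b-n-1$ summations are required, as the Kronecker delta vanishes.
All main diagonal entries can be computed by means of $N$ multiplications. The lower left off-diagonal
entries require

$$
\begin{aligned}
\sum_{n=1}^{N-1}\sum_{b=n+1}^{N}(b-n+1)
  &= \sum_{n=1}^{N-1}\left[(1-n)(N-n) + \sum_{b=n+1}^{N} b\right] \\
  &= \sum_{n=1}^{N-1}\left[N + n^2 - n(N+1) + \frac{N^2 + N - n^2 - n}{2}\right] \\
  &= \sum_{n=1}^{N-1}\left[\frac{N^2}{2} + \frac{3N}{2} + \frac{n^2}{2} - n\left(N + \frac{3}{2}\right)\right] \\
  &= (N-1)\frac{N}{2}(N+3) + \frac{(N-1)N(2N-1)}{2 \cdot 6} - \frac{(N-1)N}{2}\left(N + \frac{3}{2}\right) \\
  &= \frac{1}{6}N^3 + \frac{1}{2}N^2 - \frac{2}{3}N
\end{aligned}
\tag{2.11}
$$

multiplications, and

$$
\sum_{n=1}^{N-1}\sum_{b=n+1}^{N}(b-n-1) = \frac{1}{6}N^3 - \frac{1}{2}N^2 + \frac{1}{3}N \tag{2.12}
$$

summations. Including the $N$ multiplications for the main-diagonal entries,
$\frac{1}{6}N^3 + \frac{1}{2}N^2 + \frac{1}{3}N$ multiplications and
$\frac{1}{6}N^3 - \frac{1}{2}N^2 + \frac{1}{3}N$ summations have to be implemented, yielding a total
amount of $\frac{1}{3}N^3 + \frac{2}{3}N$ FLOPs.

⁷ Actually, it is a division rather than a multiplication.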
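The forward substitution in (2.10) can be sketched column by column. The function name `invert_lower_with_counts` and the example matrix are assumptions for illustration; divisions are tallied as multiplications and the sign change is treated as free.

```python
def invert_lower_with_counts(Lmat):
    """Inverse X of a lower triangular matrix via (2.10), with FLOP tallies."""
    N = len(Lmat)
    X = [[0j] * N for _ in range(N)]
    mults = adds = 0
    for n in range(N):                       # column n of X
        X[n][n] = 1 / Lmat[n][n]             # b = n: one division
        mults += 1
        for b in range(n + 1, N):            # b > n: Kronecker delta vanishes
            acc = Lmat[b][n] * X[n][n]
            mults += 1
            for a in range(n + 1, b):
                acc += Lmat[b][a] * X[a][n]
                mults += 1
                adds += 1
            X[b][n] = -acc / Lmat[b][b]
            mults += 1
    return X, mults, adds


# Hypothetical 4 x 4 example with nonzero main diagonal.
raw = [[2, 0, 0, 0],
       [1j, 4, 0, 0],
       [1 - 1j, 2, 1, 0],
       [3, -1j, 2 + 1j, 2]]
Lmat = [[complex(x) for x in row] for row in raw]
X, mults, adds = invert_lower_with_counts(Lmat)
print(mults, adds, mults + adds)  # 20 4 24
```

For $N = 4$, this matches $\frac{1}{6}N^3 + \frac{1}{2}N^2 + \frac{1}{3}N = 20$ multiplications, $\frac{1}{6}N^3 - \frac{1}{2}N^2 + \frac{1}{3}N = 4$ summations, and $\frac{1}{3}N^3 + \frac{2}{3}N = 24$ FLOPs.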

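2.3.2 Inverse $\mathbf{L}_1^{-1}$ of a Lower Triangular Matrix $\mathbf{L}_1$ with Ones on the Main Diagonal
The inverse of a lower triangular matrix $\mathbf{L}_1$ turns out to require $N^2$ FLOPs less than the
inverse of $\mathbf{L}$ with arbitrary nonzero diagonal elements. Let $\mathbf{X}$ denote the inverse of
$\mathbf{L}_1$. Clearly, $\mathbf{X}$ is again a lower triangular matrix with ones on the main diagonal.
We can exploit this fact in order to compute only the unknown entries.

The $m$-th row and $n$-th column of the system of equations $\mathbf{L}_1\mathbf{X} = \mathbf{I}_N$ with
$m \ge n+1$ reads as⁸

$$
l_{m,n} + \sum_{\substack{i=n+1 \\ i \le m-1}}^{m-1} l_{m,i} x_{i,n} + x_{m,n} = 0,
$$

or, equivalently,

$$
x_{m,n} = -\left[l_{m,n} + \sum_{\substack{i=n+1 \\ i \le m-1}}^{m-1} l_{m,i} x_{i,n}\right],
$$

where $l_{m,n} = [\mathbf{L}_1]_{m,n}$ and $x_{m,n} = [\mathbf{X}]_{m,n}$. Hence, $\mathbf{X}$ is
computed via forward substitution. To compute $x_{m,n}$, we need $m-n-1$ multiplications and $m-n-1$
summations. Remember that $m \ge n+1$. The total number of multiplications/summations is obtained from

$$
\sum_{n=1}^{N-1}\sum_{m=n+1}^{N}(m-n-1) = \frac{1}{6}N^3 - \frac{1}{2}N^2 + \frac{1}{3}N. \tag{2.13}
$$

Summing up, $\frac{1}{3}N^3 - N^2 + \frac{2}{3}N$ FLOPs are needed.

⁸ We only have to consider $m \ge n+1$, since the equations resulting from $m < n+1$ are automatically fulfilled due to the structure of $\mathbf{L}_1$ and $\mathbf{X}$.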
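A minimal sketch of this forward substitution for the unit-diagonal case follows. The function name `invert_unit_lower_with_counts` and the example matrix are hypothetical; the main diagonal of the input is never read, and the final negation is not counted as a FLOP.

```python
def invert_unit_lower_with_counts(L1):
    """Inverse of a lower triangular matrix with ones on the main diagonal."""
    N = len(L1)
    X = [[(1 + 0j) if r == c else 0j for c in range(N)] for r in range(N)]
    mults = adds = 0
    for n in range(N):
        for m in range(n + 1, N):
            acc = L1[m][n]
            for i in range(n + 1, m):        # empty when m = n + 1
                acc += L1[m][i] * X[i][n]
                mults += 1
                adds += 1
            X[m][n] = -acc
    return X, mults, adds


# Hypothetical 4 x 4 example with unit main diagonal.
raw = [[1, 0, 0, 0],
       [2j, 1, 0, 0],
       [1 - 1j, 3, 1, 0],
       [-2, 1j, 1 + 2j, 1]]
L1 = [[complex(x) for x in row] for row in raw]
X, mults, adds = invert_unit_lower_with_counts(L1)
print(mults, adds, mults + adds)  # 4 4 8
```

For $N = 4$, (2.13) gives 4 multiplications and 4 summations, i.e., $\frac{1}{3}N^3 - N^2 + \frac{2}{3}N = 8$ FLOPs, which is $N^2 = 16$ fewer than the 24 FLOPs of Subsection 2.3.1.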

2.3.3 Inverse $\mathbf{R}^{-1}$ of a Positive Definite Matrix $\mathbf{R}$
The inverse of a matrix can for example be computed via Gaussian elimination [1]. However, this
approach is computationally expensive and does not exploit the Hermitian structure of $\mathbf{R}$.
Instead, it is more efficient to start with the Cholesky decomposition of
$\mathbf{R} = \mathbf{L}\mathbf{L}^H$ (see Subsection 2.2.1), invert the lower triangular matrix
$\mathbf{L}$ (see Subsection 2.3.1), and then build the Gram $\mathbf{L}^{-H}\mathbf{L}^{-1}$ of
$\mathbf{L}^{-1}$ (see Subsection 2.1.15). Summing up the respective number of operations, this
procedure requires $\frac{1}{2}N^3 + \frac{3}{2}N^2$ multiplications,
$\frac{1}{2}N^3 - \frac{1}{2}N^2$ summations, and $N$ square-root operations, which yields a total
amount of $N^3 + N^2 + N$ FLOPs.
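The three-step procedure can be sketched end to end. The code below is an illustrative sketch, not the report's implementation: all function names and the test matrix are assumptions, matrices are lists of rows, and no FLOP tallies are kept.

```python
import math


def cholesky(R):
    """Lower triangular L with R = L L^H (R Hermitian positive definite)."""
    N = len(R)
    W = [[complex(x) for x in row] for row in R]
    for n in range(N):
        for r in range(n, N):
            for k in range(n):
                W[r][n] -= W[r][k] * W[n][k].conjugate()
        root = math.sqrt(W[n][n].real)
        for r in range(n, N):
            W[r][n] /= root
    return [[W[r][c] if c <= r else 0j for c in range(N)] for r in range(N)]


def invert_lower(Lmat):
    """Inverse of a lower triangular matrix by forward substitution."""
    N = len(Lmat)
    X = [[0j] * N for _ in range(N)]
    for n in range(N):
        X[n][n] = 1 / Lmat[n][n]
        for b in range(n + 1, N):
            acc = sum(Lmat[b][a] * X[a][n] for a in range(n, b))
            X[b][n] = -acc / Lmat[b][b]
    return X


def gram_lower(X):
    """Gram X^H X of a lower triangular X, upper triangle plus conjugation."""
    N = len(X)
    G = [[0j] * N for _ in range(N)]
    for a in range(N):
        for b in range(a, N):
            G[a][b] = sum(X[n][a].conjugate() * X[n][b] for n in range(b, N))
            G[b][a] = G[a][b].conjugate()
    return G


def inverse_posdef(R):
    """R^{-1} = L^{-H} L^{-1}, i.e. the Gram of X = L^{-1}."""
    return gram_lower(invert_lower(cholesky(R)))


# Hypothetical Hermitian positive definite test matrix.
R = [[4, -2j, 2 + 2j],
     [2j, 2, 1 + 1j],
     [2 - 2j, 1 - 1j, 15]]
Rinv = inverse_posdef(R)
```

The product of `R` and `Rinv` should equal the identity matrix up to rounding errors.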


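2.4 Solving Systems of Equations
2.4.1 Product $\mathbf{L}^{-1}\mathbf{C}$ with $\mathbf{L}^{-1}$ not known a priori
A naive way of computing the solution $\mathbf{X} = \mathbf{L}^{-1}\mathbf{C}$ of the equation
$\mathbf{L}\mathbf{X} = \mathbf{C}$ is to find $\mathbf{L}^{-1}$ first and afterwards multiply it by
$\mathbf{C}$. This approach needs $N^2\left(L + \frac{1}{3}N\right) + \frac{2}{3}N$ FLOPs as shown in
Sections 2.3.1 and 2.1.10. However, doing so is very expensive since we are not interested in the
inverse of $\mathbf{L}$ in general. Hence, there must be a computationally cheaper variant. Again,
forward substitution plays a key role.

It is easy to see that $\mathbf{X}$ can be computed column-wise. Let $x_{b,a} = [\mathbf{X}]_{b,a}$,
$\ell_{b,a} = [\mathbf{L}]_{b,a}$, and $c_{b,a} = [\mathbf{C}]_{b,a}$. Then, from
$\mathbf{L}\mathbf{X} = \mathbf{C}$, we get for the element $x_{b,a}$ in row $b$ and column $a$ of
$\mathbf{X}$:

$$
x_{b,a} = \frac{1}{\ell_{b,b}}\left[-\sum_{i=1}^{b-1} \ell_{b,i} x_{i,a} + c_{b,a}\right]. \tag{2.14}
$$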
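Equation (2.14) translates directly into a forward-substitution routine. The following sketch is illustrative only: the function name `solve_lower_with_counts` and the example data are assumptions, each subtraction is tallied as one summation, and the final division as one multiplication.

```python
def solve_lower_with_counts(Lmat, C):
    """Solve L X = C column-wise via (2.14) without forming the inverse of L."""
    N, ncols = len(Lmat), len(C[0])
    X = [[0j] * ncols for _ in range(N)]
    mults = adds = 0
    for a in range(ncols):                   # column a of X
        for b in range(N):                   # row b, from top to bottom
            acc = C[b][a]
            for i in range(b):
                acc -= Lmat[b][i] * X[i][a]
                mults += 1
                adds += 1
            X[b][a] = acc / Lmat[b][b]
            mults += 1
    return X, mults, adds


# Hypothetical example with N = 3 and two right-hand-side columns.
Lmat = [[complex(x) for x in row]
        for row in [[2, 0, 0], [1j, 1, 0], [1 - 1j, 2, 4]]]
C = [[complex(x) for x in row]
     for row in [[2, 4j], [1 + 1j, -3], [5, 2 - 2j]]]
X, mults, adds = solve_lower_with_counts(Lmat, C)
print(mults, adds, mults + adds)  # 12 6 18
```

With $N = 3$ and two columns, the tallies agree with $N^2 L = 18$ FLOPs, whereas inverting $\mathbf{L}$ first and multiplying afterwards would cost $\frac{1}{3}N^3 + \frac{2}{3}N = 11$ FLOPs more.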
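The computation of $x_{b,a}$ requires $b$ multiplications and $b-1$ summations. A complete column of
$\mathbf{X}$ can therefore be computed with $\sum_{b=1}^{N} b = \frac{N^2}{2} + \frac{N}{2}$
multiplications and $\sum_{b=1}^{N}(b-1) = \frac{N^2}{2} - \frac{N}{2}$ summations. The complete matrix
$\mathbf{X}$ with $L$ columns thus needs $N^2 L$ FLOPs, so the forward substitution saves
$\frac{1}{3}N^3 + \frac{2}{3}N$ FLOPs compared to the direct inversion of $\mathbf{L}$ and a subsequent
matrix-matrix product. Interestingly, computing $\mathbf{L}^{-1}\mathbf{C}$ with $\mathbf{L}^{-1}$
unknown is as expensive as computing $\mathbf{L}\mathbf{C}$, see Section 2.1.10.


3. Overview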


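$\mathbf{A} \in \mathbb{C}^{M \times N}$, $\mathbf{B} \in \mathbb{C}^{N \times N}$, and
$\mathbf{C} \in \mathbb{C}^{N \times L}$ are arbitrary matrices. $\mathbf{D} \in \mathbb{C}^{N \times N}$
is a diagonal matrix, $\mathbf{L} \in \mathbb{C}^{N \times N}$ is lower triangular,
$\mathbf{L}_1 \in \mathbb{C}^{N \times N}$ is lower triangular with ones on the main diagonal,
$\mathbf{a}, \mathbf{b} \in \mathbb{C}^N$, $\mathbf{c} \in \mathbb{C}^M$, and
$\mathbf{R} \in \mathbb{C}^{N \times N}$ is positive definite.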


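Expression                               Description                                      products                    summations                  FLOPs
$\alpha\mathbf{a}$                       Vector Scaling                                   $N$                         0                           $N$
$\alpha\mathbf{A}$                       Matrix Scaling                                   $MN$                        0                           $MN$
$\mathbf{a}^H\mathbf{b}$                 Inner Product                                    $N$                         $N-1$                       $2N-1$
$\mathbf{a}\mathbf{c}^H$                 Outer Product                                    $MN$                        0                           $MN$
$\mathbf{A}\mathbf{b}$                   Matrix Vector Prod.                              $MN$                        $M(N-1)$                    $2MN-M$
$\mathbf{A}\mathbf{C}$                   Matrix Matrix Prod.                              $MNL$                       $ML(N-1)$                   $2MNL-ML$
$\mathbf{A}\mathbf{D}$                   Diagonal Matrix Prod.                            $MN$                        0                           $MN$
$\mathbf{L}\mathbf{D}$                   Matrix-Matrix Prod.                              $N^2/2 + N/2$               0                           $N^2/2 + N/2$
$\mathbf{L}_1\mathbf{D}$                 Matrix-Matrix Prod.                              $N^2/2 - N/2$               0                           $N^2/2 - N/2$
$\mathbf{L}\mathbf{C}$                   Matrix Product                                   $N^2L/2 + NL/2$             $N^2L/2 - NL/2$             $N^2L$
$\mathbf{A}^H\mathbf{A}$                 Gram                                             $MN(N+1)/2$                 $(M-1)N(N+1)/2$             $MN^2 + N(M - N/2) - N/2$
$\|\mathbf{A}\|_F^2$                     Frobenius Norm                                   $MN$                        $MN-1$                      $2MN-1$
$\mathbf{c}^H\mathbf{A}\mathbf{b}$       Sesquilinear Form                                $M(N+1)$                    $MN-1$                      $2MN+M-1$
$\mathbf{a}^H\mathbf{R}\mathbf{a}$       Hermitian Form                                   $N^2 + N$                   $N^2/2 + N/2 - 1$           $3N^2/2 + 3N/2 - 1$
$\mathbf{L}^H\mathbf{L}$                 Gram of Triangular                               $N^3/6 + N^2/2 + N/3$       $N^3/6 - N/6$               $N^3/3 + N^2/2 + N/6$
$\mathbf{L}$                             Cholesky $\mathbf{R} = \mathbf{L}\mathbf{L}^H$ (Gaxpy version)   $N^3/6 + N^2/2 - 2N/3$      $N^3/6 - N/6$               $N^3/3 + N^2/2 + N/6$ ($N$ roots included)
$\mathbf{L}_1, \mathbf{D}$               Cholesky $\mathbf{R} = \mathbf{L}_1\mathbf{D}\mathbf{L}_1^H$     $N^3/6 + N^2 - 13N/6 + 1$   $N^3/6 - N/6$               $N^3/3 + N^2 - 7N/3 + 1$
$\mathbf{L}^{-1}$                        Inverse of Triangular                            $N^3/6 + N^2/2 + N/3$       $N^3/6 - N^2/2 + N/3$       $N^3/3 + 2N/3$
$\mathbf{L}_1^{-1}$                      Inverse of Triangular with ones on main diag.    $N^3/6 - N^2/2 + N/3$       $N^3/6 - N^2/2 + N/3$       $N^3/3 - N^2 + 2N/3$
$\mathbf{R}^{-1}$                        Inverse of Pos. Definite                         $N^3/2 + 3N^2/2$            $N^3/2 - N^2/2$             $N^3 + N^2 + N$ ($N$ roots included)
$\mathbf{L}^{-1}\mathbf{C}$              $\mathbf{L}^{-1}$ unknown                        $N^2L/2 + NL/2$             $N^2L/2 - NL/2$             $N^2L$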


Appendix


A frequently occurring summation in FLOP counting is the sum of subsequent integers. By complete
induction, we find

$$
\sum_{n=1}^{N} n = \frac{N(N+1)}{2}. \tag{A1}
$$

The above result can easily be verified by recognizing that the sum of the $n$-th and the
$(N-n+1)$-th summand is equal to $N+1$, and we have $\frac{N}{2}$ such pairs.

Another sum of relevance is the sum of subsequent squared integers. Again, via complete induction, we
find

$$
\sum_{n=1}^{N} n^2 = \frac{N(N+1)(2N+1)}{6}. \tag{A2}
$$


Bibliography


[1] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, 1991.
[2] Kh. D. Ikramov and N. V. Savel'eva, "Conditionally Definite Matrices," Journal of Mathematical
    Sciences, vol. 98, no. 1, pp. 1–50, 2000.