Floating Point Operations in Matrix-Vector Calculus
(Version 1.3)
Raphael Hunger
Technical Report
2007
Technische Universität München
Associate Institute for Signal Processing
Univ.-Prof. Dr.-Ing. Wolfgang Utschick

History
Version 1.00: October 2005
- Initial version
Version 1.01: 2006
- Rewrite of sesquilinear form with a reduced amount of FLOPs
- Several typos fixed concerning the number of FLOPs required for the Cholesky decomposition
Version 1.2: November 2006
- Conditions for the existence of the standard $LL^H$ Cholesky decomposition specified (positive definiteness)
- Outer product version of $LL^H$ Cholesky decomposition removed
- FLOPs required in Gaxpy version of $LL^H$ Cholesky decomposition updated
- $L_1 D L_1^H$ Cholesky decomposition added
- Matrix-matrix product $LC$ added with $L$ triangular
- Matrix-matrix product $L^{-1}C$ added with $L$ triangular and $L^{-1}$ not known a priori
- Inverse $L_1^{-1}$ of a lower triangular matrix with ones on the main diagonal added
Version 1.3: September 2007
- First globally accessible document version
ToDo: (unknown when)
- QR-Decomposition
- LR-Decomposition
Please report any bug and suggestion to hunger@tum.de
Contents
1. Introduction
2. Flop Counting
2.1 Matrix Products
2.1.1 Scalar-Vector Multiplication $\alpha a$
2.1.2 Scalar-Matrix Multiplication $\alpha A$
2.1.3 Inner Product $a^H b$ of Two Vectors
2.1.4 Outer Product $ac^H$ of Two Vectors
2.1.5 Matrix-Vector Product $Ab$
2.1.6 Matrix-Matrix Product $AC$
2.1.7 Matrix Diagonal Matrix Product $AD$
2.1.8 Matrix-Matrix Product $LD$
2.1.9 Matrix-Matrix Product $L_1 D$
2.1.10 Matrix-Matrix Product $LC$ with $L$ Lower Triangular
2.1.11 Gram $A^H A$ of $A$
2.1.12 Squared Frobenius Norm $\lVert A\rVert_F^2 = \mathrm{tr}(A^H A)$
2.1.13 Sesquilinear Form $c^H A b$
2.1.14 Hermitian Form $a^H R a$
2.1.15 Gram $L^H L$ of a Lower Triangular Matrix $L$
2.2 Decompositions
2.2.1 Cholesky Decomposition $R = LL^H$ (Gaxpy Version)
2.2.2 Cholesky Decomposition $R = L_1 D L_1^H$
2.3 Inverses of Matrices
2.3.1 Inverse $L^{-1}$ of a Lower Triangular Matrix $L$
2.3.2 Inverse $L_1^{-1}$ of a Lower Triangular Matrix $L_1$ with Ones on the Main Diagonal
2.3.3 Inverse $R^{-1}$ of a Positive Definite Matrix $R$
2.4 Solving Systems of Equations
2.4.1 Product $L^{-1}C$ with $L^{-1}$ not known a priori
3. Overview
Appendix
Bibliography
1. Introduction
For the design of efficient and low-complexity algorithms in many signal-processing tasks, a detailed analysis of the required number of floating-point operations (FLOPs) is often indispensable. Most frequently, matrix operations are involved, such as matrix-matrix products and inverses of matrices. Structures such as Hermitian symmetry or triangularity can be exploited to reduce the number of required FLOPs and are discussed here. In this technical report, we derive expressions for the number of multiplications and summations required by the matrix operations that arise in the majority of signal processing algorithms in mobile communications.
Acknowledgments:
The author would like to thank Dipl.-Ing. David A. Schmidt and Dipl.-Ing. Guido Dietl for the fruitful discussions on this topic.
2. Flop Counting
In this chapter, we offer expressions for the number of complex multiplications and summations required for several matrix-vector operations. A floating-point operation (FLOP) is assumed here to be either a complex multiplication or a complex summation, despite the fact that a complex multiplication requires 4 real multiplications and 2 real summations, whereas a complex summation consists of only 2 real summations, making a multiplication more expensive than a summation. Nevertheless, we count each of them as one FLOP.
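Throughout this report, we assume $\alpha \in \mathbb{C}$ to be a scalar and the vectors $a \in \mathbb{C}^N$, $b \in \mathbb{C}^N$, and $c \in \mathbb{C}^M$ to have dimension $N$, $N$, and $M$, respectively. The matrices $A \in \mathbb{C}^{M\times N}$, $B \in \mathbb{C}^{N\times N}$, and $C \in \mathbb{C}^{N\times L}$ are assumed to have no special structure, whereas $R = R^H \in \mathbb{C}^{N\times N}$ is Hermitian and $D = \mathrm{diag}\{d_\ell\}_{\ell=1}^{N} \in \mathbb{C}^{N\times N}$ is diagonal. $L$ is a lower triangular $N\times N$ matrix, and $e_n$ denotes the unit vector with a 1 in the $n$-th row and zeros elsewhere; its dimensionality is chosen such that the respective matrix-vector product exists. Finally, $[A]_{a,b}$ denotes the element in the $a$-th row and $b$-th column of $A$, and $[A]_{a:b,c:d}$ selects the submatrix of $A$ consisting of rows $a$ to $b$ and columns $c$ to $d$. $0_{a\times b}$ is the $a\times b$ zero matrix. Transposition, Hermitian transposition, conjugate, and real-part operator are denoted by $(\cdot)^T$, $(\cdot)^H$, $(\cdot)^*$, and $\Re\{\cdot\}$, respectively, and require no FLOP.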
2.1 Matrix Products
Frequently arising matrix products and the amount of FLOPs required for their computation will be discussed in this section.
2.1.1 Scalar-Vector Multiplication $\alpha a$
A simple multiplication $\alpha a$ of a vector $a$ with a scalar $\alpha$ requires $N$ multiplications and no summation.
2.1.2 Scalar-Matrix Multiplication $\alpha A$
Extending the result from Subsection 2.1.1 to a scalar-matrix multiplication $\alpha A$ requires $NM$ multiplications and again no summation.
2.1.3 Inner Product $a^H b$ of Two Vectors
An inner product $a^H b$ requires $N$ multiplications and $N-1$ summations, i.e., $2N-1$ FLOPs.
2.1.4 Outer Product $ac^H$ of Two Vectors
An outer product $ac^H$ requires $NM$ multiplications and no summation.
2.1.5 Matrix-Vector Product $Ab$
Computing $Ab$ corresponds to applying the inner product rule $a_i^H b$ from Subsection 2.1.3 $M$ times, where $1 \le i \le M$ and $a_i^H$ represents the $i$-th row of $A$. Hence, its computation costs $MN$ multiplications and $M(N-1)$ summations, i.e., $2MN-M$ FLOPs.
2.1.6 Matrix-Matrix Product $AC$
Repeated application of the matrix-vector rule $Ac_i$ from Subsection 2.1.5, with $c_i$ being the $i$-th column of $C$, yields the overall matrix-matrix product $AC$. Since $1 \le i \le L$, the matrix-matrix product has the $L$-fold complexity of the matrix-vector product. Thus, it needs $MNL$ multiplications and $ML(N-1)$ summations, altogether $2MNL-ML$ FLOPs.
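As a sanity check of the counts in Subsections 2.1.3 to 2.1.6, the following minimal Python sketch instruments a naive triple loop and tallies complex multiplications and summations separately. The function name `matmul_with_counts` and the nested-list matrix representation are illustrative assumptions; with a single column in `C`, it reduces to the matrix-vector case $Ab$.

```python
def matmul_with_counts(A, C):
    """Naive product A @ C of nested lists (A: M x N, C: N x L).

    Each entry is one inner product of a row of A with a column of C, so the
    rule of Subsection 2.1.3 is applied M * L times. Returns (P, mults, adds).
    """
    M, N, L = len(A), len(C), len(C[0])
    mults = adds = 0
    P = [[0j] * L for _ in range(M)]
    for i in range(M):
        for j in range(L):
            acc = A[i][0] * C[0][j]          # first product needs no summation
            mults += 1
            for k in range(1, N):
                acc += A[i][k] * C[k][j]     # one multiplication, one summation
                mults += 1
                adds += 1
            P[i][j] = acc
    return P, mults, adds


M, N, L = 3, 4, 2
A = [[complex(i + k, i - k) for k in range(N)] for i in range(M)]
C = [[complex(k - j, 1) for j in range(L)] for k in range(N)]
_, mults, adds = matmul_with_counts(A, C)
print(mults, adds, mults + adds)  # 24 18 42
```

The printed values match $MNL = 24$, $ML(N-1) = 18$, and $2MNL-ML = 42$ for this size.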
2.1.7 Matrix Diagonal Matrix Product $AD$
If the right-hand side matrix $D$ of the matrix product $AD$ is diagonal, the computational load reduces to $M$ multiplications for each of the $N$ columns of $A$, since the $n$-th column of $A$ is scaled by the $n$-th main diagonal element of $D$. Thus, $MN$ multiplications in total are required for the computation of $AD$; no summations are needed.
2.1.8 Matrix-Matrix Product $LD$
When multiplying a lower triangular matrix $L$ by a diagonal matrix $D$, column $n$ of the matrix product requires $N-n+1$ multiplications and no summations. With $n = 1,\ldots,N$, we get $\frac{1}{2}N^2+\frac{1}{2}N$ multiplications.
2.1.9 Matrix-Matrix Product $L_1 D$
When multiplying a lower triangular matrix $L_1$ with ones on the main diagonal by a diagonal matrix $D$, column $n$ of the matrix product requires $N-n$ multiplications and no summations. With $n = 1,\ldots,N$, we get $\frac{1}{2}N^2-\frac{1}{2}N$ multiplications.
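2.1.10 Matrix-Matrix Product $LC$ with $L$ Lower Triangular
Computing the product of a lower triangular matrix $L \in \mathbb{C}^{N\times N}$ and $C \in \mathbb{C}^{N\times L}$ is done column-wise. The $n$-th element in each column of $LC$ requires $n$ multiplications and $n-1$ summations, so the complete column needs
$$\sum_{n=1}^{N} n = \frac{N^2}{2}+\frac{N}{2} \text{ multiplications and } \sum_{n=1}^{N}(n-1) = \frac{N^2}{2}-\frac{N}{2} \text{ summations.}$$
The complete matrix-matrix product is obtained from computing $L$ columns. We have $\frac{N^2L}{2}+\frac{NL}{2}$ multiplications and $\frac{N^2L}{2}-\frac{NL}{2}$ summations, yielding a total amount of $N^2L$ FLOPs.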
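The column-wise scheme for $LC$ can be sketched in Python as follows, assuming matrices are nested lists and that the loop over row $n$ simply stops at the diagonal of $L$. The number of columns of $C$ is called `num_cols` here to avoid a clash with the matrix name $L$; all names are illustrative.

```python
def lower_times_matrix_with_counts(Lmat, C):
    """Compute L @ C for a lower triangular N x N matrix Lmat and an
    N x num_cols matrix C, skipping the structural zeros above the diagonal.

    Returns (P, mults, adds), counting complex multiplications and summations.
    """
    N, num_cols = len(Lmat), len(C[0])
    mults = adds = 0
    P = [[0j] * num_cols for _ in range(N)]
    for col in range(num_cols):
        for n in range(N):
            # Row n (0-based) of L has nonzeros only in columns 0..n.
            acc = Lmat[n][0] * C[0][col]
            mults += 1
            for i in range(1, n + 1):
                acc += Lmat[n][i] * C[i][col]
                mults += 1
                adds += 1
            P[n][col] = acc
    return P, mults, adds


N, num_cols = 4, 3
Lmat = [[complex(i + j + 1, i - j) if j <= i else 0j for j in range(N)] for i in range(N)]
C = [[complex(i, j) for j in range(num_cols)] for i in range(N)]
_, mults, adds = lower_times_matrix_with_counts(Lmat, C)
print(mults, adds, mults + adds)  # 30 18 48
```

For $N = 4$ and three columns, this gives $3 \cdot 10$ multiplications and $3 \cdot 6$ summations, i.e., $N^2 \cdot 3 = 48$ FLOPs.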
2.1.11 Gram $A^H A$ of $A$
In contrast to the general matrix product from Subsection 2.1.6, we can make use of the Hermitian structure of the product $A^H A \in \mathbb{C}^{N\times N}$: the strictly lower triangular part of $A^H A$ need not be computed, since it is the Hermitian transpose of the strictly upper triangular part. For this reason, we have to compute only the $N$ main diagonal entries of $A^H A$ and the $\frac{N^2-N}{2}$ upper off-diagonal elements, so only $\frac{N^2+N}{2}$ different entries have to be evaluated. Each element requires an inner product step from Subsection 2.1.3 costing $M$ multiplications and $M-1$ summations. Therefore, $\frac{1}{2}MN(N+1)$ multiplications and $\frac{1}{2}(M-1)N(N+1)$ summations are needed, making up a total amount of $MN^2+MN-\frac{N^2}{2}-\frac{N}{2}$ FLOPs.
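A minimal Python sketch of the Gram computation, assuming nested lists of complex numbers: only entries with row index at most the column index are evaluated by an inner product, and the strictly lower triangle is filled by conjugation, which the text counts as free. The name `gram_with_counts` is illustrative.

```python
def gram_with_counts(A):
    """Compute G = A^H A for an M x N matrix A, evaluating only the diagonal
    and the upper off-diagonal entries. Returns (G, mults, adds)."""
    M, N = len(A), len(A[0])
    mults = adds = 0
    G = [[0j] * N for _ in range(N)]
    for a in range(N):
        for b in range(a, N):
            # Inner product of columns a and b (Subsection 2.1.3).
            acc = A[0][a].conjugate() * A[0][b]
            mults += 1
            for m in range(1, M):
                acc += A[m][a].conjugate() * A[m][b]
                mults += 1
                adds += 1
            G[a][b] = acc
            if b != a:
                G[b][a] = acc.conjugate()  # Hermitian symmetry: no FLOP
    return G, mults, adds


M, N = 3, 4
A = [[complex(m - n, m + n) for n in range(N)] for m in range(M)]
_, mults, adds = gram_with_counts(A)
print(mults, adds, mults + adds)  # 30 20 50
```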
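2.1.12 Squared Frobenius Norm $\lVert A\rVert_F^2 = \mathrm{tr}(A^H A)$
The squared Frobenius (Hilbert-Schmidt) norm $\lVert A\rVert_F^2$ follows from summing up the $MN$ squared magnitudes of the entries of $A$. We therefore have $MN$ multiplications and $MN-1$ summations, yielding a total of $2MN-1$ FLOPs.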
2.1.13 Sesquilinear Form $c^H A b$
The sesquilinear form $c^H A b$ should be evaluated by computing the matrix-vector product $Ab$ in a first step and then multiplying with the row vector $c^H$ from the left-hand side. The matrix-vector product requires $MN$ multiplications and $M(N-1)$ summations, whereas the inner product needs $M$ multiplications and $M-1$ summations. Altogether, $M(N+1)$ multiplications and $MN-1$ summations have to be computed for the sesquilinear form $c^H A b$, yielding a total number of $2MN+M-1$ FLOPs.
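The two-step evaluation order can be checked with the following Python sketch (nested lists, illustrative names): it forms $y = Ab$ first and then the inner product $c^H y$, tallying the operations of both steps.

```python
def sesquilinear_with_counts(c, A, b):
    """Evaluate c^H A b as c^H (A b). Returns (value, mults, adds)."""
    M, N = len(A), len(A[0])
    mults = adds = 0
    y = []
    for i in range(M):                     # A b: M inner products of length N
        acc = A[i][0] * b[0]
        mults += 1
        for k in range(1, N):
            acc += A[i][k] * b[k]
            mults += 1
            adds += 1
        y.append(acc)
    value = c[0].conjugate() * y[0]        # c^H y: one inner product of length M
    mults += 1
    for i in range(1, M):
        value += c[i].conjugate() * y[i]
        mults += 1
        adds += 1
    return value, mults, adds


M, N = 3, 4
A = [[complex(i + k, k - i) for k in range(N)] for i in range(M)]
b = [complex(k, 1) for k in range(N)]
c = [complex(1, i) for i in range(M)]
value, mults, adds = sesquilinear_with_counts(c, A, b)
print(mults, adds, mults + adds)  # 15 11 26
```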
2.1.14 Hermitian Form $a^H R a$
With the Hermitian matrix $R = R^H$, the product $a^H R a$ can be expressed as
$$
\begin{aligned}
a^H R a &= \sum_{m=1}^{N}\sum_{n=1}^{N} a^H e_m e_m^T R\, e_n e_n^T a \\
&= \sum_{m=1}^{N}\sum_{n=1}^{N} a_m^* a_n r_{m,n} \\
&= \sum_{m=1}^{N} |a_m|^2 r_{m,m} + 2\sum_{m=1}^{N-1}\sum_{n=m+1}^{N} \Re\{a_m^* a_n r_{m,n}\},
\end{aligned}
\tag{2.1}
$$
with $a_m = [a]_{m,1}$ and $r_{m,n} = [R]_{m,n}$. The first sum accumulates the weighted main diagonal entries and requires $2N$ multiplications and $N-1$ summations.¹ The second part of (2.1) accumulates all weighted off-diagonal entries of $R$. The last two summations sum up $\frac{N(N-1)}{2}$ terms.² Consequently, the second part of (2.1) requires $\frac{N(N-1)}{2}-1$ summations and $N(N-1)$ products.³ Finally, the two parts have to be added, accounting for an additional summation and yielding an overall amount of $N^2+N$ products and $\frac{1}{2}N^2+\frac{1}{2}N-1$ summations, corresponding to $\frac{3}{2}N^2+\frac{3}{2}N-1$ FLOPs.⁴
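A minimal Python sketch of evaluating $a^H R a$ via the split in (2.1), assuming $R$ is given as a Hermitian nested list. As in the text, the factor 2 and taking the real part are treated as free; the name `hermitian_form_with_counts` is illustrative.

```python
def hermitian_form_with_counts(a, R):
    """Evaluate a^H R a for a Hermitian R via the split in (2.1).

    Returns (value, mults, adds). The factor 2 and taking the real part
    are not counted as FLOPs.
    """
    N = len(a)
    mults = adds = 0
    # First sum: |a_m|^2 r_mm, two multiplications per term.
    diag = (a[0].conjugate() * a[0]) * R[0][0]
    mults += 2
    for m in range(1, N):
        diag += (a[m].conjugate() * a[m]) * R[m][m]
        mults += 2
        adds += 1
    if N == 1:
        return diag.real, mults, adds          # no off-diagonal entries exist
    # Second sum: a_m^* a_n r_mn over m < n, two multiplications per term.
    terms = [(a[m].conjugate() * a[n]) * R[m][n]
             for m in range(N - 1) for n in range(m + 1, N)]
    mults += 2 * len(terms)
    off = terms[0]
    for t in terms[1:]:
        off += t
        adds += 1
    value = diag.real + 2 * off.real           # one summation joins both parts
    adds += 1
    return value, mults, adds


a = [complex(1, 2), complex(0, -1), complex(3, 1)]
R = [[complex(m + n + (4 if m == n else 0), m - n) for n in range(3)] for m in range(3)]
value, mults, adds = hermitian_form_with_counts(a, R)
print(mults, adds)  # 12 5
```

For $N = 3$, the counts match $N^2+N = 12$ products and $\frac{1}{2}N^2+\frac{1}{2}N-1 = 5$ summations.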
¹ We do not exploit the fact that only real-valued summands are accumulated, as we only account for complex FLOPs.
² $\sum_{m=1}^{N-1}\sum_{n=m+1}^{N} 1 = \sum_{m=1}^{N-1}(N-m) = N(N-1) - \sum_{m=1}^{N-1} m = N(N-1) - \frac{N(N-1)}{2} = \frac{N(N-1)}{2}$. We made use of (A1) in the Appendix for the computation of the last sum accumulating subsequent integers.
³ The scaling with the factor 2 does not require a FLOP, as it can be implemented by a simple bit shift.
⁴ Clearly, if $N = 1$, we have to subtract one summation from the calculation, since no off-diagonal entries exist.

2.1.15 Gram $L^H L$ of a Lower Triangular Matrix $L$
During the computation of the inverse of a positive definite matrix, the Gram matrix of a lower triangular matrix occurs when the Cholesky decomposition is applied. Again, we make use of the Hermitian structure of the Gram $L^H L$, so only the main diagonal entries and the upper right off-diagonal entries of the product have to be evaluated. The $a$-th main-diagonal entry can be expressed as
$$[L^H L]_{a,a} = \sum_{n=a}^{N} |\ell_{n,a}|^2, \tag{2.2}$$
with $\ell_{n,a} = [L]_{n,a}$, requiring $N-a+1$ multiplications and $N-a$ summations. Hence, all main diagonal elements need $\sum_{n=1}^{N}(N-n+1) = \frac{1}{2}N^2+\frac{1}{2}N$ multiplications and $\sum_{n=1}^{N}(N-n) = \frac{1}{2}N^2-\frac{1}{2}N$ summations. The upper right off-diagonal entry $[L^H L]_{a,b}$ in row $a$ and column $b$ with $a<b$ reads as
$$[L^H L]_{a,b} = \sum_{n=b}^{N} \ell_{n,a}^* \ell_{n,b}, \tag{2.3}$$
again accounting for $N-b+1$ multiplications and $N-b$ summations. These two expressions have to be summed up over all $1 \le a \le N-1$ and $a+1 \le b \le N$, and for the number of multiplications, we find
$$
\begin{aligned}
\sum_{a=1}^{N-1}\sum_{b=a+1}^{N}(N-b+1) &= \sum_{a=1}^{N-1}\left[(N-a)(N+1) - \sum_{b=a+1}^{N} b\right] \\
&= \sum_{a=1}^{N-1}\left[N^2+N-a(N+1)-\frac{N(N+1)}{2}+\frac{a(a+1)}{2}\right] \\
&= \sum_{a=1}^{N-1}\left[\frac{N^2+N}{2}+\frac{a^2}{2}-a\left(N+\frac{1}{2}\right)\right] \\
&= \frac{(N-1)(N+1)N}{2}+\frac{(N-1)N(2N-1)}{12}-\left(N+\frac{1}{2}\right)\frac{N(N-1)}{2} \\
&= \frac{1}{6}N^3-\frac{1}{6}N.
\end{aligned}
\tag{2.4}
$$
Again, we made use of (A1) for the sum of subsequent integers and (A2) for the sum of subsequent squared integers. For the number of summations, we evaluate
$$\sum_{a=1}^{N-1}\sum_{b=a+1}^{N}(N-b) = \frac{1}{6}N^3-\frac{1}{2}N^2+\frac{1}{3}N. \tag{2.5}$$
Computing all necessary elements of the Gram $L^H L$ thereby requires $\frac{1}{6}N^3+\frac{1}{2}N^2+\frac{1}{3}N$ multiplications and $\frac{1}{6}N^3-\frac{1}{6}N$ summations. Altogether, $\frac{1}{3}N^3+\frac{1}{2}N^2+\frac{1}{6}N$ FLOPs result. The same result, of course, holds for the Gram of an upper triangular matrix.
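The counts in (2.2) to (2.5) can be cross-checked with the following Python sketch, assuming nested lists: it evaluates $[L^H L]_{a,b}$ only for $a \le b$, starting the inner sum at $n = b$ as in (2.3), and compares the tallies with the closed-form totals using exact rational arithmetic from the standard `fractions` module. Function names are illustrative.

```python
from fractions import Fraction


def gram_lower_with_counts(Lmat):
    """Compute G = L^H L for a lower triangular N x N matrix, evaluating only
    entries with a <= b (0-based) and starting the sum at n = b as in (2.3),
    because [L]_{n,b} = 0 for n < b. Returns (G, mults, adds)."""
    N = len(Lmat)
    mults = adds = 0
    G = [[0j] * N for _ in range(N)]
    for a in range(N):
        for b in range(a, N):
            acc = Lmat[b][a].conjugate() * Lmat[b][b]
            mults += 1
            for n in range(b + 1, N):
                acc += Lmat[n][a].conjugate() * Lmat[n][b]
                mults += 1
                adds += 1
            G[a][b] = acc
            if b != a:
                G[b][a] = acc.conjugate()         # Hermitian symmetry: no FLOP
    return G, mults, adds


def closed_form(N):
    """Multiplications, summations, and FLOPs stated in Subsection 2.1.15."""
    N = Fraction(N)
    mults = N**3 / 6 + N**2 / 2 + N / 3
    adds = N**3 / 6 - N / 6
    return mults, adds, mults + adds


for N in range(1, 6):
    Lmat = [[complex(i + 1, j) if j <= i else 0j for j in range(N)] for i in range(N)]
    _, mults, adds = gram_lower_with_counts(Lmat)
    print(N, (mults, adds) == closed_form(N)[:2])  # True for every N
```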
2.2 Decompositions
2.2.1 Cholesky Decomposition $R = LL^H$ (Gaxpy Version)
Instead of computing the inverse of a positive definite matrix $R$ directly, it is more efficient to start with the Cholesky decomposition $R = LL^H$, then invert the lower triangular matrix $L$ and compute its Gram. In this section, we count the number of FLOPs necessary for the Cholesky decomposition.
The implementation of the Generalized Ax plus y (Gaxpy) version of the Cholesky decomposition, which overwrites the lower triangular part of the positive definite matrix $R$, is listed in Algorithm 2.1; see [1]. Note that $R$ needs to be positive definite for the $LL^H$ decomposition!
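Algorithm 2.1: Algorithm for the Gaxpy version of the Cholesky decomposition.
1: $[R]_{1:N,1} = [R]_{1:N,1} / \sqrt{[R]_{1,1}}$   (vector in $\mathbb{C}^{N}$)
2: for $n = 2$ to $N$ do
3:     $[R]_{n:N,n} = [R]_{n:N,n} - [R]_{n:N,1:n-1}\,[R]_{n,1:n-1}^H$   (with $[R]_{n:N,n} \in \mathbb{C}^{N-n+1}$, $[R]_{n:N,1:n-1} \in \mathbb{C}^{(N-n+1)\times(n-1)}$, $[R]_{n,1:n-1}^H \in \mathbb{C}^{n-1}$)
4:     $[R]_{n:N,n} = [R]_{n:N,n} / \sqrt{[R]_{n,n}}$   (vector in $\mathbb{C}^{N-n+1}$)
5: end for
6: $L = \mathrm{tril}(R)$   {lower triangular part of overwritten $R$}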
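A minimal Python sketch of Algorithm 2.1 with operation counters follows. It assumes a Hermitian positive definite input given as nested lists, uses 0-based indices, reads only the lower triangle, and (as in the FLOP analysis) sets the diagonal entry directly to the square root instead of dividing it, counting each remaining division as one multiplication. The function name is illustrative, and no pivoting or error handling is attempted.

```python
import cmath


def cholesky_gaxpy_with_counts(R):
    """Gaxpy Cholesky R = L L^H following Algorithm 2.1 (0-based indices).

    Returns (L, mults, adds, roots) for a Hermitian positive definite R.
    """
    N = len(R)
    W = [row[:] for row in R]                  # working copy, overwritten in place
    mults = adds = roots = 0
    for n in range(N):
        # Line 3: [R]_{n:N,n} -= [R]_{n:N,0:n} [R]_{n,0:n}^H (skipped for n = 0).
        if n > 0:
            for i in range(n, N):
                acc = W[i][0] * W[n][0].conjugate()
                mults += 1
                for k in range(1, n):
                    acc += W[i][k] * W[n][k].conjugate()
                    mults += 1
                    adds += 1
                W[i][n] -= acc                  # subtraction from the column
                adds += 1
        # Lines 1 and 4: scale the column by 1 / sqrt([R]_{n,n}).
        root = cmath.sqrt(W[n][n])
        roots += 1
        W[n][n] = root                          # diagonal entry needs no division
        for i in range(n + 1, N):
            W[i][n] /= root
            mults += 1
    L = [[W[i][j] if j <= i else 0j for j in range(N)] for i in range(N)]
    return L, mults, adds, roots


R = [[4, 2j, 2], [-2j, 5, 1j], [2, -1j, 6]]
L, mults, adds, roots = cholesky_gaxpy_with_counts(R)
print(mults, adds, roots)  # 7 4 3
```

For this $3\times 3$ example, the factor is $L = [[2, 0, 0], [-j, 2, 0], [1, -j, 2]]$, and the counts agree with $\frac{1}{6}N^3+\frac{1}{2}N^2-\frac{2}{3}N = 7$ multiplications and $\frac{1}{6}N^3-\frac{1}{6}N = 4$ summations.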
The computation of the first column of $L$ in Line 1 of Algorithm 2.1 requires $N-1$ multiplications,⁵ a single square-root operation, and no summations. Column $n>1$ takes a matrix-vector product of dimension $(N-n+1)\times(n-1)$, which is subtracted from another $(N-n+1)$-dimensional vector, involving $N-n+1$ summations; see Line 3. Finally, $N-n$ multiplications⁶ and a single square-root operation are necessary in Line 4. In short, column $n$ with $1<n\le N$ needs $-n^2+n(N+1)-1$ multiplications, $-n^2+n(N+2)-N-1$ summations (see Subsection 2.1.5), and one square-root operation, which we classify as an additional FLOP. Summing up the multiplications for columns $2\le n\le N$, we obtain
$$
\begin{aligned}
\sum_{n=2}^{N}\left(-n^2+n(N+1)-1\right) &= (N+1)\frac{N(N+1)-2}{2}-(N-1)-\frac{N(N+1)(2N+1)-6}{6} \\
&= \frac{N^3+2N^2-N-2}{2}-(N-1)-\frac{2N^3+3N^2+N-6}{6} \\
&= \frac{1}{6}N^3+\frac{1}{2}N^2-\frac{5}{3}N+1.
\end{aligned}
\tag{2.6}
$$
The number of summations for columns $2\le n\le N$ reads as
$$
\begin{aligned}
\sum_{n=2}^{N}\left(-n^2+n(N+2)-N-1\right) &= -(N+1)(N-1)+(N+2)\frac{N(N+1)-2}{2}-\frac{N(N+1)(2N+1)-6}{6} \\
&= -N^2+1+\frac{N^3+3N^2-4}{2}-\frac{2N^3+3N^2+N-6}{6} \\
&= \frac{1}{6}N^3-\frac{1}{6}N,
\end{aligned}
\tag{2.7}
$$
and finally, $N-1$ square-root operations are needed for these $N-1$ columns. Including the $N-1$ multiplications for column $n=1$ and the additional square-root operation, $\frac{1}{6}N^3+\frac{1}{2}N^2-\frac{2}{3}N$ multiplications, $\frac{1}{6}N^3-\frac{1}{6}N$ summations, and $N$ square-root operations occur, i.e., $\frac{1}{3}N^3+\frac{1}{2}N^2+\frac{1}{6}N$ FLOPs in total.

⁵ The first element of the column need not be computed by a division, since dividing $[R]_{1,1}$ by $\sqrt{[R]_{1,1}}$ simply yields the square root itself.
⁶ Again, the first element need not be computed by a division, since the result of the division is the square root of the denominator.

2.2.2 Cholesky Decomposition $R = L_1 D L_1^H$
The main advantage of the $L_1 D L_1^H$ decomposition compared to the standard $LL^H$ decomposition is that no square-root operations are needed, which may require more than one FLOP depending on the given hardware platform. Another benefit of the $L_1 D L_1^H$ decomposition is that it does not require a positive definite matrix $R$. The only two conditions for its unique existence are that $R$ is Hermitian and that all but the last leading principal minor of $R$ (the last one being the determinant) are different from zero [2]. Hence, $R$ may also be rank deficient to a certain degree. If $R$ is not positive semidefinite, then $D$ may contain negative main diagonal entries.
The outcome of the decomposition is a lower triangular matrix $L_1$ with ones on the main diagonal and a diagonal matrix $D$.
Algorithm 2.2 overwrites the strictly lower left part of the matrix $R$ with the strictly lower part of $L_1$ (i.e., without the ones on the main diagonal) and overwrites the main diagonal of $R$ with the main diagonal of $D$. It is taken from [1] and slightly modified such that it is also applicable to complex matrices (see the conjugate in Line 4) and no existing scalar is re-computed (see the case distinction in Line 4 for $i=1$).

Algorithm 2.2: Algorithm for the Cholesky decomposition $L_1 D L_1^H$.
1: $[R]_{2:N,1} = [R]_{2:N,1} / [R]_{1,1}$   (vector in $\mathbb{C}^{N-1}$)
2: for $n = 2$ to $N$ do
3:     for $i = 1$ to $n-1$ do
4:         $[v]_i = [R]_{1,n}$ if $i = 1$, and $[v]_i = [R]_{i,i}\,[R]_{n,i}^*$ if $i \neq 1$
5:     end for
6:     $[v]_n = [R]_{n,n} - [R]_{n,1:n-1}\,[v]_{1:n-1}$   (with $[R]_{n,1:n-1} \in \mathbb{C}^{1\times(n-1)}$, $[v]_{1:n-1} \in \mathbb{C}^{n-1}$)
7:     $[R]_{n,n} = [v]_n$
8:     $[R]_{n+1:N,n} = \left([R]_{n+1:N,n} - [R]_{n+1:N,1:n-1}\,[v]_{1:n-1}\right) / [v]_n$   (with $[R]_{n+1:N,n} \in \mathbb{C}^{N-n}$, $[R]_{n+1:N,1:n-1} \in \mathbb{C}^{(N-n)\times(n-1)}$, $[v]_{1:n-1} \in \mathbb{C}^{n-1}$)
9: end for
10: $D = \mathrm{diag}(\mathrm{diag}(R))$   (return diagonal $D$)
11: $L_1 = \mathrm{tril}(R)$ with ones on the main diagonal
Line 1 needs $N-1$ multiplications. Lines 3 to 5 require $n-2$ multiplications and are executed for $n=2,\ldots,N$, yielding $\sum_{n=2}^{N}(n-2) = \frac{N^2-3N+2}{2}$ multiplications. Line 6 takes $n-1$ multiplications and $n-1$ summations, again with $n=2,\ldots,N$, yielding $\sum_{n=2}^{N}(n-1) = \frac{N^2-N}{2}$ multiplications and the same amount of summations. Line 7 does not require any FLOP. In Line 8, the matrix-vector product needs $(N-n)(n-1)$ multiplications, and an additional $N-n$ multiplications arise when the complete numerator is divided by the denominator. Hence, we have $Nn-n^2$ multiplications. For $n=2,\ldots,N$, we get $\sum_{n=2}^{N}(Nn-n^2) = \frac{1}{6}N^3-\frac{7}{6}N+1$ multiplications. The number of summations in Line 8 is $(N-n)(n-2)$ for the matrix-vector product and $N-n$ for the subtraction in the numerator. Together, we have $-n^2+n(N+1)-N$ summations. With $n=2,\ldots,N$, we get $\sum_{n=2}^{N}\left[-n^2+n(N+1)-N\right] = \frac{1}{6}N^3-\frac{1}{2}N^2+\frac{1}{3}N$ summations.
Summing up, this algorithm requires $\frac{1}{6}N^3+N^2-\frac{13}{6}N+1$ multiplications and $\frac{1}{6}N^3-\frac{1}{6}N$ summations, yielding a total amount of $\frac{1}{3}N^3+N^2-\frac{7}{3}N+1$ FLOPs. (Note that this formula is also valid for $N=1$.)
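Algorithm 2.2 can be sketched in Python in the same instrumented style. The sketch assumes a Hermitian input, given as nested lists, whose leading principal minors (except possibly the last) are nonzero; it uses 0-based indices, reads the upper triangle only for $[R]_{1,n}$ in Line 4, and counts the division in Line 8 as a multiplication. All names are illustrative.

```python
def ldlh_with_counts(R):
    """L1 D L1^H decomposition of a Hermitian R following Algorithm 2.2
    (0-based indices). Returns (L1, d, mults, adds), d being diag(D)."""
    N = len(R)
    W = [row[:] for row in R]
    mults = adds = 0
    for i in range(1, N):                       # Line 1
        W[i][0] /= W[0][0]
        mults += 1
    for n in range(1, N):
        # Lines 3-5: v_i = d_i conj(l_{n,i}); v_0 = [R]_{0,n} needs no FLOP.
        v = [W[0][n]]
        for i in range(1, n):
            v.append(W[i][i] * W[n][i].conjugate())
            mults += 1
        # Line 6: v_n = r_nn - [R]_{n,0:n} v_{0:n}.
        acc = W[n][0] * v[0]
        mults += 1
        for i in range(1, n):
            acc += W[n][i] * v[i]
            mults += 1
            adds += 1
        vn = W[n][n] - acc
        adds += 1
        W[n][n] = vn                            # Line 7
        for j in range(n + 1, N):               # Line 8
            acc = W[j][0] * v[0]
            mults += 1
            for i in range(1, n):
                acc += W[j][i] * v[i]
                mults += 1
                adds += 1
            W[j][n] = (W[j][n] - acc) / vn      # subtraction + division
            adds += 1
            mults += 1
    d = [W[i][i] for i in range(N)]
    L1 = [[1 if i == j else (W[i][j] if j < i else 0) for j in range(N)] for i in range(N)]
    return L1, d, mults, adds


L1, d, mults, adds = ldlh_with_counts([[1, 2], [2, 1]])
print(d, mults, adds)  # [1, -3.0] 2 1
```

The indefinite example illustrates the remark above: $D$ receives a negative entry, and no square root is needed.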
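2.3 Inverses of Matrices
2.3.1 Inverse $L^{-1}$ of a Lower Triangular Matrix $L$
Let $X = [x_1,\ldots,x_N] = L^{-1}$ denote the inverse of a lower triangular matrix $L$. Then, $X$ is again lower triangular, which means that $[X]_{b,n} = 0$ for $b<n$. The following equation holds:
$$L x_n = e_n. \tag{2.8}$$
Via forward substitution, the above system can easily be solved. Row $b$ ($n \le b \le N$) of (2.8) can be expressed as
$$\sum_{a=n}^{b} \ell_{b,a} x_{a,n} = \delta_{b,n}, \tag{2.9}$$
with $\ell_{b,a} = [L]_{b,a}$, $\delta_{b,n}$ denoting the Kronecker delta, which vanishes for $b \ne n$, and $x_{a,n} = [X]_{a,n} = [x_n]_{a,1}$. Starting from $b = n$, the $x_{b,n}$ are computed successively, and we find
$$x_{b,n} = \frac{1}{\ell_{b,b}}\left[\delta_{b,n} - \sum_{a=n}^{b-1} \ell_{b,a} x_{a,n}\right], \tag{2.10}$$
with all $x_{a,n}$, $n \le a \le b-1$, having been computed in previous steps. Hence, if $n = b$, $x_{n,n} = \frac{1}{\ell_{n,n}}$ and a single multiplication⁷ is required; no summations are needed. For $b>n$, $b-n+1$ multiplications and $b-n-1$ summations are required, as the Kronecker delta vanishes. All main diagonal entries can be computed by means of $N$ multiplications. The lower left off-diagonal entries require
$$
\begin{aligned}
\sum_{n=1}^{N-1}\sum_{b=n+1}^{N}(b-n+1) &= \sum_{n=1}^{N-1}\left[(1-n)(N-n)+\sum_{b=n+1}^{N} b\right] \\
&= \sum_{n=1}^{N-1}\left[N+n^2-n(N+1)+\frac{N^2+N-n^2-n}{2}\right] \\
&= \sum_{n=1}^{N-1}\left[\frac{N^2+3N}{2}+\frac{n^2}{2}-n\left(N+\frac{3}{2}\right)\right] \\
&= (N-1)\frac{N(N+3)}{2}+\frac{(N-1)N(2N-1)}{12}-\left(N+\frac{3}{2}\right)\frac{(N-1)N}{2} \\
&= \frac{1}{6}N^3+\frac{1}{2}N^2-\frac{2}{3}N
\end{aligned}
\tag{2.11}
$$
multiplications, and
$$\sum_{n=1}^{N-1}\sum_{b=n+1}^{N}(b-n-1) = \frac{1}{6}N^3-\frac{1}{2}N^2+\frac{1}{3}N \tag{2.12}$$
summations. Including the $N$ multiplications for the main-diagonal entries, $\frac{1}{6}N^3+\frac{1}{2}N^2+\frac{1}{3}N$ multiplications and $\frac{1}{6}N^3-\frac{1}{2}N^2+\frac{1}{3}N$ summations have to be implemented, yielding a total amount of $\frac{1}{3}N^3+\frac{2}{3}N$ FLOPs.

⁷ Actually, it is a division rather than a multiplication.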
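The column-by-column forward substitution (2.10) can be sketched in Python as follows, assuming a nonsingular lower triangular matrix given as nested lists and 0-based indices. Divisions are counted as multiplications, and the sign change is not counted; the function name is illustrative.

```python
def lower_inverse_with_counts(Lmat):
    """Invert a nonsingular lower triangular matrix column by column via the
    forward substitution (2.10). Returns (X, mults, adds); each division is
    counted as one multiplication and the sign change is not counted."""
    N = len(Lmat)
    mults = adds = 0
    X = [[0j] * N for _ in range(N)]
    for n in range(N):
        X[n][n] = 1 / Lmat[n][n]                   # b = n: a single division
        mults += 1
        for b in range(n + 1, N):
            acc = Lmat[b][n] * X[n][n]              # term a = n
            mults += 1
            for a in range(n + 1, b):               # terms a = n+1, ..., b-1
                acc += Lmat[b][a] * X[a][n]
                mults += 1
                adds += 1
            X[b][n] = -acc / Lmat[b][b]             # Kronecker delta vanishes
            mults += 1
    return X, mults, adds


Lmat = [[2, 0, 0], [1, 4, 0], [3, 5, 8]]
X, mults, adds = lower_inverse_with_counts(Lmat)
print(mults, adds)  # 10 1
```

For $N = 3$, this reproduces $\frac{1}{6}N^3+\frac{1}{2}N^2+\frac{1}{3}N = 10$ multiplications and $\frac{1}{6}N^3-\frac{1}{2}N^2+\frac{1}{3}N = 1$ summation.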
2.3.2 Inverse $L_1^{-1}$ of a Lower Triangular Matrix $L_1$ with Ones on the Main Diagonal
The inverse of a lower triangular matrix $L_1$ with ones on the main diagonal turns out to require $N^2$ FLOPs less than the inverse of $L$ with arbitrary nonzero diagonal elements. Let $X$ denote the inverse of $L_1$. Clearly, $X$ is again a lower triangular matrix with ones on the main diagonal. We can exploit this fact in order to compute only the unknown entries.
The $m$-th row and $n$-th column of the system of equations $L_1 X = I_N$ with $m \ge n+1$ reads as⁸
$$l_{m,n} + \sum_{i=n+1}^{m-1} l_{m,i} x_{i,n} + x_{m,n} = 0,$$
with $l_{m,i} = [L_1]_{m,i}$ and $x_{i,n} = [X]_{i,n}$, or, equivalently,
$$x_{m,n} = -\left[l_{m,n} + \sum_{i=n+1}^{m-1} l_{m,i} x_{i,n}\right].$$
Hence, $X$ is computed via forward substitution. To compute $x_{m,n}$, we need $m-n-1$ multiplications and $m-n-1$ summations. Remember that $m \ge n+1$. The total number of multiplications (and, likewise, of summations) is obtained from
$$\sum_{n=1}^{N-1}\sum_{m=n+1}^{N}(m-n-1) = \frac{1}{6}N^3-\frac{1}{2}N^2+\frac{1}{3}N. \tag{2.13}$$
Summing up, $\frac{1}{3}N^3-N^2+\frac{2}{3}N$ FLOPs are needed.

⁸ We only have to consider $m \ge n+1$, since the equations resulting from $m < n+1$ are automatically fulfilled due to the structure of $L_1$ and $X$.
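For a unit lower triangular $L_1$, the same idea skips the diagonal entirely. A minimal Python sketch, assuming nested lists and 0-based indices; the name is illustrative and the sign change is not counted:

```python
def unit_lower_inverse_with_counts(L1):
    """Invert a lower triangular L1 with ones on the main diagonal by forward
    substitution. The unit diagonal of X = L1^{-1} is known in advance and
    costs nothing. Returns (X, mults, adds)."""
    N = len(L1)
    mults = adds = 0
    X = [[1 if m == n else 0 for n in range(N)] for m in range(N)]
    for n in range(N):
        for m in range(n + 1, N):
            acc = L1[m][n]                          # l_{m,n}: no multiplication
            for i in range(n + 1, m):
                acc += L1[m][i] * X[i][n]
                mults += 1
                adds += 1
            X[m][n] = -acc
    return X, mults, adds


X, mults, adds = unit_lower_inverse_with_counts([[1, 0, 0], [2, 1, 0], [3, 4, 1]])
print(X, mults, adds)  # [[1, 0, 0], [-2, 1, 0], [5, -4, 1]] 1 1
```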
2.3.3 Inverse $R^{-1}$ of a Positive Definite Matrix $R$
The inverse of a matrix can, for example, be computed via Gaussian elimination [1]. However, this approach is computationally expensive and does not exploit the Hermitian structure of $R$. Instead, it is more efficient to start with the Cholesky decomposition $R = LL^H$ (see Subsection 2.2.1), invert the lower triangular matrix $L$ (see Subsection 2.3.1), and then build the Gram $L^{-H}L^{-1}$ of $L^{-1}$ (see Subsection 2.1.15). Summing up the respective number of operations, this procedure requires $\frac{1}{2}N^3+\frac{3}{2}N^2$ multiplications, $\frac{1}{2}N^3-\frac{1}{2}N^2$ summations, and $N$ square-root operations, which yields a total amount of $N^3+N^2+N$ FLOPs.
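The total follows from adding the FLOP counts of the three steps (Subsections 2.2.1, 2.3.1, and 2.1.15, the last one applied to the lower triangular matrix $L^{-1}$):

```latex
\underbrace{\tfrac{1}{3}N^3+\tfrac{1}{2}N^2+\tfrac{1}{6}N}_{R = LL^H}
+\underbrace{\tfrac{1}{3}N^3+\tfrac{2}{3}N}_{L^{-1}}
+\underbrace{\tfrac{1}{3}N^3+\tfrac{1}{2}N^2+\tfrac{1}{6}N}_{L^{-H}L^{-1}}
= N^3+N^2+N
```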
2.4 Solving Systems of Equations
2.4.1 Product $L^{-1}C$ with $L^{-1}$ not known a priori
A naive way of computing the solution $X = L^{-1}C$ of the equation $LX = C$ is to find $L^{-1}$ first and afterwards multiply it by $C$. This approach needs $N^2\left(L+\frac{1}{3}N\right)+\frac{2}{3}N$ FLOPs, as shown in Subsections 2.3.1 and 2.1.10. However, doing so is very expensive, since we are not interested in the inverse of $L$ in general. Hence, a computationally cheaper variant is preferable. Again, forward substitution plays a key role.
It is easy to see that $X$ can be computed column-wise. Let $x_{b,a} = [X]_{b,a}$, $\ell_{b,a} = [L]_{b,a}$, and $c_{b,a} = [C]_{b,a}$. Then, from $LX = C$, we get for the element $x_{b,a}$ in row $b$ and column $a$ of $X$:
$$x_{b,a} = \frac{1}{\ell_{b,b}}\left[c_{b,a} - \sum_{i=1}^{b-1} \ell_{b,i} x_{i,a}\right]. \tag{2.14}$$
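The recursion (2.14) can be sketched in Python as follows, assuming nested lists, 0-based indices, and a nonsingular lower triangular matrix. The number of columns of $C$ is called `num_cols` to avoid a clash with the matrix name $L$, and the division by $\ell_{b,b}$ is counted as a multiplication; names are illustrative.

```python
def forward_solve_with_counts(Lmat, C):
    """Solve L X = C column by column via (2.14) without forming L^{-1}.

    Lmat is a nonsingular lower triangular N x N matrix, C is N x num_cols.
    Returns (X, mults, adds); the division by l_{b,b} counts as a multiplication.
    """
    N, num_cols = len(Lmat), len(C[0])
    mults = adds = 0
    X = [[0j] * num_cols for _ in range(N)]
    for col in range(num_cols):
        for b in range(N):
            acc = C[b][col]
            for i in range(b):                      # one product per known x_{i,col}
                acc -= Lmat[b][i] * X[i][col]
                mults += 1
                adds += 1
            X[b][col] = acc / Lmat[b][b]
            mults += 1
    return X, mults, adds


X, mults, adds = forward_solve_with_counts([[2, 0], [1, 4]], [[2, 4], [9, 6]])
print(X, mults, adds)  # [[1.0, 2.0], [2.0, 1.0]] 6 2
```

With $N = 2$ and two columns, the $6 + 2 = 8$ operations equal $N^2$ times the number of columns, the same as the product $LC$ in Subsection 2.1.10.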
Computing $x_{b,a}$ via (2.14) requires $b$ multiplications and $b-1$ summations. A complete column of $X$ can therefore be computed with $\sum_{b=1}^{N} b = \frac{N^2}{2}+\frac{N}{2}$ multiplications and $\sum_{b=1}^{N}(b-1) = \frac{N^2}{2}-\frac{N}{2}$ summations. The complete matrix $X$ with $L$ columns thus needs $N^2L$ FLOPs, so the forward substitution saves $\frac{1}{3}N^3+\frac{2}{3}N$ FLOPs compared to the direct inversion of $L$ and a subsequent matrix-matrix product. Interestingly, computing $L^{-1}C$ with $L^{-1}$ unknown is as expensive as computing $LC$; see Subsection 2.1.10.

3. Overview
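$A \in \mathbb{C}^{M\times N}$, $B \in \mathbb{C}^{N\times N}$, and $C \in \mathbb{C}^{N\times L}$ are arbitrary matrices. $D \in \mathbb{C}^{N\times N}$ is a diagonal matrix, $L \in \mathbb{C}^{N\times N}$ is lower triangular, $L_1 \in \mathbb{C}^{N\times N}$ is lower triangular with ones on the main diagonal, $a, b \in \mathbb{C}^N$, $c \in \mathbb{C}^M$, and $R \in \mathbb{C}^{N\times N}$ is positive definite.

| Expression | Description | Products | Summations | FLOPs |
|---|---|---|---|---|
| $\alpha a$ | Vector scaling | $N$ | 0 | $N$ |
| $\alpha A$ | Matrix scaling | $MN$ | 0 | $MN$ |
| $a^H b$ | Inner product | $N$ | $N-1$ | $2N-1$ |
| $ac^H$ | Outer product | $MN$ | 0 | $MN$ |
| $Ab$ | Matrix-vector product | $MN$ | $M(N-1)$ | $2MN-M$ |
| $AC$ | Matrix-matrix product | $MNL$ | $ML(N-1)$ | $2MNL-ML$ |
| $AD$ | Diagonal matrix product | $MN$ | 0 | $MN$ |
| $LD$ | Matrix-matrix product | $\frac{1}{2}N^2+\frac{1}{2}N$ | 0 | $\frac{1}{2}N^2+\frac{1}{2}N$ |
| $L_1 D$ | Matrix-matrix product | $\frac{1}{2}N^2-\frac{1}{2}N$ | 0 | $\frac{1}{2}N^2-\frac{1}{2}N$ |
| $LC$ | Matrix product | $\frac{N^2L}{2}+\frac{NL}{2}$ | $\frac{N^2L}{2}-\frac{NL}{2}$ | $N^2L$ |
| $A^H A$ | Gram | $\frac{MN(N+1)}{2}$ | $\frac{(M-1)N(N+1)}{2}$ | $MN^2+MN-\frac{N^2}{2}-\frac{N}{2}$ |
| $\lVert A\rVert_F^2$ | Squared Frobenius norm | $MN$ | $MN-1$ | $2MN-1$ |
| $c^H A b$ | Sesquilinear form | $M(N+1)$ | $MN-1$ | $2MN+M-1$ |
| $a^H R a$ | Hermitian form | $N^2+N$ | $\frac{1}{2}N^2+\frac{1}{2}N-1$ | $\frac{3}{2}N^2+\frac{3}{2}N-1$ |
| $L^H L$ | Gram of triangular | $\frac{1}{6}N^3+\frac{1}{2}N^2+\frac{1}{3}N$ | $\frac{1}{6}N^3-\frac{1}{6}N$ | $\frac{1}{3}N^3+\frac{1}{2}N^2+\frac{1}{6}N$ |
| $L$ | Cholesky $R = LL^H$ (Gaxpy version) | $\frac{1}{6}N^3+\frac{1}{2}N^2-\frac{2}{3}N$ | $\frac{1}{6}N^3-\frac{1}{6}N$ | $\frac{1}{3}N^3+\frac{1}{2}N^2+\frac{1}{6}N$ ($N$ roots included) |
| $L_1, D$ | Cholesky $R = L_1 D L_1^H$ | $\frac{1}{6}N^3+N^2-\frac{13}{6}N+1$ | $\frac{1}{6}N^3-\frac{1}{6}N$ | $\frac{1}{3}N^3+N^2-\frac{7}{3}N+1$ |
| $L^{-1}$ | Inverse of triangular | $\frac{1}{6}N^3+\frac{1}{2}N^2+\frac{1}{3}N$ | $\frac{1}{6}N^3-\frac{1}{2}N^2+\frac{1}{3}N$ | $\frac{1}{3}N^3+\frac{2}{3}N$ |
| $L_1^{-1}$ | Inverse of triangular with ones on main diagonal | $\frac{1}{6}N^3-\frac{1}{2}N^2+\frac{1}{3}N$ | $\frac{1}{6}N^3-\frac{1}{2}N^2+\frac{1}{3}N$ | $\frac{1}{3}N^3-N^2+\frac{2}{3}N$ |
| $R^{-1}$ | Inverse of positive definite | $\frac{1}{2}N^3+\frac{3}{2}N^2$ | $\frac{1}{2}N^3-\frac{1}{2}N^2$ | $N^3+N^2+N$ ($N$ roots included) |
| $L^{-1}C$ | $L^{-1}$ unknown | $\frac{N^2L}{2}+\frac{NL}{2}$ | $\frac{N^2L}{2}-\frac{NL}{2}$ | $N^2L$ |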
Appendix
A frequently occurring summation in FLOP counting is the sum of subsequent integers. By complete induction, we find
$$\sum_{n=1}^{N} n = \frac{N(N+1)}{2}. \tag{A1}$$
The above result can easily be verified by recognizing that the sum of the $n$-th and the $(N-n+1)$-th summand is equal to $N+1$, and we have $\frac{N}{2}$ such pairs.
Another sum of relevance is the sum of subsequent squared integers. Again, via complete induction, we find
$$\sum_{n=1}^{N} n^2 = \frac{N(N+1)(2N+1)}{6}. \tag{A2}$$
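Both identities are easy to spot-check numerically. The following Python snippet is an illustrative brute-force comparison over a finite range (a sanity check, not a proof); it also lists the pair sums used in the argument for (A1).

```python
def sum_identities_hold(N):
    """Compare (A1) and (A2) with brute-force sums for one value of N."""
    s1 = sum(range(1, N + 1))
    s2 = sum(n * n for n in range(1, N + 1))
    return 2 * s1 == N * (N + 1) and 6 * s2 == N * (N + 1) * (2 * N + 1)


def pair_sums(N):
    """Sums of the n-th and (N-n+1)-th summands for n = 1..N; all equal N + 1."""
    return {n + (N - n + 1) for n in range(1, N + 1)}


print(all(sum_identities_hold(N) for N in range(1, 500)))  # True
print(pair_sums(10))  # {11}
```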
Bibliography
[1] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, 1991.
[2] Kh. D. Ikramov and N. V. Saveleva, "Conditionally Definite Matrices," Journal of Mathematical Sciences, vol. 98, no. 1, pp. 1–50, 2000.