Floating Point Operations in Matrix-Vector Calculus
(Version 1.3)

Raphael Hunger

Technical Report
2007

Technische Universität München, Associate Institute for Signal Processing
Univ.-Prof. Dr.-Ing. Wolfgang Utschick

History

Version 1.00: October 2005
- Initial version

Version 1.01: 2006
- Rewrite of sesquilinear form with a reduced amount of FLOPs
- Several typos fixed concerning the number of FLOPs required for the Cholesky decomposition

Version 1.2: November 2006
- Conditions for the existence of the standard $\mathbf{L}\mathbf{L}^H$ Cholesky decomposition specified (positive definiteness)
- Outer product version of $\mathbf{L}\mathbf{L}^H$ Cholesky decomposition removed
- FLOPs required in Gaxpy version of $\mathbf{L}\mathbf{L}^H$ Cholesky decomposition updated
- $\mathbf{L}_1\mathbf{D}\mathbf{L}_1^H$ Cholesky decomposition added
- Matrix-matrix product $\mathbf{L}\mathbf{C}$ added with $\mathbf{L}$ triangular
- Matrix-matrix product $\mathbf{L}^{-1}\mathbf{C}$ added with $\mathbf{L}$ triangular and $\mathbf{L}^{-1}$ not known a priori
- Inverse $\mathbf{L}_1^{-1}$ of a lower triangular matrix with ones on the main diagonal added

Version 1.3: September 2007
- First globally accessible document version

To Do: (unknown when)
- QR-Decomposition
- LR-Decomposition

Please report any bug and suggestion to hunger@tum.de

Contents

1. Introduction
2. Flop Counting
  2.1 Matrix Products
    2.1.1 Scalar-Vector Multiplication $\alpha\mathbf{a}$
    2.1.2 Scalar-Matrix Multiplication $\alpha\mathbf{A}$
    2.1.3 Inner Product $\mathbf{a}^H\mathbf{b}$ of Two Vectors
    2.1.4 Outer Product $\mathbf{a}\mathbf{c}^H$ of Two Vectors
    2.1.5 Matrix-Vector Product $\mathbf{A}\mathbf{b}$
    2.1.6 Matrix-Matrix Product $\mathbf{A}\mathbf{C}$
    2.1.7 Matrix Diagonal Matrix Product $\mathbf{A}\mathbf{D}$
    2.1.8 Matrix-Matrix Product $\mathbf{L}\mathbf{D}$
    2.1.9 Matrix-Matrix Product $\mathbf{L}_1\mathbf{D}$
    2.1.10 Matrix-Matrix Product $\mathbf{L}\mathbf{C}$ with $\mathbf{L}$ Lower Triangular
    2.1.11 Gram $\mathbf{A}^H\mathbf{A}$ of $\mathbf{A}$
    2.1.12 Squared Frobenius Norm $\|\mathbf{A}\|_F^2 = \operatorname{tr}(\mathbf{A}^H\mathbf{A})$
    2.1.13 Sesquilinear Form $\mathbf{c}^H\mathbf{A}\mathbf{b}$
    2.1.14 Hermitian Form $\mathbf{a}^H\mathbf{R}\mathbf{a}$
    2.1.15 Gram $\mathbf{L}^H\mathbf{L}$ of a Lower Triangular Matrix $\mathbf{L}$
  2.2 Decompositions
    2.2.1 Cholesky Decomposition $\mathbf{R} = \mathbf{L}\mathbf{L}^H$ (Gaxpy Version)
    2.2.2 Cholesky Decomposition $\mathbf{R} = \mathbf{L}_1\mathbf{D}\mathbf{L}_1^H$
  2.3 Inverses of Matrices
    2.3.1 Inverse $\mathbf{L}^{-1}$ of a Lower Triangular Matrix $\mathbf{L}$
    2.3.2 Inverse $\mathbf{L}_1^{-1}$ of a Lower Triangular Matrix $\mathbf{L}_1$ with Ones on the Main Diagonal
    2.3.3 Inverse $\mathbf{R}^{-1}$ of a Positive Definite Matrix $\mathbf{R}$
  2.4 Solving Systems of Equations
    2.4.1 Product $\mathbf{L}^{-1}\mathbf{C}$ with $\mathbf{L}^{-1}$ not known a priori
3. Overview
Appendix
Bibliography

1. Introduction

For the design of efficient and low-complexity algorithms in many signal-processing tasks, a detailed analysis of the required number of floating-point operations (FLOPs) is often inevitable. Most frequently, matrix operations are involved, such as matrix-matrix products and inverses of matrices. Structures like Hermiteness or triangularity, for example, can be exploited to reduce the number of needed FLOPs and will be discussed here. In this technical report, we derive expressions for the number of multiplications and summations that a majority of signal processing algorithms in mobile communications bring with them.

Acknowledgments:
The author would like to thank Dipl.-Ing. David A. Schmidt and Dipl.-Ing. Guido Dietl for the fruitful discussions on this topic.
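
2. Flop Counting

In this chapter, we offer expressions for the number of complex multiplications and summations required for several matrix-vector operations. A floating-point operation (FLOP) is assumed to be either a complex multiplication or a complex summation here, despite the fact that a complex multiplication requires 4 real multiplications and 2 real summations whereas a complex summation consists of only 2 real summations, making a multiplication more expensive than a summation. However, we count each operation as one FLOP.

Throughout this report, we assume $\alpha \in \mathbb{C}$ to be a scalar, the vectors $\mathbf{a} \in \mathbb{C}^N$, $\mathbf{b} \in \mathbb{C}^N$, and $\mathbf{c} \in \mathbb{C}^M$ to have dimension $N$, $N$, and $M$, respectively. The matrices $\mathbf{A} \in \mathbb{C}^{M \times N}$, $\mathbf{B} \in \mathbb{C}^{N \times N}$, and $\mathbf{C} \in \mathbb{C}^{N \times L}$ are assumed to have no special structure, whereas $\mathbf{R} = \mathbf{R}^H \in \mathbb{C}^{N \times N}$ is Hermitian and $\mathbf{D} = \operatorname{diag}\{d_\ell\}_{\ell=1}^{N} \in \mathbb{C}^{N \times N}$ is diagonal. $\mathbf{L}$ is a lower triangular $N \times N$ matrix, and $\mathbf{e}_n$ denotes the unit vector with a 1 in the $n$-th row and zeros elsewhere. Its dimensionality is chosen such that the respective matrix-vector product exists. Finally, $[\mathbf{A}]_{a,b}$ denotes the element in the $a$-th row and $b$-th column of $\mathbf{A}$, and $[\mathbf{A}]_{a:b,c:d}$ selects the submatrix of $\mathbf{A}$ consisting of rows $a$ to $b$ and columns $c$ to $d$. $\mathbf{0}_{a \times b}$ is the $a \times b$ zero matrix. Transposition, Hermitian transposition, conjugate, and real-part operator are denoted by $(\cdot)^T$, $(\cdot)^H$, $(\cdot)^*$, and $\Re\{\cdot\}$, respectively, and require no FLOP.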

2.1 Matrix Products

Frequently arising matrix products and the amount of FLOPs required for their computation will be discussed in this section.

2.1.1 Scalar-Vector Multiplication $\alpha\mathbf{a}$

A simple multiplication $\alpha\mathbf{a}$ of a vector $\mathbf{a}$ with a scalar $\alpha$ requires $N$ multiplications and no summation.

2.1.2 Scalar-Matrix Multiplication $\alpha\mathbf{A}$

Extending the result from Subsection 2.1.1 to a scalar matrix multiplication $\alpha\mathbf{A}$ requires $NM$ multiplications and again no summation.

2.1.3 Inner Product $\mathbf{a}^H\mathbf{b}$ of Two Vectors

An inner product $\mathbf{a}^H\mathbf{b}$ requires $N$ multiplications and $N-1$ summations, i.e., $2N-1$ FLOPs.

2.1.4 Outer Product $\mathbf{a}\mathbf{c}^H$ of Two Vectors

An outer product $\mathbf{a}\mathbf{c}^H$ requires $NM$ multiplications and no summation.
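
2.1.5 Matrix-Vector Product $\mathbf{A}\mathbf{b}$

Computing $\mathbf{A}\mathbf{b}$ corresponds to applying the inner product rule $\mathbf{a}_i^H\mathbf{b}$ from Subsection 2.1.3 $M$ times. Obviously, $1 \le i \le M$ and $\mathbf{a}_i^H$ represents the $i$-th row of $\mathbf{A}$. Hence, its computation costs $MN$ multiplications and $M(N-1)$ summations, i.e., $2MN - M$ FLOPs.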

2.1.6 Matrix-Matrix Product $\mathbf{A}\mathbf{C}$

Repeated application of the matrix-vector rule $\mathbf{A}\mathbf{c}_i$ from Subsection 2.1.5 with $\mathbf{c}_i$ being the $i$-th column of $\mathbf{C}$ yields the overall matrix-matrix product $\mathbf{A}\mathbf{C}$. Since $1 \le i \le L$, the matrix-matrix product has the $L$-fold complexity of the matrix-vector product. Thus, it needs $MNL$ multiplications and $ML(N-1)$ summations, altogether $2MNL - ML$ FLOPs.
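
As a sanity check on this count, the following Python sketch multiplies two small matrices with explicit loops and tallies complex multiplications and summations separately. The function name `matmul_with_flops` and the example matrices are illustrative assumptions and do not appear in the report.

```python
def matmul_with_flops(A, C):
    """Return (A C, multiplications, summations) for list-of-lists matrices."""
    M, N, L = len(A), len(C), len(C[0])
    mults = sums = 0
    P = [[0] * L for _ in range(M)]
    for i in range(M):
        for k in range(L):
            # First product of the inner product: nothing to add yet.
            acc = A[i][0] * C[0][k]
            mults += 1
            for n in range(1, N):
                acc += A[i][n] * C[n][k]   # one multiplication, one summation
                mults += 1
                sums += 1
            P[i][k] = acc
    return P, mults, sums


M, N, L = 3, 4, 2
A = [[complex(i + 1, n) for n in range(N)] for i in range(M)]
C = [[complex(n, k + 1) for k in range(L)] for n in range(N)]
P, mults, sums = matmul_with_flops(A, C)
print(mults, sums, mults + sums)
```

For $M=3$, $N=4$, $L=2$ this prints `24 18 42`, i.e., $MNL$ multiplications, $ML(N-1)$ summations, and $2MNL - ML$ FLOPs.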
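
2.1.7 Matrix Diagonal Matrix Product $\mathbf{A}\mathbf{D}$

If the right hand side matrix $\mathbf{D}$ of the matrix product $\mathbf{A}\mathbf{D}$ is diagonal, the computational load reduces to $M$ multiplications for each of the $N$ columns of $\mathbf{A}$, since the $n$-th column of $\mathbf{A}$ is scaled by the $n$-th main diagonal element of $\mathbf{D}$. Thus, $MN$ multiplications in total are required for the computation of $\mathbf{A}\mathbf{D}$; no summations are needed.

2.1.8 Matrix-Matrix Product $\mathbf{L}\mathbf{D}$

When multiplying a lower triangular matrix $\mathbf{L}$ by a diagonal matrix $\mathbf{D}$, column $n$ of the matrix product requires $N-n+1$ multiplications and no summations. With $n = 1,\ldots,N$, we get $\frac{1}{2}N^2 + \frac{1}{2}N$ multiplications.

2.1.9 Matrix-Matrix Product $\mathbf{L}_1\mathbf{D}$

When multiplying a lower triangular matrix $\mathbf{L}_1$ with ones on the main diagonal by a diagonal matrix $\mathbf{D}$, column $n$ of the matrix product requires $N-n$ multiplications and no summations. With $n = 1,\ldots,N$, we get $\frac{1}{2}N^2 - \frac{1}{2}N$ multiplications.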
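
2.1.10 Matrix-Matrix Product $\mathbf{L}\mathbf{C}$ with $\mathbf{L}$ Lower Triangular

Computing the product of a lower triangular matrix $\mathbf{L} \in \mathbb{C}^{N \times N}$ and $\mathbf{C} \in \mathbb{C}^{N \times L}$ is done column-wise. The $n$-th element in each column of $\mathbf{L}\mathbf{C}$ requires $n$ multiplications and $n-1$ summations, so the complete column needs $\sum_{n=1}^{N} n = \frac{N^2}{2} + \frac{N}{2}$ multiplications and $\sum_{n=1}^{N} (n-1) = \frac{N^2}{2} - \frac{N}{2}$ summations. The complete matrix-matrix product is obtained from computing $L$ columns. We have $\frac{N^2 L}{2} + \frac{NL}{2}$ multiplications and $\frac{N^2 L}{2} - \frac{NL}{2}$ summations, yielding a total amount of $N^2 L$ FLOPs.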

2.1.11 Gram $\mathbf{A}^H\mathbf{A}$ of $\mathbf{A}$

In contrast to the general matrix product from Subsection 2.1.6, we can make use of the Hermitian structure of the product $\mathbf{A}^H\mathbf{A} \in \mathbb{C}^{N \times N}$. Hence, the strictly lower triangular part of $\mathbf{A}^H\mathbf{A}$ need not be computed, since it corresponds to the Hermitian of the strictly upper triangular part. For this reason, we have to compute only the $N$ main diagonal entries of $\mathbf{A}^H\mathbf{A}$ and the $\frac{N^2 - N}{2}$ upper off-diagonal elements, so only $\frac{N^2 + N}{2}$ different entries have to be evaluated. Each element requires an inner product step from Subsection 2.1.3 costing $M$ multiplications and $M-1$ summations. Therefore, $\frac{1}{2}MN(N+1)$ multiplications and $\frac{1}{2}(M-1)N(N+1)$ summations are needed, making up a total amount of $MN^2 + MN - \frac{N^2}{2} - \frac{N}{2}$ FLOPs.
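
2.1.12 Squared Frobenius Norm $\|\mathbf{A}\|_F^2 = \operatorname{tr}(\mathbf{A}^H\mathbf{A})$

The squared Hilbert-Schmidt norm $\|\mathbf{A}\|_F^2$ follows from summing up the $MN$ squared entries from $\mathbf{A}$. We therefore have $MN$ multiplications and $MN-1$ summations, yielding a total of $2MN-1$ FLOPs.

2.1.13 Sesquilinear Form $\mathbf{c}^H\mathbf{A}\mathbf{b}$

The sesquilinear form $\mathbf{c}^H\mathbf{A}\mathbf{b}$ should be evaluated by computing the matrix-vector product $\mathbf{A}\mathbf{b}$ in a first step and then multiplying with the row vector $\mathbf{c}^H$ from the left hand side. The matrix-vector product requires $MN$ multiplications and $M(N-1)$ summations, whereas the inner product needs $M$ multiplications and $M-1$ summations. Altogether, $M(N+1)$ multiplications and $MN-1$ summations have to be computed for the sesquilinear form $\mathbf{c}^H\mathbf{A}\mathbf{b}$, yielding a total number of $2MN + M - 1$ FLOPs.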
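
2.1.14 Hermitian Form $\mathbf{a}^H\mathbf{R}\mathbf{a}$

With the Hermitian matrix $\mathbf{R} = \mathbf{R}^H$, the product $\mathbf{a}^H\mathbf{R}\mathbf{a}$ can be expressed as

$$
\begin{aligned}
\mathbf{a}^H\mathbf{R}\mathbf{a} &= \sum_{m=1}^{N}\sum_{n=1}^{N} \mathbf{a}^H\mathbf{e}_m\mathbf{e}_m^T\mathbf{R}\mathbf{e}_n\mathbf{e}_n^T\mathbf{a} \\
&= \sum_{m=1}^{N}\sum_{n=1}^{N} a_m^* a_n r_{m,n} \\
&= \sum_{m=1}^{N} |a_m|^2 r_{m,m} + 2\sum_{m=1}^{N-1}\sum_{n=m+1}^{N} \Re\{a_m^* a_n r_{m,n}\},
\end{aligned}
\tag{2.1}
$$

with $a_m = [\mathbf{a}]_{m,1}$ and $r_{m,n} = [\mathbf{R}]_{m,n}$. The first sum accumulates the weighted main diagonal entries and requires $2N$ multiplications and $N-1$ summations.$^1$ The second part of (2.1) accumulates all weighted off-diagonal entries from $\mathbf{R}$. The last two summations sum up $\frac{N(N-1)}{2}$ terms.$^2$ Consequently, the second part of (2.1) requires $\frac{N(N-1)}{2} - 1$ summations and $N(N-1)$ products.$^3$ Finally, the two parts have to be added, accounting for an additional summation and yielding an overall amount of $N^2 + N$ products and $\frac{1}{2}N^2 + \frac{1}{2}N - 1$ summations, corresponding to $\frac{3}{2}N^2 + \frac{3}{2}N - 1$ FLOPs.$^4$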
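
The last line of (2.1) can be checked numerically. The sketch below is an illustration only: the function names and the example data are assumptions, and plain Python complex numbers stand in for whatever arithmetic a real implementation would use. It evaluates the form from the main diagonal and the strictly upper triangle of $\mathbf{R}$ and compares it with the full double sum.

```python
def hermitian_form(a, R):
    """Evaluate a^H R a from the diagonal and strictly upper triangle of R."""
    N = len(a)
    # Weighted main-diagonal entries: |a_m|^2 r_{m,m} (r_{m,m} is real).
    diag = sum(abs(a[m]) ** 2 * R[m][m].real for m in range(N))
    # Off-diagonal pairs: real part of a_m^* a_n r_{m,n} for m < n.
    off = sum((a[m].conjugate() * a[n] * R[m][n]).real
              for m in range(N - 1) for n in range(m + 1, N))
    return diag + 2 * off


def hermitian_form_direct(a, R):
    """Reference: full double sum over all N^2 entries of R."""
    N = len(a)
    return sum(a[m].conjugate() * R[m][n] * a[n]
               for m in range(N) for n in range(N))


a = [1 + 2j, 3 - 1j, -2 + 0.5j]
R = [[2 + 0j, 1 + 1j, 0.5j],
     [1 - 1j, 3 + 0j, 2 - 1j],
     [-0.5j, 2 + 1j, 4 + 0j]]          # Hermitian example matrix
print(hermitian_form(a, R), hermitian_form_direct(a, R))
```

Both evaluations agree up to rounding, and the direct sum has a vanishing imaginary part, as expected for a Hermitian $\mathbf{R}$.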
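
$^1$ We do not exploit the fact that only real-valued summands are accumulated as we only account for complex flops.
$^2$ $\sum_{m=1}^{N-1}\sum_{n=m+1}^{N} 1 = \sum_{m=1}^{N-1}(N-m) = N(N-1) - \sum_{m=1}^{N-1} m = N(N-1) - \frac{N(N-1)}{2} = \frac{N(N-1)}{2}$. We made use of (A1) in the Appendix for the computation of the last sum accumulating subsequent integers.
$^3$ The scaling with the factor 2 does not require a FLOP, as it can be implemented by a simple bit shift.
$^4$ Clearly, if $N=1$, we have to subtract one summation from the calculation since no off-diagonal entries exist.

2.1.15 Gram $\mathbf{L}^H\mathbf{L}$ of a Lower Triangular Matrix $\mathbf{L}$

During the computation of the inverse of a positive definite matrix, the Gram matrix of a lower triangular matrix occurs when Cholesky decomposition is applied. Again, we make use of the Hermitian structure of the Gram $\mathbf{L}^H\mathbf{L}$, so only the main diagonal entries and the upper right off-diagonal entries of the product have to be evaluated. The $a$-th main-diagonal entry can be expressed as

$$
[\mathbf{L}^H\mathbf{L}]_{a,a} = \sum_{n=a}^{N} |\ell_{n,a}|^2, \tag{2.2}
$$

with $\ell_{n,a} = [\mathbf{L}]_{n,a}$, requiring $N-a+1$ multiplications and $N-a$ summations. Hence, all main diagonal elements need $\sum_{n=1}^{N}(N-n+1) = \frac{1}{2}N^2 + \frac{1}{2}N$ multiplications and $\sum_{n=1}^{N}(N-n) = \frac{1}{2}N^2 - \frac{1}{2}N$ summations.

The upper right off-diagonal entry $[\mathbf{L}^H\mathbf{L}]_{a,b}$ in row $a$ and column $b$ with $a < b$ reads as

$$
[\mathbf{L}^H\mathbf{L}]_{a,b} = \sum_{n=b}^{N} \ell_{n,a}^*\,\ell_{n,b}, \tag{2.3}
$$

again accounting for $N-b+1$ multiplications and $N-b$ summations. These two expressions have to be summed up over all $1 \le a \le N-1$ and $a+1 \le b \le N$, and for the number of multiplications, we find

$$
\begin{aligned}
\sum_{a=1}^{N-1}\sum_{b=a+1}^{N} (N-b+1) &= \sum_{a=1}^{N-1}\left[(N-a)(N+1) - \sum_{b=a+1}^{N} b\right] \\
&= \sum_{a=1}^{N-1}\left[N^2 + N - a(N+1) - \frac{N(N+1) - a(a+1)}{2}\right] \\
&= \sum_{a=1}^{N-1}\left[\frac{N^2+N}{2} + \frac{a^2}{2} - a\left(N + \frac{1}{2}\right)\right] \\
&= \frac{(N-1)(N+1)N}{2} + \frac{(N-1)N(2N-1)}{2\cdot 6} - \frac{1}{2}N(N-1)\left(N + \frac{1}{2}\right) \\
&= \frac{1}{6}N^3 - \frac{1}{6}N.
\end{aligned}
\tag{2.4}
$$

Again, we made use of (A1) for the sum of subsequent integers and (A2) for the sum of subsequent squared integers. For the number of summations, we evaluate

$$
\sum_{a=1}^{N-1}\sum_{b=a+1}^{N} (N-b) = \frac{1}{6}N^3 - \frac{1}{2}N^2 + \frac{1}{3}N. \tag{2.5}
$$

Computing all necessary elements of the Gram $\mathbf{L}^H\mathbf{L}$ thereby requires $\frac{1}{6}N^3 + \frac{1}{2}N^2 + \frac{1}{3}N$ multiplications and $\frac{1}{6}N^3 - \frac{1}{6}N$ summations. Altogether, $\frac{1}{3}N^3 + \frac{1}{2}N^2 + \frac{1}{6}N$ FLOPs result. The same result of course holds for the Gram of two upper triangular matrices.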
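
The closed-form totals can be cross-checked by brute force. The following sketch (the function name is an assumption made for illustration) adds up the per-entry costs from (2.2) and (2.3) over the main diagonal and the upper triangle and compares them with the polynomials stated above.

```python
def gram_lower_tri_counts(N):
    """Sum the per-entry costs of [L^H L]_{a,b} over all 1 <= a <= b <= N."""
    mults = sums = 0
    for a in range(1, N + 1):
        for b in range(a, N + 1):
            mults += N - b + 1     # products in the sum from n = b to N
            sums += N - b          # additions needed to accumulate them
    return mults, sums


for N in range(1, 8):
    mults, sums = gram_lower_tri_counts(N)
    # Integer form of N^3/6 + N^2/2 + N/3 and N^3/6 - N/6
    assert 6 * mults == N**3 + 3 * N**2 + 2 * N
    assert 6 * sums == N**3 - N
print("closed forms confirmed for N = 1..7")
```

The comparison is done in integer arithmetic (both sides multiplied by 6) to avoid rounding issues.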

2.2 Decompositions

2.2.1 Cholesky Decomposition $\mathbf{R} = \mathbf{L}\mathbf{L}^H$ (Gaxpy Version)

Instead of computing the inverse of a positive definite matrix $\mathbf{R}$ directly, it is more efficient to start with the Cholesky decomposition $\mathbf{R} = \mathbf{L}\mathbf{L}^H$ and then invert the lower triangular matrix $\mathbf{L}$ and compute its Gram. In this section, we count the number of FLOPs necessary for the Cholesky decomposition.
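
The implementation of the Generalized Ax plus y (Gaxpy) version of the Cholesky decomposition, which overwrites the lower triangular part of the positive definite matrix $\mathbf{R}$, is listed in Algorithm 2.1, see [1]. Note that $\mathbf{R}$ needs to be positive definite for the $\mathbf{L}\mathbf{L}^H$ decomposition!

Algorithm 2.1 Algorithm for the Gaxpy version of the Cholesky decomposition.
1: $[\mathbf{R}]_{1:N,1} = \overbrace{\dfrac{[\mathbf{R}]_{1:N,1}}{\sqrt{[\mathbf{R}]_{1,1}}}}^{\in\,\mathbb{C}^{N}}$
2: for $n = 2$ to $N$ do
3:    $[\mathbf{R}]_{n:N,n} = \underbrace{[\mathbf{R}]_{n:N,n}}_{\in\,\mathbb{C}^{N-n+1}} - \underbrace{[\mathbf{R}]_{n:N,1:n-1}}_{\in\,\mathbb{C}^{(N-n+1)\times(n-1)}}\;\underbrace{[\mathbf{R}]^H_{n,1:n-1}}_{\in\,\mathbb{C}^{n-1}}$
4:    $[\mathbf{R}]_{n:N,n} = \overbrace{\dfrac{[\mathbf{R}]_{n:N,n}}{\sqrt{[\mathbf{R}]_{n,n}}}}^{\in\,\mathbb{C}^{N-n+1}}$
5: end for
6: $\mathbf{L} = \operatorname{tril}(\mathbf{R})$ {lower triangular part of overwritten $\mathbf{R}$}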
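
A runnable sketch of Algorithm 2.1 is given below in plain Python with 0-based indices. It is an illustration under stated assumptions, not the report's reference implementation: the function name, the list-of-lists storage, the counters, and the example matrix are all hypothetical. Following the counting convention of this subsection, the division of the pivot by its own square root is not counted, and square roots are tallied separately.

```python
import math


def cholesky_gaxpy(R):
    """Gaxpy Cholesky of a Hermitian positive definite matrix (list of lists).

    Returns (L, mults, sums, roots) such that R = L L^H.
    """
    N = len(R)
    R = [[complex(x) for x in row] for row in R]    # work on a copy
    mults = sums = roots = 0
    for n in range(N):                              # 0-based column index
        if n > 0:
            # Line 3: subtract the matrix-vector product from column n.
            for r in range(n, N):
                acc = R[r][0] * R[n][0].conjugate()
                mults += 1
                for k in range(1, n):
                    acc += R[r][k] * R[n][k].conjugate()
                    mults += 1
                    sums += 1
                R[r][n] -= acc
                sums += 1
        # Lines 1 and 4: scale the column by the square root of the pivot.
        pivot = math.sqrt(R[n][n].real)
        roots += 1
        R[n][n] = complex(pivot)                    # x / sqrt(x) = sqrt(x), not counted
        for r in range(n + 1, N):
            R[r][n] /= pivot
            mults += 1
    # Line 6: keep the lower triangular part only.
    L = [[R[r][c] if c <= r else 0j for c in range(N)] for r in range(N)]
    return L, mults, sums, roots


R_example = [[4, 2 + 2j, 0],
             [2 - 2j, 6, 1j],
             [0, -1j, 3]]                           # Hermitian positive definite
L_fac, mults, sums, roots = cholesky_gaxpy(R_example)
print(mults, sums, roots)
```

For this $3 \times 3$ example the counters give 7 multiplications, 4 summations, and 3 square roots, which agrees with the totals $\frac{1}{6}N^3 + \frac{1}{2}N^2 - \frac{2}{3}N$ and $\frac{1}{6}N^3 - \frac{1}{6}N$ of this subsection for $N = 3$.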
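
The computation of the first column of $\mathbf{L}$ in Line 1 of Algorithm 2.1 requires $N-1$ multiplications$^5$, a single square-root operation, and no summations. Column $n > 1$ takes a matrix-vector product of dimension $(N-n+1) \times (n-1)$ which is subtracted from another $(N-n+1)$-dimensional vector involving $N-n+1$ summations, see Line 3. Finally, $N-n$ multiplications$^6$ and a single square-root operation are necessary in Line 4. In short, row $n$ with $1 < n \le N$ needs $-n^2 + n(N+1) - 1$ multiplications, $-n^2 + n(N+2) - N - 1$ summations (see Subsection 2.1.5), and one square-root operation, which we classify as an additional FLOP. Summing up the multiplications for rows $2 \le n \le N$, we obtain

$$
\begin{aligned}
\sum_{n=2}^{N}\left(-n^2 + n(N+1) - 1\right) &= (N+1)\frac{N(N+1) - 2}{2} - \frac{N(N+1)(2N+1) - 6}{6} - (N-1) \\
&= \frac{N^3 + 2N^2 - N}{2} - \frac{2N^3 + 3N^2 + N}{6} - (N-1) \\
&= \frac{1}{6}N^3 + \frac{1}{2}N^2 - \frac{5}{3}N + 1.
\end{aligned}
\tag{2.6}
$$

The number of summations for rows $2 \le n \le N$ reads as

$$
\begin{aligned}
\sum_{n=2}^{N}\left(-n^2 + n(N+2) - N - 1\right) &= -(N+1)(N-1) + (N+2)\frac{N(N+1) - 2}{2} - \frac{N(N+1)(2N+1) - 6}{6} \\
&= -N^2 + 1 + \frac{N^3 + 3N^2 - 4}{2} - \frac{2N^3 + 3N^2 + N - 6}{6} \\
&= \frac{1}{6}N^3 - \frac{1}{6}N,
\end{aligned}
\tag{2.7}
$$

and finally, $N-1$ square-root operations are needed for the $N-1$ rows. Including the $N-1$ multiplications for column $n = 1$ and the additional square-root operation, $\frac{1}{6}N^3 + \frac{1}{2}N^2 - \frac{2}{3}N$ multiplications, $\frac{1}{6}N^3 - \frac{1}{6}N$ summations, and $N$ square-root operations occur, $\frac{1}{3}N^3 + \frac{1}{2}N^2 + \frac{1}{6}N$ FLOPs in total.

$^5$ The first element need not be computed twice, since the result of the division is the square root of the denominator.
$^6$ Again, the first element need not be computed twice, since the result of the division is the square root of the denominator.

Algorithm 2.2 Algorithm for the Cholesky decomposition $\mathbf{L}_1\mathbf{D}\mathbf{L}_1^H$.
1: $[\mathbf{R}]_{2:N,1} = \overbrace{\dfrac{[\mathbf{R}]_{2:N,1}}{[\mathbf{R}]_{1,1}}}^{\in\,\mathbb{C}^{N-1}}$
2: for $n = 2$ to $N$ do
3:    for $i = 1$ to $n-1$ do
4:       $[\mathbf{v}]_i = \begin{cases} [\mathbf{R}]_{1,n} & \text{if } i = 1 \\ [\mathbf{R}]_{i,i}\,[\mathbf{R}]^*_{n,i} & \text{if } i \neq 1 \end{cases}$
5:    end for
6:    $[\mathbf{v}]_n = [\mathbf{R}]_{n,n} - \underbrace{[\mathbf{R}]_{n,1:n-1}}_{\in\,\mathbb{C}^{1\times(n-1)}}\;\underbrace{[\mathbf{v}]_{1:n-1}}_{\in\,\mathbb{C}^{n-1}}$
7:    $[\mathbf{R}]_{n,n} = [\mathbf{v}]_n$
8:    $[\mathbf{R}]_{n+1:N,n} = \dfrac{\overbrace{[\mathbf{R}]_{n+1:N,n}}^{\in\,\mathbb{C}^{N-n}} - \overbrace{[\mathbf{R}]_{n+1:N,1:n-1}}^{\in\,\mathbb{C}^{(N-n)\times(n-1)}}\;\overbrace{[\mathbf{v}]_{1:n-1}}^{\in\,\mathbb{C}^{n-1}}}{[\mathbf{v}]_n}$
9: end for
10: $\mathbf{D} = \operatorname{diag}(\operatorname{diag}(\mathbf{R}))$ (return diagonal $\mathbf{D}$)
11: $\mathbf{L}_1 = \operatorname{tril}(\mathbf{R})$ with ones on the main diagonal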

2.2.2 Cholesky Decomposition $\mathbf{R} = \mathbf{L}_1\mathbf{D}\mathbf{L}_1^H$

The main advantage of the $\mathbf{L}_1\mathbf{D}\mathbf{L}_1^H$ decomposition compared to the standard $\mathbf{L}\mathbf{L}^H$ decomposition is that no square-root operations are needed, which may require more than one FLOP depending on the given hardware platform. Another benefit of the $\mathbf{L}_1\mathbf{D}\mathbf{L}_1^H$ decomposition is that it does not require a positive definite matrix $\mathbf{R}$. The only two conditions for the unique existence are that $\mathbf{R}$ is Hermitian and that all but the last principal minor (i.e., the determinant) of $\mathbf{R}$ are different from zero [2]. Hence, $\mathbf{R}$ may also be rank deficient to a certain degree. If $\mathbf{R}$ is not positive semidefinite, then $\mathbf{D}$ may contain negative main diagonal entries.

The outcome of the decomposition is a lower triangular matrix $\mathbf{L}_1$ with ones on the main diagonal and a diagonal matrix $\mathbf{D}$.

Algorithm 2.2 overwrites the strictly lower left part of the matrix $\mathbf{R}$ with the strictly lower part of $\mathbf{L}_1$ (i.e., without the ones on the main diagonal) and overwrites the main diagonal of $\mathbf{R}$ with the main diagonal of $\mathbf{D}$. It is taken from [1] and slightly modified, such that it is also applicable to complex matrices (see the conjugate in Line 4) and no existing scalar is re-computed (see the case distinction in Line 4 for $i = 1$).

Line 1 needs $N-1$ multiplications. Lines 3 to 5 require $n-2$ multiplications and are executed for $n = 2,\ldots,N$, yielding $\sum_{n=2}^{N}(n-2) = \frac{N^2 - 3N + 2}{2}$ multiplications. Line 6 takes $n-1$ multiplications and $n-1$ summations, again with $n = 2,\ldots,N$, yielding $\sum_{n=2}^{N}(n-1) = \frac{N^2 - N}{2}$ multiplications and the same amount of summations. Line 7 does not require any FLOP. In Line 8, the matrix-vector product needs $(N-n)(n-1)$ multiplications, and additional $N-n$ multiplications arise when the complete numerator is divided by the denominator. Hence, we have $Nn - n^2$ multiplications. For $n = 2,\ldots,N$, we get $\sum_{n=2}^{N}(Nn - n^2) = \frac{1}{6}N^3 - \frac{7}{6}N + 1$ multiplications. The number of summations in Line 8 is $(N-n)(n-2)$ for the matrix-vector product and $N-n$ for the subtraction in the numerator. Together, we have $-n^2 + n(N+1) - N$ summations. With $n = 2,\ldots,N$, we get $\sum_{n=2}^{N}\left[-n^2 + n(N+1) - N\right] = \frac{1}{6}N^3 - \frac{1}{2}N^2 + \frac{1}{3}N$ summations.

Summing up, this algorithm requires $\frac{1}{6}N^3 + N^2 - \frac{13}{6}N + 1$ multiplications and $\frac{1}{6}N^3 - \frac{1}{6}N$ summations, yielding a total amount of $\frac{1}{3}N^3 + N^2 - \frac{7}{3}N + 1$ FLOPs. (Note that this formula is also valid for $N = 1$.)
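
The line-by-line count can be reproduced with a small instrumented sketch of Algorithm 2.2. Everything specific to the code is an assumption for illustration: the function name, the 0-based indexing, the list-of-lists storage, and the example matrix. Divisions are counted as multiplications, as in the text.

```python
def ldl_decomposition(R):
    """L1 D L1^H decomposition following Algorithm 2.2 (0-based indices).

    Returns (L1, d, mults, sums); d holds the main diagonal of D.
    """
    N = len(R)
    R = [[complex(x) for x in row] for row in R]    # work on a copy
    mults = sums = 0
    for r in range(1, N):                           # Line 1
        R[r][0] /= R[0][0]
        mults += 1
    for n in range(1, N):                           # Line 2
        v = [0j] * (n + 1)
        v[0] = R[0][n]                              # Line 4, i = 1: reuse upper triangle
        for i in range(1, n):                       # Line 4, i != 1
            v[i] = R[i][i] * R[n][i].conjugate()
            mults += 1
        acc = R[n][0] * v[0]                        # Line 6: row times v
        mults += 1
        for i in range(1, n):
            acc += R[n][i] * v[i]
            mults += 1
            sums += 1
        v[n] = R[n][n] - acc
        sums += 1
        R[n][n] = v[n]                              # Line 7
        for r in range(n + 1, N):                   # Line 8
            acc = R[r][0] * v[0]
            mults += 1
            for i in range(1, n):
                acc += R[r][i] * v[i]
                mults += 1
                sums += 1
            R[r][n] = (R[r][n] - acc) / v[n]
            sums += 1
            mults += 1
    d = [R[n][n] for n in range(N)]                 # Line 10
    L1 = [[R[r][c] if c < r else (1 + 0j if c == r else 0j)
           for c in range(N)] for r in range(N)]    # Line 11
    return L1, d, mults, sums


R_example = [[4, 2 + 2j, 0],
             [2 - 2j, 6, 1j],
             [0, -1j, 3]]                           # Hermitian example matrix
L1, d, mults, sums = ldl_decomposition(R_example)
print(mults, sums)
```

For $N = 3$ the counters yield 8 multiplications and 4 summations, consistent with $\frac{1}{6}N^3 + N^2 - \frac{13}{6}N + 1$ and $\frac{1}{6}N^3 - \frac{1}{6}N$.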
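
2.3 Inverses of Matrices

2.3.1 Inverse $\mathbf{L}^{-1}$ of a Lower Triangular Matrix $\mathbf{L}$

Let $\mathbf{X} = [\mathbf{x}_1,\ldots,\mathbf{x}_N] = \mathbf{L}^{-1}$ denote the inverse of a lower triangular matrix $\mathbf{L}$. Then, $\mathbf{X}$ is again lower triangular, which means that $[\mathbf{X}]_{b,n} = 0$ for $b < n$. The following equation holds:

$$
\mathbf{L}\mathbf{x}_n = \mathbf{e}_n. \tag{2.8}
$$

Via forward substitution, the above system can easily be solved. Row $b$ ($n \le b \le N$) from (2.8) can be expressed as

$$
\sum_{a=n}^{b} \ell_{b,a}\,x_{a,n} = \delta_{b,n}, \tag{2.9}
$$

with $\delta_{b,n}$ denoting the Kronecker delta which vanishes for $b \neq n$, and $x_{a,n} = [\mathbf{X}]_{a,n} = [\mathbf{x}_n]_{a,1}$. Starting from $b = 1$, the $x_{b,n}$ are computed successively, and we find

$$
x_{b,n} = -\frac{1}{\ell_{b,b}}\left[\sum_{a=n}^{b-1} \ell_{b,a}\,x_{a,n} - \delta_{b,n}\right], \tag{2.10}
$$

with all $x_{a,n}$, $n \le a \le b-1$, having been computed in previous steps. Hence, if $n = b$, $x_{n,n} = \frac{1}{\ell_{n,n}}$ and a single multiplication$^7$ is required; no summations are needed. For $b > n$, $b-n+1$ multiplications and $b-n-1$ summations are required, as the Kronecker delta vanishes. All main diagonal entries can be computed by means of $N$ multiplications. The lower left off-diagonal entries require

$$
\begin{aligned}
\sum_{n=1}^{N-1}\sum_{b=n+1}^{N} (b-n+1) &= \sum_{n=1}^{N-1}\left[(1-n)(N-n) + \sum_{b=n+1}^{N} b\right] \\
&= \sum_{n=1}^{N-1}\left[N + n^2 - n(N+1) + \frac{N^2 + N - n^2 - n}{2}\right] \\
&= \sum_{n=1}^{N-1}\left[\frac{N^2}{2} + \frac{3N}{2} + \frac{n^2}{2} - n\left(N + \frac{3}{2}\right)\right] \\
&= (N-1)\frac{N}{2}(N+3) + \frac{(N-1)N(2N-1)}{2\cdot 6} - \frac{(N-1)N}{2}\left(N + \frac{3}{2}\right) \\
&= \frac{1}{6}N^3 + \frac{1}{2}N^2 - \frac{2}{3}N
\end{aligned}
\tag{2.11}
$$

multiplications, and

$$
\sum_{n=1}^{N-1}\sum_{b=n+1}^{N} (b-n-1) = \frac{1}{6}N^3 - \frac{1}{2}N^2 + \frac{1}{3}N \tag{2.12}
$$

summations. Including the $N$ multiplications for the main-diagonal entries, $\frac{1}{6}N^3 + \frac{1}{2}N^2 + \frac{1}{3}N$ multiplications and $\frac{1}{6}N^3 - \frac{1}{2}N^2 + \frac{1}{3}N$ summations have to be implemented, yielding a total amount of $\frac{1}{3}N^3 + \frac{2}{3}N$ FLOPs.

$^7$ Actually, it is a division rather than a multiplication.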
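
The forward substitution in (2.10) can be sketched as follows. The function name, the 0-based indexing, and the example matrix are assumptions made for this illustration; divisions are counted as multiplications, in line with footnote 7.

```python
def invert_lower_triangular(Lmat):
    """Invert a lower triangular matrix column by column via (2.10).

    Returns (X, mults, sums) with Lmat X = I.
    """
    N = len(Lmat)
    X = [[0j] * N for _ in range(N)]
    mults = sums = 0
    for n in range(N):                       # column n solves L x_n = e_n
        X[n][n] = 1 / Lmat[n][n]             # main diagonal: one division
        mults += 1
        for b in range(n + 1, N):            # entries below the diagonal
            acc = Lmat[b][n] * X[n][n]
            mults += 1
            for a in range(n + 1, b):
                acc += Lmat[b][a] * X[a][n]
                mults += 1
                sums += 1
            X[b][n] = -acc / Lmat[b][b]      # Kronecker delta vanishes here
            mults += 1
    return X, mults, sums


L_example = [[2 + 0j, 0, 0],
             [1 - 1j, 2, 0],
             [0.5j, -1j, 1.5]]
X, mults, sums = invert_lower_triangular(L_example)
print(mults, sums, mults + sums)
```

For $N = 3$ this gives 10 multiplications and 1 summation, i.e., $\frac{1}{3}N^3 + \frac{2}{3}N = 11$ FLOPs.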
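
2.3.2 Inverse $\mathbf{L}_1^{-1}$ of a Lower Triangular Matrix $\mathbf{L}_1$ with Ones on the Main Diagonal

The inverse of a lower triangular matrix $\mathbf{L}_1$ turns out to require $N^2$ FLOPs less than the inverse of $\mathbf{L}$ with arbitrary nonzero diagonal elements. Let $\mathbf{X}$ denote the inverse of $\mathbf{L}_1$. Clearly, $\mathbf{X}$ is again a lower triangular matrix with ones on the main diagonal. We can exploit this fact in order to compute only the unknown entries.

The $m$-th row and $n$-th column of the system of equations $\mathbf{L}_1\mathbf{X} = \mathbf{I}_N$ with $m \ge n+1$ reads as$^8$

$$
l_{m,n} + \sum_{\substack{i=n+1 \\ i \le m-1}}^{m-1} l_{m,i}\,x_{i,n} + x_{m,n} = 0,
$$

or, equivalently,

$$
x_{m,n} = -\left[\,l_{m,n} + \sum_{\substack{i=n+1 \\ i \le m-1}}^{m-1} l_{m,i}\,x_{i,n}\right].
$$

Hence, $\mathbf{X}$ is computed via forward substitution. To compute $x_{m,n}$, we need $m-n-1$ multiplications and $m-n-1$ summations. Remember that $m \ge n+1$. The total number of multiplications/summations is obtained from

$$
\sum_{n=1}^{N-1}\sum_{m=n+1}^{N} (m-n-1) = \frac{1}{6}N^3 - \frac{1}{2}N^2 + \frac{1}{3}N. \tag{2.13}
$$

Summing up, $\frac{1}{3}N^3 - N^2 + \frac{2}{3}N$ FLOPs are needed.

$^8$ We only have to consider $m \ge n+1$, since the equations resulting from $m < n+1$ are automatically fulfilled due to the structure of $\mathbf{L}_1$ and $\mathbf{X}$.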

2.3.3 Inverse $\mathbf{R}^{-1}$ of a Positive Definite Matrix $\mathbf{R}$

The inverse of a matrix can, for example, be computed via Gaussian elimination [1]. However, this approach is computationally expensive and does not exploit the Hermitian structure of $\mathbf{R}$. Instead, it is more efficient to start with the Cholesky decomposition of $\mathbf{R} = \mathbf{L}\mathbf{L}^H$ (see Subsection 2.2.1), invert the lower triangular matrix $\mathbf{L}$ (see Subsection 2.3.1), and then build the Gram $\mathbf{L}^{-H}\mathbf{L}^{-1}$ of $\mathbf{L}^{-1}$ (see Subsection 2.1.15). Summing up the respective number of operations, this procedure requires $\frac{1}{2}N^3 + \frac{3}{2}N^2$ multiplications, $\frac{1}{2}N^3 - \frac{1}{2}N^2$ summations, and $N$ square-root operations, which yields a total amount of $N^3 + N^2 + N$ FLOPs.
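
The total is simply the sum of the three building blocks. A short check with exact rational arithmetic is sketched below; the function names are illustrative assumptions, and the polynomials are the FLOP totals quoted in Subsections 2.2.1, 2.3.1, and 2.1.15.

```python
from fractions import Fraction


def cholesky_flops(N):
    """Subsection 2.2.1: N^3/3 + N^2/2 + N/6 (N square roots included)."""
    return Fraction(N**3, 3) + Fraction(N**2, 2) + Fraction(N, 6)


def tri_inverse_flops(N):
    """Subsection 2.3.1: N^3/3 + 2N/3."""
    return Fraction(N**3, 3) + Fraction(2 * N, 3)


def gram_flops(N):
    """Subsection 2.1.15: N^3/3 + N^2/2 + N/6."""
    return Fraction(N**3, 3) + Fraction(N**2, 2) + Fraction(N, 6)


def pd_inverse_flops(N):
    """Cholesky factorization, triangular inversion, then Gram of the inverse."""
    return cholesky_flops(N) + tri_inverse_flops(N) + gram_flops(N)


for N in range(1, 21):
    assert pd_inverse_flops(N) == N**3 + N**2 + N
print(pd_inverse_flops(4))
```

For $N = 4$ the script prints `84`, which equals $4^3 + 4^2 + 4$.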
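
2.4 Solving Systems of Equations

2.4.1 Product $\mathbf{L}^{-1}\mathbf{C}$ with $\mathbf{L}^{-1}$ not known a priori

A naive way of computing the solution $\mathbf{X} = \mathbf{L}^{-1}\mathbf{C}$ of the equation $\mathbf{L}\mathbf{X} = \mathbf{C}$ is to find $\mathbf{L}^{-1}$ first and afterwards multiply it by $\mathbf{C}$. This approach needs $N^2\left(L + \frac{1}{3}N\right) + \frac{2}{3}N$ FLOPs as shown in Sections 2.3.1 and 2.1.10. However, doing so is very expensive since we are not interested in the inverse of $\mathbf{L}$ in general. Hence, there must be a computationally cheaper variant. Again, forward substitution plays a key role.

It is easy to see that $\mathbf{X}$ can be computed column-wise. Let $x_{b,a} = [\mathbf{X}]_{b,a}$, $\ell_{b,a} = [\mathbf{L}]_{b,a}$, and $c_{b,a} = [\mathbf{C}]_{b,a}$. Then, from $\mathbf{L}\mathbf{X} = \mathbf{C}$, we get for the element $x_{b,a}$ in row $b$ and column $a$ of $\mathbf{X}$:

$$
x_{b,a} = -\frac{1}{\ell_{b,b}}\left[\sum_{i=1}^{b-1} \ell_{b,i}\,x_{i,a} - c_{b,a}\right]. \tag{2.14}
$$

Its computation requires $b$ multiplications and $b-1$ summations. A complete column of $\mathbf{X}$ can therefore be computed with $\sum_{b=1}^{N} b = \frac{N^2}{2} + \frac{N}{2}$ multiplications and $\sum_{b=1}^{N}(b-1) = \frac{N^2}{2} - \frac{N}{2}$ summations. The complete matrix $\mathbf{X}$ with $L$ columns thus needs $N^2 L$ FLOPs, so the forward substitution saves $\frac{1}{3}N^3 + \frac{2}{3}N$ FLOPs compared to the direct inversion of $\mathbf{L}$ and a subsequent matrix-matrix product. Interestingly, computing $\mathbf{L}^{-1}\mathbf{C}$ with $\mathbf{L}^{-1}$ unknown is as expensive as computing $\mathbf{L}\mathbf{C}$, see Section 2.1.10.

3. Overview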
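
$\mathbf{A} \in \mathbb{C}^{M \times N}$, $\mathbf{B} \in \mathbb{C}^{N \times N}$, and $\mathbf{C} \in \mathbb{C}^{N \times L}$ are arbitrary matrices. $\mathbf{D} \in \mathbb{C}^{N \times N}$ is a diagonal matrix, $\mathbf{L} \in \mathbb{C}^{N \times N}$ is lower triangular, $\mathbf{L}_1 \in \mathbb{C}^{N \times N}$ is lower triangular with ones on the main diagonal, $\mathbf{a}, \mathbf{b} \in \mathbb{C}^N$, $\mathbf{c} \in \mathbb{C}^M$, and $\mathbf{R} \in \mathbb{C}^{N \times N}$ is positive definite.

| Expression | Description | products | summations | FLOPs |
|---|---|---|---|---|
| $\alpha\mathbf{a}$ | Vector Scaling | $N$ | | $N$ |
| $\alpha\mathbf{A}$ | Matrix Scaling | $MN$ | | $MN$ |
| $\mathbf{a}^H\mathbf{b}$ | Inner Product | $N$ | $N-1$ | $2N-1$ |
| $\mathbf{a}\mathbf{c}^H$ | Outer Product | $MN$ | | $MN$ |
| $\mathbf{A}\mathbf{b}$ | Matrix Vector Prod. | $MN$ | $M(N-1)$ | $2MN-M$ |
| $\mathbf{A}\mathbf{C}$ | Matrix Matrix Prod. | $MNL$ | $ML(N-1)$ | $2MNL-ML$ |
| $\mathbf{A}\mathbf{D}$ | Diagonal Matrix Prod. | $MN$ | | $MN$ |
| $\mathbf{L}\mathbf{D}$ | Matrix-Matrix Prod. | $\frac{1}{2}N^2+\frac{1}{2}N$ | $0$ | $\frac{1}{2}N^2+\frac{1}{2}N$ |
| $\mathbf{L}_1\mathbf{D}$ | Matrix-Matrix Prod. | $\frac{1}{2}N^2-\frac{1}{2}N$ | $0$ | $\frac{1}{2}N^2-\frac{1}{2}N$ |
| $\mathbf{L}\mathbf{C}$ | Matrix Product | $\frac{N^2L}{2}+\frac{NL}{2}$ | $\frac{N^2L}{2}-\frac{NL}{2}$ | $N^2L$ |
| $\mathbf{A}^H\mathbf{A}$ | Gram | $\frac{MN(N+1)}{2}$ | $\frac{(M-1)N(N+1)}{2}$ | $MN^2+N\left(M-\frac{N}{2}\right)-\frac{N}{2}$ |
| $\|\mathbf{A}\|_F^2$ | Frobenius Norm | $MN$ | $MN-1$ | $2MN-1$ |
| $\mathbf{c}^H\mathbf{A}\mathbf{b}$ | Sesquilinear Form | $M(N+1)$ | $MN-1$ | $2MN+M-1$ |
| $\mathbf{a}^H\mathbf{R}\mathbf{a}$ | Hermitian Form | $N^2+N$ | $\frac{N^2}{2}+\frac{N}{2}-1$ | $\frac{3}{2}N^2+\frac{3}{2}N-1$ |
| $\mathbf{L}^H\mathbf{L}$ | Gram of Triangular | $\frac{N^3}{6}+\frac{N^2}{2}+\frac{N}{3}$ | $\frac{N^3}{6}-\frac{N}{6}$ | $\frac{1}{3}N^3+\frac{1}{2}N^2+\frac{1}{6}N$ |
| $\mathbf{L}$ | Cholesky $\mathbf{R}=\mathbf{L}\mathbf{L}^H$ (Gaxpy version) | $\frac{N^3}{6}+\frac{N^2}{2}-\frac{2N}{3}$ | $\frac{N^3}{6}-\frac{N}{6}$ | $\frac{1}{3}N^3+\frac{1}{2}N^2+\frac{1}{6}N$ ($N$ roots included) |
| $\mathbf{L}_1, \mathbf{D}$ | Cholesky $\mathbf{R}=\mathbf{L}_1\mathbf{D}\mathbf{L}_1^H$ | $\frac{N^3}{6}+N^2-\frac{13N}{6}+1$ | $\frac{N^3}{6}-\frac{N}{6}$ | $\frac{1}{3}N^3+N^2-\frac{7}{3}N+1$ |
| $\mathbf{L}^{-1}$ | Inverse of Triangular | $\frac{N^3}{6}+\frac{N^2}{2}+\frac{N}{3}$ | $\frac{N^3}{6}-\frac{N^2}{2}+\frac{N}{3}$ | $\frac{1}{3}N^3+\frac{2}{3}N$ |
| $\mathbf{L}_1^{-1}$ | Inverse of Triangular with ones on main diag. | $\frac{N^3}{6}-\frac{N^2}{2}+\frac{N}{3}$ | $\frac{N^3}{6}-\frac{N^2}{2}+\frac{N}{3}$ | $\frac{1}{3}N^3-N^2+\frac{2}{3}N$ |
| $\mathbf{R}^{-1}$ | Inverse of Pos. Definite | $\frac{N^3}{2}+\frac{3N^2}{2}$ | $\frac{N^3}{2}-\frac{N^2}{2}$ | $N^3+N^2+N$ ($N$ roots included) |
| $\mathbf{L}^{-1}\mathbf{C}$ | $\mathbf{L}^{-1}$ unknown | $\frac{N^2L}{2}+\frac{NL}{2}$ | $\frac{N^2L}{2}-\frac{NL}{2}$ | $N^2L$ |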

Appendix

A frequently occurring summation in FLOP counting is the sum of subsequent integers. By complete induction, we find

$$
\sum_{n=1}^{N} n = \frac{N(N+1)}{2}. \tag{A1}
$$

The above result can easily be verified by recognizing that the sum of the $n$-th and the $(N-n+1)$-th summand is equal to $N+1$, and we have $\frac{N}{2}$ such pairs.

Another sum of relevance is the sum of subsequent squared integers. Again, via complete induction, we find

$$
\sum_{n=1}^{N} n^2 = \frac{N(N+1)(2N+1)}{6}. \tag{A2}
$$
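
For completeness, the induction steps behind (A1) and (A2) can be written out as follows. This is a brief worked sketch added for illustration; both formulas hold trivially for $N = 1$, and the step from $N$ to $N+1$ reads

```latex
\begin{align*}
\sum_{n=1}^{N+1} n
  &= \frac{N(N+1)}{2} + (N+1)
   = \frac{(N+1)(N+2)}{2}, \\
\sum_{n=1}^{N+1} n^2
  &= \frac{N(N+1)(2N+1)}{6} + (N+1)^2
   = \frac{(N+1)\left(2N^2 + 7N + 6\right)}{6}
   = \frac{(N+1)(N+2)(2N+3)}{6}.
\end{align*}
```

Both right-hand sides are (A1) and (A2) with $N$ replaced by $N+1$, which completes the induction.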

Bibliography

[1] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, 1991.
[2] Kh. D. Ikramov and N. V. Saveleva, "Conditionally Definite Matrices," Journal of Mathematical Sciences, vol. 98, no. 1, pp. 1-50, 2000.