Updates to corpus
This commit is contained in:
parent 93ffde18f6
commit 93b3da7a7d
Corpus/CORPUS.txt: 2253 lines changed
Energy and Policy Considerations for Deep Learning in NLP

Emma Strubell, Ananya Ganesh, Andrew McCallum
College of Information and Computer Sciences
University of Massachusetts Amherst
{strubell, aganesh, mccallum}@cs.umass.edu

arXiv:1906.02243v1 [cs.CL] 5 Jun 2019
Abstract

Recent progress in hardware and methodology for training neural networks has ushered in a new generation of large networks trained on abundant data. These models have obtained notable gains in accuracy across many NLP tasks. However, these accuracy improvements depend on the availability of exceptionally large computational resources that necessitate similarly substantial energy consumption. As a result these models are costly to train and develop, both financially, due to the cost of hardware and electricity or cloud compute time, and environmentally, due to the carbon footprint required to fuel modern tensor processing hardware. In this paper we bring this issue to the attention of NLP researchers by quantifying the approximate financial and environmental costs of training a variety of recently successful neural network models for NLP. Based on these findings, we propose actionable recommendations to reduce costs and improve equity in NLP research and practice.

Table 1: Estimated CO2 emissions from training common NLP models, compared to familiar consumption.(1)

Consumption                          CO2e (lbs)
Air travel, 1 passenger, NY↔SF            1,984
Human life, avg, 1 year                  11,023
American life, avg, 1 year               36,156
Car, avg incl. fuel, 1 lifetime         126,000

Training one model (GPU)
NLP pipeline (parsing, SRL)                  39
  w/ tuning & experimentation            78,468
Transformer (big)                           192
  w/ neural architecture search         626,155

(1) Sources: (1) Air travel and per-capita consumption: https://bit.ly/2Hw0xWc; (2) car lifetime: https://bit.ly/2Qbr0w1.

1 Introduction

Advances in techniques and hardware for training deep neural networks have recently enabled impressive accuracy improvements across many fundamental NLP tasks (Bahdanau et al., 2015; Luong et al., 2015; Dozat and Manning, 2017; Vaswani et al., 2017), with the most computationally-hungry models obtaining the highest scores (Peters et al., 2018; Devlin et al., 2019; Radford et al., 2019; So et al., 2019). As a result, training a state-of-the-art model now requires substantial computational resources which demand considerable energy, along with the associated financial and environmental costs. Research and development of new models multiplies these costs by thousands of times by requiring retraining to experiment with model architectures and hyperparameters. Whereas a decade ago most NLP models could be trained and developed on a commodity laptop or server, many now require multiple instances of specialized hardware such as GPUs or TPUs, therefore limiting access to these highly accurate models on the basis of finances.

Even when these expensive computational resources are available, model training also incurs a substantial cost to the environment due to the energy required to power this hardware for weeks or months at a time. Though some of this energy may come from renewable or carbon credit-offset resources, the high energy demands of these models are still a concern since (1) energy is not currently derived from carbon-neutral sources in many locations, and (2) when renewable energy is available, it is still limited by the equipment we have to produce and store it, and energy spent training a neural network might better be allocated to heating a family's home. It is estimated that we must cut carbon emissions by half over the next decade to deter escalating rates of natural disaster, and based on the estimated CO2 emissions listed in Table 1, model training and development likely make up a substantial portion of the greenhouse gas emissions attributed to many NLP researchers.
To heighten the awareness of the NLP community to this issue and promote mindful practice and policy, we characterize the dollar cost and carbon emissions that result from training the neural networks at the core of many state-of-the-art NLP models. We do this by estimating the kilowatts of energy required to train a variety of popular off-the-shelf NLP models, which can be converted to approximate carbon emissions and electricity costs. To estimate the even greater resources required to transfer an existing model to a new task or develop new models, we perform a case study of the full computational resources required for the development and tuning of a recent state-of-the-art NLP pipeline (Strubell et al., 2018). We conclude with recommendations to the community based on our findings, namely: (1) Time to retrain and sensitivity to hyperparameters should be reported for NLP machine learning models; (2) academic researchers need equitable access to computational resources; and (3) researchers should prioritize developing efficient models and hardware.

Table 2: Percent energy sourced from: Renewable (e.g. hydro, solar, wind), natural gas, coal and nuclear for the top 3 cloud compute providers (Cook et al., 2017), compared to the United States,(4) China(5) and Germany (Burger, 2019).

Consumer         Renew.   Gas   Coal   Nuc.
China              22%     3%    65%     4%
Germany            40%     7%    38%    13%
United States      17%    35%    27%    19%
Amazon-AWS         17%    24%    30%    26%
Google             56%    14%    15%    10%
Microsoft          32%    23%    31%    10%

(4) U.S. Dept. of Energy: https://bit.ly/2JTbGnI
(5) China Electricity Council; trans. China Energy Portal: https://bit.ly/2QHE5O3

2 Methods

To quantify the computational and environmental cost of training deep neural network models for NLP, we perform an analysis of the energy required to train a variety of popular off-the-shelf NLP models, as well as a case study of the complete sum of resources required to develop LISA (Strubell et al., 2018), a state-of-the-art NLP model from EMNLP 2018, including all tuning and experimentation.

We measure energy use as follows. We train the models described in §2.1 using the default settings provided, and sample GPU and CPU power consumption during training. Each model was trained for a maximum of 1 day. We train all models on a single NVIDIA Titan X GPU, with the exception of ELMo, which was trained on 3 NVIDIA GTX 1080 Ti GPUs. While training, we repeatedly query the NVIDIA System Management Interface(2) to sample the GPU power consumption and report the average over all samples. To sample CPU power consumption, we use Intel's Running Average Power Limit interface.(3)

(2) nvidia-smi: https://bit.ly/30sGEbi
(3) RAPL power meter: https://bit.ly/2LObQhV

We estimate the total time expected for models to train to completion using training times and hardware reported in the original papers. We then calculate the power consumption in kilowatt-hours (kWh) as follows. Let p_c be the average power draw (in watts) from all CPU sockets during training, let p_r be the average power draw from all DRAM (main memory) sockets, let p_g be the average power draw of a GPU during training, and let g be the number of GPUs used to train. We estimate total power consumption as combined GPU, CPU and DRAM consumption, then multiply this by Power Usage Effectiveness (PUE), which accounts for the additional energy required to support the compute infrastructure (mainly cooling). We use a PUE coefficient of 1.58, the 2018 global average for data centers (Ascierto, 2018). Then the total power p_t required at a given instance during training is given by:

    p_t = 1.58 t (p_c + p_r + g p_g) / 1000    (1)

The U.S. Environmental Protection Agency (EPA) provides average CO2 produced (in pounds per kilowatt-hour) for power consumed in the U.S. (EPA, 2018), which we use to convert power to estimated CO2 emissions:

    CO2e = 0.954 p_t    (2)

This conversion takes into account the relative proportions of different energy sources (primarily natural gas, coal, nuclear and renewable) consumed to produce energy in the United States. Table 2 lists the relative energy sources for China, Germany and the United States compared to the top three cloud service providers. The U.S. breakdown of energy is comparable to that of the most popular cloud compute service, Amazon Web Services, so we believe this conversion to provide a reasonable estimate of CO2 emissions per kilowatt hour of compute energy used.
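Equations (1) and (2) amount to a few lines of arithmetic. The following is a minimal sketch of that conversion (not the authors' code); the function names and the example power draws, GPU count, and hours are illustrative assumptions, not measured values.

PUE = 1.58               # 2018 global average data center PUE (Ascierto, 2018)
CO2_LBS_PER_KWH = 0.954  # EPA (2018) average U.S. CO2 (lbs) emitted per kWh

def energy_kwh(hours, cpu_watts, dram_watts, gpu_watts, num_gpus, pue=PUE):
    """Equation (1): total energy in kWh drawn over `hours` of training,
    scaled by PUE; the *_watts arguments are average power draws sampled
    during training (cf. the nvidia-smi / RAPL sampling described above)."""
    combined_watts = cpu_watts + dram_watts + num_gpus * gpu_watts
    return pue * hours * combined_watts / 1000.0

def co2e_lbs(kwh):
    """Equation (2): estimated CO2-equivalent emissions in pounds."""
    return CO2_LBS_PER_KWH * kwh

# Hypothetical job: 8 GPUs for 84 hours with illustrative average power draws.
kwh = energy_kwh(hours=84, cpu_watts=100.0, dram_watts=30.0,
                 gpu_watts=170.0, num_gpus=8)
print(round(kwh, 1), "kWh,", round(co2e_lbs(kwh), 1), "lbs CO2e")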
2.1 Models

We analyze four models, the computational requirements of which we describe below. All models have code freely available online, which we used out-of-the-box. For more details on the models themselves, please refer to the original papers.

Transformer. The Transformer model (Vaswani et al., 2017) is an encoder-decoder architecture primarily recognized for efficient and accurate machine translation. The encoder and decoder each consist of 6 stacked layers of multi-head self-attention. Vaswani et al. (2017) report that the Transformer base model (65M parameters) was trained on 8 NVIDIA P100 GPUs for 12 hours, and the Transformer big model (213M parameters) was trained for 3.5 days (84 hours; 300k steps). This model is also the basis for recent work on neural architecture search (NAS) for machine translation and language modeling (So et al., 2019), and the NLP pipeline that we study in more detail in §4.2 (Strubell et al., 2018). So et al. (2019) report that their full architecture search ran for a total of 979M training steps, and that their base model requires 10 hours to train for 300k steps on one TPUv2 core. This equates to 32,623 hours of TPU or 274,120 hours on 8 P100 GPUs.

ELMo. The ELMo model (Peters et al., 2018) is based on stacked LSTMs and provides rich word representations in context by pre-training on a large amount of data using a language modeling objective. Replacing context-independent pre-trained word embeddings with ELMo has been shown to increase performance on downstream tasks such as named entity recognition, semantic role labeling, and coreference. Peters et al. (2018) report that ELMo was trained on 3 NVIDIA GTX 1080 GPUs for 2 weeks (336 hours).

BERT. The BERT model (Devlin et al., 2019) provides a Transformer-based architecture for building contextual representations similar to ELMo, but trained with a different language modeling objective. BERT substantially improves accuracy on tasks requiring sentence-level representations such as question answering and natural language inference. Devlin et al. (2019) report that the BERT base model (110M parameters) was trained on 16 TPU chips for 4 days (96 hours). NVIDIA reports that they can train a BERT model in 3.3 days (79.2 hours) using 4 DGX-2H servers, totaling 64 Tesla V100 GPUs (Forster et al., 2019).

GPT-2. This model is the latest edition of OpenAI's GPT general-purpose token encoder, also based on Transformer-style self-attention and trained with a language modeling objective (Radford et al., 2019). By training a very large model on massive data, Radford et al. (2019) show high zero-shot performance on question answering and language modeling benchmarks. The large model described in Radford et al. (2019) has 1542M parameters and is reported to require 1 week (168 hours) of training on 32 TPUv3 chips.(6)

(6) Via the authors on Reddit.

3 Related work

There is some precedent for work characterizing the computational requirements of training and inference in modern neural network architectures in the computer vision community. Li et al. (2016) present a detailed study of the energy use required for training and inference in popular convolutional models for image classification in computer vision, including fine-grained analysis comparing different neural network layer types. Canziani et al. (2016) assess image classification model accuracy as a function of model size and gigaflops required during inference. They also measure average power draw required during inference on GPUs as a function of batch size. Neither work analyzes the recurrent and self-attention models that have become commonplace in NLP, nor do they extrapolate power to estimates of carbon and dollar cost of training.

Analysis of hyperparameter tuning has been performed in the context of improved algorithms for hyperparameter search (Bergstra et al., 2011; Bergstra and Bengio, 2012; Snoek et al., 2012). To our knowledge there exists to date no analysis of the computation required for R&D and hyperparameter tuning of neural network models in NLP.

Table 3: Estimated cost of training a model in terms of CO2 emissions (lbs) and cloud compute cost (USD).(7) Power and carbon footprint are omitted for TPUs due to lack of public information on power draw for this hardware.

(7) GPU lower bound computed using pre-emptible P100/V100 U.S. resources priced at $0.43–$0.74/hr, upper bound uses on-demand U.S. resources priced at $1.46–$2.48/hr. We similarly use pre-emptible ($1.46/hr–$2.40/hr) and on-demand ($4.50/hr–$8/hr) pricing as lower and upper bounds for TPU v2/3; cheaper bulk contracts are available.

Model             Hardware    Power (W)    Hours     kWh·PUE    CO2e      Cloud compute cost
Transformer base  P100x8       1415.78        12          27        26    $41–$140
Transformer big   P100x8       1515.43        84         201       192    $289–$981
ELMo              P100x3        517.66       336         275       262    $433–$1472
BERT base         V100x64    12,041.51        79        1507      1438    $3751–$12,571
BERT base         TPUv2x16          —         96           —         —    $2074–$6912
NAS               P100x8       1515.43   274,120     656,347   626,155    $942,973–$3,201,722
NAS               TPUv2x1           —     32,623           —         —    $44,055–$146,848
GPT-2             TPUv3x32          —        168           —         —    $12,902–$43,008
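As a consistency check on Table 3, if the Power (W) column is read as the combined average draw p_c + p_r + g p_g from equation (1), the Transformer (big) row follows directly from equations (1) and (2): 1.58 × 1515.43 W × 84 h / 1000 ≈ 201 kWh, and 0.954 × 201 ≈ 192 lbs CO2e, matching the kWh·PUE and CO2e columns; the Transformer base row checks out the same way (1.58 × 1415.78 × 12 / 1000 ≈ 27 kWh, 0.954 × 27 ≈ 26 lbs).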
4 Experimental results

4.1 Cost of training

Table 3 lists CO2 emissions and estimated cost of training the models described in §2.1. Of note is that TPUs are more cost-efficient than GPUs on workloads that make sense for that hardware (e.g. BERT). We also see that models emit substantial carbon emissions; training BERT on GPU is roughly equivalent to a trans-American flight. So et al. (2019) report that NAS achieves a new state-of-the-art BLEU score of 29.7 for English to German machine translation, an increase of just 0.1 BLEU at the cost of at least $150k in on-demand compute time and non-trivial carbon emissions.

Table 4: Estimated cost in terms of cloud compute and electricity for training: (1) a single model, (2) a single tune and (3) all models trained during R&D.

                          Estimated cost (USD)
Models     Hours      Cloud compute    Electricity
1            120      $52–$175         $5
24          2880      $1238–$4205      $118
4789     239,942      $103k–$350k      $9870

4.2 Cost of development: Case study

To quantify the computational requirements of R&D for a new model we study the logs of all training required to develop Linguistically-Informed Self-Attention (Strubell et al., 2018), a multi-task model that performs part-of-speech tagging, labeled dependency parsing, predicate detection and semantic role labeling. This model makes for an interesting case study as a representative NLP pipeline and as a Best Long Paper at EMNLP.

Model training associated with the project spanned a period of 172 days (approx. 6 months). During that time 123 small hyperparameter grid searches were performed, resulting in 4789 jobs in total. Jobs varied in length ranging from a minimum of 3 minutes, indicating a crash, to a maximum of 9 days, with an average job length of 52 hours. All training was done on a combination of NVIDIA Titan X (72%) and M40 (28%) GPUs.(8)

The sum GPU time required for the project totaled 9998 days (27 years). This averages to about 60 GPUs running constantly throughout the 6 month duration of the project. Table 4 lists upper and lower bounds of the estimated cost in terms of Google Cloud compute and raw electricity required to develop and deploy this model.(9) We see that while training a single model is relatively inexpensive, the cost of tuning a model for a new dataset, which we estimate here to require 24 jobs, or performing the full R&D required to develop this model, quickly becomes extremely expensive.

(8) We approximate cloud compute cost using P100 pricing.
(9) Based on average U.S. cost of electricity of $0.12/kWh.

5 Conclusions

Authors should report training time and sensitivity to hyperparameters.

Our experiments suggest that it would be beneficial to directly compare different models to perform a cost-benefit (accuracy) analysis. To address this, when proposing a model that is meant to be re-trained for downstream use, such as re-training on a new domain or fine-tuning on a new task, authors should report training time and computational resources required, as well as model sensitivity to hyperparameters. This will enable direct comparison across models, allowing subsequent consumers of these models to accurately assess whether the required computational resources are compatible with their setting. More explicit characterization of tuning time could also reveal inconsistencies in time spent tuning baseline models compared to proposed contributions. Realizing this will require: (1) a standard, hardware-independent measurement of training time, such as gigaflops required to convergence, and (2) a standard measurement of model sensitivity to data and hyperparameters, such as variance with respect to hyperparameters searched.

Academic researchers need equitable access to computation resources.

Recent advances in available compute come at a high price not attainable to all who desire access. Most of the models studied in this paper were developed outside academia; recent improvements in state-of-the-art accuracy are possible thanks to industry access to large-scale compute.

Limiting this style of research to industry labs hurts the NLP research community in many ways. First, it stifles creativity. Researchers with good ideas but without access to large-scale compute will simply not be able to execute their ideas, instead constrained to focus on different problems. Second, it prohibits certain types of research on the basis of access to financial resources. This even more deeply promotes the already problematic "rich get richer" cycle of research funding, where groups that are already successful and thus well-funded tend to receive more funding due to their existing accomplishments. Third, the prohibitive start-up cost of building in-house resources forces resource-poor groups to rely on cloud compute services such as AWS, Google Cloud and Microsoft Azure.

While these services provide valuable, flexible, and often relatively environmentally friendly compute resources, it is more cost effective for academic researchers, who often work for non-profit educational institutions and whose research is funded by government entities, to pool resources to build shared compute centers at the level of funding agencies, such as the U.S. National Science Foundation. For example, an off-the-shelf GPU server containing 8 NVIDIA 1080 Ti GPUs and supporting hardware can be purchased for approximately $20,000 USD. At that cost, the hardware required to develop the model in our case study (approximately 58 GPUs for 172 days) would cost $145,000 USD plus electricity, about half the estimated cost to use on-demand cloud GPUs. Unlike money spent on cloud compute, however, that invested in centralized resources would continue to pay off as resources are shared across many projects. A government-funded academic compute cloud would provide equitable access to all researchers.

Researchers should prioritize computationally efficient hardware and algorithms.

We recommend a concerted effort by industry and academia to promote research of more computationally efficient algorithms, as well as hardware that requires less energy. An effort can also be made in terms of software. There is already a precedent for NLP software packages prioritizing efficient models. An additional avenue through which NLP and machine learning software developers could aid in reducing the energy associated with model tuning is by providing easy-to-use APIs implementing more efficient alternatives to brute-force grid search for hyperparameter tuning, e.g. random or Bayesian hyperparameter search techniques (Bergstra et al., 2011; Bergstra and Bengio, 2012; Snoek et al., 2012). While software packages implementing these techniques do exist,(10) they are rarely employed in practice for tuning NLP models. This is likely because their interoperability with popular deep learning frameworks such as PyTorch and TensorFlow is not optimized, i.e. there are not simple examples of how to tune TensorFlow Estimators using Bayesian search. Integrating these tools into the workflows with which NLP researchers and practitioners are already familiar could have notable impact on the cost of developing and tuning in NLP.

(10) For example, the Hyperopt Python library.
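As an illustration of the kind of easy-to-use API this paragraph asks for, the sketch below uses the Hyperopt library named in footnote 10 to run Bayesian (TPE) search instead of a grid; the search space, the number of evaluations, and the train_and_eval objective are hypothetical stand-ins for a real NLP tuning setup.

from hyperopt import Trials, fmin, hp, tpe

def train_and_eval(params):
    """Hypothetical objective: train a model with `params` and return dev-set loss.
    Replaced here by a placeholder expression so the sketch runs standalone."""
    return (params["lr"] - 1e-3) ** 2 + 0.1 * params["dropout"]

# Hypothetical search space over two common hyperparameters.
space = {
    "lr": hp.loguniform("lr", -12, -4),        # roughly 6e-6 to 1.8e-2
    "dropout": hp.uniform("dropout", 0.0, 0.5),
}

trials = Trials()
best = fmin(fn=train_and_eval, space=space, algo=tpe.suggest,
            max_evals=20, trials=trials)
print(best)  # best hyperparameters found after 20 runs

The point of the sketch is only that the number of full training runs is fixed up front (here 20) and chosen adaptively, rather than growing multiplicatively with every hyperparameter as in grid search.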
Acknowledgements

We are grateful to Sherief Farouk and the anonymous reviewers for helpful feedback on earlier drafts. This work was supported in part by the Centers for Data Science and Intelligent Information Retrieval, the Chan Zuckerberg Initiative under the Scientific Knowledge Base Construction project, the IBM Cognitive Horizons Network agreement no. W1668553, and National Science Foundation grant no. IIS-1514053. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsor.
References

Rhonda Ascierto. 2018. Uptime Institute Global Data Center Survey. Technical report, Uptime Institute.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference for Learning Representations (ICLR), San Diego, California, USA.

James Bergstra and Yoshua Bengio. 2012. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305.

James S. Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. 2011. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems, pages 2546–2554.

Bruno Burger. 2019. Net Public Electricity Generation in Germany in 2018. Technical report, Fraunhofer Institute for Solar Energy Systems ISE.

Alfredo Canziani, Adam Paszke, and Eugenio Culurciello. 2016. An analysis of deep neural network models for practical applications.

Gary Cook, Jude Lee, Tamina Tsai, Ada Kong, John Deans, Brian Johnson, Elizabeth Jardim, and Brian Johnson. 2017. Clicking Clean: Who is winning the race to build a green internet? Technical report, Greenpeace.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL.

Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In ICLR.

EPA. 2018. Emissions & Generation Resource Integrated Database (eGRID). Technical report, U.S. Environmental Protection Agency.

Christopher Forster, Thor Johnsen, Swetha Mandava, Sharath Turuvekere Sreenivas, Deyu Fu, Julie Bernauer, Allison Gray, Sharan Chetlur, and Raul Puri. 2019. BERT Meets GPUs. Technical report, NVIDIA AI.

Da Li, Xinbo Chen, Michela Becchi, and Ziliang Zong. 2016. Evaluating the energy efficiency of deep convolutional neural networks on CPUs and GPUs. 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom), pages 477–484.

Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412–1421. Association for Computational Linguistics.

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In NAACL.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners.

Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. 2012. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2951–2959.

David R. So, Chen Liang, and Quoc V. Le. 2019. The evolved transformer. In Proceedings of the 36th International Conference on Machine Learning (ICML).

Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum. 2018. Linguistically-informed self-attention for semantic role labeling. In Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In 31st Conference on Neural Information Processing Systems (NIPS).
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 16, NO. 6, NOVEMBER 2005

Finite-Element Neural Networks for Solving Differential Equations

Pradeep Ramuhalli, Member, IEEE, Lalita Udpa, Senior Member, IEEE, and Satish S. Udpa, Fellow, IEEE
Abstract—The solution of partial differential equations (PDE) arises in a wide variety of engineering problems. Solutions to most practical problems use numerical analysis techniques such as finite-element or finite-difference methods. The drawbacks of these approaches include computational costs associated with the modeling of complex geometries. This paper proposes a finite-element neural network (FENN) obtained by embedding a finite-element model in a neural network architecture that enables fast and accurate solution of the forward problem. Results of applying the FENN to several simple electromagnetic forward and inverse problems are presented. Initial results indicate that the FENN performance as a forward model is comparable to that of the conventional finite-element method (FEM). The FENN can also be used in an iterative approach to solve inverse problems associated with the PDE. Results showing the ability of the FENN to solve the inverse problem given the measured signal are also presented. The parallel nature of the FENN also makes it an attractive solution for parallel implementation in hardware and software.

Index Terms—Finite-element method (FEM), finite-element neural network (FENN), inverse problems.

Manuscript received January 17, 2004; revised April 2, 2005. The authors are with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824 USA (e-mail: rpradeep@egr.msu.edu; udpal@egr.msu.edu; udpa@egr.msu.edu). Digital Object Identifier 10.1109/TNN.2005.857945

[Fig. 1. Iterative inversion method for solving inverse problems.]

I. INTRODUCTION

Solutions of differential equations arise in a wide variety of engineering applications in electromagnetics, signal processing, computational fluid dynamics, etc. These equations are typically solved using either analytical or numerical methods. Analytical solution methods are however feasible only for simple geometries, which limits their applicability. In most practical problems with complex boundary conditions, numerical analysis methods are required in order to obtain a reasonable solution. An example is the solution of Maxwell's equations in electromagnetics. Solutions to Maxwell's equations are used in a variety of applications for calculating the interaction of electromagnetic (EM) fields with different types of media.

Very often, the solution to differential equations is necessary for solving the corresponding inverse problems. Inverse problems in general are ill-posed, lacking continuous dependence of the measurements on the input. This has resulted in the development of a variety of solution techniques ranging from simple calibration procedures to other direct (analytical) and iterative approaches [1]. Iterative methods typically employ a forward model that simulates the underlying physical process (Fig. 1) [2]. An initial estimate of the solution of the inverse problem (represented in Fig. 1) is applied to the forward model, resulting in the corresponding solution to the forward problem. The model output is compared to the measurement using a cost function. If the cost is less than a tolerance, the estimate is used as the desired solution. If not, the estimate is updated to minimize the cost function.

Although finite-element methods (FEMs) [3], [4] are extremely popular for solving differential equations, their major drawback is computational complexity. This problem becomes more acute when three-dimensional (3-D) finite-element models are used in an iterative algorithm for solving the inverse problem. Recently, several authors have suggested the use of neural networks (MLP or RBF networks [5]) for solving differential equations [6]–[9]. In these techniques, a neural network is trained using a large database containing the input data and the solution of the differential equation. The neural network during generalization learns the mapping corresponding to the PDE. Alternatively, in [10], the solution to a differential equation is written as the sum of a constant term and an adjustable term with parameters that need to be determined. A neural network is used to determine the optimal values of the parameters. This approach is applicable only to problems with regular boundaries. An extension of the approach to problems with irregular boundaries is given in [11]. Other neural network based differential equation solvers use multilayer perceptron networks or variations on the MLP to approximate the unknown function in a PDE [12]–[14]. A combination of the PDE and boundary conditions is used to construct an objective function that is minimized during the training process.
A major limitation of these approaches is that the network architecture is selected somewhat arbitrarily. A second drawback is that the performance of the neural networks depends on the data used in training and testing. As long as the test data is similar to the training data, the network can interpolate between the training data points to obtain a reasonable prediction. However, when the test signal is no longer similar to the training data, the network is forced to extrapolate and the performance degrades. One way around this difficulty is to ensure that the training database has a diverse set of signals. However, this is difficult to ensure in practice. Alternatively, we have to design neural networks that are capable of extrapolation. Extrapolation methods are discussed extensively in the literature [15]–[18], but the design of an extrapolation neural network involves several issues, particularly for ensuring that the error in the network prediction stays within reasonable bounds during the extrapolation procedure.

An ideal solution to this problem would be to combine the power of numerical models with the computational speed of neural networks, i.e., to embed a numerical model in a neural network structure. One such finite-element neural network (FENN) formulation has been reported by Takeuchi and Kosugi [19]. This approach, based on error minimization, derives the neural network using the energy functional resulting from the finite-element formulation. Other reports of FENN combinations are either similar to the Takeuchi method [20], [21] or use Hopfield neural networks to solve the forward problem [22], [23]. Kalkkuhl et al. [24] provide a description of a FEM-based approach to NARX modeling that may be interpreted both as a local model network, as well as a single layer feedforward network. A slightly different approach to merging numerical methods and neural networks is given in [25], where the finite-difference time domain (FDTD) method is cast in a neural network framework for the purpose of solving electromagnetic forward problems. The related problem of mesh generation in finite-element models has also been tackled using neural networks (for instance, [26]). Generally, these networks are designed to solve the forward problem, and must be modified to solve inverse problems.

This paper proposes a new approach that embeds a finite-element model commonly used in the solution of differential equations in a neural network. The network, called the FENN, can solve the forward problem and can also be used in an iterative algorithm to solve inverse problems. The primary advantage of this approach is that the FEM is represented in a parallel form. Thus, it has the potential to alleviate the computational cost associated with using the FEM in an iterative algorithm for solving inverse problems. More importantly, the FENN does not need any training, and the computation of the weights is a one-time process. The proposed approach is also different in that the neural network architecture developed can be used to solve both the forward and inverse problems. The structure of the neural network is also simpler than those reported in the literature, making it easier to implement in parallel in both hardware and software.

The rest of this paper is organized as follows. Section II briefly describes the FEM, and derives the proposed FENN. In this paper, we focus on the problem of solving typical equations encountered in electromagnetic nondestructive evaluation (NDE). However, the same concepts can be easily applied to solve differential equations encountered in other fields. Sections III, IV and V present the application of the FENN to solving forward and inverse problems, along with initial results. A discussion of the advantages and disadvantages of the proposed FENN architecture is given in Section IV. Finally, Section V draws conclusions from the results and presents ideas for future work.

II. THE FENN

This section briefly describes the FEM and proposes its reformulation into a parallel neural network structure. Details about the FEM can be found in [3] and [4].

A. The FEM

Consider a typical boundary value problem with the governing differential equation

    L φ = f    (1)

where L is a differential operator, f is the applied source or forcing function, and φ is the unknown quantity. This differential equation can be solved in conjunction with boundary conditions on the boundary Γ enclosing the domain Ω. The variational formulation used in finite-element analysis determines the unknown φ by minimizing the functional F(φ) given in (2) with respect to the trial function φ. The minimization procedure starts by dividing Ω into small subdomains called elements (Fig. 2) and representing φ in each element by means of basis functions defined over the element

    φ^e = Σ_{j=1}^{n} N_j^e φ_j^e    (3)

where φ^e is the unknown solution in element e, N_j^e is the basis function associated with node j in element e, φ_j^e is the value of the unknown quantity at node j, and n is the total number of nodes associated with element e. In general, the basis functions (also referred to as interpolation functions or shape functions) can be linear, quadratic, or of higher order. Typically, finite-element models use either linear or polynomial spline basis functions.

The functional within an element is expressed as in (4). By substituting (3) in (4), we obtain the discrete version of the functional within each element

    F^e = (1/2) (φ^e)^T K^e φ^e − (φ^e)^T b^e    (5)

where (·)^T is the transpose of a matrix, K^e is the n × n elemental matrix whose entries are given by (6), and b^e is an n × 1 vector whose entries are given by (7).
Combining the values in (5) for each of the elements gives

    F = (1/2) φ^T K φ − φ^T b    (8)

where K is the global matrix derived from the terms of the elemental matrices for the different elements, and N is the total number of nodes. K, also called the stiffness matrix, is a sparse, banded matrix. Equation (8) is the discrete version of the functional and can be minimized with respect to the nodal parameters φ by taking the derivative of F with respect to φ and setting it equal to zero, which results in the matrix equation

    K φ = b    (9)

[Fig. 2. (a) Schematic representation of domain and boundary. (b) Sample FEM mesh for the domain.]

[Fig. 3. FEM domain discretization using two elements and four nodes.]

Boundary conditions for these problems are usually of two types: natural boundary conditions and essential boundary conditions. Essential boundary conditions (also referred to as Dirichlet boundary conditions) impose constraints on the value of the unknown φ at several nodes. Natural boundary conditions (of which Neumann boundary conditions are a special case) impose constraints on the change in φ across a boundary. Dirichlet boundary conditions are imposed on the functional minimization (9) by deleting the rows and columns of the K matrix corresponding to the nodes on the Dirichlet boundary and modifying b in (9).

Natural boundary conditions are applied in the FEM by adding an additional term to the functional. These boundary conditions are then incorporated into the functional and are satisfied automatically during the solution procedure. As an example, consider the natural boundary condition represented by equation (10) [3] on the Neumann boundary Γ_2, where n̂ is its outward normal unit vector and the remaining coefficients in (10) are known parameters associated with the boundary. Assuming that the boundary Γ_2 is made up of segments, we can define a boundary matrix and a boundary vector whose elements are given by (11), where the basis functions in (11) are defined over each segment of the boundary and the segment length appears as a factor. The elements of the boundary matrix are added to the elements of K that correspond to the nodes on the boundary Γ_2. Similarly, the elements of the boundary vector are added to the corresponding elements of b. The global matrix equation (9) is thus modified as in (12) before solving for φ. This process ensures that natural boundary conditions are implicitly and automatically satisfied during the FEM solution procedure.
B. The FENN

This section describes how the finite-element model can be converted into a parallel network form. We focus on solving typical inverse problems arising in electromagnetic NDE, but the basic idea is applicable to other areas as well. NDE inverse problems can be formulated as the problem of finding the material properties (such as the conductivity or the permeability) within the domain of the problem. Since the domain is discretized in the FEM method by a large number of elements, the problem can be posed as one of finding the material properties in each of these elements. These properties are usually embedded in the differential operator L, or equivalently, in the global matrix K. Thus, in order to be able to iteratively estimate these properties from the measurements, the material properties need to be separated out from K. This separation is easier to achieve at the element matrix level. For nodes i and j in element e, the element matrix entry is written as in (13), where α_e is the parameter representing the material property in element e and the remaining factor represents the differential operator at the element level without α embedded in it. Substituting (13) into the functional, we get (14). If we define the entries of the global matrix in terms of α and a set of fixed coefficients w as in (15), where a coefficient is zero whenever the corresponding pair of nodes does not belong to the element in question (16), then the functional can be expressed explicitly in terms of α as in (17).

The assumption that α is constant within each element is implicit in this expression. This assumption is usually satisfied in problems in NDE where each element in the FEM mesh is defined within the confines of a domain, and at no time does a single element cross domain boundaries. Furthermore, each element is small enough that minor variations in α within an element may be ignored. Equation (17) can be easily converted into a parallel network form. The neural network comprises an input, output and hidden layer. In the general case with M elements and N nodes in the FEM mesh, the input layer with M network inputs takes the values of α in each element as input. The hidden layer has N^2 neurons arranged in N groups of N neurons, corresponding to the members of the global matrix K. (In this paper, we use the term "neurons" in the FENN, in the hidden and output layers, to avoid confusion with the nodes in a finite-element mesh.) The output of each group of hidden layer neurons is the corresponding row vector of K. The weights from the input to the hidden layer are set to the appropriate values of w. Each neuron in the hidden layer acts as a summation unit (equivalent to a summation followed by a linear activation function [5]). The outputs of the hidden layer neurons are the elements of the global matrix K as given in (15).

Each group of hidden neurons is connected to one output neuron (giving a total of N output neurons) by a set of weights φ, with each element of φ representing the nodal values. Note that the set of weights between the first group of hidden neurons and the first output neuron is the same as the set of weights between the second group of hidden neurons and the second output neuron (as well as between successive groups of hidden neurons and the corresponding output neuron). Each output neuron is also a summation unit followed by a linear activation function, and the output of each neuron is given by (18), where the second part of (18) is obtained by using (15). As an example, the FENN architecture for a two-element, four-node FEM mesh (Fig. 3) is shown in Fig. 4. In this case, the FENN has two input neurons, 16 hidden layer neurons and four output neurons. The figure illustrates the grouping of the hidden layer neurons, as well as the similarity inherent in the weights that connect each group of hidden layer neurons to the corresponding output neuron. To simplify the figure, the weights between the network input and hidden layer neurons are depicted by means of vectors, where the individual weight values are defined as in (16).

[Fig. 4. FENN.]
1) Boundary Conditions in the FENN: Note that the elements of the boundary matrix and vector in (11) do not depend on the material properties α, and need to be added appropriately to the global matrix K and the source vector b as shown in (12). Equation (12) thus implies that natural boundary conditions can be applied in the FENN as bias inputs to the hidden layer neurons that are a part of the boundary, and the corresponding output neurons. Dirichlet boundary conditions are applied by clamping the corresponding weights between the hidden layer and output layer neurons. These weights will be referred to as the clamped weights, while the remaining weights will be referred to as the free weights. An example of these weights is presented later.

The FENN architecture was derived without consideration of the dimensionality of the problem at hand, and thus can be used for 1-, 2-, 3-, or higher dimensional problems. The number of nodes and elements in the FEM mesh dictates the number of neurons in the different layers. The weights between the input and hidden layer change depending on node-element connectivity information.

The major drawback of the FENN is the number of neurons and weights necessary. However, the memory requirements can be reduced considerably, since most of the weights between the input and hidden layer are zero. These weights, and the corresponding connections, can be discarded. Similarly, most of the elements of the K matrix are also zero (K is a banded matrix). The corresponding neurons in the hidden layer can also be discarded, reducing memory and computation requirements considerably. Furthermore, the weights between each group of hidden layer neurons and the output layer are the same. Weight-sharing approaches can be used here to further reduce the storage requirements.

[Fig. 5. Geometry of mesh for 1-D FEM.]

C. A 1-D Example

Consider the 1-D equation (19) with boundary conditions on the boundary defined by the endpoints of the domain, where α and β are constants depending on the material and f is the applied source. Laplace's equation and Poisson's equation are special cases of this equation. The FENN formulation for this problem starts by discretizing the domain of interest with M elements and N nodes. In one dimension, each element is defined by two nodes (Fig. 5). Define basis functions over each element and let φ_j^e be the value of φ on node j in element e. An example of the basis functions is shown in Fig. 5. For these basis functions, i.e., (20), the element matrices are given by [3] in (21) and (22), where ℓ_e is the length of element e. The global matrix K is then constructed by selectively adding the element matrices based on the nodes that form an element. Specifically, K is a sparse tridiagonal matrix, and its nonzero elements are given by (23).

The network implementation of (23) can be derived as follows. If the α and β values in each element are the inputs to the network, the corresponding α- and β-independent element terms form the weights between the input and hidden layers. The network thus uses one input neuron per element value, together with the hidden neurons described above. The values of φ at each of the nodes are assigned as weights between the hidden and output layers, and the source f is the desired output of this network (corresponding to the output neurons). Dirichlet boundary conditions on φ are applied as explained earlier.

[Fig. 6. Flowchart (with example) for designing the FENN for a general PDE.]

[Fig. 7. Shielded microstrip geometry. (a) Complete problem description. (b) Problem description using symmetry considerations.]

D. General Case

Fig. 6 shows a flowchart of the general scheme for converting a differential equation into the FENN structure. An example in two dimensions is also provided next to the flowchart. We start with the differential equation and the boundary conditions and formulate the FEM using the variational method. This involves discretizing the domain of interest with elements and nodes, selecting basis functions, writing the functional for each element and obtaining the element matrices and the source vector. The example presented uses the FEM mesh shown in Fig. 3, with two elements, four nodes, and linear basis functions. The unknown solution to the differential equation is represented by its values at each of the nodes in the finite-element mesh. The element matrices are then separated into two parts, with one part dependent on the material properties α and β while the other is independent of them.

The FENN is then designed to have pM input neurons, N^2 hidden neurons, and N output neurons, where p is the number of material property parameters. In the example under consideration, p = 2, since we have two material property parameters (α and β). The first group of M input neurons takes in the α values while the second group takes in the β values in each element. The weights from the input to the hidden layer are set to the appropriate values of w. In the example, since nodes 1, 2, and 3 are part of element 1 (see Fig. 3), the weights from the first input node to the first group of four neurons in the hidden layer are given by (24). The last weight is zero since node 4 is not a part of element 1. Each group of hidden neurons is connected to one output neuron (giving a total of N output neurons) by a set of weights φ, with each element of φ representing the nodal values. The output of each neuron in the output layer is then given by (18).
[Fig. 8. Forward problem solutions for the shielded microstrip problem show the contours of constant potential for: (a) FEM solution and (b) FENN solution. (c) Error between (a) and (b). The x- and y-axes show the nodes in the FEM discretization of the domain, and the z-axis in (c) shows the error at each of these nodes in volts.]
III. FORWARD AND INVERSE PROBLEM FORMULATION USING FENN

The FENN architecture and algorithm lend themselves to solving both the forward and inverse problems. The forward problem involves determining the weights φ given the material parameters α and β and the applied source f, while the inverse problem involves determining α and β given φ and f. Any optimization approach can be used to solve both these problems. Suppose we define the error at the output of the FENN as in (26), where o is the output of the FENN. Then, for a gradient-based approach, the gradient of the error with respect to the free hidden layer weights is given by (27). Equation (27) can be used to solve the forward problem. Similarly, to solve the inverse problem, the gradients of the error with respect to α and β (the inputs of the FENN) are necessary, and are given by (28) and (29).
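The following minimal sketch shows the two gradient-descent loops that (26)-(29) enable, under the assumption of a quadratic output error E = (1/2)||K phi - b||^2 (the paper's exact error expression is not reproduced here) and the same assumed 1-D linear-element weights as in the earlier sketches; for the forward problem only the free (unclamped) nodal weights are updated, and for the inverse problem the per-element inputs alpha are updated.

import numpy as np

def element_weights(n_elems, h):
    """Fixed input-to-hidden weights: w[e] is element e's alpha-independent
    contribution to the global matrix (assumed 1-D linear elements)."""
    n_nodes = n_elems + 1
    w = np.zeros((n_elems, n_nodes, n_nodes))
    for e in range(n_elems):
        w[e, e:e + 2, e:e + 2] = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    return w

def solve_forward(alpha, b, phi0, free, w, lr=1e-3, steps=20000):
    """Forward problem: gradient descent on the free nodal values (the free
    hidden-to-output weights), cf. (27), with E = 0.5 * ||K phi - b||^2."""
    phi = np.array(phi0, dtype=float)
    K = np.einsum("e,eij->ij", np.asarray(alpha, dtype=float), w)
    for _ in range(steps):
        r = K @ phi - b                    # output error
        phi[free] -= lr * (K.T @ r)[free]  # dE/dphi, restricted to free weights
    return phi

def solve_inverse(alpha0, b, phi, w, lr=1e-3, steps=20000):
    """Inverse problem: gradient descent on the network inputs alpha, cf. (28)-(29)."""
    alpha = np.array(alpha0, dtype=float)
    for _ in range(steps):
        K = np.einsum("e,eij->ij", alpha, w)
        r = K @ phi - b
        alpha -= lr * np.einsum("eij,j,i->e", w, phi, r)  # dE/dalpha_e
    return alpha

In both loops the clamped quantities (Dirichlet nodal values for the forward problem, known boundary values of alpha for a constrained inversion) are simply excluded from the update, which is how the clamped/free weight distinction of Section II-B enters; any other optimizer could be substituted for plain gradient descent, as the text notes.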
[TABLE I. Summary of performance of the FENN algorithm for various PDEs.]

For the forward problem, such an approach is equivalent to the iterative approaches used to solve for the unknown nodal values in the FEM [4].

IV. RESULTS

A. Forward Model Results

The FENN was tested using both 1- and 2-D versions of Poisson's equation (30), where α represents the material property and f is the applied source. For instance, in electromagnetics α may represent the permittivity while f represents the charge density.

As the first example, consider the 2-D equation (31) with the boundary conditions (32) and (33). This is the governing equation for the shielded microstrip transmission line problem shown in Fig. 7. The forward problem computes the electric potential due to the shielded microstrip shown in Fig. 7(a). The potentials are zero on the shielding conductor. Since the geometry is symmetric, we can solve the equivalent problem shown in Fig. 7(b), by applying the homogeneous Neumann condition on the plane of symmetry. The inner conductor (microstrip) is held at a constant potential (in volts). Finally, we also assume that the material inside the shielding conductor has a permittivity defined in terms of a constant K. The permittivity in this case corresponds to the material property α. The homogeneous Neumann boundary condition is equivalent to setting the corresponding boundary terms in (10) to zero. The microstrip and the shielding conductor correspond to the Dirichlet boundary, with the prescribed strip potential on the microstrip and zero potential on the outer boundary [Fig. 7(b)]. Finally, there is no source term in this example (the source term would correspond to a charge distribution in the domain of interest), i.e., f = 0. Specific values were assumed in this example for the strip potential, the constant K, and the extent of the domain of interest.

The solution to the forward problem is presented in Fig. 8, with the FEM solution using 11 nodes in each direction shown in Fig. 8(a) and the corresponding FENN solution in Fig. 8(b). These figures show contours of constant potential. The error between the FEM and FENN solutions is presented in Fig. 8(c). As seen from the figure, the FENN matches the FEM solution accurately, with a very small peak error at any node.

Several other examples were also used to test the FENN and the results are summarized in Table I. Column 1 shows the PDE used to evaluate the FENN performance, while column 2 shows the boundary conditions used. The analytic solution to the problem is indicated in Column 3. The FENN structure and the number of iterations for convergence using a gradient descent approach are indicated in Columns 4 and 5, respectively. The FENN structure, as explained earlier, has M inputs, H hidden neurons and N output neurons, where M and N are the number of elements and nodes in the FEM mesh, respectively, and H is the number of hidden neurons, which corresponds to the number of nonzero elements in the FEM global matrix K. Finally, Columns 6 and 7 present the sum-squared error (SSE) and the maximum error in the solution, respectively, where the errors are computed with respect to the analytical solution. These results indicate that the FENN is capable of accurately determining the potential φ. One advantage of the FENN approach is that the computation of the input-hidden layer weights is a one-time process, as long as the differential equation does not change. The only changes necessary to solve the different problems are changes in the input α and the desired output f.
|
||||
In all cases, the objective was to determine the material property from given values of the potential and the applied source. The first example is a 1-D problem that involves determining the material property given the potential and the source, for the differential equation (34) with boundary conditions specified at the two ends of the domain. The analytical solution to this inverse problem is given by (35). As seen from (35), the problem has an infinite number of solutions, and we expect the solution procedure to converge to one of these solutions depending on the initial value.

Fig. 9(a) and (b) shows two solutions to this inverse problem for two different initializations (shown using triangles). In both cases, the FENN solution (shown with stars) is seen to match the analytical solution (squares). The SSE in both cases was very small.

Fig. 9. FENN inversion results for Poisson's equation with initial solutions (a) equal to x and (b) equal to 1 + x.
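For the inverse problem the roles are swapped: the nodal potential is treated as known and the per-element material property becomes the free parameter. The sketch below, which again reuses numpy and assemble_poisson_1d from the earlier sketches, exploits the fact that the assembled matrix is linear in the element properties, so the gradient of the assumed quadratic output error is available in closed form; the learning rate, iteration count, and clamp interface are illustrative assumptions. The optional clamp argument anticipates the constrained inversion discussed next.

def solve_inverse(alpha0, phi, f, x, clamp=None, lr=1e-2, n_iter=20000):
    """Adjust the per-element material property alpha by gradient descent so
    that K(alpha) phi matches the assembled source vector b. Optionally clamp
    selected elements (e.g., at the boundary) to known values. Sketch only."""
    alpha = np.asarray(alpha0, dtype=float).copy()
    n = len(x)
    # Unit element matrices scattered to global size (alpha_e = 1), so that
    # K(alpha) = sum_e alpha_e * Ke_global[e] and the gradient is analytic.
    Ke_global = []
    for e in range(n - 1):
        h = x[e + 1] - x[e]
        Ke = np.zeros((n, n))
        Ke[e:e + 2, e:e + 2] = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
        Ke_global.append(Ke)
    _, b = assemble_poisson_1d(np.ones(n - 1), f, x)   # b does not depend on alpha
    for _ in range(n_iter):
        K = sum(a * Ke for a, Ke in zip(alpha, Ke_global))
        r = K @ phi - b                                # network output error
        grad = np.array([(Ke @ phi) @ r for Ke in Ke_global])   # dE/dalpha_e
        alpha -= lr * grad
        if clamp:
            for e, v in clamp.items():                 # constrained inversion
                alpha[e] = v
    return alpha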
In order to obtain a unique solution, we need to constrain the value of the material property at the boundary as well. Consider the same differential equation as (34), but with the potential and the source specified as in (36). The analytical solution for this equation is again known in closed form. To solve this problem, we clamp the value of the material property at the two ends of the domain to its known boundary values.

The results of the constrained inversion obtained using 11 nodes and 10 elements in the corresponding finite-element mesh are shown in Fig. 10. Fig. 10(a) shows the comparison between the analytical solution (solid line with squares) and the FENN result (solid line with stars). The initial value of the material property is shown in the figure as a dashed line. Fig. 10(b) shows the comparison between the actual and desired forcing function at the FENN output.
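As a usage illustration only, the constrained inversion corresponds to calling the sketch above with the end elements clamped; the arrays phi_meas and f_src and the values alpha_left and alpha_right are placeholders for the known potential, source, and boundary properties, not the actual numbers used in this example.

# Placeholder usage: clamp the first and last element properties to assumed
# known boundary values during the inversion (illustrative only).
x = np.linspace(0.0, 1.0, 11)                  # 11 nodes, 10 elements, as in Fig. 10
alpha0 = np.ones(10)                           # initial guess for the per-element property
alpha_est = solve_inverse(alpha0, phi_meas, f_src, x,
                          clamp={0: alpha_left, 9: alpha_right})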
Fig. 10. Constrained inversion result with eleven nodes. (a) Comparison of analytic and simulation results for the material property. (b) Comparison of actual and desired NN outputs.
This result indicates that the SSE in the forcing function, as well as the SSE in the inversion result, is fairly large (0.0148 and 0.0197, respectively). The reason for this was traced back to the mesh discretization. Fig. 11 shows the SSE in the output of the FENN and the SSE in the inverse problem solution as a function of FEM discretization. It is seen that increasing the discretization significantly improves the solution. Similar results were observed for other problems.

Fig. 11. SSE in FENN output and inversion results as a function of discretization.

V. DISCUSSION AND CONCLUSION

The FENN is closely related to the finite-element model used to solve differential equations. The FENN architecture has a weight structure that allows both the forward and inverse problems to be solved using simple gradient-based algorithms. Initial results indicate that the proposed FENN algorithm is capable of accurately solving both the forward and inverse problems. In addition, the forward problem solution from the FENN is seen to exactly match the FEM solution, indicating that the FENN represents the finite-element model exactly in a parallel configuration.

The major advantage of the FENN is that it represents the finite-element model in a parallel form, enabling parallel implementation in either hardware or software. Further, computing gradients in the FENN is very simple. This is an advantage in solving both forward and inverse problems using gradient-based methods. The gradients can also be computed in parallel, and the lack of nonlinearities in the neuron activation functions makes the computation of gradients simpler.
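In terms of the quadratic output error assumed in the sketches above, E = (1/2)||K(alpha) phi - b||^2, these gradients take a purely linear-algebraic form with no activation-function derivatives:

\frac{\partial E}{\partial \phi} = K^{T}\left(K\phi - b\right),
\qquad
\frac{\partial E}{\partial \alpha_e} = \left(K_e\,\phi\right)^{T}\left(K\phi - b\right),
\qquad
K(\alpha) = \sum_e \alpha_e K_e .

Each term involves only the precomputed element matrices, the current estimates, and the residual, so the contributions can be evaluated independently and in parallel.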
A major advantage of this approach for solving inverse problems is that it avoids inverting the global matrix in each iteration. The FENN also does not require any training, since most of its weights can be computed in advance and stored. The weights depend on the governing differential equation and its associated boundary conditions, and as long as these two factors do not change, the weights do not change. This is especially an advantage in solving inverse problems in electromagnetic NDE. This approach also reduces the computational effort associated with the network.

Future work will concentrate on applying the FENN to 3-D electromagnetic NDE problems. The robustness of the approach will also be tested, since the ability of these approaches to invert practical noisy measurements is important. Furthermore, the use of better optimization algorithms, like conjugate gradient methods, is expected to improve the solution speed. In addition, parallel implementation of the FENN in both hardware and software is under investigation. The approach described in this paper is very general in that it can be applied to a variety of inverse problems in fields other than electromagnetic NDE. Some of these other applications will also be investigated to show the general nature of the proposed method.

REFERENCES

[1] L. Udpa and S. S. Udpa, "Application of signal processing and pattern recognition techniques to inverse problems in NDE," Int. J. Appl. Electromagn. Mechan., vol. 8, pp. 99-117, 1997.
[2] M. Yan, M. Afzal, S. Udpa, S. Mandayam, Y. Sun, L. Udpa, and P. Sacks, "Iterative algorithms for electromagnetic NDE signal inversion," in ENDE '97, Reggio Calabria, Italy, Sep. 14-16, 1997.
[3] J. Jin, The Finite Element Method in Electromagnetics. New York: Wiley, 1993.
[4] P. Zhou, Numerical Analysis of Electromagnetic Fields. Berlin, Germany: Springer-Verlag, 1993.
[5] S. Haykin, Neural Networks: A Comprehensive Foundation. Upper Saddle River, NJ: Prentice-Hall, 1994.
[6] C. A. Jensen et al., "Inversion of feedforward neural networks: algorithms and applications," Proc. IEEE, vol. 87, no. 9, pp. 1536-1549, 1999.
[7] P. Ramuhalli, L. Udpa, and S. Udpa, "Neural network algorithm for electromagnetic NDE signal inversion," in ENDE 2000, Budapest, Hungary, Jun. 2000.
[8] C. H. Barbosa, A. C. Bruno, M. Vellasco, M. Pacheco, J. P. Wikswo Jr., and A. P. Ewing, "Automation of SQUID nondestructive evaluation of steel plates by neural networks," IEEE Trans. Appl. Supercond., vol. 9, no. 2, pp. 3475-3478, 1999.
[9] W. Qing, S. Xueqin, Y. Qingxin, and Y. Weili, "Using wavelet neural networks for the optimal design of electromagnetic devices," IEEE Trans. Magn., vol. 33, no. 2, pp. 1928-1930, 1997.
[10] I. E. Lagaris, A. C. Likas, and D. I. Fotiadis, "Artificial neural networks for solving ordinary and partial differential equations," IEEE Trans. Neural Netw., vol. 9, no. 5, pp. 987-1000, 1998.
[11] I. E. Lagaris, A. C. Likas, and D. G. Papageorgiou, "Neural-network methods for boundary value problems with irregular boundaries," IEEE Trans. Neural Netw., vol. 11, no. 5, pp. 1041-1049, 2000.
[12] B. P. Van Milligen, V. Tribaldos, and J. A. Jimenez, "Neural network differential equation and plasma equilibrium solver," Phys. Rev. Lett., vol. 75, no. 20, pp. 3594-3597, 1995.
[13] M. W. M. G. Dissanayake and N. Phan-Thien, "Neural-network-based approximations for solving partial differential equations," Commun. Numer. Meth. Eng., vol. 10, pp. 195-201, 1994.
[14] R. Masuoka, "Neural networks learning differential data," IEICE Trans. Inform. Syst., vol. E83-D, no. 6, pp. 1291-1300, 2000.
[15] D. C. Youla, "Generalized image restoration by the method of alternating orthogonal projections," IEEE Trans. Circuits Syst., vol. CAS-25, no. 9, pp. 694-702, 1978.
[16] D. C. Youla and H. Webb, "Image restoration by the method of convex projections: part I-theory," IEEE Trans. Med. Imag., vol. MI-1, no. 2, pp. 81-94, 1982.
[17] A. Lent and H. Tuy, "An iterative method for the extrapolation of band-limited functions," J. Math. Analysis and Applicat., vol. 83, pp. 554-565, 1981.
[18] W. Chen, "A new extrapolation algorithm for band-limited signals using the regularization method," IEEE Trans. Signal Process., vol. 41, no. 3, pp. 1048-1060, 1993.
[19] J. Takeuchi and Y. Kosugi, "Neural network representation of the finite element method," Neural Netw., vol. 7, no. 2, pp. 389-395, 1994.
[20] R. Sikora, J. Sikora, E. Cardelli, and T. Chady, "Artificial neural network application for material evaluation by electromagnetic methods," in Proc. Int. Joint Conf. Neural Networks, vol. 6, 1999, pp. 4027-4032.
[21] G. Xu, G. Littlefair, R. Penson, and R. Callan, "Application of FE-based neural networks to dynamic problems," in Proc. Int. Conf. Neural Information Processing, vol. 3, 1999, pp. 1039-1044.
[22] F. Guo, P. Zhang, F. Wang, X. Ma, and G. Qiu, "Finite element analysis-based Hopfield neural network model for solving nonlinear electromagnetic field problems," in Proc. Int. Joint Conf. Neural Networks, vol. 6, 1999, pp. 4399-4403.
[23] H. Lee and I. S. Kang, "Neural algorithm for solving differential equations," J. Computat. Phys., vol. 91, pp. 110-131, 1990.
[24] J. Kalkkuhl, K. J. Hunt, and H. Fritz, "FEM-based neural-network approach to nonlinear modeling with application to longitudinal vehicle dynamics control," IEEE Trans. Neural Netw., vol. 10, no. 4, pp. 885-897, 1999.
[25] R. K. Mishra and P. S. Hall, "NFDTD concept," IEEE Trans. Neural Netw., vol. 16, no. 2, pp. 484-490, 2005.
[26] D. G. Triantafyllidis and D. P. Labridis, "A finite-element mesh generator based on growing neural networks," IEEE Trans. Neural Netw., vol. 13, no. 6, pp. 1482-1496, 2002.

Pradeep Ramuhalli (S'92-M'02) received the B.Tech. degree from J.N.T. University, Hyderabad, India, in electronics and communications engineering in 1995, and the M.S. and Ph.D. degrees in electrical engineering from Iowa State University, Ames, in 1998 and 2002, respectively.
He is currently an Assistant Professor with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing. His research is in the general area of nondestructive evaluation and materials characterization. His research interests include the application of signal and image processing methods, pattern recognition and neural networks for nondestructive evaluation applications, development of model-based solutions for inverse problems in NDE, and the development of information fusion algorithms for multimodal data fusion.
Dr. Ramuhalli is a Member of Phi Kappa Phi.

Lalita Udpa (S'84-M'86-SM'96) received the Ph.D. degree in electrical engineering from Colorado State University, Fort Collins, in 1986.
She is currently a Professor with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing. She works primarily in the broad areas of nondestructive evaluation, signal processing, and biomedical applications. Her research interests include various aspects of NDE, such as development of computational models for the forward problem in NDE, signal and image processing, pattern recognition and neural networks, and development of solution techniques for inverse problems. Her current projects include finite-element modeling of electromagnetic NDE phenomena, application of neural network and signal processing algorithms to NDE data, and development of image processing techniques for the analysis of NDE and biomedical images.
Dr. Udpa is a Member of Eta Kappa Nu and Sigma Xi.

Satish S. Udpa (S'82-M'82-SM'91-F'03) received the B.Tech. degree in 1975 and the Post Graduate Diploma in electrical engineering in 1977 from J.N.T. University, Hyderabad, India. He received the M.S. degree in 1980 and the Ph.D. degree in electrical engineering in 1983, both from Colorado State University, Fort Collins.
He has been with Michigan State University, East Lansing, since 2001 and is currently Acting Dean for the College of Engineering and a Professor with the Electrical and Computer Engineering Department. Prior to joining Michigan State, he was a Professor with Iowa State University, Ames, from 1990 to 2001 and was associated with the Materials Assessment Research Group. Prior to joining Iowa State, he was an Associate Professor with the Department of Electrical Engineering at Colorado State University. His research interests span the broad area of materials characterization and nondestructive evaluation (NDE). Work done by him to date in the area includes an extensive repertoire of forward models for simulating physical processes underlying several inspection techniques. Coupled with careful experimental work, such forward models can be used for designing new sensors, optimizing test conditions, estimating the probability of detection, assessing designs for inspectability, and training inverse models for characterizing defects. He has also been involved in the development of system-, as well as model-based, inverse solutions for defect and material property characterization. His interests have expanded in recent years to include the development of noninvasive tools for clinical applications. Work done to date in this field includes the development of new electromagnetic-acoustic (EMAT) methods for detecting single leg separation failures in artificial heart valves and microwave imaging and ablation therapy systems. He and his research group have been engaged in the design and development of high-performance instrumentation including acoustic microscopes and single and multifrequency eddy current NDE instruments. These systems, as well as software packages embodying algorithms developed by Udpa for defect classification and characterization, have been licensed to industry.
He is a Fellow of the American Society for Nondestructive Testing (ASNT) and a Fellow of the Indian Society of Nondestructive Testing.