Updates to corpus
This commit is contained in:
parent 93ffde18f6
commit 93b3da7a7d
Corpus/CORPUS.txt: 2253 lines changed
Energy and Policy Considerations for Deep Learning in NLP

Emma Strubell, Ananya Ganesh, Andrew McCallum
College of Information and Computer Sciences
University of Massachusetts Amherst
{strubell, aganesh, mccallum}@cs.umass.edu

arXiv:1906.02243v1 [cs.CL] 5 Jun 2019
Abstract

Recent progress in hardware and methodology for training neural networks has ushered in a new generation of large networks trained on abundant data. These models have obtained notable gains in accuracy across many NLP tasks. However, these accuracy improvements depend on the availability of exceptionally large computational resources that necessitate similarly substantial energy consumption. As a result these models are costly to train and develop, both financially, due to the cost of hardware and electricity or cloud compute time, and environmentally, due to the carbon footprint required to fuel modern tensor processing hardware. In this paper we bring this issue to the attention of NLP researchers by quantifying the approximate financial and environmental costs of training a variety of recently successful neural network models for NLP. Based on these findings, we propose actionable recommendations to reduce costs and improve equity in NLP research and practice.

Table 1: Estimated CO2 emissions from training common NLP models, compared to familiar consumption.(1)

Consumption                          CO2e (lbs)
Air travel, 1 passenger, NY↔SF            1,984
Human life, avg, 1 year                  11,023
American life, avg, 1 year               36,156
Car, avg incl. fuel, 1 lifetime         126,000

Training one model (GPU)
NLP pipeline (parsing, SRL)                  39
  w/ tuning & experimentation            78,468
Transformer (big)                           192
  w/ neural architecture search         626,155

(1) Sources: (1) Air travel and per-capita consumption: https://bit.ly/2Hw0xWc; (2) car lifetime: https://bit.ly/2Qbr0w1.

1 Introduction

Advances in techniques and hardware for training deep neural networks have recently enabled impressive accuracy improvements across many fundamental NLP tasks (Bahdanau et al., 2015; Luong et al., 2015; Dozat and Manning, 2017; Vaswani et al., 2017), with the most computationally-hungry models obtaining the highest scores (Peters et al., 2018; Devlin et al., 2019; Radford et al., 2019; So et al., 2019). As a result, training a state-of-the-art model now requires substantial computational resources which demand considerable energy, along with the associated financial and environmental costs. Research and development of new models multiplies these costs by thousands of times by requiring retraining to experiment with model architectures and hyperparameters. Whereas a decade ago most NLP models could be trained and developed on a commodity laptop or server, many now require multiple instances of specialized hardware such as GPUs or TPUs, therefore limiting access to these highly accurate models on the basis of finances.

Even when these expensive computational resources are available, model training also incurs a substantial cost to the environment due to the energy required to power this hardware for weeks or months at a time. Though some of this energy may come from renewable or carbon credit-offset resources, the high energy demands of these models are still a concern since (1) energy is not currently derived from carbon-neutral sources in many locations, and (2) when renewable energy is available, it is still limited by the equipment we have to produce and store it, and energy spent training a neural network might better be allocated to heating a family's home. It is estimated that we must cut carbon emissions by half over the next decade to deter escalating rates of natural disaster, and based on the estimated CO2 emissions listed in Table 1, model training and development likely make up a substantial portion of the greenhouse gas emissions attributed to many NLP researchers.
To heighten the awareness of the NLP community to this issue and promote mindful practice and policy, we characterize the dollar cost and carbon emissions that result from training the neural networks at the core of many state-of-the-art NLP models. We do this by estimating the kilowatts of energy required to train a variety of popular off-the-shelf NLP models, which can be converted to approximate carbon emissions and electricity costs. To estimate the even greater resources required to transfer an existing model to a new task or develop new models, we perform a case study of the full computational resources required for the development and tuning of a recent state-of-the-art NLP pipeline (Strubell et al., 2018). We conclude with recommendations to the community based on our findings, namely: (1) Time to retrain and sensitivity to hyperparameters should be reported for NLP machine learning models; (2) academic researchers need equitable access to computational resources; and (3) researchers should prioritize developing efficient models and hardware.

Table 2: Percent energy sourced from: Renewable (e.g. hydro, solar, wind), natural gas, coal and nuclear for the top 3 cloud compute providers (Cook et al., 2017), compared to the United States,(4) China(5) and Germany (Burger, 2019).

Consumer         Renew.   Gas   Coal   Nuc.
China              22%     3%    65%     4%
Germany            40%     7%    38%    13%
United States      17%    35%    27%    19%
Amazon-AWS         17%    24%    30%    26%
Google             56%    14%    15%    10%
Microsoft          32%    23%    31%    10%

(4) U.S. Dept. of Energy: https://bit.ly/2JTbGnI
(5) China Electricity Council; trans. China Energy Portal: https://bit.ly/2QHE5O3

2 Methods

To quantify the computational and environmental cost of training deep neural network models for NLP, we perform an analysis of the energy required to train a variety of popular off-the-shelf NLP models, as well as a case study of the complete sum of resources required to develop LISA (Strubell et al., 2018), a state-of-the-art NLP model from EMNLP 2018, including all tuning and experimentation.

We measure energy use as follows. We train the models described in §2.1 using the default settings provided, and sample GPU and CPU power consumption during training. Each model was trained for a maximum of 1 day. We train all models on a single NVIDIA Titan X GPU, with the exception of ELMo, which was trained on 3 NVIDIA GTX 1080 Ti GPUs. While training, we repeatedly query the NVIDIA System Management Interface(2) to sample the GPU power consumption and report the average over all samples. To sample CPU power consumption, we use Intel's Running Average Power Limit interface.(3)

(2) nvidia-smi: https://bit.ly/30sGEbi
(3) RAPL power meter: https://bit.ly/2LObQhV

We estimate the total time expected for models to train to completion using training times and hardware reported in the original papers. We then calculate the power consumption in kilowatt-hours (kWh) as follows. Let p_c be the average power draw (in watts) from all CPU sockets during training, let p_r be the average power draw from all DRAM (main memory) sockets, let p_g be the average power draw of a GPU during training, and let g be the number of GPUs used to train. We estimate total power consumption as combined GPU, CPU and DRAM consumption, then multiply this by Power Usage Effectiveness (PUE), which accounts for the additional energy required to support the compute infrastructure (mainly cooling). We use a PUE coefficient of 1.58, the 2018 global average for data centers (Ascierto, 2018). Then the total power p_t required at a given instance during training is given by:

    p_t = 1.58 t (p_c + p_r + g p_g) / 1000    (1)

The U.S. Environmental Protection Agency (EPA) provides average CO2 produced (in pounds per kilowatt-hour) for power consumed in the U.S. (EPA, 2018), which we use to convert power to estimated CO2 emissions:

    CO2e = 0.954 p_t    (2)

This conversion takes into account the relative proportions of different energy sources (primarily natural gas, coal, nuclear and renewable) consumed to produce energy in the United States. Table 2 lists the relative energy sources for China, Germany and the United States compared to the top three cloud service providers. The U.S. breakdown of energy is comparable to that of the most popular cloud compute service, Amazon Web Services, so we believe this conversion to provide a reasonable estimate of CO2 emissions per kilowatt hour of compute energy used.
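Equations (1) and (2) amount to a few lines of arithmetic. The following is a minimal sketch of that conversion (not the authors' code); the function names and the example power draws, GPU count, and hours are illustrative assumptions, not measured values.

PUE = 1.58               # 2018 global average data center PUE (Ascierto, 2018)
CO2_LBS_PER_KWH = 0.954  # EPA (2018) average U.S. CO2 (lbs) emitted per kWh

def energy_kwh(hours, cpu_watts, dram_watts, gpu_watts, num_gpus, pue=PUE):
    """Equation (1): total energy in kWh drawn over `hours` of training,
    scaled by PUE; the *_watts arguments are average power draws sampled
    during training (cf. the nvidia-smi / RAPL sampling described above)."""
    combined_watts = cpu_watts + dram_watts + num_gpus * gpu_watts
    return pue * hours * combined_watts / 1000.0

def co2e_lbs(kwh):
    """Equation (2): estimated CO2-equivalent emissions in pounds."""
    return CO2_LBS_PER_KWH * kwh

# Hypothetical job: 8 GPUs for 84 hours with illustrative average power draws.
kwh = energy_kwh(hours=84, cpu_watts=100.0, dram_watts=30.0,
                 gpu_watts=170.0, num_gpus=8)
print(round(kwh, 1), "kWh,", round(co2e_lbs(kwh), 1), "lbs CO2e")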
2.1 Models

We analyze four models, the computational requirements of which we describe below. All models have code freely available online, which we used out-of-the-box. For more details on the models themselves, please refer to the original papers.

Transformer. The Transformer model (Vaswani et al., 2017) is an encoder-decoder architecture primarily recognized for efficient and accurate machine translation. The encoder and decoder each consist of 6 stacked layers of multi-head self-attention. Vaswani et al. (2017) report that the Transformer base model (65M parameters) was trained on 8 NVIDIA P100 GPUs for 12 hours, and the Transformer big model (213M parameters) was trained for 3.5 days (84 hours; 300k steps). This model is also the basis for recent work on neural architecture search (NAS) for machine translation and language modeling (So et al., 2019), and the NLP pipeline that we study in more detail in §4.2 (Strubell et al., 2018). So et al. (2019) report that their full architecture search ran for a total of 979M training steps, and that their base model requires 10 hours to train for 300k steps on one TPUv2 core. This equates to 32,623 hours of TPU or 274,120 hours on 8 P100 GPUs.

ELMo. The ELMo model (Peters et al., 2018) is based on stacked LSTMs and provides rich word representations in context by pre-training on a large amount of data using a language modeling objective. Replacing context-independent pre-trained word embeddings with ELMo has been shown to increase performance on downstream tasks such as named entity recognition, semantic role labeling, and coreference. Peters et al. (2018) report that ELMo was trained on 3 NVIDIA GTX 1080 GPUs for 2 weeks (336 hours).

BERT. The BERT model (Devlin et al., 2019) provides a Transformer-based architecture for building contextual representations similar to ELMo, but trained with a different language modeling objective. BERT substantially improves accuracy on tasks requiring sentence-level representations such as question answering and natural language inference. Devlin et al. (2019) report that the BERT base model (110M parameters) was trained on 16 TPU chips for 4 days (96 hours). NVIDIA reports that they can train a BERT model in 3.3 days (79.2 hours) using 4 DGX-2H servers, totaling 64 Tesla V100 GPUs (Forster et al., 2019).

GPT-2. This model is the latest edition of OpenAI's GPT general-purpose token encoder, also based on Transformer-style self-attention and trained with a language modeling objective (Radford et al., 2019). By training a very large model on massive data, Radford et al. (2019) show high zero-shot performance on question answering and language modeling benchmarks. The large model described in Radford et al. (2019) has 1542M parameters and is reported to require 1 week (168 hours) of training on 32 TPUv3 chips.(6)

(6) Via the authors on Reddit.

3 Related work

There is some precedent for work characterizing the computational requirements of training and inference in modern neural network architectures in the computer vision community. Li et al. (2016) present a detailed study of the energy use required for training and inference in popular convolutional models for image classification in computer vision, including fine-grained analysis comparing different neural network layer types. Canziani et al. (2016) assess image classification model accuracy as a function of model size and gigaflops required during inference. They also measure average power draw required during inference on GPUs as a function of batch size. Neither work analyzes the recurrent and self-attention models that have become commonplace in NLP, nor do they extrapolate power to estimates of carbon and dollar cost of training.

Analysis of hyperparameter tuning has been performed in the context of improved algorithms for hyperparameter search (Bergstra et al., 2011; Bergstra and Bengio, 2012; Snoek et al., 2012). To our knowledge there exists to date no analysis of the computation required for R&D and hyperparameter tuning of neural network models in NLP.

Table 3: Estimated cost of training a model in terms of CO2 emissions (lbs) and cloud compute cost (USD).(7) Power and carbon footprint are omitted for TPUs due to lack of public information on power draw for this hardware.

(7) GPU lower bound computed using pre-emptible P100/V100 U.S. resources priced at $0.43–$0.74/hr, upper bound uses on-demand U.S. resources priced at $1.46–$2.48/hr. We similarly use pre-emptible ($1.46/hr–$2.40/hr) and on-demand ($4.50/hr–$8/hr) pricing as lower and upper bounds for TPU v2/3; cheaper bulk contracts are available.

Model             Hardware    Power (W)    Hours     kWh·PUE    CO2e      Cloud compute cost
Transformer base  P100x8       1415.78        12          27        26    $41–$140
Transformer big   P100x8       1515.43        84         201       192    $289–$981
ELMo              P100x3        517.66       336         275       262    $433–$1472
BERT base         V100x64    12,041.51        79        1507      1438    $3751–$12,571
BERT base         TPUv2x16          —         96           —         —    $2074–$6912
NAS               P100x8       1515.43   274,120     656,347   626,155    $942,973–$3,201,722
NAS               TPUv2x1           —     32,623           —         —    $44,055–$146,848
GPT-2             TPUv3x32          —        168           —         —    $12,902–$43,008
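As a consistency check on Table 3, if the Power (W) column is read as the combined average draw p_c + p_r + g p_g from equation (1), the Transformer (big) row follows directly from equations (1) and (2): 1.58 × 1515.43 W × 84 h / 1000 ≈ 201 kWh, and 0.954 × 201 ≈ 192 lbs CO2e, matching the kWh·PUE and CO2e columns; the Transformer base row checks out the same way (1.58 × 1415.78 × 12 / 1000 ≈ 27 kWh, 0.954 × 27 ≈ 26 lbs).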
4 Experimental results

4.1 Cost of training

Table 3 lists CO2 emissions and estimated cost of training the models described in §2.1. Of note is that TPUs are more cost-efficient than GPUs on workloads that make sense for that hardware (e.g. BERT). We also see that models emit substantial carbon emissions; training BERT on GPU is roughly equivalent to a trans-American flight. So et al. (2019) report that NAS achieves a new state-of-the-art BLEU score of 29.7 for English to German machine translation, an increase of just 0.1 BLEU at the cost of at least $150k in on-demand compute time and non-trivial carbon emissions.

Table 4: Estimated cost in terms of cloud compute and electricity for training: (1) a single model, (2) a single tune and (3) all models trained during R&D.

                          Estimated cost (USD)
Models     Hours      Cloud compute    Electricity
1            120      $52–$175         $5
24          2880      $1238–$4205      $118
4789     239,942      $103k–$350k      $9870

4.2 Cost of development: Case study

To quantify the computational requirements of R&D for a new model we study the logs of all training required to develop Linguistically-Informed Self-Attention (Strubell et al., 2018), a multi-task model that performs part-of-speech tagging, labeled dependency parsing, predicate detection and semantic role labeling. This model makes for an interesting case study as a representative NLP pipeline and as a Best Long Paper at EMNLP.

Model training associated with the project spanned a period of 172 days (approx. 6 months). During that time 123 small hyperparameter grid searches were performed, resulting in 4789 jobs in total. Jobs varied in length ranging from a minimum of 3 minutes, indicating a crash, to a maximum of 9 days, with an average job length of 52 hours. All training was done on a combination of NVIDIA Titan X (72%) and M40 (28%) GPUs.(8)

The sum GPU time required for the project totaled 9998 days (27 years). This averages to about 60 GPUs running constantly throughout the 6 month duration of the project. Table 4 lists upper and lower bounds of the estimated cost in terms of Google Cloud compute and raw electricity required to develop and deploy this model.(9) We see that while training a single model is relatively inexpensive, the cost of tuning a model for a new dataset, which we estimate here to require 24 jobs, or performing the full R&D required to develop this model, quickly becomes extremely expensive.

(8) We approximate cloud compute cost using P100 pricing.
(9) Based on average U.S. cost of electricity of $0.12/kWh.

5 Conclusions

Authors should report training time and sensitivity to hyperparameters.

Our experiments suggest that it would be beneficial to directly compare different models to perform a cost-benefit (accuracy) analysis. To address this, when proposing a model that is meant to be re-trained for downstream use, such as re-training on a new domain or fine-tuning on a new task, authors should report training time and computational resources required, as well as model sensitivity to hyperparameters. This will enable direct comparison across models, allowing subsequent consumers of these models to accurately assess whether the required computational resources are compatible with their setting. More explicit characterization of tuning time could also reveal inconsistencies in time spent tuning baseline models compared to proposed contributions. Realizing this will require: (1) a standard, hardware-independent measurement of training time, such as gigaflops required to convergence, and (2) a standard measurement of model sensitivity to data and hyperparameters, such as variance with respect to hyperparameters searched.

Academic researchers need equitable access to computation resources.

Recent advances in available compute come at a high price not attainable to all who desire access. Most of the models studied in this paper were developed outside academia; recent improvements in state-of-the-art accuracy are possible thanks to industry access to large-scale compute.

Limiting this style of research to industry labs hurts the NLP research community in many ways. First, it stifles creativity. Researchers with good ideas but without access to large-scale compute will simply not be able to execute their ideas, instead constrained to focus on different problems. Second, it prohibits certain types of research on the basis of access to financial resources. This even more deeply promotes the already problematic "rich get richer" cycle of research funding, where groups that are already successful and thus well-funded tend to receive more funding due to their existing accomplishments. Third, the prohibitive start-up cost of building in-house resources forces resource-poor groups to rely on cloud compute services such as AWS, Google Cloud and Microsoft Azure.

While these services provide valuable, flexible, and often relatively environmentally friendly compute resources, it is more cost effective for academic researchers, who often work for non-profit educational institutions and whose research is funded by government entities, to pool resources to build shared compute centers at the level of funding agencies, such as the U.S. National Science Foundation. For example, an off-the-shelf GPU server containing 8 NVIDIA 1080 Ti GPUs and supporting hardware can be purchased for approximately $20,000 USD. At that cost, the hardware required to develop the model in our case study (approximately 58 GPUs for 172 days) would cost $145,000 USD plus electricity, about half the estimated cost to use on-demand cloud GPUs. Unlike money spent on cloud compute, however, that invested in centralized resources would continue to pay off as resources are shared across many projects. A government-funded academic compute cloud would provide equitable access to all researchers.

Researchers should prioritize computationally efficient hardware and algorithms.

We recommend a concerted effort by industry and academia to promote research of more computationally efficient algorithms, as well as hardware that requires less energy. An effort can also be made in terms of software. There is already a precedent for NLP software packages prioritizing efficient models. An additional avenue through which NLP and machine learning software developers could aid in reducing the energy associated with model tuning is by providing easy-to-use APIs implementing more efficient alternatives to brute-force grid search for hyperparameter tuning, e.g. random or Bayesian hyperparameter search techniques (Bergstra et al., 2011; Bergstra and Bengio, 2012; Snoek et al., 2012). While software packages implementing these techniques do exist,(10) they are rarely employed in practice for tuning NLP models. This is likely because their interoperability with popular deep learning frameworks such as PyTorch and TensorFlow is not optimized, i.e. there are not simple examples of how to tune TensorFlow Estimators using Bayesian search. Integrating these tools into the workflows with which NLP researchers and practitioners are already familiar could have notable impact on the cost of developing and tuning in NLP.

(10) For example, the Hyperopt Python library.
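As an illustration of the kind of easy-to-use API this paragraph asks for, the sketch below uses the Hyperopt library named in footnote 10 to run Bayesian (TPE) search instead of a grid; the search space, the number of evaluations, and the train_and_eval objective are hypothetical stand-ins for a real NLP tuning setup.

from hyperopt import Trials, fmin, hp, tpe

def train_and_eval(params):
    """Hypothetical objective: train a model with `params` and return dev-set loss.
    Replaced here by a placeholder expression so the sketch runs standalone."""
    return (params["lr"] - 1e-3) ** 2 + 0.1 * params["dropout"]

# Hypothetical search space over two common hyperparameters.
space = {
    "lr": hp.loguniform("lr", -12, -4),        # roughly 6e-6 to 1.8e-2
    "dropout": hp.uniform("dropout", 0.0, 0.5),
}

trials = Trials()
best = fmin(fn=train_and_eval, space=space, algo=tpe.suggest,
            max_evals=20, trials=trials)
print(best)  # best hyperparameters found after 20 runs

The point of the sketch is only that the number of full training runs is fixed up front (here 20) and chosen adaptively, rather than growing multiplicatively with every hyperparameter as in grid search.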
Acknowledgements

We are grateful to Sherief Farouk and the anonymous reviewers for helpful feedback on earlier drafts. This work was supported in part by the Centers for Data Science and Intelligent Information Retrieval, the Chan Zuckerberg Initiative under the Scientific Knowledge Base Construction project, the IBM Cognitive Horizons Network agreement no. W1668553, and National Science Foundation grant no. IIS-1514053. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsor.
References

Rhonda Ascierto. 2018. Uptime Institute Global Data Center Survey. Technical report, Uptime Institute.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference for Learning Representations (ICLR), San Diego, California, USA.

James Bergstra and Yoshua Bengio. 2012. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305.

James S. Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. 2011. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems, pages 2546–2554.

Bruno Burger. 2019. Net Public Electricity Generation in Germany in 2018. Technical report, Fraunhofer Institute for Solar Energy Systems ISE.

Alfredo Canziani, Adam Paszke, and Eugenio Culurciello. 2016. An analysis of deep neural network models for practical applications.

Gary Cook, Jude Lee, Tamina Tsai, Ada Kong, John Deans, Brian Johnson, Elizabeth Jardim, and Brian Johnson. 2017. Clicking Clean: Who is winning the race to build a green internet? Technical report, Greenpeace.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL.

Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In ICLR.

EPA. 2018. Emissions & Generation Resource Integrated Database (eGRID). Technical report, U.S. Environmental Protection Agency.

Christopher Forster, Thor Johnsen, Swetha Mandava, Sharath Turuvekere Sreenivas, Deyu Fu, Julie Bernauer, Allison Gray, Sharan Chetlur, and Raul Puri. 2019. BERT Meets GPUs. Technical report, NVIDIA AI.

Da Li, Xinbo Chen, Michela Becchi, and Ziliang Zong. 2016. Evaluating the energy efficiency of deep convolutional neural networks on CPUs and GPUs. 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom), pages 477–484.

Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412–1421. Association for Computational Linguistics.

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In NAACL.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners.

Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. 2012. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2951–2959.

David R. So, Chen Liang, and Quoc V. Le. 2019. The evolved transformer. In Proceedings of the 36th International Conference on Machine Learning (ICML).

Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum. 2018. Linguistically-informed self-attention for semantic role labeling. In Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In 31st Conference on Neural Information Processing Systems (NIPS).
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 16, NO. 6, NOVEMBER 2005

Finite-Element Neural Networks for Solving Differential Equations

Pradeep Ramuhalli, Member, IEEE, Lalita Udpa, Senior Member, IEEE, and Satish S. Udpa, Fellow, IEEE
Abstract—The solution of partial differential equations (PDE) arises in a wide variety of engineering problems. Solutions to most practical problems use numerical analysis techniques such as finite-element or finite-difference methods. The drawbacks of these approaches include computational costs associated with the modeling of complex geometries. This paper proposes a finite-element neural network (FENN) obtained by embedding a finite-element model in a neural network architecture that enables fast and accurate solution of the forward problem. Results of applying the FENN to several simple electromagnetic forward and inverse problems are presented. Initial results indicate that the FENN performance as a forward model is comparable to that of the conventional finite-element method (FEM). The FENN can also be used in an iterative approach to solve inverse problems associated with the PDE. Results showing the ability of the FENN to solve the inverse problem given the measured signal are also presented. The parallel nature of the FENN also makes it an attractive solution for parallel implementation in hardware and software.

Index Terms—Finite-element method (FEM), finite-element neural network (FENN), inverse problems.

Manuscript received January 17, 2004; revised April 2, 2005. The authors are with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824 USA (e-mail: rpradeep@egr.msu.edu; udpal@egr.msu.edu; udpa@egr.msu.edu). Digital Object Identifier 10.1109/TNN.2005.857945

[Fig. 1. Iterative inversion method for solving inverse problems.]

I. INTRODUCTION

Solutions of differential equations arise in a wide variety of engineering applications in electromagnetics, signal processing, computational fluid dynamics, etc. These equations are typically solved using either analytical or numerical methods. Analytical solution methods are however feasible only for simple geometries, which limits their applicability. In most practical problems with complex boundary conditions, numerical analysis methods are required in order to obtain a reasonable solution. An example is the solution of Maxwell's equations in electromagnetics. Solutions to Maxwell's equations are used in a variety of applications for calculating the interaction of electromagnetic (EM) fields with different types of media.

Very often, the solution to differential equations is necessary for solving the corresponding inverse problems. Inverse problems in general are ill-posed, lacking continuous dependence of the measurements on the input. This has resulted in the development of a variety of solution techniques ranging from simple calibration procedures to other direct (analytical) and iterative approaches [1]. Iterative methods typically employ a forward model that simulates the underlying physical process (Fig. 1) [2]. An initial estimate of the solution of the inverse problem (represented in Fig. 1) is applied to the forward model, resulting in the corresponding solution to the forward problem. The model output is compared to the measurement using a cost function. If the cost is less than a tolerance, the estimate is used as the desired solution. If not, the estimate is updated to minimize the cost function.

Although finite-element methods (FEMs) [3], [4] are extremely popular for solving differential equations, their major drawback is computational complexity. This problem becomes more acute when three-dimensional (3-D) finite-element models are used in an iterative algorithm for solving the inverse problem. Recently, several authors have suggested the use of neural networks (MLP or RBF networks [5]) for solving differential equations [6]–[9]. In these techniques, a neural network is trained using a large database containing the input data and the solution of the differential equation. The neural network during generalization learns the mapping corresponding to the PDE. Alternatively, in [10], the solution to a differential equation is written as the sum of a constant term and an adjustable term with parameters that need to be determined. A neural network is used to determine the optimal values of the parameters. This approach is applicable only to problems with regular boundaries. An extension of the approach to problems with irregular boundaries is given in [11]. Other neural network based differential equation solvers use multilayer perceptron networks or variations on the MLP to approximate the unknown function in a PDE [12]–[14]. A combination of the PDE and boundary conditions is used to construct an objective function that is minimized during the training process.
A major limitation of these approaches is that the network architecture is selected somewhat arbitrarily. A second drawback is that the performance of the neural networks depends on the data used in training and testing. As long as the test data is similar to the training data, the network can interpolate between the training data points to obtain a reasonable prediction. However, when the test signal is no longer similar to the training data, the network is forced to extrapolate and the performance degrades. One way around this difficulty is to ensure that the training database has a diverse set of signals. However, this is difficult to ensure in practice. Alternatively, we have to design neural networks that are capable of extrapolation. Extrapolation methods are discussed extensively in the literature [15]–[18], but the design of an extrapolation neural network involves several issues, particularly for ensuring that the error in the network prediction stays within reasonable bounds during the extrapolation procedure.

An ideal solution to this problem would be to combine the power of numerical models with the computational speed of neural networks, i.e., to embed a numerical model in a neural network structure. One such finite-element neural network (FENN) formulation has been reported by Takeuchi and Kosugi [19]. This approach, based on error minimization, derives the neural network using the energy functional resulting from the finite-element formulation. Other reports of FENN combinations are either similar to the Takeuchi method [20], [21] or use Hopfield neural networks to solve the forward problem [22], [23]. Kalkkuhl et al. [24] provide a description of a FEM-based approach to NARX modeling that may be interpreted both as a local model network, as well as a single layer feedforward network. A slightly different approach to merging numerical methods and neural networks is given in [25], where the finite-difference time domain (FDTD) method is cast in a neural network framework for the purpose of solving electromagnetic forward problems. The related problem of mesh generation in finite-element models has also been tackled using neural networks (for instance, [26]). Generally, these networks are designed to solve the forward problem, and must be modified to solve inverse problems.

This paper proposes a new approach that embeds a finite-element model commonly used in the solution of differential equations in a neural network. The network, called the FENN, can solve the forward problem and can also be used in an iterative algorithm to solve inverse problems. The primary advantage of this approach is that the FEM is represented in a parallel form. Thus, it has the potential to alleviate the computational cost associated with using the FEM in an iterative algorithm for solving inverse problems. More importantly, the FENN does not need any training, and the computation of the weights is a one-time process. The proposed approach is also different in that the neural network architecture developed can be used to solve both the forward and inverse problems. The structure of the neural network is also simpler than those reported in the literature, making it easier to implement in parallel in both hardware and software.

The rest of this paper is organized as follows. Section II briefly describes the FEM, and derives the proposed FENN. In this paper, we focus on the problem of solving typical equations encountered in electromagnetic nondestructive evaluation (NDE). However, the same concepts can be easily applied to solve differential equations encountered in other fields. Sections III, IV and V present the application of the FENN to solving forward and inverse problems, along with initial results. A discussion of the advantages and disadvantages of the proposed FENN architecture is given in Section IV. Finally, Section V draws conclusions from the results and presents ideas for future work.

II. THE FENN

This section briefly describes the FEM and proposes its reformulation into a parallel neural network structure. Details about the FEM can be found in [3] and [4].

A. The FEM

Consider a typical boundary value problem with the governing differential equation

    L φ = f    (1)

where L is a differential operator, f is the applied source or forcing function, and φ is the unknown quantity. This differential equation can be solved in conjunction with boundary conditions on the boundary Γ enclosing the domain Ω. The variational formulation used in finite-element analysis determines the unknown φ by minimizing the functional F(φ) given in (2) with respect to the trial function φ. The minimization procedure starts by dividing Ω into small subdomains called elements (Fig. 2) and representing φ in each element by means of basis functions defined over the element

    φ^e = Σ_{j=1}^{n} N_j^e φ_j^e    (3)

where φ^e is the unknown solution in element e, N_j^e is the basis function associated with node j in element e, φ_j^e is the value of the unknown quantity at node j, and n is the total number of nodes associated with element e. In general, the basis functions (also referred to as interpolation functions or shape functions) can be linear, quadratic, or of higher order. Typically, finite-element models use either linear or polynomial spline basis functions.

The functional within an element is expressed as in (4). By substituting (3) in (4), we obtain the discrete version of the functional within each element

    F^e = (1/2) (φ^e)^T K^e φ^e − (φ^e)^T b^e    (5)

where (·)^T is the transpose of a matrix, K^e is the n × n elemental matrix whose entries are given by (6), and b^e is an n × 1 vector whose entries are given by (7).
Combining the values in (5) for each of the elements gives

    F = (1/2) φ^T K φ − φ^T b    (8)

where K is the global matrix derived from the terms of the elemental matrices for the different elements, and N is the total number of nodes. K, also called the stiffness matrix, is a sparse, banded matrix. Equation (8) is the discrete version of the functional and can be minimized with respect to the nodal parameters φ by taking the derivative of F with respect to φ and setting it equal to zero, which results in the matrix equation

    K φ = b    (9)

[Fig. 2. (a) Schematic representation of domain and boundary. (b) Sample FEM mesh for the domain.]

[Fig. 3. FEM domain discretization using two elements and four nodes.]

Boundary conditions for these problems are usually of two types: natural boundary conditions and essential boundary conditions. Essential boundary conditions (also referred to as Dirichlet boundary conditions) impose constraints on the value of the unknown φ at several nodes. Natural boundary conditions (of which Neumann boundary conditions are a special case) impose constraints on the change in φ across a boundary. Dirichlet boundary conditions are imposed on the functional minimization (9) by deleting the rows and columns of the K matrix corresponding to the nodes on the Dirichlet boundary and modifying b in (9).

Natural boundary conditions are applied in the FEM by adding an additional term to the functional. These boundary conditions are then incorporated into the functional and are satisfied automatically during the solution procedure. As an example, consider the natural boundary condition represented by equation (10) [3] on the Neumann boundary Γ_2, where n̂ is its outward normal unit vector and the remaining coefficients in (10) are known parameters associated with the boundary. Assuming that the boundary Γ_2 is made up of segments, we can define a boundary matrix and a boundary vector whose elements are given by (11), where the basis functions in (11) are defined over each segment of the boundary and the segment length appears as a factor. The elements of the boundary matrix are added to the elements of K that correspond to the nodes on the boundary Γ_2. Similarly, the elements of the boundary vector are added to the corresponding elements of b. The global matrix equation (9) is thus modified as in (12) before solving for φ. This process ensures that natural boundary conditions are implicitly and automatically satisfied during the FEM solution procedure.
B. The FENN

This section describes how the finite-element model can be converted into a parallel network form. We focus on solving typical inverse problems arising in electromagnetic NDE, but the basic idea is applicable to other areas as well. NDE inverse problems can be formulated as the problem of finding the material properties (such as the conductivity or the permeability) within the domain of the problem. Since the domain is discretized in the FEM method by a large number of elements, the problem can be posed as one of finding the material properties in each of these elements. These properties are usually embedded in the differential operator L, or equivalently, in the global matrix K. Thus, in order to be able to iteratively estimate these properties from the measurements, the material properties need to be separated out from K. This separation is easier to achieve at the element matrix level. For nodes i and j in element e, the element matrix entry is written as in (13), where α_e is the parameter representing the material property in element e and the remaining factor represents the differential operator at the element level without α embedded in it. Substituting (13) into the functional, we get (14). If we define the entries of the global matrix in terms of α and a set of fixed coefficients w as in (15), where a coefficient is zero whenever the corresponding pair of nodes does not belong to the element in question (16), then the functional can be expressed explicitly in terms of α as in (17).

The assumption that α is constant within each element is implicit in this expression. This assumption is usually satisfied in problems in NDE where each element in the FEM mesh is defined within the confines of a domain, and at no time does a single element cross domain boundaries. Furthermore, each element is small enough that minor variations in α within an element may be ignored. Equation (17) can be easily converted into a parallel network form. The neural network comprises an input, output and hidden layer. In the general case with M elements and N nodes in the FEM mesh, the input layer with M network inputs takes the values of α in each element as input. The hidden layer has N^2 neurons arranged in N groups of N neurons, corresponding to the members of the global matrix K. (In this paper, we use the term "neurons" in the FENN, in the hidden and output layers, to avoid confusion with the nodes in a finite-element mesh.) The output of each group of hidden layer neurons is the corresponding row vector of K. The weights from the input to the hidden layer are set to the appropriate values of w. Each neuron in the hidden layer acts as a summation unit (equivalent to a summation followed by a linear activation function [5]). The outputs of the hidden layer neurons are the elements of the global matrix K as given in (15).

Each group of hidden neurons is connected to one output neuron (giving a total of N output neurons) by a set of weights φ, with each element of φ representing the nodal values. Note that the set of weights between the first group of hidden neurons and the first output neuron is the same as the set of weights between the second group of hidden neurons and the second output neuron (as well as between successive groups of hidden neurons and the corresponding output neuron). Each output neuron is also a summation unit followed by a linear activation function, and the output of each neuron is given by (18), where the second part of (18) is obtained by using (15). As an example, the FENN architecture for a two-element, four-node FEM mesh (Fig. 3) is shown in Fig. 4. In this case, the FENN has two input neurons, 16 hidden layer neurons and four output neurons. The figure illustrates the grouping of the hidden layer neurons, as well as the similarity inherent in the weights that connect each group of hidden layer neurons to the corresponding output neuron. To simplify the figure, the weights between the network input and hidden layer neurons are depicted by means of vectors, where the individual weight values are defined as in (16).

[Fig. 4. FENN.]
1) Boundary Conditions in the FENN: Note that the elements of the boundary matrix and vector in (11) do not depend on the material properties α, and need to be added appropriately to the global matrix K and the source vector b as shown in (12). Equation (12) thus implies that natural boundary conditions can be applied in the FENN as bias inputs to the hidden layer neurons that are a part of the boundary, and the corresponding output neurons. Dirichlet boundary conditions are applied by clamping the corresponding weights between the hidden layer and output layer neurons. These weights will be referred to as the clamped weights, while the remaining weights will be referred to as the free weights. An example of these weights is presented later.

The FENN architecture was derived without consideration of the dimensionality of the problem at hand, and thus can be used for 1-, 2-, 3-, or higher dimensional problems. The number of nodes and elements in the FEM mesh dictates the number of neurons in the different layers. The weights between the input and hidden layer change depending on node-element connectivity information.

The major drawback of the FENN is the number of neurons and weights necessary. However, the memory requirements can be reduced considerably, since most of the weights between the input and hidden layer are zero. These weights, and the corresponding connections, can be discarded. Similarly, most of the elements of the K matrix are also zero (K is a banded matrix). The corresponding neurons in the hidden layer can also be discarded, reducing memory and computation requirements considerably. Furthermore, the weights between each group of hidden layer neurons and the output layer are the same. Weight-sharing approaches can be used here to further reduce the storage requirements.

[Fig. 5. Geometry of mesh for 1-D FEM.]

C. A 1-D Example

Consider the 1-D equation (19) with boundary conditions on the boundary defined by the endpoints of the domain, where α and β are constants depending on the material and f is the applied source. Laplace's equation and Poisson's equation are special cases of this equation. The FENN formulation for this problem starts by discretizing the domain of interest with M elements and N nodes. In one dimension, each element is defined by two nodes (Fig. 5). Define basis functions over each element and let φ_j^e be the value of φ on node j in element e. An example of the basis functions is shown in Fig. 5. For these basis functions, i.e., (20), the element matrices are given by [3] in (21) and (22), where ℓ_e is the length of element e. The global matrix K is then constructed by selectively adding the element matrices based on the nodes that form an element. Specifically, K is a sparse tridiagonal matrix, and its nonzero elements are given by (23).

The network implementation of (23) can be derived as follows. If the α and β values in each element are the inputs to the network, the corresponding α- and β-independent element terms form the weights between the input and hidden layers. The network thus uses one input neuron per element value, together with the hidden neurons described above. The values of φ at each of the nodes are assigned as weights between the hidden and output layers, and the source f is the desired output of this network (corresponding to the output neurons). Dirichlet boundary conditions on φ are applied as explained earlier.

[Fig. 6. Flowchart (with example) for designing the FENN for a general PDE.]

[Fig. 7. Shielded microstrip geometry. (a) Complete problem description. (b) Problem description using symmetry considerations.]

D. General Case

Fig. 6 shows a flowchart of the general scheme for converting a differential equation into the FENN structure. An example in two dimensions is also provided next to the flowchart. We start with the differential equation and the boundary conditions and formulate the FEM using the variational method. This involves discretizing the domain of interest with elements and nodes, selecting basis functions, writing the functional for each element and obtaining the element matrices and the source vector. The example presented uses the FEM mesh shown in Fig. 3, with two elements, four nodes, and linear basis functions. The unknown solution to the differential equation is represented by its values at each of the nodes in the finite-element mesh. The element matrices are then separated into two parts, with one part dependent on the material properties α and β while the other is independent of them.

The FENN is then designed to have pM input neurons, N^2 hidden neurons, and N output neurons, where p is the number of material property parameters. In the example under consideration, p = 2, since we have two material property parameters (α and β). The first group of M input neurons takes in the α values while the second group takes in the β values in each element. The weights from the input to the hidden layer are set to the appropriate values of w. In the example, since nodes 1, 2, and 3 are part of element 1 (see Fig. 3), the weights from the first input node to the first group of four neurons in the hidden layer are given by (24). The last weight is zero since node 4 is not a part of element 1. Each group of hidden neurons is connected to one output neuron (giving a total of N output neurons) by a set of weights φ, with each element of φ representing the nodal values. The output of each neuron in the output layer is then given by (18).
[Fig. 8. Forward problem solutions for the shielded microstrip problem show the contours of constant potential for: (a) FEM solution and (b) FENN solution. (c) Error between (a) and (b). The x- and y-axes show the nodes in the FEM discretization of the domain, and the z-axis in (c) shows the error at each of these nodes in volts.]
III. FORWARD AND INVERSE PROBLEM FORMULATION USING FENN

The FENN architecture and algorithm lend themselves to solving both the forward and inverse problems. The forward problem involves determining the weights φ given the material parameters α and β and the applied source f, while the inverse problem involves determining α and β given φ and f. Any optimization approach can be used to solve both these problems. Suppose we define the error at the output of the FENN as in (26), where o is the output of the FENN. Then, for a gradient-based approach, the gradient of the error with respect to the free hidden layer weights is given by (27). Equation (27) can be used to solve the forward problem. Similarly, to solve the inverse problem, the gradients of the error with respect to α and β (the inputs of the FENN) are necessary, and are given by (28) and (29).
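The following minimal sketch shows the two gradient-descent loops that (26)-(29) enable, under the assumption of a quadratic output error E = (1/2)||K phi - b||^2 (the paper's exact error expression is not reproduced here) and the same assumed 1-D linear-element weights as in the earlier sketches; for the forward problem only the free (unclamped) nodal weights are updated, and for the inverse problem the per-element inputs alpha are updated.

import numpy as np

def element_weights(n_elems, h):
    """Fixed input-to-hidden weights: w[e] is element e's alpha-independent
    contribution to the global matrix (assumed 1-D linear elements)."""
    n_nodes = n_elems + 1
    w = np.zeros((n_elems, n_nodes, n_nodes))
    for e in range(n_elems):
        w[e, e:e + 2, e:e + 2] = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    return w

def solve_forward(alpha, b, phi0, free, w, lr=1e-3, steps=20000):
    """Forward problem: gradient descent on the free nodal values (the free
    hidden-to-output weights), cf. (27), with E = 0.5 * ||K phi - b||^2."""
    phi = np.array(phi0, dtype=float)
    K = np.einsum("e,eij->ij", np.asarray(alpha, dtype=float), w)
    for _ in range(steps):
        r = K @ phi - b                    # output error
        phi[free] -= lr * (K.T @ r)[free]  # dE/dphi, restricted to free weights
    return phi

def solve_inverse(alpha0, b, phi, w, lr=1e-3, steps=20000):
    """Inverse problem: gradient descent on the network inputs alpha, cf. (28)-(29)."""
    alpha = np.array(alpha0, dtype=float)
    for _ in range(steps):
        K = np.einsum("e,eij->ij", alpha, w)
        r = K @ phi - b
        alpha -= lr * np.einsum("eij,j,i->e", w, phi, r)  # dE/dalpha_e
    return alpha

In both loops the clamped quantities (Dirichlet nodal values for the forward problem, known boundary values of alpha for a constrained inversion) are simply excluded from the update, which is how the clamped/free weight distinction of Section II-B enters; any other optimizer could be substituted for plain gradient descent, as the text notes.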
[TABLE I. Summary of performance of the FENN algorithm for various PDEs.]

For the forward problem, such an approach is equivalent to the iterative approaches used to solve for the unknown nodal values in the FEM [4].

IV. RESULTS

A. Forward Model Results

The FENN was tested using both 1- and 2-D versions of Poisson's equation (30), where α represents the material property and f is the applied source. For instance, in electromagnetics α may represent the permittivity while f represents the charge density.

As the first example, consider the 2-D equation (31) with the boundary conditions (32) and (33). This is the governing equation for the shielded microstrip transmission line problem shown in Fig. 7. The forward problem computes the electric potential due to the shielded microstrip shown in Fig. 7(a). The potentials are zero on the shielding conductor. Since the geometry is symmetric, we can solve the equivalent problem shown in Fig. 7(b), by applying the homogeneous Neumann condition on the plane of symmetry. The inner conductor (microstrip) is held at a constant potential (in volts). Finally, we also assume that the material inside the shielding conductor has a permittivity defined in terms of a constant K. The permittivity in this case corresponds to the material property α. The homogeneous Neumann boundary condition is equivalent to setting the corresponding boundary terms in (10) to zero. The microstrip and the shielding conductor correspond to the Dirichlet boundary, with the prescribed strip potential on the microstrip and zero potential on the outer boundary [Fig. 7(b)]. Finally, there is no source term in this example (the source term would correspond to a charge distribution in the domain of interest), i.e., f = 0. Specific values were assumed in this example for the strip potential, the constant K, and the extent of the domain of interest.

The solution to the forward problem is presented in Fig. 8, with the FEM solution using 11 nodes in each direction shown in Fig. 8(a) and the corresponding FENN solution in Fig. 8(b). These figures show contours of constant potential. The error between the FEM and FENN solutions is presented in Fig. 8(c). As seen from the figure, the FENN matches the FEM solution accurately, with a very small peak error at any node.

Several other examples were also used to test the FENN and the results are summarized in Table I. Column 1 shows the PDE used to evaluate the FENN performance, while column 2 shows the boundary conditions used. The analytic solution to the problem is indicated in Column 3. The FENN structure and the number of iterations for convergence using a gradient descent approach are indicated in Columns 4 and 5, respectively. The FENN structure, as explained earlier, has M inputs, H hidden neurons and N output neurons, where M and N are the number of elements and nodes in the FEM mesh, respectively, and H is the number of hidden neurons, which corresponds to the number of nonzero elements in the FEM global matrix K. Finally, Columns 6 and 7 present the sum-squared error (SSE) and the maximum error in the solution, respectively, where the errors are computed with respect to the analytical solution. These results indicate that the FENN is capable of accurately determining the potential φ. One advantage of the FENN approach is that the computation of the input-hidden layer weights is a one-time process, as long as the differential equation does not change. The only changes necessary to solve the different problems are changes in the input α and the desired output f.
|
||||
In all cases, the objective was to determine the material property from given values of the potential and the applied source. The first example is a 1-D problem that involves determining the material property given the potential and the source, for the differential equation (34) with boundary conditions specified at the two ends of the domain. The analytical solution to this inverse problem is given by (35). As seen from (35), the problem has an infinite number of solutions, and we expect the solution procedure to converge to one of these solutions depending on the initial value.

Fig. 9(a) and (b) shows two solutions to this inverse problem for two different initializations (shown using triangles). In both cases, the FENN solution (shown with stars) is seen to match the analytical solution (squares). The SSE in both cases was very small.

Fig. 9. FENN inversion results for Poisson's equation with initial solutions (a) equal to x and (b) equal to 1 + x.
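For the inverse problem the roles are swapped: the nodal potential is treated as known and the per-element material property becomes the free parameter. The sketch below, which again reuses numpy and assemble_poisson_1d from the earlier sketches, exploits the fact that the assembled matrix is linear in the element properties, so the gradient of the assumed quadratic output error is available in closed form; the learning rate, iteration count, and clamp interface are illustrative assumptions. The optional clamp argument anticipates the constrained inversion discussed next.

def solve_inverse(alpha0, phi, f, x, clamp=None, lr=1e-2, n_iter=20000):
    """Adjust the per-element material property alpha by gradient descent so
    that K(alpha) phi matches the assembled source vector b. Optionally clamp
    selected elements (e.g., at the boundary) to known values. Sketch only."""
    alpha = np.asarray(alpha0, dtype=float).copy()
    n = len(x)
    # Unit element matrices scattered to global size (alpha_e = 1), so that
    # K(alpha) = sum_e alpha_e * Ke_global[e] and the gradient is analytic.
    Ke_global = []
    for e in range(n - 1):
        h = x[e + 1] - x[e]
        Ke = np.zeros((n, n))
        Ke[e:e + 2, e:e + 2] = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
        Ke_global.append(Ke)
    _, b = assemble_poisson_1d(np.ones(n - 1), f, x)   # b does not depend on alpha
    for _ in range(n_iter):
        K = sum(a * Ke for a, Ke in zip(alpha, Ke_global))
        r = K @ phi - b                                # network output error
        grad = np.array([(Ke @ phi) @ r for Ke in Ke_global])   # dE/dalpha_e
        alpha -= lr * grad
        if clamp:
            for e, v in clamp.items():                 # constrained inversion
                alpha[e] = v
    return alpha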
In order to obtain a unique solution, we need to constrain the value of the material property at the boundary as well. Consider the same differential equation as (34), but with the potential and the source specified as in (36). The analytical solution for this equation is again known in closed form. To solve this problem, we clamp the value of the material property at the two ends of the domain to its known boundary values.

The results of the constrained inversion obtained using 11 nodes and 10 elements in the corresponding finite-element mesh are shown in Fig. 10. Fig. 10(a) shows the comparison between the analytical solution (solid line with squares) and the FENN result (solid line with stars). The initial value of the material property is shown in the figure as a dashed line. Fig. 10(b) shows the comparison between the actual and desired forcing function at the FENN output.
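As a usage illustration only, the constrained inversion corresponds to calling the sketch above with the end elements clamped; the arrays phi_meas and f_src and the values alpha_left and alpha_right are placeholders for the known potential, source, and boundary properties, not the actual numbers used in this example.

# Placeholder usage: clamp the first and last element properties to assumed
# known boundary values during the inversion (illustrative only).
x = np.linspace(0.0, 1.0, 11)                  # 11 nodes, 10 elements, as in Fig. 10
alpha0 = np.ones(10)                           # initial guess for the per-element property
alpha_est = solve_inverse(alpha0, phi_meas, f_src, x,
                          clamp={0: alpha_left, 9: alpha_right})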
Fig. 10. Constrained inversion result with eleven nodes. (a) Comparison of analytic and simulation results for the material property. (b) Comparison of actual and desired NN outputs.
This result indicates that the SSE in the forcing function, as well as the SSE in the inversion result, is fairly large (0.0148 and 0.0197, respectively). The reason for this was traced back to the mesh discretization. Fig. 11 shows the SSE in the output of the FENN and the SSE in the inverse problem solution as a function of FEM discretization. It is seen that increasing the discretization significantly improves the solution. Similar results were observed for other problems.

Fig. 11. SSE in FENN output and inversion results as a function of discretization.

V. DISCUSSION AND CONCLUSION

The FENN is closely related to the finite-element model used to solve differential equations. The FENN architecture has a weight structure that allows both the forward and inverse problems to be solved using simple gradient-based algorithms. Initial results indicate that the proposed FENN algorithm is capable of accurately solving both the forward and inverse problems. In addition, the forward problem solution from the FENN is seen to exactly match the FEM solution, indicating that the FENN represents the finite-element model exactly in a parallel configuration.

The major advantage of the FENN is that it represents the finite-element model in a parallel form, enabling parallel implementation in either hardware or software. Further, computing gradients in the FENN is very simple. This is an advantage in solving both forward and inverse problems using gradient-based methods. The gradients can also be computed in parallel, and the lack of nonlinearities in the neuron activation functions makes the computation of gradients simpler.
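In terms of the quadratic output error assumed in the sketches above, E = (1/2)||K(alpha) phi - b||^2, these gradients take a purely linear-algebraic form with no activation-function derivatives:

\frac{\partial E}{\partial \phi} = K^{T}\left(K\phi - b\right),
\qquad
\frac{\partial E}{\partial \alpha_e} = \left(K_e\,\phi\right)^{T}\left(K\phi - b\right),
\qquad
K(\alpha) = \sum_e \alpha_e K_e .

Each term involves only the precomputed element matrices, the current estimates, and the residual, so the contributions can be evaluated independently and in parallel.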
A major advantage of this approach for solving inverse problems is that it avoids inverting the global matrix in each iteration. The FENN also does not require any training, since most of its weights can be computed in advance and stored. The weights depend on the governing differential equation and its associated boundary conditions, and as long as these two factors do not change, the weights do not change. This is especially an advantage in solving inverse problems in electromagnetic NDE. This approach also reduces the computational effort associated with the network.

Future work will concentrate on applying the FENN to 3-D electromagnetic NDE problems. The robustness of the approach will also be tested, since the ability of these approaches to invert practical noisy measurements is important. Furthermore, the use of better optimization algorithms, like conjugate gradient methods, is expected to improve the solution speed. In addition, parallel implementation of the FENN in both hardware and software is under investigation. The approach described in this paper is very general in that it can be applied to a variety of inverse problems in fields other than electromagnetic NDE. Some of these other applications will also be investigated to show the general nature of the proposed method.

REFERENCES

[1] L. Udpa and S. S. Udpa, "Application of signal processing and pattern recognition techniques to inverse problems in NDE," Int. J. Appl. Electromagn. Mechan., vol. 8, pp. 99-117, 1997.
[2] M. Yan, M. Afzal, S. Udpa, S. Mandayam, Y. Sun, L. Udpa, and P. Sacks, "Iterative algorithms for electromagnetic NDE signal inversion," in ENDE '97, Reggio Calabria, Italy, Sep. 14-16, 1997.
[3] J. Jin, The Finite Element Method in Electromagnetics. New York: Wiley, 1993.
[4] P. Zhou, Numerical Analysis of Electromagnetic Fields. Berlin, Germany: Springer-Verlag, 1993.
[5] S. Haykin, Neural Networks: A Comprehensive Foundation. Upper Saddle River, NJ: Prentice-Hall, 1994.
[6] C. A. Jensen et al., "Inversion of feedforward neural networks: algorithms and applications," Proc. IEEE, vol. 87, no. 9, pp. 1536-1549, 1999.
[7] P. Ramuhalli, L. Udpa, and S. Udpa, "Neural network algorithm for electromagnetic NDE signal inversion," in ENDE 2000, Budapest, Hungary, Jun. 2000.
[8] C. H. Barbosa, A. C. Bruno, M. Vellasco, M. Pacheco, J. P. Wikswo Jr., and A. P. Ewing, "Automation of SQUID nondestructive evaluation of steel plates by neural networks," IEEE Trans. Appl. Supercond., vol. 9, no. 2, pp. 3475-3478, 1999.
[9] W. Qing, S. Xueqin, Y. Qingxin, and Y. Weili, "Using wavelet neural networks for the optimal design of electromagnetic devices," IEEE Trans. Magn., vol. 33, no. 2, pp. 1928-1930, 1997.
[10] I. E. Lagaris, A. C. Likas, and D. I. Fotiadis, "Artificial neural networks for solving ordinary and partial differential equations," IEEE Trans. Neural Netw., vol. 9, no. 5, pp. 987-1000, 1998.
[11] I. E. Lagaris, A. C. Likas, and D. G. Papageorgiou, "Neural-network methods for boundary value problems with irregular boundaries," IEEE Trans. Neural Netw., vol. 11, no. 5, pp. 1041-1049, 2000.
[12] B. P. Van Milligen, V. Tribaldos, and J. A. Jimenez, "Neural network differential equation and plasma equilibrium solver," Phys. Rev. Lett., vol. 75, no. 20, pp. 3594-3597, 1995.
[13] M. W. M. G. Dissanayake and N. Phan-Thien, "Neural-network-based approximations for solving partial differential equations," Commun. Numer. Meth. Eng., vol. 10, pp. 195-201, 1994.
[14] R. Masuoka, "Neural networks learning differential data," IEICE Trans. Inform. Syst., vol. E83-D, no. 6, pp. 1291-1300, 2000.
[15] D. C. Youla, "Generalized image restoration by the method of alternating orthogonal projections," IEEE Trans. Circuits Syst., vol. CAS-25, no. 9, pp. 694-702, 1978.
[16] D. C. Youla and H. Webb, "Image restoration by the method of convex projections: part I-theory," IEEE Trans. Med. Imag., vol. MI-1, no. 2, pp. 81-94, 1982.
[17] A. Lent and H. Tuy, "An iterative method for the extrapolation of band-limited functions," J. Math. Analysis and Applicat., vol. 83, pp. 554-565, 1981.
[18] W. Chen, "A new extrapolation algorithm for band-limited signals using the regularization method," IEEE Trans. Signal Process., vol. 41, no. 3, pp. 1048-1060, 1993.
[19] J. Takeuchi and Y. Kosugi, "Neural network representation of the finite element method," Neural Netw., vol. 7, no. 2, pp. 389-395, 1994.
[20] R. Sikora, J. Sikora, E. Cardelli, and T. Chady, "Artificial neural network application for material evaluation by electromagnetic methods," in Proc. Int. Joint Conf. Neural Networks, vol. 6, 1999, pp. 4027-4032.
[21] G. Xu, G. Littlefair, R. Penson, and R. Callan, "Application of FE-based neural networks to dynamic problems," in Proc. Int. Conf. Neural Information Processing, vol. 3, 1999, pp. 1039-1044.
[22] F. Guo, P. Zhang, F. Wang, X. Ma, and G. Qiu, "Finite element analysis-based Hopfield neural network model for solving nonlinear electromagnetic field problems," in Proc. Int. Joint Conf. Neural Networks, vol. 6, 1999, pp. 4399-4403.
[23] H. Lee and I. S. Kang, "Neural algorithm for solving differential equations," J. Computat. Phys., vol. 91, pp. 110-131, 1990.
[24] J. Kalkkuhl, K. J. Hunt, and H. Fritz, "FEM-based neural-network approach to nonlinear modeling with application to longitudinal vehicle dynamics control," IEEE Trans. Neural Netw., vol. 10, no. 4, pp. 885-897, 1999.
[25] R. K. Mishra and P. S. Hall, "NFDTD concept," IEEE Trans. Neural Netw., vol. 16, no. 2, pp. 484-490, 2005.
[26] D. G. Triantafyllidis and D. P. Labridis, "A finite-element mesh generator based on growing neural networks," IEEE Trans. Neural Netw., vol. 13, no. 6, pp. 1482-1496, 2002.

Pradeep Ramuhalli (S'92-M'02) received the B.Tech. degree from J.N.T. University, Hyderabad, India, in electronics and communications engineering in 1995, and the M.S. and Ph.D. degrees in electrical engineering from Iowa State University, Ames, in 1998 and 2002, respectively.
He is currently an Assistant Professor with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing. His research is in the general area of nondestructive evaluation and materials characterization. His research interests include the application of signal and image processing methods, pattern recognition and neural networks for nondestructive evaluation applications, development of model-based solutions for inverse problems in NDE, and the development of information fusion algorithms for multimodal data fusion.
Dr. Ramuhalli is a Member of Phi Kappa Phi.

Lalita Udpa (S'84-M'86-SM'96) received the Ph.D. degree in electrical engineering from Colorado State University, Fort Collins, in 1986.
She is currently a Professor with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing. She works primarily in the broad areas of nondestructive evaluation, signal processing, and biomedical applications. Her research interests include various aspects of NDE, such as development of computational models for the forward problem in NDE, signal and image processing, pattern recognition and neural networks, and development of solution techniques for inverse problems. Her current projects include finite-element modeling of electromagnetic NDE phenomena, application of neural network and signal processing algorithms to NDE data, and development of image processing techniques for the analysis of NDE and biomedical images.
Dr. Udpa is a Member of Eta Kappa Nu and Sigma Xi.

Satish S. Udpa (S'82-M'82-SM'91-F'03) received the B.Tech. degree in 1975 and the Post Graduate Diploma in electrical engineering in 1977 from J.N.T. University, Hyderabad, India. He received the M.S. degree in 1980 and the Ph.D. degree in electrical engineering in 1983, both from Colorado State University, Fort Collins.
He has been with Michigan State University, East Lansing, since 2001 and is currently Acting Dean for the College of Engineering and a Professor with the Electrical and Computer Engineering Department. Prior to joining Michigan State, he was a Professor with Iowa State University, Ames, from 1990 to 2001 and was associated with the Materials Assessment Research Group. Prior to joining Iowa State, he was an Associate Professor with the Department of Electrical Engineering at Colorado State University. His research interests span the broad area of materials characterization and nondestructive evaluation (NDE). Work done by him to date in the area includes an extensive repertoire of forward models for simulating physical processes underlying several inspection techniques. Coupled with careful experimental work, such forward models can be used for designing new sensors, optimizing test conditions, estimating the probability of detection, assessing designs for inspectability, and training inverse models for characterizing defects. He has also been involved in the development of system-, as well as model-based, inverse solutions for defect and material property characterization. His interests have expanded in recent years to include the development of noninvasive tools for clinical applications. Work done to date in this field includes the development of new electromagnetic-acoustic (EMAT) methods for detecting single leg separation failures in artificial heart valves and microwave imaging and ablation therapy systems. He and his research group have been engaged in the design and development of high-performance instrumentation including acoustic microscopes and single and multifrequency eddy current NDE instruments. These systems, as well as software packages embodying algorithms developed by Udpa for defect classification and characterization, have been licensed to industry.
He is a Fellow of the American Society for Nondestructive Testing (ASNT) and a Fellow of the Indian Society of Nondestructive Testing.