Updates to corpus

This commit is contained in:
Eduardo Cueto Mendoza 2020-08-07 18:30:32 -06:00
parent 93ffde18f6
commit 93b3da7a7d
5 changed files with 2253 additions and 1054 deletions



@@ -1,261 +0,0 @@
Energy and Policy Considerations for Deep Learning in NLP
Emma Strubell Ananya Ganesh Andrew McCallum
College of Information and Computer Sciences
University of Massachusetts Amherst
{strubell, aganesh, mccallum}@cs.umass.edu
arXiv:1906.02243v1 [cs.CL] 5 Jun 2019

Abstract

Recent progress in hardware and methodology for training neural networks has ushered in a new generation of large networks trained on abundant data. These models have obtained notable gains in accuracy across many NLP tasks. However, these accuracy improvements depend on the availability of exceptionally large computational resources that necessitate similarly substantial energy consumption. As a result these models are costly to train and develop, both financially, due to the cost of hardware and electricity or cloud compute time, and environmentally, due to the carbon footprint required to fuel modern tensor processing hardware. In this paper we bring this issue to the attention of NLP researchers by quantifying the approximate financial and environmental costs of training a variety of recently successful neural network models for NLP. Based on these findings, we propose actionable recommendations to reduce costs and improve equity in NLP research and practice.

Consumption                           CO2e (lbs)
Air travel, 1 passenger, NY↔SF             1,984
Human life, avg, 1 year                   11,023
American life, avg, 1 year                36,156
Car, avg incl. fuel, 1 lifetime          126,000
Training one model (GPU)
  NLP pipeline (parsing, SRL)                 39
    w/ tuning & experimentation           78,468
  Transformer (big)                          192
    w/ neural architecture search        626,155

Table 1: Estimated CO2 emissions from training common NLP models, compared to familiar consumption. [1]

[1] Sources: (1) Air travel and per-capita consumption: https://bit.ly/2Hw0xWc; (2) car lifetime: https://bit.ly/2Qbr0w1.

1 Introduction

Advances in techniques and hardware for training deep neural networks have recently enabled impressive accuracy improvements across many fundamental NLP tasks (Bahdanau et al., 2015; Luong et al., 2015; Dozat and Manning, 2017; Vaswani et al., 2017), with the most computationally-hungry models obtaining the highest scores (Peters et al., 2018; Devlin et al., 2019; Radford et al., 2019; So et al., 2019). As a result, training a state-of-the-art model now requires substantial computational resources which demand considerable energy, along with the associated financial and environmental costs. Research and development of new models multiplies these costs by thousands of times by requiring retraining to experiment with model architectures and hyperparameters. Whereas a decade ago most NLP models could be trained and developed on a commodity laptop or server, many now require multiple instances of specialized hardware such as GPUs or TPUs, therefore limiting access to these highly accurate models on the basis of finances.

Even when these expensive computational resources are available, model training also incurs a substantial cost to the environment due to the energy required to power this hardware for weeks or months at a time. Though some of this energy may come from renewable or carbon credit-offset resources, the high energy demands of these models are still a concern since (1) energy is not currently derived from carbon-neutral sources in many locations, and (2) when renewable energy is available, it is still limited to the equipment we have to produce and store it, and energy spent training a neural network might better be allocated to heating a family's home. It is estimated that we must cut carbon emissions by half over the next decade to deter escalating rates of natural disaster, and based on the estimated CO2 emissions listed in Table 1, model training and development likely make up a substantial portion of the greenhouse gas emissions attributed to many NLP researchers.
To heighten the awareness of the NLP community to this issue and promote mindful practice and policy, we characterize the dollar cost and carbon emissions that result from training the neural networks at the core of many state-of-the-art NLP models. We do this by estimating the kilowatts of energy required to train a variety of popular off-the-shelf NLP models, which can be converted to approximate carbon emissions and electricity costs. To estimate the even greater resources required to transfer an existing model to a new task or develop new models, we perform a case study of the full computational resources required for the development and tuning of a recent state-of-the-art NLP pipeline (Strubell et al., 2018). We conclude with recommendations to the community based on our findings, namely: (1) Time to retrain and sensitivity to hyperparameters should be reported for NLP machine learning models; (2) academic researchers need equitable access to computational resources; and (3) researchers should prioritize developing efficient models and hardware.

2 Methods

To quantify the computational and environmental cost of training deep neural network models for NLP, we perform an analysis of the energy required to train a variety of popular off-the-shelf NLP models, as well as a case study of the complete sum of resources required to develop LISA (Strubell et al., 2018), a state-of-the-art NLP model from EMNLP 2018, including all tuning and experimentation.

We measure energy use as follows. We train the models described in §2.1 using the default settings provided, and sample GPU and CPU power consumption during training. Each model was trained for a maximum of 1 day. We train all models on a single NVIDIA Titan X GPU, with the exception of ELMo which was trained on 3 NVIDIA GTX 1080 Ti GPUs. While training, we repeatedly query the NVIDIA System Management Interface [2] to sample the GPU power consumption and report the average over all samples. To sample CPU power consumption, we use Intel's Running Average Power Limit interface. [3]

We estimate the total time expected for models to train to completion using training times and hardware reported in the original papers. We then calculate the power consumption in kilowatt-hours (kWh) as follows. Let p_c be the average power draw (in watts) from all CPU sockets during training, let p_r be the average power draw from all DRAM (main memory) sockets, let p_g be the average power draw of a GPU during training, and let g be the number of GPUs used to train. We estimate total power consumption as combined GPU, CPU and DRAM consumption, then multiply this by Power Usage Effectiveness (PUE), which accounts for the additional energy required to support the compute infrastructure (mainly cooling). We use a PUE coefficient of 1.58, the 2018 global average for data centers (Ascierto, 2018). Then the total power p_t required at a given instance during training is given by:

    p_t = 1.58 t (p_c + p_r + g p_g) / 1000    (1)

The U.S. Environmental Protection Agency (EPA) provides average CO2 produced (in pounds per kilowatt-hour) for power consumed in the U.S. (EPA, 2018), which we use to convert power to estimated CO2 emissions:

    CO2e = 0.954 p_t    (2)

This conversion takes into account the relative proportions of different energy sources (primarily natural gas, coal, nuclear and renewable) consumed to produce energy in the United States. Table 2 lists the relative energy sources for China, Germany and the United States compared to the top three cloud service providers. The U.S. breakdown of energy is comparable to that of the most popular cloud compute service, Amazon Web Services, so we believe this conversion to provide a reasonable estimate of CO2 emissions per kilowatt hour of compute energy used.

Consumer         Renew.  Gas   Coal  Nuc.
China             22%     3%   65%    4%
Germany           40%     7%   38%   13%
United States     17%    35%   27%   19%
Amazon-AWS        17%    24%   30%   26%
Google            56%    14%   15%   10%
Microsoft         32%    23%   31%   10%

Table 2: Percent energy sourced from: Renewable (e.g. hydro, solar, wind), natural gas, coal and nuclear for the top 3 cloud compute providers (Cook et al., 2017), compared to the United States, [4] China [5] and Germany (Burger, 2019).

[2] nvidia-smi: https://bit.ly/30sGEbi
[3] RAPL power meter: https://bit.ly/2LObQhV
[4] U.S. Dept. of Energy: https://bit.ly/2JTbGnI
[5] China Electricity Council; trans. China Energy Portal: https://bit.ly/2QHE5O3
2.1 Models

We analyze four models, the computational requirements of which we describe below. All models have code freely available online, which we used out-of-the-box. For more details on the models themselves, please refer to the original papers.

Transformer. The Transformer model (Vaswani et al., 2017) is an encoder-decoder architecture primarily recognized for efficient and accurate machine translation. The encoder and decoder each consist of 6 stacked layers of multi-head self-attention. Vaswani et al. (2017) report that the Transformer base model (65M parameters) was trained on 8 NVIDIA P100 GPUs for 12 hours, and the Transformer big model (213M parameters) was trained for 3.5 days (84 hours; 300k steps). This model is also the basis for recent work on neural architecture search (NAS) for machine translation and language modeling (So et al., 2019), and the NLP pipeline that we study in more detail in §4.2 (Strubell et al., 2018). So et al. (2019) report that their full architecture search ran for a total of 979M training steps, and that their base model requires 10 hours to train for 300k steps on one TPUv2 core. This equates to 32,623 hours of TPU or 274,120 hours on 8 P100 GPUs.

ELMo. The ELMo model (Peters et al., 2018) is based on stacked LSTMs and provides rich word representations in context by pre-training on a large amount of data using a language modeling objective. Replacing context-independent pretrained word embeddings with ELMo has been shown to increase performance on downstream tasks such as named entity recognition, semantic role labeling, and coreference. Peters et al. (2018) report that ELMo was trained on 3 NVIDIA GTX 1080 GPUs for 2 weeks (336 hours).

BERT. The BERT model (Devlin et al., 2019) provides a Transformer-based architecture for building contextual representations similar to ELMo, but trained with a different language modeling objective. BERT substantially improves accuracy on tasks requiring sentence-level representations such as question answering and natural language inference. Devlin et al. (2019) report that the BERT base model (110M parameters) was trained on 16 TPU chips for 4 days (96 hours). NVIDIA reports that they can train a BERT model in 3.3 days (79.2 hours) using 4 DGX-2H servers, totaling 64 Tesla V100 GPUs (Forster et al., 2019).

GPT-2. This model is the latest edition of OpenAI's GPT general-purpose token encoder, also based on Transformer-style self-attention and trained with a language modeling objective (Radford et al., 2019). By training a very large model on massive data, Radford et al. (2019) show high zero-shot performance on question answering and language modeling benchmarks. The large model described in Radford et al. (2019) has 1542M parameters and is reported to require 1 week (168 hours) of training on 32 TPUv3 chips. [6]

3 Related work

There is some precedent for work characterizing the computational requirements of training and inference in modern neural network architectures in the computer vision community. Li et al. (2016) present a detailed study of the energy use required for training and inference in popular convolutional models for image classification in computer vision, including fine-grained analysis comparing different neural network layer types. Canziani et al. (2016) assess image classification model accuracy as a function of model size and gigaflops required during inference. They also measure average power draw required during inference on GPUs as a function of batch size. Neither work analyzes the recurrent and self-attention models that have become commonplace in NLP, nor do they extrapolate power to estimates of carbon and dollar cost of training.

Analysis of hyperparameter tuning has been performed in the context of improved algorithms for hyperparameter search (Bergstra et al., 2011; Bergstra and Bengio, 2012; Snoek et al., 2012). To our knowledge there exists to date no analysis of the computation required for R&D and hyperparameter tuning of neural network models in NLP.

[6] Via the authors on Reddit.
[7] GPU lower bound computed using pre-emptible P100/V100 U.S. resources priced at $0.43–$0.74/hr, upper bound uses on-demand U.S. resources priced at $1.46–$2.48/hr. We similarly use pre-emptible ($1.46/hr–$2.40/hr) and on-demand ($4.50/hr–$8/hr) pricing as lower and upper bounds for TPU v2/3; cheaper bulk contracts are available.

Model             Hardware   Power (W)   Hours     kWh·PUE   CO2e      Cloud compute cost
Transformer base  P100x8     1415.78     12        27        26        $41–$140
Transformer big   P100x8     1515.43     84        201       192       $289–$981
ELMo              P100x3     517.66      336       275       262       $433–$1472
BERT base         V100x64    12,041.51   79        1507      1438      $3751–$12,571
BERT base         TPUv2x16   —           96        —         —         $2074–$6912
NAS               P100x8     1515.43     274,120   656,347   626,155   $942,973–$3,201,722
NAS               TPUv2x1    —           32,623    —         —         $44,055–$146,848
GPT-2             TPUv3x32   —           168       —         —         $12,902–$43,008

Table 3: Estimated cost of training a model in terms of CO2 emissions (lbs) and cloud compute cost (USD). [7] Power and carbon footprint are omitted for TPUs due to lack of public information on power draw for this hardware.
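The GPU cost bounds in Table 3 follow directly from the reported hardware-hours and the per-hour prices in footnote 7. A small illustrative sketch, assuming the P100 prices quoted there (this is our own helper, not an official pricing API):

```python
# Reproducing the P100 cost bounds of Table 3 from footnote 7 prices:
# pre-emptible $0.43/hr and on-demand $1.46/hr per P100 GPU.
def gpu_cost_bounds(hours, n_gpus, low_per_hr=0.43, high_per_hr=1.46):
    gpu_hours = hours * n_gpus
    return gpu_hours * low_per_hr, gpu_hours * high_per_hr

print(gpu_cost_bounds(12, 8))       # Transformer base: (~$41, ~$140)
print(gpu_cost_bounds(274_120, 8))  # NAS: (~$943k, ~$3.2M)
```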
4 Experimental results

4.1 Cost of training

Table 3 lists CO2 emissions and estimated cost of training the models described in §2.1. Of note is that TPUs are more cost-efficient than GPUs on workloads that make sense for that hardware (e.g. BERT). We also see that models emit substantial carbon emissions; training BERT on GPU is roughly equivalent to a trans-American flight. So et al. (2019) report that NAS achieves a new state-of-the-art BLEU score of 29.7 for English to German machine translation, an increase of just 0.1 BLEU at the cost of at least $150k in on-demand compute time and non-trivial carbon emissions.

4.2 Cost of development: Case study

To quantify the computational requirements of R&D for a new model we study the logs of all training required to develop Linguistically-Informed Self-Attention (Strubell et al., 2018), a multi-task model that performs part-of-speech tagging, labeled dependency parsing, predicate detection and semantic role labeling. This model makes for an interesting case study as a representative NLP pipeline and as a Best Long Paper at EMNLP.

Model training associated with the project spanned a period of 172 days (approx. 6 months). During that time 123 small hyperparameter grid searches were performed, resulting in 4789 jobs in total. Jobs varied in length ranging from a minimum of 3 minutes, indicating a crash, to a maximum of 9 days, with an average job length of 52 hours. All training was done on a combination of NVIDIA Titan X (72%) and M40 (28%) GPUs. [8]

The sum GPU time required for the project totaled 9998 days (27 years). This averages to about 60 GPUs running constantly throughout the 6 month duration of the project. Table 4 lists upper and lower bounds of the estimated cost in terms of Google Cloud compute and raw electricity required to develop and deploy this model. [9] We see that while training a single model is relatively inexpensive, the cost of tuning a model for a new dataset, which we estimate here to require 24 jobs, or performing the full R&D required to develop this model, quickly becomes extremely expensive.

Models  Hours     Cloud compute   Electricity
1       120       $52–$175        $5
24      2880      $1238–$4205     $118
4789    239,942   $103k–$350k     $9870

Table 4: Estimated cost in terms of cloud compute and electricity for training: (1) a single model, (2) a single tune, and (3) all models trained during R&D.

[8] We approximate cloud compute cost using P100 pricing.
[9] Based on average U.S. cost of electricity of $0.12/kWh.
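The headline figures in the case study can be re-derived from the job-log totals summarized above and in Table 4; a short illustrative check (values taken from the text and table, rounded as the paper rounds them):

```python
# Re-deriving the case-study totals from the reported job-log summary.
total_gpu_hours = 239_942        # all 4789 R&D jobs (Table 4, bottom row)
project_days = 172               # duration of the project (approx. 6 months)

gpu_days = total_gpu_hours / 24              # ~9998 GPU-days
gpu_years = gpu_days / 365                   # ~27 GPU-years
avg_gpus_in_use = gpu_days / project_days    # ~58, i.e. "about 60 GPUs" running constantly

print(round(gpu_days), round(gpu_years, 1), round(avg_gpus_in_use))
```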
5 Conclusions

Authors should report training time and sensitivity to hyperparameters.

Our experiments suggest that it would be beneficial to directly compare different models to perform a cost-benefit (accuracy) analysis. To address this, when proposing a model that is meant to be re-trained for downstream use, such as re-training on a new domain or fine-tuning on a new task, authors should report training time and computational resources required, as well as model sensitivity to hyperparameters. This will enable direct comparison across models, allowing subsequent consumers of these models to accurately assess whether the required computational resources are compatible with their setting. More explicit characterization of tuning time could also reveal inconsistencies in time spent tuning baseline models compared to proposed contributions. Realizing this will require: (1) a standard, hardware-independent measurement of training time, such as gigaflops required to convergence, and (2) a standard measurement of model sensitivity to data and hyperparameters, such as variance with respect to hyperparameters searched.

Academic researchers need equitable access to computation resources.

Recent advances in available compute come at a high price not attainable to all who desire access. Most of the models studied in this paper were developed outside academia; recent improvements in state-of-the-art accuracy are possible thanks to industry access to large-scale compute.

Limiting this style of research to industry labs hurts the NLP research community in many ways. First, it stifles creativity. Researchers with good ideas but without access to large-scale compute will simply not be able to execute their ideas, instead constrained to focus on different problems. Second, it prohibits certain types of research on the basis of access to financial resources. This even more deeply promotes the already problematic "rich get richer" cycle of research funding, where groups that are already successful and thus well-funded tend to receive more funding due to their existing accomplishments. Third, the prohibitive start-up cost of building in-house resources forces resource-poor groups to rely on cloud compute services such as AWS, Google Cloud and Microsoft Azure.

While these services provide valuable, flexible, and often relatively environmentally friendly compute resources, it is more cost effective for academic researchers, who often work for non-profit educational institutions and whose research is funded by government entities, to pool resources to build shared compute centers at the level of funding agencies, such as the U.S. National Science Foundation. For example, an off-the-shelf GPU server containing 8 NVIDIA 1080 Ti GPUs and supporting hardware can be purchased for approximately $20,000 USD. At that cost, the hardware required to develop the model in our case study (approximately 58 GPUs for 172 days) would cost $145,000 USD plus electricity, about half the estimated cost to use on-demand cloud GPUs. Unlike money spent on cloud compute, however, that invested in centralized resources would continue to pay off as resources are shared across many projects. A government-funded academic compute cloud would provide equitable access to all researchers.

Researchers should prioritize computationally efficient hardware and algorithms.

We recommend a concerted effort by industry and academia to promote research of more computationally efficient algorithms, as well as hardware that requires less energy. An effort can also be made in terms of software. There is already a precedent for NLP software packages prioritizing efficient models. An additional avenue through which NLP and machine learning software developers could aid in reducing the energy associated with model tuning is by providing easy-to-use APIs implementing more efficient alternatives to brute-force grid search for hyperparameter tuning, e.g. random or Bayesian hyperparameter search techniques (Bergstra et al., 2011; Bergstra and Bengio, 2012; Snoek et al., 2012). While software packages implementing these techniques do exist, [10] they are rarely employed in practice for tuning NLP models. This is likely because their interoperability with popular deep learning frameworks such as PyTorch and TensorFlow is not optimized, i.e. there are not simple examples of how to tune TensorFlow Estimators using Bayesian search. Integrating these tools into the workflows with which NLP researchers and practitioners are already familiar could have notable impact on the cost of developing and tuning in NLP.

[10] For example, the Hyperopt Python library.
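As an illustration of the kind of lightweight alternative the authors have in mind, the sketch below contrasts exhaustive grid search with random sampling of the same hyperparameter space. It is a generic example (the objective function and parameter ranges are placeholders, not from the paper); libraries such as Hyperopt additionally provide the Bayesian variants.

```python
import itertools
import random

# Placeholder objective: in practice this would train a model and return dev-set loss.
def objective(lr, dropout, hidden_size):
    return (lr - 3e-4) ** 2 + (dropout - 0.3) ** 2 + (hidden_size - 512) ** 2 / 1e6

grid = {
    "lr": [1e-4, 3e-4, 1e-3],
    "dropout": [0.1, 0.3, 0.5],
    "hidden_size": [256, 512, 1024],
}

# Brute-force grid search: cost grows multiplicatively with every new hyperparameter.
grid_best = min(
    (dict(zip(grid, values)) for values in itertools.product(*grid.values())),
    key=lambda cfg: objective(**cfg),
)

# Random search: a fixed budget of trials, independent of how fine the grid is.
random.seed(0)
random_best = min(
    (
        {
            "lr": 10 ** random.uniform(-4, -3),
            "dropout": random.uniform(0.1, 0.5),
            "hidden_size": random.choice([256, 512, 1024]),
        }
        for _ in range(10)
    ),
    key=lambda cfg: objective(**cfg),
)

print("grid search best:", grid_best)
print("random search best:", random_best)
```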
Acknowledgements

We are grateful to Sherief Farouk and the anonymous reviewers for helpful feedback on earlier drafts. This work was supported in part by the Centers for Data Science and Intelligent Information Retrieval, the Chan Zuckerberg Initiative under the Scientific Knowledge Base Construction project, the IBM Cognitive Horizons Network agreement no. W1668553, and National Science Foundation grant no. IIS-1514053. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsor.

References
Rhonda Ascierto. 2018. Uptime Institute Global Data Center Survey. Technical report, Uptime Institute.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In 3rd International Conference for Learning Representations (ICLR), San Diego, California, USA.

James Bergstra and Yoshua Bengio. 2012. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305.

James S. Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. 2011. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems, pages 2546–2554.

Bruno Burger. 2019. Net Public Electricity Generation in Germany in 2018. Technical report, Fraunhofer Institute for Solar Energy Systems ISE.

Alfredo Canziani, Adam Paszke, and Eugenio Culurciello. 2016. An analysis of deep neural network models for practical applications.

Gary Cook, Jude Lee, Tamina Tsai, Ada Kongn, John Deans, Brian Johnson, Elizabeth Jardim, and Brian Johnson. 2017. Clicking Clean: Who is winning the race to build a green internet? Technical report, Greenpeace.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL.

Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In ICLR.

EPA. 2018. Emissions & Generation Resource Integrated Database (eGRID). Technical report, U.S. Environmental Protection Agency.

Christopher Forster, Thor Johnsen, Swetha Mandava, Sharath Turuvekere Sreenivas, Deyu Fu, Julie Bernauer, Allison Gray, Sharan Chetlur, and Raul Puri. 2019. BERT Meets GPUs. Technical report, NVIDIA AI.

Da Li, Xinbo Chen, Michela Becchi, and Ziliang Zong. 2016. Evaluating the energy efficiency of deep convolutional neural networks on CPUs and GPUs. 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom), pages 477–484.

Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412–1421. Association for Computational Linguistics.

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In NAACL.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners.

Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. 2012. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2951–2959.

David R. So, Chen Liang, and Quoc V. Le. 2019. The evolved transformer. In Proceedings of the 36th International Conference on Machine Learning (ICML).

Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum. 2018. Linguistically-Informed Self-Attention for Semantic Role Labeling. In Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In 31st Conference on Neural Information Processing Systems (NIPS).


@@ -1,793 +0,0 @@
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 16, NO. 6, NOVEMBER 2005 1381
Finite-Element Neural Networks for Solving
Differential Equations
Pradeep Ramuhalli, Member, IEEE, Lalita Udpa, Senior Member, IEEE, and Satish S. Udpa, Fellow, IEEE
Abstract—The solution of partial differential equations (PDE) arises in a wide variety of engineering problems. Solutions to most practical problems use numerical analysis techniques such as finite-element or finite-difference methods. The drawbacks of these approaches include computational costs associated with the modeling of complex geometries. This paper proposes a finite-element neural network (FENN) obtained by embedding a finite-element model in a neural network architecture that enables fast and accurate solution of the forward problem. Results of applying the FENN to several simple electromagnetic forward and inverse problems are presented. Initial results indicate that the FENN performance as a forward model is comparable to that of the conventional finite-element method (FEM). The FENN can also be used in an iterative approach to solve inverse problems associated with the PDE. Results showing the ability of the FENN to solve the inverse problem given the measured signal are also presented. The parallel nature of the FENN also makes it an attractive solution for parallel implementation in hardware and software.

Index Terms—Finite-element method (FEM), finite-element neural network (FENN), inverse problems.

Manuscript received January 17, 2004; revised April 2, 2005. The authors are with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824 USA (e-mail: rpradeep@egr.msu.edu; udpal@egr.msu.edu; udpa@egr.msu.edu). Digital Object Identifier 10.1109/TNN.2005.857945

I. INTRODUCTION

SOLUTIONS of differential equations arise in a wide variety of engineering applications in electromagnetics, signal processing, computational fluid dynamics, etc. These equations are typically solved using either analytical or numerical methods. Analytical solution methods are however feasible only for simple geometries, which limits their applicability. In most practical problems with complex boundary conditions, numerical analysis methods are required in order to obtain a reasonable solution. An example is the solution of Maxwell's equations in electromagnetics. Solutions to Maxwell's equations are used in a variety of applications for calculating the interaction of electromagnetic (EM) fields with different types of media.

Very often, the solution to differential equations is necessary for solving the corresponding inverse problems. Inverse problems in general are ill-posed, lacking continuous dependence of the measurements on the input. This has resulted in the development of a variety of solution techniques ranging from simple calibration procedures to other direct (analytical) and iterative approaches [1]. Iterative methods typically employ a forward model that simulates the underlying physical process (Fig. 1) [2]. An initial estimate of the solution of the inverse problem (Fig. 1) is applied to the forward model, resulting in the corresponding solution to the forward problem. The model output is compared to the measurement using a cost function. If the cost is less than a tolerance, the estimate is used as the desired solution. If not, the estimate is updated to minimize the cost function.

Fig. 1. Iterative inversion method for solving inverse problems.

Although finite-element methods (FEMs) [3], [4] are extremely popular for solving differential equations, their major drawback is computational complexity. This problem becomes more acute when three-dimensional (3-D) finite-element models are used in an iterative algorithm for solving the inverse problem. Recently, several authors have suggested the use of neural networks (MLP or RBF networks [5]) for solving differential equations [6]–[9]. In these techniques, a neural network is trained using a large database containing the input data and the solution of the differential equation. The neural network during generalization learns the mapping corresponding to the PDE. Alternatively, in [10], the solution to a differential equation is written as a constant term and an adjustable term with parameters that need to be determined. A neural network is used to determine the optimal values of the parameters. This approach is applicable only to problems with regular boundaries. An extension of the approach to problems with irregular boundaries is given in [11]. Other neural network based differential equation solvers use multilayer perceptron networks or variations on the MLP to approximate the unknown function in a PDE [12]–[14]. A combination of the PDE and boundary conditions is used to construct an objective function that is minimized during the training process.

A major limitation of these approaches is that the network architecture is selected somewhat arbitrarily. A second drawback is that the performance of the neural networks depends on the data used in training and testing. As long as the test data is similar to the training data, the network can interpolate between the training data points to obtain a reasonable prediction. However, when the test signal is no longer similar to the training data, the network is forced to extrapolate and the performance degrades.
One way around this difficulty is to ensure that the training database has a diverse set of signals. However, this is difficult to ensure in practice. Alternatively, we have to design neural networks that are capable of extrapolation. Extrapolation methods are discussed extensively in literature [15]–[18], but the design of an extrapolation neural network involves several issues, particularly for ensuring that the error in the network prediction stays within reasonable bounds during the extrapolation procedure.

An ideal solution to this problem would be to combine the power of numerical models with the computational speed of neural networks, i.e., to embed a numerical model in a neural network structure. One such finite-element neural network (FENN) formulation has been reported by Takeuchi and Kosugi [19]. This approach, based on error minimization, derives the neural network using the energy functional resulting from the finite-element formulation. Other reports of FENN combinations are either similar to the Takeuchi method [20], [21] or use Hopfield neural networks to solve the forward problem [22], [23]. Kalkkuhl et al. [24] provide a description of a FEM-based approach to NARX modeling that may be interpreted both as a local model network, as well as a single layer feedforward network. A slightly different approach to merging numerical methods and neural networks is given in [25], where the finite-difference time domain (FDTD) method is cast in a neural network framework for the purpose of solving electromagnetic forward problems. The related problem of mesh generation in finite-element models has also been tackled using neural networks (for instance, [26]). Generally, these networks are designed to solve the forward problem, and must be modified to solve inverse problems.

This paper proposes a new approach that embeds a finite-element model commonly used in the solution of differential equations in a neural network. The network, called the FENN, can solve the forward problem and can also be used in an iterative algorithm to solve inverse problems. The primary advantage of this approach is that the FEM is represented in a parallel form. Thus, it has the potential to alleviate the computational cost associated with using the FEM in an iterative algorithm for solving inverse problems. More importantly, the FENN does not need any training, and the computation of the weights is a one-time process. The proposed approach is also different in that the neural network architecture developed can be used to solve the forward and inverse problems. The structure of the neural network is also simpler than those reported in the literature, making it easier to implement in parallel in both hardware and software.

The rest of this paper is organized as follows. Section II briefly describes the FEM, and derives the proposed FENN. In this paper, we focus on the problem of solving typical equations encountered in electromagnetic nondestructive evaluation (NDE). However, the same concepts can be easily applied to solve differential equations encountered in other fields. Sections III, IV and V present the application of the FENN to solving forward and inverse problems, along with initial results. A discussion of the advantages and disadvantages of the proposed FENN architecture is given in Section IV. Finally, Section V draws conclusions from the results and presents ideas for future work.

II. THE FENN

This section briefly describes the FEM and proposes its reformulation into a parallel neural network structure. Details about the FEM can be found in [3] and [4].

A. The FEM

Consider a typical boundary value problem with the governing differential equation

(1)

where is a differential operator, is the applied source or forcing function, and is the unknown quantity. This differential equation can be solved in conjunction with boundary conditions on the boundary enclosing the domain. The variational formulation used in finite-element analysis determines the unknown by minimizing the functional [3], [4]

(2)

with respect to the trial function. The minimization procedure starts by dividing the domain into small subdomains called elements (Fig. 2) and representing the unknown in each element by means of basis functions defined over the element

(3)

where is the unknown solution in element, is the basis function associated with node in element, is the value of the unknown quantity at node, and is the total number of nodes associated with element. In general, the basis functions (also referred to as interpolation functions or shape functions) can be linear, quadratic, or of higher order. Typically, finite-element models use either linear or polynomial spline basis functions.

The functional within an element is expressed as

(4)

By substituting (3) in (4), we obtain the discrete version of the functional within each element

(5)

where denotes the transpose of a matrix, the first term involves the elemental matrix with elements

(6)

and the second term involves a vector with elements

(7)
Combining the values in (5) for each of the elements

(8)

where the global matrix is derived from the terms of the elemental matrices for different elements, and the dimension is the total number of nodes. The global matrix, also called the stiffness matrix, is a sparse, banded matrix. Equation (8) is the discrete version of the functional and can be minimized with respect to the nodal parameters by taking the derivative of the functional with respect to the nodal values and setting it equal to zero, which results in the matrix equation

(9)

Fig. 2. (a) Schematic representation of domain and boundary. (b) Sample FEM mesh for the domain.

Boundary conditions for these problems are usually of two types: natural boundary conditions and essential boundary conditions. Essential boundary conditions (also referred to as Dirichlet boundary conditions) impose constraints on the value of the unknown at several nodes. Natural boundary conditions (of which Neumann boundary conditions are a special case) impose constraints on the change in the unknown across a boundary. Dirichlet boundary conditions are imposed on the functional minimization (9) by deleting the rows and columns of the matrix corresponding to the nodes on the Dirichlet boundary and modifying the source vector in (9).

Fig. 3. FEM domain discretization using two elements and four nodes.

Natural boundary conditions are applied in the FEM by adding an additional term to the functional. These boundary conditions are then incorporated into the functional and are satisfied automatically during the solution procedure. As an example, consider the natural boundary condition represented by the following equation [3]

(10)

where the boundary represents the Neumann boundary with its outward normal unit vector, and the remaining quantities are known parameters associated with the boundary. Assuming that the boundary is made up of segments, we can define boundary matrices with elements

(11)

where the basis functions are defined over each segment of the boundary and the segment length enters the integral. The elements of the first boundary matrix are added to the elements of the global matrix that correspond to the nodes on the boundary. Similarly, the elements of the second boundary matrix are added to the corresponding elements of the source vector. The global matrix equation (9) is thus modified as follows before solving

(12)

This process ensures that natural boundary conditions are implicitly and automatically satisfied during the FEM solution procedure.

B. The FENN

This section describes how the finite-element model can be converted into a parallel network form. We focus on solving typical inverse problems arising in electromagnetic NDE, but the basic idea is applicable to other areas as well. NDE inverse problems can be formulated as the problem of finding the material properties (such as the conductivity or the permeability) within the domain of the problem. Since the domain is discretized in the FEM method by a large number of elements, the problem can be posed as one of finding the material properties in each of these elements. These properties are usually embedded in the differential operator, or equivalently, in the global matrix. Thus, in order to be able to iteratively estimate these properties from the measurements, the material properties need to be separated out from the global matrix. This separation is easier to achieve at the element matrix level. For nodes in element

(13)

where the first factor is the parameter representing the material property in element and the second factor represents the differential operator at the element level without the material property embedded in it.
Fig. 4. FENN.

Substituting (13) into the functional, we get

(14)

If we define

(15)

where

(16)

(17)

Equation (17) expresses the functional explicitly in terms of the material parameters. The assumption that the material property is constant within each element is implicit in this expression. This assumption is usually satisfied in problems in NDE where each element in the FEM mesh is defined within the confines of a domain, and at no time does a single element cross domain boundaries. Furthermore, each element is small enough that minor variations in the material property within an element may be ignored. Equation (17) can be easily converted into a parallel network form. The neural network comprises an input, output and hidden layer. In the general case with a given number of elements and nodes in the FEM mesh, the input layer takes the material property values in each element as input. The hidden layer has neurons [footnote 1] arranged in groups of neurons, corresponding to the members of the global matrix. The output of each group of hidden layer neurons is the corresponding row vector of the global matrix. The weights from the input to the hidden layer are set to the appropriate values of the material-independent element matrices. Each neuron in the hidden layer acts as a summation unit (equivalent to a summation followed by a linear activation function [5]). The outputs of the hidden layer neurons are the elements of the global matrix as given in (15).

Each group of hidden neurons is connected to one output neuron by a set of weights, with each element of the weight vector representing the nodal values. Note that the set of weights between the first group of hidden neurons and the first output neuron are the same as the set of weights between the second group of hidden neurons and the second output neuron (as well as between successive groups of hidden neurons and the corresponding output neuron). Each output neuron is also a summation unit followed by a linear activation function, and the output of each neuron is:

(18)

where the second part of (18) is obtained by using (15). As an example, the FENN architecture for a two-element, four-node FEM mesh (Fig. 3) is shown in Fig. 4. In this case, the FENN has two input neurons, 16 hidden layer neurons and four output neurons. The figure illustrates the grouping of the hidden layer neurons, as well as the similarity inherent in the weights that connect each group of hidden layer neurons to the corresponding output neuron. To simplify the figure, the weights between the network input and hidden layer neurons are depicted by means of vectors, where the individual weight values are defined as in (16).

[footnote 1] In this paper, we use the term "neurons" in the FENN (in the hidden and output layers) to avoid confusion with the nodes in a finite-element mesh.

1) Boundary Conditions in the FENN: Note that the elements of the boundary matrices in (11) do not depend on the material properties. They need to be added appropriately to the global matrix and the source vector as shown in (12). Equation (12) thus implies that natural boundary conditions can be applied in the FENN as bias inputs to the hidden layer neurons that are a part of the boundary, and the corresponding output neurons. Dirichlet boundary conditions are applied by clamping the corresponding weights between the hidden layer and output layer neurons. These weights will be referred to as the clamped weights, while the remaining weights will be referred to as the free weights. An example of these weights is presented later. The FENN architecture was derived without consideration of the dimensionality of the problem at hand, and thus can be used for 1-, 2-, 3-, or higher dimensional problems.

Fig. 5. Geometry of mesh for 1-D FEM.

Fig. 6. Flowchart (with example) for designing the FENN for a general PDE.
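To make the layer sizes and the weight layout tangible, here is a minimal sketch of the two-element, four-node example just described (two input neurons, 16 hidden summation units, four output neurons). The element matrices used here are placeholders; in the FENN they come from the material-independent part of the FEM formulation, and the code is illustrative rather than the paper's implementation.

```python
import numpy as np

n_elements, n_nodes = 2, 4
element_nodes = [[0, 1, 2], [1, 2, 3]]                 # node-element connectivity (Fig. 3)
elem_matrices = [np.ones((3, 3)), np.ones((3, 3))]     # hypothetical material-free element matrices

# Input-to-hidden weights: w_in[e, i, j] is element e's contribution to
# global-matrix entry (i, j); zero wherever a node does not belong to element e.
w_in = np.zeros((n_elements, n_nodes, n_nodes))
for e, nodes in enumerate(element_nodes):
    for a, i in enumerate(nodes):
        for b, j in enumerate(nodes):
            w_in[e, i, j] = elem_matrices[e][a, b]

alpha = np.array([1.0, 2.0])            # network inputs: one material parameter per element
K = np.tensordot(alpha, w_in, axes=1)   # hidden-layer outputs = entries of the global matrix
phi = np.array([0.0, 0.5, 0.5, 0.0])    # hidden-to-output weights = nodal values (shared per group)
output = K @ phi                        # each output neuron sums its group, i.e. one row of K times phi

print(K.shape, output)                  # (4, 4): 4 groups of 4 hidden units = 16; 4 outputs
```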
The number of nodes and elements in the FEM mesh dictates the number of neurons in the different layers. The weights between the input and hidden layer change depending on node-element connectivity information.

The major drawback of the FENN is the number of neurons and weights necessary. However, the memory requirements can be reduced considerably, since most of the weights between the input and hidden layer are zero. These weights, and the corresponding connections, can be discarded. Similarly, most of the elements of the global matrix are also zero (it is a banded matrix). The corresponding neurons in the hidden layer can also be discarded, reducing memory and computation requirements considerably. Furthermore, the weights between each group of hidden layer neurons and the output layer are the same. Weight-sharing approaches can be used here to further reduce the storage requirements.

C. A 1-D Example

Consider the 1-D equation

(19)

with boundary conditions on the boundary of the domain, where the two coefficients are constants depending on the material and the right-hand side is the applied source. Laplace's equation and Poisson's equation are special cases of this equation. The FENN formulation for this problem starts by discretizing the domain of interest with elements and nodes. In one dimension, each element is defined by two nodes (Fig. 5). Define two basis functions over each element and let the nodal value be the value of the unknown on each node in the element. An example of the basis functions is shown in Fig. 5. For these basis functions, i.e.,

(20)

the element matrices are given by [3]

(21)

(22)

Here, the element length appears in both matrices. The global matrix is then constructed by selectively adding the element matrices based on the nodes that form an element. Specifically, the global matrix is a sparse tridiagonal matrix, and its nonzero elements are given by

(23)

Fig. 7. Shielded microstrip geometry. (a) Complete problem description. (b) Problem description using symmetry considerations.

The network implementation of (23) can be derived as follows. If the two material values at each element are the inputs to the network, the corresponding element matrices form the weights between the input and hidden layers. The network thus uses input neurons and hidden neurons. The values of the unknown at each of the nodes are assigned as weights between the hidden and output layers, and the source is the desired output of this network (corresponding to the output neurons). Dirichlet boundary conditions on the unknown are applied as explained earlier.

D. General Case

Fig. 6 shows a flowchart of the general scheme for converting a differential equation into the FENN structure. An example in two dimensions is also provided next to the flowchart. We start with the differential equation and the boundary conditions and formulate the FEM using the variational method. This involves discretizing the domain of interest with elements and nodes, selecting basis functions, writing the functional for each element and obtaining the element matrices and the source vector. The example presented uses the FEM mesh shown in Fig. 3, with two elements, four nodes, and linear basis functions. The unknown solution to the differential equation is represented by its values at each of the nodes in the finite-element mesh. The element matrices are then separated into two parts, with one part dependent on the material properties while the other is independent of them.

The FENN is then designed to have input neurons, hidden neurons, and output neurons, scaled by the number of material property parameters. In the example under consideration there are two material property parameters. The first group of input neurons takes in the values of the first parameter while the second group takes in the values of the second parameter in each element. The weights from the input to the hidden layer are set to the appropriate values of the material-independent element matrices. In the example, since nodes 1, 2, and 3 are part of element 1 (see Fig. 3), the weights from the first input node to the first group of four neurons in the hidden layer are given by

(24)

The last weight is zero since node 4 is not a part of element 1. Each group of hidden neurons is connected to one output neuron by a set of weights, with each element of the weight vector representing the nodal values. The output of each neuron in the output layer is the corresponding entry of the source vector.
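To make the element-matrix and global-matrix assembly of the 1-D example concrete, here is a generic sketch of linear-element assembly and a forward solve. It assumes a Poisson-type special case of the family in (19), namely -d/dx(alpha du/dx) = f with homogeneous Dirichlet conditions, and is not code from the paper; the FENN stores the alpha-independent element matrices as its input-to-hidden weights rather than assembling them on the fly.

```python
import numpy as np

def assemble(alpha_per_element, f_value, n_elements):
    """Assemble the global (stiffness) matrix and source vector on [0, 1]
    with linear hat basis functions; alpha is constant per element."""
    n_nodes = n_elements + 1
    h = 1.0 / n_elements                          # element length
    K = np.zeros((n_nodes, n_nodes))              # global matrix
    b = np.zeros(n_nodes)                         # source vector
    for e in range(n_elements):
        # Standard linear-element stiffness block, scaled by this element's alpha.
        ke = (alpha_per_element[e] / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
        fe = f_value * h / 2.0 * np.ones(2)       # element source contribution
        nodes = [e, e + 1]
        K[np.ix_(nodes, nodes)] += ke
        b[nodes] += fe
    return K, b

# Forward solve with Dirichlet conditions u(0) = u(1) = 0, imposed by deleting
# the boundary rows and columns as described for the FEM above.
K, b = assemble(alpha_per_element=np.ones(10), f_value=1.0, n_elements=10)
interior = slice(1, -1)
u = np.zeros(11)
u[interior] = np.linalg.solve(K[interior, interior], b[interior])
print(u.round(4))   # approximates u(x) = x(1 - x)/2 for alpha = 1, f = 1
```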
Fig. 8. Forward problem solutions for the shielded microstrip problem show the contours of constant potential for: (a) FEM solution and (b) FENN solution. (c) Error between (a) and (b). The x- and y-axes show the nodes in the FEM discretization of the domain, and the z-axis in (c) shows the error at each of these nodes in volts.

III. FORWARD AND INVERSE PROBLEM FORMULATION USING FENN

The FENN architecture and algorithm lends itself to solving both the forward and inverse problems. The forward problem involves determining the nodal weights given the material parameters and the applied source, while the inverse problem involves determining the material parameters given the nodal values and the source. Any optimization approach can be used to solve both these problems. Suppose we define the error at the output of the FENN as

(26)

where the compared quantity is the output of the FENN. Then, for a gradient-based approach, the gradient of the error with respect to the free hidden layer weights is given by

(27)

Equation (27) can be used to solve the forward problem. Similarly, to solve the inverse problem, the gradients of the error with respect to the material parameters (the input of the FENN) are necessary, and are given by

(28)

(29)
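The following is a generic sketch of the gradient-based forward and inverse iterations described above, for a linear system K(alpha) u = b in which K(alpha) = sum_e alpha_e K_e and the K_e are fixed, material-independent element contributions (the role played by the clamped FENN weights). It is illustrative only, with made-up element blocks, and is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_el = 6, 5
K_e = []
for e in range(n_el):
    Ke = np.zeros((n, n))
    Ke[e:e + 2, e:e + 2] = [[2.0, -1.0], [-1.0, 2.0]]   # simple positive-definite element block
    K_e.append(Ke)

K_of = lambda alpha: sum(a * Ke for a, Ke in zip(alpha, K_e))

alpha_true = rng.uniform(1.0, 2.0, n_el)
u_true = rng.standard_normal(n)
b = K_of(alpha_true) @ u_true                 # simulated "measurement"

def sse(residual):                            # E = 1/2 ||K u - b||^2, cf. (26)
    return 0.5 * residual @ residual

step, iters = 5e-3, 50_000

# Forward problem: material parameters known, descend on the nodal values u (cf. (27)).
K = K_of(alpha_true)
u = np.zeros(n)
for _ in range(iters):
    u -= step * K.T @ (K @ u - b)

# Inverse problem: nodal values known, descend on the material parameters alpha
# using dE/dalpha_e = (K_e u)^T (K(alpha) u - b), cf. (28)-(29).
alpha = np.ones(n_el)
for _ in range(iters):
    r = K_of(alpha) @ u_true - b
    alpha -= step * np.array([(Ke @ u_true) @ r for Ke in K_e])

print(sse(K @ u - b), sse(K_of(alpha) @ u_true - b))   # both errors should be near zero
```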
TABLE I: SUMMARY OF PERFORMANCE OF THE FENN ALGORITHM FOR VARIOUS PDES

For the forward problem, such an approach is equivalent to the iterative approaches used to solve for the unknown nodal values in the FEM [4].

IV. RESULTS

A. Forward Model Results

The FENN was tested using both 1- and 2-D versions of Poisson's equation

(30)

where the coefficient represents the material property, and the right-hand side is the applied source. For instance, in electromagnetics the coefficient may represent the permittivity while the source represents the charge density.

As the first example, consider the following 2-D equation

(31)

with boundary conditions

(32)

and

(33)

This is the governing equation for the shielded microstrip transmission line problem shown in Fig. 7. The forward problem computes the electric potential due to the shielded microstrip shown in Fig. 7(a). The potentials are zero on the shielding conductor. Since the geometry is symmetric, we can solve the equivalent problem shown in Fig. 7(b), by applying the homogeneous Neumann condition on the plane of symmetry. The inner conductor (microstrip) is held at a constant potential. Finally, we also assume that the material inside the shielding conductor has a permittivity proportional to a constant K. The permittivity in this case corresponds to the material property. The homogeneous Neumann boundary condition is equivalent to setting the natural boundary term to zero. The microstrip and the shielding conductor correspond to the Dirichlet boundary, with the prescribed potential on the microstrip and zero potential on the outer boundary [Fig. 7(b)]. Finally, there is no source term in this example (the source term would correspond to a charge distribution in the domain of interest).

The solution to the forward problem is presented in Fig. 8, with the FEM solution using 11 nodes in each direction shown in Fig. 8(a) and the corresponding FENN solution in Fig. 8(b). These figures show contours of constant potential. The error between the FEM and FENN solutions is presented in Fig. 8(c). As seen from the figure, the FENN is seen to match the FEM solution accurately, with a small peak error at any node.

Several other examples were also used to test the FENN and the results are summarized in Table I. Column 1 shows the PDE used to evaluate the FENN performance, while column 2 shows the boundary conditions used. The analytic solution to the problem is indicated in Column 3. The FENN structure and the number of iterations for convergence using a gradient descent approach are indicated in Columns 4 and 5, respectively. The FENN structure, as explained earlier, has inputs, hidden neurons and output neurons, where the counts follow from the number of elements and nodes in the FEM mesh, and the number of hidden neurons corresponds to the number of nonzero elements in the FEM global matrix. Finally, Columns 6 and 7 present the sum-squared error (SSE) and the maximum error in the solution, respectively, where the errors are computed with respect to the analytical solution. These results indicate that the FENN is capable of accurately determining the potential. One advantage of the FENN approach is that the computation of the input-hidden layer weights is a one-time process, as long as the differential equation does not change. The only changes necessary to solve the different problems are changes in the input and the desired output.

B. Inverse Model Results

The FENN was also used to solve several simple inverse problems based on (30). In all cases, the objective was to determine
Fig. 9. FENN inversion results for Poisson's equation with two different initial solutions, (a) and (b).

the value of the material property and the solution for given values of the source and boundary data. The first example is a 1-D problem that involves determining the material property given the solution and the source, for the differential equation

(34)

with given boundary conditions. The analytical solution to this inverse problem is

(35)

As seen from (35), the problem has an infinite number of solutions and we expect the solution procedure to converge to one of these solutions depending on the initial value.

Fig. 9(a) and (b) shows two solutions to this inverse problem for two different initializations (shown using triangles). In both cases, the FENN solution (in stars) is seen to match the analytical solution (squares). The SSE in both cases was very small.

In order to obtain a unique solution, we need to constrain the value of the material property at the boundary as well. Consider the same differential equation as (34), but with the solution and source specified as follows:

(36)

The analytical solution for this equation is known. To solve this problem, we initialize the material property and clamp its value at the two boundary nodes.

The results of the constrained inversion obtained using 11 nodes and 10 elements in the corresponding finite-element mesh are shown in Fig. 10. Fig. 10(a) shows the comparison between the analytical solution (solid line with squares) and the FENN result (solid line with stars). The initial value of the material property is shown in the figure as a dashed line. Fig. 10(b) shows the comparison between the actual and desired forcing function at the FENN
Fig. 10. Constrained inversion result with eleven nodes. (a) Comparison of analytic and simulation results. (b) Comparison of actual and desired NN outputs.

output. This result indicates that the SSE in the forcing function, as well as the SSE in the inversion result, is fairly large (0.0148 and 0.0197, respectively). The reason for this was traced back to the mesh discretization. Fig. 11 shows the SSE in the output of the FENN and the SSE in the inverse problem solution as a function of FEM discretization. It is seen that increasing the discretization significantly improves the solution. Similar results were observed for other problems.

V. DISCUSSION AND CONCLUSION

The FENN is closely related to the finite-element model used to solve differential equations. The FENN architecture has a weight structure that allows both the forward and inverse problems to be solved using simple gradient-based algorithms. Initial results indicate that the proposed FENN algorithm is capable of accurately solving both the forward and inverse problems. In addition, the forward problem solution from the FENN is seen to exactly match the FEM solution, indicating that the FENN represents the finite-element model exactly in a parallel configuration.

The major advantage of the FENN is that it represents the finite-element model in a parallel form, enabling parallel implementation in either hardware or software. Further, computing gradients in the FENN is very simple. This is an advantage in solving both forward and inverse problems using gradient-based methods. The gradients can also be computed in parallel and

Fig. 11. SSE in FENN output and inversion results as a function of discretization.
The major advantage of the FENN is that it represents the finite-element model in a parallel form, enabling parallel implementation in either hardware or software. Further, computing gradients in the FENN is very simple. This is an advantage in solving both forward and inverse problems using gradient-based methods. The gradients can also be computed in parallel, and the lack of nonlinearities in the neuron activation functions makes the computation of gradients simpler. A major advantage of this approach for solving inverse problems is that it avoids inverting the global matrix in each iteration. The FENN also does not require any training, since most of its weights can be computed in advance and stored. The weights depend on the governing differential equation and its associated boundary conditions, and as long as these two factors do not change, the weights do not change. This is especially an advantage in solving inverse problems in electromagnetic NDE. This approach also reduces the computational effort associated with the network.
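To illustrate the point about avoiding inversion of the global matrix, the sketch below recovers a single homogeneous material coefficient by gradient descent on the mismatch between the output and the desired forcing function; the stiffness matrix is assembled once, and only matrix-vector products appear in the update loop. The scalar parameterization of the coefficient and all names are assumptions made for illustration, not the parameterization used in the paper.

# Hedged sketch: inverse problem for -(alpha u')' = 1 on (0,1), u(0) = u(1) = 0,
# recovering the scalar coefficient alpha from a synthetic "measured" solution.
# K0 is assembled once; the update loop uses only matrix-vector products.
import numpy as np

n = 9                                        # interior nodes, so h = 0.1
h = 1.0 / (n + 1)
K0 = (1.0 / h) * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
f = np.full(n, h)                            # interior load vector for f(x) = 1

alpha_true = 2.5
u_meas = np.linalg.solve(alpha_true * K0, f) # stands in for measured data

v = K0 @ u_meas                              # d(output)/d(alpha), fixed
alpha = 1.0                                  # initial guess
lr = 0.5 / (v @ v)                           # safe step for this quadratic objective
for _ in range(200):
    residual = alpha * v - f                 # actual minus desired forcing
    alpha -= lr * (v @ residual)             # gradient step on the coefficient
print("recovered alpha:", alpha)             # approaches alpha_true = 2.5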
Future work will concentrate on applying the FENN to 3-D electromagnetic NDE problems. The robustness of the approach will also be tested, since the ability of these approaches to invert practical, noisy measurements is important. Furthermore, the use of better optimization algorithms, such as conjugate gradient methods, is expected to improve the solution speed. In addition, parallel implementation of the FENN in both hardware and software is under investigation. The approach described in this paper is very general in that it can be applied to a variety of inverse problems in fields other than electromagnetic NDE. Some of these other applications will also be investigated to show the general nature of the proposed method.
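As a rough indication of what a better optimizer buys, the forward-problem objective from the earlier sketch can be handed to an off-the-shelf conjugate-gradient routine. SciPy's general-purpose minimize(method="CG") is used here purely to illustrate the idea; it is not the implementation referred to in the text.

# Hedged sketch: conjugate-gradient minimization of 0.5 * ||K u - f||^2
# for the same 1-D test problem; typically far fewer iterations than
# plain gradient descent.
import numpy as np
from scipy.optimize import minimize

n = 9
h = 1.0 / (n + 1)
K = (1.0 / h) * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
f = np.full(n, h)

def objective(u):
    r = K @ u - f
    return 0.5 * float(r @ r)

def gradient(u):
    return K.T @ (K @ u - f)

result = minimize(objective, np.zeros(n), jac=gradient, method="CG")
print("CG iterations:", result.nit, " final objective:", objective(result.x))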
Pradeep Ramuhalli (S'92-M'02) received the B.Tech. degree from J.N.T. University, Hyderabad, India, in electronics and communications engineering in 1995, and the M.S. and Ph.D. degrees in electrical engineering from Iowa State University, Ames, in 1998 and 2002, respectively.
He is currently an Assistant Professor with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing. His research is in the general area of nondestructive evaluation and materials characterization. His research interests include the application of signal and image processing methods, pattern recognition, and neural networks for nondestructive evaluation applications, the development of model-based solutions for inverse problems in NDE, and the development of information fusion algorithms for multimodal data fusion.
Dr. Ramuhalli is a Member of Phi Kappa Phi.

Lalita Udpa (S'84-M'86-SM'96) received the Ph.D. degree in electrical engineering from Colorado State University, Fort Collins, in 1986.
She is currently a Professor with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing. She works primarily in the broad areas of nondestructive evaluation, signal processing, and biomedical applications. Her research interests include various aspects of NDE, such as the development of computational models for the forward problem in NDE, signal and image processing, pattern recognition and neural networks, and the development of solution techniques for inverse problems. Her current projects include finite-element modeling of electromagnetic NDE phenomena, application of neural network and signal processing algorithms to NDE data, and development of image processing techniques for the analysis of NDE and biomedical images.
Dr. Udpa is a Member of Eta Kappa Nu and Sigma Xi.

Satish S. Udpa (S'82-M'82-SM'91-F'03) received the B.Tech. degree in 1975 and the Post Graduate Diploma in electrical engineering in 1977 from J.N.T. University, Hyderabad, India. He received the M.S. degree in 1980 and the Ph.D. degree in electrical engineering in 1983, both from Colorado State University, Fort Collins.
He has been with Michigan State University, East Lansing, since 2001 and is currently Acting Dean for the College of Engineering and a Professor with the Electrical and Computer Engineering Department. Prior to joining Michigan State, he was a Professor with Iowa State University, Ames, from 1990 to 2001 and was associated with the Materials Assessment Research Group. Prior to joining Iowa State, he was an Associate Professor with the Department of Electrical Engineering at Colorado State University. His research interests span the broad area of materials characterization and nondestructive evaluation (NDE). Work done by him to date in the area includes an extensive repertoire of forward models for simulating physical processes underlying several inspection techniques. Coupled with careful experimental work, such forward models can be used for designing new sensors, optimizing test conditions, estimating the probability of detection, assessing designs for inspectability, and training inverse models for characterizing defects. He has also been involved in the development of system-, as well as model-based, inverse solutions for defect and material property characterization. His interests have expanded in recent years to include the development of noninvasive tools for clinical applications. Work done to date in this field includes the development of new electromagnetic-acoustic (EMAT) methods for detecting single-leg separation failures in artificial heart valves and microwave imaging and ablation therapy systems. He and his research group have been engaged in the design and development of high-performance instrumentation, including acoustic microscopes and single- and multifrequency eddy current NDE instruments. These systems, as well as software packages embodying algorithms developed by Udpa for defect classification and characterization, have been licensed to industry.
He is a Fellow of the American Society for Nondestructive Testing (ASNT) and a Fellow of the Indian Society of Nondestructive Testing.