REPORTS

Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication

Herbert Jaeger* and Harald Haas

International University Bremen, Bremen D-28759, Germany.
*To whom correspondence should be addressed. E-mail: h.jaeger@iu-bremen.de

We present a method for learning nonlinear systems, echo state networks (ESNs). ESNs employ artificial recurrent neural networks in a way that has recently been proposed independently as a learning mechanism in biological brains. The learning method is computationally efficient and easy to use. On a benchmark task of predicting a chaotic time series, accuracy is improved by a factor of 2400 over previous techniques. The potential for engineering applications is illustrated by equalizing a communication channel, where the signal error rate is improved by two orders of magnitude.

Nonlinear dynamical systems abound in the sciences and in engineering. If one wishes to simulate, predict, filter, classify, or control such a system, one needs an executable system model. However, it is often infeasible to obtain analytical models. In such cases, one has to resort to black-box models, which ignore the internal physical mechanisms and instead reproduce only the outwardly observable input-output behavior of the target system.

If the target system is linear, efficient methods for black-box modeling are available. Most technical systems, however, become nonlinear if operated at higher operational points (that is, closer to saturation). Although this might lead to cheaper and more energy-efficient designs, it is not done because the resulting nonlinearities cannot be harnessed. Many biomechanical systems use their full dynamic range (up to saturation) and thereby become lightweight, energy efficient, and thoroughly nonlinear.

Here, we present an approach to learning black-box models of nonlinear systems, echo state networks (ESNs). An ESN is an artificial recurrent neural network (RNN). RNNs are characterized by feedback ("recurrent") loops in their synaptic connection pathways. They can maintain an ongoing activation even in the absence of input and thus exhibit dynamic memory. Biological neural networks are typically recurrent. Like biological neural networks, an artificial RNN can learn to mimic a target system, in principle with arbitrary accuracy (1). Several learning algorithms are known (2-4) that incrementally adapt the synaptic weights of an RNN in order to tune it toward the target system. These algorithms have not been widely employed in technical applications because of slow convergence and suboptimal solutions (5, 6). The ESN approach differs from these methods in that a large RNN is used (on the order of 50 to 1000 neurons; previous techniques typically use 5 to 30 neurons) and in that only the synaptic connections from the RNN to the output readout neurons are modified by learning; previous techniques tune all synaptic connections (Fig. 1). Because there are no cyclic dependencies between the trained readout connections, training an ESN becomes a simple linear regression task.

Fig. 1. (A) Schema of previous approaches to RNN learning. (B) Schema of ESN approach. Solid bold arrows, fixed synaptic connections; dotted arrows, adjustable connections. Both approaches aim at minimizing the error d(n) - y(n), where y(n) is the network output and d(n) is the teacher time series observed from the target system.

We illustrate the ESN approach on a task of chaotic time series prediction (Fig. 2) (7). The Mackey-Glass system (MGS) (8) is a standard benchmark system for time series prediction studies. It generates a subtly irregular time series (Fig. 2A). The prediction task has two steps: (i) using an initial teacher sequence generated by the original MGS to learn a black-box model M of the generating system, and (ii) using M to predict the value of the sequence some steps ahead.

First, we created a random RNN with 1000 neurons (called the "reservoir") and one output neuron. The output neuron was equipped with random connections that project back into the reservoir (Fig. 2B). A 3000-step teacher sequence d(1), . . . , d(3000) was generated from the MGS equation and fed into the output neuron. This excited the internal neurons through the output feedback connections. After an initial transient period, they started to exhibit systematic individual variations of the teacher sequence (Fig. 2B).

The fact that the internal neurons display systematic variants of the exciting external signal is constitutional for ESNs: The internal neurons must work as "echo functions" for the driving signal. Not every randomly generated RNN has this property, but it can effectively be built into a reservoir (supporting online text).

It is important that the echo signals be richly varied. This was ensured by a sparse interconnectivity of 1% within the reservoir. This condition lets the reservoir decompose into many loosely coupled subsystems, establishing a richly structured reservoir of excitable dynamics.

After time n = 3000, output connection weights w_i (i = 1, . . . , 1000) were computed (dashed arrows in Fig. 2B) from the last 2000 steps n = 1001, . . . , 3000 of the training run such that the training error

    MSE_train = (1/2000) \sum_{n=1001}^{3000} [ d(n) - \sum_{i=1}^{1000} w_i x_i(n) ]^2

was minimized [x_i(n), activation of the ith internal neuron at time n]. This is a simple linear regression.

With the new w_i in place, the ESN was disconnected from the teacher after step 3000 and left running freely. A bidirectional dynamical interplay of the network-generated output signal with the internal signals x_i(n) unfolded. The output signal y(n) was created from the internal neuron activation signals x_i(n) through the trained connections w_i, by y(n) = \sum_{i=1}^{1000} w_i x_i(n). Conversely, the internal signals were echoed from that output signal through the fixed output feedback connections (supporting online text).

For testing, an 84-step continuation d(3001), . . . , d(3084) of the original signal was computed for reference. The network output y(3084) was compared with the correct continuation d(3084). Averaged over 100 independent trials, a normalized root mean square error

    NRMSE = ( \sum_{j=1}^{100} (d_j(3084) - y_j(3084))^2 / (100 \sigma^2) )^{1/2} \approx 10^{-4.2}

was obtained (d_j and y_j, teacher and network
78, 2 APRIL 2004, VOL 304, SCIENCE, www.sciencemag.org
output in trial j; σ², variance of the MGS signal), improving the best previous techniques (9-15), which used training sequences of length 500 to 10,000, by a factor of 700. If the prediction run was continued, deviations typically became visible after about 1300 steps (Fig. 2A). With a refined variant of the learning method (7), the improvement factor rises to 2400. Models of similar accuracy were also obtained for other chaotic systems (supporting online text).

The main reason for the jump in modeling accuracy is that ESNs capitalize on a massive short-term memory. We showed analytically (16) that under certain conditions an ESN of size N may be able to "remember" a number of previous inputs that is of the same order of magnitude as N. This information is more massive than the information used in other techniques (supporting online text).

We now illustrate the approach in a task of practical relevance, namely, the equalization of a wireless communication channel (7). The essentials of equalization are as follows: A sender wants to communicate a symbol sequence s(n). This sequence is first transformed into an analog envelope signal d(n), then modulated on a high-frequency carrier signal and transmitted, then received and demodulated into an analog signal u(n), which is a corrupted version of d(n). Major sources of corruption are noise (thermal or due to interfering signals), multipath propagation, which leads to a superposition of adjacent symbols (intersymbol interference), and nonlinear distortion induced by operating the sender's power amplifier in the high-gain region. To avoid the latter, the actual power amplification is run well below the maximum amplification possible, thereby incurring a substantial loss in energy efficiency, which is clearly undesirable in cell-phone and satellite communications. The corrupted signal u(n) is then passed through an equalizing filter whose output y(n) should restore u(n) as closely as possible to d(n). Finally, the equalized signal y(n) is converted back into a symbol sequence. The quality measure for the entire process is the fraction of incorrect symbols finally obtained (symbol error rate).

To compare the performance of an ESN equalizer with standard techniques, we took a channel model for a nonlinear wireless transmission system from a study (17) that compared three customary nonlinear equalization methods: a linear decision feedback equalizer (DFE), which is actually a nonlinear method; a Volterra DFE; and a bilinear DFE. The model equation featured intersymbol interference across 10 consecutive symbols, a second-order and a third-order nonlinear distortion, and additive white Gaussian noise. All methods investigated in that study had 47 adjustable parameters and used sequences of 5000 symbols for training. To make the ESN equalizer comparable with the equalizers studied in (17), we took ESNs with a reservoir of 46 neurons (which is small for the ESN approach), which yielded 47 adjustable parameters. (The 47th comes from a direct connection from the input to the output neuron.)

We carried out numerous learning trials (7) to obtain ESN equalizers, using an online learning method (a version of the recursive least square algorithm known from linear adaptive filters) to train the output weights on 5000-step training sequences. We chose an online adaptation scheme here because the methods in (17) were online adaptive, too, and because wireless communication channels mostly are time-varying, such that an equalizer must adapt to changing system characteristics. The entire learning-testing procedure was repeated for signal-to-noise ratios ranging from 12 to 32 dB. Figure 3 compares the average symbol error rates obtained with the results reported in (17), showing an improvement of two orders of magnitude for high signal-to-noise ratios.

For tasks with multichannel input and/or output, the ESN approach can be accommodated simply by adding more input or output neurons (16, 18).

ESNs can be applied to all basic tasks of signal processing and control, including time series prediction, inverse modeling, pattern generation, event detection and classification, modeling distributions of stochastic processes, filtering, and nonlinear control (16, 18-20). Because a single learning run takes only a few seconds (or minutes, for very large data sets and networks), engineers can test out variants at a high turnover rate, a crucial factor for practical usability.

ESNs have been developed from a mathematical and engineering perspective, but exhibit typical features of biological RNNs: a large number of neurons, recurrent pathways, sparse random connectivity, and local modification of synaptic weights. The idea of using randomly connected RNNs to represent and memorize dynamic input in network states has frequently been explored in specific contexts, for instance, in artificial intelligence models of associative memory (21), models of prefrontal cortex function in sensory-motor sequencing tasks (22), models of birdsong (23), models of the cerebellum (24), and general computational models of neural oscillators (25). Many different learning mechanisms were considered, mostly within the RNN itself. The contribution of the ESN is to elucidate the mathematical properties of large RNNs such that they can be used with a linear, trainable readout mechanism for general black-box modeling. An approach essentially equivalent to ESNs, liquid state networks (26, 27), has been developed independently to model computations in cortical microcircuits. Recent findings in neurophysiology suggest that the basic ESN/liquid state network principle seems not uncommon in biological networks (28-30) and could eventually be exploited to control prosthetic devices by signals collected from a collective of neurons (31).
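The training procedure described above (sparse random reservoir, teacher-forced state collection, linear regression for the output weights, then free-running prediction) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the reservoir is reduced from the paper's 1000 neurons to 200, a sine wave stands in for the Mackey-Glass series, and scaling the spectral radius below 1 is one common way to obtain the echo property (the paper's exact construction is given only in its supporting online text).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200          # reservoir size (the paper uses 1000)
density = 0.01   # 1% sparse interconnectivity, as in the paper

# Sparse random reservoir weights, scaled so the spectral radius is
# below 1 -- a common sufficient condition for the echo property.
W = np.where(rng.random((N, N)) < density,
             rng.uniform(-1.0, 1.0, (N, N)), 0.0)
W *= 0.8 / np.max(np.abs(np.linalg.eigvals(W)))

w_fb = rng.uniform(-1.0, 1.0, N)   # fixed output feedback connections

def teacher_forced_states(d):
    """Drive the reservoir with the teacher d(1..T) through the output
    feedback connections and collect the activations x_i(n)."""
    x = np.zeros(N)
    X = np.empty((len(d), N))
    for n, d_n in enumerate(d):
        x = np.tanh(W @ x + w_fb * d_n)
        X[n] = x
    return X

# Toy teacher signal standing in for the Mackey-Glass series.
d = np.sin(0.05 * np.arange(3000))
X = teacher_forced_states(d)

# Output weights: plain linear regression over steps 1001..3000,
# minimizing the MSE_train given in the text.
w, *_ = np.linalg.lstsq(X[1000:], d[1000:], rcond=None)

# Free run: y(n) = sum_i w_i x_i(n), echoed back into the reservoir.
x, y = X[-1].copy(), []
for _ in range(84):                # 84-step continuation, as in the paper
    y_n = w @ x
    y.append(y_n)
    x = np.tanh(W @ x + w_fb * y_n)
```

Note that only w is learned; W and w_fb stay fixed, which is what reduces ESN training to the single linear regression described in the text.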
Fig. 2. (A) Prediction output of the trained ESN (dotted) overlaid with the correct continuation (solid). (B) Learning the MG attractor. Three sample activation traces of internal neurons are shown. They echo the teacher signal d(n). After training, the desired output is recreated from the echo signals through output connections (dotted arrows) whose weights w_i are the result of the training procedure.

Fig. 3. Results of using an ESN for nonlinear channel equalization. Plot shows signal error rate (SER) versus signal-to-noise ratio (SNR). (a) Linear DFE. (b) Volterra DFE. (c) Bilinear DFE. [(a) to (c) taken from (20)]. (d) Blue line represents average ESN performance with randomly generated reservoirs. Error bars, variation across networks. (e) Green line indicates performance of the best network chosen from the networks averaged in (d). Error bars, variation across learning trials.

References and Notes
1. K.-I. Funahashi, Y. Nakamura, Neural Netw. 6, 801 (1993).
2. D. Zipser, R. J. Williams, Neural Comput. 1, 270 (1989).
3. P. J. Werbos, Proc. IEEE 78, 1550 (1990).
4. L. A. Feldkamp, D. V. Prokhorov, C. F. Eagen, F. Yuan, in Nonlinear Modeling: Advanced Black-Box Techniques, J. A. K. Suykens, J. Vandewalle, Eds. (Kluwer, Dordrecht, Netherlands, 1998), pp. 29-54.
5. K. Doya, in The Handbook of Brain Theory and Neural Networks, M. A. Arbib, Ed. (MIT Press, Cambridge, MA, 1995), pp. 796-800.
6. H. Jaeger, "Tutorial on training recurrent neural networks" (GMD-Report 159, German National Research Institute for Computer Science, 2002); ftp://borneo.gmd.de/pub/indy/publications_herbert/CompleteTutorialTechrep.pdf.
7. Materials and methods are available as supporting material on Science Online.
8. M. C. Mackey, L. Glass, Science 197, 287 (1977).
9. J. Vesanto, in Proc. WSOM 97 (1997); www.cis.hut.fi/projects/monitor/publications/papers/wsom97.ps.
10. L. Chudy, I. Farkas, Neural Network World 8, 481 (1998).
11. H. Bersini, M. Birattari, G. Bontempi, in Proc. IEEE World Congr. on Computational Intelligence (IJCNN 98) (1997), pp. 2102-2106; ftp://iridia.ulb.ac.be/pub/lazy/papers/IridiaTr1997-13_2.ps.gz.
12. T. M. Martinetz, S. G. Berkovich, K. J. Schulten, IEEE Trans. Neural Netw. 4, 558 (1993).
13. X. Yao, Y. Liu, IEEE Trans. Neural Netw. 8, 694 (1997).
14. F. Gers, D. Eck, J. F. Schmidhuber, "Applying LSTM to time series predictable through time-window approaches" (IDSIA-IDSIA-22-00, 2000); www.idsia.ch/~felix/Publications.html.
15. J. McNames, J. A. K. Suykens, J. Vandewalle, Int. J. Bifurcat. Chaos 9, 1485 (1999).
16. H. Jaeger, "Short term memory in echo state networks" (GMD-Report 152, German National Research Institute for Computer Science, 2002); ftp://borneo.gmd.de/pub/indy/publications_herbert/STMEchoStatesTechRep.pdf.
17. V. J. Mathews, J. Lee, in Advanced Signal Processing: Algorithms, Architectures, and Implementations V (Proc. SPIE Vol. 2296) (SPIE, San Diego, CA, 1994), pp. 317-327.
18. J. Hertzberg, H. Jaeger, F. Schönherr, in Proc. 15th Europ. Conf. on Art. Int. (ECAI 02), F. van Harmelen, Ed. (IOS Press, Amsterdam, 2002), pp. 708-712; www.ais.fhg.de/schoenhe/papers/ECAI02.pdf.
19. H. Jaeger, "The echo state approach to analysing and training recurrent neural networks" (GMD-Report 148, German National Research Institute for Computer Science, 2001); ftp://borneo.gmd.de/pub/indy/publications_herbert/EchoStatesTechRep.pdf.
20. H. Jaeger, in Advances in Neural Information Processing Systems 15, S. Becker, S. Thrun, K. Obermayer, Eds. (MIT Press, Cambridge, MA, 2003), pp. 593-600.
21. G. E. Hinton, in Parallel Models of Associative Memory, G. E. Hinton, J. A. Anderson, Eds. (Erlbaum, Hillsdale, NJ, 1981), pp. 161-187.
22. D. G. Beiser, J. C. Houk, J. Neurophysiol. 79, 3168 (1998).
23. S. Dehaene, J.-P. Changeux, J.-P. Nadal, Proc. Natl. Acad. Sci. U.S.A. 84, 2727 (1987).
24. M. Kawato, in The Handbook of Brain Theory and Neural Networks, M. Arbib, Ed. (MIT Press, Cambridge, MA, 1995), pp. 172-178.
25. K. Doya, S. Yoshizawa, Neural Netw. 2, 375 (1989).
26. W. Maass, T. Natschläger, H. Markram, Neural Comput. 14, 2531 (2002).
27. W. Maass, T. Natschläger, H. Markram, in Computational Neuroscience: A Comprehensive Approach, J. Feng, Ed. (Chapman & Hall/CRC, 2003), pp. 575-605.
28. G. B. Stanley, F. F. Li, Y. Dan, J. Neurosci. 19, 8036 (1999).
29. G. B. Stanley, Neurocomputing 38-40, 1703 (2001).
30. W. M. Kistler, Ch. I. de Zeeuw, Neural Comput. 14, 2597 (2002).
31. S. Mussa-Ivaldi, Nature 408, 361 (2000).
32. The first author thanks T. Christaller for unfaltering support and W. Maass for friendly cooperation. International patents are claimed by Fraunhofer AIS (PCT/EP01/11490).

Supporting Online Material
www.sciencemag.org/cgi/content/full/304/5667/78/DC1
Materials and Methods
SOM Text
Figs. S1 to S4
References

8 September 2003; accepted 26 February 2004
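The equalization experiment in the report above trained the output weights online with "a version of the recursive least square algorithm known from linear adaptive filters." The paper does not spell out the exact update, so the sketch below is the standard exponentially weighted RLS recursion for a linear readout, with synthetic 47-dimensional states standing in for the ESN states (46 reservoir neurons plus the direct input connection); the class name and toy data are illustrative, not from the paper.

```python
import numpy as np

class RLSReadout:
    """Standard exponentially weighted recursive least squares (RLS)
    for a linear readout y(n) = w . x(n)."""

    def __init__(self, dim, forgetting=0.999, delta=1.0):
        self.w = np.zeros(dim)
        self.P = np.eye(dim) / delta   # inverse-correlation estimate
        self.lam = forgetting          # < 1 lets it track time-varying channels

    def update(self, x, d):
        """One online step: observe state x(n) and teacher d(n)."""
        y = self.w @ x                                  # a priori output
        k = self.P @ x / (self.lam + x @ self.P @ x)    # gain vector
        self.w = self.w + k * (d - y)                   # a priori error update
        self.P = (self.P - np.outer(k, x @ self.P)) / self.lam
        return y

# Adapt sample by sample over a 5000-step training sequence, as in the
# study; states and target here are synthetic stand-ins.
rng = np.random.default_rng(1)
rls = RLSReadout(dim=47)
for _ in range(5000):
    x = rng.standard_normal(47)
    d = 0.5 * x[0] - 0.2 * x[1]        # toy linear teacher
    rls.update(x, d)
```

Because only the readout weights are adapted, each RLS step costs O(dim²), which is what makes per-symbol online adaptation to a drifting channel affordable.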
Ultrafast Electron Crystallography of Interfacial Water

Chong-Yu Ruan, Vladimir A. Lobastov, Franco Vigliotti, Songye Chen, Ahmed H. Zewail*

Laboratory for Molecular Sciences, Arthur Amos Noyes Laboratory of Chemical Physics, California Institute of Technology, Pasadena, CA 91125, USA.
*To whom correspondence should be addressed. E-mail: zewail@caltech.edu

We report direct determination of the structures and dynamics of interfacial water on a hydrophilic surface with atomic-scale resolution using ultrafast electron crystallography. On the nanometer scale, we observed the coexistence of ordered surface water and crystallite-like ice structures, evident in the superposition of Bragg spots and Debye-Scherrer rings. The structures were determined to be dominantly cubic, but each undergoes different dynamics after the ultrafast substrate temperature jump. From changes in local bond distances (O-H···O and O···O) with time, we elucidated the structural changes in the far-from-equilibrium regime at short times and near-equilibration at long times.

The nature of interfacial molecular assemblies of nanometer scale is of fundamental importance to chemical and biological phenomena (1-4). For water, the directional molecular features of hydrogen bonding (5, 6) and the different structures possible, from amorphous (7) to crystalline (8), make the interfacial (9) collective assembly on the mesoscopic (10) scale much less understood. Structurally, the nature of water on a substrate is determined by forces of orientation at the interface and by the net charge density, which establishes the hydrophilic or hydrophobic character of the substrate. However, the transformation from ordered to disordered structure and their coexistence critically depends on the time scales for the movements of atoms locally and at long range. Therefore, it is essential to elucidate the nature of these structures and the time scales for their equilibration.

Here, we report direct determination of the structures of interfacial water with atomic-scale resolution, using diffraction, and the dynamics following an ultrafast infrared (IR) laser-initiated temperature jump. Interfacial water is formed on a hydrophilic surface (silicon, chlorine-terminated) under controlled ultrahigh vacuum (UHV) conditions (Fig. 1). With these atomic-scale spatial, temporal, and energy resolutions, the evolution of nonequilibrium structures was monitored, their ordered or disordered nature was established, and the time scale for the breakage of long-range bonding and formation of new structures was determined. We identified the structured and ordered interfacial water from the Bragg diffraction and the layered crystallite structure from the Debye-Scherrer rings. The temporal evolution of interfacial water and layered ice after the temperature jump was studied with submonolayer sensitivity. We compared these results with those obtained on hydrophobic surfaces, such as hydrogen-terminated silicon or silver substrate.

Fig. 1. Structured water at the hydrophilic interface. The chlorine termination on a Si(111) substrate forms a hydrophilic layer that orients the water bilayer. The closest packing distance (4.43 Å) between oxygen atoms in the bottom layer of water is similar to the distance (4.50 Å) between the on-top and interstitial sites of the chlorine layer, resulting in specific bilayer orientations (30°) with respect to the silicon substrate. This ordered stacking persists for three to four bilayers (1 nm) before disorientation takes place and results in crystallite islands, forming the layered structure. The size of atoms is not to scale for the van der Waals radii.

Spectroscopic techniques, such as internal reflection (11) and nonlinear [second-harmonic generation (12) and sum-frequency generation