REPORTS

Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication

Herbert Jaeger* and Harald Haas

International University Bremen, Bremen D-28759, Germany.

*To whom correspondence should be addressed. E-mail: h.jaeger@iu-bremen.de

We present a method for learning nonlinear systems, echo state networks (ESNs). ESNs employ artificial recurrent neural networks in a way that has recently been proposed independently as a learning mechanism in biological brains. The learning method is computationally efficient and easy to use. On a benchmark task of predicting a chaotic time series, accuracy is improved by a factor of 2400 over previous techniques. The potential for engineering applications is illustrated by equalizing a communication channel, where the signal error rate is improved by two orders of magnitude.

Nonlinear dynamical systems abound in the sciences and in engineering. If one wishes to simulate, predict, filter, classify, or control such a system, one needs an executable system model.
However, it is often infeasible to obtain analytical models. In such cases, one has to resort to black-box models, which ignore the internal physical mechanisms and instead reproduce only the outwardly observable input-output behavior of the target system.

If the target system is linear, efficient methods for black-box modeling are available. Most technical systems, however, become nonlinear if operated at higher operational points (that is, closer to saturation). Although this might lead to cheaper and more energy-efficient designs, it is not done because the resulting nonlinearities cannot be harnessed. Many biomechanical systems use their full dynamic range (up to saturation) and thereby become lightweight, energy efficient, and thoroughly nonlinear.

Here, we present an approach to learning black-box models of nonlinear systems, echo state networks (ESNs). An ESN is an artificial recurrent neural network (RNN). RNNs are characterized by feedback ("recurrent") loops in their synaptic connection pathways. They can maintain an ongoing activation even in the absence of input and thus exhibit dynamic memory. Biological neural networks are typically recurrent. Like biological neural networks, an artificial RNN can learn to mimic a target system, in principle with arbitrary accuracy (1). Several learning algorithms are known (2-4) that incrementally adapt the synaptic weights of an RNN in order to tune it toward the target system. These algorithms have not been widely employed in technical applications because of slow convergence and suboptimal solutions (5, 6). The ESN approach differs from these methods in that a large RNN is used (on the order of 50 to 1000 neurons; previous techniques typically use 5 to 30 neurons) and in that only the synaptic connections from the RNN to the output readout neurons are modified by learning; previous techniques tune all synaptic connections (Fig. 1). Because there are no cyclic dependencies between the trained readout connections, training an ESN becomes a simple linear regression task.

We illustrate the ESN approach on a task of chaotic time series prediction (Fig. 2) (7). The Mackey-Glass system (MGS) (8) is a standard benchmark system for time series prediction studies. It generates a subtly irregular time series (Fig. 2A). The prediction task has two steps: (i) using an initial teacher sequence generated by the original MGS to learn a black-box model M of the generating system, and (ii) using M to predict the value of the sequence some steps ahead.

First, we created a random RNN with 1000 neurons (called the "reservoir") and one output neuron. The output neuron was equipped with random connections that project back into the reservoir (Fig. 2B). A 3000-step teacher sequence d(1), ..., d(3000) was generated from the MGS equation and fed into the output neuron. This excited the internal neurons through the output feedback connections. After an initial transient period, they started to exhibit systematic individual variations of the teacher sequence (Fig. 2B).

The fact that the internal neurons display systematic variants of the exciting external signal is constitutional for ESNs: The internal neurons must work as "echo functions" for the driving signal. Not every randomly generated RNN has this property, but it can effectively be built into a reservoir (supporting online text).

It is important that the echo signals be richly varied. This was ensured by a sparse interconnectivity of 1% within the reservoir. This condition lets the reservoir decompose into many loosely coupled subsystems, establishing a richly structured reservoir of excitable dynamics.

After time n = 3000, output connection weights w_i (i = 1, ..., 1000) were computed (dashed arrows in Fig. 2B) from the last 2000 steps n = 1001, ..., 3000 of the training run such that the training error

\mathrm{MSE}_{\mathrm{train}} = \frac{1}{2000} \sum_{n=1001}^{3000} \Big( d(n) - \sum_{i=1}^{1000} w_i x_i(n) \Big)^2

was minimized [x_i(n), activation of the ith internal neuron at time n]. This is a simple linear regression.
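In outline, this training procedure (sparse random reservoir, teacher forcing through the output feedback connections, then linear regression for the output weights) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's exact setup: the reservoir size, the uniform weight ranges, the spectral-radius rescaling, the tanh internal neurons, and the stand-in sinusoidal teacher signal are all choices made here for brevity; the paper's precise weight distributions are given in its supporting material.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 200          # reservoir size (the paper uses 1000; smaller here for speed)
T = 3000         # length of the teacher sequence, as in the paper
washout = 1000   # initial transient steps discarded before the regression

# Sparse random reservoir, about 1% connectivity as in the paper; the
# uniform weight range and the spectral-radius rescaling are assumptions.
W = np.where(rng.random((N, N)) < 0.01, rng.uniform(-1.0, 1.0, (N, N)), 0.0)
W *= 0.8 / np.max(np.abs(np.linalg.eigvals(W)))

# Random output-feedback connections projecting the output back into
# the reservoir (Fig. 2B).
W_fb = rng.uniform(-1.0, 1.0, N)

# Stand-in teacher signal d(n); a real Mackey-Glass series would be used.
d = 0.5 * np.sin(np.arange(T) / 10.0)

# Teacher forcing: feed d(n) into the output neuron so that it excites
# the internal neurons through the feedback connections.
x = np.zeros(N)
states = np.zeros((T, N))
for n in range(T):
    x = np.tanh(W @ x + W_fb * d[n])   # assumed tanh internal neurons
    states[n] = x

# Output weights w_i by linear regression on the post-washout steps,
# minimizing the training MSE defined above.
w, *_ = np.linalg.lstsq(states[washout:], d[washout:], rcond=None)
mse_train = np.mean((d[washout:] - states[washout:] @ w) ** 2)
```

The single least-squares solve replaces the iterative gradient descent of earlier RNN training schemes, which is the source of the method's speed.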
With the new w_i in place, the ESN was disconnected from the teacher after step 3000 and left running freely. A bidirectional dynamical interplay of the network-generated output signal with the internal signals x_i(n) unfolded. The output signal y(n) was created from the internal neuron activation signals x_i(n) through the trained connections w_i, by

y(n) = \sum_{i=1}^{1000} w_i x_i(n).

Conversely, the internal signals were echoed from that output signal through the fixed output feedback connections (supporting online text).
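The free-running phase can be sketched as follows. The trained quantities here (reservoir W, feedback weights W_fb, readout w, and the last teacher-forced state x) are random placeholders, purely to show the loop structure in which the network's own output replaces the teacher.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50

# Random stand-ins for the trained quantities: reservoir weights W,
# output-feedback weights W_fb, trained readout w, and the last
# teacher-forced internal state x. All are placeholders for illustration.
W = np.where(rng.random((N, N)) < 0.1, rng.uniform(-0.3, 0.3, (N, N)), 0.0)
W_fb = rng.uniform(-1.0, 1.0, N)
w = rng.uniform(-0.1, 0.1, N)
x = rng.uniform(-0.5, 0.5, N)

# Free run: the network's own output y(n) replaces the teacher d(n),
# closing the bidirectional loop between output and internal signals.
outputs = []
for n in range(84):                  # 84-step continuation, as in the paper
    y = w @ x                        # y(n) = sum_i w_i x_i(n)
    x = np.tanh(W @ x + W_fb * y)    # output echoed back via fixed feedback
    outputs.append(y)
```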
For testing, an 84-step continuation d(3001), ..., d(3084) of the original signal was computed for reference. The network output y(3084) was compared with the correct continuation d(3084). Averaged over 100 independent trials, a normalized root mean square error

\mathrm{NRMSE} = \left( \sum_{j=1}^{100} \frac{\big( d_j(3084) - y_j(3084) \big)^2}{100\,\sigma^2} \right)^{1/2} \approx 10^{-4.2}

was obtained (d_j and y_j, teacher and network output in trial j; \sigma^2, variance of the MGS signal), improving the best previous techniques (9-15), which used training sequences of length 500 to 10,000, by a factor of 700. If the prediction run was continued, deviations typically became visible after about 1300 steps (Fig. 2A). With a refined variant of the learning method (7), the improvement factor rises to 2400. Models of similar accuracy were also obtained for other chaotic systems (supporting online text).

The main reason for the jump in modeling accuracy is that ESNs capitalize on a massive short-term memory. We showed analytically (16) that under certain conditions an ESN of size N may be able to "remember" a number of previous inputs that is of the same order of magnitude as N. This information is more massive than the information used in other techniques (supporting online text).

Fig. 1. (A) Schema of previous approaches to RNN learning. (B) Schema of ESN approach. Solid bold arrows, fixed synaptic connections; dotted arrows, adjustable connections. Both approaches aim at minimizing the error d(n) - y(n), where y(n) is the network output and d(n) is the teacher time series observed from the target system.

78 2 APRIL 2004 VOL 304 SCIENCE www.sciencemag.org
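Given the per-trial values d_j(3084) and y_j(3084), the NRMSE above is a one-line computation. The arrays below are hypothetical stand-ins invented for illustration (near-perfect outputs and an assumed signal variance), not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-ins for 100 independent trials: correct continuation
# values d_j(3084), near-perfect network outputs y_j(3084), and the MGS
# signal variance sigma^2 (all invented for illustration).
d84 = rng.normal(0.7, 0.06, 100)
y84 = d84 + rng.normal(0.0, 1e-4, 100)
sigma2 = 0.06 ** 2

# NRMSE = sqrt( sum_j (d_j(3084) - y_j(3084))^2 / (100 * sigma^2) )
nrmse = np.sqrt(np.sum((d84 - y84) ** 2) / (100 * sigma2))
```

Normalizing by the signal variance makes the error comparable across systems with different amplitude scales.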
We now illustrate the approach in a task of practical relevance, namely, the equalization of a wireless communication channel (7). The essentials of equalization are as follows: A sender wants to communicate a symbol sequence s(n). This sequence is first transformed into an analog envelope signal d(n), then modulated on a high-frequency carrier signal and transmitted, then received and demodulated into an analog signal u(n), which is a corrupted version of d(n). Major sources of corruption are noise (thermal or due to interfering signals), multipath propagation, which leads to a superposition of adjacent symbols (intersymbol interference), and nonlinear distortion induced by operating the sender's power amplifier in the high-gain region. To avoid the latter, the actual power amplification is run well below the maximum amplification possible, thereby incurring a substantial loss in energy efficiency, which is clearly undesirable in cell-phone and satellite communications. The corrupted signal u(n) is then passed through an equalizing filter whose output y(n) should restore u(n) as closely as possible to d(n). Finally, the equalized signal y(n) is converted back into a symbol sequence. The quality measure for the entire process is the fraction of incorrect symbols finally obtained (symbol error rate).
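As a concrete and deliberately simplified illustration of this pipeline, the sketch below builds a hypothetical channel with intersymbol interference, mild polynomial distortion, and additive white Gaussian noise, then computes the symbol error rate of a trivial quantize-only receiver with no equalizer at all. Every coefficient is invented for illustration; the actual channel model used in the paper is specified in the cited study (17) and the supporting material.

```python
import numpy as np

rng = np.random.default_rng(3)

# Symbol sequence s(n) from a 4-level alphabet (a stand-in; the actual
# symbol set and channel are specified in the cited study).
symbols = np.array([-3.0, -1.0, 1.0, 3.0])
s = rng.choice(symbols, 5000)
d = s.copy()                      # idealized analog envelope d(n)

# Hypothetical channel: intersymbol interference over three symbols,
# mild second- and third-order distortion, and additive white Gaussian
# noise. All coefficients are invented for illustration.
isi = np.convolve(d, [1.0, 0.3, -0.1], mode="same")
u = isi + 0.05 * isi**2 - 0.01 * isi**3 + rng.normal(0.0, 0.05, d.size)

# Baseline receiver with no equalizer: quantize u(n) to the nearest
# symbol and count the fraction of wrong decisions (the SER).
decided = symbols[np.argmin(np.abs(u[:, None] - symbols[None, :]), axis=1)]
ser = float(np.mean(decided != s))
```

An equalizer's job is to undo the ISI and distortion before this quantization step, driving the SER far below this unequalized baseline.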
To compare the performance of an ESN equalizer with standard techniques, we took a channel model for a nonlinear wireless transmission system from a study (17) that compared three customary nonlinear equalization methods: a linear decision feedback equalizer (DFE), which is actually a nonlinear method; a Volterra DFE; and a bilinear DFE. The model equation featured intersymbol interference across 10 consecutive symbols, a second-order and a third-order nonlinear distortion, and additive white Gaussian noise. All methods investigated in that study had 47 adjustable parameters and used sequences of 5000 symbols for training. To make the ESN equalizer comparable with the equalizers studied in (17), we took ESNs with a reservoir of 46 neurons (which is small for the ESN approach), which yielded 47 adjustable parameters. (The 47th comes from a direct connection from the input to the output neuron.)

We carried out numerous learning trials (7) to obtain ESN equalizers, using an online learning method (a version of the recursive least squares algorithm known from linear adaptive filters) to train the output weights on 5000-step training sequences. We chose an online adaptation scheme here because the methods in (17) were online adaptive, too, and because wireless communication channels mostly are time-varying, such that an equalizer must adapt to changing system characteristics. The entire learning-testing procedure was repeated for signal-to-noise ratios ranging from 12 to 32 dB. Figure 3 compares the average symbol error rates obtained with the results reported in (17), showing an improvement of two orders of magnitude for high signal-to-noise ratios.

For tasks with multichannel input and/or output, the ESN approach can be accommodated simply by adding more input or output neurons (16, 18).
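A generic recursive least squares (RLS) update for the output weights, of the kind referred to above, can be sketched as follows. The forgetting factor, the initialization of the inverse-correlation matrix, and the synthetic noiseless data are illustrative assumptions, not the paper's settings; the point is only the per-step update that lets the readout track a time-varying channel.

```python
import numpy as np

rng = np.random.default_rng(4)

N = 40        # readout inputs (reservoir neurons feeding the output)
lam = 0.999   # forgetting factor (assumed); lets the filter track drift
w = np.zeros(N)            # output weights, adapted online
P = np.eye(N) * 100.0      # inverse-correlation estimate (assumed init)

# Illustrative target readout that the updates should recover.
w_true = rng.uniform(-1.0, 1.0, N)

for n in range(2000):
    x = rng.normal(0.0, 1.0, N)      # stand-in reservoir state x(n)
    d_n = w_true @ x                 # teacher value d(n)
    e = d_n - w @ x                  # a priori error d(n) - y(n)
    k = P @ x / (lam + x @ P @ x)    # RLS gain vector
    w = w + k * e                    # weight update
    P = (P - np.outer(k, x @ P)) / lam

err = float(np.linalg.norm(w - w_true))
```

Because only the linear readout is adapted, the same update applies unchanged whether the features x(n) come from a linear filter bank or from an ESN reservoir.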
ESNs can be applied to all basic tasks of signal processing and control, including time series prediction, inverse modeling, pattern generation, event detection and classification, modeling distributions of stochastic processes, filtering, and nonlinear control (16, 18-20). Because a single learning run takes only a few seconds (or minutes, for very large data sets and networks), engineers can test out variants at a high turnover rate, a crucial factor for practical usability.

ESNs have been developed from a mathematical and engineering perspective, but exhibit typical features of biological RNNs: a large number of neurons, recurrent pathways, sparse random connectivity, and local modification of synaptic weights. The idea of using randomly connected RNNs to represent and memorize dynamic input in network states has frequently been explored in specific contexts, for instance, in artificial intelligence models of associative memory (21), models of prefrontal cortex function in sensory-motor sequencing tasks (22), models of birdsong (23), models of the cerebellum (24), and general computational models of neural oscillators (25). Many different learning mechanisms were considered, mostly within the RNN itself. The contribution of the ESN is to elucidate the mathematical properties of large RNNs such that they can be used with a linear, trainable readout mechanism for general black-box modeling.

An approach essentially equivalent to ESNs, liquid state networks (26, 27), has been developed independently to model computations in cortical microcircuits. Recent findings in neurophysiology suggest that the basic ESN/liquid state network principle seems not uncommon in biological networks (28-30) and could eventually be exploited to control prosthetic devices by signals collected from a collective of neurons (31).

Fig. 2. (A) Prediction output of the trained ESN (dotted) overlaid with the correct continuation (solid). (B) Learning the MG attractor. Three sample activation traces of internal neurons are shown. They echo the teacher signal d(n). After training, the desired output is recreated from the echo signals through output connections (dotted arrows) whose weights w_i are the result of the training procedure.

Fig. 3. Results of using an ESN for nonlinear channel equalization. Plot shows signal error rate (SER) versus signal-to-noise ratio (SNR). (a) Linear DFE. (b) Volterra DFE. (c) Bilinear DFE. [(a) to (c) taken from (17).] (d) Blue line represents average ESN performance with randomly generated reservoirs. Error bars, variation across networks. (e) Green line indicates performance of best network chosen from the networks averaged in (d). Error bars, variation across learning trials.

References and Notes
1. K.-I. Funahashi, Y. Nakamura, Neural Netw. 6, 801 (1993).
2. D. Zipser, R. J. Williams, Neural Comput. 1, 270 (1989).
3. P. J. Werbos, Proc. IEEE 78, 1550 (1990).
4. L. A. Feldkamp, D. V. Prokhorov, C. F. Eagen, F. Yuan, in Nonlinear Modeling: Advanced Black-Box Techniques, J. A. K. Suykens, J. Vandewalle, Eds. (Kluwer, Dordrecht, Netherlands, 1998), pp. 29-54.
5. K. Doya, in The Handbook of Brain Theory and Neural Networks, M. A. Arbib, Ed. (MIT Press, Cambridge, MA, 1995), pp. 796-800.
6. H. Jaeger, "Tutorial on training recurrent neural networks" (GMD-Report 159, German National Research Institute for Computer Science, 2002); ftp://borneo.gmd.de/pub/indy/publications_herbert/CompleteTutorialTechrep.pdf.
7. Materials and methods are available as supporting material on Science Online.
8. M. C. Mackey, L. Glass, Science 197, 287 (1977).
9. J. Vesanto, in Proc. WSOM '97 (1997); www.cis.hut.fi/projects/monitor/publications/papers/wsom97.ps.
10. L. Chudy, I. Farkas, Neural Network World 8, 481 (1998).
11. H. Bersini, M. Birattari, G. Bontempi, in Proc. IEEE World Congr. on Computational Intelligence (IJCNN '98) (1997), pp. 2102-2106; ftp://iridia.ulb.ac.be/pub/lazy/papers/IridiaTr1997-13_2.ps.gz.
12. T. M. Martinetz, S. G. Berkovich, K. J. Schulten, IEEE Trans. Neural Netw. 4, 558 (1993).
13. X. Yao, Y. Liu, IEEE Trans. Neural Netw. 8, 694 (1997).
14. F. Gers, D. Eck, J. F. Schmidhuber, "Applying LSTM to time series predictable through time-window approaches" (IDSIA-22-00, 2000); www.idsia.ch/felix/Publications.html.
15. J. McNames, J. A. K. Suykens, J. Vandewalle, Int. J. Bifurcat. Chaos 9, 1485 (1999).
16. H. Jaeger, "Short term memory in echo state networks" (GMD-Report 152, German National Research Institute for Computer Science, 2002); ftp://borneo.gmd.de/pub/indy/publications_herbert/STMEchoStatesTechRep.pdf.
17. V. J. Mathews, J. Lee, in Advanced Signal Processing: Algorithms, Architectures, and Implementations V (Proc. SPIE Vol. 2296) (SPIE, San Diego, CA, 1994), pp. 317-327.
18. J. Hertzberg, H. Jaeger, F. Schönherr, in Proc. 15th Europ. Conf. on Art. Int. (ECAI 02), F. van Harmelen, Ed. (IOS Press, Amsterdam, 2002), pp. 708-712; www.ais.fhg.de/schoenhe/papers/ECAI02.pdf.
19. H. Jaeger, "The echo state approach to analysing and training recurrent neural networks" (GMD-Report 148, German National Research Institute for Computer Science, 2001); ftp://borneo.gmd.de/pub/indy/publications_herbert/EchoStatesTechRep.pdf.
20. H. Jaeger, in Advances in Neural Information Processing Systems 15, S. Becker, S. Thrun, K. Obermayer, Eds. (MIT Press, Cambridge, MA, 2003), pp. 593-600.
21. G. E. Hinton, in Parallel Models of Associative Memory, G. E. Hinton, J. A. Anderson, Eds. (Erlbaum, Hillsdale, NJ, 1981), pp. 161-187.
22. D. G. Beiser, J. C. Houk, J. Neurophysiol. 79, 3168 (1998).
23. S. Dehaene, J.-P. Changeux, J.-P. Nadal, Proc. Natl. Acad. Sci. U.S.A. 84, 2727 (1987).
24. M. Kawato, in The Handbook of Brain Theory and Neural Networks, M. Arbib, Ed. (MIT Press, Cambridge, MA, 1995), pp. 172-178.
25. K. Doya, S. Yoshizawa, Neural Netw. 2, 375 (1989).
26. W. Maass, T. Natschläger, H. Markram, Neural Comput. 14, 2531 (2002).
27. W. Maass, T. Natschläger, H. Markram, in Computational Neuroscience: A Comprehensive Approach, J. Feng, Ed. (Chapman & Hall/CRC, 2003), pp. 575-605.
28. G. B. Stanley, F. F. Li, Y. Dan, J. Neurosci. 19, 8036 (1999).
29. G. B. Stanley, Neurocomputing 38-40, 1703 (2001).
30. W. M. Kistler, Ch. I. de Zeeuw, Neural Comput. 14, 2597 (2002).
31. S. Mussa-Ivaldi, Nature 408, 361 (2000).
32. The first author thanks T. Christaller for unfaltering support and W. Maass for friendly cooperation. International patents are claimed by Fraunhofer AIS (PCT/EP01/11490).

Supporting Online Material
www.sciencemag.org/cgi/content/full/304/5667/78/DC1
Materials and Methods
SOM Text
Figs. S1 to S4
References

8 September 2003; accepted 26 February 2004

Ultrafast Electron Crystallography of Interfacial Water
With these atomic- scale spatial, temporal, and energy resolutions, Chong-Yu Ruan, Vladimir A. Lobastov, Franco Vigliotti, the evolution of nonequilibrium structures was Songye Chen, Ahmed H. Zewail* monitored, their ordered or disordered nature was established, and the time scale for the We report direct determination of the structures and dynamics of interfacial water breakage of long-range bonding and formation on a hydrophilic surface with atomic-scale resolution using ultrafast electron of new structures was determined. We identi- crystallography. On the nanometer scale, we observed the coexistence of ordered fied the structured and ordered interfacial water surface water and crystallite-like ice structures, evident in the superposition of from the Bragg diffraction and the layered crys- Bragg spots and Debye-Scherrer rings. The structures were determined to be tallite structure from the Debye-Scherrer rings. dominantly cubic, but each undergoes different dynamics after the ultrafast sub- The temporal evolution of interfacial water and strate temperature jump. From changes in local bond distances (OHOand OO) layered ice after the temperature jump was with time, we elucidated the structural changes in the far-from-equilibrium regime studied with submonolayer sensitivity. We at short times and near-equilibration at long times. compared these results with those obtained on hydrophobic surfaces, such as hydrogen- The nature of interfacial molecular assemblies Here, we report direct determination of the terminated silicon or silver substrate. of nanometer scale is of fundamental impor- structures of interfacial water with atomic-scale Spectroscopic techniques, such as internal tance to chemical and biological phenomena resolution, using diffraction and the dynamics reflection (11) and nonlinear [second-harmonic (1–4). 
Spectroscopic techniques, such as internal reflection (11) and nonlinear [second-harmonic generation (12) and sum-frequency generation

Fig. 1. Structured water at the hydrophilic interface. The chlorine termination on a Si(111) substrate forms a hydrophilic layer that orients the water bilayer. The closest packing distance (4.43 Å) between oxygen atoms in the bottom layer of water is similar to the distance (4.50 Å) between the on-top and interstitial sites of the chlorine layer, resulting in specific bilayer orientations (30°) with respect to the silicon substrate. This ordered stacking persists for three to four bilayers (1 nm) before disorientation takes place and results in crystallite islands, forming the layered structure. The size of atoms is not to scale for the van der Waals radii.

Laboratory for Molecular Sciences, Arthur Amos Noyes Laboratory of Chemical Physics, California Institute of Technology, Pasadena, CA 91125, USA.

*To whom correspondence should be addressed. E-mail: zewail@caltech.edu