REPORTS

Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication

Herbert Jaeger* and Harald Haas

International University Bremen, Bremen D-28759, Germany.

*To whom correspondence should be addressed. E-mail: h.jaeger@iu-bremen.de

We present a method for learning nonlinear systems, echo state networks (ESNs). ESNs employ artificial recurrent neural networks in a way that has recently been proposed independently as a learning mechanism in biological brains. The learning method is computationally efficient and easy to use. On a benchmark task of predicting a chaotic time series, accuracy is improved by a factor of 2400 over previous techniques. The potential for engineering applications is illustrated by equalizing a communication channel, where the signal error rate is improved by two orders of magnitude.

Nonlinear dynamical systems abound in the sciences and in engineering. If one wishes to simulate, predict, filter, classify, or control such a system, one needs an executable system model.
However, it is often infeasible to obtain analytical models. In such cases, one has to resort to black-box models, which ignore the internal physical mechanisms and instead reproduce only the outwardly observable input-output behavior of the target system.

If the target system is linear, efficient methods for black-box modeling are available. Most technical systems, however, become nonlinear if operated at higher operational points (that is, closer to saturation). Although this might lead to cheaper and more energy-efficient designs, it is not done because the resulting nonlinearities cannot be harnessed. Many biomechanical systems use their full dynamic range (up to saturation) and thereby become lightweight, energy efficient, and thoroughly nonlinear.

Here, we present an approach to learning black-box models of nonlinear systems, echo state networks (ESNs). An ESN is an artificial recurrent neural network (RNN). RNNs are characterized by feedback ("recurrent") loops in their synaptic connection pathways. They can maintain an ongoing activation even in the absence of input and thus exhibit dynamic memory. Biological neural networks are typically recurrent. Like biological neural networks, an artificial RNN can learn to mimic a target system, in principle with arbitrary accuracy (1). Several learning algorithms are known (2-4) that incrementally adapt the synaptic weights of an RNN in order to tune it toward the target system. These algorithms have not been widely employed in technical applications because of slow convergence and suboptimal solutions (5, 6). The ESN approach differs from these methods in that a large RNN is used (on the order of 50 to 1000 neurons; previous techniques typically use 5 to 30 neurons) and in that only the synaptic connections from the RNN to the output readout neurons are modified by learning; previous techniques tune all synaptic connections (Fig. 1). Because there are no cyclic dependencies between the trained readout connections, training an ESN becomes a simple linear regression task.

We illustrate the ESN approach on a task of chaotic time series prediction (Fig. 2) (7). The Mackey-Glass system (MGS) (8) is a standard benchmark system for time series prediction studies. It generates a subtly irregular time series (Fig. 2A). The prediction task has two steps: (i) using an initial teacher sequence generated by the original MGS to learn a black-box model M of the generating system, and (ii) using M to predict the value of the sequence some steps ahead.

First, we created a random RNN with 1000 neurons (called the "reservoir") and one output neuron. The output neuron was equipped with random connections that project back into the reservoir (Fig. 2B). A 3000-step teacher sequence d(1), ..., d(3000) was generated from the MGS equation and fed into the output neuron. This excited the internal neurons through the output feedback connections. After an initial transient period, they started to exhibit systematic individual variations of the teacher sequence (Fig. 2B).

The fact that the internal neurons display systematic variants of the exciting external signal is constitutional for ESNs: The internal neurons must work as "echo functions" for the driving signal. Not every randomly generated RNN has this property, but it can effectively be built into a reservoir (supporting online text).

It is important that the echo signals be richly varied. This was ensured by a sparse interconnectivity of 1% within the reservoir. This condition lets the reservoir decompose into many loosely coupled subsystems, establishing a richly structured reservoir of excitable dynamics.

After time n = 3000, output connection weights w_i (i = 1, ..., 1000) were computed (dashed arrows in Fig. 2B) from the last 2000 steps n = 1001, ..., 3000 of the training run such that the training error

\mathrm{MSE}_{\mathrm{train}} = \frac{1}{2000} \sum_{n=1001}^{3000} \Big( d(n) - \sum_{i=1}^{1000} w_i x_i(n) \Big)^2

was minimized [x_i(n), activation of the ith internal neuron at time n]. This is a simple linear regression.
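In outline, this training procedure (sparse random reservoir, teacher forcing through the output feedback connections, then linear regression for the output weights) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's exact setup: the reservoir size, the uniform weight ranges, the spectral-radius rescaling, the tanh internal neurons, and the stand-in sinusoidal teacher signal are all choices made here for brevity; the paper's precise weight distributions are given in its supporting material.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 200          # reservoir size (the paper uses 1000; smaller here for speed)
T = 3000         # length of the teacher sequence, as in the paper
washout = 1000   # initial transient steps discarded before the regression

# Sparse random reservoir, about 1% connectivity as in the paper; the
# uniform weight range and the spectral-radius rescaling are assumptions.
W = np.where(rng.random((N, N)) < 0.01, rng.uniform(-1.0, 1.0, (N, N)), 0.0)
W *= 0.8 / np.max(np.abs(np.linalg.eigvals(W)))

# Random output-feedback connections projecting the output back into
# the reservoir (Fig. 2B).
W_fb = rng.uniform(-1.0, 1.0, N)

# Stand-in teacher signal d(n); a real Mackey-Glass series would be used.
d = 0.5 * np.sin(np.arange(T) / 10.0)

# Teacher forcing: feed d(n) into the output neuron so that it excites
# the internal neurons through the feedback connections.
x = np.zeros(N)
states = np.zeros((T, N))
for n in range(T):
    x = np.tanh(W @ x + W_fb * d[n])   # assumed tanh internal neurons
    states[n] = x

# Output weights w_i by linear regression on the post-washout steps,
# minimizing the training MSE defined above.
w, *_ = np.linalg.lstsq(states[washout:], d[washout:], rcond=None)
mse_train = np.mean((d[washout:] - states[washout:] @ w) ** 2)
```

The single least-squares solve replaces the iterative gradient descent of earlier RNN training schemes, which is the source of the method's speed.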
With the new w_i in place, the ESN was disconnected from the teacher after step 3000 and left running freely. A bidirectional dynamical interplay of the network-generated output signal with the internal signals x_i(n) unfolded. The output signal y(n) was created from the internal neuron activation signals x_i(n) through the trained connections w_i, by

y(n) = \sum_{i=1}^{1000} w_i x_i(n).

Conversely, the internal signals were echoed from that output signal through the fixed output feedback connections (supporting online text).
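The free-running phase can be sketched as follows. The trained quantities here (reservoir W, feedback weights W_fb, readout w, and the last teacher-forced state x) are random placeholders, purely to show the loop structure in which the network's own output replaces the teacher.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50

# Random stand-ins for the trained quantities: reservoir weights W,
# output-feedback weights W_fb, trained readout w, and the last
# teacher-forced internal state x. All are placeholders for illustration.
W = np.where(rng.random((N, N)) < 0.1, rng.uniform(-0.3, 0.3, (N, N)), 0.0)
W_fb = rng.uniform(-1.0, 1.0, N)
w = rng.uniform(-0.1, 0.1, N)
x = rng.uniform(-0.5, 0.5, N)

# Free run: the network's own output y(n) replaces the teacher d(n),
# closing the bidirectional loop between output and internal signals.
outputs = []
for n in range(84):                  # 84-step continuation, as in the paper
    y = w @ x                        # y(n) = sum_i w_i x_i(n)
    x = np.tanh(W @ x + W_fb * y)    # output echoed back via fixed feedback
    outputs.append(y)
```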
For testing, an 84-step continuation d(3001), ..., d(3084) of the original signal was computed for reference. The network output y(3084) was compared with the correct continuation d(3084). Averaged over 100 independent trials, a normalized root mean square error

\mathrm{NRMSE} = \left( \sum_{j=1}^{100} \frac{\big( d_j(3084) - y_j(3084) \big)^2}{100\,\sigma^2} \right)^{1/2} \approx 10^{-4.2}

was obtained (d_j and y_j, teacher and network output in trial j; \sigma^2, variance of the MGS signal), improving the best previous techniques (9-15), which used training sequences of length 500 to 10,000, by a factor of 700. If the prediction run was continued, deviations typically became visible after about 1300 steps (Fig. 2A). With a refined variant of the learning method (7), the improvement factor rises to 2400. Models of similar accuracy were also obtained for other chaotic systems (supporting online text).

The main reason for the jump in modeling accuracy is that ESNs capitalize on a massive short-term memory. We showed analytically (16) that under certain conditions an ESN of size N may be able to "remember" a number of previous inputs that is of the same order of magnitude as N. This information is more massive than the information used in other techniques (supporting online text).

Fig. 1. (A) Schema of previous approaches to RNN learning. (B) Schema of ESN approach. Solid bold arrows, fixed synaptic connections; dotted arrows, adjustable connections. Both approaches aim at minimizing the error d(n) - y(n), where y(n) is the network output and d(n) is the teacher time series observed from the target system.

78 2 APRIL 2004 VOL 304 SCIENCE www.sciencemag.org
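Given the per-trial values d_j(3084) and y_j(3084), the NRMSE above is a one-line computation. The arrays below are hypothetical stand-ins invented for illustration (near-perfect outputs and an assumed signal variance), not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-ins for 100 independent trials: correct continuation
# values d_j(3084), near-perfect network outputs y_j(3084), and the MGS
# signal variance sigma^2 (all invented for illustration).
d84 = rng.normal(0.7, 0.06, 100)
y84 = d84 + rng.normal(0.0, 1e-4, 100)
sigma2 = 0.06 ** 2

# NRMSE = sqrt( sum_j (d_j(3084) - y_j(3084))^2 / (100 * sigma^2) )
nrmse = np.sqrt(np.sum((d84 - y84) ** 2) / (100 * sigma2))
```

Normalizing by the signal variance makes the error comparable across systems with different amplitude scales.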
We now illustrate the approach in a task of practical relevance, namely, the equalization of a wireless communication channel (7). The essentials of equalization are as follows: A sender wants to communicate a symbol sequence s(n). This sequence is first transformed into an analog envelope signal d(n), then modulated on a high-frequency carrier signal and transmitted, then received and demodulated into an analog signal u(n), which is a corrupted version of d(n). Major sources of corruption are noise (thermal or due to interfering signals), multipath propagation, which leads to a superposition of adjacent symbols (intersymbol interference), and nonlinear distortion induced by operating the sender's power amplifier in the high-gain region. To avoid the latter, the actual power amplification is run well below the maximum amplification possible, thereby incurring a substantial loss in energy efficiency, which is clearly undesirable in cell-phone and satellite communications. The corrupted signal u(n) is then passed through an equalizing filter whose output y(n) should restore u(n) as closely as possible to d(n). Finally, the equalized signal y(n) is converted back into a symbol sequence. The quality measure for the entire process is the fraction of incorrect symbols finally obtained (symbol error rate).
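As a concrete and deliberately simplified illustration of this pipeline, the sketch below builds a hypothetical channel with intersymbol interference, mild polynomial distortion, and additive white Gaussian noise, then computes the symbol error rate of a trivial quantize-only receiver with no equalizer at all. Every coefficient is invented for illustration; the actual channel model used in the paper is specified in the cited study (17) and the supporting material.

```python
import numpy as np

rng = np.random.default_rng(3)

# Symbol sequence s(n) from a 4-level alphabet (a stand-in; the actual
# symbol set and channel are specified in the cited study).
symbols = np.array([-3.0, -1.0, 1.0, 3.0])
s = rng.choice(symbols, 5000)
d = s.copy()                      # idealized analog envelope d(n)

# Hypothetical channel: intersymbol interference over three symbols,
# mild second- and third-order distortion, and additive white Gaussian
# noise. All coefficients are invented for illustration.
isi = np.convolve(d, [1.0, 0.3, -0.1], mode="same")
u = isi + 0.05 * isi**2 - 0.01 * isi**3 + rng.normal(0.0, 0.05, d.size)

# Baseline receiver with no equalizer: quantize u(n) to the nearest
# symbol and count the fraction of wrong decisions (the SER).
decided = symbols[np.argmin(np.abs(u[:, None] - symbols[None, :]), axis=1)]
ser = float(np.mean(decided != s))
```

An equalizer's job is to undo the ISI and distortion before this quantization step, driving the SER far below this unequalized baseline.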
To compare the performance of an ESN equalizer with standard techniques, we took a channel model for a nonlinear wireless transmission system from a study (17) that compared three customary nonlinear equalization methods: a linear decision feedback equalizer (DFE), which is actually a nonlinear method; a Volterra DFE; and a bilinear DFE. The model equation featured intersymbol interference across 10 consecutive symbols, a second-order and a third-order nonlinear distortion, and additive white Gaussian noise. All methods investigated in that study had 47 adjustable parameters and used sequences of 5000 symbols for training. To make the ESN equalizer comparable with the equalizers studied in (17), we took ESNs with a reservoir of 46 neurons (which is small for the ESN approach), which yielded 47 adjustable parameters. (The 47th comes from a direct connection from the input to the output neuron.)

We carried out numerous learning trials (7) to obtain ESN equalizers, using an online learning method (a version of the recursive least squares algorithm known from linear adaptive filters) to train the output weights on 5000-step training sequences. We chose an online adaptation scheme here because the methods in (17) were online adaptive, too, and because wireless communication channels mostly are time-varying, such that an equalizer must adapt to changing system characteristics. The entire learning-testing procedure was repeated for signal-to-noise ratios ranging from 12 to 32 dB. Figure 3 compares the average symbol error rates obtained with the results reported in (17), showing an improvement of two orders of magnitude for high signal-to-noise ratios.

For tasks with multichannel input and/or output, the ESN approach can be accommodated simply by adding more input or output neurons (16, 18).
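A generic recursive least squares (RLS) update for the output weights, of the kind referred to above, can be sketched as follows. The forgetting factor, the initialization of the inverse-correlation matrix, and the synthetic noiseless data are illustrative assumptions, not the paper's settings; the point is only the per-step update that lets the readout track a time-varying channel.

```python
import numpy as np

rng = np.random.default_rng(4)

N = 40        # readout inputs (reservoir neurons feeding the output)
lam = 0.999   # forgetting factor (assumed); lets the filter track drift
w = np.zeros(N)            # output weights, adapted online
P = np.eye(N) * 100.0      # inverse-correlation estimate (assumed init)

# Illustrative target readout that the updates should recover.
w_true = rng.uniform(-1.0, 1.0, N)

for n in range(2000):
    x = rng.normal(0.0, 1.0, N)      # stand-in reservoir state x(n)
    d_n = w_true @ x                 # teacher value d(n)
    e = d_n - w @ x                  # a priori error d(n) - y(n)
    k = P @ x / (lam + x @ P @ x)    # RLS gain vector
    w = w + k * e                    # weight update
    P = (P - np.outer(k, x @ P)) / lam

err = float(np.linalg.norm(w - w_true))
```

Because only the linear readout is adapted, the same update applies unchanged whether the features x(n) come from a linear filter bank or from an ESN reservoir.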
ESNs can be applied to all basic tasks of signal processing and control, including time series prediction, inverse modeling, pattern generation, event detection and classification, modeling distributions of stochastic processes, filtering, and nonlinear control (16, 18-20). Because a single learning run takes only a few seconds (or minutes, for very large data sets and networks), engineers can test out variants at a high turnover rate, a crucial factor for practical usability.

ESNs have been developed from a mathematical and engineering perspective, but exhibit typical features of biological RNNs: a large number of neurons, recurrent pathways, sparse random connectivity, and local modification of synaptic weights. The idea of using randomly connected RNNs to represent and memorize dynamic input in network states has frequently been explored in specific contexts, for instance, in artificial intelligence models of associative memory (21), models of prefrontal cortex function in sensory-motor sequencing tasks (22), models of birdsong (23), models of the cerebellum (24), and general computational models of neural oscillators (25). Many different learning mechanisms were considered, mostly within the RNN itself. The contribution of the ESN is to elucidate the mathematical properties of large RNNs such that they can be used with a linear, trainable readout mechanism for general black-box modeling.

An approach essentially equivalent to ESNs, liquid state networks (26, 27), has been developed independently to model computations in cortical microcircuits. Recent findings in neurophysiology suggest that the basic ESN/liquid state network principle seems not uncommon in biological networks (28-30) and could eventually be exploited to control prosthetic devices by signals collected from a collective of neurons (31).

Fig. 2. (A) Prediction output of the trained ESN (dotted) overlaid with the correct continuation (solid). (B) Learning the MG attractor. Three sample activation traces of internal neurons are shown. They echo the teacher signal d(n). After training, the desired output is recreated from the echo signals through output connections (dotted arrows) whose weights w_i are the result of the training procedure.

Fig. 3. Results of using an ESN for nonlinear channel equalization. Plot shows signal error rate (SER) versus signal-to-noise ratio (SNR). (a) Linear DFE. (b) Volterra DFE. (c) Bilinear DFE. [(a) to (c) taken from (17).] (d) Blue line represents average ESN performance with randomly generated reservoirs. Error bars, variation across networks. (e) Green line indicates performance of best network chosen from the networks averaged in (d). Error bars, variation across learning trials.

References and Notes
1. K.-I. Funahashi, Y. Nakamura, Neural Netw. 6, 801 (1993).
2. D. Zipser, R. J. Williams, Neural Comput. 1, 270 (1989).
3. P. J. Werbos, Proc. IEEE 78, 1550 (1990).
4. L. A. Feldkamp, D. V. Prokhorov, C. F. Eagen, F. Yuan, in Nonlinear Modeling: Advanced Black-Box Techniques, J. A. K. Suykens, J. Vandewalle, Eds. (Kluwer, Dordrecht, Netherlands, 1998), pp. 29-54.
5. K. Doya, in The Handbook of Brain Theory and Neural Networks, M. A. Arbib, Ed. (MIT Press, Cambridge, MA, 1995), pp. 796-800.
6. H. Jaeger, "Tutorial on training recurrent neural networks" (GMD-Report 159, German National Research Institute for Computer Science, 2002); ftp://borneo.gmd.de/pub/indy/publications_herbert/CompleteTutorialTechrep.pdf.
7. Materials and methods are available as supporting material on Science Online.
8. M. C. Mackey, L. Glass, Science 197, 287 (1977).
9. J. Vesanto, in Proc. WSOM '97 (1997); www.cis.hut.fi/projects/monitor/publications/papers/wsom97.ps.
10. L. Chudy, I. Farkas, Neural Network World 8, 481 (1998).
11. H. Bersini, M. Birattari, G. Bontempi, in Proc. IEEE World Congr. on Computational Intelligence (IJCNN '98) (1997), pp. 2102-2106; ftp://iridia.ulb.ac.be/pub/lazy/papers/IridiaTr1997-13_2.ps.gz.
12. T. M. Martinetz, S. G. Berkovich, K. J. Schulten, IEEE Trans. Neural Netw. 4, 558 (1993).
13. X. Yao, Y. Liu, IEEE Trans. Neural Netw. 8, 694 (1997).
14. F. Gers, D. Eck, J. F. Schmidhuber, "Applying LSTM to time series predictable through time-window approaches" (IDSIA-22-00, 2000); www.idsia.ch/felix/Publications.html.
15. J. McNames, J. A. K. Suykens, J. Vandewalle, Int. J. Bifurcat. Chaos 9, 1485 (1999).
16. H. Jaeger, "Short term memory in echo state networks" (GMD-Report 152, German National Research Institute for Computer Science, 2002); ftp://borneo.gmd.de/pub/indy/publications_herbert/STMEchoStatesTechRep.pdf.
17. V. J. Mathews, J. Lee, in Advanced Signal Processing: Algorithms, Architectures, and Implementations V (Proc. SPIE Vol. 2296) (SPIE, San Diego, CA, 1994), pp. 317-327.
18. J. Hertzberg, H. Jaeger, F. Schönherr, in Proc. 15th Europ. Conf. on Art. Int. (ECAI 02), F. van Harmelen, Ed. (IOS Press, Amsterdam, 2002), pp. 708-712; www.ais.fhg.de/schoenhe/papers/ECAI02.pdf.
19. H. Jaeger, "The echo state approach to analysing and training recurrent neural networks" (GMD-Report 148, German National Research Institute for Computer Science, 2001); ftp://borneo.gmd.de/pub/indy/publications_herbert/EchoStatesTechRep.pdf.
20. H. Jaeger, in Advances in Neural Information Processing Systems 15, S. Becker, S. Thrun, K. Obermayer, Eds. (MIT Press, Cambridge, MA, 2003), pp. 593-600.
21. G. E. Hinton, in Parallel Models of Associative Memory, G. E. Hinton, J. A. Anderson, Eds. (Erlbaum, Hillsdale, NJ, 1981), pp. 161-187.
22. D. G. Beiser, J. C. Houk, J. Neurophysiol. 79, 3168 (1998).
23. S. Dehaene, J.-P. Changeux, J.-P. Nadal, Proc. Natl. Acad. Sci. U.S.A. 84, 2727 (1987).
24. M. Kawato, in The Handbook of Brain Theory and Neural Networks, M. Arbib, Ed. (MIT Press, Cambridge, MA, 1995), pp. 172-178.
25. K. Doya, S. Yoshizawa, Neural Netw. 2, 375 (1989).
26. W. Maass, T. Natschläger, H. Markram, Neural Comput. 14, 2531 (2002).
27. W. Maass, T. Natschläger, H. Markram, in Computational Neuroscience: A Comprehensive Approach, J. Feng, Ed. (Chapman & Hall/CRC, 2003), pp. 575-605.
28. G. B. Stanley, F. F. Li, Y. Dan, J. Neurosci. 19, 8036 (1999).
29. G. B. Stanley, Neurocomputing 38-40, 1703 (2001).
30. W. M. Kistler, Ch. I. de Zeeuw, Neural Comput. 14, 2597 (2002).
31. S. Mussa-Ivaldi, Nature 408, 361 (2000).
32. The first author thanks T. Christaller for unfaltering support and W. Maass for friendly cooperation. International patents are claimed by Fraunhofer AIS (PCT/EP01/11490).

Supporting Online Material
www.sciencemag.org/cgi/content/full/304/5667/78/DC1
Materials and Methods
SOM Text
Figs. S1 to S4
References

8 September 2003; accepted 26 February 2004

Ultrafast Electron Crystallography of Interfacial Water
With these atomic- scale spatial, temporal, and energy resolutions, Chong-Yu Ruan, Vladimir A. Lobastov, Franco Vigliotti, the evolution of nonequilibrium structures was Songye Chen, Ahmed H. Zewail* monitored, their ordered or disordered nature was established, and the time scale for the We report direct determination of the structures and dynamics of interfacial water breakage of long-range bonding and formation on a hydrophilic surface with atomic-scale resolution using ultrafast electron of new structures was determined. We identi- crystallography. On the nanometer scale, we observed the coexistence of ordered fied the structured and ordered interfacial water surface water and crystallite-like ice structures, evident in the superposition of from the Bragg diffraction and the layered crys- Bragg spots and Debye-Scherrer rings. The structures were determined to be tallite structure from the Debye-Scherrer rings. dominantly cubic, but each undergoes different dynamics after the ultrafast sub- The temporal evolution of interfacial water and strate temperature jump. From changes in local bond distances (OHOand OO) layered ice after the temperature jump was with time, we elucidated the structural changes in the far-from-equilibrium regime studied with submonolayer sensitivity. We at short times and near-equilibration at long times. compared these results with those obtained on hydrophobic surfaces, such as hydrogen- The nature of interfacial molecular assemblies Here, we report direct determination of the terminated silicon or silver substrate. of nanometer scale is of fundamental impor- structures of interfacial water with atomic-scale Spectroscopic techniques, such as internal tance to chemical and biological phenomena resolution, using diffraction and the dynamics reflection (11) and nonlinear [second-harmonic (1–4). 
Spectroscopic techniques, such as internal reflection (11) and nonlinear [second-harmonic generation (12) and sum-frequency generation

Fig. 1. Structured water at the hydrophilic interface. The chlorine termination on a Si(111) substrate forms a hydrophilic layer that orients the water bilayer. The closest packing distance (4.43 Å) between oxygen atoms in the bottom layer of water is similar to the distance (4.50 Å) between the on-top and interstitial sites of the chlorine layer, resulting in specific bilayer orientations (30°) with respect to the silicon substrate. This ordered stacking persists for three to four bilayers (1 nm) before disorientation takes place and results in crystallite islands, forming the layered structure. The size of atoms is not to scale for the van der Waals radii.

Laboratory for Molecular Sciences, Arthur Amos Noyes Laboratory of Chemical Physics, California Institute of Technology, Pasadena, CA 91125, USA.

*To whom correspondence should be addressed. E-mail: zewail@caltech.edu