An example PNN, implemented experimentally using broadband optical SHG
a, Input data are encoded into the spectrum of a laser pulse (Methods, Supplementary Section 2). To control transformations implemented by the broadband SHG process, a portion of the pulse’s spectrum is used as trainable parameters (orange). The physical computation result is obtained from the spectrum of a blue (about 390 nm) pulse generated within a χ⁽²⁾ medium. b, To construct a deep PNN, the outputs of the SHG transformations are used as inputs to subsequent SHG transformations, with independent trainable parameters. c, d, After training the SHG-PNN (see main text, Fig. 3), it classifies test vowels with 93% accuracy. c, The confusion matrix for the PNN on the test set. d, Representative examples of final-layer output spectra, which show the SHG-PNN’s prediction.
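As a rough intuition for how such a stack composes (a hedged numerical sketch, not the authors' experiment or training method), each layer below applies a fixed quadratic, χ⁽²⁾-like mixing to the concatenation of its input spectrum and a trainable parameter spectrum, and the output spectrum feeds the next layer; all sizes and the random mixing tensors are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def shg_layer(x, theta, mix):
    """One caricatured broadband-SHG transformation: the data spectrum x
    and the trainable spectrum theta enter the chi(2) medium together,
    and each output (second-harmonic) bin sums products of pairs of
    input bins, here through a fixed random mixing tensor."""
    field = np.concatenate([x, theta])
    return np.einsum('i,j,kij->k', field, field, mix)

n_in, n_theta, n_layers = 16, 8, 3                  # illustrative sizes
dim = n_in + n_theta
mixes = [rng.normal(size=(n_in, dim, dim)) / dim for _ in range(n_layers)]
thetas = [rng.normal(size=n_theta) for _ in range(n_layers)]  # trainable

x = rng.normal(size=n_in)                 # data encoded in the pulse spectrum
for mix, theta in zip(mixes, thetas):
    x = shg_layer(x, theta, mix)          # each output spectrum feeds the next layer
prediction = int(np.argmax(np.abs(x)))    # class read off the final spectrum
```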


Source publication
Article
Deep-learning models have become pervasive tools in science and engineering. However, their energy requirements now increasingly limit their scalability¹. Deep-learning accelerators²⁻⁹ aim to perform deep learning energy-efficiently, usually targeting the inference phase and often by exploiting physical substrates beyond conventional electronics...

Citations

... In this study, we introduce a deep-learning approach that facilitates efficient and reliable analysis of twisted van der Waals magnets. Deep learning has showcased remarkable success in tackling scientific challenges across various physical systems [22-36]. Specifically, the application of deep neural network (DNN) techniques to 2D magnetic systems has proven highly effective in extracting magnetic Hamiltonian parameters from magnetic domain images [22]. ...
Article
The application of twist engineering in van der Waals magnets has opened new frontiers in the field of two-dimensional magnetism, yielding distinctive magnetic domain structures. Despite the introduction of numerous theoretical methods, limitations persist in terms of accuracy or efficiency due to the complex nature of the magnetic Hamiltonians pertinent to these systems. In this study, we introduce a deep-learning approach to tackle these challenges. Utilizing customized, fully connected networks, we develop two deep-neural-network kernels that facilitate efficient and reliable analysis of twisted van der Waals magnets. Our regression model is adept at estimating the magnetic Hamiltonian parameters of twisted bilayer CrI3 from its magnetic domain images generated through atomistic spin simulations. The ‘generative model’ excels in producing precise magnetic domain images from the provided magnetic parameters. The trained networks for these models undergo thorough validation, including statistical error analysis and assessment of robustness against noisy injections. These advancements not only extend the applicability of deep-learning methods to twisted van der Waals magnets but also streamline future investigations into these captivating yet poorly understood systems.
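To make the direction of the regression kernel concrete (domain image in, Hamiltonian parameters out), here is a minimal, hypothetical sketch using scikit-learn's MLPRegressor; the paper's customized fully connected networks, layer sizes, and spin-simulation data are all replaced by assumptions here:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Hypothetical stand-ins: flattened 64x64 magnetic domain images and
# three Hamiltonian parameters assumed to have generated them (real
# data would come from atomistic spin simulations).
images = rng.random((500, 64 * 64))
params = rng.random((500, 3))

# Fully connected regression kernel; the layer sizes are assumptions,
# not the paper's customized architecture.
model = MLPRegressor(hidden_layer_sizes=(256, 64), max_iter=300)
model.fit(images[:400], params[:400])
print(model.score(images[400:], params[400:]))  # held-out R^2
```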
... In particular, the Deep Echo State Network, with hyperbolic tangents as the nonlinearity, has shown remarkable performance improvements on prediction tasks for nonlinear autoregressive moving average models and chaotic dynamical systems when the connection weights between layers are trained by linear regression against the targets 29,30. By contrast, few attempts at multilayering have been reported for physical reservoirs or physical NNs, and those that exist are limited to methods that either leave the inter-reservoir connection weights untrained (so the network is not highly flexible) 34,35 or train them (or the layers of a physical NN) with backpropagation algorithms, which require complex calculations relying on external circuits and carry large computational costs 33,36. Notably, there are no reports of deep RC with nanodevices, which are advantageous for the integration needed to realize practical AI devices; it is thus unclear whether multilayering is effective at improving the performance of physical reservoirs. ...
... In this network, the connection weights between reservoir layers are trained with a simple linear regression algorithm, which provides higher network flexibility than schemes in which the inter-reservoir weights are left untrained 34,35, and it requires no backpropagation algorithm. Backpropagation is an effective method that greatly improves the expressive power of a network, but it is difficult to apply to PRCs based on the complex, dynamic nonlinearities (black-box functions) of physical systems, because it requires detailed information on the nonlinearities in the reservoir layer and their derivatives 33,36. In this respect, learning the weights between layers by linear regression against targets is well suited to physical implementation: it needs no detailed information on the physical system's nonlinearities (or their derivatives), and it does not require feeding errors back into the physical system to backpropagate them [28-30]. ...
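A minimal sketch of this training scheme, under the assumption of standard echo-state reservoirs: each stacked reservoir's outgoing weights are fit by ridge (linear) regression against the target, and the resulting signal drives the next reservoir; no gradients flow backward. All sizes and the toy task are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def run_reservoir(u, W_in, W_res, leak=0.5):
    """Drive a fixed random echo-state reservoir with input series u."""
    x = np.zeros(W_res.shape[0])
    states = []
    for u_t in u:
        x = (1 - leak) * x + leak * np.tanh(W_res @ x + W_in @ np.atleast_1d(u_t))
        states.append(x.copy())
    return np.array(states)

def ridge(states, target, lam=1e-4):
    """Linear-regression training of the weights after a reservoir:
    only states and targets are needed, no backpropagation."""
    return np.linalg.solve(states.T @ states + lam * np.eye(states.shape[1]),
                           states.T @ target)

T, N = 1000, 100
u = rng.uniform(-1, 1, T)          # input series
y = 0.5 * np.roll(u, 2) + u ** 2   # toy nonlinear target with memory

signal = u
for layer in range(3):             # three stacked reservoirs
    W_in = rng.uniform(-1, 1, (N, 1))
    W_res = rng.normal(size=(N, N))
    W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))  # echo-state scaling
    states = run_reservoir(signal, W_in, W_res)
    w = ridge(states, y)           # fit inter-layer weights to the target
    signal = states @ w            # series fed to the next reservoir
```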
Article
While physical reservoir computing is a promising route to low-power neuromorphic computing, its computational performance is still insufficient for practical use. One promising approach to improving it is deep reservoir computing, in which the component reservoirs are multi-layered. However, all deep-reservoir schemes reported so far have been effective only for simulated reservoirs and a limited set of physical reservoirs, and no nanodevice implementations have been reported. Here, as an ionics-based neuromorphic nanodevice implementation of deep reservoir computing, we demonstrate deep physical reservoir computing with up to four layers using an ion gating reservoir, a small, high-performance physical reservoir. While the previously reported deep-reservoir scheme did not improve the performance of the ion gating reservoir, our deep ion gating reservoir achieved a normalized mean squared error of 9.08 × 10⁻³ on a second-order nonlinear autoregressive moving average task, the best performance of any physical reservoir reported on this task. More importantly, the device outperformed full simulation reservoir computing. The dramatic performance improvement of the ion gating reservoir with our deep-reservoir computing architecture paves the way for high-performance, large-scale physical neural network devices.
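For context, the benchmark and metric quoted above are conventionally defined as follows; this is the standard formulation of the second-order NARMA task and the normalized mean squared error, and the paper's exact conventions may differ:

```latex
y_{t+1} = 0.4\,y_t + 0.4\,y_t y_{t-1} + 0.6\,u_t^3 + 0.1,
\qquad
\mathrm{NMSE} = \frac{\sum_t (\hat{y}_t - y_t)^2}{\sum_t (y_t - \bar{y})^2}
```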
... Unlike the χ⁽³⁾ nonlinearity that is ubiquitous across all material platforms, the second-order (χ⁽²⁾) nonlinearity is present only in noncentrosymmetric media. As a lower-order nonlinearity, χ⁽²⁾ effects are more efficient given proper phase matching, which also makes them interesting for nonlinear optical computation. In this Article, we demonstrate a large-scale photonic NN that combines linear scattering and χ⁽²⁾ optical nonlinearity for a wide range of ML applications. The core processing unit consists of a disordered polycrystalline lithium niobate (LN) slab assembled from nanocrystals 41, which not only is multiply scattering but also generates second-harmonic (SH) light assisted by random quasi-phase-matching 42. ...
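As a numerical caricature of this scheme (an assumption-laden sketch, not a model of the actual optics), multiple scattering can be treated as a fixed complex random matrix acting on the input field, with random quasi-phase-matched SHG adding a term quadratic in the field before the camera records intensities:

```python
import numpy as np

rng = np.random.default_rng(3)

# Scaled-down stand-ins for the demonstrated 27,648 inputs and up to
# 3,500 nonlinear outputs.
n_in, n_out = 1024, 256

# Fixed complex random matrices standing in for multiple scattering.
A = (rng.normal(size=(n_out, n_in)) + 1j * rng.normal(size=(n_out, n_in))) / np.sqrt(n_in)
B = (rng.normal(size=(n_out, n_in)) + 1j * rng.normal(size=(n_out, n_in))) / np.sqrt(n_in)

def speckle_features(x):
    linear = np.abs(A @ x) ** 2                   # linear speckle, detected as intensity
    second_harmonic = np.abs((B @ x) ** 2) ** 2   # SH field ~ (field)^2, then intensity
    return linear, second_harmonic

x = rng.random(n_in)                 # flattened input image
lin_feats, nl_feats = speckle_features(x)
# A linear classifier trained on nl_feats realizes the nonlinear random
# projection benchmarked against lin_feats (linear random projection).
```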
Article
Neural networks find widespread use in scientific and technological applications, yet their implementations in conventional computers have encountered bottlenecks due to ever-expanding computational needs. Photonic computing is a promising neuromorphic platform with potential advantages of massive parallelism, ultralow latency and reduced energy consumption but mostly for computing linear operations. Here we demonstrate a large-scale, high-performance nonlinear photonic neural system based on a disordered polycrystalline slab composed of lithium niobate nanocrystals. Mediated by random quasi-phase-matching and multiple scattering, linear and nonlinear optical speckle features are generated as the interplay between the simultaneous linear random scattering and the second-harmonic generation, defining a complex neural network in which the second-order nonlinearity acts as internal nonlinear activation functions. Benchmarked against linear random projection, such nonlinear mapping embedded with rich physical computational operations shows improved performance across a large collection of machine learning tasks in image classification, regression and graph classification. Demonstrating up to 27,648 input and 3,500 nonlinear output nodes, the combination of optical nonlinearity and random scattering serves as a scalable computing engine for diverse applications.
... Existing backpropagation-based training strategies for neuromorphic platforms include in-silico training, which requires a faithful digital model of the system, and physics-aware backpropagation [4], which combines physical inference with a simulated backward pass and thereby relaxes these constraints. However, it is a central question whether not only inference but also training can exploit the physical dynamics [5], making full use of the energy efficiency of neuromorphic systems. ...
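A minimal sketch of that physics-aware pattern under toy assumptions (the "hardware" here is a noisy black-box function and the digital model an idealized copy; all names and dynamics are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)

def physical_forward(x, theta):
    """Stand-in for hardware inference: the true physics, including
    effects (here, noise) that the digital model does not capture."""
    return np.tanh(theta * x) + 0.01 * rng.normal(size=x.shape)

def model_backward(x, theta, grad_y):
    """Simulated backward pass: gradients of the idealized digital
    model y = tanh(theta * x), not of the hardware itself."""
    y_model = np.tanh(theta * x)
    return np.sum(grad_y * (1.0 - y_model ** 2) * x)  # dL/dtheta

x = rng.normal(size=8)
target = np.tanh(0.9 * x)            # reachable toy target
theta, lr = 0.1, 0.1
for _ in range(200):
    y = physical_forward(x, theta)   # inference runs on the "hardware"
    theta -= lr * model_backward(x, theta, 2.0 * (y - target))
```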
Preprint
The widespread adoption of machine learning and artificial intelligence in all branches of science and technology has created a need for energy-efficient, alternative hardware platforms. While such neuromorphic approaches have been proposed and realised for a wide range of platforms, physically extracting the gradients required for training remains challenging as generic approaches only exist in certain cases. Equilibrium propagation (EP) is such a procedure that has been introduced and applied to classical energy-based models which relax to an equilibrium. Here, we show a direct connection between EP and Onsager reciprocity and exploit this to derive a quantum version of EP. This can be used to optimize loss functions that depend on the expectation values of observables of an arbitrary quantum system. Specifically, we illustrate this new concept with supervised and unsupervised learning examples in which the input or the solvable task is of quantum mechanical nature, e.g., the recognition of quantum many-body ground states, quantum phase exploration, sensing and phase boundary exploration. We propose that in the future quantum EP may be used to solve tasks such as quantum phase discovery with a quantum simulator even for Hamiltonians which are numerically hard to simulate or even partially unknown. Our scheme is relevant for a variety of quantum simulation platforms such as ion chains, superconducting qubit arrays, neutral atom Rydberg tweezer arrays and strongly interacting atoms in optical lattices.
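For reference, classical EP, as formulated for energy-based models that relax to an equilibrium, estimates gradients by comparing a free equilibrium s⋆⁰ with one weakly nudged toward the target, s⋆ᵝ, where E is the energy, θ the parameters, and β the nudge strength:

```latex
\frac{\partial \mathcal{L}}{\partial \theta}
\;\approx\; \frac{1}{\beta}
\left( \left.\frac{\partial E}{\partial \theta}\right|_{s_\star^{\beta}}
     - \left.\frac{\partial E}{\partial \theta}\right|_{s_\star^{0}} \right)
```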
... Isomorphic PNNs perform mathematical transformations by designing hardware for strict, operation-by-operation mathematical isomorphism, such as memristor crossbars for performing matrix-vector multiplications (see Fig. 1a). In contrast, broken-isomorphism PNNs break mathematical isomorphism to directly train the hardware's physical transformations 30 . One complication with broken-isomorphism PNNs is that it is often unknown what features are required for universal computation or universal function approximation. ...
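The isomorphism in the crossbar case is direct: Ohm's law and Kirchhoff's current law make the array compute a matrix-vector product in analog. A minimal sketch (dimensions and conductance values are illustrative):

```python
import numpy as np

# With row voltages V applied across programmed conductances G, each
# column current is I_j = sum_i G_ij V_i, i.e. a matrix-vector product.
G = np.random.default_rng(5).uniform(0.0, 1e-3, size=(4, 3))  # conductances, siemens
V = np.array([0.2, 0.5, 0.1, 0.3])                            # row voltages, volts
I = G.T @ V                                                   # column currents, amperes
```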
... The notion of trainable broken-isomorphism PNNs emerged, in part, from untrained physical systems being used for machine learning: physical reservoir computing 46 . There were also several theoretical proposals 70-72 of broken-isomorphism PNNs prior to the general framework and experimental demonstrations presented in Ref. 30 . Broken-isomorphism PNNs could potentially perform certain computations much more efficiently than digital methods, leading to a path for more scalable, energy-efficient, and faster machine learning. ...
... Digital forward models might fail to capture all physical phenomena in the actual PNN hardware, such as detection noise, misalignment, and fabrication and material imperfections 30,34, among other experimental factors; this challenges the accurate deployment of trained PNNs at large scale, across many devices. The computational demands of these forward-model simulations form another potential hurdle. ...
Preprint
Physical neural networks (PNNs) are a class of neural-like networks that leverage the properties of physical systems to perform computation. While PNNs are so far a niche research area with small-scale laboratory demonstrations, they are arguably one of the most underappreciated opportunities in modern AI. Could we train AI models 1000x larger than current ones? Could we do this and also have them perform inference locally and privately on edge devices, such as smartphones or sensors? Research over the past few years has shown that the answer to all these questions is likely "yes, with enough research": PNNs could one day radically change what is possible and practical for AI systems. Doing so, however, will require rethinking both how AI models work and how they are trained, primarily by considering the problems through the constraints of the underlying hardware physics. To train PNNs at large scale, many methods, including backpropagation-based and backpropagation-free approaches, are now being explored. These methods have various trade-offs, and so far no method has been shown to scale to the same scale and performance as the backpropagation algorithm widely used in deep learning today. However, this is rapidly changing, and a diverse ecosystem of training techniques provides clues for how PNNs may one day be used both to create more efficient realizations of current-scale AI models and to enable unprecedented-scale models.
... Looking to the future, it is intriguing to contemplate the potential for advanced information acquisition systems to function inside increasingly intricate scattering spaces, facilitated by Neuroute's capacity to communicate physical information through scientific neural networks [51,52]. An additional valuable enhancement involves the intelligent Neuroute generation method, employing on-site learning to adapt to a more conventional open-loop operating system [53], thereby enhancing its resilience to unforeseen stimuli. ...
Article
Pushing the efficiency of information-state acquisition has been a long-held goal in reaching the measurement precision limit inside scattering spaces. Recent studies have indicated that maximal information states can be attained through engineered modes; however, partial intrusion is generally required. While non-invasive designs have been substantially explored across diverse physical scenarios, non-invasive acquisition of information states inside dynamic scattering spaces remains challenging due to the intractable non-unique mapping problem, particularly in multi-target scenarios. Here, we establish the feasibility of non-invasive information-state acquisition experimentally for the first time by introducing a tandem generative adversarial network framework inside dynamic scattering spaces. To illustrate the framework's efficacy, we demonstrate that efficient information-state acquisition for multi-target scenarios can reach the Fisher information limit solely through the external scattering matrix of the system. Our work provides insightful perspectives for precise measurements inside dynamic complex systems.
... However, these methods only consider the setting of closed-loop control. The combination of an exact forward pass with an approximate backward pass, which our methods are based on, has also been explored in different settings in the deep learning literature, such as spiking [Lee et al., 2016] or physical [Wright et al., 2022] neural networks, or networks that include nondifferentiable procedures, for example used for rendering [Niemeyer et al., 2020] or combinatorial optimization [Vlastelica et al., 2020]. ...
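A minimal sketch of this exact-forward/approximate-backward pattern, in the straight-through/surrogate-gradient spirit (the threshold, the surrogate, and the shapes are illustrative assumptions):

```python
import numpy as np

def forward_exact(x):
    """Exact forward operation: a hard threshold, nondifferentiable."""
    return (x > 0).astype(float)

def backward_surrogate(grad_out, x):
    """Approximate backward pass: differentiate a smooth sigmoid
    surrogate in place of the true step function."""
    s = 1.0 / (1.0 + np.exp(-x))
    return grad_out * s * (1.0 - s)

x = np.array([-1.0, 0.3, 2.0])
y = forward_exact(x)                               # exact forward
grad_x = backward_surrogate(np.ones_like(y), x)    # surrogate gradient
```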
Preprint
Reinforcement learning has traditionally focused on learning state-dependent policies to solve optimal control problems in a closed-loop fashion. In this work, we introduce the paradigm of open-loop reinforcement learning where a fixed action sequence is learned instead. We present three new algorithms: one robust model-based method and two sample-efficient model-free methods. Rather than basing our algorithms on Bellman's equation from dynamic programming, our work builds on Pontryagin's principle from the theory of open-loop optimal control. We provide convergence guarantees and evaluate all methods empirically on a pendulum swing-up task, as well as on two high-dimensional MuJoCo tasks, demonstrating remarkable performance compared to existing baselines.
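A minimal sketch of the open-loop idea under toy assumptions: the learned object is a fixed action sequence, not a policy, optimized by descending the rollout cost of a known pendulum-like model. A finite-difference gradient stands in for the adjoint (Pontryagin) computation of the actual methods, and all dynamics and constants are illustrative.

```python
import numpy as np

def rollout_cost(actions, dt=0.05):
    """Roll a known pendulum model forward under a fixed, open-loop
    action sequence; cost penalizes control effort and missing the
    upright state (theta = 0, measured from upright) at the end."""
    theta, omega, cost = np.pi, 0.0, 0.0     # start hanging down
    for a in actions:
        omega += (9.8 * np.sin(theta) + a) * dt
        theta += omega * dt
        cost += 0.001 * a ** 2
    return cost + np.arctan2(np.sin(theta), np.cos(theta)) ** 2 + 0.1 * omega ** 2

actions = np.zeros(60)                  # the learned object: a sequence, not a policy
for _ in range(100):                    # gradient descent on the whole sequence
    grad = np.empty_like(actions)
    for i in range(actions.size):       # finite differences stand in for the adjoint
        e = np.zeros_like(actions)
        e[i] = 1e-4
        grad[i] = (rollout_cost(actions + e) - rollout_cost(actions - e)) / 2e-4
    actions -= 0.5 * grad
```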
... One possibility is to act on the training procedure, for example by reintroducing back-propagation, as demonstrated in [13] and shown experimentally in [14]-[16], or by including knowledge about the dynamics of the computational substrate in the training procedure, as in the "physics-aware" training scheme proposed in [17]. A second possibility, which is the one pursued in this work, is to concatenate multiple reservoirs to form a more powerful network. ...
Article
Speech recognition is a critical task in the field of artificial intelligence (AI) and has witnessed remarkable advancements thanks to large and complex neural networks, whose training process typically requires massive amounts of labeled data and computationally intensive operations. An alternative paradigm, reservoir computing (RC), is energy efficient and is well adapted to implementation in physical substrates, but exhibits limitations in performance when compared with more resource-intensive machine learning algorithms. In this work, we address this challenge by investigating different architectures of interconnected reservoirs, all falling under the umbrella of deep RC (DRC). We propose a photonic-based deep reservoir computer and evaluate its effectiveness on different speech recognition tasks. We show specific design choices that aim to simplify the practical implementation of a reservoir computer while simultaneously achieving high-speed processing of high-dimensional audio signals. Overall, with the present work, we hope to help the advancement of low-power and high-performance neuromorphic hardware.
... Deep learning models, particularly convolutional neural networks [8] and transformer models [9,10], have become a central area of research and now surpass human-level performance on supervised, unsupervised, and semi-supervised tasks. However, even though transformers and convolutional neural networks are applied to multitasking, major challenges remain in multitask learning: the lack of universality in multi-domain models, complexity and overfitting, difficulty in interpreting model decisions, managing and balancing multiple inputs and outputs, maintaining model stability and accuracy, ensuring effective control of the model, and poor performance [11-14]. ...
Article
The application of deep learning has demonstrated impressive performance in computer vision tasks such as object detection, image classification, and image captioning. Though most models excel at performing single vision or language tasks, designing a single architecture that balances task specialization, performance, and adaptability across diverse tasks is challenging. To effectively address vision and language integration challenges, a combination of text embeddings and visual representation is necessary to understand dependencies of each subarea for multiple tasks. This paper proposes a single architecture that can handle various tasks in computer vision with fine-tuning capabilities for other specific vision and language tasks. The proposed model employs a modified DenseNet201 as a feature extractor (network backbone), an encoder-decoder architecture, and a task-specific head for inference. To tackle overfitting and improve precision, enhanced data augmentation and normalization techniques are employed. The model’s robustness is evaluated on over five datasets for different tasks: image classification, object detection, image captioning, and adversarial attack and defense. The experimental results demonstrate competitive performance compared to other works on CIFAR-10, CIFAR-100, Flickr8, Flickr30, Caltech10, and other task-specific datasets such as OCT, BreakHis, and so on. The proposed model is flexible and easy to adapt to new tasks, as it can also be extended to other vision and language tasks through fine-tuning with task-specific input indices.
... At their very core, neuromorphic computers (initially proposed by Mead 23,24) seek to mimic the intricate workings of the brain's nervous system and its vastly distributed nature [25-31]. There are two main approaches to developing neuromorphic systems. The first involves transferring and translating existing digital neural architectures to physical substrates, while the second pertains to creating new algorithms that more accurately emulate the functionality of biological neurons and synapses. ...
Article
The ability of mechanical systems to perform basic computations has gained traction over recent years, providing an unconventional alternative to digital computing in off grid, low power, and severe environments, which render the majority of electronic components inoperable. However, much of the work in mechanical computing has focused on logic operations via quasi-static prescribed displacements in origami, bistable, and soft deformable matter. Here, we present a first attempt to describe the fundamental framework of an elastic neuromorphic metasurface that performs distinct classification tasks, providing a new set of challenges, given the complex nature of elastic waves with respect to scattering and manipulation. Multiple layers of reconfigurable waveguides are phase-trained via constant weights and trainable activation functions in a manner that enables the resultant wave scattering at the readout location to focus on the correct class within the detection plane. We further demonstrate the neuromorphic system’s reconfigurability in performing two distinct tasks, eliminating the need for costly remanufacturing.