Figure 6 - uploaded by Andrey Kormilitzin
Examples of two different interpolations.  

Source publication
Article
Full-text available
In these notes, we wish to provide an introduction to the signature method, focusing on its basic theoretical properties and recent numerical applications. The notes are split into two parts. The first part focuses on the definition and fundamental properties of the signature of a path, or the path signature. We have aimed for a minimalistic approa...

Context in source publication

Context 1
... three auxiliary points {(1, 1), (2, 4), (3, 2)}, denoted by empty red circles, are added to construct the rectilinear path in Fig. 6b. For various numerical applications, the original data might be mapped into different forms to exhibit its structure. One example of such a mapping is the cumulative sum, or sequence of partial sums. More precisely, the cumulative sum is defined ...
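The cumulative-sum mapping mentioned in the context above can be sketched in a few lines. This is a minimal illustration under one common convention (the i-th output is the sum of the first i inputs); the source's exact definition is truncated, and the helper name `cumulative_sum` and the sample stream are hypothetical.

```python
from itertools import accumulate


def cumulative_sum(values):
    """Return the sequence of partial sums [x1, x1 + x2, x1 + x2 + x3, ...]."""
    return list(accumulate(values))


# Hypothetical one-dimensional data stream, chosen only for illustration.
stream = [1, 3, 5, 8]
partial_sums = cumulative_sum(stream)
print(partial_sums)
```

Some treatments prepend a zero to the result so that the transformed path starts at the origin; that variant is not shown here.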

Similar publications

Article
Full-text available
In the past, many studies have been carried out on eye-gaze input; however, in this study, we developed an eye-glance input interface that tracks a combination of short eye movements. Unlike eye-gaze input that requires high accuracy measurements, eye-glance input can be detected with only a rough indication of the direction of the eye movements, m...

Citations

... Recently, several authors have promoted the practical utility of the concept of path signature in data science, where it offers features that encapsulate information encoded in sequential data, i.e., chronologically ordered sequences of data points such as, for instance, time-series in finance [CS24,CK16,LM24]. Its computational effectiveness derives, in part, from the ability to calculate the path signature in linear time, which is made possible by a dynamic programming approach rooted in Chen's identity (see (1.7) below as well as [DEFT22]). ...
Preprint
In the last decade, the concept of path signature has found great success in data science applications, where it provides features describing the path. This is partly explained by the fact that it is possible to compute the signature of a path in linear time, owing to a dynamic programming principle, based on Chen's identity. The path signature can be regarded as a specific example of product or time-/path-ordered integral. In other words, it can be seen as a 1-parameter object built on iterated integrals over a path. Increasing the number of parameters by one, which amounts to considering iterated integrals over surfaces, is more complicated, an observation that is familiar in the context of higher gauge theory, where multiparameter iterated integrals play an important role. The 2-parameter case is naturally related to a non-commutative version of Stokes' theorem, which is understood to be fundamentally linked to the concept of crossed modules of groups. Indeed, crossed modules with non-trivial kernel of the feedback map make it possible to compute features of a surface that go beyond what can be expressed by computing line integrals along the boundary of a surface. A good candidate for the crossed analog of the free Lie algebra then seems to be a certain free crossed module over it. Building on work by Kapranov, we study the analog to the classical path signature taking values in such a free crossed module of Lie algebras. In particular, we provide a Magnus-type expression for the logarithm of the surface signature as well as a sewing lemma for the crossed module setting.
... We use path signature (specifically the log signature, a more concise representation) [34,35] to represent both a single breath cycle and a contextual window of several breath cycles. ...
... The path signature itself contains redundant information, and some terms can be inferred from a combination of other terms through the shuffle product identity [34]: ...
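As an illustration of the redundancy mentioned in the snippet above, the lowest-order instance of the shuffle product identity is shown here. It is a standard textbook case and not necessarily the formula elided in the quoted snippet; the notation $S(X)^{i}$ and $S(X)^{i,j}$ for first- and second-level signature terms of a path $X$ is an assumption of this sketch.

```latex
% Shuffle product identity at the lowest order, for any indices i, j:
S(X)^{i}\, S(X)^{j} \;=\; S(X)^{i,j} + S(X)^{j,i}
% Special case i = j: the diagonal second-level term is determined by level one.
S(X)^{i,i} \;=\; \tfrac{1}{2}\bigl(S(X)^{i}\bigr)^{2}
```

In particular, the symmetric part of the second level carries no information beyond the first level, which is one reason the log signature is a more compact representation.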
Article
Full-text available
Patient–ventilator asynchrony (PVA) refers to instances where a mechanical ventilator’s cycles are desynchronised from the patient’s breathing efforts, and may result in patient discomfort and potentially ineffective ventilation. Typically, such events are identified through constant monitoring by trained clinicians. Such expertise is often limited; therefore, it is desirable to automate PVA detection with machine learning methods. However, there are three major challenges to applying machine learning to the problem: data collected from non-invasive ventilation are often noisy, there exists high variability between patients or between setting changes, and manual annotations of PVA events are not always consistent. To produce meaningful inference from such noisy data, a model needs to not only provide a measure of uncertainty, but also take into account potential inconsistencies in the training signal it is based on. In this work, we propose a conditional latent Gaussian mixture generative classifier with noisy label correction, which is capable of capturing variations within and between classes, providing well-calibrated class probabilities, and detecting unlikely input instances that deviate from the training data, while also taking into account possible mislabelling of event classes. We show that our model is able to match the performance of a well-tuned gradient boosting classifier, but also produce better calibrated predictions and smaller performance variability between patients.
... We model this as a 1-dimensional time-series and apply a signature transform to each melt curve. The signature method is a non-parametric feature extractor that computes a series of integrals along a data path that fully capture its order and area [32,33]. The signature method is optionally time-shift invariant and is sensitive to the geometric shape of the path. ...
Article
Full-text available
Surveillance for genetic variation of microbial pathogens, both within and among species, plays an important role in informing research, diagnostic, prevention, and treatment activities for disease control. However, large-scale systematic screening for novel genotypes remains challenging in part due to technological limitations. Towards addressing this challenge, we present an advancement in universal microbial high resolution melting (HRM) analysis that is capable of accomplishing both known genotype identification and novel genotype detection. Specifically, this novel surveillance functionality is achieved through time-series modeling of sequence-defined HRM curves, which is uniquely enabled by the large-scale melt curve datasets generated using our high-throughput digital HRM platform. Taking the detection of bacterial genotypes as a model application, we demonstrate that our algorithms accomplish an overall classification accuracy over 99.7% and perform novelty detection with a sensitivity of 0.96, specificity of 0.96 and Youden index of 0.92. Since HRM-based DNA profiling is an inexpensive and rapid technique, our results add support for the feasibility of its use in surveillance applications. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-024-05747-0.
... Moreover, the expected signatures uniquely determine the distributions of paths, akin to the role of moment generating functions [41]. A more comprehensive exposition, rigorous formulations, and visual examples, are given in [34,36,42]. ...
Preprint
Full-text available
Portfolio allocation represents a significant challenge within financial markets, traditionally relying on correlation or covariance matrices to delineate relationships among stocks. However, these methodologies assume time stationarity and only capture linear relationships among stocks. In this study, we propose to substitute the conventional Pearson's correlation or covariance matrix in portfolio optimization with a similarity matrix derived from the signature. The signature, a concept from path theory, provides a unique representation of time series data, encoding their geometric patterns and inherent properties. Furthermore, we undertake a comparative analysis of network structures derived from the correlation matrix versus those obtained from the signature-based similarity matrix. Through numerical evaluation on the Standard & Poor's 500, we assess that portfolio allocation utilizing the signature-based similarity matrix yielded superior results in terms of cumulative log-returns and Sharpe ratio compared to the baseline network approach based on Pearson's correlation. This assessment was conducted across various portfolio optimization strategies. This research contributes to portfolio allocation and financial network representation by proposing the use of signature-based similarity matrices over traditional correlation or covariance matrices.
... For a more comprehensive and precise mathematical definition, please refer to [6,14,15]. ...
Chapter
Full-text available
We use Graph Neural Networks on signature-augmented graphs derived from time series for Predictive Maintenance. With this technique, we propose a solution to the Intelligent Data Analysis Industrial Challenge 2024 on the newly released SCANIA Component X dataset. We describe an Exploratory Data Analysis and preprocessing of the dataset, proposing improvements for its description in the SCANIA paper.
... However, we can grasp its properties more comprehensively with integral quantities (iterated integrals) concerning any combination of variables along the path using the Signature method (e.g., Chevyrev & Kormilitzin, 2016). The signature is a mathematical concept originating in rough path theory (e.g., Chevyrev & Kormilitzin, 2016;Lyons et al., 2007). It can be computed using a mathematical operation called the iterated integral, which integrates a path over a sequence of intervals to generate a sequence of higher-level functions. ...
Article
Full-text available
An array of atmospheric profile observations consists of three‐dimensional vectors representing pressure, temperature, and humidity, with each profile forming a continuous curve in this three‐dimensional space. In this paper, the Signature method, which can quantify a profile's curve, was adopted for the atmospheric profiles, and the accuracy of profile representations was investigated. The description of profiles by the signature was confirmed with adequate accuracy. The machine‐learning‐based model, developed using the signature, exhibited a high level of annual accuracy with minimal absolute mean differences in temperature and water vapor mixing ratio (<2.0 K or g kg⁻¹). Notably, the model successfully captured the vertical structure and atmospheric instability, encompassing drastic variations in water vapor and temperature, even during intense rainfall. These results indicate the Signature method can comprehensively describe the vertical profile with information on how ordered values are correlated. This concept would potentially improve the representation of the atmospheric vertical structure.
... Our approach differs in that we aim to create a large statistical ensemble, rather than a larger dataset. The signature of a time series has recently emerged in the machine learning community as a universal non-parametric descriptor of a stream of time-ordered data [17], [18]. The signature transform has been used as a feature extraction mechanism in neural network-like models [19] and has been integrated in kernel-based models [20], [21]. ...
... The signature layer and normalization procedure are both differentiable thanks to formula (18) and the gradient computed in Corollary 11. ...
... There are other possible changes that can be easily implemented. For example, we can introduce any different signature computation algorithm, such as the log-Signature transform [18], or any time series transformation. Indeed, we have been using the time augmentation because it has a relevant role in various theoretical results (Proposition 3 and Theorem 4), but it can be replaced by various other transformations. ...
Article
Full-text available
Time series classification tasks play a crucial role in extracting relevant information from data equipped with a temporal structure. In various scientific domains, such as biology or finance, this kind of data comes from complex and hardly predictable phenomena. Therefore, classification algorithms for time series should be able to deal with the uncertainty contained in the data and to capture the relevant statistical properties of the underlying phenomenon. The main object of interest of this work is the development of a model for time series that tackles the classification task by interpreting time series as realisations of stochastic processes, the natural mathematical description of chaotic behaviour. The focus thus is on time series that can be thought of as signals of some nature, and that convey some kind of statistical information. We propose a data-driven feature extraction model for time series built upon a Gaussian process based data augmentation and on the expected signature. The signature is a fundamental object that describes paths, much like a Fourier or wavelet expansion, but in a non-linear fashion. Likewise, the expected signature provides a statistical description of the law of stochastic processes. One of the main features is that an optimal feature extraction is learnt through the supervised task that uses the model. The model can be adapted to more complicated supervised tasks, as it integrates seamlessly in a neural network architecture and is fully compatible with back-propagation, and it can be easily adapted to perform regression tasks. The effectiveness of the model is demonstrated with numerical experiments on some benchmark time series.
... It is noninvasive, accessible, and has a high temporal resolution. It has been found [4-6] that motor movement as well as motor imagery (MI, i.e. imagination of movement without actually moving) cause modulation in SMR manifested as a decrease of power in the alpha (8-13 Hz)/beta (13-30 Hz) frequency bands, known as event-related desynchronization (ERD), followed by an increase in the beta band, also known as beta rebound or event-related synchronization (ERS), after the actual or imagined movement. Movement or MI of different body parts is associated with an SMR modulation of different regions of the sensorimotor cortex, which leads to discriminant brain signals that allow the control of MI BCI. ...
... Suppose X_t : [a, b] → R^d is a signal for which we have already computed the path signature, and Y_t : [b, c] → R^d is a newly obtained signal. We can compute the signature of the concatenated path (X * Y)_t : [a, c] → R^d by taking the tensor product of the two signatures [14]. In practice, Y_t would consist of one additional time point, and by using the fused multiply-exponentiate algorithm [17], the complexity of computing ...
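The concatenation rule in the snippet above (Chen's identity) can be sketched for signatures truncated at level 2. This is a minimal pure-Python illustration for piecewise-linear paths, not the fused multiply-exponentiate algorithm the snippet cites; the function names `segment_signature`, `chen_product` and `path_signature`, and the sample path, are hypothetical.

```python
def segment_signature(delta):
    """Level-1 and level-2 signature terms of a straight segment with increment delta."""
    d = len(delta)
    level1 = list(delta)
    # For a linear segment the second level is (delta ⊗ delta) / 2.
    level2 = [[delta[i] * delta[j] / 2.0 for j in range(d)] for i in range(d)]
    return level1, level2


def chen_product(sig_x, sig_y):
    """Signature of the concatenation X * Y, truncated at level 2 (Chen's identity)."""
    a1, a2 = sig_x
    b1, b2 = sig_y
    d = len(a1)
    level1 = [a1[i] + b1[i] for i in range(d)]
    # Level 2 of the tensor product: a2 + b2 + a1 ⊗ b1.
    level2 = [[a2[i][j] + b2[i][j] + a1[i] * b1[j] for j in range(d)]
              for i in range(d)]
    return level1, level2


def path_signature(points):
    """Fold Chen's identity over consecutive segments: one pass over the data."""
    d = len(points[0])
    sig = ([0.0] * d, [[0.0] * d for _ in range(d)])  # signature of a constant path
    for p, q in zip(points, points[1:]):
        delta = [q[i] - p[i] for i in range(d)]
        sig = chen_product(sig, segment_signature(delta))
    return sig


# Hypothetical path: one step right, then one step up.
level1, level2 = path_signature([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
```

Appending one new time point costs a single `chen_product` call, independent of how many points were processed before, which is what makes streaming updates cheap.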
Article
Full-text available
Brain-computer interfaces (BCIs) allow direct communication between one’s central nervous system and a computer without any muscle movement, hence bypassing the peripheral nervous system. They can restore disabled people’s ability to interact with their environment, e.g. communication and wheelchair control. However, to this day their performance is still hindered by the non-stationarity of electroencephalography (EEG) signals, as well as their susceptibility to noise from the users’ environment and from their own physiological activity. Moreover, a non-negligible number of users struggle to use BCI systems based on motor imagery. In this paper, a new method based on the path signature is introduced to tackle this problem by using features which are different from the usual power-based ones. The path signature is a series of iterated integrals computed from a multidimensional path. It is invariant under translation and time reparametrization, which makes it a robust feature for multichannel EEG time series. The performance can be further boosted by combining the path signature with the gold standard Riemannian classifier in the BCI field, which exploits the geometric structure of symmetric positive definite (SPD) matrices. The results obtained on publicly available datasets show that the signature method is more robust to inter-user variability than classical ones, especially on noisy and low-quality data. Hence, this study paves the way towards the use of mathematical tools that until now have been neglected, in order to tackle the EEG-based BCI variability issue. It also sheds light on the lead-lag relationship captured by the path signature, which seems relevant to assess the underlying neural mechanisms.
... Over the past decade, the ability of the signature to encode information about a path in an efficient and robust way has made it a powerful tool in the analysis of time-ordered data. Examples of applications of signatures include the recognition of handwriting [15,45] and gestures [33], analysis of financial data [24,30], statistical inference of SDEs [37], analysis of psychiatric and physiological data [1,35], topological data analysis [10], neural networks [23], and kernel learning [11,25]. (Footnote 1: See [7] for a gentle introduction to the path signature and some of its early applications.) ...
... 2.3.3 to satisfy (2.10) with β = 2 and degξ = −1.5. 7 In the experiments below we only consider functions f τ ∈ M 4 α with degτ ≤ 5. ...
... See Remark 2.5 for a motivation behind taking these particular widths. (Footnote 7:) This is motivated by the Hölder regularity of space-time white noise being −1.5 − ε for any small ε > 0 and the fact that the heat operator I increases the Hölder regularity by 2. ...
Article
Full-text available
We investigate the use of models from the theory of regularity structures as features in machine learning tasks. A model is a polynomial function of a space–time signal designed to well-approximate solutions to partial differential equations (PDEs), even in low regularity regimes. Models can be seen as natural multi-dimensional generalisations of signatures of paths; our work therefore aims to extend the recent use of signatures in data science beyond the context of time-ordered data. We provide a flexible definition of a model feature vector associated to a space–time signal, along with two algorithms which illustrate ways in which these features can be combined with linear regression. We apply these algorithms in several numerical experiments designed to learn solutions to PDEs with a given forcing and boundary data. Our experiments include semi-linear parabolic and wave equations with forcing, and Burgers’ equation with no forcing. We find an advantage in favour of our algorithms when compared to several alternative methods. Additionally, in the experiment with Burgers’ equation, we find non-trivial predictive power when noise is added to the observations.
... The signature transform is a mathematical technique used in TSC as a feature extraction tool, where the discrete time series is transformed into continuous paths through interpolation techniques, and an infinite set of features, known as signatures, can be computed from the new sequence (Chevyrev and Kormilitzin [2016]). These signatures are combined with a traditional ML classifier to produce an output. ...
Article
Full-text available
Today, grid resilience as a feature has become non-negotiable, particularly when power interruptions can impact the economy. The widespread popularity of Intelligent Electronic Devices (IED) operating as smart meters enables an immense amount of fine-grained electricity consumption data to be collected. However, risk can still exist in the Smart Grid (SG), as valuable data are exchanged among SG systems; theft or alteration of this data could violate consumer privacy. The Internet of Things for Smart Grid (IoSGT) is a promising ecosystem of different technologies that coordinate with each other to pave the way for new SG applications and services. As a use case of IoSGT for future SG applications and services, fraud detection for non-technical losses (NTL) emerges as an important application for SG scenarios. A substantial amount of electrical energy is lost throughout the distribution system, and these losses are divided into two types: technical and non-technical. Non-technical losses are any electrical energy consumed and not invoiced. They may occur due to illegal connections, issues with energy meters such as delay in the installation or reading errors, contaminated, defective, or non-adapted measuring equipment, very low valid consumption estimates, faulty connections, and disregarded customers. Non-technical losses are the primary cause of revenue loss in the SG. According to a recent study, electrical utilities lose $89.3 billion per year due to non-technical losses. This article proposes ensemble predictor-based time series classifiers for NTL detection. The proposed predictor takes the user’s energy consumption as a data input for classification, from splitting the data to executing the classifier. It encompasses the temporal aspects of energy consumption data during preprocessing, training, testing, and validation stages. The suggested predictor is Time Series (TS) oriented, from data splitting to the classifier’s performance.
Overall, our best results have been recorded in the fraud detection-based time series classifiers (TSC) model scoring an improvement in the empirical performance metrics by 10% or more over the other developed models.