About
44
Publications
19,765
Reads
569
Citations
Publications (44)
Data-driven predictive maintenance needs to understand high-dimensional “in-motion” data, for which fundamental machine learning tools, such as Principal Component Analysis (PCA), require computation-efficient algorithms that operate near-real-time. Despite the different streaming PCA flavors, there is no algorithm that precisely recovers the princ...
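The excerpt does not describe the proposed algorithm. To illustrate what a "streaming PCA flavor" looks like, here is a minimal sketch of Oja's rule, one classical single-pass estimator of the leading principal direction. It is an assumption chosen for illustration, not the method of the paper. The function names and the synthetic stream are hypothetical.

```python
# Hypothetical helper names; illustrative sketch of Oja's rule only.
import math
import random


def oja_first_component(stream, dim, learning_rate=0.001):
    """Estimate the leading principal direction of a zero-mean stream.

    Oja's rule: one pass, O(dim) memory. Each sample x moves w toward
    x * (w . x); the -y**2 * w term keeps ||w|| close to 1.
    """
    w = [1.0] + [0.0] * (dim - 1)  # arbitrary unit-norm starting vector
    for x in stream:
        y = sum(wi * xi for wi, xi in zip(w, x))  # projection of x on w
        w = [wi + learning_rate * y * (xi - y * wi) for wi, xi in zip(w, x)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]  # renormalize the final estimate


def synthetic_stream(n, seed=0):
    """Hypothetical 2-D test stream: high variance along (1, 1)/sqrt(2)."""
    rng = random.Random(seed)
    s = 1.0 / math.sqrt(2.0)
    for _ in range(n):
        a = rng.gauss(0.0, 3.0)  # strong component along (1, 1)/sqrt(2)
        b = rng.gauss(0.0, 0.5)  # weak component along (1, -1)/sqrt(2)
        yield [s * (a + b), s * (a - b)]


w = oja_first_component(synthetic_stream(20000), dim=2)
# |cos| of the angle between the estimate and the true direction (1, 1)/sqrt(2)
alignment = abs(w[0] + w[1]) / math.sqrt(2.0)
print(f"estimate={w}, alignment={alignment:.4f}")
```

Oja's rule only converges toward the principal direction and fluctuates around it by an amount that depends on the learning rate. The abstract points to this kind of imprecision when it notes that no existing algorithm recovers the components precisely.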
The increasingly interconnected and instrumented world provides a deluge of data generated by multiple sensors in the form of continuous streams. Efficient stream processing needs control over the number of useful variables. This is because maintaining data structure in reduced sub-spaces, given that data is generated at high frequencies and is ty...
The need for real-time analysis keeps growing, and the number of available streaming sources is increasing. The recent literature contains many works on Data Stream Processing (DSP). In a streaming environment, the incoming data rate varies over time. The challenge is how to efficiently deploy these applications on a cluster. Several works have...
With the explosion of data sizes, extracting valuable insight out of big data becomes increasingly difficult. New challenges begin to emerge that complement traditional, long-standing challenges related to building scalable infrastructure and runtime systems that can deliver the desired level of performance and resource efficiency. This vision pape...
The global deployment of cloud datacenters is enabling large-scale scientific workflows to improve performance and deliver fast responses. This unprecedented geographical distribution of the computation is accompanied by an increase in the scale of the data handled by such applications, bringing new challenges related to the efficient data management a...
Data-intensive computing is now starting to be considered as the basis for a new, fourth paradigm for science. Two factors are encouraging this trend. First, vast amounts of data are becoming available in more and more application areas. Second, the infrastructures that allow these data to be persistently stored for sharing and processing are becoming a...
The global deployment of cloud datacenters is enabling large web services to deliver fast response to users worldwide. This unprecedented geographical distribution of the computation also brings new challenges related to the efficient data management across sites. High throughput, low latencies, cost- or energy-related trade-offs are just a few con...
The easily-accessible computation power offered by cloud infrastructures coupled with the revolution of Big Data are expanding the scale and speed at which data analysis is performed. In their quest for finding the Value in the 3 Vs of Big Data, applications process larger data sets, within and across clouds. Enabling fast data transfers across geo...
The increasing scale at which data processing is being performed nowadays calls for data management systems that enable high-performance data exchanges among geographically remote instances of large web services. In this demonstration we show how JetStream can increase the transfer rate of events which are streamed between geographically remote clo...
Today's continuously growing cloud infrastructures provide support for processing ever increasing amounts of scientific data. Cloud resources for computation and storage are spread among globally distributed datacenters. Thus, to leverage the full computation power of the clouds, global data processing across multiple sites has to be fully enabled....
Infrastructure clouds revolutionized the way in which we approach resource procurement by providing an easy way to lease compute and storage resources on short notice, for a short amount of time, and on a pay-as-you-go basis. This new opportunity, however, introduces new performance trade-offs. Making the right choices in leveraging different types...
Brain imaging is a natural intermediate phenotype for understanding the link between genetic information and behavior or risk factors for brain pathologies. Massive efforts have been made in the last few years to acquire high-dimensional neuroimaging and genetic data on large cohorts of subjects. The statistical analysis of such data is carried out with in...
Scientific workflows typically communicate data between tasks using files. Currently, on public clouds, this is achieved by using the cloud storage services, which are unable to exploit the workflow semantics and are subject to low throughput and high latencies. To overcome these limitations, we propose an alternative leveraging data locality throu...
A large spectrum of scientific applications, some generating data volumes exceeding petabytes, are currently being ported to clouds to build on their inherent elasticity and scalability. One of the critical needs in order to deal with this "data deluge" is efficient, scalable and reliable storage. However, the storage services proposed by cloud...
The continuous growth of sensor networks, stock exchanges, climate monitoring or scientific applications produces new streaming data at increasing rates. Managing and processing such data, sometimes generated from multiple geographical locations, raises important challenges as it requires real-time processing or data aggregation. Conventional solut...
The emergence of cloud computing has brought the opportunity to use large-scale compute infrastructures for an ever broader spectrum of applications and users. While the cloud paradigm is attractive for the ‘elasticity’ in resource usage and associated costs (the users only pay for resources actually used), cloud applications still suffer from...
With the emergence of cloud computing as an alternative to supercomputers to support data intensive applications, MapReduce has arisen as a major programming model for data analysis on clouds. In this context, reduce-intensive algorithms are becoming increasingly useful in applications such as data clustering, classification and mining. However, pl...
Joint genetic and neuroimaging data analysis on large cohorts of subjects is a new approach used to assess and understand the variability that exists between individuals. This approach has remained poorly understood so far and brings forward very significant challenges, as progress in this field can open pioneering directions in biology and medicin...
The emergence of cloud computing brought the opportunity to use large-scale compute infrastructures for a broad spectrum of applications and users. While the cloud paradigm is attractive for the ‘elasticity’ in resource usage and associated costs (the users only pay for resources actually used), cloud applications still suffer from the high latenc...
The emergence of cloud computing brought the opportunity to use large-scale computational infrastructures for a broad spectrum of scientific applications. As more and more cloud providers and technologies appear, scientists are faced with an increasingly difficult problem of evaluating various offerings, like public and private clouds, and deciding...
This paper presents a general solution for computing the electric potential in homogeneous and non-homogeneous media using a Monte Carlo-based method. The implementation relies on an original framework that uses FPGAs to improve the computational speed.
The calculation process relies on a series of both geometric and electric parameters describing...
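The excerpt does not spell out the Monte Carlo scheme or the FPGA framework. As an assumed illustration, here is the classical random-walk estimator for Laplace's equation in a homogeneous, charge-free medium on a square grid. The discrete potential at a node equals the expected boundary potential where a symmetric random walk from that node first exits. Non-homogeneous media are not covered by this sketch, and all names in it are hypothetical.

```python
# Hypothetical helper names; illustrative random-walk sketch only.
import random


def mc_potential(start, n, boundary, walks=2000, seed=0):
    """Random-walk estimate of the potential at an interior node of an
    (n + 1) x (n + 1) grid, for Laplace's equation in a homogeneous medium.

    The discrete solution at a node equals the expected boundary potential
    at the node where a symmetric random walk started there first exits
    the interior, so averaging many walks estimates it.
    """
    rng = random.Random(seed)
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    total = 0.0
    for _ in range(walks):
        i, j = start
        while 0 < i < n and 0 < j < n:  # still an interior node
            di, dj = rng.choice(moves)
            i, j = i + di, j + dj
        total += boundary(i, j)  # potential prescribed on the exit node
    return total / walks


# Hypothetical test case: the boundary potential V = i / n is harmonic, so
# the exact discrete solution at node (i, j) is also i / n.
n = 10
estimate = mc_potential((5, 5), n, lambda i, j: i / n)
print(f"estimate at (5, 5): {estimate:.3f} (exact discrete value 0.5)")
```

The walks are independent of one another, which is one reason such estimators lend themselves to parallel hardware. The error shrinks roughly as one over the square root of the number of walks.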
In recent years, interest in magnetic stimulation of the human nervous tissue has increased considerably, because this technique has proved its utility and applicability both as a diagnostic and as a treatment instrument. Research in this domain is aimed at removing some of the disadvantages of the technique: the lack of focalization of the s...
The implementation of high-precision floating-point applications on reconfigurable hardware requires large multipliers. Full multipliers are the core of floating-point multipliers. Truncated multipliers, trading resources for a well-controlled accuracy degradation, are useful building blocks in situations where a full multiplier is not needed.
This...
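To make "trading resources for a well-controlled accuracy degradation" concrete, here is a bit-level software model of a truncated multiplier. It assumes the simplest scheme: partial-product bits in columns below a threshold t are never generated. Practical designs usually add a compensation constant for the expected dropped value, which this sketch omits. The names are hypothetical and this is not the paper's design.

```python
# Hypothetical helper names; illustrative truncated-multiplier model only.
def truncated_product(a, b, n, t):
    """Approximate a * b for n-bit unsigned a, b, keeping only the
    partial-product bits a_i & b_j whose column i + j is >= t.

    Columns below t are never generated (this is where a hardware truncated
    multiplier saves resources); the result never exceeds a * b.
    """
    total = 0
    for i in range(n):
        if (a >> i) & 1:
            for j in range(max(0, t - i), n):  # only columns i + j >= t
                if (b >> j) & 1:
                    total += 1 << (i + j)
    return total


def max_truncation_error(t):
    """Upper bound on the dropped value: column c holds at most c + 1 bits."""
    return sum((c + 1) << c for c in range(t))


n, t = 8, 6
a, b = 200, 150
approx = truncated_product(a, b, n, t)
print(a * b, approx, (a * b) >> n, approx >> n)  # full vs truncated, and top n bits
```

With t = 0 the model reduces to a full multiplier. Raising t removes more partial-product bits while the error stays within the bound computed by `max_truncation_error`.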
This paper presents a new method for implementing TRNGs in FPGA devices, which relies on filling a region or the whole FPGA chip close to its maximal capacity and exploiting the interconnection network as intensely as possible. This way, there are strong chances for the design to exhibit a nondeterministic behavior. Our first design is a computatio...
This article studies two common situations where the flexibility of FPGAs allows one to design application-specific floating-point operators which are more efficient and more accurate than those offered by processors and GPUs. First, for applications involving the addition of a large number of floating-point values, an ad-hoc accumulator is propose...
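The excerpt stops before describing the accumulator. The sketch below is a software analogue of one general idea it may relate to: adding every value into a single wide fixed-point integer, so that rounding occurs only once at the end. The function name and its `frac_bits` parameter are hypothetical, and this is not presented as the paper's FPGA design.

```python
# Hypothetical helper name; software analogue of a wide fixed-point accumulator.
import math


def wide_accumulate(values, frac_bits=1074):
    """Sum finite doubles in one wide fixed-point integer accumulator.

    Each value is scaled by 2**frac_bits and added as an integer. Every
    finite double is a multiple of 2**-1074, so with the default frac_bits
    no addend is rounded; rounding happens once, in the final division.
    A smaller frac_bits (as a narrower hardware accumulator would use)
    floors away the bits below 2**-frac_bits.
    """
    scale = 1 << frac_bits
    acc = 0
    for v in values:
        num, den = v.as_integer_ratio()  # exact: v == num / den, den a power of 2
        acc += (num * scale) // den
    return acc / scale  # int / int true division rounds correctly in CPython


values = [1e16, 1.0, -1e16]
naive = 0.0
for v in values:
    naive += v  # plain floating-point accumulation, rounded after every add
print(naive, wide_accumulate(values))
```

Plain accumulation loses the 1.0, because 1e16 + 1.0 rounds back to 1e16. The wide accumulator keeps it. A hardware version would size the accumulator to the range the application actually needs, rather than to the full double range used here.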
This paper presents a new method for implementing TRNGs in FPGA devices. The design is based on filling the chip close to its maximal capacity and exploiting the interconnection network as intensely as possible. This way, there are strong chances for the design to exhibit a nondeterministic behavior. Our design is a computationally intensive core t...
This paper presents an original method for creating TRNGs in Xilinx FPGAs. The design is based on agglomerating active logic in a given region of the FPGA chip, either globally or locally. No timing constraints were used in this design. A series of experiments conducted on different architectural variants led to the conclusion that mapping logic b...
Floating-point operators on FPGAs do not have to be identical to the ones available in processors. This article studies two common situations where the flexibility of FPGAs allows one to design application-specific floating-point operators. First, for applications involving the addition of a large number of floating-point values, an ad-hoc accumula...
The paper presents a new software strategy for generating true random numbers, by creating several threads and letting them compete unsynchronized for a shared variable, whose value is read-modified-updated by each thread repeatedly. The generated sequence of random numbers consists of the final values of the shared variable. Our strategy is based...
It has been shown that FPGAs could outperform high-end microprocessors on floating-point computations thanks to massive parallelism. However, most previous studies re-implement in the FPGA the operators present in a processor. This is a safe and relatively straightforward approach, but it doesn't exploit the greater flexibility of the FPGA. This ar...
In recent years, interest in magnetic stimulation of the human nervous tissue has increased, because this technique has proved its utility and applicability both as a diagnostic and as a treatment instrument. Research in this domain is aimed at eliminating some disadvantages of the technique: the lack of focalization of the stimulated human b...
It has been shown that FPGAs could outperform high-end microprocessors on floating-point computations thanks to massive parallelism. However, most previous studies re-implement in the FPGA the operators present in a processor. This conservative approach is relatively straightforward, but it doesn't exploit the greater flexibility of the FPGA. We su...
This article studies two common situations where the flexibility of FPGAs allows one to design application-specific floating-point operators which are more efficient and more accurate than those offered by processors and GPUs. First, for applications involving the addition of a large number of floating-point values, an ad-hoc accumulator is propos...
In recent years, interest in magnetic stimulation of the human nervous tissue has increased, because this technique has proved its utility and applicability both as a diagnostic and as a treatment instrument. Research in this domain is aimed at eliminating some disadvantages of the technique: the lack of focalization of the stimulated human...
In this report we address the problem of data management in clouds for the MapReduce programming model. In order to improve the performance of data-intensive applications, we designed a distributed file system deployed on the computation nodes of public clouds. This approach exploits the data locality principle by moving the data close to the comp...