Fig. 5
Memory hierarchy. Typical latencies for data transfers from the CPU to each of the levels are shown. These numbers are indicative only; the actual values depend on the exact architecture under consideration and the access sequence of the program.


Source publication
Conference Paper
Full-text available
The complexity of modern computing platforms has made it extremely difficult to write numerical code that achieves the best possible performance. Straightforward implementations based on algorithms that minimize the operations count often fall short in performance by at least one order of magnitude. This tutorial introduces the reader to a se...

Context in source publication

Context 1
... hierarchy. Most computer systems use a memory hierarchy to bridge the speed gap between the processor(s) and its connection to main memory. As shown in Fig. 5, the highest levels of the memory hierarchy contain the fastest and the smallest memory systems, and vice ...
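The latency gap that Fig. 5 depicts can be made visible with a small microbenchmark. Below is a minimal C sketch (not taken from the tutorial) that chases pointers through a random single-cycle permutation, so every load depends on the previous one and prefetching cannot hide the latency; as the working set outgrows each cache level, the measured time per access jumps.

/* Sketch: average memory access latency via dependent pointer chasing. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    for (size_t n = (size_t)1 << 10; n <= (size_t)1 << 24; n <<= 2) {
        size_t *chain = malloc(n * sizeof *chain);
        if (!chain) return 1;
        for (size_t i = 0; i < n; i++) chain[i] = i;
        /* Sattolo's algorithm: a random single-cycle permutation, so the
         * chase visits every element in one long dependency chain. */
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % i;
            size_t t = chain[i]; chain[i] = chain[j]; chain[j] = t;
        }
        size_t idx = 0;
        const long iters = 5 * 1000 * 1000;
        clock_t t0 = clock();
        for (long k = 0; k < iters; k++)
            idx = chain[idx];                   /* each load depends on the last */
        double ns = 1e9 * (double)(clock() - t0) / CLOCKS_PER_SEC / (double)iters;
        printf("working set %8zu KiB: ~%6.1f ns/access (idx=%zu)\n",
               n * sizeof *chain / 1024, ns, idx);
        free(chain);
    }
    return 0;
}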

Citations

... Moreover, the computational time t has been evaluated. However, this last datum has to be considered as only indicative of the order of magnitude rather than the exact time involved in each considered retrieval procedure, since such knowledge would require the development of numerical codes optimized for a given CPU in order to achieve the best possible computational performance [41]. Figure 1 shows the path γ(ω) traced by Equation (8) for the first slab, with d_eff = 180 nm, which is characterized by a single-pole Lorentzian model for the pair (ε_r(ω), µ_r(ω)). ...
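For orientation, a single-pole Lorentzian permittivity model of the kind the excerpt refers to typically takes the form below; the symbols ε_∞ (high-frequency limit), ω_p (oscillator strength), ω_0 (resonance frequency) and γ (damping) are generic textbook parameters, not values taken from the cited paper:

ε_r(ω) = ε_∞ + ω_p² / (ω_0² − ω² − iγω)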
Article
Full-text available
The characterization of electromagnetic metamaterials (MMs) plays a fundamental role in their engineering processes. To this end, the Nicolson–Ross–Weir (NRW) method is intensively used to recover the effective parameters of MMs, even though it is affected by the branch ambiguity problem. In this paper, we address this issue in the context of global analytic functions and Riemann surfaces. This point of view allows us to rigorously demonstrate the mathematical foundations of an algorithmic approach for avoiding the branch ambiguity problem, in which the phase unwrapping method is merged with K-K relations for recovering the effective parameters of an MM. In addition, exploiting the intimate relationship between the K-K relations and the Hilbert transform, a simple variant of the above algorithm is presented.
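As background, the Kramers–Kronig (K-K) relations mentioned here tie the real and imaginary parts of a causal response function χ(ω) to one another; a standard textbook form (P denotes the Cauchy principal value, not necessarily the exact variant used in the paper) is:

Re χ(ω) = (2/π) P ∫₀^∞ ω′ Im χ(ω′) / (ω′² − ω²) dω′

It is this causality constraint that the paper's algorithm merges with phase unwrapping to resolve the branch ambiguity.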
... There are many books and papers on efficient scientific software development; "Writing scientific software: a guide for good style" [105] and "The Software Optimization Cookbook" [106] are just two out of many examples. A brief introduction, "How to Write Fast Numerical Code" [107], is a good way to refresh one's memory on particular techniques for efficient coding. One can find short summaries on software engineering best practices in [108][109][110]. ...
Article
Using our decades-long experience in radiative transfer (RT) code development for Earth science, we endeavor to reduce the knowledge gap of bringing RT from theory to code quickly. Despite numerous classic and recent literature, it is still hard to develop an RT code from scratch within a few weeks. It is equally hard to understand, not to mention modify, an existing “monster” RT code, for which the developer is either located remotely or has retired. Following the format of “Numerical Recipes” by Press et al., we collocate in this paper small pieces of necessary theory with corresponding small pieces of RT code. These are arranged in an order that is natural for code development, which is often the opposite of the natural order for laying out the theoretical basis. We focus on the transfer of unpolarized monochromatic solar radiation in a plane-parallel atmosphere over a reflecting surface. Both the surface and the atmosphere are homogeneous (uniform) in all directions. The multiple scattering is numerically solved using the deterministic method of Gauss-Seidel iterations. Except for the presented Python-Numba open-source RT code gsit, the paper does not report any new scientific results, but rather serves as an academic demonstration. If development time is an issue or the reader is familiar with basic concepts of RT theory, we recommend proceeding directly to Sec. 3, “RT code development”.
Program summary
Program title: gsit (pronounced “jeezit”)
CPC Library link to program files: https://doi.org/10.17632/d3zt5zhx49.1
Developer's repository link: https://github.com/korkins/gsit
Licensing provisions: MIT
Programming language: Python 3
Nature of problem: We present a tutorial in Python code for deterministic (non-stochastic) numerical simulation of multiple scattering of monochromatic solar light in a plane-parallel Earth atmosphere bounded from below by a reflecting surface. The problem is solved in a simplified form (i.e., uniform atmosphere, no polarization, uniform surface reflectance, etc.) to better explain numerical features, rather than physics, of propagation of light in the atmosphere.
Solution method: The method of Gauss-Seidel iterations. It relies on the Fourier decomposition of the Radiative Transfer Equation over azimuth, Gauss quadrature for numerical integration over the zenith, and an iterative process for integration over height (optical depth), with the analytical (hence known) single scattering approximation being the starting point. The method is relatively simple to code and does not require any external libraries.
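The Gauss-Seidel principle named above — sweep through the unknowns, immediately reusing each freshly updated value within the same sweep — is easy to show on a generic linear system. A minimal C sketch follows (a toy diagonally dominant system, not the gsit RT solver, which applies the same iteration to the discretized RT equation):

/* Generic Gauss-Seidel iteration for A x = b (illustrative only). */
#include <math.h>
#include <stdio.h>

#define N 3

int main(void) {
    double A[N][N] = {{4, -1, 0}, {-1, 4, -1}, {0, -1, 4}};  /* diagonally dominant */
    double b[N] = {15, 10, 10};
    double x[N] = {0, 0, 0};

    for (int sweep = 0; sweep < 100; sweep++) {
        double maxdiff = 0.0;
        for (int i = 0; i < N; i++) {
            double s = b[i];
            for (int j = 0; j < N; j++)
                if (j != i) s -= A[i][j] * x[j];  /* x[j] for j < i is already updated */
            double xnew = s / A[i][i];
            maxdiff = fmax(maxdiff, fabs(xnew - x[i]));
            x[i] = xnew;                          /* update in place */
        }
        if (maxdiff < 1e-12) break;               /* converged */
    }
    printf("x = (%.6f, %.6f, %.6f)\n", x[0], x[1], x[2]);
    return 0;
}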
... We further optimized our code using single-core SIMD vectorization with the AVX2 standard. Rather than attempt to utilize esoteric instructions (e.g., _mm_permute_ps), we follow a straightforward best-practice guideline [Pü11], which advises rearranging the input data layout to accommodate trivial vectorization of each floating point operator in the scalar code. For example, the scalar product a = b * c becomes the 8-wide vectorized product a = _mm256_mul_ps(b,c). ...
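A minimal C sketch of the practice the excerpt describes (function names are illustrative): once the data sits in a structure-of-arrays layout, each scalar operation maps one-to-one onto an 8-wide AVX2 intrinsic. Compile with -mavx2.

/* Trivial vectorization of an elementwise product with AVX2. */
#include <immintrin.h>
#include <stdio.h>

void mul_scalar(const float *b, const float *c, float *a, int n) {
    for (int i = 0; i < n; i++)
        a[i] = b[i] * c[i];                 /* one float at a time */
}

void mul_avx2(const float *b, const float *c, float *a, int n) {
    for (int i = 0; i < n; i += 8) {        /* assumes n % 8 == 0 */
        __m256 vb = _mm256_loadu_ps(b + i); /* load 8 floats */
        __m256 vc = _mm256_loadu_ps(c + i);
        __m256 va = _mm256_mul_ps(vb, vc);  /* 8 multiplies at once */
        _mm256_storeu_ps(a + i, va);
    }
}

int main(void) {
    float b[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float c[8] = {2, 2, 2, 2, 2, 2, 2, 2};
    float a[8];
    mul_avx2(b, c, a, 8);
    printf("a[0]=%g a[7]=%g\n", a[0], a[7]);
    return 0;
}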
Article
Full-text available
Across computer graphics, vision, robotics and simulation, many applications rely on determining the 3D rotation that aligns two objects or sets of points. The standard solution is to use singular value decomposition (SVD), where the optimal rotation is recovered as the product of the singular vectors. Faster computation of only the rotation is possible using suitable parameterizations of the rotations and iterative optimization. We propose such a method based on the Cayley transformations. The resulting optimization problem allows a better local quadratic approximation compared to the Taylor approximation of the exponential map. This results in both faster convergence and a more stable approximation compared to other iterative approaches. It also maps well to AVX vectorization. We compare our implementation with a wide range of alternatives on real and synthetic data. The results demonstrate up to two orders of magnitude of speedup compared to a straightforward SVD implementation and a 1.5-6 times speedup over popular optimized code.
... Therefore, programmers always aim to reduce and bound them. For instance, [19] provides general techniques, demonstrated on matrix-matrix multiplication and the discrete Fourier transform (DFT), for improving any code containing numerical evaluations, with an emphasis on optimisations for the computer's memory hierarchy so that the code fits the platforms of modern processors. ...
Conference Paper
Full-text available
We present an optimised software (GEOWARE) for determination of high-frequency geoid height using terrestrial gravity measurements. The optimisation of the Stokes integral is based on the extraction of a local area with a radius of a few hundred kilometres around the computation point, which complies with the specified spherical cap sizes. The extraction step is highly important because it detaches the dispensable compartments of the grid which are far from the computation domain. That makes it possible to avoid passing through the compartments of the entire grid to test whether the spherical distances comply with the truncated cap size or not. Matlab relational operators and vectorisation are powerful optimisation tools because they can replace conditional statements and nested loops efficiently. GEOWARE has been compared with a non-optimised code over different cap sizes and shows a significant improvement in performance. The run time of GEOWARE for all cap sizes is up to 5 times smaller than that of the code before optimisation. GEOWARE is also compatible with modified Stokes, Newton and Poisson kernels.
... A quick Google search yielded numerous lecture notes and/or homework exercises that utilize this operation [11], [12]. What these materials have in common is that they cite a number of insightful papers [13], [14], [15], [16], [17], [18]. We ourselves created the "how-to-optimize-gemm" wiki [19] and a sandbox that we call BLISlab [20] that build upon our BLAS-like Library Instantiation Software [21], [22] refactoring of the GotoBLAS approach [13] to implementing MMM. ...
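For orientation, the core idea behind the cited GotoBLAS-style approaches is cache blocking of the three MMM loops. A bare-bones C sketch follows; the block size NB is an illustrative tuning parameter, and real GotoBLAS/BLIS kernels add data packing, register blocking and SIMD micro-kernels on top of this.

/* Cache-blocked C += A*B for square n x n row-major matrices. */
#define NB 64   /* block size: an illustrative tuning parameter */

void gemm_blocked(int n, const double *A, const double *B, double *C) {
    for (int ii = 0; ii < n; ii += NB)
        for (int kk = 0; kk < n; kk += NB)
            for (int jj = 0; jj < n; jj += NB)
                /* Multiply one NB x NB block; operands stay in cache. */
                for (int i = ii; i < ii + NB && i < n; i++)
                    for (int k = kk; k < kk + NB && k < n; k++) {
                        double aik = A[i * n + k];
                        for (int j = jj; j < jj + NB && j < n; j++)
                            C[i * n + j] += aik * B[k * n + j];
                    }
}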
... Recent mainstream commodity CPUs enable us to build inexpensive computing systems with computational power similar to that of supercomputers just ten years ago. However, these advances in hardware performance result from the increasing complexity of the computer architecture, and they actually increase the difficulty of fully utilizing the available computational power for a specific application [4]. This paper focuses on fully utilizing the computing power of modern CPUs by code optimization and parallelization for specific hardware, enabling the real-time complete ACCC application for practical power grids on commodity computing systems. ...
Article
Full-text available
Multi-core CPUs with multiple levels of parallelism (i.e. data level, instruction level and task/core level) have become the mainstream CPUs for commodity computing systems. Based on the multi-core CPUs, in this paper we develop a high performance computing framework for AC contingency calculation (ACCC) to fully utilize the computing power of commodity systems for online and real-time applications. Using a Woodbury matrix identity based compensation method, we transform and pack multiple contingency cases of different outages into a fine-grained vectorized data parallel programming model. We implement the data parallel programming model using the SIMD instruction extension on x86 CPUs, thereby taking full advantage of the CPU cores' SIMD floating point capability. We also implement a thread pool scheduler for ACCC on multi-core CPUs which automatically balances the computing load across CPU cores to fully utilize the multi-core capability. We test the ACCC solver on the IEEE test systems and on the Polish 3000-bus system using a quad-core Intel Sandy Bridge CPU. The optimized ACCC solver achieves close to linear speedup (SIMD width multiplied by core count) compared to the scalar implementation and is able to solve a complete N-1 line outage AC contingency calculation of the Polish grid within one second on a commodity CPU. It enables the complete ACCC as a real-time application on commodity computing systems.
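For reference, the Woodbury matrix identity behind such compensation methods is the following: an outage modifies the system matrix A by a low-rank term UCV, so the factorization of the base-case A can be reused instead of refactorizing for each contingency.

(A + UCV)⁻¹ = A⁻¹ − A⁻¹ U (C⁻¹ + V A⁻¹ U)⁻¹ V A⁻¹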
... Exact dimensions are favorable when writing code constructs that allow performance-specific programming. Specifically, as shown in [10], one can adjust the code to the size of the available memory given by the hardware. For instance, by applying blocking, loop merging, scheduling, and buffering techniques, the compiled code can execute much faster. ...
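Of the techniques listed, loop merging (fusion) is the simplest to show. A generic C sketch, not taken from [10]: two traversals of the same array are combined into one, so each element is loaded from memory once instead of twice.

/* Loop merging (fusion): one pass over x instead of two. */
void separate(int n, const double *x, double *y, double *z) {
    for (int i = 0; i < n; i++) y[i] = 2.0 * x[i];   /* first traversal  */
    for (int i = 0; i < n; i++) z[i] = x[i] + 1.0;   /* second traversal */
}

void fused(int n, const double *x, double *y, double *z) {
    for (int i = 0; i < n; i++) {   /* x[i] is loaded from memory once */
        y[i] = 2.0 * x[i];
        z[i] = x[i] + 1.0;
    }
}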
... With these pieces of information one can tailor the code to the given hardware and extract the maximum performance from it. This is so-called parameter-based performance tuning, and according to [10] there are three levels of hardware-dependent optimization: ...
Article
The paper presents a review of active set (AS) algorithms that have been deployed for the implementation of fast model predictive control (MPC). The main purpose of the survey is to identify the dominant features of the algorithms that contribute to fast execution of online MPC and to study their influence on the speed. The simulation study is conducted on two benchmark examples where the algorithms are analyzed in terms of the number of iterations and the workload per iteration. The obtained results suggest directions for potential improvement in the speed of existing AS algorithms. Copyright © 2014 John Wiley & Sons, Ltd.
... There are different ways (and levels) to represent the organization of a computer. However, in recent years the perception is that this trend is languishing and that the era of free speedup for legacy code is coming to an end [26]. Since 2005, CPU frequencies have stalled at around 2 to 3 GHz, and many factors are limiting the growth of achievable single-core performance. The first important factor is the non-linear growth of power consumption as the clock rate increases, which at the levels computers have now reached is completely unacceptable (at 130 W, air cooling systems are no longer practical). Clock rates stopped climbing because the growth in power had to be arrested, and because of the emergence of mobile and embedded computing, where the need for low power consumption is much more important than it is for desktops and servers. ...
... 26 shows how each of these terms contributes to detecting the most plausible configurations. ...
Thesis
Full-text available
This thesis proposes a computer vision system for detecting and tracking multiple targets in videos. The covariance matching method is the guiding thread of our work because it offers a compact representation of the target by embedding heterogeneous features in an elegant way. Therefore, it is efficient both for tracking and recognition. Four categories of contributions are proposed. The first one deals with adaptation to a changing context, following two aspects. A preliminary work consists in adapting color according to lighting variations and the relevance of the color. Then, the literature shows a wide variety of tracking methods, which have both advantages and limitations, depending on the object to track and the context. Here, a deterministic method is developed to automatically adapt the tracking method to the context through the cooperation of two complementary techniques. A first proposition combines covariance matching, which models texture-color information, with optical flow (KLT) over a set of points uniformly distributed on the object. A second technique associates covariance and Mean-Shift. In both cases, the cooperation allows a good robustness of the tracking whatever the nature of the target, while reducing the global execution times. The second contribution is the definition of descriptors that are both discriminative and compact to be included in the target representation. To improve the visual recognition ability of the descriptors, two approaches are proposed. The first is an adaptation of Local Binary Pattern (LBP) operators for inclusion in the covariance matrices; this method is called ELBCM, for Enhanced Local Binary Covariance Matrices. The second approach is based on the analysis of different color spaces and invariants to obtain a descriptor which is discriminating and robust to illumination changes. The various experiments in tracking and recognition (texture, faces, pedestrians) show very promising results. The third contribution addresses the problem of multi-target tracking, the difficulties of which are the matching ambiguities, the occlusions, and the merging and division of trajectories. We also propose the re-identification of targets using a set of spatially adapted covariance descriptors and the minimization of a discrete energy function that takes into account the kinematic behavior of the objects and models their appearance. Finally, to speed up the algorithms and provide a quick, usable solution for embedded applications, this thesis proposes a series of optimizations to accelerate the matching using covariance matrices. Data layout transformations, vectorization of the calculations (using SIMD instructions) and some loop transformations have made possible the real-time execution of the algorithm not only on classic Intel platforms but also on embedded platforms (ARM Cortex A9 and Intel U9300).
... Out-of-order execution re-orders instructions according to their dependencies, and independent instructions within an instruction dispatch window can be executed simultaneously on multiple functional units. Code optimization techniques such as loop unrolling, mixing independent instructions, using bigger un-branched code blocks, etc. can be used to exploit the instruction-level parallelism on superscalar hardware architectures [4] [6]. 2. Data level: Single Instruction Multiple Data (SIMD): The Streaming SIMD Extensions (SSE) or the Advanced Vector eXtensions (AVX) instruction sets on Intel or AMD's x86 CPUs can perform floating point arithmetic operations on 4 (SSE) or 8 (AVX) single-precision floating point values packed in a vector register at the same time. ...
... In most sparse solvers, the traversal over the sparse matrix is guided by nested loops; for example, the upper part of Fig. 13 shows the traversal of a sparse matrix in compressed column storage format. The nested loops with only a few operations in the body result in unpredictable branches, limit out-of-order execution and instruction reordering, and hamper efficient register allocation and instruction scheduling [4]. In order to optimize the performance of the sparse solver at the instruction level, we employ aggressive loop unrolling to combine consecutive columns into bigger non-looping, non-branching code blocks. ...
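A simplified C sketch of the kind of transformation described (a compressed-column sparse matrix-vector product with two columns combined per outer iteration; illustrative only, not the authors' solver code):

/* y += A*x with A in compressed column storage (colptr, rowind, val). */
void spmv_csc(int n, const int *colptr, const int *rowind,
              const double *val, const double *x, double *y) {
    for (int j = 0; j < n; j++)                   /* one column per iteration */
        for (int p = colptr[j]; p < colptr[j + 1]; p++)
            y[rowind[p]] += val[p] * x[j];
}

void spmv_csc_unroll2(int n, const int *colptr, const int *rowind,
                      const double *val, const double *x, double *y) {
    int j = 0;
    for (; j + 1 < n; j += 2) {                   /* two columns per iteration:
                                                     a bigger straight-line body */
        double xj0 = x[j], xj1 = x[j + 1];
        for (int p = colptr[j]; p < colptr[j + 1]; p++)
            y[rowind[p]] += val[p] * xj0;
        for (int p = colptr[j + 1]; p < colptr[j + 2]; p++)
            y[rowind[p]] += val[p] * xj1;
    }
    for (; j < n; j++)                            /* remainder column, if n is odd */
        for (int p = colptr[j]; p < colptr[j + 1]; p++)
            y[rowind[p]] += val[p] * x[j];
}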
Conference Paper
Full-text available
Large-scale integration of stochastic energy resources in power systems requires probabilistic analysis approaches for comprehensive system analysis. The widely varying grid conditions on aging and stressed power system infrastructures also require merging offline security analyses into online operation. Meanwhile, in computing, the recent rapid growth in hardware performance comes from increasingly complicated architectures, and fully utilizing the computing power for specific applications has become very difficult. Given the challenges and opportunities in both the power system and the computing fields, this paper presents unique commodity high performance computing system solutions to the following fundamental tools for power system probabilistic and security analysis: 1) a high performance Monte Carlo simulation (MCS) based distribution probabilistic load flow solver for real-time distribution feeder probabilistic solutions; 2) a high performance MCS based transmission probabilistic load flow solver for transmission grid probabilistic analysis; 3) a SIMD-accelerated AC contingency calculation solver based on the Woodbury matrix identity on multi-core CPUs. By aggressive algorithm-level and computer-architecture-level performance optimizations, including optimized data structures, optimization for superscalar out-of-order execution, SIMDization, and multi-core scheduling, our software fully utilizes modern commodity computing systems, making the critical and computationally intensive power system probabilistic and security analysis problems solvable in real time on commodity computing systems.
... The performance capabilities of modern computing platforms have been growing rapidly over the last several decades at a roughly exponential rate [8] [9]. The new mainstream multi-core CPUs and graphics cards (GPUs) enable us to build inexpensive systems with computational power similar to that of supercomputers about a decade ago [8]. ...
Conference Paper
Full-text available
Multi-core CPUs with multiple levels of parallelism and deep memory hierarchies have become the mainstream computing platform. In this paper we develop a generally applicable high performance computing framework for Monte Carlo simulation (MCS) type applications in distribution systems, taking advantage of the performance-enhancing features of multi-core CPUs. The application in this paper is to solve the probabilistic load flow (PLF) in real time, in order to cope with the uncertainties caused by the integration of renewable energy resources. By applying various performance optimizations and multi-level parallelization, the optimized MCS solver is able to achieve more than 50% of a CPU's theoretical peak performance, and the performance is scalable with the hardware parallelism. We test the MCS solver on the IEEE 37-bus test feeder using a new Intel Sandy Bridge multi-core CPU. The optimized MCS solver is able to solve millions of load flow cases within a second, enabling the real-time Monte Carlo solution of the PLF.