Figure 7 - available via license: CC BY
Comparison of the number of cores in CPUs and GPUs. The CPU and GPU models have been selected for different targets, e.g., personal computers or servers, and for different price ranges. For the CPUs, the gray and black lines correspond to the minimum and the maximum number of cores of a family, respectively. For the GPUs, the black lines represent the number of CUDA cores, and the gray line represents the number of Tensor cores, which are present only in the Nvidia Tesla V100.


Source publication
Article
Full-text available
Deep Neural Networks (DNNs) are nowadays common practice in most Artificial Intelligence (AI) applications. Their ability to go beyond human precision has made these networks a milestone in the history of AI. However, while on the one hand they deliver cutting-edge performance, on the other hand they require enormous computing power. For t...

Contexts in source publication

Context 1
... their throughput is limited by the small number of cores and, therefore, by the small number of operations executable in parallel. Figure 7 compares the number of cores of CPUs and GPUs. The Intel Xeon Platinum 9222, a high-end server processor with a price over USD 10,000, has a number of floating-point operations per second per Watt (FLOPS/W) similar to the FLOPS/W of the 2014 Nvidia GT 740 GPU, priced below USD 100 (∼12 GFLOPS/W). ...
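The FLOPS/W figure quoted in that context can be reproduced by dividing peak arithmetic throughput by the thermal design power. The sketch below does this in Python for the GT 740; the core count, clock, and TDP are assumptions taken from public vendor specifications, not values given in the article.

```python
# Rough energy-efficiency estimate: peak single-precision throughput / TDP.
# Spec values are assumptions from public datasheets, not from the cited article.

def gflops_per_watt(peak_gflops: float, tdp_watts: float) -> float:
    """Return energy efficiency in GFLOPS per Watt."""
    return peak_gflops / tdp_watts

# Nvidia GT 740 (assumed specs): 384 CUDA cores x 2 FP32 ops/cycle x ~0.993 GHz, 64 W TDP
gt740_peak = 384 * 2 * 0.993          # ~762 GFLOPS
print(f"GT 740: {gflops_per_watt(gt740_peak, 64):.1f} GFLOPS/W")  # ~11.9, i.e. the ~12 GFLOPS/W above
```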
Context 2
... are the current workhorses for DNN inference and, especially, training. They contain up to thousands of cores (see Figure 7) to work efficiently on highly parallel algorithms. Matrix multiplications, the core operations of DNNs, belong to this class of parallel algorithms. ...
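To make the parallelism claim concrete: in a matrix multiplication every output element depends only on one row of the first operand and one column of the second, so all output elements can be computed independently, one per core or thread. The snippet below is a generic NumPy sketch of this structure, not code from the source publication.

```python
import numpy as np

def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Naive matrix multiplication. Each C[i, j] uses only row i of A and
    column j of B, so all M*N output elements are independent work items,
    which is exactly the parallelism a GPU with thousands of cores exploits."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(M):            # independent across i ...
        for j in range(N):        # ... and across j: no cross-iteration dependency
            C[i, j] = np.dot(A[i, :], B[:, j])
    return C

A = np.random.rand(64, 32).astype(np.float32)
B = np.random.rand(32, 48).astype(np.float32)
assert np.allclose(matmul_naive(A, B), A @ B, atol=1e-4)
```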

Similar publications

Conference Paper
Full-text available
In machine learning, data is usually represented in a (flat) Euclidean space where distances between points are along straight lines. Researchers have recently considered more exotic (non-Euclidean) Riemannian manifolds such as hyperbolic space which is well suited for tree-like data. In this paper, we propose a representation living on a pseudo-Ri...

Citations

... NNs [26] have been successfully applied to several data types: image, text, and audio [27]. Increasing the number of deep layers in an NN can lead to better performance, and improvements in hardware capabilities have fueled this idea [28]. Researchers have designed NN architectures for tabular data, yet GBDTs show superior performance [29] with less hyperparameter tuning [8]. ...
Preprint
Full-text available
Federated Learning (FL) is a privacy-aware machine learning paradigm. It was initially designed to fit parametric models, namely Neural Networks (NNs), and thus it has excelled on image, audio, and text tasks. However, FL for tabular data still receives little attention. Tree-Based Models (TBMs) perform better than NNs on tabular data in a centralized setting, and are starting to see FL integrations. In this paper, we evaluate federated TBMs and NNs for horizontal FL, with varying data partitions, on 31 datasets. We propose treesXnets - a unified benchmarking tool for federated evaluation. treesXnets' results capture model performance, e.g., accuracy, communication effort, model training duration, and device utilization. A cyclic implementation of federated XGBoost is the best performing model, outperforming the best federated NNs by 5-10% in terms of accuracy and regression error. It is also faster, and requires less communication and memory than other federated XGBoost models.
... In today's digital era, the role of memory technology has become integral, particularly in the context of artificial intelligence (AI) applications. Several AI tasks, such as classification, recognition, natural language processing, prediction etc., heavily rely on large-scale memory storage and processing capability of the underlying hardware [1][2][3] . Conventional memory technologies like Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM) are energy-inefficient for implementing these data-intensive applications due to their volatile nature, resulting in dynamic as well as static power dissipation. ...
Article
Full-text available
The ability to scale two-dimensional (2D) material thickness down to a single monolayer presents a promising opportunity to realize high-speed energy-efficient memristors. Here, we report an ultra-fast memristor fabricated using atomically thin sheets of 2D hexagonal Boron Nitride, exhibiting the shortest observed switching speed (120 ps) among 2D memristors and low switching energy (2 pJ). Furthermore, we study the switching dynamics of these memristors using ultra-short (120 ps-3 ns) voltage pulses, a frequency range that is highly relevant in the context of modern complementary metal oxide semiconductor (CMOS) circuits. We employ statistical analysis of transient characteristics to gain insights into the memristor switching mechanism. Cycling endurance data confirms the ultra-fast switching capability of these memristors, making them attractive for next generation computing, storage, and Radio-Frequency (RF) circuit applications.
... A major task in the present-day scenario is how best to organize a huge gallery of personal photos, assigning each photo to the respective album based on the event type and the specific environment [2]. Examples of such events are "Concert", "Exhibition", "Fashion", "Graduation", "Mountain Trip", "Sea Holiday", "Sport", and "Wedding" [3]. Assigning labels to these eight event albums is performed either manually or through other cues, such as location, exploited by a specific algorithm. Most of the analysis, however, relies on content-based image analysis [4], which has been introduced in recent times to organize photos by selecting the images related to a particular event, so that images are classified based on specific interests and preserve nice memories of our lives [5]. ...
Chapter
A major objective of this book series is to drive innovation in every aspect of Artificial Intelligence. It offers researchers, educators, and students the opportunity to discuss and share ideas on topics, trends, and developments in the fields of artificial intelligence, machine learning, deep learning, big data, computer science, computational intelligence, and technology. It aims to bring together experts from various disciplines to emphasize the dissemination of ongoing research in the fields of science and computing, computational intelligence, schema recognition, and information retrieval. The content of the book is as follows
... SC uses simple logic gates to perform arithmetic operations by exploiting probability mathematics to compute in the probability domain. Nowadays, with the exhaustive multiply-accumulate (MAC) operations of computing algorithms such as convolutional neural networks (CNNs) being deployed as part of artificial intelligence (AI) edge computing, the binary computing architecture itself becomes the primary bottleneck due to limited memory bandwidth [3]. SC is regaining interest because it is suitable for AI computation, especially the CNN algorithm. ...
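The probability-domain arithmetic mentioned in that excerpt can be illustrated in a few lines: in the unipolar encoding, a value in [0, 1] is represented by the probability of a 1 in a random bitstream, and a single AND gate per bit pair multiplies two such values. This is a generic textbook sketch of stochastic computing, not code from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bitstream(p: float, length: int) -> np.ndarray:
    """Unipolar stochastic encoding: each bit is 1 with probability p."""
    return (rng.random(length) < p).astype(np.uint8)

def sc_multiply(a: float, b: float, length: int = 100_000) -> float:
    """Multiply two values in [0, 1] using a bitwise AND of their bitstreams."""
    product_stream = to_bitstream(a, length) & to_bitstream(b, length)
    return float(product_stream.mean())   # decode: fraction of 1s in the result

print(sc_multiply(0.8, 0.5))  # ~0.4; accuracy improves with longer bitstreams
```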
Article
Full-text available
Stochastic computing (SC) has a substantial amount of study on application-specific integrated circuit (ASIC) design for artificial intelligence (AI) edge computing, especially the convolutional neural network (CNN) algorithm. However, SC has little to no optimization on field-programmable gate array (FPGA). Scaling up the ASIC logic without FPGA-oriented designs is inefficient, while aggregating thousands of bitstreams is still challenging in the conventional SC. This research has reinvented several FPGA-efficient 8-bit SC CNN computing architectures, i.e., SC multiplexer multiply-accumulate, multiply-accumulate function generator, and binary rectified linear unit, and successfully scaled and implemented a fully parallel CNN model on Kintex7 FPGA. The proposed SC hardware only compromises 0.14% accuracy compared to binary computing on the handwriting Modified National Institute of Standards and Technology classification task and achieved at least 99.72% energy saving per image feedforward and 31× more data throughput than modern hardware. Unique to SC, early decision termination pushed the performance baseline exponentially with minimum accuracy loss, making SC CNN extremely lucrative for AI edge computing but limited to classification tasks. The SC’s inherent noise heavily penalizes CNN regression performance, rendering SC unsuitable for regression tasks.
... Convolutional neural networks (CNNs) show outstanding performance in various fields such as face recognition [4], image classification [5], and natural language processing [6]. As one of the fundamental components of CNNs, convolutional computation usually consumes a large amount of computational resources, so it is desirable to accelerate it through specialized devices [7]. Compared with electrical convolutional computation based on graphics processing units (GPUs), tensor processing units (TPUs), or field-programmable gate arrays (FPGAs), optical convolutional computation has significant advantages in speed and energy consumption [8]. ...
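For readers less familiar with why convolution dominates the cost of a CNN, the sketch below spells out the operation: a small kernel is slid over the input and one multiply-accumulate is performed per kernel element and output position. It is a generic single-channel NumPy example, not code from the cited works.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 'valid' 2D convolution (cross-correlation, as used in CNNs).
    Cost: out_h * out_w * kh * kw multiply-accumulates, which is the workload
    offloaded to GPUs, TPUs, FPGAs, or optical accelerators."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(28, 28).astype(np.float32)
edge_kernel = np.array([[1, 0, -1]] * 3, dtype=np.float32)  # toy horizontal-edge filter
print(conv2d_valid(image, edge_kernel).shape)  # (26, 26)
```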
Article
Full-text available
Optical complex-valued convolution can extract the feature of complex-valued data by processing both amplitude and phase information, enabling a wide range of future applications in artificial intelligence and high-speed optical computation. However, because optical signals at different wavelengths cannot interfere, optical systems based on wavelength multiplexing usually can only realize real-valued computation. Here, we experimentally demonstrate an all-optical computing scheme using Kerr-based optical four-wave mixing (FWM) that can perform complex-valued convolution of multi-wavelength signals. Specifically, this all-optical complex-valued convolution operation can be implemented based on the coherent superposition of converted light generated by multiple FWM processes. The computational throughput of this scheme can be expanded by increasing the number of optical wavelengths and the signal baud rate. To exemplify the application, we successfully applied this all-optical complex-valued convolution to four different orientations of image edge extraction. Our scheme can provide a basis for wavelength-parallel optical computing systems with the demanded complex-valued computation capability.
... In the ongoing pursuit of AI leadership, model sizes have escalated from millions to billions of parameters, as seen in OpenAI GPT models. Google reported the GLaM model with over 1 trillion parameters (compared to GPT-3's 175 billion parameters) [115]. The primary challenges associated with these massive models are training costs and their deployment on smaller devices. ...
Article
Full-text available
Middle Eastern social standards promote familial affinity, respect for tradition, and strong family relationships. As a result, while choosing a career path, Lebanese youth are more likely to consider family expectations and responsibilities. This study intends to provide insights into the ways family influences career decisions to minimize delays in the student’s intended career paths. We surveyed 113 students from various majors at a large urban private university to better understand the relationship between family influence and career decisions. Our research employs a mixed-methods approach to obtain a comprehensive understanding of family involvement in college students’ career decisions and its effects on professional awareness and development. Our findings indicated that parents demonstrated their involvement and support for their children in terms of influence, academic engagement, and career choice. Both parents were active in their child’s career choices, making them the key influences. Family influence was also connected to career-related decisions, career satisfaction, and motivation. Parents’ financial situation and expectations also influenced their children’s decisions, either directly or indirectly. Due to the availability or absence of resources, the socioeconomic level of the family influences the child’s occupational choice. According to the data we gathered, males and females were equally impacted by their parents. Females’ first preference was the mother, followed by the father. Males prioritized the father, who was closely followed by the mother.
Article
In the past decade, a substantial increase in medical data from various sources, including wearable sensors, medical imaging, personal health records, and public health organizations, has propelled advancements in the medical sciences. The evolution of computational hardware, such as cloud computing, GPUs, FPGAs, and TPUs, has enabled the effective utilization of this vast amount of data. Consequently, sophisticated AI techniques have been developed to extract valuable insights from healthcare datasets. This article provides a comprehensive overview of recent developments in AI and biosensors within the medical and life sciences. The review highlights the role of machine learning in key areas such as medical imaging, precision medicine, and biosensors designed for the Internet of Things (IoT). Emphasis is placed on the latest progress in wearable biosensing technologies, where AI plays a pivotal role in monitoring electro-physiological and electro-chemical signals and aiding in disease diagnosis. These advancements underscore the growing trend towards personalized medicine, offering precise and cost-efficient point-of-care treatment. Additionally, the article delves into the advancements in computing technologies, including accelerated AI, edge computing, and federated learning specifically tailored for medical data. The challenges associated with data-driven AI approaches, potential issues arising from biosensors and IoT-based healthcare, and distribution shifts among different data modalities are thoroughly explored. The discussion concludes with insights into future prospects in the field.
... Pertaining to NN-based systems, state-of-the-art hardware accelerators are tremendously resource intensive due to the massive amount of processing elements needed to effectively accelerate the computations [18]. Multipliers are recognized as the most demanding components in terms of overhead, both for silicon area and for power consumption [19]; consequently, a significant number of libraries providing hundreds of implementations of approximate components delivering different trade-offs between error and hardware resource requirements, such as the EvoApprox-Lib [20], have been proposed. ...
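As a toy illustration of the error/hardware trade-off behind such approximate-component libraries, the sketch below models a truncation-based approximate multiplier: dropping low-order operand bits shrinks the (hypothetical) circuit at the cost of a value-dependent output error. It is a generic example, not an implementation from EvoApprox-Lib or the cited works.

```python
def approx_multiply(a: int, b: int, drop_bits: int = 4) -> int:
    """Toy approximate multiplier: truncate the low-order bits of both operands
    before multiplying; a smaller multiplier circuit, but a value-dependent error."""
    mask = ~((1 << drop_bits) - 1)
    return (a & mask) * (b & mask)

exact = 50_000 * 31_000
approx = approx_multiply(50_000, 31_000, drop_bits=4)
print(exact, approx, f"relative error = {abs(exact - approx) / exact:.4%}")
```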
Article
Full-text available
During the last decade, classification systems (CSs) received significant research attention, with new learning algorithms achieving high accuracy in various applications. However, their resource-intensive nature, in terms of hardware and computation time, poses new design challenges. CSs exhibit inherent error resilience, due to redundancy of training sets, and self-healing properties, making them suitable for Approximate Computing (AxC). AxC enables efficient computation by using reduced precision or approximate values, leading to energy, time, and silicon area savings. Exploiting AxC involves estimating the introduced error for each approximate variant found during a Design-Space Exploration (DSE). This estimation has to be both rapid and meaningful, considering a substantial number of test samples, which are utterly conflicting demands. In this paper, we investigate the sources of error resiliency of CSs, and we propose a technique to hasten the DSE that reduces the computational time for error estimation by systematically reducing the test set. In particular, we cherry-pick samples that are likely to be more sensitive to approximation and perform accuracy-loss estimation just by exploiting such a sample subset. In order to demonstrate its efficacy, we integrate our technique into two different approaches for generating approximate CSs, showing an average speed-up of up to ≈18×.
... The findings of this research are particularly relevant in the context of embedded system design, where there is a need to achieve high precision of neural network output while dealing with limited hardware resources [2,3], such as memory and computational capacity. Thus, this research highlights a promising avenue for improving the performance and efficiency of spiking neural networks, which is a critical factor in enabling their adoption in real-world applications. ...
Article
Full-text available
This research investigates the implementation of complex-exponential-based neurons in FPGA, which can pave the way for implementing bio-inspired spiking neural networks to compensate for the existing computational constraints in conventional artificial neural networks. The increasing use of extensive neural networks and the complexity of models in handling big data lead to higher power consumption and delays. Hence, finding solutions to reduce computational complexity is crucial for addressing power consumption challenges. The complex exponential form effectively encodes oscillating features like frequency, amplitude, and phase shift, streamlining the demanding calculations typical of conventional artificial neurons by leveraging the simple phase addition of complex exponential functions. The article implements such a two-neuron and a multi-neuron neural model using the Xilinx System Generator and Vivado Design Suite, employing 8-bit, 16-bit, and 32-bit fixed-point data format representations. The study evaluates the accuracy of the proposed neuron model across different FPGA implementations while also providing a detailed analysis of operating frequency, power consumption, and resource usage for the hardware implementations. BRAM-based Vivado designs outperformed Simulink regarding speed, power, and resource efficiency. Specifically, the Vivado BRAM-based approach supported up to 128 neurons, showcasing optimal LUT and FF resource utilization. Such outcomes facilitate choosing the optimal design procedure for implementing spiking neural networks on FPGAs.
... The best performing models use deep convolutional neural network (CNN) architecture (Zaidi et al., 2022). They generally require a large amount of labelled data to train for accurate performance and have traditionally required powerful computing hardware to operate at practically useful speeds (Capra et al., 2020). Recent developments have made it easier to incorporate neural network-based object detection in lightweight electronics devices (Zaidi et al., 2022;Zhao et al., 2020). ...
Article
Full-text available
The use of autonomous underwater vehicles (AUVs) for surveying underwater infrastructure presents a potential cost saving in comparison to remotely operated vehicles (ROVs). One of the challenges when processing images of underwater structures captured by an AUV is that the vast number of images captured during the mission usually do not show the structure, for instance images captured during the dive to the structure, of the sea floor, or of the deep sea facing away from the structure. Too many images without relevant information for a 3D reconstruction of the structure lead to increased processing time and issues during the reconstruction process. There are two solutions to reduce the images to only those showing the structure: either only images of the structure are captured in the first place, or images that are not useful are removed after capture and before further processing. This study developed and evaluated techniques that would enable the first strategy to be applied in an AUV. Applying this strategy in an AUV would require an on-board structure detection system to ensure that the vehicle is correctly orientated for capturing useful footage during a survey mission. However, the marine environment poses several challenges to image-based object detection. Furthermore, small AUVs have limited power and computational resources available while deployed on a mission. To investigate the suitability of creating a lightweight structure detection model for the purpose of image evaluation, three computationally efficient image feature extraction methods (colour moments, local binary patterns (LBP), and Haar wavelet decomposition) were evaluated for their ability to distinguish underwater structures from background areas using unsupervised k-means models. LBP was found to be an effective method for identifying underwater structures in open water conditions. For identifying a structure against the seabed, colour moments were identified as the most effective method.