Figure 7 - available via license: CC BY
Comparison of the number of cores in CPUs and GPUs. The CPU and GPU models have been selected for different targets, e.g., personal computers or servers, and for different price ranges. For the CPUs, the gray and black lines correspond to the minimum and the maximum number of cores of a family, respectively. For the GPUs, the black lines represent the number of CUDA cores, and the gray line represents the number of Tensor cores, which are present only in the Nvidia Tesla V100.


Source publication
Article
Full-text available
Deep Neural Networks (DNNs) are nowadays common practice in most Artificial Intelligence (AI) applications. Their ability to go beyond human precision has made these networks a milestone in the history of AI. However, while on the one hand they deliver cutting-edge performance, on the other hand they require enormous computing power. For t...

Contexts in source publication

Context 1
... their throughput is limited by the small number of cores and, therefore, by the small number of operations executable in parallel. Figure 7 compares the number of cores of CPUs and GPUs. The Intel Xeon Platinum 9222, a high-end server processor with a price over USD 10,000, has a number of floating-point operations per second per Watt (FLOPS/W) similar to the FLOPS/W of the 2014 Nvidia GT 740 GPU, priced below USD 100 (∼12 GFLOPS/W). ...
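The FLOPS/W figure quoted in that context can be reproduced by dividing peak arithmetic throughput by the thermal design power. The sketch below does this in Python for the GT 740; the core count, clock, and TDP are assumptions taken from public vendor specifications, not values given in the article.

```python
# Rough energy-efficiency estimate: peak single-precision throughput / TDP.
# Spec values are assumptions from public datasheets, not from the cited article.

def gflops_per_watt(peak_gflops: float, tdp_watts: float) -> float:
    """Return energy efficiency in GFLOPS per Watt."""
    return peak_gflops / tdp_watts

# Nvidia GT 740 (assumed specs): 384 CUDA cores x 2 FP32 ops/cycle x ~0.993 GHz, 64 W TDP
gt740_peak = 384 * 2 * 0.993          # ~762 GFLOPS
print(f"GT 740: {gflops_per_watt(gt740_peak, 64):.1f} GFLOPS/W")  # ~11.9, i.e. the ~12 GFLOPS/W above
```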
Context 2
... are the current workhorses for DNN inference and, especially, training. They contain up to thousands of cores (see Figure 7) to work efficiently on highly parallel algorithms. Matrix multiplications, the core operations of DNNs, belong to this class of parallel algorithms. ...
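To make the parallelism claim concrete: in a matrix multiplication every output element depends only on one row of the first operand and one column of the second, so all output elements can be computed independently, one per core or thread. The snippet below is a generic NumPy sketch of this structure, not code from the source publication.

```python
import numpy as np

def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Naive matrix multiplication. Each C[i, j] uses only row i of A and
    column j of B, so all M*N output elements are independent work items,
    which is exactly the parallelism a GPU with thousands of cores exploits."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(M):            # independent across i ...
        for j in range(N):        # ... and across j: no cross-iteration dependency
            C[i, j] = np.dot(A[i, :], B[:, j])
    return C

A = np.random.rand(64, 32).astype(np.float32)
B = np.random.rand(32, 48).astype(np.float32)
assert np.allclose(matmul_naive(A, B), A @ B, atol=1e-4)
```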

Similar publications

Conference Paper
Full-text available
In machine learning, data is usually represented in a (flat) Euclidean space where distances between points are along straight lines. Researchers have recently considered more exotic (non-Euclidean) Riemannian manifolds such as hyperbolic space which is well suited for tree-like data. In this paper, we propose a representation living on a pseudo-Ri...

Citations

... NNs [26] have been successfully applied to several data types: image, text, and audio [27]. Increasing the number of deep layers in an NN can lead to better performance, and improvements in hardware capabilities have fueled this idea [28]. Researchers have designed NN architectures for tabular data, yet GBDTs show superior performance [29] with less hyperparameter tuning [8]. ...
Preprint
Full-text available
Federated Learning (FL) is a privacy-aware machine learning paradigm. It was initially designed to fit parametric models, namely Neural Networks (NNs), and thus it has excelled on image, audio, and text tasks. However, FL for tabular data still receives little attention. Tree-Based Models (TBMs) perform better than NNs on tabular data in a centralized setting, and are starting to see FL integrations. In this paper, we evaluate federated TBMs and NNs for horizontal FL, with varying data partitions, on 31 datasets. We propose treesXnets - a unified benchmarking tool for federated evaluation. treesXnets' results capture model performance, e.g., accuracy, communication effort, model training duration, and device utilization. A cyclic implementation of federated XGBoost is the best performing model, outperforming the best federated NNs by 5-10% in terms of accuracy and regression error. It is also faster, and requires less communication and memory than other federated XGBoost models.
... In today's digital era, the role of memory technology has become integral, particularly in the context of artificial intelligence (AI) applications. Several AI tasks, such as classification, recognition, natural language processing, prediction etc., heavily rely on large-scale memory storage and processing capability of the underlying hardware [1][2][3] . Conventional memory technologies like Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM) are energy-inefficient for implementing these data-intensive applications due to their volatile nature, resulting in dynamic as well as static power dissipation. ...
Article
Full-text available
The ability to scale two-dimensional (2D) material thickness down to a single monolayer presents a promising opportunity to realize high-speed energy-efficient memristors. Here, we report an ultra-fast memristor fabricated using atomically thin sheets of 2D hexagonal Boron Nitride, exhibiting the shortest observed switching speed (120 ps) among 2D memristors and low switching energy (2 pJ). Furthermore, we study the switching dynamics of these memristors using ultra-short (120 ps-3 ns) voltage pulses, a frequency range that is highly relevant in the context of modern complementary metal oxide semiconductor (CMOS) circuits. We employ statistical analysis of transient characteristics to gain insights into the memristor switching mechanism. Cycling endurance data confirms the ultra-fast switching capability of these memristors, making them attractive for next generation computing, storage, and Radio-Frequency (RF) circuit applications.
... A major task in the present-day scenario is how best to organize a huge gallery of personal photos, assigning each photo to the respective album based on the event type and the specific environment [2]. Examples of such events are "Concert", "Exhibition", "Fashion", "Graduation", "Mountain Trip", "Sea Holiday", "Sport", and "Wedding" [3]. Assigning labels to these eight event albums is performed either manually or through other cues, such as location, exploited by a specific algorithm. Most of the analysis, however, relies on content-based image analysis [4], which has been introduced in recent times to organize photos by selecting the images related to a particular event, so that images are classified based on specific interests and preserve nice memories of our lives [5]. ...
Chapter
A major objective of this book series is to drive innovation in every aspect of Artificial Intelligence. It offers researchers, educators, and students the opportunity to discuss and share ideas on topics, trends, and developments in the fields of artificial intelligence, machine learning, deep learning, big data, computer science, computational intelligence, and technology. It aims to bring together experts from various disciplines to emphasize the dissemination of ongoing research in the fields of science and computing, computational intelligence, schema recognition, and information retrieval. The content of the book is as follows
... SC uses simple logic gates to perform arithmetic operations by exploiting probability mathematics to compute in the probability domain. Nowadays, with the exhaustive multiply-accumulate (MAC) operations of computing algorithms such as convolutional neural networks (CNNs) being deployed as part of artificial intelligence (AI) edge computing, the binary computing architecture itself becomes the primary bottleneck due to limited memory bandwidth [3]. SC is regaining interest because it is suitable for AI computation, especially the CNN algorithm. ...
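The probability-domain arithmetic mentioned in that excerpt can be illustrated in a few lines: in the unipolar encoding, a value in [0, 1] is represented by the probability of a 1 in a random bitstream, and a single AND gate per bit pair multiplies two such values. This is a generic textbook sketch of stochastic computing, not code from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bitstream(p: float, length: int) -> np.ndarray:
    """Unipolar stochastic encoding: each bit is 1 with probability p."""
    return (rng.random(length) < p).astype(np.uint8)

def sc_multiply(a: float, b: float, length: int = 100_000) -> float:
    """Multiply two values in [0, 1] using a bitwise AND of their bitstreams."""
    product_stream = to_bitstream(a, length) & to_bitstream(b, length)
    return float(product_stream.mean())   # decode: fraction of 1s in the result

print(sc_multiply(0.8, 0.5))  # ~0.4; accuracy improves with longer bitstreams
```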
Article
Full-text available
Stochastic computing (SC) has a substantial amount of study on application-specific integrated circuit (ASIC) design for artificial intelligence (AI) edge computing, especially the convolutional neural network (CNN) algorithm. However, SC has little to no optimization on field-programmable gate array (FPGA). Scaling up the ASIC logic without FPGA-oriented designs is inefficient, while aggregating thousands of bitstreams is still challenging in the conventional SC. This research has reinvented several FPGA-efficient 8-bit SC CNN computing architectures, i.e., SC multiplexer multiply-accumulate, multiply-accumulate function generator, and binary rectified linear unit, and successfully scaled and implemented a fully parallel CNN model on Kintex7 FPGA. The proposed SC hardware only compromises 0.14% accuracy compared to binary computing on the handwriting Modified National Institute of Standards and Technology classification task and achieved at least 99.72% energy saving per image feedforward and 31× more data throughput than modern hardware. Unique to SC, early decision termination pushed the performance baseline exponentially with minimum accuracy loss, making SC CNN extremely lucrative for AI edge computing but limited to classification tasks. The SC’s inherent noise heavily penalizes CNN regression performance, rendering SC unsuitable for regression tasks.
... Convolutional neural networks (CNNs) show outstanding performance in various fields such as face recognition [4], image classification [5], and natural language processing [6]. As one of the fundamental components of CNNs, convolutional computation usually consumes a large amount of computational resources, so it is desirable to accelerate it through specialized devices [7]. Compared with electrical convolutional computation based on graphics processing units (GPUs), tensor processing units (TPUs), or field-programmable gate arrays (FPGAs), optical convolutional computation has significant advantages in speed and energy consumption [8]. ...
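For readers less familiar with why convolution dominates the cost of a CNN, the sketch below spells out the operation: a small kernel is slid over the input and one multiply-accumulate is performed per kernel element and output position. It is a generic single-channel NumPy example, not code from the cited works.

```python
import numpy as np

def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 'valid' 2D convolution (cross-correlation, as used in CNNs).
    Cost: out_h * out_w * kh * kw multiply-accumulates, which is the workload
    offloaded to GPUs, TPUs, FPGAs, or optical accelerators."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(28, 28).astype(np.float32)
edge_kernel = np.array([[1, 0, -1]] * 3, dtype=np.float32)  # toy horizontal-edge filter
print(conv2d_valid(image, edge_kernel).shape)  # (26, 26)
```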
Article
Full-text available
Optical complex-valued convolution can extract the feature of complex-valued data by processing both amplitude and phase information, enabling a wide range of future applications in artificial intelligence and high-speed optical computation. However, because optical signals at different wavelengths cannot interfere, optical systems based on wavelength multiplexing usually can only realize real-valued computation. Here, we experimentally demonstrate an all-optical computing scheme using Kerr-based optical four-wave mixing (FWM) that can perform complex-valued convolution of multi-wavelength signals. Specifically, this all-optical complex-valued convolution operation can be implemented based on the coherent superposition of converted light generated by multiple FWM processes. The computational throughput of this scheme can be expanded by increasing the number of optical wavelengths and the signal baud rate. To exemplify the application, we successfully applied this all-optical complex-valued convolution to four different orientations of image edge extraction. Our scheme can provide a basis for wavelength-parallel optical computing systems with the demanded complex-valued computation capability.
... In the ongoing pursuit of AI leadership, model sizes have escalated from millions to billions of parameters, as seen in OpenAI GPT models. Google reported the GLaM model with over 1 trillion parameters (compared to GPT-3's 175 billion parameters) [115]. The primary challenges associated with these massive models are training costs and their deployment on smaller devices. ...
Article
Full-text available
Middle Eastern social standards promote familial affinity, respect for tradition, and strong family relationships. As a result, while choosing a career path, Lebanese youth are more likely to consider family expectations and responsibilities. This study intends to provide insights into the ways family influences career decisions to minimize delays in the student’s intended career paths. We surveyed 113 students from various majors at a large urban private university to better understand the relationship between family influence and career decisions. Our research employs a mixed-methods approach to obtain a comprehensive understanding of family involvement in college students’ career decisions and its effects on professional awareness and development. Our findings indicated that parents demonstrated their involvement and support for their children in terms of influence, academic engagement, and career choice. Both parents were active in their child’s career choices, making them the key influences. Family influence was also connected to career-related decisions, career satisfaction, and motivation. Parents’ financial situation and expectations also influenced their children’s decisions, either directly or indirectly. Due to the availability or absence of resources, the socioeconomic level of the family influences the child’s occupational choice. According to the data we gathered, males and females were equally impacted by their parents. Females’ first preference was the mother, followed by the father. Males prioritized the father, who was closely followed by the mother.
Article
In the past decade, a substantial increase in medical data from various sources, including wearable sensors, medical imaging, personal health records, and public health organizations, has propelled advancements in the medical sciences. The evolution of computational hardware, such as cloud computing, GPUs, FPGAs, and TPUs, has enabled the effective utilization of this vast amount of data. Consequently, sophisticated AI techniques have been developed to extract valuable insights from healthcare datasets. This article provides a comprehensive overview of recent developments in AI and biosensors within the medical and life sciences. The review highlights the role of machine learning in key areas such as medical imaging, precision medicine, and biosensors designed for the Internet of Things (IoT). Emphasis is placed on the latest progress in wearable biosensing technologies, where AI plays a pivotal role in monitoring electro-physiological and electro-chemical signals and aiding in disease diagnosis. These advancements underscore the growing trend towards personalized medicine, offering precise and cost-efficient point-of-care treatment. Additionally, the article delves into the advancements in computing technologies, including accelerated AI, edge computing, and federated learning specifically tailored for medical data. The challenges associated with data-driven AI approaches, potential issues arising from biosensors and IoT-based healthcare, and distribution shifts among different data modalities are thoroughly explored. The discussion concludes with insights into future prospects in the field.
... Pertaining to NN-based systems, state-of-the-art hardware accelerators are tremendously resource intensive due to the massive amount of processing elements needed to effectively accelerate the computations [18]. Multipliers are recognized as the most demanding components in terms of overhead, both for silicon area and for power consumption [19]; consequently, a significant number of libraries providing hundreds of implementations of approximate components delivering different trade-offs between error and hardware resource requirements, such as the EvoApprox-Lib [20], have been proposed. ...
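As a toy illustration of the error/hardware trade-off behind such approximate-component libraries, the sketch below models a truncation-based approximate multiplier: dropping low-order operand bits shrinks the (hypothetical) circuit at the cost of a value-dependent output error. It is a generic example, not an implementation from EvoApprox-Lib or the cited works.

```python
def approx_multiply(a: int, b: int, drop_bits: int = 4) -> int:
    """Toy approximate multiplier: truncate the low-order bits of both operands
    before multiplying; a smaller multiplier circuit, but a value-dependent error."""
    mask = ~((1 << drop_bits) - 1)
    return (a & mask) * (b & mask)

exact = 50_000 * 31_000
approx = approx_multiply(50_000, 31_000, drop_bits=4)
print(exact, approx, f"relative error = {abs(exact - approx) / exact:.4%}")
```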
Article
Full-text available
During the last decade, classification systems (CSs) received significant research attention, with new learning algorithms achieving high accuracy in various applications. However, their resource-intensive nature, in terms of hardware and computation time, poses new design challenges. CSs exhibit inherent error resilience, due to redundancy of training sets, and self-healing properties, making them suitable for Approximate Computing (AxC). AxC enables efficient computation by using reduced precision or approximate values, leading to energy, time, and silicon area savings. Exploiting AxC involves estimating the introduced error for each approximate variant found during a Design-Space Exploration (DSE). This estimation has to be both rapid and meaningful, considering a substantial number of test samples, which are utterly conflicting demands. In this paper, we investigate the sources of error resiliency of CSs, and we propose a technique to hasten the DSE that reduces the computational time for error estimation by systematically reducing the test set. In particular, we cherry-pick samples that are likely to be more sensitive to approximation and perform accuracy-loss estimation just by exploiting such a sample subset. In order to demonstrate its efficacy, we integrate our technique into two different approaches for generating approximate CSs, showing an average speed-up of up to ≈18×.
... The findings of this research are particularly relevant in the context of embedded system design, where there is a need to achieve high precision of neural network output while dealing with limited hardware resources [2,3], such as memory and computational capacity. Thus, this research highlights a promising avenue for improving the performance and efficiency of spiking neural networks, which is a critical factor in enabling their adoption in real-world applications. ...
Article
Full-text available
This research investigates the implementation of complex-exponential-based neurons in FPGA, which can pave the way for implementing bio-inspired spiking neural networks to compensate for the existing computational constraints in conventional artificial neural networks. The increasing use of extensive neural networks and the complexity of models in handling big data lead to higher power consumption and delays. Hence, finding solutions to reduce computational complexity is crucial for addressing power consumption challenges. The complex exponential form effectively encodes oscillating features like frequency, amplitude, and phase shift, streamlining the demanding calculations typical of conventional artificial neurons by leveraging the simple phase addition of complex exponential functions. The article implements such a two-neuron and a multi-neuron neural model using the Xilinx System Generator and Vivado Design Suite, employing 8-bit, 16-bit, and 32-bit fixed-point data format representations. The study evaluates the accuracy of the proposed neuron model across different FPGA implementations while also providing a detailed analysis of operating frequency, power consumption, and resource usage for the hardware implementations. BRAM-based Vivado designs outperformed Simulink regarding speed, power, and resource efficiency. Specifically, the Vivado BRAM-based approach supported up to 128 neurons, showcasing optimal LUT and FF resource utilization. Such outcomes facilitate choosing the optimal design procedure for implementing spiking neural networks on FPGAs.
... The best performing models use deep convolutional neural network (CNN) architecture (Zaidi et al., 2022). They generally require a large amount of labelled data to train for accurate performance and have traditionally required powerful computing hardware to operate at practically useful speeds (Capra et al., 2020). Recent developments have made it easier to incorporate neural network-based object detection in lightweight electronics devices (Zaidi et al., 2022;Zhao et al., 2020). ...
Article
Full-text available
The use of autonomous underwater vehicles (AUVs) for surveying underwater infrastructure presents a potential cost saving in comparison to remotely operated vehicles (ROVs). One of the challenges when processing images of underwater structures captured by an AUV is that the vast number of images captured during the mission usually do not show the structure, for instance images captured during the dive to the structure, of the sea floor, or of the deep sea facing away from the structure. Too many images without relevant information for a 3D reconstruction of the structure lead to increased processing time and issues during the reconstruction process. There are two solutions to reduce the images to only those showing the structure: either only images of the structure are captured in the first place, or images that are not useful are removed after capture and before further processing. This study developed and evaluated techniques that would enable the first strategy to be applied in an AUV. Applying this strategy in an AUV would require an on-board structure detection system to ensure that the vehicle is correctly orientated for capturing useful footage during a survey mission. However, the marine environment poses several challenges to image-based object detection. Furthermore, small AUVs have limited power and computational resources available while deployed on a mission. To investigate the suitability of creating a lightweight structure detection model for the purpose of image evaluation, three computationally efficient image feature extraction methods (colour moments, local binary patterns (LBP), and Haar wavelet decomposition) were evaluated for their ability to distinguish underwater structures from background areas using unsupervised k-means models. LBP was found to be an effective method for identifying underwater structures in open water conditions. For identifying a structure against the seabed, colour moments were identified as the most effective method.