Figure 3: Convergence analysis of AM-LFS.

Context in source publication

Context 1
... As a result, we choose to track a more intuitive metric, the distribution parameters, to study the convergence of our method. From Figure 3, we see that the distribution parameters tend to converge to specific values as the epochs increase, which indicates that AM-LFS can be trained in a stable manner. ...
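For readers who want to reproduce this kind of check, a minimal sketch follows; it assumes the loss hyperparameters are drawn from Gaussian sampling distributions whose parameters (e.g., the means) are recorded once per epoch, and all names are illustrative rather than the authors' code.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_distribution_convergence(mu_history, tol=1e-2):
    """Plot per-epoch sampling-distribution parameters and report convergence.

    mu_history: array of shape (num_epochs, num_params); row t holds the
    distribution parameters (e.g., Gaussian means) used to sample loss
    hyperparameters at epoch t. Curves flattening out indicates the search
    has converged to specific values.
    """
    mu_history = np.asarray(mu_history)
    for j in range(mu_history.shape[1]):
        plt.plot(mu_history[:, j], label=f"param {j}")
    plt.xlabel("epoch")
    plt.ylabel("distribution parameter value")
    plt.legend()
    plt.show()

    # Simple numeric check: parameters changed little over the last few epochs.
    window = min(5, len(mu_history) - 1)
    drift = np.abs(mu_history[-1] - mu_history[-1 - window]).max()
    return drift < tol
```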

Citations

... Although online loss function learning has not been explored in the meta-learning context, some existing research outside the subfield has previously explored the possibility of adaptive loss functions, such as in (Li et al., 2019a) and (Wang et al., 2020). However, we emphasize that these approaches are categorically different in that they do not learn the loss function from scratch; instead, they interpolate between a small subset of handcrafted loss functions, updating the loss function after each epoch. ...
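To make the distinction concrete, the sketch below shows what such an interpolation-based adaptive loss can look like: a weighted mixture of a couple of handcrafted losses whose mixture weights an outer loop would update after each epoch. The class and weighting scheme are illustrative assumptions, not the cited methods' actual implementations.

```python
import torch
import torch.nn.functional as F

class InterpolatedLoss(torch.nn.Module):
    """Softmax-weighted mixture of a small set of handcrafted losses.

    The mixture logits are hyperparameters that an outer loop (e.g., driven
    by validation reward) updates after each epoch; the loss is never learned
    from scratch, only re-weighted.
    """

    def __init__(self, num_losses=2):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(num_losses))

    def forward(self, pred, target):
        w = torch.softmax(self.logits, dim=0)
        logp = F.log_softmax(pred, dim=1)
        logp_true = logp.gather(1, target.unsqueeze(1)).squeeze(1)
        ce = -logp_true.mean()                                    # handcrafted loss 1: cross-entropy
        focal = (-(1 - logp_true.exp()) ** 2 * logp_true).mean()  # handcrafted loss 2: focal-style
        return w[0] * ce + w[1] * focal
```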
Preprint
Humans can often quickly and efficiently solve complex new learning tasks given only a small set of examples. In contrast, modern artificially intelligent systems often require thousands or millions of observations in order to solve even the most basic tasks. Meta-learning aims to resolve this issue by leveraging past experiences from similar learning tasks to embed the appropriate inductive biases into the learning system. Historically, methods for meta-learning components such as optimizers, parameter initializations, and more have led to significant performance increases. This thesis explores meta-learning as a way to improve performance through the often-overlooked component of the loss function. The loss function is a vital component of a learning system, as it represents the primary learning objective, and success is determined and quantified by the system's ability to optimize for that objective.
... Significant research efforts in recent years have been dedicated to improving the various AutoML methods, e.g., hyperparameter tuning and neural architecture search (NAS) (Mellor et al. 2021; Elsken et al. 2019; Pham et al. 2018; Hu et al. 2020; Li et al. 2019, 2022). AutoML toolkits are used in open-source ML software (Majidi et al. 2020) and in domain-specific ML software systems, such as autonomous vehicles and banking software (Li et al. 2021; Agrapetidou et al. 2021). ...
... There are ongoing efforts to increase the performance of AutoML tools through better NAS approaches (Mellor et al. 2021; Elsken et al. 2019; Pham et al. 2018; Hu et al. 2020), search for hyperparameters and loss functions (Li et al. 2019, 2022), and automating machine learning by keeping humans in the loop (Lee and Macke 2020). Truong et al. (2019) evaluated a few popular AutoML tools on their ability to automate the ML pipeline. ...
Article
Automated Machine Learning aka AutoML toolkits are low/no-code software that aim to democratize ML system application development by ensuring rapid prototyping of ML models and by enabling collaboration across different stakeholders in ML system design (e.g., domain experts, data scientists, etc.). It is thus important to know the state of current AutoML toolkits and the challenges ML practitioners face while using those toolkits. In this paper, we first offer a characterization of currently available AutoML toolkits by analyzing 37 top AutoML tools and platforms. We find that the top AutoML platforms are mostly cloud-based. Most of the tools are optimized for the adoption of shallow ML models. Second, we present an empirical study of 14.3K AutoML-related posts from Stack Overflow (SO) that we analyzed using the topic modelling algorithm LDA (Latent Dirichlet Allocation) to understand the challenges of ML practitioners while using the AutoML toolkits. We find 13 topics in the AutoML-related discussions in SO. The 13 topics are grouped into four categories: MLOps (43% of all questions), Model (28% of questions), Data (27% of questions), and Documentation (2% of questions). Most questions are asked during the Model training (29%) and Data preparation (25%) phases. AutoML practitioners find the MLOps topic category most challenging. Topics related to the MLOps category are the most prevalent and popular for cloud-based AutoML toolkits. Based on our study findings, we provide 15 recommendations to improve the adoption and development of AutoML toolkits.
... Automated loss learning aims to alleviate the considerable human effort and expertise traditionally required for loss function design. While several studies (Xu et al., 2019a; Li et al., 2019; Wang et al., 2020; Liu & Lai, 2020; Gao et al., 2022) have sought to learn loss functions automatically, they still heavily rely on human expertise in the loss search process, often initiating their search from existing loss functions. In related efforts, (Li et al., 2022a; Raymond et al., 2023) have employed evolutionary algorithms to search for loss functions composed of primitive mathematical operators for various computer vision tasks. ...
Preprint
Class-imbalanced node classification tasks are prevalent in real-world scenarios. Due to the uneven distribution of nodes across different classes, learning high-quality node representations remains a challenging endeavor. The engineering of loss functions has shown promising potential in addressing this issue. It involves the meticulous design of loss functions, utilizing information about the quantities of nodes in different categories and the network's topology to learn unbiased node representations. However, the design of these loss functions heavily relies on human expert knowledge and exhibits limited adaptability to specific target tasks. In this paper, we introduce a high-performance, flexible, and generalizable automated loss function search framework to tackle this challenge. Across 15 combinations of graph neural networks and datasets, our framework achieves a significant improvement in performance compared to state-of-the-art methods. Additionally, we observe that homophily in graph-structured data significantly contributes to the transferability of the proposed framework.
... Designing a universal loss. Numerous recent publications have investigated new loss functions by means of meta-learning, ensembles of diverse losses, or composing various types of losses [2][3][4][5]. As we mentioned, many of the well-designed, well-known, and widely used loss functions are task specific; for example, in FaceNet [6] the researchers designed their own loss function to solve the face recognition task. ...
... The latest research reveals that the loss function can be learned concurrently with the model during training via gradient descent or meta-learning [2,3,5,27]. Notably, TaylorGLO employs CMA-ES to optimize a multivariate Taylor parameterization of both the loss function and the learning rate schedule throughout training [4,28]. ...
Preprint
When training deep neural networks, the most popular choice of loss is the cross-entropy loss. Generally speaking, however, a good loss function can take on considerably more flexible shapes and ought to be adapted to different tasks and datasets. In most classification tasks, when the true class is not correctly recognized by the network (top-1), that class is usually still placed among the five classes with the highest probability (top-5). This shows that the network does not necessarily assign a low probability to the correct class; rather, it assigns a higher probability to a similar class (such as 3 vs. 8 in MNIST), which causes the mistake. Accordingly, we propose a loss function that accounts for the error of the class the neural network incorrectly recognized as correct, in addition to the error of the correct class. We call our proposed loss False Positive Loss (FPL), with the intention of viewing and designing loss functions not only through the true class but also through the values of the false positive classes. One of the core properties of our proposed loss is full adaptability: False Positive Loss can be reformulated using other widely used loss function formulas, based on the task or the needs of the user. Extensive experimental results demonstrate that our suggested loss function outperforms other well-known losses on a variety of tasks and datasets. In particular, the performance of False Positive Loss is superior to that of the cross-entropy loss on 2D image classification tasks. We compared our loss with cross-entropy, the most common classification loss function, on several models (such as ResNet-18, ResNet-50, and EfficientNet-V2) for classification, a basic computer vision task, with both random and pre-trained initial weights. In some cases, models trained with our loss outperform the same models trained with cross-entropy with respect to some metrics (i.e., accuracy and FP). For example, ResNet-50 on the CIFAR-10 dataset with random initialization reached a top-1 accuracy of 94.93 with cross-entropy and 95.25 with our loss, while the top-5 accuracies are 99.86 and 99.87, respectively.
... To maximize the reward of the subnet, a gradient ascent algorithm is applied to update the parameters of the subnet θ [36], [37], which can be written as: ...
... As for the first term, the gradient between the reward R and the predicted threshold T can be estimated by REINFORCE [40]. Since T is continuous, it can be modeled as an independent Gaussian distribution with mean T and standard deviation σ [36], [41], represented by N(T, σ²). Then, the gradient between the reward R and the predicted threshold T can be derived as: ...
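The equations themselves are elided in this excerpt, but the underlying estimator is the standard score-function (REINFORCE) gradient for a Gaussian-sampled threshold. The sketch below illustrates it under the assumption that a task-specific `reward_fn` scores the detections kept after NMS at the sampled threshold; the function and argument names are hypothetical, not the paper's code.

```python
import torch

def reinforce_threshold_grad(pred_T, sigma, reward_fn, num_samples=8):
    """Monte Carlo estimate of d(reward)/d(pred_T) when T ~ N(pred_T, sigma^2).

    pred_T is a tensor of predicted thresholds. The estimator uses the
    score-function identity grad_mu log N(T; mu, sigma^2) = (T - mu) / sigma^2,
    so no gradient has to flow through the non-differentiable NMS step.
    """
    grads = []
    for _ in range(num_samples):
        T = torch.normal(pred_T, sigma)      # sample a threshold from N(pred_T, sigma^2)
        R = reward_fn(T)                     # scalar reward, e.g., detection quality after NMS at T
        grads.append(R * (T - pred_T) / sigma ** 2)
    # Average the per-sample estimates; pred_T is then updated by gradient ascent.
    return torch.stack(grads).mean(dim=0)
```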
Article
Pedestrian detection is still a challenging task for computer vision, especially in crowded scenes where the overlaps between pedestrians tend to be large. Non-maximum suppression (NMS) plays an important role in removing redundant false positive detection proposals while retaining the true positive detection proposals. However, highly overlapped true detections may be suppressed if the NMS threshold is too low, while a higher NMS threshold will introduce a larger number of false positive results. To solve this problem, we propose an optimal threshold prediction (OTP) based NMS method that predicts a suitable NMS threshold for each human instance. First, a visibility estimation module is designed to obtain the visibility ratio. Then, we propose a threshold prediction subnet to determine the optimal NMS threshold automatically according to the visibility ratio and classification score. Finally, we re-formulate the objective function of the subnet and utilize a reward-guided gradient estimation algorithm to update the subnet. Comprehensive experiments on CrowdHuman and CityPersons show the superior performance of the proposed method in pedestrian detection, especially in crowded scenes.
... Meta-learning loss/regularization. The main idea is to meta-learn a proxy loss/regularization from data that improves inner-level model optimization from various task-specific goal perspectives, including model generalization [41], [81], [82], optimization efficiency [83], [84], differentiable approximation to a true non-differentiable metric [45], unsupervised update rules [35], robustness to domain shift [85], label noise [44], [86], [87], [88], or adversarial attack [58], as well as generalizations to unsupervised learning [89], self-supervised learning [90], auxiliary task learning [91], [92], etc. These methods, however, still overlook the design of the meta-objective in outer-level learning. ...
Preprint
Meta-learning has recently been heavily researched and has helped advance contemporary machine learning. However, achieving a well-performing meta-learning model requires a large amount of training tasks with high-quality meta-data representing the underlying task generalization goal, which is sometimes difficult and expensive to obtain for real applications. Current meta-data-driven meta-learning approaches therefore find it hard to train satisfactory meta-models from imperfect training tasks. To address this issue, we suggest a meta-knowledge informed meta-learning (MKIML) framework that improves meta-learning by additionally integrating compensatory meta-knowledge into the meta-learning process. We preliminarily integrate meta-knowledge into the meta-objective by using an appropriate meta-regularization (MR) objective to regularize the capacity complexity of the meta-model function class and facilitate better generalization on unseen tasks. As a practical implementation, we introduce data augmentation consistency to encode invariance as meta-knowledge for instantiating the MR objective, denoted DAC-MR. The proposed DAC-MR is expected to learn well-performing meta-models from training tasks with noisy, sparse or unavailable meta-data. We theoretically demonstrate that DAC-MR can be treated as a proxy meta-objective used to evaluate meta-models without high-quality meta-data. Besides, the meta-data-driven meta-loss objective combined with DAC-MR is capable of achieving better meta-level generalization. Ten meta-learning tasks with different network architectures and benchmarks substantiate the capability of our DAC-MR in aiding meta-model learning. Good performance of DAC-MR is obtained across all settings and is well aligned with our theoretical insights. This implies that DAC-MR is problem-agnostic and can hopefully be readily applied to a wide range of meta-learning problems and tasks.
... With various search strategies and search spaces, the searched architectures exceed manually designed networks. Meanwhile, NAS for multi-task learning has also made progress, mainly working on loss functions and gradient optimization [20,21]. In this work, we resort to NAS to investigate the correlation of multi-task features. ...
Article
Instance segmentation is a typical visual task that requires per-pixel mask prediction with a category label for each instance. For the decoder in an instance segmentation network, parallel branches or towers are commonly adopted to deal with instance- and dense-level predictions. However, this parallelism ignores inter-branch and intra-branch relationships. Besides, how the different branches should be connected is unclear and difficult to explore manually in practice. To address the above issues, we introduce Neural Architecture Search (NAS) to automatically search for hardware- and memory-friendly feature-sharing branches. Concretely, applying NAS to instance segmentation, we design a search space considering both the operations and the sharing connections of parallel branches. Through a tailored reinforcement learning (RL) paradigm, we can efficiently search multiple architectures with different sharing patterns and tap more feature selection possibilities. Our method is generically useful and can be transferred to analogous multi-task networks. The searched architecture shares features in the middle of the head branches and utilizes instance-level head features to generate pixel-level predictions. Extensive experiments demonstrate the effectiveness of our method, which surpasses classical parallel decoder networks, exceeding BlendMask by 1.2% on bounding-box mAP and 0.9% on segmentation mAP.
... Loss function discovery is an emerging AutoML topic aimed at improving learning systems in a data-driven manner. Existing methods are mainly based on either (i) constructing the loss function directly from basic operators [21,43,56] or (ii) optimizing parameterized loss functions [38,67]. For loss construction, [43] proposes a genetic algorithm that consists of loss function verification and quality filtering modules. ...
... [56] suggests a method to learn not only the loss function but also the whole machine learning algorithm from scratch. For loss optimization, [38] re-analyzes the existing loss functions and presents them in a combined formula. [67] observes that the search space used in [38] can be too complex, and proposes to simplify the search space via heuristics. ...
... For loss optimization, [38] re-analyzes the existing loss functions and presents them in a combined formula. [67] observes that the search space used in [38] can be too complex, and proposes to simplify the search space via heuristics. In contrast to these works, which target supervised training scenarios, we aim to adapt loss function learning principles to the FSOD problem. ...
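For intuition on the loss-construction branch (i), the following is a generic sketch rather than the actual algorithms of [43] or [56]: a candidate loss is represented as a small expression tree over primitive operators, and a cheap verification filter discards candidates that do not behave like a loss (e.g., that do not decrease as the prediction approaches the target). All names and the operator set are illustrative assumptions.

```python
import numpy as np

# Primitive operators that candidate losses are composed from (illustrative set).
UNARY = {"log": np.log, "neg": np.negative, "square": np.square}
BINARY = {"add": np.add, "mul": np.multiply, "sub": np.subtract}

def evaluate(expr, p, y):
    """Evaluate a candidate loss expression on prediction p and target y.

    expr is a nested tuple such as ('neg', ('mul', 'y', ('log', 'p'))),
    which corresponds to the cross-entropy term -y * log(p).
    """
    if expr == "p":
        return p
    if expr == "y":
        return y
    op, *args = expr
    fn = UNARY.get(op) or BINARY.get(op)
    return fn(*(evaluate(a, p, y) for a in args))

def is_plausible(expr):
    """Cheap verification filter: the loss should be finite and decrease as p approaches y = 1."""
    p = np.linspace(0.05, 0.95, 10)
    vals = evaluate(expr, p, np.ones_like(p))
    return bool(np.all(np.isfinite(vals)) and vals[0] > vals[-1])

# Example: the cross-entropy-like candidate passes the filter.
candidate = ("neg", ("mul", "y", ("log", "p")))
print(is_plausible(candidate))  # True
```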
Preprint
Few-shot object detection, the problem of modelling novel object detection categories with few training instances, is an emerging topic in the area of few-shot learning and object detection. Contemporary techniques can be divided into two groups: fine-tuning based and meta-learning based approaches. While meta-learning approaches aim to learn dedicated meta-models for mapping samples to novel class models, fine-tuning approaches tackle few-shot detection in a simpler manner, by adapting the detection model to novel classes through gradient-based optimization. Despite their simplicity, fine-tuning based approaches typically yield competitive detection results. Based on this observation, we focus on the role of loss functions and augmentations as the force driving the fine-tuning process, and propose to tune their dynamics through meta-learning principles. The proposed training scheme therefore allows learning inductive biases that can boost few-shot detection, while keeping the advantages of fine-tuning based approaches. In addition, the proposed approach yields interpretable loss functions, as opposed to highly parametric and complex few-shot meta-models. The experimental results highlight the merits of the proposed scheme, with significant improvements over strong fine-tuning based few-shot detection baselines on the benchmark Pascal VOC and MS-COCO datasets, in terms of both standard and generalized few-shot performance metrics.
... Although online loss function learning has not been explored in the meta-learning context, some existing research outside the subfield has previously explored the possibility of adaptive loss functions, such as in (Li et al., 2019) and (Wang et al., 2020). However, we emphasize that these approaches are categorically different in that they do not learn the loss function from scratch; instead, they interpolate between a small subset of handcrafted loss functions, updating the loss function after each epoch. ...
Preprint
Loss function learning is a new meta-learning paradigm that aims to automate the essential task of designing a loss function for a machine learning model. Existing techniques for loss function learning have shown promising results, often improving a model's training dynamics and final inference performance. However, a significant limitation of these techniques is that the loss functions are meta-learned in an offline fashion, where the meta-objective only considers the very first few steps of training, which is a significantly shorter time horizon than the one typically used for training deep neural networks. This causes significant bias towards loss functions that perform well at the very start of training but perform poorly at the end of training. To address this issue we propose a new loss function learning technique for adaptively updating the loss function online after each update to the base model parameters. The experimental results show that our proposed method consistently outperforms the cross-entropy loss and offline loss function learning techniques on a diverse range of neural network architectures and datasets.
... [62] tries to learn surrogate losses for non-differentiable and non-decomposable loss functions. [63] designs a search space containing almost all popular loss functions and dynamically optimizes the sampling distribution of loss functions. [64] chooses Taylor series approximations of functions as the search space, and the learned loss functions with fixed hyperparameters can be directly employed on new datasets with noisy labels. ...
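As a concrete illustration of a Taylor-series search space of this kind, the sketch below parameterizes the loss as a truncated expansion in powers of (1 - p_true); with coefficients theta_k = 1/k it recovers the Taylor approximation of -log(p_true), and other fixed coefficients give differently shaped, reusable losses. The class name and interface are assumptions for illustration, not the implementation of [64].

```python
import torch
import torch.nn.functional as F

class TaylorParameterizedLoss(torch.nn.Module):
    """Loss expressed as a truncated Taylor polynomial in (1 - p_true).

    The coefficients theta are the searched hyperparameters: they can be held
    fixed or tuned by an outer search loop, and with theta_k = 1/k the loss
    approximates -log(p_true).
    """

    def __init__(self, degree=3):
        super().__init__()
        self.theta = torch.nn.Parameter(
            torch.tensor([1.0 / k for k in range(1, degree + 1)])
        )

    def forward(self, logits, target):
        p_true = F.softmax(logits, dim=1).gather(1, target.unsqueeze(1)).squeeze(1)
        # Stack the polynomial terms (1 - p_true)^k for k = 1..degree.
        terms = torch.stack(
            [(1 - p_true) ** k for k in range(1, len(self.theta) + 1)], dim=1
        )
        return (terms * self.theta).sum(dim=1).mean()
```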
Preprint
Robust loss minimization is an important strategy for handling the robust learning issue with noisy labels. Current robust losses, however, inevitably involve hyperparameters to be tuned for different datasets with noisy labels, manually or heuristically through cross validation, which makes them fairly hard to apply generally in practice. Existing robust loss methods usually assume that all training samples share common hyperparameters, which are independent of instances. This limits the ability of these methods to distinguish the individual noise properties of different samples, making it hard for them to adapt to different noise structures. To address the above issues, we propose to combine robust losses with instance-dependent hyperparameters to improve their noise tolerance with theoretical guarantees. To set such instance-dependent hyperparameters for robust losses, we propose a meta-learning method capable of adaptively learning a hyperparameter prediction function, called Noise-Aware-Robust-Loss-Adjuster (NARL-Adjuster). Specifically, through mutual amelioration between the hyperparameter prediction function and the classifier parameters in our method, both can be simultaneously refined and coordinated to attain solutions with good generalization capability. We integrate four kinds of SOTA robust losses with our algorithm, and experiments substantiate the general availability and effectiveness of the proposed method in terms of both noise tolerance and generalization performance. Meanwhile, the explicit parameterized structure makes the meta-learned prediction function readily transferable and plug-and-play on unseen datasets with noisy labels. Specifically, we transfer our meta-learned NARL-Adjuster to unseen tasks, including several real noisy datasets, and achieve better performance compared with the conventional hyperparameter tuning strategy.