Figure 3: Convergence analysis of AM-LFS.

Context in source publication

Context 1
... As a result, we choose to track a more intuitive metric, the distribution parameters, to study the convergence of our method. From Figure 3, we see that the distribution parameters tend to converge to specific values as the epochs increase, which indicates that AM-LFS can be trained in a stable manner. ...
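For readers who want to reproduce this kind of check, a minimal sketch follows; it assumes the loss hyperparameters are drawn from Gaussian sampling distributions whose parameters (e.g., the means) are recorded once per epoch, and all names are illustrative rather than the authors' code.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_distribution_convergence(mu_history, tol=1e-2):
    """Plot per-epoch sampling-distribution parameters and report convergence.

    mu_history: array of shape (num_epochs, num_params); row t holds the
    distribution parameters (e.g., Gaussian means) used to sample loss
    hyperparameters at epoch t. Curves flattening out indicates the search
    has converged to specific values.
    """
    mu_history = np.asarray(mu_history)
    for j in range(mu_history.shape[1]):
        plt.plot(mu_history[:, j], label=f"param {j}")
    plt.xlabel("epoch")
    plt.ylabel("distribution parameter value")
    plt.legend()
    plt.show()

    # Simple numeric check: parameters changed little over the last few epochs.
    window = min(5, len(mu_history) - 1)
    drift = np.abs(mu_history[-1] - mu_history[-1 - window]).max()
    return drift < tol
```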

Citations

... Although online loss function learning has not been explored in the meta-learning context, some existing research outside the subfield has previously explored the possibility of adaptive loss functions, such as in (Li et al., 2019a) and (Wang et al., 2020). However, we emphasize that these approaches are categorically different in that they do not learn the loss function from scratch; instead, they interpolate between a small subset of handcrafted loss functions, updating the loss function after each epoch. ...
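To make the distinction concrete, the sketch below shows what such an interpolation-based adaptive loss can look like: a weighted mixture of a couple of handcrafted losses whose mixture weights an outer loop would update after each epoch. The class and weighting scheme are illustrative assumptions, not the cited methods' actual implementations.

```python
import torch
import torch.nn.functional as F

class InterpolatedLoss(torch.nn.Module):
    """Softmax-weighted mixture of a small set of handcrafted losses.

    The mixture logits are hyperparameters that an outer loop (e.g., driven
    by validation reward) updates after each epoch; the loss is never learned
    from scratch, only re-weighted.
    """

    def __init__(self, num_losses=2):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(num_losses))

    def forward(self, pred, target):
        w = torch.softmax(self.logits, dim=0)
        logp = F.log_softmax(pred, dim=1)
        logp_true = logp.gather(1, target.unsqueeze(1)).squeeze(1)
        ce = -logp_true.mean()                                    # handcrafted loss 1: cross-entropy
        focal = (-(1 - logp_true.exp()) ** 2 * logp_true).mean()  # handcrafted loss 2: focal-style
        return w[0] * ce + w[1] * focal
```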
Preprint
Humans can often quickly and efficiently solve complex new learning tasks given only a small set of examples. In contrast, modern artificially intelligent systems often require thousands or millions of observations in order to solve even the most basic tasks. Meta-learning aims to resolve this issue by leveraging past experiences from similar learning tasks to embed the appropriate inductive biases into the learning system. Historically, methods for meta-learning components such as optimizers, parameter initializations, and more have led to significant performance increases. This thesis explores meta-learning as a way to improve performance through the often-overlooked component of the loss function. The loss function is a vital component of a learning system, as it represents the primary learning objective, and success is determined and quantified by the system's ability to optimize for that objective.
... Significant research efforts in recent years have been dedicated to improving the various AutoML methods, e.g., hyperparameter tuning and neural architecture search (NAS) (Mellor et al. 2021; Elsken et al. 2019; Pham et al. 2018; Hu et al. 2020; Li et al. 2019, 2022). AutoML toolkits are used in open-source ML software (Majidi et al. 2020) and in domain-specific ML software systems, such as autonomous vehicles and banking software (Li et al. 2021; Agrapetidou et al. 2021). ...
... There are ongoing efforts to increase the performance of AutoML tools through better NAS approaches (Mellor et al. 2021; Elsken et al. 2019; Pham et al. 2018; Hu et al. 2020), search for hyperparameters and loss functions (Li et al. 2019, 2022), and automating machine learning by keeping humans in the loop (Lee and Macke 2020). Truong et al. (2019) evaluated a few popular AutoML tools on their ability to automate the ML pipeline. ...
Article
Automated Machine Learning aka AutoML toolkits are low/no-code software that aim to democratize ML system application development by ensuring rapid prototyping of ML models and by enabling collaboration across different stakeholders in ML system design (e.g., domain experts, data scientists, etc.). It is thus important to know the state of current AutoML toolkits and the challenges ML practitioners face while using those toolkits. In this paper, we first offer a characterization of currently available AutoML toolkits by analyzing 37 top AutoML tools and platforms. We find that the top AutoML platforms are mostly cloud-based. Most of the tools are optimized for the adoption of shallow ML models. Second, we present an empirical study of 14.3K AutoML-related posts from Stack Overflow (SO) that we analyzed using the topic modelling algorithm LDA (Latent Dirichlet Allocation) to understand the challenges of ML practitioners while using the AutoML toolkits. We find 13 topics in the AutoML-related discussions in SO. The 13 topics are grouped into four categories: MLOps (43% of all questions), Model (28% of questions), Data (27% of questions), and Documentation (2% of questions). Most questions are asked during the Model training (29%) and Data preparation (25%) phases. AutoML practitioners find the MLOps topic category most challenging. Topics related to the MLOps category are the most prevalent and popular for cloud-based AutoML toolkits. Based on our study findings, we provide 15 recommendations to improve the adoption and development of AutoML toolkits.
... Automated loss learning aims to alleviate the considerable human effort and expertise traditionally required for loss function design. While several studies (Xu et al., 2019a; Li et al., 2019; Wang et al., 2020; Liu & Lai, 2020; Gao et al., 2022) have sought to learn loss functions automatically, they still heavily rely on human expertise in the loss search process, often initiating their search from existing loss functions. In related efforts, (Li et al., 2022a; Raymond et al., 2023) have employed evolutionary algorithms to search for loss functions composed of primitive mathematical operators for various computer vision tasks. ...
Preprint
Class-imbalanced node classification tasks are prevalent in real-world scenarios. Due to the uneven distribution of nodes across different classes, learning high-quality node representations remains a challenging endeavor. The engineering of loss functions has shown promising potential in addressing this issue. It involves the meticulous design of loss functions, utilizing information about the quantities of nodes in different categories and the network's topology to learn unbiased node representations. However, the design of these loss functions heavily relies on human expert knowledge and exhibits limited adaptability to specific target tasks. In this paper, we introduce a high-performance, flexible, and generalizable automated loss function search framework to tackle this challenge. Across 15 combinations of graph neural networks and datasets, our framework achieves a significant improvement in performance compared to state-of-the-art methods. Additionally, we observe that homophily in graph-structured data significantly contributes to the transferability of the proposed framework.
... Designing a universal loss. Numerous recent publications have investigated new loss functions by means of meta-learning, ensembles of diverse losses, or composing various types of losses [2][3][4][5]. As we mentioned, many of the well-designed, well-known, and widely used loss functions are task specific; for example, in FaceNet [6] the researchers designed their own loss function to solve the face recognition task. ...
... The latest research reveals that the loss function can be learned concurrently with the model during training via gradient descent or meta-learning [2,3,5,27]. Notably, TaylorGLO employs CMA-ES to optimize a multivariate Taylor parameterization of both the loss function and the learning rate schedule throughout training [4,28]. ...
Preprint
When training deep neural networks, the most popular choice of loss is the cross-entropy loss. Generally speaking, however, a good loss function can take on considerably more flexible shapes and ought to be adapted to different tasks and datasets. In most classification tasks, when the true class is not correctly recognized by the network (top-1), that class is usually still placed among the five classes with the highest probability (top-5). This shows that the network does not necessarily assign a low probability to the correct class; rather, it assigns a higher probability to a similar class (such as 3 vs. 8 in MNIST), which causes the mistake. Accordingly, we propose a loss function that accounts for the error of the class the neural network incorrectly recognized as correct, in addition to the error of the correct class. We call our proposed loss False Positive Loss (FPL), with the intention of viewing and designing loss functions not only through the true class but also through the values of the false positive classes. One of the core properties of our proposed loss is full adaptability: False Positive Loss can be reformulated using other widely used loss function formulas, based on the task or the needs of the user. Extensive experimental results demonstrate that our suggested loss function outperforms other well-known losses on a variety of tasks and datasets. In particular, the performance of False Positive Loss is superior to that of the cross-entropy loss on 2D image classification tasks. We compared our loss with cross-entropy, the most common classification loss function, on several models (such as ResNet-18, ResNet-50, and EfficientNet-V2) for classification, a basic computer vision task, with both random and pre-trained initial weights. In some cases, models trained with our loss outperform the same models trained with cross-entropy with respect to some metrics (i.e., accuracy and FP). For example, ResNet-50 on the CIFAR-10 dataset with random initialization reached a top-1 accuracy of 94.93 with cross-entropy and 95.25 with our loss, while the top-5 accuracies are 99.86 and 99.87, respectively.
... To maximize the reward of the subnet, a gradient ascent algorithm is applied to update the parameters of the subnet θ [36], [37], which can be written as: ...
... As for the first term, the gradient between the reward R and the predicted threshold T can be estimated by REINFORCE [40]. Since T is continuous, it can be modeled as an independent Gaussian distribution with mean T and standard deviation σ [36], [41], represented by N(T, σ²). Then, the gradient between the reward R and the predicted threshold T can be derived as: ...
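The equations themselves are elided in this excerpt, but the underlying estimator is the standard score-function (REINFORCE) gradient for a Gaussian-sampled threshold. The sketch below illustrates it under the assumption that a task-specific `reward_fn` scores the detections kept after NMS at the sampled threshold; the function and argument names are hypothetical, not the paper's code.

```python
import torch

def reinforce_threshold_grad(pred_T, sigma, reward_fn, num_samples=8):
    """Monte Carlo estimate of d(reward)/d(pred_T) when T ~ N(pred_T, sigma^2).

    pred_T is a tensor of predicted thresholds. The estimator uses the
    score-function identity grad_mu log N(T; mu, sigma^2) = (T - mu) / sigma^2,
    so no gradient has to flow through the non-differentiable NMS step.
    """
    grads = []
    for _ in range(num_samples):
        T = torch.normal(pred_T, sigma)      # sample a threshold from N(pred_T, sigma^2)
        R = reward_fn(T)                     # scalar reward, e.g., detection quality after NMS at T
        grads.append(R * (T - pred_T) / sigma ** 2)
    # Average the per-sample estimates; pred_T is then updated by gradient ascent.
    return torch.stack(grads).mean(dim=0)
```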
Article
Pedestrian detection is still a challenging task for computer vision, especially in crowded scenes where the overlaps between pedestrians tend to be large. Non-maximum suppression (NMS) plays an important role in removing redundant false positive detection proposals while retaining the true positive detection proposals. However, highly overlapped true detections may be suppressed if the NMS threshold is too low, while a higher NMS threshold will introduce a larger number of false positive results. To solve this problem, we propose an optimal threshold prediction (OTP) based NMS method that predicts a suitable NMS threshold for each human instance. First, a visibility estimation module is designed to obtain the visibility ratio. Then, we propose a threshold prediction subnet to determine the optimal NMS threshold automatically according to the visibility ratio and classification score. Finally, we re-formulate the objective function of the subnet and utilize a reward-guided gradient estimation algorithm to update the subnet. Comprehensive experiments on CrowdHuman and CityPersons show the superior performance of the proposed method in pedestrian detection, especially in crowded scenes.
... Meta-learning loss/regularization. The main idea is to meta-learn a proxy loss/regularization from data that improves inner-level model optimization from various task-specific goal perspectives, including model generalization [41], [81], [82], optimization efficiency [83], [84], differentiable approximation to a true non-differentiable metric [45], unsupervised update rules [35], robustness to domain shift [85], label noise [44], [86], [87], [88], or adversarial attack [58], as well as generalizations to unsupervised learning [89], self-supervised learning [90], auxiliary task learning [91], [92], etc. These methods, however, still overlook the design of the meta-objective in outer-level learning. ...
Preprint
Meta-learning has recently been heavily researched and has helped advance contemporary machine learning. However, achieving a well-performing meta-learning model requires a large amount of training tasks with high-quality meta-data representing the underlying task generalization goal, which is sometimes difficult and expensive to obtain for real applications. Current meta-data-driven meta-learning approaches therefore find it hard to train satisfactory meta-models from imperfect training tasks. To address this issue, we suggest a meta-knowledge informed meta-learning (MKIML) framework that improves meta-learning by additionally integrating compensatory meta-knowledge into the meta-learning process. We preliminarily integrate meta-knowledge into the meta-objective by using an appropriate meta-regularization (MR) objective to regularize the capacity complexity of the meta-model function class and facilitate better generalization on unseen tasks. As a practical implementation, we introduce data augmentation consistency to encode invariance as meta-knowledge for instantiating the MR objective, denoted DAC-MR. The proposed DAC-MR is expected to learn well-performing meta-models from training tasks with noisy, sparse or unavailable meta-data. We theoretically demonstrate that DAC-MR can be treated as a proxy meta-objective used to evaluate meta-models without high-quality meta-data. Besides, the meta-data-driven meta-loss objective combined with DAC-MR is capable of achieving better meta-level generalization. Ten meta-learning tasks with different network architectures and benchmarks substantiate the capability of our DAC-MR in aiding meta-model learning. Good performance of DAC-MR is obtained across all settings and is well aligned with our theoretical insights. This implies that DAC-MR is problem-agnostic and can hopefully be readily applied to a wide range of meta-learning problems and tasks.
... With various search strategies and search spaces, the searched architectures exceed manually designed networks. Meanwhile, NAS for multi-task learning has also made progress, mainly working on loss functions and gradient optimization [20,21]. In this work, we resort to NAS to investigate the correlation of multi-task features. ...
Article
Instance segmentation is a typical visual task that requires per-pixel mask prediction with a category label for each instance. For the decoder in an instance segmentation network, parallel branches or towers are commonly adopted to deal with instance- and dense-level predictions. However, this parallelism ignores inter-branch and intra-branch relationships. Besides, how the different branches should be connected is unclear and difficult to explore manually in practice. To address the above issues, we introduce Neural Architecture Search (NAS) to automatically search for hardware- and memory-friendly feature-sharing branches. Concretely, applying NAS to instance segmentation, we design a search space considering both the operations and the sharing connections of parallel branches. Through a tailored reinforcement learning (RL) paradigm, we can efficiently search multiple architectures with different sharing patterns and tap more feature selection possibilities. Our method is generically useful and can be transferred to analogous multi-task networks. The searched architecture shares features in the middle of the head branches and utilizes instance-level head features to generate pixel-level predictions. Extensive experiments demonstrate the effectiveness of our method, which surpasses classical parallel decoder networks, exceeding BlendMask by 1.2% on bounding-box mAP and 0.9% on segmentation mAP.
... Loss function discovery is an emerging AutoML topic aimed at improving learning systems in a data-driven manner. Existing methods are mainly based on either (i) constructing the loss function directly from basic operators [21,43,56] or (ii) optimizing parameterized loss functions [38,67]. For loss construction, [43] proposes a genetic algorithm that consists of loss function verification and quality filtering modules. ...
... [56] suggests a method to learn not only the loss function but also the whole machine learning algorithm from scratch. For loss optimization, [38] re-analyzes the existing loss functions and presents them in a combined formula. [67] observes that the search space used in [38] can be too complex, and proposes to simplify the search space via heuristics. ...
... For loss optimization, [38] re-analyzes the existing loss functions and presents them in a combined formula. [67] observes that the search space used in [38] can be too complex, and proposes to simplify the search space via heuristics. In contrast to these works, which target supervised training scenarios, we aim to adapt loss function learning principles to the FSOD problem. ...
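For intuition on the loss-construction branch (i), the following is a generic sketch rather than the actual algorithms of [43] or [56]: a candidate loss is represented as a small expression tree over primitive operators, and a cheap verification filter discards candidates that do not behave like a loss (e.g., that do not decrease as the prediction approaches the target). All names and the operator set are illustrative assumptions.

```python
import numpy as np

# Primitive operators that candidate losses are composed from (illustrative set).
UNARY = {"log": np.log, "neg": np.negative, "square": np.square}
BINARY = {"add": np.add, "mul": np.multiply, "sub": np.subtract}

def evaluate(expr, p, y):
    """Evaluate a candidate loss expression on prediction p and target y.

    expr is a nested tuple such as ('neg', ('mul', 'y', ('log', 'p'))),
    which corresponds to the cross-entropy term -y * log(p).
    """
    if expr == "p":
        return p
    if expr == "y":
        return y
    op, *args = expr
    fn = UNARY.get(op) or BINARY.get(op)
    return fn(*(evaluate(a, p, y) for a in args))

def is_plausible(expr):
    """Cheap verification filter: the loss should be finite and decrease as p approaches y = 1."""
    p = np.linspace(0.05, 0.95, 10)
    vals = evaluate(expr, p, np.ones_like(p))
    return bool(np.all(np.isfinite(vals)) and vals[0] > vals[-1])

# Example: the cross-entropy-like candidate passes the filter.
candidate = ("neg", ("mul", "y", ("log", "p")))
print(is_plausible(candidate))  # True
```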
Preprint
Few-shot object detection, the problem of modelling novel object detection categories with few training instances, is an emerging topic in the area of few-shot learning and object detection. Contemporary techniques can be divided into two groups: fine-tuning based and meta-learning based approaches. While meta-learning approaches aim to learn dedicated meta-models for mapping samples to novel class models, fine-tuning approaches tackle few-shot detection in a simpler manner, by adapting the detection model to novel classes through gradient-based optimization. Despite their simplicity, fine-tuning based approaches typically yield competitive detection results. Based on this observation, we focus on the role of loss functions and augmentations as the force driving the fine-tuning process, and propose to tune their dynamics through meta-learning principles. The proposed training scheme therefore allows learning inductive biases that can boost few-shot detection, while keeping the advantages of fine-tuning based approaches. In addition, the proposed approach yields interpretable loss functions, as opposed to highly parametric and complex few-shot meta-models. The experimental results highlight the merits of the proposed scheme, with significant improvements over strong fine-tuning based few-shot detection baselines on the benchmark Pascal VOC and MS-COCO datasets, in terms of both standard and generalized few-shot performance metrics.
... Although online loss function learning has not been explored in the meta-learning context, some existing research outside the subfield has previously explored the possibility of adaptive loss functions, such as in (Li et al., 2019) and (Wang et al., 2020). However, we emphasize that these approaches are categorically different in that they do not learn the loss function from scratch; instead, they interpolate between a small subset of handcrafted loss functions, updating the loss function after each epoch. ...
Preprint
Loss function learning is a new meta-learning paradigm that aims to automate the essential task of designing a loss function for a machine learning model. Existing techniques for loss function learning have shown promising results, often improving a model's training dynamics and final inference performance. However, a significant limitation of these techniques is that the loss functions are meta-learned in an offline fashion, where the meta-objective only considers the very first few steps of training, which is a significantly shorter time horizon than the one typically used for training deep neural networks. This causes significant bias towards loss functions that perform well at the very start of training but perform poorly at the end of training. To address this issue we propose a new loss function learning technique for adaptively updating the loss function online after each update to the base model parameters. The experimental results show that our proposed method consistently outperforms the cross-entropy loss and offline loss function learning techniques on a diverse range of neural network architectures and datasets.
... [62] tries to learn surrogate losses for non-differentiable and non-decomposable loss functions. [63] designs a search space containing almost all popular loss functions and dynamically optimizes the sampling distribution of loss functions. [64] chooses Taylor series approximations of functions as the search space, and the learned loss functions with fixed hyperparameters can be directly employed on new datasets with noisy labels. ...
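As a concrete illustration of a Taylor-series search space of this kind, the sketch below parameterizes the loss as a truncated expansion in powers of (1 - p_true); with coefficients theta_k = 1/k it recovers the Taylor approximation of -log(p_true), and other fixed coefficients give differently shaped, reusable losses. The class name and interface are assumptions for illustration, not the implementation of [64].

```python
import torch
import torch.nn.functional as F

class TaylorParameterizedLoss(torch.nn.Module):
    """Loss expressed as a truncated Taylor polynomial in (1 - p_true).

    The coefficients theta are the searched hyperparameters: they can be held
    fixed or tuned by an outer search loop, and with theta_k = 1/k the loss
    approximates -log(p_true).
    """

    def __init__(self, degree=3):
        super().__init__()
        self.theta = torch.nn.Parameter(
            torch.tensor([1.0 / k for k in range(1, degree + 1)])
        )

    def forward(self, logits, target):
        p_true = F.softmax(logits, dim=1).gather(1, target.unsqueeze(1)).squeeze(1)
        # Stack the polynomial terms (1 - p_true)^k for k = 1..degree.
        terms = torch.stack(
            [(1 - p_true) ** k for k in range(1, len(self.theta) + 1)], dim=1
        )
        return (terms * self.theta).sum(dim=1).mean()
```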
Preprint
Robust loss minimization is an important strategy for handling the robust learning issue with noisy labels. Current robust losses, however, inevitably involve hyperparameters to be tuned for different datasets with noisy labels, manually or heuristically through cross validation, which makes them fairly hard to apply generally in practice. Existing robust loss methods usually assume that all training samples share common hyperparameters, which are independent of instances. This limits the ability of these methods to distinguish the individual noise properties of different samples, making it hard for them to adapt to different noise structures. To address the above issues, we propose to combine robust losses with instance-dependent hyperparameters to improve their noise tolerance with theoretical guarantees. To set such instance-dependent hyperparameters for robust losses, we propose a meta-learning method capable of adaptively learning a hyperparameter prediction function, called Noise-Aware-Robust-Loss-Adjuster (NARL-Adjuster). Specifically, through mutual amelioration between the hyperparameter prediction function and the classifier parameters in our method, both can be simultaneously refined and coordinated to attain solutions with good generalization capability. We integrate four kinds of SOTA robust losses with our algorithm, and experiments substantiate the general availability and effectiveness of the proposed method in terms of both noise tolerance and generalization performance. Meanwhile, the explicit parameterized structure makes the meta-learned prediction function readily transferable and plug-and-play on unseen datasets with noisy labels. Specifically, we transfer our meta-learned NARL-Adjuster to unseen tasks, including several real noisy datasets, and achieve better performance compared with the conventional hyperparameter tuning strategy.