Figure 1
A holistic scene understanding approach to semantic segmentation consists of a conditional random field (CRF) model that jointly reasons about: (a) classification of local patches (segmentation), (b) object detection, (c) shape analysis, (d) scene recognition and (e) contextual reasoning. In this paper we analyze the relative importance of each of these components by building an array of hybrid human-machine CRFs where each component is performed by a machine (default), or replaced by human subjects or ground truth, or is removed all together (top).

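In schematic form, the energy such a holistic CRF minimizes can be written as a sum of task-specific potentials and compatibility terms. The formulation below is a simplified sketch (the paper's exact potentials differ in form and detail), with x_i the superpixel labels, b_d the detection variables and s the scene class:

E(x, b, s) = \sum_i \phi_{seg}(x_i) + \sum_d \phi_{det}(b_d) + \phi_{scene}(s) + \sum_{i,d} \psi_{shape}(x_i, b_d) + \sum_{i<j} \psi_{cooc}(x_i, x_j) + \sum_i \psi_{compat}(x_i, s)

Replacing a component with human or ground-truth input amounts to fixing the corresponding potential while leaving the rest of the energy unchanged.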

Source publication
Conference Paper
Full-text available
Recent trends in semantic image segmentation have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, and contextual reasoning. In this work, we are interested in understanding the roles of these different tasks in aiding semantic segmentation. Towards this...

Context in source publication

Context 1
... is a conditional random field (CRF) that models the interplay between segmentation and a variety of components such as local super-pixel appearance, object detection, scene recognition, shape analysis, class co-occurrence, and compatibility of classes with scene categories. To gain insights into the relative importance of these different factors or tasks, we isolate each one, and substitute a machine with a human for that task, keeping the rest of the model intact (Figure 1). The resultant improvement in segmentation performance, if any, will give us an indication of how much "head room" there is to improve segmentation by focusing research efforts on that task. ...
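The substitution protocol described in this excerpt can be summarized as a small ablation loop. The sketch below is purely illustrative (the component names, the toy scoring function and the numbers are invented, not the authors' code); it only shows how one component at a time is swapped for an oracle signal while the joint model is left untouched:

# Toy sketch of the hybrid human-machine ablation protocol (all names and values invented).
COMPONENTS = ["segmentation", "detection", "shape", "scene", "context"]

# Per-image component signals: 1.0 stands in for a perfect human/ground-truth signal,
# lower values for noisier machine predictions.
machine = {"img1": {c: 0.6 for c in COMPONENTS}, "img2": {c: 0.7 for c in COMPONENTS}}
oracle = {"img1": {c: 1.0 for c in COMPONENTS}, "img2": {c: 1.0 for c in COMPONENTS}}

def crf_segmentation_score(potentials):
    # Stand-in for joint CRF inference plus IoU evaluation: a simple average,
    # so that improving any single component improves the final score.
    return sum(potentials.values()) / len(potentials)

def evaluate(replaced=None):
    scores = []
    for img, outputs in machine.items():
        potentials = dict(outputs)
        if replaced is not None:
            potentials[replaced] = oracle[img][replaced]  # swap exactly one component
        scores.append(crf_segmentation_score(potentials))
    return sum(scores) / len(scores)

baseline = evaluate()
for comp in COMPONENTS:
    print(f"head room from improving {comp}: {evaluate(replaced=comp) - baseline:+.3f}")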

Citations

... Moreover, the lower-resolution feature maps in deeper layers pose a challenge for object detectors to precisely locate the defects in the original image [10]. Not only that, as Mottaghi et al. [36] suggest, humans are worse than machines at classifying an object without contextual information, but far better when it is provided. For instance, humans are able to distinguish different types of trees by taking into account the surrounding weather conditions and the location where the tree is planted. ...
Article
The quality of a printed circuit board (PCB) is paramount towards ensuring proper functionality of electronic products. To achieve the required quality standards, substantial research and development efforts were invested to automate PCB inspection for defect detection, primarily using computer vision techniques. Despite these advancements, the accuracy of such techniques is often susceptible towards varying board and component size. Efforts to increase its accuracy especially for small or tiny defects on a PCB often lead to a tradeoff with reduced real-time performance, which in turn limits its applicability in the manufacturing industry. Hence, this paper puts forward an enhanced deep learning network which addresses the difficulty in inferring tiny or varying defects on a PCB in real-time. Our proposed enhancements consist of i) A novel multi-scale feature pyramid network to enhance tiny defect detection through context information inclusion; and ii) A refined complete intersection over union loss function to precisely encapsulate tiny defects. Experimental results on a publicly available PCB defects dataset demonstrate that our model achieves 99.17% mean-average precision, while maintaining real-time inferencing speed at 90 frames per second. In addition, we introduce three trend detection algorithms which alert an operator when abnormal development of defect characteristics is detected. Each algorithm is responsible for localizing defect buildups, increasing defect size and increasing defect occurrences, respectively. As a whole, the proposed model is capable of performing accurate and reliable real-time PCB inspection with the aid of an automated alert capability. The dataset and trained models are available at: https://github.com/JiaLim98/YOLO-PCB.
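For context, the standard complete-IoU (CIoU) loss that the authors refine penalizes not only box overlap but also center distance and aspect-ratio mismatch. The sketch below implements only the common baseline formulation in plain Python; the paper's refined variant is not reproduced here, and the example boxes are arbitrary:

import math

def ciou_loss(box_p, box_g):
    """Standard CIoU loss between two axis-aligned boxes given as (x1, y1, x2, y2).
    The refined variant proposed in the paper differs; this is only the common baseline."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    eps = 1e-9
    # Overlap (IoU) term
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / (union + eps)
    # Squared distance between box centers
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4.0
    cw = max(px2, gx2) - min(px1, gx1)   # enclosing box width
    ch = max(py2, gy2) - min(py1, gy1)   # enclosing box height
    c2 = cw ** 2 + ch ** 2 + eps         # squared enclosing diagonal
    # Aspect-ratio consistency term
    v = (4.0 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1 + eps))
                                - math.atan((px2 - px1) / (py2 - py1 + eps))) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return 1.0 - (iou - rho2 / c2 - alpha * v)

print(ciou_loss((0, 0, 10, 10), (2, 2, 12, 12)))   # toy boxes, not PCB data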
...
Model | Task | Backbone | Context type | Context level | Framework
Internal-External Network Feature Fusion Attention Model (Lim et al., 2021) | Object detection | ResNet | Other | Global, Local | Feature fusion SSD
Deformable Part-based Model (Mottaghi et al., 2014) | Object detection | / | Spatial | Global, Local | Markov random field
Siamese Context Network (Sun and Jacobs, 2017) | Object detection | Custom | Spatial | Global | Siamese CNN
Bayes Probabilistic Model (Torralba et al., 2010) | Object detection | / | Spatial | Global, Local | Bayesian model
Semantic Relation Reasoning Model (Zhu et al., 2021) | Object detection | ResNet | Other | Global | SSD
Cascaded Refinement Network (Johnson et al., 2018) | Scene graph generation | GCN | Spatial | Global, Local | GCN
Iterative Message Passing (Xu et al., 2017) | Scene graph generation | VGGNet | Spatial | Global, Local | Conditional random fields
Graph R-CNN | Scene graph generation | GCN | Spatial | Global, Local | GCN
MOTIFNET (Zellers et al., 2018) | Scene graph generation | ResNet | Spatial | Global | Bayesian model
Conditional Random Field (CRF) (Mottaghi et al., 2013) | Semantic segmentation | / | Other | Global | Conditional random field
Context-based SVM (Du et al., 2012) | Text detection | Custom | Spatial | Local | SVM
Visual-language Re-ranker (Sabir et al., 2018) | Text detection | ResNet/GoogLeNet | Other | Global | Language model
PLEX (Wang et al., 2011) | Text detection | / | Spatial | Local | Trie structure
Scene Context-based Model (Zhu et al., 2016) | Text detection | Custom | Other | Global, Local | CNN/SVM
Context-dependent Diffusion Network | Visual relationship detection | VGGNet | Spatial | Global | Graph model
Dynamic Tree Structure (Tang et al., 2019) | Visual Q&A | VGGNet | Spatial | Global, Local | Tree-structured model
advantages of spatial semantic context to achieve better performance. However, the label co-occurrence may describe the object relation precisely when the dataset is large enough and the objects are highly correlated. ...
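As a small illustration of the label co-occurrence statistics mentioned at the end of this excerpt, such priors are usually estimated simply by counting which class labels appear together in the same annotated image (the labels below are made up):

from collections import Counter
from itertools import combinations

# Per-image label sets from an annotated dataset (toy example).
image_labels = [
    {"road", "car", "building"},
    {"road", "car", "person"},
    {"grass", "cow", "sky"},
]

# Count how often each unordered pair of classes appears in the same image.
pair_counts = Counter()
for labels in image_labels:
    pair_counts.update(combinations(sorted(labels), 2))

# Normalize into co-occurrence frequencies usable as a contextual prior.
total = sum(pair_counts.values())
cooccurrence = {pair: n / total for pair, n in pair_counts.items()}
print(cooccurrence[("car", "road")])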
Preprint
Full-text available
Contextual information plays an important role in many computer vision tasks, such as object detection, video action detection, image classification, etc. Recognizing a single object or action out of context can sometimes be very challenging, and context information may greatly help improve the understanding of a scene or an event. Appearance context information, e.g., the colors or shapes of the background of an object, can improve the recognition accuracy of the object in the scene. Semantic context (e.g. a keyboard on an empty desk vs. a keyboard next to a desktop computer) will improve accuracy and exclude unrelated events. Context information that is not in the image itself, such as the time or location at which an image was captured, can also help to decide whether a certain event or action should occur. Other types of context (e.g. the 3D structure of a building) will also provide additional information to improve the accuracy. In this survey, the different context information that has been used in computer vision tasks is reviewed. We categorize context into different types and different levels. We also review available machine learning models and image/video datasets that can employ context information. Furthermore, we compare context-based integration and context-free integration in mainly two classes of tasks: image-based and video-based. Finally, this survey concludes with a set of promising future directions in context learning and utilization.
... Deep learning methods have shown very promising semantic labeling performance due to their capability of learning discriminative features, where the receptive fields of the neurons in the convolution layers are cascaded to implicitly capture contextual information [28]. Such contextual knowledge is important for understanding and capturing local and global pixel dependencies [6,40,41]. For example, a multi-scale CNN was used to overcome the limitations of hand-crafted features [15]. ...
Article
Full-text available
Rare-class objects in natural scene images that are usually small and less frequent often convey more important information for scene understanding than the common ones. However, they are often overlooked in scene labeling studies due to two main reasons, low occurrence frequency and limited spatial coverage. Many methods have been proposed to enhance overall semantic labeling performance, but only a few consider rare-class objects. In this work, we present a deep semantic labeling framework with special consideration of rare classes via three techniques. First, a novel dual-resolution coarse-to-fine superpixel representation is developed, where fine and coarse superpixels are applied to rare classes and background areas respectively. This unique dual representation allows seamless incorporation of shape features into integrated global and local convolutional neural network (CNN) models. Second, shape information is directly involved during the CNN feature learning for both frequent and rare classes from the re-balanced training data, and also explicitly involved in data inference. Third, the proposed framework incorporates both shape information and the CNN architecture into semantic labeling through a fusion of probabilistic multi-class likelihood. Experimental results demonstrate competitive semantic labeling performance on two standard datasets both qualitatively and quantitatively, especially for rare-class objects.
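The fusion of probabilistic multi-class likelihoods mentioned in this abstract can, in generic form, be realized as a weighted combination of per-class probability vectors from different predictors. The snippet below shows only one common realization (a weighted geometric mean with renormalization); the weights, class count and predictor names are illustrative, not the paper's actual scheme:

import numpy as np

def fuse_likelihoods(prob_vectors, weights):
    """Fuse several per-class probability vectors (each summing to 1)
    via a weighted geometric mean, then renormalize."""
    fused = np.ones_like(prob_vectors[0])
    for p, w in zip(prob_vectors, weights):
        fused *= np.power(p + 1e-12, w)   # small epsilon avoids zero probabilities
    return fused / fused.sum()

# Toy example: global CNN, local CNN and a shape-based predictor for one superpixel.
global_p = np.array([0.70, 0.20, 0.10])
local_p = np.array([0.50, 0.40, 0.10])
shape_p = np.array([0.30, 0.60, 0.10])
print(fuse_likelihoods([global_p, local_p, shape_p], weights=[0.5, 0.3, 0.2]))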
... Our work extends weakly supervised learning methods by involving humans in the loop (Vaughan 2018). Existing human-in-the-loop approaches mainly leverage crowds to label individual data instances (Yan et al. 2011; Yang et al. 2018) or to debug the training data (Krishnan et al. 2016; Yang et al. 2019) or components (Parikh and Zitnick 2011; Mottaghi et al. 2013; Nushi et al. 2017) of a machine learning system. Unlike these works, we leverage crowd workers to label sampled microposts in order to obtain keyword-specific expectations, which can then be generalized to help classify microposts containing the same keyword, thus amplifying the utility of the crowd. ...
Article
Full-text available
Microblogging platforms such as Twitter are increasingly being used in event detection. Existing approaches mainly use machine learning models and rely on event-related keywords to collect the data for model training. These approaches make strong assumptions on the distribution of the relevant microposts containing the keyword – referred to as the expectation of the distribution – and use it as a posterior regularization parameter during model training. Such approaches are, however, limited as they fail to reliably estimate the informativeness of a keyword and its expectation for model training. This paper introduces a Human-AI loop approach to jointly discover informative keywords for model training while estimating their expectation. Our approach iteratively leverages the crowd to estimate both keyword-specific expectation and the disagreement between the crowd and the model in order to discover new keywords that are most beneficial for model training. These keywords and their expectation not only improve the resulting performance but also make the model training process more transparent. We empirically demonstrate the merits of our approach, both in terms of accuracy and interpretability, on multiple real-world datasets and show that our approach improves the state of the art by 24.3%.
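One way to picture the keyword-discovery step is as a disagreement score between the crowd-estimated expectation for a keyword and the model's average prediction on microposts containing it. The toy sketch below illustrates only that selection idea; the keywords, numbers and the simple absolute-difference criterion are invented, and the paper's actual objective is richer:

# Toy illustration of picking the next keyword by crowd-model disagreement (invented data).
crowd_expectation = {"earthquake": 0.80, "shaking": 0.35, "magnitude": 0.55}

model_predictions = {               # model-estimated probability that a micropost
    "earthquake": [0.9, 0.7, 0.8, 0.95],   # containing the keyword is event-related
    "shaking": [0.6, 0.7, 0.8],
    "magnitude": [0.5, 0.6, 0.4, 0.7],
}

def disagreement(keyword):
    model_expectation = sum(model_predictions[keyword]) / len(model_predictions[keyword])
    return abs(crowd_expectation[keyword] - model_expectation)

next_keyword = max(crowd_expectation, key=disagreement)
print(next_keyword, disagreement(next_keyword))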
... (III) Empower the annotator. In most annotation approaches there is a fixed sequence of annotation actions [11,17,22,51,58,92] or the sequence is determined by the machine [43,69,83]. In contrast, Fluid Annotation empowers the annotator: he sees at a glance the best available machine segmentation of all scene elements, and then decides what to annotate and in which order. ...
Conference Paper
We introduce Fluid Annotation, an intuitive human-machine collaboration interface for annotating the class label and outline of every object and background region in an image. Fluid Annotation is based on three principles: (I) Strong machine-learning aid. We start from the output of a strong neural network model, which the annotator can edit by correcting the labels of existing regions, adding new regions to cover missing objects, and removing incorrect regions. The edit operations are also assisted by the model. (II) Full image annotation in a single pass. As opposed to performing a series of small annotation tasks in isolation [51,68], we propose a unified interface for full image annotation in a single pass. (III) Empower the annotator. We empower the annotator to choose what to annotate and in which order. This enables concentrating on what the machine does not already know, i.e. putting human effort only on the errors it made. This helps using the annotation budget effectively. Through extensive experiments on the COCO+Stuff dataset [11,51], we demonstrate that Fluid Annotation leads to accurate annotations very efficiently, taking 3x less annotation time than the popular LabelMe interface [70].
... (I) Strong Machine-Learning aid. Popular semantic segmentation datasets [11,17,22,51,58,92] are annotated fully manually which is very costly. Instead Fluid Annotation starts from the output of a neural network model [30], which the annotator can edit by correcting the label of existing regions, adding new regions to cover missing objects, and removing incorrect regions (Fig. 2). ...
... (III) Empower the annotator. In most annotation approaches there is a fixed sequence of annotation actions [11,17,22,51,58,92] or the sequence is determined by the machine [43,69,83]. In contrast, Fluid Annotation empowers the annotator: he sees at a glance the best available machine segmentation of all scene elements, and then decides what to annotate and in which order. This enables focusing on what the machine does not already know, i.e. putting human effort only on the errors it made, and typically addressing the biggest errors first. This helps using the annotation budget effectively, and also steers towards labeling hard examples first. ...
Preprint
We introduce Fluid Annotation, an intuitive human-machine collaboration interface for annotating the class label and outline of every object and background region in an image. Fluid Annotation starts from the output of a strong neural network model, which the annotator can edit by correcting the labels of existing regions, adding new regions to cover missing objects, and removing incorrect regions. Fluid Annotation has several attractive properties: (a) it is very efficient in terms of human annotation time; (b) it supports full image annotation in a single pass, as opposed to performing a series of small tasks in isolation, such as indicating the presence of objects, clicking on instances, or segmenting a single object known to be present. Fluid Annotation subsumes all these tasks in one unified interface. (c) It empowers the annotator to choose what to annotate and in which order. This enables putting human effort only on the errors the machine made, which helps use the annotation budget effectively. Through extensive experiments on the COCO+Stuff dataset, we demonstrate that Fluid Annotation leads to accurate annotations very efficiently, taking three times less annotation time than the popular LabelMe interface.
... In these problems, the classifier can assign its preferred label to an input sample regardless of the labels of neighboring samples. In machine vision tasks, CRFs are usually used to identify objects and segment the scene [1,32,35,36,51,52,65]. ...
Preprint
Full-text available
This paper gives an overview of semantic segmentation, consisting of an explanation of the field, its status and relation to other fundamental vision tasks, and the different datasets and common evaluation metrics that have been used by researchers. This survey also includes an overall review of a variety of recent approaches (RDF, MRF, CRF, etc.), their advantages and challenges, and shows the superiority of CNN-based semantic segmentation systems on the CamVid and NYUDv2 datasets. In addition, some areas that are promising for future work are mentioned.
... Most of the recent object detection efforts have focused on recognizing and localizing thing classes, such as cat and car. Such classes have a specific size [21,27] and shape [21,51,55,39,17,14], and identifiable parts (e.g. a car has wheels). Indeed, the main recognition challenges [18,43,35] are all about things. ...
... Defining things and stuff. The literature provides definitions for several aspects of stuff and things, including: (1) Shape: Things have characteristic shapes (car, cat, phone), whereas stuff is amorphous (sky, grass, water) [21,59,28,51,55,39,17,14]. (2) Size: Things occur at characteristic sizes with little variance, whereas stuff regions are highly variable in size [21,2,27]. ...
... This concept finds the weakest link in a connected system by estimating the improvement in performance when a certain component is replaced with some other similar component. Later on, this concept was applied in practice by Mottaghi et al. [47,48], who studied the performance of a CRF used for scene understanding. Its components were replaced by crowd-workers recruited through Mechanical Turk, and it was found that system performance increased when human classifications were provided to the CRF, whereas humans performing a component in isolation, without the machine, did not fare as well. ...
Article
Full-text available
This paper provides an in-depth analysis of the field of crowdsourcing and its impact when used in the world of machine learning. It comprises the various contributions that crowdsourcing can make to techniques that employ machine learning, such as producing data, debugging and checking models, building hybrid intelligent machines that reduce the human intervention required for high-quality performance by artificial intelligence, and developmental experimentation to improve human-computer interaction. A discussion regarding the nature of crowd-workers follows, focusing on various factors such as their reaction to different forms of motivation, their behaviour towards each other, and deceit among them. The takeaways of this paper include a few tips and routines to be followed to achieve success through crowdsourcing.
... Scene classification is one of the primary goals in computer vision, involving many sub-tasks, such as object detection and recognition. These sub-tasks have been studied intensely over the past few decades, and there is still ample room for improvement (Mottaghi et al., 2013). In general, scene classification refers to the process of learning to answer a "what" question from a given sample, where the answer is naturally determined by what objects a scene contains. ...
Article
Full-text available
Indoor scene classification forms a basis for scene interaction by service robots. The task is challenging because the layout and decoration of a scene vary considerably. Previous studies on knowledge-based methods commonly ignore the importance of visual attributes when constructing the knowledge base, a shortcoming that restricts classification performance. A semantic hierarchy structure was proposed to describe the similarities between different parts of scenes in a fine-grained way. Besides the commonly used semantic features, visual attributes were also introduced to construct the knowledge base. Inspired by the processes of human cognition and the characteristics of indoor scenes, we proposed an inferential framework based on the Markov logic network. The framework is evaluated on a popular indoor scene dataset, and the experimental results demonstrate its effectiveness.
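For readers unfamiliar with Markov logic networks, their general form attaches a weight w_i to each first-order formula F_i and scores a possible world x by

P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big),

where n_i(x) is the number of true groundings of F_i in x and Z is the normalizing constant. An illustrative (invented) weighted rule in this setting could be 2.0 : Contains(scene, Bed) => SceneType(scene, Bedroom); the actual rules, attributes and weights used in the cited framework are those defined in that paper.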