Fig. 5 - uploaded by Youming Deng
Histograms of the positive- and negative-pair distance distributions for 10,000 randomly selected images on the CUB-200-2011 train/test set: (a)/(c) initial state, a randomly initialized network; (b)/(d) the network trained with the N-pair Center loss.


Context in source publication

Context 1
... CUB-200-2011 dataset. Our method performs close to the best ensemble model, which uses multiple networks, on the CARS196 dataset, while our framework has fewer parameters and a shorter embedding size. The distance distribution of positive and negative embedding pairs for 10,000 randomly selected images from the unseen CUB-200-2011 test set is depicted in Fig. 5. The result shows that the N-pair Center loss makes the positive- and negative-pair distance distributions overlap less than those of the pre-trained ResNet-18 network, and also demonstrates that our method can effectively spread out the ...
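The kind of pair-distance histogram shown in Fig. 5 can be computed from any embedding matrix with a short sketch; the function name, bin count, and toy data below are illustrative assumptions, not the paper's code:

```python
import numpy as np

def pair_distance_histograms(embeddings, labels, bins=50):
    """Split all pairwise Euclidean distances into positive pairs
    (same class) and negative pairs (different class) and histogram
    each set over a shared set of bin edges."""
    emb = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    # Pairwise squared distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (emb ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    dist = np.sqrt(np.clip(d2, 0.0, None))
    # Upper-triangular indices exclude self-pairs and duplicate pairs.
    iu = np.triu_indices(len(emb), k=1)
    same = labels[iu[0]] == labels[iu[1]]
    pos, neg = dist[iu][same], dist[iu][~same]
    edges = np.linspace(0.0, dist[iu].max(), bins + 1)
    return np.histogram(pos, bins=edges)[0], np.histogram(neg, bins=edges)[0], edges

# Toy example: two well-separated clusters give nearly disjoint histograms,
# the behaviour the figure attributes to the trained network.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 0.1, (20, 8)), rng.normal(3, 0.1, (20, 8))])
lab = np.array([0] * 20 + [1] * 20)
pos_h, neg_h, edges = pair_distance_histograms(emb, lab)
```

With 20 images per class, the positive histogram covers 2 x C(20, 2) = 380 pairs and the negative histogram 20 x 20 = 400 pairs.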

Similar publications

Preprint
Full-text available
We present novel measurements from a field campaign that aims to characterize multi-scale flow patterns, ranging from 0.1 to 10 km, in a mountainous region in Northwestern Spain with a mountain-valley-ridge configuration. We select two flow cases where topographic-flow interactions were measured by five synchronized scanning Doppler wind lidars alo...
Article
Full-text available
Objective: Accurate segmentation of head and neck (H&N) tumors is critical in radiotherapy. However, the existing methods lack effective strategies to integrate local and global information, strong semantic information and context information, and spatial and channel features, which are effective clues to improve the accuracy of tumor segmentation...
Article
Full-text available
We introduce a novel deep learning-based framework to interpret 3D urban scenes represented as textured meshes. Based on the observation that object boundaries typically align with the boundaries of planar regions, our framework achieves semantic segmentation in two steps: planarity-sensible over-segmentation followed by semantic classification. Th...
Preprint
Full-text available
This paper proposes a human-aware deblurring model that disentangles the motion blur between foreground (FG) humans and background (BG). The proposed model is based on a triple-branch encoder-decoder architecture. The first two branches are learned for sharpening FG humans and BG details, respectively; while the third one produces global, harmoniou...

Citations

Article
Unmanned Aerial Vehicles (UAVs) rely on satellite systems for stable positioning. However, due to limited satellite coverage or communication disruptions, UAVs may lose signals for positioning. In such situations, vision-based techniques can serve as an alternative, ensuring the self-positioning capability of UAVs. However, most of the existing datasets are developed for the geo-localization task of the objects captured by UAVs, rather than UAV self-positioning. Furthermore, the existing UAV datasets apply discrete sampling to synthetic data, such as Google Maps, neglecting the crucial aspects of dense sampling and the uncertainties commonly experienced in practical scenarios. To address these issues, this paper presents a new dataset, DenseUAV, that is the first publicly available dataset tailored for the UAV self-positioning task. DenseUAV adopts dense sampling on UAV images obtained in low-altitude urban areas. In total, over 27K UAV- and satellite-view images of 14 university campuses are collected and annotated. In terms of methodology, we first verify the superiority of Transformers over CNNs for the proposed task. Then we incorporate metric learning into representation learning to enhance the model’s discriminative capacity and to reduce the modality discrepancy. Besides, to facilitate joint learning from both the satellite and UAV views, we introduce a mutually supervised learning approach. Last, we enhance the Recall@K metric and introduce a new measurement, SDM@K, to evaluate both the retrieval and localization performance for the proposed task. As a result, the proposed baseline method achieves a remarkable Recall@1 score of 83.01% and an SDM@1 score of 86.50% on DenseUAV. The dataset and code have been made publicly available on https://github.com/Dmmm1997/DenseUAV .
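The Recall@1 figure quoted above is the standard retrieval metric; SDM@K is specific to that paper, but plain Recall@K can be sketched as follows (function name and toy data are illustrative assumptions):

```python
import numpy as np

def recall_at_k(query_emb, gallery_emb, query_ids, gallery_ids, k=1):
    """Fraction of queries whose k nearest gallery neighbours
    (by Euclidean distance) contain an item with the query's ID."""
    q = np.asarray(query_emb, dtype=np.float64)
    g = np.asarray(gallery_emb, dtype=np.float64)
    # Distance from every query to every gallery item.
    d = np.linalg.norm(q[:, None, :] - g[None, :, :], axis=-1)
    topk = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest items
    gallery_ids = np.asarray(gallery_ids)
    hits = [np.isin(gallery_ids[row], qid).any()
            for row, qid in zip(topk, query_ids)]
    return float(np.mean(hits))

# Toy check: each query sits right next to its matching gallery point.
gallery = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
queries = gallery + 0.1
r1 = recall_at_k(queries, gallery, [0, 1, 2], [0, 1, 2], k=1)
```

Here every query's nearest neighbour is its own match, so `r1` is 1.0.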
Article
Cross-view geo-localization aims to find images containing the same geographic target across multiple views. For example, given a query image from the UAV view, a matching model can find the exact image of the same location in a gallery collected by satellites. Using a UAV-view image to retrieve the true-matched, geo-tagged satellite-view image, the current geographic location of the UAV can be easily localized based on flight records. However, due to the extreme change of viewpoint across platforms, traditional image processing methods struggle to match multi-view images. This paper proposed advanced neural-network-based approaches that apply an attention mechanism to the feature learning process to improve the ability to learn essential features from the input image. A different pooling method was also implemented to strengthen the global descriptor. The proposed models significantly improved accuracy and achieved competitive results on the University-1652 dataset.
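The abstract does not name the pooling method used for the global descriptor; generalized-mean (GeM) pooling is one common choice in retrieval work, and is assumed here purely for illustration:

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling over a C x H x W feature map.
    p = 1 recovers average pooling; large p approaches max pooling."""
    x = np.clip(np.asarray(feature_map, dtype=np.float64), eps, None)
    return (x ** p).mean(axis=(1, 2)) ** (1.0 / p)

# e.g. a 4-channel, 7x7 CNN feature map -> a 4-dim global descriptor
fm = np.random.default_rng(1).random((4, 7, 7))
desc = gem_pool(fm)
```

By the power-mean inequality, each channel's GeM value lies between the channel's average and its maximum, which is why `p` acts as a knob between the two poolings.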
Article
Full-text available
The online retrieval of clothes images is a crucial task because finding items that exactly match a query image in a large amount of data is extremely challenging. Large variations in clothes images degrade the retrieval accuracy of visual searches, and a further problem is the high dimensionality of feature vectors obtained from pre-trained deep CNN models. This research is an effort to enhance the training and test accuracy of clothes retrieval by two different means. First, features are extracted using a modified AlexNet (M-AlexNet) in which the ReLU activation function is replaced with the self-regularized Mish activation function because of its non-monotonic nature. The M-AlexNet with Mish is trained on the CIFAR-10 dataset using a SoftMax classifier. Another contribution is to reduce the dimensionality of the feature vectors obtained from M-AlexNet: the top k ranked features are selected and some dissimilar features are removed using the proposed Joint Shannon's Entropy Pearson Correlation Coefficient (JSE-PCC) technique to enhance clothes retrieval performance. To assess the efficacy of the suggested methods, a comparison is performed with other deep CNN models, such as baseline AlexNet, VGG-16, VGG-19, and ResNet50, on DeepFashion2, MVC, and the proposed Clothes Image Dataset (CID). Extensive experiments indicate that M-AlexNet with Mish attains 85.15%, 82.04%, and 83.65% accuracy on the DeepFashion2, MVC, and CID datasets, respectively. Hence, M-AlexNet and the proposed feature selection technique surpass prior results by margins of 5.11% on DeepFashion2, 1.95% on MVC, and 3.51% on CID.
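The exact JSE-PCC scoring is specific to that paper, but the general idea of combining Shannon entropy with Pearson correlation for feature selection can be sketched loosely as follows (threshold, bin count, and ranking rule are assumptions, not the published formula):

```python
import numpy as np

def entropy_corr_select(features, bins=10, corr_thresh=0.95):
    """Rank feature columns by Shannon entropy (more informative first),
    then greedily drop any column that is highly Pearson-correlated
    with an already-kept column. A loose sketch of entropy-plus-
    correlation selection, not the paper's exact JSE-PCC method."""
    X = np.asarray(features, dtype=np.float64)
    n_feat = X.shape[1]
    ent = np.empty(n_feat)
    for j in range(n_feat):
        counts, _ = np.histogram(X[:, j], bins=bins)
        p = counts / counts.sum()
        p = p[p > 0]
        ent[j] = -(p * np.log2(p)).sum()  # Shannon entropy of column j
    corr = np.corrcoef(X, rowvar=False)
    keep = []
    for j in np.argsort(-ent):  # highest-entropy features first
        if all(abs(corr[j, k]) < corr_thresh for k in keep):
            keep.append(j)
    return keep

# Toy check: column 3 duplicates column 0, so exactly one of them survives.
rng = np.random.default_rng(2)
X = rng.random((200, 4))
X[:, 3] = X[:, 0]
kept = entropy_corr_select(X)
```

With the duplicated column, three of the four features are kept, and never both copies.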