Figure 2 - uploaded by Zhi-Hong Deng
Schematic illustration.


Source publication
Conference Paper
Full-text available
As an effective technology for navigating a large number of images, image summarization is becoming a promising task with the rapid development of image sharing sites and social networks. Most existing summarization approaches use visual-based features for image representation without considering tag information. In this paper, we propose a...

Context in source publication

Context 1
... propose a joint framework for image summarization. The schematic illustration is shown in Figure 2. Assume α and β ∈ R^n are the indicator vectors, measuring the "representativeness" of each image from the visual and textual viewpoints respectively. ...
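The role of the two indicator vectors can be sketched numerically. The toy objective below (a centrality reward per view, a coupling penalty ||α − β||², and a quadratic shrinkage term) and every function name in it are illustrative assumptions, not the formulation from the paper:

```python
import numpy as np

def joint_scores(S_vis, S_txt, lam=1.0, mu=1.0, iters=100):
    """Alternating closed-form updates for the toy objective
    -s_v.a - s_t.b + lam*||a - b||^2 + mu*(||a||^2 + ||b||^2),
    where s_v, s_t are per-image centrality scores (similarity row sums)."""
    s_v = S_vis.sum(axis=1).astype(float)
    s_t = S_txt.sum(axis=1).astype(float)
    a = np.zeros_like(s_v)
    b = np.zeros_like(s_t)
    for _ in range(iters):
        a = (s_v + 2 * lam * b) / (2 * lam + 2 * mu)  # minimize over a, b fixed
        b = (s_t + 2 * lam * a) / (2 * lam + 2 * mu)  # minimize over b, a fixed
    return a, b

def summarize(S_vis, S_txt, k):
    """Return the k images scoring highest in the two views combined."""
    a, b = joint_scores(S_vis, S_txt)
    return np.argsort(-(a + b))[:k].tolist()
```

The coupling term pushes the two score vectors toward agreement, so an image ranks high only when it is representative in both the visual and the textual view.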

Similar publications

Article
Full-text available
We explore the use of the recently proposed "total nuclear variation" (TNV) as a regularizer for reconstructing multi-channel, spectral CT images. This convex penalty is a natural extension of the total variation (TV) to vector-valued images and has the advantage of encouraging common edge locations and a shared gradient direction among image chann...

Citations

... For example, a photo of Icelandic landscapes and a photo of British landscapes may look similar, but text tags can help us distinguish their underlying concepts. In [103], [106], [107], tags are clustered based on semantic features and images are re-assigned to each cluster according to their tags' semantic clusters. ...
Preprint
Recent years have witnessed a resurgence of knowledge engineering, featured by the fast growth of knowledge graphs. However, most existing knowledge graphs are represented with pure symbols, which hurts the machine's capability to understand the real world. The multi-modalization of knowledge graphs is an inevitable key step towards the realization of human-level machine intelligence. The results of this endeavor are Multi-modal Knowledge Graphs (MMKGs). In this survey on MMKGs constructed from texts and images, we first give definitions of MMKGs, followed by preliminaries on multi-modal tasks and techniques. We then systematically review the challenges, progress, and opportunities in the construction and application of MMKGs, respectively, with detailed analyses of the strengths and weaknesses of different solutions. We conclude this survey with open research problems relevant to MMKGs.
... These text summarization techniques have been used in diverse tasks such as micro-blog summarization [35], query-based summarization [13], timeline summarization [38,41], comparative summarization [6], medical report summarization [1,46], etc. Other than text summarization, researchers have also explored the areas of image summarization [36,45], and video summarization [44,47]. Recent years have also shown growth in the area of multi-modal summarization. ...
Conference Paper
Large amounts of multi-modal information online make it difficult for users to obtain proper insights. In this paper, we introduce and formally define the concepts of supplementary and complementary multi-modal summaries in the context of the overlap of information covered by different modalities in the summary output. A new problem statement of combined complementary and supplementary multi-modal summarization (CCS-MMS) is formulated. The problem is then solved in several steps through a novel unsupervised framework that utilizes the concepts of multi-objective optimization. An existing multi-modal summarization data set is further extended by adding outputs in different modalities to establish the efficacy of the proposed technique. The results obtained by the proposed approach are compared with several strong baselines; ablation experiments are also conducted to empirically justify the proposed techniques. Furthermore, the proposed model is evaluated separately for different modalities, both quantitatively and qualitatively, demonstrating the superiority of our approach.
... Providing accurate recommendations has been widely recognized as a cornerstone task for various applications, e.g., in the fields of recreation, shopping, academia, etc. [1,2]. Collaborative filtering (CF) has been a well-adopted method in the literature in recent years due to its simplicity and effectiveness. ...
Conference Paper
The primary purpose of a recommender system is to actively provide valuable and pertinent information according to users' preferences. Collaborative filtering (CF) is a successful approach among different recommendation techniques. However, CF-based methods usually suffer from limited performance due to the sparsity of rating data. To address such issues, utilizing auxiliary information such as content information and social networks is a very promising approach. This paper proposes an effective Bayesian hierarchical framework called Deep Trust Recommender (DTR), which aims to improve the quality of recommender systems by integrating user-item rating information and a social trust network into the embedding representation. Specifically, we first apply deep learning methods such as stacked denoising auto-encoders to learn an approximate representation of users' preferences from the users' trust network, and then train this representation collaboratively with rating information. Extensive experiments on two real-world datasets reveal that our approach performs much better than several widely adopted state-of-the-art recommendation methods.
... Besides [365,372], other works in the literature rely on external knowledge to accomplish the task of image summarization [60, 353,439]. Camargo et al. [60] combine textual and visual contents of a collection in the same latent semantic space, using Convex Non-Negative Matrix Factorization, to generate multimodal image collection summaries. ...
... They provide the knowledge about the concepts in a domain and are used to derive a set of ontology-based features for measuring the semantic similarity between images. Finally, [439] jointly leverages image content and associated tags and encodes the selection of images in two vectors, for the visual and textual domain respectively, whose non-zero elements represent the images to be included in the summary. The optimization process makes use of a similarity-inducing regularizer imposed on the two vectors to encourage the summary images to be representative in both visual and textual domains. ...
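The selection-by-sparsity idea described in this context (non-zero entries of the two vectors mark the summary images, with a similarity-inducing coupling between them) can be illustrated with a small proximal-gradient sketch. The ℓ1 weight, coupling strength, and step size below are arbitrary illustrative choices, not values from the cited work:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm: shrink toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_indicators(s_v, s_t, lam=0.5, mu=1.0, tau=0.8, step=0.1, iters=300):
    """Proximal gradient on a toy objective: centrality rewards s_v, s_t,
    a similarity-inducing coupling lam*||a - b||^2, quadratic shrinkage mu,
    and an l1 penalty tau that zeroes out non-summary images."""
    a = np.zeros_like(s_v, dtype=float)
    b = np.zeros_like(s_t, dtype=float)
    for _ in range(iters):
        grad_a = -s_v + 2 * lam * (a - b) + 2 * mu * a
        grad_b = -s_t + 2 * lam * (b - a) + 2 * mu * b
        a = soft_threshold(a - step * grad_a, step * tau)
        b = soft_threshold(b - step * grad_b, step * tau)
    return a, b
```

Images whose entries stay at exactly zero in both vectors are excluded; the shared non-zero support is the summary, representative in both domains at once.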
Chapter
Full-text available
Thanks to the spread of digital photography and available devices, taking photographs has become effortless and tolerated nearly everywhere. This makes it easy for people to end up with hundreds or thousands of photos, for example, when returning from a holiday trip or taking part in ceremonies, concerts, and other events. Furthermore, photos are also taken of more mundane motives, such as food and aspects of everyday life, further increasing the number of photos to be dealt with. The decreased prices of storage devices make dumping the whole set of photos common and affordable. However, this practice frequently turns the stored collections into a kind of dark archive, which is rarely accessed and enjoyed again in the future. The big size of the collections makes revisiting them time-demanding. This suggests identifying, with the support of automated methods, the sets of most important photos within the whole collections and investing some preservation effort in keeping them accessible over time. Evaluating the importance of photos to their owners is a complex process, which is often driven by personal attachment, memories behind the content, and personal tastes that are difficult to capture automatically. Therefore, to better understand the selection process for photo preservation and future revisiting, the first part of this chapter presents a user study on a photo selection task where participants selected subsets of the most important pictures from their own collections. In the second part of this chapter, we present methods to automatically select important photos from personal collections, in light of the insights that emerged from the user study. We model a notion of photo importance driven by user expectations, which represents what photos users perceive as important and would have selected. We present an expectation-oriented method for photo selection, where information at both the photo and collection levels is considered to predict the importance of photos.
... They propose a joint optimization method for image summarization. They use an objective function to optimize representativeness of images in both visual and textual space [67]. ...
Article
Full-text available
With the advent of digital cameras, the number of digital images is on the increase. As a result, image collection summarization systems have been proposed to provide users with a condensed set of summary images as a representative set for the original high-volume image set. In this paper, a semantic knowledge-based approach for image collection summarization is presented. Although ontology- and knowledge-based systems have been applied in other areas of image retrieval and image annotation, most current image summarization systems make use of visual or numeric metrics for conducting the summarization. Also, some image summarization systems jointly model the visual data of images together with their accompanying textual or social information, while these side data are not available outside the context of web or social images. The main motivation for using an ontology approach in this study is its ability to improve the results of computer vision tasks through the additional knowledge it provides to the system. We defined a set of ontology-based features to measure the amount of semantic information contained in each image. A semantic similarity graph was built based on semantic similarities. Summary images were then selected based on graph centrality on the similarity graph. Experimental results showed that the proposed approach worked well and outperformed current image summarization systems.
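The final selection step this abstract describes (pick summary images by centrality on a semantic similarity graph) can be sketched in a few lines. The greedy neighbor down-weighting used here for diversity is an assumption of this sketch, not necessarily the paper's exact procedure:

```python
import numpy as np

def centrality_summary(S, k):
    """Greedy summary selection on a semantic similarity graph.
    S is a symmetric similarity matrix; degree centrality picks the
    most representative node, then neighbors of already-chosen nodes
    are down-weighted so the summary stays diverse."""
    S = S.astype(float).copy()
    np.fill_diagonal(S, 0.0)
    scores = S.sum(axis=1)          # degree centrality
    chosen = []
    for _ in range(k):
        i = int(np.argmax(scores))
        chosen.append(i)
        scores = scores - S[i]      # penalize items similar to i
        scores[i] = -np.inf         # never pick i twice
    return chosen
```

On a graph with two tight clusters, this picks one representative from each cluster rather than two near-duplicates from the denser one.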
... Secondly, in tag enrichment, synonyms of existing tags are likely to be selected due to their co-occurrence in other images (e.g., low-rank tag enrichment in [16]). Although the synonyms are "correct" tags, they do not provide additional textual information. ...
Conference Paper
Full-text available
Images without annotations are ubiquitous on the Internet, and recommending tags for them has become a challenging open task in image understanding. A common bottleneck of related work is the semantic gap between the image and text representations. In this paper, we bridge the gap by introducing a semantic layer, the space of word embeddings that represents the image tags as the word vectors. Our model first learns the optimal mapping from the visual space to the semantic space using training sources. Then we annotate test images by decoding the semantic representations of the visual features. Extensive experiments demonstrate that our model outperforms the state-of-the-art approaches in predicting the image tags.
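A minimal version of such a visual-to-semantic mapping can be sketched with ridge regression. The closed-form mapping and cosine decoding below are standard tools used here only as an illustration, not the paper's learned model:

```python
import numpy as np

def learn_mapping(V, W, reg=1e-2):
    """Ridge regression from visual features V (n x d) to the word
    embeddings W (n x e) of the training tags: M = (V'V + reg*I)^-1 V'W."""
    d = V.shape[1]
    return np.linalg.solve(V.T @ V + reg * np.eye(d), V.T @ W)

def predict_tags(v, M, vocab_vecs, vocab, topk=1):
    """Map a visual feature into the semantic space, then decode by
    cosine similarity against the tag vocabulary's word vectors."""
    z = v @ M
    sims = vocab_vecs @ z / (
        np.linalg.norm(vocab_vecs, axis=1) * np.linalg.norm(z) + 1e-9)
    return [vocab[i] for i in np.argsort(-sims)[:topk]]
```

The key point of the semantic-layer idea survives even in this toy form: test images are annotated by decoding their mapped representation in the word-vector space, so tags never seen together with a test image can still be recovered.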
... Rank models merely provide important sentences by score, and hence are not able to cover all aspects of the original corpus. Inspired by data compression and reconstruction (Simon, Snavely, and Seitz 2007a; Yang et al. 2013; Yu et al. 2014), a good summary should recover the whole documents, or in other words, reconstruct the whole documents. Based on this assumption, we think a good summary should meet three key requirements: Coverage, Sparsity, and Diversity. ...
Article
Multi-document summarization is of great value to many real-world applications since it can help people get the main ideas within a short time. In this paper, we tackle the problem of extracting summary sentences from multi-document sets by applying sparse coding techniques and present a novel framework for this challenging problem. Based on the data reconstruction and sentence denoising assumption, we present a two-level sparse representation model to depict the process of multi-document summarization. Three requisite properties are proposed to form an ideal reconstructable summary: Coverage, Sparsity, and Diversity. We then formalize the task of multi-document summarization as an optimization problem according to the above properties, and use a simulated annealing algorithm to solve it. Extensive experiments on the summarization benchmark data sets DUC2006 and DUC2007 show that our proposed model is effective and outperforms state-of-the-art algorithms.
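The reconstruction intuition behind this line of work (a good summary should let you rebuild the whole document set) can be sketched with a greedy least-squares selector. This simple greedy search stands in for the paper's two-level sparse model and simulated annealing, purely for illustration:

```python
import numpy as np

def reconstruction_error(X, idx):
    """Least-squares error of reconstructing every sentence vector in X
    as a linear combination of the selected sentences X[idx]."""
    B = X[idx]                                       # candidate summary
    C, *_ = np.linalg.lstsq(B.T, X.T, rcond=None)    # coefficients
    return float(np.linalg.norm(X - (B.T @ C).T))

def greedy_summary(X, k):
    """Greedily add the sentence that most reduces reconstruction error."""
    chosen = []
    for _ in range(k):
        rest = [i for i in range(len(X)) if i not in chosen]
        best = min(rest, key=lambda i: reconstruction_error(X, chosen + [i]))
        chosen.append(best)
    return chosen
```

Coverage falls out naturally: a redundant sentence adds nothing to the span of the selected basis, so the greedy step prefers sentences covering directions not yet represented.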
... Due to the abundance of online resources like articles, movies, and music, tagging systems (Yu et al. 2014) have become increasingly important for organizing and indexing them. For example, CiteULike 1 uses tags to help categorize millions of articles online and Flickr 2 allows users to use tags to organize their photos. ...
Conference Paper
Full-text available
Tag recommendation has become one of the most important ways of organizing and indexing online resources like articles, movies, and music. Since tagging information is usually very sparse, effective learning of the content representation for these resources is crucial to accurate tag recommendation. Recently, models proposed for tag recommendation, such as collaborative topic regression and its variants, have demonstrated promising accuracy. However, a limitation of these models is that, by using topic models like latent Dirichlet allocation as the key component, the learned representation may not be compact and effective enough. Moreover, since relational data exist as an auxiliary data source in many applications, it is desirable to incorporate such data into tag recommendation models. In this paper, we start with a deep learning model called stacked denoising autoencoder (SDAE) in an attempt to learn more effective content representation. We propose a probabilistic formulation for SDAE and then extend it to a relational SDAE (RSDAE) model. RSDAE jointly performs deep representation learning and relational learning in a principled way under a probabilistic framework. Experiments conducted on three real datasets show that both learning more effective representation and learning from relational data are beneficial steps to take to advance the state of the art.
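A single denoising-autoencoder layer, the building block that SDAE stacks, can be sketched in a few lines of NumPy. The tied weights, masking-noise rate, and learning rate below are illustrative choices, not the RSDAE model itself:

```python
import numpy as np

def train_dae(X, hidden=4, noise=0.3, lr=0.5, epochs=300, seed=0):
    """One denoising-autoencoder layer with tied weights and sigmoid
    units: corrupt the input with masking noise, then train to
    reconstruct the clean input by gradient descent on squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, (d, hidden))
    b = np.zeros(hidden)                        # encoder bias
    c = np.zeros(d)                             # decoder bias
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    losses = []
    for _ in range(epochs):
        Xn = X * (rng.random(X.shape) > noise)  # masking corruption
        H = sig(Xn @ W + b)                     # encode corrupted input
        R = sig(H @ W.T + c)                    # decode with tied weights
        losses.append(0.5 * np.mean((R - X) ** 2))
        dZ2 = (R - X) * R * (1 - R) / n         # backprop through decoder
        dZ1 = (dZ2 @ W) * H * (1 - H)           # backprop through encoder
        W -= lr * (Xn.T @ dZ1 + dZ2.T @ H)      # tied-weight gradient
        b -= lr * dZ1.sum(axis=0)
        c -= lr * dZ2.sum(axis=0)
    return (lambda Z: sig(Z @ W + b)), losses
```

Stacking means training a second such layer on the codes produced by the first; the paper's probabilistic formulation goes further by treating these weights within a Bayesian framework and tying them to relational data.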
... Rank models merely provide important sentences by score, and hence are not able to cover all aspects of the original corpus. Inspired by data compression and reconstruction (Simon, Snavely, and Seitz 2007a; Yang et al. 2013; Yu et al. 2014), a good summary should recover the whole documents, or in other words, reconstruct the whole documents. Based on this assumption, we think a good summary should meet three key requirements: Coverage, Sparsity, and Diversity. ...
Conference Paper
Full-text available
Multi-document summarization is of great value to many real-world applications since it can help people get the main ideas within a short time. In this paper, we tackle the problem of extracting summary sentences from multi-document sets by applying sparse coding techniques and present a novel framework for this challenging problem. Based on the data reconstruction and sentence denoising assumption, we present a two-level sparse representation model to depict the process of multi-document summarization. Three requisite properties are proposed to form an ideal reconstructable summary: Coverage, Sparsity, and Diversity. We then formalize the task of multi-document summarization as an optimization problem according to the above properties, and use a simulated annealing algorithm to solve it. Extensive experiments on the summarization benchmark data sets DUC2006 and DUC2007 show that our proposed model is effective and outperforms state-of-the-art algorithms.