Fig 1 - available from: Multimedia Tools and Applications
Multimedia system architecture  

Source publication
Article
Parliamentary websites have become one of the most important windows for citizens and media to follow the activities of their legislatures and to hold parliaments to account. Therefore, most parliamentary institutions aim to provide new multimedia solutions capable of displaying on-demand video fragments of plenary activities. This paper presents a...

Citations

... The features were extracted and used to parameterize two probabilistic representations per frame (i.e., one per feature). Then, neighboring frames were compared via KL-divergence against a threshold of 0.1 [62]. A pair of frames that scored below the threshold were assumed to be shot boundary frames for V videos of size T, i.e., v_t ∈ {1, 2, . . . ...
Article
Kinship is a soft biometric detectable in media with an abundance of practical applications. Despite the difficulty of detecting kinship, annual data challenges using still-images have consistently improved performance and attracted new researchers. Now, systems reach performance levels unforeseeable a decade ago, closing in on performance acceptable for deployment in practice. Similar to other biometric tasks, we expect systems can benefit from additional modalities. We hypothesize that adding modalities to FIW, which contains only still-images, will improve performance. Thus, to narrow the gap between research and reality and enhance the power of kinship recognition systems, we extend FIW with multimedia (MM) data (i.e., video, audio, and text captions). Specifically, we introduce the first publicly available multi-task MM kinship dataset. To build FIW MM, we developed machinery to automatically collect, annotate, and prepare the data, requiring minimal human input and no financial cost. The proposed MM corpus allows problem statements to follow more realistic, template-based protocols. We show significant improvements in all benchmarks with the added modalities. The results highlight edge cases to inspire future research with different areas of improvement. FIW MM provides the data required to increase the potential of automated systems to detect kinship in MM. It also allows experts from diverse fields to collaborate in novel ways.
... 3) Visual branch. We first split a video into scenes using two global measures under the assumption that, statistically, neighboring frames will match as closely as 90%: HSV (i.e., color) and local binary patterns [156] (i.e., texture) features were extracted and used to parameterize two probabilistic representations per frame, which were compared using KL-divergence against a threshold of 0.1 [254]. This produces a set of shots for each of the V videos of size C, i.e., v_c ∈ {1, . . . ...
Preprint
We built the largest database for kinship recognition. The data were labeled using a novel clustering algorithm that used label proposals as side information to guide more accurate clusters. This yielded great savings in time and human input. Statistically, FIW shows enormous gains over its predecessors. We have several benchmarks in kinship verification, family classification, tri-subject verification, and large-scale search and retrieval. We also trained CNNs on FIW and deployed the model on the renowned KinWild I and II to achieve SOTA results. Most recently, we further augmented FIW with MM. Now, video dynamics, audio, and text captions can be used in the decision making of kinship recognition systems. We expect FIW will significantly impact research and reality. Additionally, we tackled the classic problem of facial landmark localization. A majority of landmark localization networks have objectives based on L1 or L2 norms, which carry several disadvantages. The locations of landmarks are determined from generated heatmaps from which predicted landmark locations get penalized without accounting for the spread: a high scatter corresponds to low confidence and vice-versa. To address this, we introduced an objective that penalizes low confidence. Another issue is a dependency on labeled data, which is expensive to collect and susceptible to error. We addressed both issues by proposing an adversarial training framework that leverages unlabeled data to improve model performance. Our method claims SOTA on renowned benchmarks. Furthermore, our model is robust with a reduced size: 1/8 the number of channels is comparable to SOTA in real-time on a CPU. Finally, we built BFW to serve as a proxy to measure bias across ethnicity and gender subgroups, allowing us to characterize FR performance per subgroup. We show performance is suboptimal when a single threshold is used to determine whether sample pairs are genuine.
... In this situation, Resnet T improves by at least 44.69% compared to the other descriptors, reaching 76.51% in recording 2907. Furthermore, our system is compared with our previous work [51], which is, as far as we know, the only existing approach in this scenario, i.e., face-based intervener re-identification in open-world parliamentary debate sessions. Additionally, face recognition approaches focusing on the closed world are used to extend the comparison of the proposed ILRA approach. ...
Article
Transparency laws enable citizens to monitor the activities of political representatives. In this sense, automatic or manual diarization of parliamentary sessions is required, the latter being time-consuming. In the present work, this problem is addressed as a person re-identification problem. Re-identification is defined as the process of matching individuals under different camera views. This paper, in particular, deals with open world person re-identification scenarios, where the captured probe in one camera is not always present in the gallery collected in another one, i.e., determining whether the probe belongs to a novel identity or not. This procedure is mandatory before matching the identity. In most cases, novelty detection is tackled by applying a threshold based on a linear separation of the identities. We propose a threshold-less approach to solve the novelty detection problem, which is based on a one-class classifier and therefore does not need any user-defined threshold. Unlike other approaches that combine audio-visual features, an Isometric LogRatio transformation of a posteriori (ILRA) probabilities is applied to local and deep computed descriptors extracted from the face, which exhibits symmetry and can be exploited in the re-identification process unlike audio streams. These features are used to train the one-class classifier to detect the novelty of the individual. The proposal is evaluated in real parliamentary session recordings that exhibit challenging variations in terms of pose and location of the interveners. The experimental evaluation explores different configuration sets where our system achieves significant improvement on the given scenario, obtaining an average F measure of 71.29% for online analyzed videos. In addition, ILRA performs better than face descriptors used in recent face-based closed world recognition approaches, achieving an average improvement of 1.6% with respect to a deep descriptor.
Article
Given the rising psychological challenges encountered by university students, there is a pressing need to enhance their psychological capital. This study designs an innovative multimedia system that offers comprehensive psychological support and promotion mechanisms for university students. This is achieved through the integrated use of various media forms. Multimedia system group counseling was employed to assess and enhance the psychological capital of college students. This study comprises two main components: first, an analysis of the application of multimedia technology in education, and second, an empirical investigation into college students’ psychological capital through a questionnaire survey. The findings reveal that the introduction of group counseling via a multimedia system significantly enhances the psychological capital of college students. This improvement in psychological capital positively impacts the well-being and mental states of students and contributes novel ideas to mental health education for college students. The effectiveness of the group counseling intervention scheme within the multimedia system is evident, suggesting its potential for widespread adoption. The utilization of multimedia systems in educational settings emphasizes the importance of positive psychology for students and contributes to cultivating a positive and healthy psychological state. This study serves as a valuable reference for enhancing the psychological capital of college students, focusing on aspects such as independent thinking, decision-making, and execution.
Conference Paper
Automatic labelling of speakers is an essential task for speaker diarization in parliamentary debates given the huge amount of video data to annotate. In this paper, we address the speaker diarization problem as a visual speaker re-identification issue with a special emphasis on the analysis of different shot types. We propose two approaches that make use of convolutional neural networks (CNN) and biometric traits for keyframe extraction. Experimental results have been evaluated with challenging real-world datasets from the Canary Islands Parliament, and contrasted with a similar approach that does not analyze the shot type. Results show that the use of CNN for shot classification and biometric traits helps to improve the performance of the re-identification outcomes by an average of 9.8%.