July 2023
·
8 Reads
·
3 Citations
May 2023
·
35 Reads
Modern generators render talking-head videos with impressive levels of photorealism, ushering in new user experiences such as videoconferencing under constrained bandwidth budgets. Their safe adoption, however, requires a mechanism to verify whether the rendered video is trustworthy. For instance, in videoconferencing we must identify cases in which a synthetic video portrait uses the appearance of an individual without their consent. We term this task avatar fingerprinting. We propose to tackle it by leveraging facial motion signatures unique to each person. Specifically, we learn an embedding in which the motion signatures of one identity are grouped together and pushed away from those of other identities, regardless of the appearance in the synthetic video. Avatar fingerprinting algorithms will be critical as talking-head generators become more widespread, yet no large-scale datasets exist for this new task. Therefore, we contribute a large dataset of people delivering scripted and improvised short monologues, accompanied by synthetic videos in which we render videos of one person using the facial appearance of another. Project page: https://research.nvidia.com/labs/nxp/avatar-fingerprinting/.
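The embedding objective the abstract describes — grouping motion signatures of one identity while pushing away those of others, irrespective of rendered appearance — can be sketched as a triplet-style loss. This is an illustrative reconstruction, not the authors' actual training code; the margin value and embedding shapes are assumptions.

```python
import numpy as np

def triplet_motion_loss(anchor, positive, negative, margin=0.2):
    """Triplet-style embedding loss: pull the motion signature of the
    same driving identity (positive) toward the anchor, and push a
    different identity (negative) at least `margin` farther away.
    The appearance of the rendered video plays no role here -- only
    the motion embeddings do."""
    d_pos = np.linalg.norm(anchor - positive)  # same identity, any appearance
    d_neg = np.linalg.norm(anchor - negative)  # different identity
    return max(0.0, d_pos - d_neg + margin)
```

With well-separated identities the loss vanishes; when a different identity sits closer to the anchor than the matching one, the loss is positive and a trainer would minimize it.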
December 2022
·
25 Reads
Journal of Vision
November 2022
·
41 Reads
·
6 Citations
ACM Transactions on Applied Perception
Computer graphics seeks to deliver compelling images, generated within a computing budget, targeted at a specific display device, and ultimately viewed by an individual user. The foveated nature of human vision offers an opportunity to efficiently allocate computation and compression to appropriate areas of the viewer’s visual field, of particular importance with the rise of high-resolution and wide field-of-view display devices. However, while variations in acuity and contrast sensitivity across the field of view have been well-studied and modeled, a more consequential variation concerns peripheral vision’s degradation in the face of clutter, known as crowding. Understanding of peripheral crowding has greatly advanced in recent years, in terms of both phenomenology and modeling. Accurately leveraging this knowledge is critical for many applications, as peripheral vision covers a majority of pixels in the image. We advance computational models for peripheral vision aimed toward their eventual use in computer graphics. In particular, researchers have recently developed high-performing models of peripheral crowding, known as “pooling” models, which predict a wide range of phenomena, but are computationally inefficient. We reformulate the problem as a dataflow computation, which enables faster processing and operation on larger images. Further, we account for the explicit encoding of “end stopped” features in the image, which was missing from previous methods. We evaluate our model in the context of perception of textures in the periphery, including a novel texture data set and updated textural descriptors. Our improved computational framework may simplify development and testing of more sophisticated, complete models in more robust and realistic settings relevant to computer graphics.
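The eccentricity-dependent pooling that such crowding models rely on can be sketched in a few lines. This is a hypothetical 1-D illustration: the region-growth factor, minimum size, and the (mean, std) statistic pair are placeholders for the far richer texture-statistic sets the paper discusses.

```python
import numpy as np

def pooling_regions(width, fovea_x=0, scale=0.5, min_size=8):
    """Tile a 1-D signal with pooling regions whose size grows linearly
    with eccentricity (distance from `fovea_x`), Bouma-style."""
    regions, x = [], fovea_x
    while x < width:
        size = max(min_size, int(scale * (x - fovea_x)))
        regions.append((x, min(size, width - x)))
        x += size
    return regions

def pooled_statistics(signal, regions):
    """Per-region summary statistics; full pooling models compute many
    more (correlations, oriented energy, end-stopped features, ...)."""
    return [(signal[s:s + w].mean(), signal[s:s + w].std())
            for s, w in regions]
```

Because region size grows with eccentricity, far fewer statistics are stored for the periphery than for the fovea, which is the source of both the compression and the crowding-like information loss.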
July 2021
·
1,192 Reads
Computer graphics seeks to deliver compelling images, generated within a computing budget, targeted at a specific display device, and ultimately viewed by an individual user. The foveated nature of human vision offers an opportunity to efficiently allocate computation and compression to appropriate areas of the viewer's visual field, especially with the rise of high-resolution and wide field-of-view display devices. However, while the ongoing study of foveal vision is advanced, much less is known about how humans process imagery in the periphery of their vision -- which comprises, at any given moment, the vast majority of the pixels in the image. We advance computational models for peripheral vision aimed toward their eventual use in computer graphics. In particular, we present a dataflow computational model of peripheral encoding that is more efficient than prior pooling-based methods and more compact than contrast-sensitivity-based methods. Further, we account for the explicit encoding of "end stopped" features in the image, which was missing from previous methods. Finally, we evaluate our model in the context of perception of textures in the periphery. Our improved peripheral encoding may simplify development and testing of more sophisticated, complete models in more robust and realistic settings relevant to computer graphics.
November 2020
·
224 Reads
·
19 Citations
Gaze tracking is an essential component of next-generation displays for virtual reality and augmented reality applications. Traditional camera-based gaze trackers used in next-generation displays are known to be lacking in one or more of the following metrics: power consumption, cost, computational complexity, estimation accuracy, latency, and form factor. We propose the use of discrete photodiodes and light-emitting diodes (LEDs) as an alternative to traditional camera-based gaze tracking approaches while taking all of these metrics into consideration. We begin by developing a rendering-based simulation framework for understanding the relationship between light sources and a virtual model eyeball. Findings from this framework are used for the placement of LEDs and photodiodes. Our first prototype uses a neural network to obtain an average error rate of 2.67° at 400 Hz while demanding only 16 mW. By simplifying the implementation to use only LEDs, duplexed as light transceivers, and a more minimal machine learning model, namely a lightweight supervised Gaussian process regression algorithm, we show that our second prototype is capable of an average error rate of 1.57° at 250 Hz using 800 mW.
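The second prototype's regression step — mapping a handful of LED/photodiode intensity readings to a gaze angle — can be sketched with a minimal Gaussian-process regressor. The RBF kernel, hyperparameters, and scalar output below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def gp_predict(X_train, y_train, X_test, length_scale=1.0, noise=1e-3):
    """Minimal GP regression (RBF kernel, posterior mean only).
    Rows of X would be sensor intensity vectors; y the gaze angle."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-0.5 * d2 / length_scale ** 2)
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    return rbf(X_test, X_train) @ np.linalg.solve(K, y_train)
```

Trained on a few calibration samples, the posterior mean interpolates smoothly between them — the kind of lightweight supervised setting the abstract alludes to, cheap enough to run at hundreds of hertz.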
September 2020
·
307 Reads
Gaze tracking is an essential component of next-generation displays for virtual reality and augmented reality applications. Traditional camera-based gaze trackers used in next-generation displays are known to be lacking in one or more of the following metrics: power consumption, cost, computational complexity, estimation accuracy, latency, and form factor. We propose the use of discrete photodiodes and light-emitting diodes (LEDs) as an alternative to traditional camera-based gaze tracking approaches while taking all of these metrics into consideration. We begin by developing a rendering-based simulation framework for understanding the relationship between light sources and a virtual model eyeball. Findings from this framework are used for the placement of LEDs and photodiodes. Our first prototype uses a neural network to obtain an average error rate of 2.67° at 400 Hz while demanding only 16 mW. By simplifying the implementation to use only LEDs, duplexed as light transceivers, and a more minimal machine learning model, namely a lightweight supervised Gaussian process regression algorithm, we show that our second prototype is capable of an average error rate of 1.57° at 250 Hz using 800 mW.
February 2020
·
34 Reads
·
13 Citations
Foveation and (de)focus are two important visual factors in designing near-eye displays. Foveation can reduce computational load by lowering display detail toward the visual periphery, while focal cues can reduce vergence-accommodation conflict, thereby lessening visual discomfort in using near-eye displays. We performed two psychophysical experiments to investigate the relationship between foveation and focus cues. The first study measured blur discrimination sensitivity as a function of visual eccentricity, where we found discrimination thresholds significantly lower than previously reported. The second study measured depth discrimination thresholds, where we found a clear dependency on visual eccentricity. We discuss the study results and suggest further investigation.
February 2020
·
41 Reads
·
24 Citations
IEEE Transactions on Visualization and Computer Graphics
Emergent in the field of head-mounted display design is a desire to leverage the limitations of the human visual system to reduce the computation, communication, and display workload in power- and form-factor-constrained systems. Fundamental to this reduced workload is the ability to match display resolution to the acuity of the human visual system, along with a resulting need to follow the gaze of the eye as it moves, a process referred to as foveation. A display that moves its content along with the eye may be called a Foveated Display, though this term is also commonly used to describe displays with non-uniform resolution that attempt to mimic human visual acuity. We therefore recommend a definition for the term Foveated Display that accepts both of these interpretations. Furthermore, we include a simplified model for human visual Acuity Distribution Functions (ADFs) at various levels of visual acuity across wide fields of view, and propose comparing this ADF with the Resolution Distribution Function of a foveated display to evaluate its resolution at a particular gaze direction. We also provide a taxonomy to allow the field to meaningfully compare and contrast various aspects of foveated displays in a display- and optical-technology-agnostic manner.
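The kind of simplified acuity falloff model the abstract describes can be illustrated with a linear minimum-angle-of-resolution (MAR) ramp, compared pointwise against a display's resolution distribution function. The constants below are common ballpark values from the foveated-rendering literature, not the ADF parameters proposed in the paper.

```python
def acuity_cpd(eccentricity_deg, mar0_deg=1 / 48.0, slope=0.022):
    """Illustrative acuity distribution function: MAR grows linearly
    with eccentricity; the resolvable spatial frequency (cycles per
    degree) is 1 / (2 * MAR)."""
    mar = mar0_deg + slope * eccentricity_deg
    return 1.0 / (2.0 * mar)

def display_sufficient(display_rdf, eccentricities_deg):
    """Compare a display's resolution distribution function (cycles per
    degree as a function of eccentricity) against the ADF: the display
    is perceptually sufficient at a given gaze direction if it meets
    or exceeds acuity at every sampled eccentricity."""
    return all(display_rdf(e) >= acuity_cpd(e) for e in eccentricities_deg)
```

Under these placeholder constants, a uniform 10 cpd display fails near the fovea, while a display delivering 30 cpd at the gaze point passes even if its resolution relaxes toward the periphery.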
November 2019
·
128 Reads
·
27 Citations
In competitive sports, human performance makes the difference between who wins and loses. In some competitive video games (esports), response time is an essential factor of human performance. When the athlete’s equipment (computer, input and output devices) responds with lower latency, it provides a measurable advantage. In this study, we isolate latency and refresh rate by artificially increasing latency when operating at high refresh rates. Eight skilled esports athletes then perform gaming-inspired first-person targeting tasks under varying conditions of refresh rate and latency, completing the tasks as quickly as possible. We show that reduced latency has a clear benefit in task completion time, while increased refresh rate has relatively minor effects on performance once the inherent latency reduction present at high refresh rates is removed. Additionally, for certain tracking tasks, there is a small but marginally significant effect from high refresh rates alone.
... One challenge of pyramid-based models of peripheral vision is in determining which statistics are calculated in each pooling region. Although most pyramid-based texture models used to study peripheral vision have been validated through human behavioral studies, they still use statistic sets that are historically driven, vary from study to study, and are consistently insufficient to capture the wide variety of possible textures (Brown et al., 2021). ...
November 2022
ACM Transactions on Applied Perception
... Although few existing video conferencing solutions rely on it (e.g., D'Angelo & Begel, 2017), gaze tracking may play an important role in maintaining gaze awareness in the future. Fortunately, gaze tracking technology is already quite effective and quickly becoming more so: recent systems have achieved a refresh rate of 10,000 Hz using less than 12 Mbits of bandwidth (Angelopoulos et al., 2021), or even power draws as low as 16 mW that are still accurate to within 2.67° while maintaining 400-Hz refresh rates (Li et al., 2020). Power and refresh rate concerns are especially important for XR headsets, in which power and latency can hinder not only eye-tracking effectiveness, but general comfort. ...
November 2020
... However, the depth-of-field is severely limited by the gap between liquid crystal layers. Light field display technology based on micro-lens arrays [18] is considered to be one of the best solutions for commercial autostereoscopic display due to its ability to provide high angular resolution and a large depth-of-field range without vergence-accommodation conflict (VAC) [19,20]. However, the main challenges associated with this technology are narrow field-of-view and spatial resolution loss. ...
February 2020
... A relatively untapped avenue is the operability of these systems with multiple users on non-single-view displays. This future line is especially relevant as displays tend to grow in size, together with light field displays that enable watching a scenario from different perspectives (Spjut et al. 2020). Hence, narrowing down the number of perspectives to be rendered and discarding those not directed toward any viewer may help in reducing computations. ...
February 2020
IEEE Transactions on Visualization and Computer Graphics
... The interference from displays of some electrical devices (such as TV, monitor, smartphone, etc.) is the most likely to affect EM Eye since the EM emission pattern of these displays is similar to that of the camera. However, modern displays offer refresh rates of 60, 120, or even 240 fps [36], whereas embedded cameras' frame rates are often limited to 30 fps. Therefore, adversaries can distinguish camera emissions from the display's interference by setting the center frequency at those frequencies with no repetitions above 30 Hz to minimize the interference. ...
November 2019
... Accordingly, Attig et al. [2] concluded that previous guidelines recommending a maximum latency of 100 ms are outdated. To gain an advantage and reduce end-to-end latency, the industry offers lower-latency hardware peripherals and higher-refresh-rate monitors [22,24]. ...
September 2019
Journal of Vision
... However, it is still unclear how degradation of the peripheral image impacts attentional behavior and task performance in VR. Previous studies have shown that non-immersive gaze-contingent displays affect task performance (e.g., reading speed) negatively (Albert et al. 2019); therefore, further research is needed to understand the effect of GCDs on task performance in VR. Moreover, most of the eye tracking devices integrated in VR HMDs have high latency, lack precision and do not have simple calibration procedures. ...
September 2019
... The majority of hardware-based methods for prescription correction [5][6][7] could result in VR/AR headsets that are bulkier and more expensive, requiring the upgrading of components with new devices. On the other hand, algorithmic approaches to prescription correction enable tackling the prescription issue without the need for specialized components and with the benefit of software updates [8]. ...
July 2019
... Several popular VR devices, such as Meta Quest Pro, Pico 4 Pro, and Apple Vision Pro, have already incorporated eye-tracking functionality. The integration of eye-tracking in AR/VR devices helps improve graphic computation, e.g., foveated rendering [1], [2], [3] and varifocal display [4]. Eye-tracking also helps to enhance the interactive experience in AR/VR [5], [6]. ...
July 2019
ACM Transactions on Graphics
... Gaze estimation is a crucial task in computer vision that aims to accurately determine the direction of a person's gaze based on visual cues. In recent years, gaze estimation has gained significant attention due to its wide-ranging applications in fields such as human-computer interaction (Majaranta and Bulling 2014) (Rahal and Fiedler 2019), virtual reality (Patney et al. 2016) (Kim et al. 2019), and assistive technology (Jiang and Zhao 2017) (Liu, Li, and Yi 2016) (Dias et al. 2020). Benefiting from deep learning techniques and large-scale training data, appearance-based gaze estimation has made rapid progress and achieved promising results. ...
May 2019