David Luebke's research while affiliated with NVIDIA and other places

What is this page?


This page lists the scientific contributions of an author who either does not have a ResearchGate profile or has not yet added these contributions to their profile.

ResearchGate created this page automatically to record this author's body of work. We create such pages to advance our goal of maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.


Publications (92)


AI-Mediated 3D Video Conferencing
  • Conference Paper

July 2023 · 8 Reads · 3 Citations

Koki Nagano · Chao Liu · [...] · David Luebke

Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos
  • Preprint
  • File available

May 2023 · 35 Reads

Modern generators render talking-head videos with impressive levels of photorealism, ushering in new user experiences such as videoconferencing under constrained bandwidth budgets. Their safe adoption, however, requires a mechanism to verify whether the rendered video is trustworthy. For instance, for videoconferencing we must identify cases in which a synthetic video portrait uses the appearance of an individual without their consent. We term this task avatar fingerprinting. We propose to tackle it by leveraging facial motion signatures unique to each person. Specifically, we learn an embedding in which the motion signatures of one identity are grouped together, and pushed away from those of other identities, regardless of the appearance in the synthetic video. Avatar fingerprinting algorithms will be critical as talking-head generators become more ubiquitous, and yet no large-scale datasets exist for this new task. Therefore, we contribute a large dataset of people delivering scripted and improvised short monologues, accompanied by synthetic videos in which we render videos of one person using the facial appearance of another. Project page: https://research.nvidia.com/labs/nxp/avatar-fingerprinting/.
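The grouping-and-separating objective described above is, in spirit, a contrastive embedding loss. Below is a minimal PyTorch sketch of such a triplet-style setup; the MotionEmbedder architecture, the landmark-based feature dimensions, and the synthetic tensors are illustrative assumptions, not the paper's actual model.

```python
# Hypothetical sketch of an identity-motion embedding objective: motion
# signatures of the same driving identity are pulled together, those of
# other identities pushed apart, regardless of rendered appearance.
import torch
import torch.nn as nn

class MotionEmbedder(nn.Module):
    """Maps a sequence of per-frame facial-motion features (e.g., 68
    landmark coordinates) to a fixed-size embedding. Architecture is
    an illustrative stand-in."""
    def __init__(self, feat_dim=68 * 2, hidden=256, embed_dim=128):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, embed_dim)

    def forward(self, x):           # x: (batch, frames, feat_dim)
        _, h = self.gru(x)          # final hidden state summarizes the motion
        z = self.head(h.squeeze(0))
        return nn.functional.normalize(z, dim=-1)  # unit-norm embeddings

model = MotionEmbedder()
loss_fn = nn.TripletMarginLoss(margin=0.2)

# Anchor/positive: same driving identity (possibly rendered with a
# different appearance); negative: a different driving identity.
anchor, positive, negative = (torch.randn(8, 120, 136) for _ in range(3))
loss = loss_fn(model(anchor), model(positive), model(negative))
loss.backward()
```

At verification time, an incoming synthetic video would be embedded this way and compared against enrolled motion signatures of the claimed identity.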



Figure captions (Efficient Dataflow Modeling of Peripheral Encoding in the Human Visual System):

  • Fig. 6. Visualization of a partial lattice of pooling regions overlaid on an image, with the viewing position centered on the middle penguin. Moving from the center of fixation into peripheral vision, pooling regions become progressively larger, integrating information over larger areas; at the edges of the image, multiple objects fit within a single pooling region. For clarity, the pooling regions shown are sparser than those used to create our metamers, and the central modeled foveal region is shown without pooling.
  • Fig. 7. A metamer of the image in Fig. 6, generated with our peripheral encoding model using the same fixation point. When gaze is centered at this fixation, the summary statistics under an ideal model should match the original within each pooling region, and distortions at the image edges would be undetectable in peripheral view, despite the extremely distorted appearance of those regions when viewed foveally.
  • Fig. 10. Including two additional filter orientations and end-stopping statistics noticeably improves the quality of the generated model metamers. Shown are insets from the original (left), the 4-orientation, non-end-stopped model metamer (middle), and the model metamer generated with 6 orientations and end-stopped statistics (right). These two additions greatly improve the model's ability to reproduce high-contrast, continuous, curved edges. Note that while these statistics improve the leafy section of the image, they have little effect on the yellow pepper in the bottom left. Original image from Reference [51].
  • Fig. 11. An exemplar texture for each of the 20 material categories in our dataset. Our set of 400 textures spans a range of material descriptors. Contrast enhanced for better viewing at small scale.
  • Fig. 14. (left) Selected examples of original and synthesized textures that succeeded, i.e., participants were at chance when discriminating the original from model metamers, and (right) examples that failed, with participants above chance at distinguishing synthesis from original. Some successful metamers, such as the clouds and pink granite, appear as near-metamers even foveally, while others, such as the blueberries, are metamers only in the periphery. Shown are 512 × 512 sub-images taken at half maximum eccentricity. Best viewed electronically at full resolution.

Efficient Dataflow Modeling of Peripheral Encoding in the Human Visual System

November 2022 · 41 Reads · 6 Citations

ACM Transactions on Applied Perception

Computer graphics seeks to deliver compelling images, generated within a computing budget, targeted at a specific display device, and ultimately viewed by an individual user. The foveated nature of human vision offers an opportunity to efficiently allocate computation and compression to appropriate areas of the viewer’s visual field, of particular importance with the rise of high-resolution and wide field-of-view display devices. However, while variations in acuity and contrast sensitivity across the field of view have been well studied and modeled, a more consequential variation concerns peripheral vision’s degradation in the face of clutter, known as crowding. Understanding of peripheral crowding has greatly advanced in recent years, in terms of both phenomenology and modeling. Accurately leveraging this knowledge is critical for many applications, as peripheral vision covers the majority of pixels in the image. We advance computational models for peripheral vision aimed toward their eventual use in computer graphics. In particular, researchers have recently developed high-performing models of peripheral crowding, known as “pooling” models, which predict a wide range of phenomena but are computationally inefficient. We reformulate the problem as a dataflow computation, which enables faster processing and operating on larger images. Further, we account for the explicit encoding of “end-stopped” features in the image, which was missing from previous methods. We evaluate our model in the context of perception of textures in the periphery, including a novel texture data set and updated textural descriptors. Our improved computational framework may simplify development and testing of more sophisticated, complete models in more robust and realistic settings relevant to computer graphics.
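To make the pooling idea concrete, the sketch below lays out ring eccentricities for a log-polar pooling lattice under Bouma's law (critical spacing roughly 0.4-0.5 × eccentricity); the 0.4 scaling constant and 1° foveal radius are illustrative assumptions, not the paper's parameters.

```python
# Sketch of how pooling-model region sizes scale with eccentricity,
# following Bouma's law. All constants below are illustrative.
import numpy as np

def pooling_region_diameter(ecc_deg, bouma=0.4):
    """Approximate pooling-region diameter (deg) at a given eccentricity."""
    return bouma * ecc_deg

def lattice_eccentricities(max_ecc_deg, fovea_deg=1.0, bouma=0.4):
    """Ring eccentricities for a log-polar pooling lattice: each ring's
    width equals the pooling diameter at that eccentricity, so the ring
    count grows only logarithmically with field of view."""
    rings, e = [], fovea_deg
    while e < max_ecc_deg:
        rings.append(e)
        e += pooling_region_diameter(e, bouma)  # wider rings farther out
    return np.array(rings)

# Roughly a dozen rings suffice to tile a 40-degree field of view,
# which is why pooling lattices are attractive for wide-FOV displays.
print(lattice_eccentricities(40.0).round(1))
```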


Efficient Dataflow Modeling of Peripheral Encoding in the Human Visual System

July 2021 · 1,192 Reads

Computer graphics seeks to deliver compelling images, generated within a computing budget, targeted at a specific display device, and ultimately viewed by an individual user. The foveated nature of human vision offers an opportunity to efficiently allocate computation and compression to appropriate areas of the viewer's visual field, especially with the rise of high-resolution and wide field-of-view display devices. However, while the study of foveal vision is well advanced, much less is known about how humans process imagery in the periphery of their vision, which at any given moment comprises the vast majority of the pixels in the image. We advance computational models for peripheral vision aimed toward their eventual use in computer graphics. In particular, we present a dataflow computational model of peripheral encoding that is more efficient than prior pooling-based methods and more compact than contrast-sensitivity-based methods. Further, we account for the explicit encoding of "end-stopped" features in the image, which was missing from previous methods. Finally, we evaluate our model in the context of perception of textures in the periphery. Our improved peripheral encoding may simplify development and testing of more sophisticated, complete models in more robust and realistic settings relevant to computer graphics.


Optical Gaze Tracking with Spatially-Sparse Single-Pixel Detectors

November 2020 · 224 Reads · 19 Citations

Gaze tracking is an essential component of next-generation displays for virtual reality and augmented reality applications. Traditional camera-based gaze trackers used in next-generation displays are known to be lacking in one or more of the following metrics: power consumption, cost, computational complexity, estimation accuracy, latency, and form factor. We propose the use of discrete photodiodes and light-emitting diodes (LEDs) as an alternative to traditional camera-based gaze tracking approaches while taking all of these metrics into consideration. We begin by developing a rendering-based simulation framework for understanding the relationship between light sources and a virtual model eyeball. Findings from this framework inform the placement of LEDs and photodiodes. Our first prototype uses a neural network to obtain an average error rate of 2.67° at 400 Hz while demanding only 16 mW. By simplifying the implementation to use only LEDs, duplexed as light transceivers, and a simpler machine learning model, namely a lightweight supervised Gaussian process regression algorithm, we show that our second prototype is capable of an average error rate of 1.57° at 250 Hz using 800 mW.
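As a rough illustration of the second prototype's approach, the following sketch fits a Gaussian process regressor from sparse sensor intensities to gaze angles; the sensor count, kernel choice, and synthetic calibration data are stand-in assumptions, not the actual hardware pipeline.

```python
# Hedged sketch: mapping sparse photodiode intensities to gaze angles
# with Gaussian process regression. Everything below is illustrative.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
n_sensors, n_calib = 8, 200

# Calibration: known gaze targets (azimuth, elevation in degrees) paired
# with the intensity vector recorded while fixating each target.
gaze = rng.uniform(-20, 20, size=(n_calib, 2))
mixing = rng.normal(size=(2, n_sensors))            # stand-in for eye/LED optics
readings = np.tanh(gaze @ mixing) + 0.01 * rng.normal(size=(n_calib, n_sensors))

gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(1e-4))
gp.fit(readings, gaze)                              # lightweight supervised model

# Runtime: estimate gaze from a new intensity vector.
est = gp.predict(readings[:1])
print("estimated gaze (deg):", est.round(2))
```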


Optical Gaze Tracking with Spatially-Sparse Single-Pixel Detectors

September 2020 · 307 Reads

Gaze tracking is an essential component of next-generation displays for virtual reality and augmented reality applications. Traditional camera-based gaze trackers used in next-generation displays are known to be lacking in one or more of the following metrics: power consumption, cost, computational complexity, estimation accuracy, latency, and form factor. We propose the use of discrete photodiodes and light-emitting diodes (LEDs) as an alternative to traditional camera-based gaze tracking approaches while taking all of these metrics into consideration. We begin by developing a rendering-based simulation framework for understanding the relationship between light sources and a virtual model eyeball. Findings from this framework inform the placement of LEDs and photodiodes. Our first prototype uses a neural network to obtain an average error rate of 2.67° at 400 Hz while demanding only 16 mW. By simplifying the implementation to use only LEDs, duplexed as light transceivers, and a simpler machine learning model, namely a lightweight supervised Gaussian process regression algorithm, we show that our second prototype is capable of an average error rate of 1.57° at 250 Hz using 800 mW.


Figure captions (Eccentricity effects on blur and depth perception):

  • Blur perception study setup and a sampled stimulus (see Visualization 1).
  • Blur study results. Blur detection/discrimination thresholds are plotted as a function of eccentricity and pedestal/baseline blur (−2, −1, 0, 1, 2 D) for four subjects. All thresholds are computed as differences in diopters, i.e., |D_a − D_b| for test case a and control (pedestal/baseline) case b. For example, a threshold of 0.1 D at a 2.0 D baseline means the subject can perceive the blur caused by a 2.1 D target while the eye focuses at 2.0 D. The x-axis shows retinal eccentricity in degrees; the y-axis shows measured thresholds in diopters. Each vertical bar indicates the 75% performance level centered on a 95% confidence interval. See Data File 1 for underlying values.
  • Depth perception study design. (a) Simulated retinal images via DSLR camera photography [8]; the focus depth changes from far (left) to near (right). (b) Study setup; the bottom inset is a simulated retinal image of the stimuli. The green object is the fixation target; the other two are the test targets (see Visualization 2).
  • Depth detection thresholds (y-axis) plotted against eccentricity (x-axis). See Data File 2 for underlying values.
Eccentricity effects on blur and depth perception

February 2020 · 34 Reads · 13 Citations

Optics Express

Foveation and (de)focus are two important visual factors in designing near-eye displays. Foveation can reduce computational load by lowering display detail toward the visual periphery, while focal cues can reduce vergence-accommodation conflict, thereby lessening visual discomfort when using near-eye displays. We performed two psychophysical experiments to investigate the relationship between foveation and focus cues. The first study measured blur discrimination sensitivity as a function of visual eccentricity, where we found discrimination thresholds significantly lower than previously reported. The second study measured depth discrimination thresholds, where we found a clear dependency on visual eccentricity. We discuss the study results and suggest directions for further investigation.
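For intuition about the units involved, the helper below converts a dioptric defocus difference (the unit in which the thresholds above are reported) into an approximate angular blur-circle size; the thin-lens approximation and the 4 mm pupil are assumptions, not part of the study.

```python
# Illustrative helper relating defocus in diopters to retinal blur size.
import math

def blur_circle_deg(defocus_diopters, pupil_mm=4.0):
    """Angular blur-circle diameter under a thin-lens model:
    blur (radians) ~= pupil diameter (m) * defocus (1/m)."""
    blur_rad = (pupil_mm / 1000.0) * abs(defocus_diopters)
    return math.degrees(blur_rad)

# A 0.1 D threshold with a 4 mm pupil corresponds to roughly 0.023 deg
# (~1.4 arcmin) of blur, near the foveal acuity limit.
print(f"{blur_circle_deg(0.1):.3f} deg")
```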


Toward Standardized Classification of Foveated Displays

February 2020 · 41 Reads · 24 Citations

IEEE Transactions on Visualization and Computer Graphics

Emergent in the field of head-mounted display design is a desire to leverage the limitations of the human visual system to reduce the computation, communication, and display workload in power- and form-factor-constrained systems. Fundamental to this reduced workload is the ability to match display resolution to the acuity of the human visual system, along with a resulting need to follow the gaze of the eye as it moves, a process referred to as foveation. A display that moves its content along with the eye may be called a Foveated Display, though this term is also commonly used to describe displays with non-uniform resolution that attempt to mimic human visual acuity. We therefore recommend a definition for the term Foveated Display that accepts both of these interpretations. Furthermore, we include a simplified model for human visual Acuity Distribution Functions (ADFs) at various levels of visual acuity across wide fields of view, and propose comparing this ADF with the Resolution Distribution Function of a foveated display to evaluate its resolution at a particular gaze direction. We also provide a taxonomy that allows the field to meaningfully compare and contrast various aspects of foveated displays in a display- and optical-technology-agnostic manner.
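The ADF-versus-RDF comparison the abstract proposes can be sketched numerically. In the example below, the linear-MAR acuity model and the two-zone display are illustrative assumptions, not values from the paper.

```python
# Sketch: a simplified Acuity Distribution Function (ADF) compared with
# an example display Resolution Distribution Function (RDF).
import numpy as np

def adf_cpd(ecc_deg, mar0_arcmin=1.0, e2_deg=2.3):
    """Acuity in cycles/deg: the minimum angle of resolution grows
    linearly with eccentricity, MAR(e) = MAR0 * (1 + e/e2), and acuity
    is its reciprocal (2 samples per cycle)."""
    mar_deg = (mar0_arcmin / 60.0) * (1.0 + ecc_deg / e2_deg)
    return 1.0 / (2.0 * mar_deg)

def example_rdf_cpd(ecc_deg, inner_cpd=30.0, outer_cpd=7.5, inner_deg=10.0):
    """A hypothetical two-zone foveated display: full resolution inside
    inner_deg of the gaze direction, reduced resolution outside."""
    return np.where(np.abs(ecc_deg) <= inner_deg, inner_cpd, outer_cpd)

# Mark eccentricities where the display undersamples the eye's acuity.
ecc = np.linspace(0, 40, 9)
undersampled = example_rdf_cpd(ecc) < adf_cpd(ecc)
print(dict(zip(ecc.round(), undersampled)))
```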


Latency of 30 ms Benefits First Person Targeting Tasks More Than Refresh Rate Above 60 Hz

November 2019 · 128 Reads · 27 Citations

In competitive sports, human performance makes the difference between who wins and who loses. In some competitive video games (esports), response time is an essential factor of human performance. When the athlete’s equipment (computer, input and output devices) responds with lower latency, it provides a measurable advantage. In this study, we isolate latency and refresh rate by artificially increasing latency when operating at high refresh rates. Eight skilled esports athletes then performed gaming-inspired first-person targeting tasks under varying conditions of refresh rate and latency, completing the tasks as quickly as possible. We show that reduced latency has a clear benefit in task completion time, while increased refresh rate has relatively minor effects on performance once the inherent latency reduction present at high refresh rates is removed. Additionally, for certain tracking tasks, there is a small but marginally significant effect from high refresh rates alone.
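The inherent latency reduction at high refresh rates mentioned above follows directly from frame-time arithmetic, as the sketch below shows; the assumed pipeline depth of 2.5 frames is illustrative, not a measurement from the study.

```python
# Back-of-envelope sketch of why high refresh rates inherently reduce
# latency: each stage of the display pipeline costs a fraction of a
# frame time, so shorter frames mean lower end-to-end delay.
def frame_time_ms(refresh_hz):
    return 1000.0 / refresh_hz

def inherent_latency_ms(refresh_hz, pipeline_frames=2.5):
    """Latency from frame quantization alone: input sampling, rendering,
    and scan-out are assumed to total 2.5 frames for illustration."""
    return pipeline_frames * frame_time_ms(refresh_hz)

for hz in (60, 120, 240):
    print(f"{hz:>3} Hz: ~{inherent_latency_ms(hz):.1f} ms inherent latency")
# The study isolates the two factors by adding artificial delay at high
# refresh rates, so latency can vary while refresh rate stays fixed.
```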


Citations (70)


... One challenge of pyramid-based models of peripheral vision is in determining which statistics are calculated in each pooling region. Although most pyramid-based texture models used to study peripheral vision have been validated through human behavioral studies, they still utilize statistic sets that are historically driven, vary from study to study across the previous literature, and are consistently insufficient to capture the wide variety of possible textures (Brown et al., 2021). ...

Reference:

Evaluating Pyramid-Based Image Statistics Using Contrastive Learning
Efficient Dataflow Modeling of Peripheral Encoding in the Human Visual System

ACM Transactions on Applied Perception

... Although few existing video conferencing solutions rely on it (e.g., D'Angelo & Begel, 2017), gaze tracking may play an important role in maintaining gaze awareness in the future. Fortunately, gaze tracking technology is already quite effective and quickly becoming more so: recent systems have achieved a refresh rate of 10,000 Hz using less than 12 Mbits of bandwidth (Angelopoulos et al., 2021), or even power draws as low as 16 mW that are still accurate to within 2.67° while maintaining 400-Hz refresh rates (Li et al., 2020). Power and refresh rate concerns are especially important for XR headsets, in which power and latency can hinder not only eye-tracking effectiveness, but general comfort. ...

Optical Gaze Tracking with Spatially-Sparse Single-Pixel Detectors

... However, the depth-of-field is severely limited by the gap between liquid crystal layers. Light field display technology based on micro-lens arrays [18] is considered one of the best solutions for commercial autostereoscopic display due to its ability to provide high angular resolution and a large depth-of-field range without vergence-accommodation conflict (VAC) [19,20]. However, the main challenges associated with this technology are a narrow field-of-view and spatial resolution loss. ...

Eccentricity effects on blur and depth perception
Optics Express

... A relatively untapped avenue is the operability of these systems with multiple users on non-single-view displays. This future line is especially relevant as displays tend to grow in size, together with light field displays that enable watching a scenario from different perspectives (Spjut et al. 2020). Hence, narrowing down the number of perspectives to be rendered and discarding those not directed toward any viewer may help in reducing computations. ...

Toward Standardized Classification of Foveated Displays
  • Citing Article
  • February 2020

IEEE Transactions on Visualization and Computer Graphics

... The interference from displays of some electrical devices (such as TV, monitor, smartphone, etc.) is the most likely to affect EM Eye since the EM emission pattern of these displays is similar to that of the camera. However, modern displays offer refresh rates of 60, 120, or even 240 fps [36], whereas embedded cameras' frame rates are often limited to 30 fps. Therefore, adversaries can distinguish camera emissions from the display's interference by setting the center frequency at those frequencies with no repetitions above 30 Hz to minimize the interference. ...

Latency of 30 ms Benefits First Person Targeting Tasks More Than Refresh Rate Above 60 Hz
  • Citing Conference Paper
  • November 2019

... Accordingly, Attig et al. [2] concluded that previous guidelines recommending a maximum latency of 100 ms are outdated. To gain an advantage and reduce end-to-end latency, the industry offers hardware peripherals with lower latency or higher refresh rate on monitors [22,24]. ...

Esports Arms Race: Latency and Refresh Rate for Competitive Gaming Tasks
  • Citing Article
  • September 2019

Journal of Vision

... However, it is still unclear how degradation of the peripheral image impacts attentional behavior and task performance in VR. Previous studies have shown that non-immersive gaze-contingent displays affect task performance (e.g., reading speed) negatively (Albert et al. 2019); therefore, further research is needed to understand the effect of GCDs on task performance in VR. Moreover, most of the eye tracking devices integrated in VR HMDs have high latency, lack precision and do not have simple calibration procedures. ...

Reading Speed Decreases for Fast Readers Under Gaze-Contingent Rendering
  • Citing Conference Paper
  • September 2019

... The majority of hardware-based methods for prescription correction [5][6][7] could result in VR/AR headsets that are bulkier and more expensive, requiring the upgrading of components with new devices. On the other hand, algorithmic approaches to prescription correction enable tackling the prescription issue without the need for specialized components and with the benefit of software updates [8]. ...

Matching prescription & visual acuity: towards AR for humans
  • Citing Conference Paper
  • July 2019

... Several popular VR devices, such as Meta Quest Pro, Pico 4 Pro, and Apple Vision Pro, have already incorporated eye-tracking functionality. The integration of eye-tracking in AR/VR devices helps improve graphic computation, e.g., foveated rendering [1], [2], [3] and varifocal display [4]. Eye-tracking also helps to enhance the interactive experience in AR/VR [5], [6]. ...

Foveated AR: dynamically-foveated augmented reality display

ACM Transactions on Graphics

... Gaze estimation is a crucial task in computer vision that aims to accurately determine the direction of a person's gaze based on visual cues. In recent years, gaze estimation has gained significant attention due to its wide-ranging applications in fields such as human-computer interaction (Majaranta and Bulling 2014) (Rahal and Fiedler 2019), virtual reality (Patney et al. 2016) (Kim et al. 2019), and assistive technology (Jiang and Zhao 2017) (Liu, Li, and Yi 2016) (Dias et al. 2020). Benefiting from deep learning techniques and large-scale training data, appearance-based gaze estimation has made rapid progress and achieved promising results. ...

NVGaze: An Anatomically-Informed Dataset for Low-Latency, Near-Eye Gaze Estimation
  • Citing Conference Paper
  • May 2019