David Luebke's research while affiliated with NVIDIA and other places

What is this page?


This page lists the scientific contributions of an author who either does not have a ResearchGate profile or has not yet added these contributions to their profile.

ResearchGate created this page automatically to record this author's body of work. We create such pages to advance our goal of maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.


Publications (92)


AI-Mediated 3D Video Conferencing
  • Conference Paper

July 2023 · 8 Reads · 3 Citations

Koki Nagano · Chao Liu · [...] · David Luebke

Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos
  • Preprint
  • File available

May 2023 · 35 Reads

Modern generators render talking-head videos with impressive levels of photorealism, ushering in new user experiences such as videoconferencing under constrained bandwidth budgets. Their safe adoption, however, requires a mechanism to verify whether the rendered video is trustworthy. For instance, for videoconferencing we must identify cases in which a synthetic video portrait uses the appearance of an individual without their consent. We term this task avatar fingerprinting. We propose to tackle it by leveraging facial motion signatures unique to each person. Specifically, we learn an embedding in which the motion signatures of one identity are grouped together, and pushed away from those of other identities, regardless of the appearance in the synthetic video. Avatar fingerprinting algorithms will be critical as talking-head generators become more ubiquitous, and yet no large-scale datasets exist for this new task. Therefore, we contribute a large dataset of people delivering scripted and improvised short monologues, accompanied by synthetic videos in which we render videos of one person using the facial appearance of another. Project page: https://research.nvidia.com/labs/nxp/avatar-fingerprinting/.
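The grouping-and-separating objective described above is, in spirit, a contrastive embedding loss. Below is a minimal PyTorch sketch of such a triplet-style setup; the MotionEmbedder architecture, the landmark-based feature dimensions, and the synthetic tensors are illustrative assumptions, not the paper's actual model.

```python
# Hypothetical sketch of an identity-motion embedding objective: motion
# signatures of the same driving identity are pulled together, those of
# other identities pushed apart, regardless of rendered appearance.
import torch
import torch.nn as nn

class MotionEmbedder(nn.Module):
    """Maps a sequence of per-frame facial-motion features (e.g., 68
    landmark coordinates) to a fixed-size embedding. Architecture is
    an illustrative stand-in."""
    def __init__(self, feat_dim=68 * 2, hidden=256, embed_dim=128):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, embed_dim)

    def forward(self, x):           # x: (batch, frames, feat_dim)
        _, h = self.gru(x)          # final hidden state summarizes the motion
        z = self.head(h.squeeze(0))
        return nn.functional.normalize(z, dim=-1)  # unit-norm embeddings

model = MotionEmbedder()
loss_fn = nn.TripletMarginLoss(margin=0.2)

# Anchor/positive: same driving identity (possibly rendered with a
# different appearance); negative: a different driving identity.
anchor, positive, negative = (torch.randn(8, 120, 136) for _ in range(3))
loss = loss_fn(model(anchor), model(positive), model(negative))
loss.backward()
```

At verification time, an incoming synthetic video would be embedded this way and compared against enrolled motion signatures of the claimed identity.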



Figure captions (Efficient Dataflow Modeling of Peripheral Encoding in the Human Visual System):

  • Fig. 6. Visualization of a partial lattice of pooling regions overlaid on an image, with the viewing position centered on the middle penguin. Moving from the center of fixation into peripheral vision, pooling regions become progressively larger, integrating information over larger areas; at the edges of the image, multiple objects fit within a single pooling region. For clarity, the pooling regions shown are sparser than those used to create our metamers, and the central modeled foveal region is shown without pooling.
  • Fig. 7. A metamer of the image in Fig. 6, generated with our peripheral encoding model using the same fixation point. When gaze is centered at this fixation, the summary statistics under an ideal model should match the original within each pooling region, and distortions at the image edges would be undetectable in peripheral view, despite the extremely distorted appearance of those regions when viewed foveally.
  • Fig. 10. Including two additional filter orientations and end-stopping statistics noticeably improves the quality of the generated model metamers. Shown are insets from the original (left), the 4-orientation, non-end-stopped model metamer (middle), and the model metamer generated with 6 orientations and end-stopped statistics (right). These two additions greatly improve the model's ability to reproduce high-contrast, continuous, curved edges. Note that while these statistics improve the leafy section of the image, they have little effect on the yellow pepper in the bottom left. Original image from Reference [51].
  • Fig. 11. An exemplar texture for each of the 20 material categories in our dataset. Our set of 400 textures spans a range of material descriptors. Contrast enhanced for better viewing at small scale.
  • Fig. 14. (left) Selected examples of original and synthesized textures that succeeded, i.e., participants were at chance when discriminating the original from model metamers, and (right) examples that failed, with participants above chance at distinguishing synthesis from original. Some successful metamers, such as the clouds and pink granite, appear as near-metamers even foveally, while others, such as the blueberries, are metamers only in the periphery. Shown are 512 × 512 sub-images taken at half maximum eccentricity. Best viewed electronically at full resolution.

Efficient Dataflow Modeling of Peripheral Encoding in the Human Visual System

November 2022 · 41 Reads · 6 Citations

ACM Transactions on Applied Perception

Computer graphics seeks to deliver compelling images, generated within a computing budget, targeted at a specific display device, and ultimately viewed by an individual user. The foveated nature of human vision offers an opportunity to efficiently allocate computation and compression to appropriate areas of the viewer’s visual field, of particular importance with the rise of high-resolution and wide field-of-view display devices. However, while variations in acuity and contrast sensitivity across the field of view have been well studied and modeled, a more consequential variation concerns peripheral vision’s degradation in the face of clutter, known as crowding. Understanding of peripheral crowding has greatly advanced in recent years, in terms of both phenomenology and modeling. Accurately leveraging this knowledge is critical for many applications, as peripheral vision covers the majority of pixels in the image. We advance computational models for peripheral vision aimed toward their eventual use in computer graphics. In particular, researchers have recently developed high-performing models of peripheral crowding, known as “pooling” models, which predict a wide range of phenomena but are computationally inefficient. We reformulate the problem as a dataflow computation, which enables faster processing and operating on larger images. Further, we account for the explicit encoding of “end-stopped” features in the image, which was missing from previous methods. We evaluate our model in the context of perception of textures in the periphery, including a novel texture data set and updated textural descriptors. Our improved computational framework may simplify development and testing of more sophisticated, complete models in more robust and realistic settings relevant to computer graphics.
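To make the pooling idea concrete, the sketch below lays out ring eccentricities for a log-polar pooling lattice under Bouma's law (critical spacing roughly 0.4-0.5 × eccentricity); the 0.4 scaling constant and 1° foveal radius are illustrative assumptions, not the paper's parameters.

```python
# Sketch of how pooling-model region sizes scale with eccentricity,
# following Bouma's law. All constants below are illustrative.
import numpy as np

def pooling_region_diameter(ecc_deg, bouma=0.4):
    """Approximate pooling-region diameter (deg) at a given eccentricity."""
    return bouma * ecc_deg

def lattice_eccentricities(max_ecc_deg, fovea_deg=1.0, bouma=0.4):
    """Ring eccentricities for a log-polar pooling lattice: each ring's
    width equals the pooling diameter at that eccentricity, so the ring
    count grows only logarithmically with field of view."""
    rings, e = [], fovea_deg
    while e < max_ecc_deg:
        rings.append(e)
        e += pooling_region_diameter(e, bouma)  # wider rings farther out
    return np.array(rings)

# Roughly a dozen rings suffice to tile a 40-degree field of view,
# which is why pooling lattices are attractive for wide-FOV displays.
print(lattice_eccentricities(40.0).round(1))
```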


Efficient Dataflow Modeling of Peripheral Encoding in the Human Visual System

July 2021 · 1,192 Reads

Computer graphics seeks to deliver compelling images, generated within a computing budget, targeted at a specific display device, and ultimately viewed by an individual user. The foveated nature of human vision offers an opportunity to efficiently allocate computation and compression to appropriate areas of the viewer's visual field, especially with the rise of high-resolution and wide field-of-view display devices. However, while the study of foveal vision is well advanced, much less is known about how humans process imagery in the periphery of their vision, which at any given moment comprises the vast majority of the pixels in the image. We advance computational models for peripheral vision aimed toward their eventual use in computer graphics. In particular, we present a dataflow computational model of peripheral encoding that is more efficient than prior pooling-based methods and more compact than contrast-sensitivity-based methods. Further, we account for the explicit encoding of "end-stopped" features in the image, which was missing from previous methods. Finally, we evaluate our model in the context of perception of textures in the periphery. Our improved peripheral encoding may simplify development and testing of more sophisticated, complete models in more robust and realistic settings relevant to computer graphics.


Optical Gaze Tracking with Spatially-Sparse Single-Pixel Detectors

November 2020 · 224 Reads · 19 Citations

Gaze tracking is an essential component of next-generation displays for virtual reality and augmented reality applications. Traditional camera-based gaze trackers used in next-generation displays are known to be lacking in one or more of the following metrics: power consumption, cost, computational complexity, estimation accuracy, latency, and form factor. We propose the use of discrete photodiodes and light-emitting diodes (LEDs) as an alternative to traditional camera-based gaze tracking approaches while taking all of these metrics into consideration. We begin by developing a rendering-based simulation framework for understanding the relationship between light sources and a virtual model eyeball. Findings from this framework inform the placement of LEDs and photodiodes. Our first prototype uses a neural network to obtain an average error rate of 2.67° at 400 Hz while demanding only 16 mW. By simplifying the implementation to use only LEDs, duplexed as light transceivers, and a simpler machine learning model, namely a lightweight supervised Gaussian process regression algorithm, we show that our second prototype is capable of an average error rate of 1.57° at 250 Hz using 800 mW.
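As a rough illustration of the second prototype's approach, the following sketch fits a Gaussian process regressor from sparse sensor intensities to gaze angles; the sensor count, kernel choice, and synthetic calibration data are stand-in assumptions, not the actual hardware pipeline.

```python
# Hedged sketch: mapping sparse photodiode intensities to gaze angles
# with Gaussian process regression. Everything below is illustrative.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
n_sensors, n_calib = 8, 200

# Calibration: known gaze targets (azimuth, elevation in degrees) paired
# with the intensity vector recorded while fixating each target.
gaze = rng.uniform(-20, 20, size=(n_calib, 2))
mixing = rng.normal(size=(2, n_sensors))            # stand-in for eye/LED optics
readings = np.tanh(gaze @ mixing) + 0.01 * rng.normal(size=(n_calib, n_sensors))

gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(1e-4))
gp.fit(readings, gaze)                              # lightweight supervised model

# Runtime: estimate gaze from a new intensity vector.
est = gp.predict(readings[:1])
print("estimated gaze (deg):", est.round(2))
```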


Optical Gaze Tracking with Spatially-Sparse Single-Pixel Detectors

September 2020 · 307 Reads

Gaze tracking is an essential component of next-generation displays for virtual reality and augmented reality applications. Traditional camera-based gaze trackers used in next-generation displays are known to be lacking in one or more of the following metrics: power consumption, cost, computational complexity, estimation accuracy, latency, and form factor. We propose the use of discrete photodiodes and light-emitting diodes (LEDs) as an alternative to traditional camera-based gaze tracking approaches while taking all of these metrics into consideration. We begin by developing a rendering-based simulation framework for understanding the relationship between light sources and a virtual model eyeball. Findings from this framework inform the placement of LEDs and photodiodes. Our first prototype uses a neural network to obtain an average error rate of 2.67° at 400 Hz while demanding only 16 mW. By simplifying the implementation to use only LEDs, duplexed as light transceivers, and a simpler machine learning model, namely a lightweight supervised Gaussian process regression algorithm, we show that our second prototype is capable of an average error rate of 1.57° at 250 Hz using 800 mW.


Figure captions (Eccentricity effects on blur and depth perception):

  • Blur perception study setup and a sampled stimulus (see Visualization 1).
  • Blur study results. Blur detection/discrimination thresholds are plotted as a function of eccentricity and pedestal/baseline blur (−2, −1, 0, 1, 2 D) for four subjects. All thresholds are computed as differences in diopters, i.e., |D_a − D_b| for test case a and control (pedestal/baseline) case b. For example, a threshold of 0.1 D at a 2.0 D baseline means the subject can perceive the blur caused by a 2.1 D target while the eye focuses at 2.0 D. The x-axis shows retinal eccentricity in degrees; the y-axis shows measured thresholds in diopters. Each vertical bar indicates the 75% performance level centered on a 95% confidence interval. See Data File 1 for underlying values.
  • Depth perception study design. (a) Simulated retinal images via DSLR camera photography [8]; the focus depth changes from far (left) to near (right). (b) Study setup; the bottom inset is a simulated retinal image of the stimuli. The green object is the fixation target; the other two are the test targets (see Visualization 2).
  • Depth detection thresholds (y-axis) plotted against eccentricity (x-axis). See Data File 2 for underlying values.
Eccentricity effects on blur and depth perception

February 2020 · 34 Reads · 13 Citations

Optics Express

Foveation and (de)focus are two important visual factors in designing near-eye displays. Foveation can reduce computational load by lowering display detail toward the visual periphery, while focal cues can reduce vergence-accommodation conflict, thereby lessening visual discomfort when using near-eye displays. We performed two psychophysical experiments to investigate the relationship between foveation and focus cues. The first study measured blur discrimination sensitivity as a function of visual eccentricity, where we found discrimination thresholds significantly lower than previously reported. The second study measured depth discrimination thresholds, where we found a clear dependency on visual eccentricity. We discuss the study results and suggest directions for further investigation.
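For intuition about the units involved, the helper below converts a dioptric defocus difference (the unit in which the thresholds above are reported) into an approximate angular blur-circle size; the thin-lens approximation and the 4 mm pupil are assumptions, not part of the study.

```python
# Illustrative helper relating defocus in diopters to retinal blur size.
import math

def blur_circle_deg(defocus_diopters, pupil_mm=4.0):
    """Angular blur-circle diameter under a thin-lens model:
    blur (radians) ~= pupil diameter (m) * defocus (1/m)."""
    blur_rad = (pupil_mm / 1000.0) * abs(defocus_diopters)
    return math.degrees(blur_rad)

# A 0.1 D threshold with a 4 mm pupil corresponds to roughly 0.023 deg
# (~1.4 arcmin) of blur, near the foveal acuity limit.
print(f"{blur_circle_deg(0.1):.3f} deg")
```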


Toward Standardized Classification of Foveated Displays

February 2020 · 41 Reads · 24 Citations

IEEE Transactions on Visualization and Computer Graphics

Emergent in the field of head-mounted display design is a desire to leverage the limitations of the human visual system to reduce the computation, communication, and display workload in power- and form-factor-constrained systems. Fundamental to this reduced workload is the ability to match display resolution to the acuity of the human visual system, along with a resulting need to follow the gaze of the eye as it moves, a process referred to as foveation. A display that moves its content along with the eye may be called a Foveated Display, though this term is also commonly used to describe displays with non-uniform resolution that attempt to mimic human visual acuity. We therefore recommend a definition for the term Foveated Display that accepts both of these interpretations. Furthermore, we include a simplified model for human visual Acuity Distribution Functions (ADFs) at various levels of visual acuity across wide fields of view, and propose comparing this ADF with the Resolution Distribution Function of a foveated display to evaluate its resolution at a particular gaze direction. We also provide a taxonomy that allows the field to meaningfully compare and contrast various aspects of foveated displays in a display- and optical-technology-agnostic manner.
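The ADF-versus-RDF comparison the abstract proposes can be sketched numerically. In the example below, the linear-MAR acuity model and the two-zone display are illustrative assumptions, not values from the paper.

```python
# Sketch: a simplified Acuity Distribution Function (ADF) compared with
# an example display Resolution Distribution Function (RDF).
import numpy as np

def adf_cpd(ecc_deg, mar0_arcmin=1.0, e2_deg=2.3):
    """Acuity in cycles/deg: the minimum angle of resolution grows
    linearly with eccentricity, MAR(e) = MAR0 * (1 + e/e2), and acuity
    is its reciprocal (2 samples per cycle)."""
    mar_deg = (mar0_arcmin / 60.0) * (1.0 + ecc_deg / e2_deg)
    return 1.0 / (2.0 * mar_deg)

def example_rdf_cpd(ecc_deg, inner_cpd=30.0, outer_cpd=7.5, inner_deg=10.0):
    """A hypothetical two-zone foveated display: full resolution inside
    inner_deg of the gaze direction, reduced resolution outside."""
    return np.where(np.abs(ecc_deg) <= inner_deg, inner_cpd, outer_cpd)

# Mark eccentricities where the display undersamples the eye's acuity.
ecc = np.linspace(0, 40, 9)
undersampled = example_rdf_cpd(ecc) < adf_cpd(ecc)
print(dict(zip(ecc.round(), undersampled)))
```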


Latency of 30 ms Benefits First Person Targeting Tasks More Than Refresh Rate Above 60 Hz

November 2019 · 128 Reads · 27 Citations

In competitive sports, human performance makes the difference between who wins and who loses. In some competitive video games (esports), response time is an essential factor of human performance. When the athlete’s equipment (computer, input and output devices) responds with lower latency, it provides a measurable advantage. In this study, we isolate latency and refresh rate by artificially increasing latency when operating at high refresh rates. Eight skilled esports athletes then performed gaming-inspired first-person targeting tasks under varying conditions of refresh rate and latency, completing the tasks as quickly as possible. We show that reduced latency has a clear benefit in task completion time, while increased refresh rate has relatively minor effects on performance once the inherent latency reduction present at high refresh rates is removed. Additionally, for certain tracking tasks, there is a small but marginally significant effect from high refresh rates alone.
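The inherent latency reduction at high refresh rates mentioned above follows directly from frame-time arithmetic, as the sketch below shows; the assumed pipeline depth of 2.5 frames is illustrative, not a measurement from the study.

```python
# Back-of-envelope sketch of why high refresh rates inherently reduce
# latency: each stage of the display pipeline costs a fraction of a
# frame time, so shorter frames mean lower end-to-end delay.
def frame_time_ms(refresh_hz):
    return 1000.0 / refresh_hz

def inherent_latency_ms(refresh_hz, pipeline_frames=2.5):
    """Latency from frame quantization alone: input sampling, rendering,
    and scan-out are assumed to total 2.5 frames for illustration."""
    return pipeline_frames * frame_time_ms(refresh_hz)

for hz in (60, 120, 240):
    print(f"{hz:>3} Hz: ~{inherent_latency_ms(hz):.1f} ms inherent latency")
# The study isolates the two factors by adding artificial delay at high
# refresh rates, so latency can vary while refresh rate stays fixed.
```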


Citations (70)


... One challenge of pyramid-based models of peripheral vision is in determining which statistics are calculated in each pooling region. Although most pyramid-based texture models used to study peripheral vision have been validated through human behavioral studies, they still utilize statistic sets that are historically driven, vary from study to study across the previous literature, and are consistently insufficient to capture the wide variety of possible textures (Brown et al., 2021). ...

Reference:

Evaluating Pyramid-Based Image Statistics Using Contrastive Learning
Efficient Dataflow Modeling of Peripheral Encoding in the Human Visual System

ACM Transactions on Applied Perception

... Although few existing video conferencing solutions rely on it (e.g., D'Angelo & Begel, 2017), gaze tracking may play an important role in maintaining gaze awareness in the future. Fortunately, gaze tracking technology is already quite effective and quickly becoming more so: recent systems have achieved a refresh rate of 10,000 Hz using less than 12 Mbits of bandwidth (Angelopoulos et al., 2021), or even power draws as low as 16 mW that are still accurate to within 2.67° while maintaining 400-Hz refresh rates (Li et al., 2020). Power and refresh rate concerns are especially important for XR headsets, in which power and latency can hinder not only eye-tracking effectiveness, but general comfort. ...

Optical Gaze Tracking with Spatially-Sparse Single-Pixel Detectors

... However, the depth-of-field is severely limited by the gap between liquid crystal layers. Light field display technology based on micro-lens arrays [18] is considered one of the best solutions for commercial autostereoscopic display due to its ability to provide high angular resolution and a large depth-of-field range without vergence-accommodation conflict (VAC) [19,20]. However, the main challenges associated with this technology are a narrow field-of-view and spatial resolution loss. ...

Eccentricity effects on blur and depth perception
Optics Express

... A relatively untapped avenue is the operability of these systems with multiple users on non-single-view displays. This future line is especially relevant as displays tend to grow in size, together with light field displays that enable watching a scenario from different perspectives (Spjut et al. 2020). Hence, narrowing down the number of perspectives to be rendered and discarding those not directed toward any viewer may help in reducing computations. ...

Toward Standardized Classification of Foveated Displays
  • Citing Article
  • February 2020

IEEE Transactions on Visualization and Computer Graphics

... The interference from displays of some electrical devices (such as TV, monitor, smartphone, etc.) is the most likely to affect EM Eye since the EM emission pattern of these displays is similar to that of the camera. However, modern displays offer refresh rates of 60, 120, or even 240 fps [36], whereas embedded cameras' frame rates are often limited to 30 fps. Therefore, adversaries can distinguish camera emissions from the display's interference by setting the center frequency at those frequencies with no repetitions above 30 Hz to minimize the interference. ...

Latency of 30 ms Benefits First Person Targeting Tasks More Than Refresh Rate Above 60 Hz
  • Citing Conference Paper
  • November 2019

... Accordingly, Attig et al. [2] concluded that previous guidelines recommending a maximum latency of 100 ms are outdated. To gain an advantage and reduce end-to-end latency, the industry offers hardware peripherals with lower latency or higher refresh rate on monitors [22,24]. ...

Esports Arms Race: Latency and Refresh Rate for Competitive Gaming Tasks
  • Citing Article
  • September 2019

Journal of Vision

... However, it is still unclear how degradation of the peripheral image impacts attentional behavior and task performance in VR. Previous studies have shown that non-immersive gaze-contingent displays affect task performance (e.g., reading speed) negatively (Albert et al. 2019); therefore, further research is needed to understand the effect of GCDs on task performance in VR. Moreover, most of the eye tracking devices integrated in VR HMDs have high latency, lack precision and do not have simple calibration procedures. ...

Reading Speed Decreases for Fast Readers Under Gaze-Contingent Rendering
  • Citing Conference Paper
  • September 2019

... The majority of hardware-based methods for prescription correction [5][6][7] could result in VR/AR headsets that are bulkier and more expensive, requiring the upgrading of components with new devices. On the other hand, algorithmic approaches to prescription correction enable tackling the prescription issue without the need for specialized components and with the benefit of software updates [8]. ...

Matching prescription & visual acuity: towards AR for humans
  • Citing Conference Paper
  • July 2019

... Several popular VR devices, such as Meta Quest Pro, Pico 4 Pro, and Apple Vision Pro, have already incorporated eye-tracking functionality. The integration of eye-tracking in AR/VR devices helps improve graphic computation, e.g., foveated rendering [1], [2], [3] and varifocal display [4]. Eye-tracking also helps to enhance the interactive experience in AR/VR [5], [6]. ...

Foveated AR: dynamically-foveated augmented reality display

ACM Transactions on Graphics

... Gaze estimation is a crucial task in computer vision that aims to accurately determine the direction of a person's gaze based on visual cues. In recent years, gaze estimation has gained significant attention due to its wide-ranging applications in fields such as human-computer interaction (Majaranta and Bulling 2014) (Rahal and Fiedler 2019), virtual reality (Patney et al. 2016) (Kim et al. 2019), and assistive technology (Jiang and Zhao 2017) (Liu, Li, and Yi 2016) (Dias et al. 2020). Benefiting from deep learning techniques and large-scale training data, appearance-based gaze estimation has made rapid progress and achieved promising results. ...

NVGaze: An Anatomically-Informed Dataset for Low-Latency, Near-Eye Gaze Estimation
  • Citing Conference Paper
  • May 2019