Two visual examples of the base frame at two different rates (top row: 0.01 bpp, bottom row: 0.10 bpp). The original frame $x_t$ (left), base frame $\hat{x}^b_t$

Source publication
Preprint
Video coding has traditionally been developed to support services such as video streaming, videoconferencing, digital TV, and so on. The main intent was to enable human viewing of the encoded content. However, with the advances in deep neural networks (DNNs), encoded video is increasingly being used for automatic video analytics performed by machin...

Contexts in source publication

Context 1
... base$(t)$ is the rate of the motion bitstream produced by coding $v^b_t$ conditioned on $\bar{v}^b_t$, plus the rate associated with its hyperprior, and $R^{\text{signal}}_{\text{base}}(t)$ is the rate for coding $x_t$ conditioned on $\bar{x}^b_t$, plus the rate associated with its hyperprior. Two visual examples of the decoded base frame ($\hat{x}^b_t$) in the trained system are shown in Fig. 4 for two different rates, along with the original frames ($x_t$) and the residuals ($x_t - \hat{x}^b_t$). As the residual frames in this figure show, the base frame omits certain details of the original frame that are deemed unnecessary for the object detection task, in order to curtail the rate. Consequently, the base layer can ...
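To make the two rate terms concrete, here is a minimal sketch of how such rates are typically estimated in learned codecs: the code length of a latent is approximated by the sum of $-\log_2 p$ over its symbols, where $p$ comes from a (conditional) entropy model. The helper names, the assumption that per-symbol likelihoods are already available from upstream entropy models, and the assumption that the base-layer rate is the sum of the motion and signal terms are all illustrative, not taken from the source.

```python
import math

def rate_bits(likelihoods):
    """Estimated code length in bits: the sum of -log2(p) over all coded symbols."""
    return sum(-math.log2(p) for p in likelihoods)

def base_layer_rate(motion_lik, motion_hyper_lik, signal_lik, signal_hyper_lik):
    """Hypothetical helper: the two terms of the base-layer rate for frame t.

    motion_lik: per-symbol likelihoods for coding v^b_t conditioned on bar{v}^b_t.
    signal_lik: per-symbol likelihoods for coding x_t conditioned on bar{x}^b_t.
    *_hyper_lik: likelihoods of the corresponding hyperprior latents.
    All likelihoods are assumed to come from entropy models computed upstream.
    """
    r_motion = rate_bits(motion_lik) + rate_bits(motion_hyper_lik)   # R^motion_base(t)
    r_signal = rate_bits(signal_lik) + rate_bits(signal_hyper_lik)   # R^signal_base(t)
    return r_motion, r_signal

def bits_per_pixel(bits, height, width):
    """Normalize a bit count by the frame size, giving bpp."""
    return bits / (height * width)

# Toy example with made-up likelihoods on a 4x4 frame.
r_m, r_s = base_layer_rate([0.5] * 4, [0.25] * 2, [0.5] * 8, [0.125])
print(r_m, r_s, bits_per_pixel(r_m + r_s, 4, 4))  # 8.0 11.0 1.1875
```

Each symbol with likelihood 0.5 costs 1 bit, 0.25 costs 2 bits, and 0.125 costs 3 bits, so the toy motion term is 8 bits and the signal term is 11 bits.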
Context 2
... that the base layer of the proposed system generates video frames and stores them in the base frame buffer, as shown in Fig. 2. Such frames are intended for object detection rather than human viewing: they preserve enough information to detect objects at low rates but lack details deemed irrelevant for this task, as illustrated in Fig. 4. Nevertheless, one may ask how good these frames are in terms of frame reconstruction metrics. In Figs. 12-13 we refer to this approach as "Proposed Base," and in Tables II-III simply as "Base." In this case, only the base-layer bitrate is counted, since base frames are generated from the base-layer bitstream ...
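The evaluation described above can be sketched as computing one rate-distortion point per base frame, where the rate counts only base-layer bits. PSNR is used here purely as an example of a frame reconstruction metric (the passage does not name one), frames are represented as flat lists of pixel values for simplicity, and `base_rd_point` is a hypothetical helper.

```python
import math

def psnr(original, decoded, max_val=255.0):
    """PSNR in dB between two equally sized frames given as flat pixel lists."""
    if len(original) != len(decoded) or not original:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_val ** 2 / mse)

def base_rd_point(original, base_frame, base_layer_bits, height, width):
    """Hypothetical helper: one rate-distortion point for the "Base" curve.

    Only base-layer bits enter the rate, since the base frame is decoded from the
    base-layer bitstream alone; enhancement-layer bits are deliberately excluded.
    """
    bpp = base_layer_bits / (height * width)
    return bpp, psnr(original, base_frame)

# Toy 2x2 frame: every decoded pixel is off by 10, so MSE = 100.
bpp, quality = base_rd_point([100] * 4, [110] * 4, base_layer_bits=8, height=2, width=2)
print(bpp, quality)
```

Counting only base-layer bits matches the point made in the text: the base frame can be reconstructed without the enhancement bitstream, so charging those bits to it would misstate its rate.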