The quantization of the convolution architecture.


Source publication
Mixed-precision quantization mostly predetermines the model's bit-width settings before actual training, because the bit-width sampling process is non-differentiable, which yields sub-optimal performance. Worse still, the conventional static, quality-consistent training setting, i.e., all data is assumed to be of the same quality across training and inference,...

Context in source publication

Context 1
... For a CNN with L convolution layers, we define Θ as the set of learnable parameters, and ω_l ∈ Θ as the vanilla full-precision weight parameters of layer l. A typical quantization-aware training CNN structure can be described as an intertwined pipeline of quantization → convolution → dequantization. As shown in Fig. 2, the left rectangle shows the quantization of the convolutional input, where ω_l and x_l are the full-precision model weights and activations of layer l, respectively, s is the minimum scale that can be represented after fixed-point quantization, and z represents the quantized fixed-point value corresponding to the ...
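As a rough illustration of the quantization → convolution → dequantization pipeline described above, the following is a minimal PyTorch-style sketch. It assumes per-tensor affine (scale s, zero-point z) fake quantization; the function names (quantize, dequantize, calc_scale_zero_point, quant_conv) and the choice of 8-bit range are illustrative assumptions, not the source paper's implementation.

```python
# Sketch of quantization -> convolution -> dequantization for one layer l.
# Assumes per-tensor affine quantization; not the paper's exact method.
import torch
import torch.nn.functional as F

def quantize(x, s, z, num_bits=8):
    """Map full-precision values to fixed-point integers using scale s and zero-point z."""
    qmin, qmax = 0, 2 ** num_bits - 1
    return torch.clamp(torch.round(x / s + z), qmin, qmax)

def dequantize(q, s, z):
    """Map fixed-point integers back to approximate full-precision values."""
    return (q - z) * s

def calc_scale_zero_point(x, num_bits=8):
    """Derive s (the smallest representable step after quantization) and z from the value range."""
    qmax = 2 ** num_bits - 1
    x_min, x_max = x.min(), x.max()
    s = (x_max - x_min).clamp(min=1e-8) / qmax
    z = torch.round(-x_min / s)
    return s, z

def quant_conv(x_l, w_l, num_bits=8):
    """Fake-quantized convolution: quantize weights/activations, convolve, dequantize."""
    s_x, z_x = calc_scale_zero_point(x_l, num_bits)
    s_w, z_w = calc_scale_zero_point(w_l, num_bits)
    x_q = dequantize(quantize(x_l, s_x, z_x, num_bits), s_x, z_x)
    w_q = dequantize(quantize(w_l, s_w, z_w, num_bits), s_w, z_w)
    return F.conv2d(x_q, w_q, padding=1)

# Usage: one 3x3 convolution with full-precision weights w_l and activations x_l.
x_l = torch.randn(1, 16, 32, 32)
w_l = torch.randn(32, 16, 3, 3)
y_l = quant_conv(x_l, w_l)
```

In this sketch the quantize/dequantize pair is applied to both weights and activations before the floating-point convolution, which is the usual way quantization-aware training simulates fixed-point arithmetic while keeping the forward pass differentiable in practice.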