Example Parse Tree with Inferred Locations of Parts: Hierarchy of (a) sampled parts and (b) sampled locations, inducing a hierarchy of reference frames.

Source publication
Preprint
Full-text available
We introduce Active Predictive Coding Networks (APCNs), a new class of neural networks that solve a major problem posed by Hinton and others in the fields of artificial intelligence and brain modeling: how can neural networks learn intrinsic reference frames for objects and parse visual scenes into part-whole hierarchies by dynamically allocating n...
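The abstract alludes to a hierarchy of state and action networks that decide where, within a learned reference frame, to sample the next part. A minimal sketch of one such level is given below; the module names, sizes, and wiring are our own assumptions for illustration, not the APCN implementation.

```python
import torch
import torch.nn as nn

class StateActionLevel(nn.Module):
    """One level of a state/action hierarchy (illustrative only).
    The state network integrates glimpse evidence; the action network
    proposes where, within the current reference frame, to sample next."""
    def __init__(self, glimpse_dim=64, state_dim=128):
        super().__init__()
        self.state_rnn = nn.GRUCell(glimpse_dim + 2, state_dim)   # "state" network
        self.action_net = nn.Sequential(                          # "action" network
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 2), nn.Tanh()                           # (x, y) in [-1, 1]^2
        )

    def step(self, state, glimpse, loc):
        # Fold the current glimpse and its location into the state,
        # then emit the location of the next part to attend to.
        state = self.state_rnn(torch.cat([glimpse, loc], dim=-1), state)
        next_loc = self.action_net(state)
        return state, next_loc
```

In a two-level arrangement, a higher-level copy of such a module would pick locations in the object's frame and hand each selected sub-region to a lower-level copy operating in the part's frame.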

Context in source publication

Context 1
... learned part-whole hierarchy for an MNIST input, in the form of a parse tree of parts and sub-parts with locations, is shown in Figure 6. The learned strategies sample a wide variety of parts and sub-parts of the object (strokes and mini-strokes). ...
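The parse tree in Figure 6 pairs each sampled part with a location expressed in its parent's reference frame. A minimal sketch of such a structure follows; the field names and the additive composition of locations are simplifying assumptions, not the paper's data format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PartNode:
    """One node of an inferred parse tree: a part code (e.g., a stroke)
    plus its sampled location in the parent's reference frame."""
    part_code: List[float]                       # latent embedding of the part
    location: Tuple[float, float]                # (x, y) relative to the parent frame
    children: List["PartNode"] = field(default_factory=list)  # sub-parts (mini-strokes)

def absolute_locations(node: PartNode, origin=(0.0, 0.0)):
    """Walk the tree, composing each relative location with its parent's,
    to recover where every part and sub-part sits in the image frame."""
    x, y = origin[0] + node.location[0], origin[1] + node.location[1]
    yield node, (x, y)
    for child in node.children:
        yield from absolute_locations(child, (x, y))
```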

Citations

... We introduce recursive neural programs (RNPs), which address this problem by creating a fully differentiable recursive tree representation of sensory-motor programs. Our model builds on past work on Active Predictive Coding Networks [3] in using state and action networks, but is fully generative, recursive, and probabilistic, allowing a structured variational approach to inference and sampling of neural programs. The key differences between our approach and existing approaches are: 1) our approach can be extended to arbitrary tree depth, creating a "grammar" for images that can be recursively applied; 2) our approach provides a sensible way to perform gradient descent in hierarchical "program space"; and 3) our model can be made adaptive by letting information flow from children to parents in the tree, e.g., via prediction errors [11,3]. ...
... The above model can be made recursive, with generation performed in a depth-first manner: each $z^k_t$ generates the program for a sequence $\{z^{k-1}_1, \ldots, z^{k-1}_{\tau_{k-1}}\}$, and $z^k_{t+1}$ begins after $z^k_t$ terminates. Here we use the decoded patches $\{x^{k-1}_1, \ldots, x^{k-1}_{\tau_{k-1}}\}$ as accumulated evidence to update $z^k_t$ (similar to other predictive coding models [11,3]). ...
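Read literally, the excerpt describes depth-first generation: each higher-level latent fully unrolls its lower-level sequence before its next sibling starts. A schematic sketch, with `child_seq` and `decode_patch` as assumed stand-ins for the paper's networks:

```python
def generate(z, level, child_seq, decode_patch):
    """Depth-first generation sketch (interfaces are assumptions, not the
    paper's code). A latent z^k_t at `level` expands into the sequence
    {z^{k-1}_1, ..., z^{k-1}_tau}, each child expanding fully before the
    next begins; level-0 latents decode directly to image patches."""
    if level == 0:
        return [decode_patch(z)]
    patches = []
    for z_child in child_seq(z, level - 1):   # children generated in order
        patches += generate(z_child, level - 1, child_seq, decode_patch)
    return patches
```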
Preprint
Full-text available
Human vision involves parsing and representing objects and scenes using structured representations based on part-whole hierarchies. Computer vision and machine learning researchers have recently sought to emulate this capability using capsule networks, reference frames and active predictive coding, but a generative model formulation has been lacking. We introduce Recursive Neural Programs (RNPs), which, to our knowledge, is the first neural generative model to address the part-whole hierarchy learning problem. RNPs model images as hierarchical trees of probabilistic sensory-motor programs that recursively reuse learned sensory-motor primitives to model an image within different reference frames, forming recursive image grammars. We express RNPs as structured variational autoencoders (sVAEs) for inference and sampling, and demonstrate parts-based parsing, sampling, and one-shot transfer learning on the MNIST, Omniglot, and Fashion-MNIST datasets, illustrating the model's expressive power. Our results show that RNPs provide an intuitive and explainable way of composing objects and scenes, allowing rich compositionality and intuitive interpretations of objects in terms of part-whole hierarchies.
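As a rough illustration of how "recursively reusing learned primitives within different reference frames" can be realized, the sketch below has a parent latent emit child latents plus affine "where" parameters, decodes each child with a shared primitive decoder, and pastes the results onto the canvas with a spatial transformer. The names, sizes, and the specific choice of affine placement are our assumptions, not the authors' sVAE implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelSketch(nn.Module):
    """Illustrative two-level generative step: a parent latent proposes K
    child "what" latents and K affine "where" transforms; a shared decoder
    renders each child, which is then placed in its own reference frame."""
    def __init__(self, z_dim=16, K=3, patch=12, canvas=28):
        super().__init__()
        self.K, self.patch, self.canvas = K, patch, canvas
        self.what = nn.Linear(z_dim, K * z_dim)        # child "what" latents
        self.where = nn.Linear(z_dim, K * 6)           # 2x3 affine params per child
        self.decoder = nn.Sequential(                  # shared sensory primitive
            nn.Linear(z_dim, 128), nn.ReLU(),
            nn.Linear(128, patch * patch), nn.Sigmoid()
        )

    def forward(self, z):
        B = z.shape[0]
        whats = self.what(z).view(B * self.K, -1)
        thetas = self.where(z).view(B * self.K, 2, 3)
        parts = self.decoder(whats).view(B * self.K, 1, self.patch, self.patch)
        # Spatial-transformer placement of each decoded part onto the canvas.
        grid = F.affine_grid(thetas, (B * self.K, 1, self.canvas, self.canvas),
                             align_corners=False)
        placed = F.grid_sample(parts, grid, align_corners=False)
        return placed.view(B, self.K, 1, self.canvas, self.canvas).sum(1).clamp(0, 1)
```

For example, `TwoLevelSketch()(torch.randn(4, 16))` returns a batch of four 28x28 canvases composed of three placed parts each; recursing on each child latent in the same way would yield the deeper tree described in the abstract.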