Example Parse Tree with Inferred Locations of Parts: Hierarchy of (a) sampled parts and (b) sampled locations, inducing a hierarchy of reference frames.

Source publication
Preprint
Full-text available
We introduce Active Predictive Coding Networks (APCNs), a new class of neural networks that solve a major problem posed by Hinton and others in the fields of artificial intelligence and brain modeling: how can neural networks learn intrinsic reference frames for objects and parse visual scenes into part-whole hierarchies by dynamically allocating n...
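The abstract alludes to a hierarchy of state and action networks that decide where, within a learned reference frame, to sample the next part. A minimal sketch of one such level is given below; the module names, sizes, and wiring are our own assumptions for illustration, not the APCN implementation.

```python
import torch
import torch.nn as nn

class StateActionLevel(nn.Module):
    """One level of a state/action hierarchy (illustrative only).
    The state network integrates glimpse evidence; the action network
    proposes where, within the current reference frame, to sample next."""
    def __init__(self, glimpse_dim=64, state_dim=128):
        super().__init__()
        self.state_rnn = nn.GRUCell(glimpse_dim + 2, state_dim)   # "state" network
        self.action_net = nn.Sequential(                          # "action" network
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 2), nn.Tanh()                           # (x, y) in [-1, 1]^2
        )

    def step(self, state, glimpse, loc):
        # Fold the current glimpse and its location into the state,
        # then emit the location of the next part to attend to.
        state = self.state_rnn(torch.cat([glimpse, loc], dim=-1), state)
        next_loc = self.action_net(state)
        return state, next_loc
```

In a two-level arrangement, a higher-level copy of such a module would pick locations in the object's frame and hand each selected sub-region to a lower-level copy operating in the part's frame.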

Context in source publication

Context 1
... learned part-whole hierarchy for an MNIST input, in the form of a parse tree of parts and sub-parts with locations, is shown in Figure 6. The learned strategies sample a wide variety of parts and sub-parts of the object (strokes and mini-strokes). ...
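The parse tree in Figure 6 pairs each sampled part with a location expressed in its parent's reference frame. A minimal sketch of such a structure follows; the field names and the additive composition of locations are simplifying assumptions, not the paper's data format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PartNode:
    """One node of an inferred parse tree: a part code (e.g., a stroke)
    plus its sampled location in the parent's reference frame."""
    part_code: List[float]                       # latent embedding of the part
    location: Tuple[float, float]                # (x, y) relative to the parent frame
    children: List["PartNode"] = field(default_factory=list)  # sub-parts (mini-strokes)

def absolute_locations(node: PartNode, origin=(0.0, 0.0)):
    """Walk the tree, composing each relative location with its parent's,
    to recover where every part and sub-part sits in the image frame."""
    x, y = origin[0] + node.location[0], origin[1] + node.location[1]
    yield node, (x, y)
    for child in node.children:
        yield from absolute_locations(child, (x, y))
```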

Citations

... We introduce recursive neural programs (RNPs), which address this problem by creating a fully differentiable recursive tree representation of sensory-motor programs. Our model builds on past work on Active Predictive Coding Networks [3] in using state and action networks, but is fully generative, recursive, and probabilistic, allowing a structured variational approach to inference and sampling of neural programs. The key differences between our approach and existing approaches are: 1) our approach can be extended to arbitrary tree depth, creating a "grammar" for images that can be recursively applied; 2) our approach provides a sensible way to perform gradient descent in hierarchical "program space"; and 3) our model can be made adaptive by letting information flow from children to parents in the tree, e.g., via prediction errors [11,3]. ...
... The above model can be made recursive, with generation performed in a depth-first manner: each $z^k_t$ generates the program for a sequence $\{z^{k-1}_1, \ldots, z^{k-1}_{\tau_{k-1}}\}$, and $z^k_{t+1}$ begins after $z^k_t$ terminates. Here we use the decoded patches $\{x^{k-1}_1, \ldots, x^{k-1}_{\tau_{k-1}}\}$ as accumulated evidence to update $z^k_t$ (similar to other predictive coding models [11,3]). ...
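Read literally, the excerpt describes depth-first generation: each higher-level latent fully unrolls its lower-level sequence before its next sibling starts. A schematic sketch, with `child_seq` and `decode_patch` as assumed stand-ins for the paper's networks:

```python
def generate(z, level, child_seq, decode_patch):
    """Depth-first generation sketch (interfaces are assumptions, not the
    paper's code). A latent z^k_t at `level` expands into the sequence
    {z^{k-1}_1, ..., z^{k-1}_tau}, each child expanding fully before the
    next begins; level-0 latents decode directly to image patches."""
    if level == 0:
        return [decode_patch(z)]
    patches = []
    for z_child in child_seq(z, level - 1):   # children generated in order
        patches += generate(z_child, level - 1, child_seq, decode_patch)
    return patches
```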
Preprint
Full-text available
Human vision involves parsing and representing objects and scenes using structured representations based on part-whole hierarchies. Computer vision and machine learning researchers have recently sought to emulate this capability using capsule networks, reference frames and active predictive coding, but a generative model formulation has been lacking. We introduce Recursive Neural Programs (RNPs), which, to our knowledge, is the first neural generative model to address the part-whole hierarchy learning problem. RNPs model images as hierarchical trees of probabilistic sensory-motor programs that recursively reuse learned sensory-motor primitives to model an image within different reference frames, forming recursive image grammars. We express RNPs as structured variational autoencoders (sVAEs) for inference and sampling, and demonstrate parts-based parsing, sampling, and one-shot transfer learning on the MNIST, Omniglot, and Fashion-MNIST datasets, illustrating the model's expressive power. Our results show that RNPs provide an intuitive and explainable way of composing objects and scenes, allowing rich compositionality and intuitive interpretations of objects in terms of part-whole hierarchies.
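As a rough illustration of how "recursively reusing learned primitives within different reference frames" can be realized, the sketch below has a parent latent emit child latents plus affine "where" parameters, decodes each child with a shared primitive decoder, and pastes the results onto the canvas with a spatial transformer. The names, sizes, and the specific choice of affine placement are our assumptions, not the authors' sVAE implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLevelSketch(nn.Module):
    """Illustrative two-level generative step: a parent latent proposes K
    child "what" latents and K affine "where" transforms; a shared decoder
    renders each child, which is then placed in its own reference frame."""
    def __init__(self, z_dim=16, K=3, patch=12, canvas=28):
        super().__init__()
        self.K, self.patch, self.canvas = K, patch, canvas
        self.what = nn.Linear(z_dim, K * z_dim)        # child "what" latents
        self.where = nn.Linear(z_dim, K * 6)           # 2x3 affine params per child
        self.decoder = nn.Sequential(                  # shared sensory primitive
            nn.Linear(z_dim, 128), nn.ReLU(),
            nn.Linear(128, patch * patch), nn.Sigmoid()
        )

    def forward(self, z):
        B = z.shape[0]
        whats = self.what(z).view(B * self.K, -1)
        thetas = self.where(z).view(B * self.K, 2, 3)
        parts = self.decoder(whats).view(B * self.K, 1, self.patch, self.patch)
        # Spatial-transformer placement of each decoded part onto the canvas.
        grid = F.affine_grid(thetas, (B * self.K, 1, self.canvas, self.canvas),
                             align_corners=False)
        placed = F.grid_sample(parts, grid, align_corners=False)
        return placed.view(B, self.K, 1, self.canvas, self.canvas).sum(1).clamp(0, 1)
```

For example, `TwoLevelSketch()(torch.randn(4, 16))` returns a batch of four 28x28 canvases composed of three placed parts each; recursing on each child latent in the same way would yield the deeper tree described in the abstract.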