Katherine R. Storrs's research while affiliated with University of Auckland and other places

Publications (43)

Article
Full-text available
Estimating perceptual attributes of materials directly from images is a challenging task due to their complex, not fully‐understood interactions with external factors, such as geometry and lighting. Supervised deep learning models have recently been shown to outperform traditional approaches, but rely on large datasets of human‐annotated images for...
Article
Visual shape perception is central to many everyday tasks, from object recognition to grasping and handling tools.¹,²,³,⁴,⁵,⁶,⁷,⁸,⁹,¹⁰ Yet how shape is encoded in the visual system remains poorly understood. Here, we probed shape representations using visual aftereffects—perceptual distortions that occur following extended exposure to a stimulus.¹¹...
Article
Everyone agrees that testing hypotheses is important, but Bowers et al. provide scant details about where hypotheses about perception and brain function should come from. We suggest that the answer lies in considering how information about the outside world could be acquired – that is, learned – over the course of evolution and development. Deep ne...
Preprint
Although ‘glossiness’ is an optical property of materials, while ‘softness’ is a mechanical property, there is an intriguing perceptual connection between the two as both specular reflections and shape deformations produce distinctive motion patterns. Observers are generally excellent at determining properties of moving surfaces. However, under cer...
Article
Full-text available
When we look at an object, we simultaneously see how glossy or matte it is, how light or dark, and what color. Yet, at each point on the object's surface, both diffuse and specular reflections are mixed in different proportions, resulting in substantial spatial chromatic and luminance variations. To further complicate matters, this pattern changes...
Article
Why do we perceive illusory motion in some static images? Several accounts point to eye movements, response latencies to different image elements, or interactions between image patterns and motion energy detectors. Recently PredNet, a recurrent deep neural network (DNN) based on predictive coding principles, was reported to reproduce the "Rotating...
Preprint
Why do we perceive illusory motion in some static images? Several accounts have been proposed based on eye movements, response latencies to different image elements, or interactions between image patterns and motion energy detectors. Recently, PredNet, a recurrent deep neural network (DNN) based on predictive coding principles was reported to repro...
Preprint
Full-text available
Shape perception is essential for numerous everyday behaviors from object recognition to grasping and handling objects. Yet how the brain encodes shape remains poorly understood. Here, we probed shape representations using visual aftereffects (perceptual distortions that occur following extended exposure to a stimulus) to resolve a long-standing de...
Preprint
Full-text available
When we look at an object, we simultaneously see how glossy or matte it is, how light or dark, and what color. Yet, at each point on the object's surface, both diffuse and specular reflections are mixed in different proportions, resulting in substantial spatial chromatic and luminance variations. To further complicate matters, this pattern changes...
Article
Full-text available
The discovery of mental rotation was one of the most significant landmarks in experimental psychology, leading to the ongoing assumption that to visually compare objects from different three-dimensional viewpoints, we use explicit internal simulations of object rotations, to ‘mentally adjust’ one object until it matches the other. These rotations a...
Article
Full-text available
Human vision is attuned to the subtle differences between individual faces. Yet we lack a quantitative way of predicting how similar two face images look and whether they appear to show the same person. Principal component–based three-dimensional (3D) morphable models are widely used to generate stimuli in face perception research. These models cap...
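As context for the method, a principal component–based face space reduces to a mean face plus weighted principal axes of variation. A minimal NumPy sketch with random stand-in data (real morphable models use dense 3D mesh coordinates and textures, not these toy vectors; all names below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 200 "faces", each flattened to a vector
# (real morphable models use dense 3D mesh coordinates and textures).
faces = rng.normal(size=(200, 300))

# Build the face space: mean face plus principal components of variation.
mean_face = faces.mean(axis=0)
u, s, vt = np.linalg.svd(faces - mean_face, full_matrices=False)
components = vt                       # orthonormal axes of face variation
std_devs = s / np.sqrt(len(faces) - 1)

# Synthesize two new faces by sampling coefficients on the first k axes.
k = 10
c1 = rng.normal(size=k) * std_devs[:k]
c2 = rng.normal(size=k) * std_devs[:k]
face1 = mean_face + c1 @ components[:k]
face2 = mean_face + c2 @ components[:k]

# Euclidean distance in coefficient space is one candidate predictor of
# how dissimilar the two faces look.
dissim = np.linalg.norm(c1 - c2)
```

Because the component rows are orthonormal, projecting a synthesized face back onto them recovers its coefficients exactly, which is what makes distances in coefficient space well defined.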
Article
Full-text available
Reflectance, lighting and geometry combine in complex ways to create images. How do we disentangle these to perceive individual properties, such as surface glossiness? We suggest that brains disentangle properties by learning to model statistical structure in proximal images. To test this hypothesis, we trained unsupervised generative neural networ...
Article
Full-text available
Common everyday materials such as textiles, foodstuffs, soil or skin can have complex, mutable and varied appearances. Under typical viewing conditions, most observers can visually recognize materials effortlessly, and determine many of their properties without touching them. Visual material perception raises many fascinating questions for vision r...
Article
Full-text available
Significance We perceive our environment through multiple independent sources of sensory input. The brain is tasked with deciding whether multiple signals are produced by the same or different events (i.e., solve the problem of causal inference). Here, we train a neural network to solve causal inference by either combining or separating visual and...
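The abstract above is truncated, but the causal-inference problem it refers to has a standard normative formulation (Körding et al., 2007) that a network can be trained to approximate. A minimal sketch, with arbitrary illustrative parameter values not taken from the paper:

```python
import math

def posterior_common_cause(xv, xa, sigma_v, sigma_a, sigma_p, p_common):
    # Probability that a visual cue xv and an auditory cue xa share one
    # cause, under Gaussian likelihoods and a zero-mean Gaussian prior
    # with spread sigma_p (normative model after Körding et al., 2007).
    var_v, var_a, var_p = sigma_v**2, sigma_a**2, sigma_p**2
    # Likelihood of the cue pair under a single common cause.
    var_c = var_v * var_a + var_v * var_p + var_a * var_p
    like_c = math.exp(-((xv - xa)**2 * var_p + xv**2 * var_a + xa**2 * var_v)
                      / (2 * var_c)) / (2 * math.pi * math.sqrt(var_c))
    # Likelihood under two independent causes.
    like_i = (math.exp(-xv**2 / (2 * (var_v + var_p)))
              / math.sqrt(2 * math.pi * (var_v + var_p))
              * math.exp(-xa**2 / (2 * (var_a + var_p)))
              / math.sqrt(2 * math.pi * (var_a + var_p)))
    # Posterior via Bayes' rule with prior p_common on a shared cause.
    return like_c * p_common / (like_c * p_common + like_i * (1 - p_common))

# Coincident cues favor a common cause; discrepant cues favor separate ones.
p_same = posterior_common_cause(0.0, 0.0, 1.0, 1.0, 10.0, 0.5)
p_diff = posterior_common_cause(0.0, 5.0, 1.0, 1.0, 10.0, 0.5)
```

An ideal observer then combines the cues when the common-cause posterior is high and segregates them when it is low, which is the behavior the trained network is compared against.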
Article
Deep neural networks (DNNs) trained on object recognition provide the best current models of high-level visual cortex. What remains unclear is how strongly experimental choices, such as network architecture, training, and fitting to brain data, contribute to the observed similarities. Here, we compare a diverse set of nine DNN architectures on thei...
Preprint
Full-text available
Despite the importance of face perception in human and computer vision, no quantitative model of perceived face dissimilarity exists. We designed an efficient behavioural task to collect dissimilarity and same/different identity judgements for 232 pairs of realistic faces that densely sampled geometric relationships in a face space derived from pri...
Article
One of the deepest insights in neuroscience is that sensory encoding should take advantage of statistical regularities. Humans’ visual experience contains many redundancies: Scenes mostly stay the same from moment to moment, and nearby image locations usually have similar colors. A visual system that knows which regularities shape natural images ca...
Article
Full-text available
Faces of different people elicit distinct fMRI patterns in several face-selective regions of the human brain. Here we used representational similarity analysis to investigate what type of identity-distinguishing information is encoded in three face-selective regions: Fusiform face area (FFA), occipital face area (OFA), and posterior superior tempor...
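As a methodological aside, the core of representational similarity analysis can be sketched in a few lines of NumPy. Everything below is illustrative stand-in data, not from the study; `ffa` and `ofa` merely label two hypothetical pattern matrices:

```python
import numpy as np

def rdm(patterns):
    # Representational dissimilarity matrix: 1 - Pearson correlation
    # between the activity patterns of every pair of conditions.
    return 1.0 - np.corrcoef(patterns)

def rdm_similarity(rdm_a, rdm_b):
    # Compare two RDMs by correlating their upper triangles
    # (the diagonal and lower triangle are redundant).
    iu = np.triu_indices_from(rdm_a, k=1)
    return np.corrcoef(rdm_a[iu], rdm_b[iu])[0, 1]

rng = np.random.default_rng(1)
# Hypothetical stand-in data: responses of two "regions" to 12 face
# identities, each pattern a vector over 50 voxels, built from a shared
# signal plus independent region noise.
signal = rng.normal(size=(12, 50))
ffa = signal + 0.5 * rng.normal(size=(12, 50))
ofa = signal + 0.5 * rng.normal(size=(12, 50))

# High similarity here would suggest the two regions carry similar
# identity-distinguishing information.
sim = rdm_similarity(rdm(ffa), rdm(ofa))
```

Because RDMs abstract away from voxels and units, the same comparison works between brain regions, behavioral judgments, and model layers.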
Preprint
Full-text available
Faces of different people elicit distinct functional MRI (fMRI) patterns in several face-selective brain regions. Here we used representational similarity analysis to investigate what type of identity-distinguishing information is encoded in three face-selective regions: fusiform face area (FFA), occipital face area (OFA), and posterior superior te...
Chapter
The sixth edition of the foundational reference on cognitive neuroscience, with entirely new material that covers the latest research, experimental approaches, and measurement methodologies. Each edition of this classic reference has proved to be a benchmark in the developing field of cognitive neuroscience. The sixth edition of The Cognitive Neuro...
Preprint
Full-text available
Deep neural networks (DNNs) trained on object recognition provide the best current models of high-level visual areas in the brain. What remains unclear is how strongly network design choices, such as architecture, task training, and subsequent fitting to brain data contribute to the observed similarities. Here we compare a diverse set of nine DNN a...
Preprint
Full-text available
Gloss perception is a challenging visual inference that requires disentangling the contributions of reflectance, lighting, and shape to the retinal image. Learning to see gloss must somehow proceed without labelled training data as no other sensory signals can provide the 'ground truth' required for supervised learning. We reasoned that paradoxical...
Preprint
Full-text available
An error was made in including noise ceilings for human data in Khaligh-Razavi and Kriegeskorte (2014). For comparability with the macaque data, human data were averaged across participants before analysis. Therefore the noise ceilings indicating variability across human participants do not accurately depict the upper bounds of possible model perfo...
Article
Full-text available
Materials with complex appearances, like textiles and foodstuffs, pose challenges for conventional theories of vision. But recent advances in unsupervised deep learning provide a framework for explaining how we learn to see them. We suggest that perception does not involve estimating physical quantities like reflectance or lighting. Instead, repres...
Preprint
Neural network models can now recognise images, understand text, translate languages, and play many human games at human or superhuman levels. These systems are highly abstracted, but are inspired by biological brains and use only biologically plausible computations. In the coming years, neural networks are likely to become less reliant on learning...
Article
Full-text available
Recent advances in Deep convolutional Neural Networks (DNNs) have enabled unprecedentedly accurate computational models of brain representations, and present an exciting opportunity to model diverse cognitive functions. State-of-the-art DNNs achieve human-level performance on object categorisation, but it is unclear how well they capture human beha...
Article
"Grid cells" encode an animal's location and direction of movement in 2D physical environments via regularly repeating receptive fields. Constantinescu et al. (2016) report the first evidence of grid cells for 2D conceptual spaces. The work has exciting implications for mental representation and shows how detailed neural-coding hypotheses can be t...
Article
Full-text available
The influential “two-streams hypothesis” of visual processing proposes that features related to object recognition are primarily encoded in the visual ventral–temporal stream (the “what” pathway), while spatial relationships among objects are primarily encoded in the dorsal–parietal...

Citations

... In PredNet, higher layers do not predict the activity of the representation units of the layer below, but of the error units (Lotter et al., 2016). Moreover, more recent investigations deemed the motion illusion results inconsistent with human-like illusory motion (Kirubeswaran & Storrs, 2023). ...
... In this study, we leveraged the flexibility of DNNs to construct an image-computable model of human motion processing. While recent studies have successfully used DNNs to deepen our understanding of the neural mechanisms of visual motion processing [19–23], we aimed to build a model that can explain a broader range of physiological and psychophysical phenomena, including those whose neural mechanisms are not yet clear. From an engineering standpoint, we aimed to build a human-aligned optic-flow algorithm while maintaining flow-estimation capability comparable to state-of-the-art (SOTA) CV models. ...
... Consider, for example, the act of solving a metal puzzle, where one of the rings must be removed from the rest of the tangled metal loops. A representationalist approach, borrowed from traditional cognitive psychology, might assume that a mental representation of the metal puzzle must first be generated in the brain of the observer, and then the cognitive operations – such as mental rotation [36] and other transformations – would all be performed on that mental representation. However, that is not at all how one actually solves a metal puzzle in the real world. ...
... Market-oriented research prompted AI engineers to endorse the "if it works, it works" motto, which essentially meant that they did not really care about the neuroscientific details when designing and training DL models as long as these models surpass human performance and/or turn out to be lucrative. Hassabis et al. (2017) were among the first to plead for renewing the vows between AI research and neuroscience, with cognitive neuroscientists following suit and claiming that DL models are indeed a valuable toolkit for learning about the brain, especially given their predictive power and previous success in modeling primate visual cortex (see Storrs & Kriegeskorte, 2019 for an overview; Yamins & DiCarlo, 2016). Concrete examples of models from recent years bear witness to the beneficial exchange between these fields. ...
... This suggests that setting dw = 70 as 100% of the other's destination face allows for sufficient observation of the boundary between self and other within the facial distance (the latent vector distance dw = 70) used in this experiment. In particular, between 20% and 60%, we observed a region where subjective self-face recognition and the morphing rate are approximately linear, as reported previously⁴⁷, confirming the validity of the facial distance we proposed. ...
... Such a design mirrors the temporal predictive coding of the human visual system [28,35], wherein higher-order neurons send feedback signals to lower-order neurons and thus control their behavior. This mechanism implicitly forces neurons to learn motion perception in a dynamic environment [40]. Integration of the above modules yields a lightweight network with only 2.50 M parameters, which handles causal sequences of arbitrary length to estimate optical flow. ...
... Adelson argued that stuff is important because of its ubiquity in everyday life. Ritchie et al. [25] describe material perception in two parts. The first part is categorical recognition of what something is made of. ...
... Although effective, existing multimodal algorithms often lack a dynamic assessment of the informativeness of each modality for different samples, which could otherwise enhance the trustworthiness, stability, and explainability of these methods. Evidence has shown that the informativeness of a modality typically varies across different samples [18], [19], which motivates efforts to model modality informativeness to enhance fusion effectiveness [20]. ...
... Thus, VGG-19 models that lack any knowledge of natural image structure show the same patterns of bias. Our findings involving V1 neural predictivity can be contrasted with a prior study that evaluated CNN models of neural responses to complex objects, for which a variety of control networks with randomized weights (including VGG-16) showed very similar levels of predictive performance from layer 1 onward (Storrs, Kietzmann, Walther, Mehrer, & Kriegeskorte, 2021). We performed an additional control analysis to evaluate the potential contributions of nonlinear complexity; this was done by removing every ReLU operation from VGG-19 except for the final one to occur in the feedforward pipeline of a layer-specific analysis. ...
... Second, the VAE model family, specifically hierarchical VAEs, is broad: other generative models, such as diffusion models, can be understood as special cases of hierarchical VAEs [37][38][39]. Finally, VAEs learn representations that are similar to cortex [1, 2, 40], exhibit cortex-like topographic organization [41,42], and make perceptual errors that mimic those of humans [43], indicating a significant degree of neural, organizational, and psychophysical alignment with the brain. ...