Article · Literature Review

Visual Object Recognition

Authors: Nikos K. Logothetis, David L. Sheinberg

Abstract

Visual object recognition is of fundamental importance to most animals. The diversity of tasks that any biological recognition system must solve suggests that object recognition is not a single, general purpose process. In this review, we consider evidence from the fields of psychology, neuropsychology, and neurophysiology, all of which supports the idea that there are multiple systems for recognition. Data from normal adults, infants, animals, and brain damaged patients reveal a major distinction between the classification of objects at a basic category level and the identification of individual objects from a homogeneous object class. An additional distinction between object representations used for visual perception and those used for visually guided movements provides further support for a multiplicity of visual recognition systems. Recent evidence from psychophysical and neurophysiological studies indicates that one system may represent objects by combinations of multiple views, or aspects, and another may represent objects by structural primitives and their spatial interrelationships.


... For decades, people have debated how the brain represents visual objects [1,2]. In anterior regions of the ventral visual pathway, which supports object discrimination and recognition [1,3-5], neurons can show apparently idiosyncratic activation patterns, responding vigorously to only some objects [6], but what these preferred objects have in common may not be apparent (but see e.g. [7]). ...
... According to this line of thinking, categories such as faces evoke activity in specific anatomical regions of the ventral visual pathway due to the visual qualities that they happen to have, as almost all human faces fall within the stubby animate-looking quadrant of object space, also known as the face quadrant. Other objects that share these qualities also evoke activity in the same ...¹ [¹ This paper is on object space. We also preregistered hypotheses on so-called face space (https://osf.io/q5ne8), ...]
Preprint
Full-text available
What are the diagnostic dimensions on which objects truly differ visually? We constructed a two-dimensional object space based on such attributes captured by a deep convolutional neural network. These attributes can be approximated as stubby/spiky and animate-/inanimate-looking. If object space underlies human visual cognition, this should have a measurable effect on object discrimination abilities. We administered an object foraging task to a large, diverse sample (N=511). We focused on the stubby animate-looking “face quadrant” of object space given known variations in face discrimination abilities. Stimuli were picked out of tens of thousands of images to either match or not match with the coordinates of faces in object space. People who struggled with telling apart faces also had difficulties with discriminating other objects with the same object space attributes. This study provides the first behavioral evidence for the existence of an object space in human visual cognition. Public Significance Statement The study emphasizes individual differences in visual cognition, a relatively neglected field of research. Unlike differences in other cognitive traits (e.g., Big Five personality traits, g-factor of general intelligence), we have limited knowledge on how people differ in their object processing capacity, and whether such abilities are fractionated or unitary. In this study, we ask whether visual object perception abilities are organized around an object space as evidenced by individual differences in behavior.
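To make the construction concrete, here is a minimal sketch of how a two-dimensional object space can be derived from deep network activations. The choice of AlexNet, the fc7 layer, and PCA are assumptions for illustration, not necessarily the study's exact pipeline.

```python
# Sketch: derive a low-dimensional "object space" from CNN activations.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
from sklearn.decomposition import PCA

model = models.alexnet(weights="IMAGENET1K_V1").eval()
preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])

def image_features(paths):
    feats = []
    with torch.no_grad():
        for p in paths:
            x = preprocess(Image.open(p).convert("RGB")).unsqueeze(0)
            h = model.avgpool(model.features(x)).flatten(1)  # conv features
            h = model.classifier[:5](h)                      # up to fc7 (4096-d)
            feats.append(h.squeeze(0))
    return torch.stack(feats).numpy()

# feats = image_features(list_of_many_image_paths)         # needs many images
# object_space = PCA(n_components=2).fit_transform(feats)  # candidate 2-D space
```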
... Converging evidence has shown that visual object recognition is processed along the ventral visual pathway [1][2][3]. In macaque monkeys, face-selective neurons have long been identified in the highest stages of this pathway, i.e., the infero-temporal cortex [4,5]. ...
... face-selective regions of the monkey visual cortex [1,5,7] and in contrast to the nearly binary coding by downstream neurons in the human hippocampal formation [26]. Moreover, although with a lower strength, the face-responsive units also responded to some of the places (and vice versa), with the maximum response to the non-preferred stimulus category reaching about 60-70% of the maximum value for the preferred stimulus category. ...
Article
Full-text available
Faces are critical for social interactions and their recognition constitutes one of the most important and challenging functions of the human brain. While neurons responding selectively to faces have been recorded for decades in the monkey brain, face-selective neural activations have been reported with neuroimaging primarily in the human midfusiform gyrus. Yet, the cellular mechanisms producing selective responses to faces in this hominoid neuroanatomical structure remain unknown. Here we report single neuron recordings performed in 5 human subjects (1 male, 4 females) implanted with intracerebral microelectrodes in the face-selective midfusiform gyrus, while they viewed pictures of familiar and unknown faces and places. We observed similar responses to faces and places at the single cell level, but a significantly higher number of neurons responding to faces, thus offering a mechanistic account for the face-selective activations observed in this region. Although individual neurons did not respond preferentially to familiar faces, a population level analysis could consistently determine whether or not the faces (but not the places) were familiar, only about 50 ms after the initial recognition of the stimuli as faces. These results provide insights into the neural mechanisms of face processing in the human brain.
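A minimal sketch of the kind of time-resolved population-level readout described here (binary familiar-vs-unknown decoding from binned spike counts); the array shapes and window size are assumptions.

```python
# Sketch: time-resolved population decoding of face familiarity.
# spikes: (n_trials, n_neurons, n_timebins) binned firing rates (assumed shape);
# labels: 1 = familiar face, 0 = unknown face.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def sliding_window_decoding(spikes, labels, win=5):
    n_trials, n_neurons, n_bins = spikes.shape
    scores = []
    for t in range(n_bins - win):
        X = spikes[:, :, t:t + win].mean(axis=2)   # average within the window
        clf = LogisticRegression(max_iter=1000)
        scores.append(cross_val_score(clf, X, labels, cv=5).mean())
    return np.array(scores)   # decoding accuracy as a function of time
```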
... For decades, people have debated how the brain represents visual objects [1,2]. In anterior regions of the ventral visual pathway, which supports object discrimination and recognition [1,3-5], neurons can show apparently idiosyncratic activation patterns, responding vigorously to only some objects [6], but what these preferred objects have in common may not be apparent (but see e.g. [7]). ...
... According to this line of thinking, categories such as faces evoke activity in specific anatomical regions of the ventral visual pathway due to the visual qualities that they happen to have, as almost all human faces fall within the stubby animate-looking quadrant of object space, also known as the face quadrant. Other objects that share these qualities also evoke activity in the same ...¹ [¹ This paper is on object space. We also preregistered hypotheses on so-called face space (https://osf.io/q5ne8), ...]
Article
Full-text available
What are the organizational principles of visual object perception as evidenced by individual differences in behavior? What specific abilities and disabilities in object discrimination go together? In this preregistered study (https://osf.io/q5ne8), we collected data from a large (N=511) heterogeneous sample to amplify individual differences in visual discrimination abilities. We primarily targeted people with self-declared face recognition abilities on opposite sides of the spectrum, ranging from poor to excellent face recognizers. We then administered a visual foraging task where people had to discriminate between various faces, other familiar objects, and novel objects. Each image had a known location in both face space and object space, which both were defined based on activation patterns in a convolutional neural network trained on object classification. Face space captures the main dimensions on which faces visually differ from one another while object space captures the main diagnostic dimensions across various objects. Distance between two images in face/object space can be calculated, where greater distance indicates that the images are visually different from one another on dimensions that are diagnostic for telling apart different faces/objects. Our results suggest that there simply are not any measurable stable individual differences in the usage of face space. However, we furthermore show that people who struggle with telling apart different faces also have some difficulties with visual processing of objects that share visual qualities with faces as measured by their location in object space. Face discrimination may therefore not rely on completely domain-specific abilities but may tap into mechanisms that support other object discrimination. We discuss how these results may or may not provide support for the existence of an object space in human high-level vision.
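Once each image has coordinates in such a space, the distance computation described above is straightforward; a small sketch (the coordinate file and per-pair accuracy vector are hypothetical):

```python
# Sketch: pairwise distances between images in face/object space.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

coords = np.load("object_space_coords.npy")   # hypothetical (n_images, n_dims)
dists = squareform(pdist(coords))             # distance on diagnostic dimensions

# pair_accuracy: (n_pairs,) behavioral discrimination accuracy per image pair,
# ordered like the condensed upper-triangle distances (assumed to exist).
iu = np.triu_indices_from(dists, k=1)
# rho, p = spearmanr(dists[iu], pair_accuracy)  # larger distance -> easier pair?
```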
... Unlike the fronto-parietal system, IT neurons often have very large receptive fields (Gross, Bender, & Rocha-Miranda, 1969; but see Rolls et al., 2003). These cells might therefore not carry as much information on the location of an object, but instead they selectively respond to the complex features of objects of interest (Logothetis & Sheinberg, 1996). ...
... There is now quite a lot of data confirming that regions such as the lateral intraparietal area and the frontal eye field have reciprocal structural connections to some regions in the ventral visual stream, such as V4 and parts of the inferior temporal cortex (see figure 1; Blatt, Andersen, & Stoner, 1990; Distler, Boussaoud, Desimone, & Ungerleider, 1993; Lewis & Van Essen, 2000; Schall, Morel, King, & Bullier, 1995; Stanton, Bruce, & Goldberg, 1995; Webster, Bachevalier, & Ungerleider, 1994), regions that respond selectively to moderately or highly complex visual features or even whole objects and which are thought important for visual object recognition (Logothetis & Sheinberg, 1996). Since these connections exist, the lateral intraparietal area and the frontal eye fields must, at least under some circumstances, exchange information with regions in the ventral visual stream. ...
Chapter
Full-text available
Cutting-edge research on the visual cognition of scenes, covering issues that include spatial vision, context, emotion, attention, memory, and neural mechanisms underlying scene representation. For many years, researchers have studied visual recognition with objects—single, clean, clear, and isolated objects, presented to subjects at the center of the screen. In our real environment, however, objects do not appear so neatly. Our visual world is a stimulating scenery mess; fragments, colors, occlusions, motions, eye movements, context, and distraction all affect perception. In this volume, pioneering researchers address the visual cognition of scenes from neuroimaging, psychology, modeling, electrophysiology, and computer vision perspectives. Building on past research—and accepting the challenge of applying what we have learned from the study of object recognition to the visual cognition of scenes—these leading scholars consider issues of spatial vision, context, rapid perception, emotion, attention, memory, and the neural mechanisms underlying scene representation. Taken together, their contributions offer a snapshot of our current knowledge of how we understand scenes and the visual world around us. Contributors: Elissa M. Aminoff, Moshe Bar, Margaret Bradley, Daniel I. Brooks, Marvin M. Chun, Ritendra Datta, Russell A. Epstein, Michèle Fabre-Thorpe, Elena Fedorovskaya, Jack L. Gallant, Helene Intraub, Dhiraj Joshi, Kestutis Kveraga, Peter J. Lang, Jia Li, Xin Lu, Jiebo Luo, Quang-Tuan Luong, George L. Malcolm, Shahin Nasr, Soojin Park, Mary C. Potter, Reza Rajimehr, Dean Sabatinelli, Philippe G. Schyns, David L. Sheinberg, Heida Maria Sigurdardottir, Dustin Stansbury, Simon Thorpe, Roger Tootell, James Z. Wang
... In contrast, the style latent captures characteristics that are not causally related to the true label, but may contribute towards the generation of noisy labels ( Figure 1). Guided by a structural causal model and implemented through Variational Auto-Encoders (VAEs) [13,11], our generative model not only exploits the theoretical and empirical success of weakly supervised causal representation learning [34,47], but also aligns with psychological and physiological evidence that human annotators do not perceive objects based on the raw input signal from the retina, but instead rely on semantic concepts processed by the visual cortex [20,33]. ...
... It is worth noting that f_Ỹ also depends on x, since the latent factors are designed to capture the high-level semantics instead of all the subtle details of complex objects. Therefore, for instances that are difficult to label, annotators will consider finer details to be a secondary factor [20]. ...
Preprint
Full-text available
Label noise widely exists in large-scale datasets and significantly degrades the performance of deep learning algorithms. Due to the non-identifiability of the instance-dependent noise transition matrix, most existing algorithms address the problem by assuming the noisy label generation process to be independent of the instance features. Unfortunately, noisy labels in real-world applications often depend on both the true label and the features. In this work, we tackle instance-dependent label noise with a novel deep generative model that avoids explicitly modeling the noise transition matrix. Our algorithm leverages causal representation learning and simultaneously identifies the high-level content and style latent factors from the data. By exploiting the supervision information of noisy labels with structural causal models, our empirical evaluations on a wide range of synthetic and real-world instance-dependent label noise datasets demonstrate that the proposed algorithm significantly outperforms the state-of-the-art counterparts.
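A compact sketch of the kind of generative architecture this abstract describes, with separate content and style latent blocks in a VAE; layer sizes are arbitrary assumptions, and the structural-causal-model machinery is omitted.

```python
# Sketch: VAE with split content/style latents (PyTorch).
import torch
import torch.nn as nn

class ContentStyleVAE(nn.Module):
    def __init__(self, x_dim=784, c_dim=16, s_dim=16, h_dim=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, c_dim + s_dim)      # [content | style] means
        self.logvar = nn.Linear(h_dim, c_dim + s_dim)
        self.dec = nn.Sequential(nn.Linear(c_dim + s_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))
        self.c_dim = c_dim

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        content, style = z[:, :self.c_dim], z[:, self.c_dim:]
        # content should predict the clean label; style absorbs the factors
        # that contribute to noisy-label generation
        return self.dec(z), mu, logvar, content, style
```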
... There is substantial evidence that the primate ventral visual stream underlies primates' ability to recognize objects [29,30]. Modeling experiments have also shown that the internal activity of neural network models trained to perform visual object recognition tasks is highly similar to the neuronal responses measured from the ventral visual cortex [19,31]. ...
Article
Full-text available
Unit activity in particular deep neural networks (DNNs) is remarkably similar to the neuronal population responses to static images along the primate ventral visual cortex. Linear combinations of DNN unit activities are widely used to build predictive models of neuronal activity in the visual cortex. Nevertheless, prediction performance in these models is often investigated on stimulus sets consisting of everyday objects under naturalistic settings. Recent work has revealed a generalization gap in predicting neuronal responses to synthetically generated out-of-distribution (OOD) stimuli. Here, we investigated how the recent progress in improving DNNs' object recognition generalization, as well as various DNN design choices such as architecture, learning algorithm, and datasets, have impacted the generalization gap in neural predictivity. We came to a surprising conclusion that performance on none of the common computer vision OOD object recognition benchmarks is predictive of OOD neural predictivity performance. Furthermore, we found that adversarially robust models often yield substantially higher generalization in neural predictivity, although the degree of robustness itself was not predictive of the neural predictivity score. These results suggest that improving object recognition behavior on current benchmarks alone may not lead to more general models of neurons in the primate ventral visual cortex.
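A minimal sketch of the standard neural-predictivity pipeline this abstract builds on: ridge regression from DNN features to recorded responses, scored by per-neuron correlation. Array names and the use of RidgeCV are assumptions.

```python
# Sketch: neural predictivity of DNN features via ridge regression.
# feats_*: (n_images, n_units) DNN activations; resp_*: (n_images, n_neurons)
# recorded responses (assumed shapes).
import numpy as np
from sklearn.linear_model import RidgeCV

def neural_predictivity(feats_train, resp_train, feats_test, resp_test):
    model = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(feats_train, resp_train)
    pred = model.predict(feats_test)
    # mean per-neuron Pearson correlation between predicted and measured
    r = [np.corrcoef(pred[:, i], resp_test[:, i])[0, 1]
         for i in range(resp_test.shape[1])]
    return float(np.mean(r))

# in-distribution score: fit and test on naturalistic images
# OOD score: fit on naturalistic images, test on synthetic stimuli
# generalization gap ~ in-distribution score minus OOD score
```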
... Similarly, the superior parietal cortex, where the facial saliency score was positively correlated with the theta/alpha-band activities, has been associated with the disambiguation process of bistable images (Kanai et al., 2011). Furthermore, the inferior temporal cortex is famously associated with object recognition (Logothetis and Sheinberg, 1996; Conway, 2018); thus, the negative cluster identified in this region for the facial saliency score would indicate the contribution of the recognition process to disambiguating the visual information that drives affective processing. Although the neurological mechanisms underlying the interest-based liking system and PP framework in the context of aesthetic emotion remain unknown, these regions may play a pivotal role in the processes. ...
Article
Full-text available
Introduction Aesthetic emotions are a class of emotions aroused by evaluating aesthetically appealing objects or events. While evolutionary aesthetics suggests the adaptive roles of these emotions, empirical assessments are lacking. Previous neuroscientific studies have demonstrated that visual stimuli carrying evolutionarily important information induce neural responses even when presented non-consciously. To examine the evolutionary importance of aesthetic emotions, we conducted a neuroscientific study using magnetoencephalography (MEG) to measure induced neural responses to non-consciously presented portrait paintings categorised as biological and non-biological and examined associations between the induced responses and aesthetic ratings. Methods MEG and pre-rating data were collected from 23 participants. The pre-rating included visual analogue scales for object saliency, facial saliency, liking, and beauty scores, in addition to ‘biologiness,’ which was used for subcategorising stimuli into biological and non-biological. The stimuli were presented non-consciously using a continuous flash suppression paradigm or consciously using binocular presentation without flashing masks, while dichotomic behavioural responses were obtained (beauty or non-beauty). Time-frequency decomposed MEG data were used for correlation analysis with pre-rating scores for each category. Results Behavioural data revealed that saliency scores of non-consciously presented stimuli influenced dichotomic responses (beauty or non-beauty). MEG data showed that non-consciously presented portrait paintings induced spatiotemporally distributed low-frequency brain activities associated with aesthetic ratings, which were distinct between the biological and non-biological categories and conscious and non-conscious conditions. Conclusion Aesthetic emotion holds evolutionary significance for humans. Neural pathways are sensitive to visual images that arouse aesthetic emotion in distinct ways for biological and non-biological categories, which are further influenced by consciousness. These differences likely reflect the diversity in mechanisms of aesthetic processing, such as processing fluency, active elaboration, and predictive processing. The aesthetic processing of non-conscious stimuli appears to be characterised by fluency-driven affective processing, while top-down regulatory processes are suppressed. This study provides the first empirical evidence supporting the evolutionary significance of aesthetic processing.
... Most stimuli are processed using general-purpose (flexible) mechanisms, rather than specialized mechanisms. For example, cups and tables are identified using similar processes, rather than there being a special mechanism used only for cups and a different mechanism used only for tables (DiCarlo et al. 2012, Logothetis & Sheinberg 1996). Flexibility is important because identifying diverse stimuli is necessary for social and ecological success. ...
Article
Animals live in visually complex environments. As a result, visual systems have evolved mechanisms that simplify visual processing and allow animals to focus on the information that is most relevant to adaptive decision making. This review explores two key mechanisms that animals use to efficiently process visual information: categorization and specialization. Categorization occurs when an animal's perceptual system sorts continuously varying stimuli into a set of discrete categories. Specialization occurs when particular classes of stimuli are processed using distinct cognitive operations that are not used for other classes of stimuli. We also describe a nonadaptive consequence of simplifying heuristics: visual illusions, where visual perception consistently misleads the viewer about the state of the external world or objects within it. We take an explicitly comparative approach by exploring similarities and differences in visual cognition across human and nonhuman taxa. Considering areas of convergence and divergence across taxa provides insight into the evolution and function of visual systems and associated perceptual strategies.
... Marrying our results with previous literature, we propose that both High and Low Memorable images are similarly processed in an initial feedforward pass through the visual perceptual hierarchy [55,56] to extract their low-level (e.g., spatial frequency, color, contrast) and high-level (e.g., shapes, concepts) properties. The high-level visual regions at the end stages of the initial visual perception, specifically the inferior temporal cortex (Fig 2B, middle left panel) and banks of the STS (Fig 2B, lower right panel), seen here at around 278 ms and 269 ms, respectively, may preferentially process High Memorable images due to their relatively more conceptually [27,57] and socially [33,34] useful information content [58]. ...
Article
Full-text available
Behavioral and neuroscience studies in humans and primates have shown that memorability is an intrinsic property of an image that predicts its strength of encoding into and retrieval from memory. While previous work has independently probed when or where this memorability effect may occur in the human brain, a description of its spatiotemporal dynamics is missing. Here, we used representational similarity analysis (RSA) to combine functional magnetic resonance imaging (fMRI) with source-estimated magnetoencephalography (MEG) to simultaneously measure when and where the human cortex is sensitive to differences in image memorability. Results reveal that visual perception of High Memorable images, compared to Low Memorable images, recruits a set of regions of interest (ROIs) distributed throughout the ventral visual cortex: a late memorability response (from around 300 ms) in early visual cortex (EVC), inferior temporal cortex, lateral occipital cortex, fusiform gyrus, and banks of the superior temporal sulcus. Image memorability magnitude results are represented after high-level feature processing in visual regions and reflected in classical memory regions in the medial temporal lobe (MTL). Our results present, to our knowledge, the first unified spatiotemporal account of visual memorability effect across the human cortex, further supporting the levels-of-processing theory of perception and memory.
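A minimal sketch of the MEG-fMRI fusion idea via representational similarity analysis: build a time-resolved MEG representational dissimilarity matrix (RDM) and correlate it with a region's fMRI RDM. Shapes and the correlation-distance metric are assumptions.

```python
# Sketch: RSA fusion of MEG (when) and fMRI (where).
# meg: (n_images, n_sensors, n_times) evoked patterns; fmri_rdm: (n_images,
# n_images) RDM for one ROI (assumed inputs).
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def meg_fmri_fusion(meg, fmri_rdm):
    iu = np.triu_indices(meg.shape[0], k=1)
    fusion = []
    for t in range(meg.shape[2]):
        meg_rdm = squareform(pdist(meg[:, :, t], metric="correlation"))
        rho, _ = spearmanr(meg_rdm[iu], fmri_rdm[iu])
        fusion.append(rho)
    return np.array(fusion)   # ROI's representational match over time
```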
... The development of CNNs stands out among deep learning's significant advancements. Their impact has been felt in several areas, including generative AI, medical image analysis, object identification [9], and anomaly detection [10]. CNNs are feedforward neural networks that integrate convolution operations into their architecture [7] [11]. ...
Article
Full-text available
In today’s digital age, Convolutional Neural Networks (CNNs), a subset of Deep Learning (DL), are widely used for various computer vision tasks such as image classification, object detection, and image segmentation. There are numerous types of CNNs designed to meet specific needs and requirements, including 1D, 2D, and 3D CNNs, as well as dilated, grouped, attention, depthwise convolutions, and NAS, among others. Each type of CNN has its unique structure and characteristics, making it suitable for specific tasks. It’s crucial to gain a thorough understanding and perform a comparative analysis of these different CNN types to understand their strengths and weaknesses. Furthermore, studying the performance, limitations, and practical applications of each type of CNN can aid in the development of new and improved architectures in the future. We also dive into the platforms and frameworks that researchers utilize for their research or development from various perspectives. Additionally, we explore the main research fields of CNN like 6D vision, generative models, and meta-learning. This survey paper provides a comprehensive examination and comparison of various CNN architectures, highlighting their architectural differences and emphasizing their respective advantages, disadvantages, applications, challenges, and future trends.
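For readers unfamiliar with the convolution variants the survey covers, a short PyTorch sketch of several of them:

```python
# Sketch: CNN convolution variants mentioned above (PyTorch).
import torch.nn as nn

conv1d    = nn.Conv1d(16, 32, kernel_size=3)                  # sequences
conv2d    = nn.Conv2d(3, 64, kernel_size=3, padding=1)        # images
conv3d    = nn.Conv3d(1, 8, kernel_size=3)                    # volumes / video
dilated   = nn.Conv2d(64, 64, kernel_size=3, dilation=2, padding=2)  # larger RF
grouped   = nn.Conv2d(64, 64, kernel_size=3, groups=4, padding=1)    # channel groups
depthwise = nn.Conv2d(64, 64, kernel_size=3, groups=64, padding=1)   # one filter/channel
pointwise = nn.Conv2d(64, 128, kernel_size=1)                 # 1x1 channel mixing
# depthwise + pointwise together form a depthwise-separable convolution
```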
... Humans and monkeys are adept at recognizing objects in everyday scenes. The neural substrate for object recognition is a series of computations in the ventral stream of visual cortex (Ungerleider and Mishkin, 1982; Mishkin et al., 1983; Goodale and Milner, 1992; Logothetis and Sheinberg, 1996; DiCarlo et al., 2012; Kaas et al., 2022). This consists of a series of hierarchically connected visual areas: beginning in area V1, continuing through areas V2, then V4, before culminating in inferotemporal cortex (IT). ...
Preprint
Full-text available
Humans and monkeys can effortlessly recognize objects in everyday scenes. This ability relies on neural computations in the ventral stream of visual cortex. The intermediate computations that lead to object selectivity are not well understood, but previous studies implicate V4 as an early site of selectivity for object shape. To explore the mechanisms of this selectivity, we generated a continuum of images between “scrambled” textures and photographic images of both natural and manmade environments, using techniques that preserve the local statistics of the original image while discarding information about scene and shape. We measured the responses of single units in awake macaque V4 to these images. On average, V4 neurons were slightly more active in response to photographic images than to their scrambled counterparts. However, responses in V4 varied widely both across different cells and different sets of images. An important determinant of this variation was the effectiveness of image families at driving strong neural responses. Across the full V4 population, a cell’s average evoked firing rate for a family reliably predicted that family’s preference for photographic over scrambled images. Accordingly, the cells that respond most strongly to each image family showed a much stronger difference between photographic and scrambled images and a graded level of modulation for images scrambled at intermediate levels. This preference for photographic images was not evident until ∼50 ms after the onset of neuronal activity and did not peak in strength until 140 ms after activity onset. Finally, V4 neural responses seemed to categorically separate photographic images from all of their scrambled counterparts, despite the fact that the least scrambled images in our set appear similar to the originals. When these same images were analyzed with DISTS (Deep Image Structure and Texture Similarity), an image-computable similarity metric that predicts human judgements of image degradation, this same pattern emerged. This suggests that V4 responses are highly sensitive to small deviations from photographic image structure.
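The photographic-versus-scrambled preference reported here can be summarized with a standard modulation index; a small sketch (array shapes assumed, not the paper's exact analysis):

```python
# Sketch: per-cell preference for photographic over scrambled images.
# rates_photo / rates_scram: (n_cells, n_families) mean evoked firing rates
# (assumed arrays); index > 0 means preference for photographs.
import numpy as np

def modulation_index(rates_photo, rates_scram):
    return (rates_photo - rates_scram) / (rates_photo + rates_scram + 1e-9)

# The reported relationship: correlate each family's mean evoked rate with its
# mean modulation index across cells (strongly driven families -> larger index).
```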
... Health care is another industry where deep learning has found several applications, including diagnosis, treatment planning, drug discovery [9], and medical imaging analysis [10][11][12]. In robotics, deep learning is used for autonomous navigation, object recognition [13][14][15], and robotic control; it is also applied to handwriting recognition for various languages [16][17][18][19][20], question answering [21][22][23][24][25], intrusion detection in IoT [26][27][28], and energy consumption prediction [29,30]. ...
Article
Full-text available
In the current business environment, where the customer is the primary focus, effective communication between marketing and senior management is vital for success. Effective customer profiling is a cornerstone of strategic decision-making for digital start-ups seeking sustainable growth and customer satisfaction. This research investigates the clustering of customers based on recency, frequency, and monetary (RFM) analysis and employs validation metrics to derive optimal clusters. The K-means clustering algorithm, coupled with the Elbow method, Silhouette coefficient, and Gap Statistics method, facilitates the identification of distinct customer segments. The study unveils three primary clusters with unique characteristics: new customers (Cluster A), best customers (Cluster B), and intermittent customers (Cluster C). For platform-based Edutech start-ups, Cluster A underscores the importance of tailored learning content and support, Cluster B emphasizes personalized incentives, and Cluster C suggests re-engagement strategies. By understanding and addressing the diverse needs of these clusters, digital start-ups can forge enduring connections, optimize customer engagement, and fuel sustainable business growth.
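A minimal sketch of the RFM-plus-K-means pipeline described above, including one of the cluster-count diagnostics (silhouette); column names are assumptions.

```python
# Sketch: RFM-based K-means clustering of customers.
# df: transaction table with customer_id, order_date, amount columns (assumed).
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def rfm_clusters(df, k=3):
    now = df["order_date"].max()
    rfm = df.groupby("customer_id").agg(
        recency=("order_date", lambda d: (now - d.max()).days),
        frequency=("order_date", "count"),
        monetary=("amount", "sum"))
    X = StandardScaler().fit_transform(rfm)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print("silhouette:", silhouette_score(X, km.labels_))  # cluster quality
    return rfm.assign(cluster=km.labels_)
```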
... Such contrastive perception is rather general in human perception, and has been well noted in modalities other than auditory perception, such as visual perception (Arend and Reeves 1986;Bäuml 1999;Grill-Spector 2003;Lawson 1999;Logothetis and Sheinberg 1996;Maloney and Wandell 1986;Wallach 1948). For instance, the size of a visual object is perceived depending on the size of other objects in the environment, such that the same visual object presented next to a large object might appear to be smaller than when presented next to a small object. ...
... Humans show high proficiency in invariant object recognition, the ability to recognize the same objects from different viewpoints or in different scenes. This ability is supported by the ventral visual stream, the so-called what stream (Logothetis and Sheinberg, 1996). A question that is repeatedly addressed in vision studies is whether and how we can model this stream by means of animal models or computational models to further examine and quantify the representations along the ventral visual stream. ...
Article
Full-text available
Many species are able to recognize objects, but it has proven difficult to pinpoint and compare how different species solve this task. Recent research suggested combining computational and animal modelling in order to obtain a more systematic understanding of task complexity and compare strategies between species. In this study, we created a large multidimensional stimulus set and designed a visual discrimination task partially based upon modelling with a convolutional deep neural network (CNN). Experiments included rats (N = 11; 1115 daily sessions in total for all rats together) and humans (N = 45). Each species was able to master the task and generalize to a variety of new images. Nevertheless, rats and humans showed very little convergence in terms of which object pairs were associated with high and low performance, suggesting the use of different strategies. There was an interaction between species and whether stimulus pairs favoured early or late processing in a CNN. A direct comparison with CNN representations and visual feature analyses revealed that rat performance was best captured by late convolutional layers and partially by visual features such as brightness and pixel-level similarity, while human performance related more to the higher-up fully connected layers. These findings highlight the additional value of using a computational approach for the design of object recognition tasks. Overall, this computationally informed investigation of object recognition behaviour reveals a strong discrepancy in strategies between rodent and human vision.
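A minimal sketch of the layer-by-layer comparison described here: correlate per-pair behavioral performance with pairwise distances in each CNN layer's activation space. Array shapes and the correlation-distance choice are assumptions.

```python
# Sketch: which CNN layer best captures a species' per-pair performance?
# acts[layer]: (n_stimuli, n_features) activations; perf: (n_pairs,) accuracy
# per stimulus pair, ordered like the condensed distance vector (assumed).
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def layer_behavior_correlation(acts, perf):
    out = {}
    for layer, a in acts.items():
        rho, _ = spearmanr(pdist(a, metric="correlation"), perf)
        out[layer] = rho
    return out

# Per the study: rats peak at late convolutional layers,
# humans at the higher-up fully connected layers.
```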
... Object detection and recognition are fundamental components of primate vision, and a substantial number of visual cortical areas are dedicated to processing visual objects [1][2][3][4][5] . However, vision does not occur in complete isolation of behavior, and an element of visual object processing in the brain must facilitate active orienting in association with objects, whether to avoid threats 6 or to foveate and further process behaviorally-relevant items. ...
Article
Full-text available
Primate superior colliculus (SC) neurons exhibit visual feature tuning properties and are implicated in a subcortical network hypothesized to mediate fast threat and/or conspecific detection. However, the mechanisms through which SC neurons contribute to peripheral object detection, for supporting rapid orienting responses, remain unclear. Here we explored whether, and how quickly, SC neurons detect real-life object stimuli. We presented experimentally-controlled gray-scale images of seven different object categories, and their corresponding luminance- and spectral-matched image controls, within the extrafoveal response fields of SC neurons. We found that all of our functionally-identified SC neuron types preferentially detected real-life objects even in their very first stimulus-evoked visual bursts. Intriguingly, even visually-responsive motor-related neurons exhibited such robust early object detection. We further identified spatial frequency information in visual images as an important, but not exhaustive, source for the earliest (within 100 ms) but not for the late (after 100 ms) component of object detection by SC neurons. Our results demonstrate rapid and robust detection of extrafoveal visual objects by the SC. Besides supporting recent evidence that even SC saccade-related motor bursts can preferentially represent visual objects, these results reveal a plausible mechanism through which rapid orienting responses to extrafoveal visual objects can be mediated.
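One common way to build the kind of luminance- and spectrum-matched controls mentioned here is Fourier phase scrambling, which preserves an image's amplitude spectrum while destroying its structure; a sketch (assumed to approximate, not reproduce, the paper's procedure):

```python
# Sketch: spectrum-matched control image via phase scrambling.
import numpy as np

def phase_scramble(img, rng=np.random.default_rng(0)):
    """Return a control with the same amplitude spectrum as grayscale `img`."""
    amplitude = np.abs(np.fft.fft2(img))
    # Phase of a real white-noise image is conjugate-symmetric, so the
    # inverse transform is real up to round-off (np.real drops the residue).
    random_phase = np.angle(np.fft.fft2(rng.standard_normal(img.shape)))
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * random_phase)))
```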
... Conversely, when asked to distinguish among multiple objects, they resort to object identification. Despite the similarities between object recognition and identification, object identification is regarded as a distinct process, with different brain regions engaged in processing the information [9]. ...
Article
Full-text available
Object recognition and object identification are multifaceted cognitive operations that require various brain regions to synthesize and process information. Prior research has evidenced the activity of both visual and temporal cortices during these tasks. Notwithstanding their similarities, object recognition and identification are recognized as separate brain functions. Drawing from the two-stream hypothesis, our investigation aims to understand whether the channels within the ventral and dorsal streams contain pertinent information for effective model learning regarding object recognition and identification tasks. By utilizing the data we collected during the object recognition and identification experiment, we scrutinized EEGNet models, trained using channels that replicate the two-stream hypothesis pathways, against a model trained using all available channels. The outcomes reveal that the model trained solely using the temporal region delivered a high accuracy level in classifying four distinct object categories. Specifically, the object recognition and object identification models achieved an accuracy of 89% and 85%, respectively. By incorporating the channels that mimic the ventral stream, the models' accuracy was further improved, with the object recognition model and object identification model achieving an accuracy of 95% and 94%, respectively. Furthermore, the Grad-CAM results of the trained models revealed a significant contribution from the ventral and dorsal stream channels toward the training of the EEGNet model. The aim of our study is to pinpoint the optimal channel configuration that provides a swift and accurate brain-computer interface system for object recognition and identification.
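A minimal sketch of the channel-subset idea: slice the EEG array down to electrodes over a putative stream before training a model such as EEGNet. The channel lists below are illustrative assumptions, not the study's montage.

```python
# Sketch: training on channel subsets that mimic the two visual streams.
# eeg: (n_trials, n_channels, n_samples); ch_names: list of channel labels.
VENTRAL = ["P7", "P8", "PO7", "PO8", "O1", "O2", "T7", "T8"]   # hypothetical
DORSAL  = ["P3", "P4", "Pz", "CP1", "CP2", "C3", "C4"]         # hypothetical

def subset(eeg, ch_names, wanted):
    idx = [ch_names.index(c) for c in wanted if c in ch_names]
    return eeg[:, idx, :]   # feed this into EEGNet instead of all channels
```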
... However, not all descriptors are suitable for feature matching. Studies have shown that uniform regions and elongated 1D patterns, which are common in PCAM images, lack the distinctiveness required for accurate matching (Logothetis and Sheinberg, 1996). Therefore, an enhanced AP loss is applied to spare the network from wasting effort on undistinctive regions (Revaud et al., 2019): ...
Article
Topographic reconstruction of the lunar surface or other planets is important to engineering applications and scientific research in a planetary exploration mission. The typical methods of terrain reconstruction are usually based on photogrammetry techniques. Structure-from-Motion (SfM) is one of the most effective and commonly used photogrammetric technologies that estimate three-dimensional structures from two-dimensional image sequences. To find correspondences and align photos, SfM approaches require invariant features, such as Scale Invariant Feature Transform (SIFT). The hand-crafted features, however, seriously degrade performance due to weak texture and low light conditions causing image matching failure, which will directly affect the accuracy and robustness of reconstruction. Robust local descriptors based on deep learning outperform hand-crafted descriptors since convolutional neural networks are more robust than hand-engineered representations. Therefore, a novel and robust deep learning-based local feature extraction method is proposed, comprising two branch networks integrated with attention mechanisms for generating reliable keypoints and descriptors, respectively. Furthermore, a 3D terrain surface reconstruction workflow is constructed by combining it with the modern advanced image matching method and SfM system. The effectiveness of the proposed method and the workflow were verified in experiments using Panoramic Camera (PCAM) images acquired from three waypoints explored by the Yutu-2 lunar rover during the Chang'e-4 mission. We also illustrate how our approach supports other applications, such as creating panoramic mosaics of surface imagery. This provides a new and powerful method for planetary terrain reconstruction at a high spatial resolution that can meet the requirements for rover navigation and positioning, as well as geological analysis of the Moon and other planets. The source codes developed in this study are openly available at https://github.com/Atypical-Programmer/Deep_Reconstruction_Workflow.
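Between descriptor extraction and SfM, matches are typically established by mutual nearest-neighbour search; a small sketch (descriptors assumed L2-normalized; not necessarily the paper's exact matcher):

```python
# Sketch: mutual nearest-neighbour matching of local descriptors.
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    sim = desc_a @ desc_b.T                  # cosine similarity matrix
    ab = sim.argmax(axis=1)                  # best b for each a
    ba = sim.argmax(axis=0)                  # best a for each b
    keep = ba[ab] == np.arange(len(desc_a))  # keep mutual agreements only
    return np.stack([np.nonzero(keep)[0], ab[keep]], axis=1)  # (a_idx, b_idx)
```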
... Electrophysiological measurements in animal models have been most successful at resolving how pre-decision activity contributes to the subsequent behavioral response (Hunt et al., 2015; Kim & Shadlen, 1999; Shadlen & Newsome, 2001; Thompson et al., 1996). Using non-invasive functional magnetic resonance imaging (fMRI) methods in humans, however, poses a significant challenge because of the discrepancy between the slow temporal resolution of blood oxygenation-level dependent (BOLD) measurements and the rapid dynamics of the types of decisions commonly used in human cognitive studies, such as object recognition/decision (Logothetis & Sheinberg, 1996; Potter, 1976; Thorpe et al., 1996). ...
Article
Full-text available
Introduction Many theories contend that evidence accumulation is a critical component of decision‐making. Cognitive accumulation models typically interpret two main parameters: a drift rate and decision threshold. The former is the rate of accumulation, based on the quality of evidence, and the latter is the amount of evidence required for a decision. Some studies have found neural signals that mimic evidence accumulators and can be described by the two parameters. However, few studies have related these neural parameters to experimental manipulations of sensory data or memory representations. Here, we investigated the influence of affective salience on neural accumulation parameters. High affective salience has been repeatedly shown to influence decision‐making, yet its effect on neural evidence accumulation has been unexamined. Methods The current study used a two‐choice object categorization task of body images (feet or hands). Half the images in each category were high in affective salience because they contained highly aversive features (gore and mutilation). To study such quick categorization decisions with a relatively slow technique like functional magnetic resonance imaging, we used a gradual reveal paradigm to lengthen cognitive processing time through the gradual “unmasking” of stimuli. Results Because the aversive features were task‐irrelevant, high affective salience produced a distractor effect, slowing decision time. In visual accumulation regions of interest, high affective salience produced a longer time to peak activation. Unexpectedly, the later peak appeared to be the product of changes to both drift rate and decision threshold. The drift rate for high affective salience was shallower, and the decision threshold was greater. To our knowledge, this is the first demonstration of an experimental manipulation of sensory data or memory representations that changed the neural decision threshold. Conclusion These findings advance our knowledge of the neural mechanisms underlying affective responses in general and the influence of high affective salience on object representations and categorization decisions.
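The two accumulation parameters discussed here are easy to illustrate with a toy drift-diffusion simulation: a shallower drift rate and a higher threshold both lengthen decision time. Parameter values below are purely illustrative.

```python
# Sketch: single-trial drift-diffusion simulation.
import numpy as np

def simulate_ddm(drift, threshold, noise=1.0, dt=0.001,
                 rng=np.random.default_rng(0)):
    x, t = 0.0, 0.0
    while abs(x) < threshold:                 # accumulate until a bound is hit
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return t, x > 0                           # decision time, choice

# low salience:  simulate_ddm(drift=1.0, threshold=1.0)
# high salience: simulate_ddm(drift=0.6, threshold=1.3)  -> slower decisions
```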
... Healthcare is another industry where DL has found several applications, including diagnosis, treatment planning, drug discovery (Fakoor et al., 2013), and medical imaging analysis (Nie et al., 2015; Abdallah et al., 2020a; Yu et al., 2014). In robotics, DL is used for autonomous navigation, object recognition (Mahmoud and Kang, 2023; Logothetis and Sheinberg, 1996; Nurseitov et al., 2022), and robotic control; it is also applied to handwriting recognition for various languages (Mahmoud et al., 2014; Nurseitov et al., 2021; Toiganbayeva et al., 2022; Abdallah et al., 2020b; Nurseitov et al., 2020), question answering (Karpukhin et al., 2020; Chen and Yih, 2020; Abdallah and Jatowt, 2023; Abdallah et al., 2023b), intrusion detection in IoT (Xu et al., 2021; Akkad et al., 2023), and energy consumption prediction (Waschneck et al., 2018; Hamada et al., 2021; Kasem et al., 2023). ...
Article
Software-Defined Network (SDN) is an established networking paradigm that separates the control plane from the data plane. It provides centralized network control and programmability, so SDN can improve network flexibility, management, performance, and scalability. The programmability and control centralization of SDN have improved network functions but also exposed it to security challenges such as Distributed Denial of Service (DDoS) attacks that target both control and data planes. This paper proposes an effective detection technique against DDoS attacks in the SDN control plane and data plane. For the control plane, the technique detects DDoS attacks through a Deep Learning (DL) model using new features extracted from traffic statistics. The DL method (AE-BGRU) for DDoS detection uses an Autoencoder (AE) with a Bidirectional Gated Recurrent Unit (BGRU). The proposed features for the control plane include unknown IP destination address, packets inter-arrival time, Transport layer protocol (TLP) header, and Type of service (ToS) header. For the data plane, the technique tracks the switch's average arrival bit rate for packets with unknown destination addresses. Then, the technique detects DDoS attacks through a DL-based model that also uses AE with BGRU. The proposed features in the data plane include the switch's stored capacity, the average rate of packets with unknown destination addresses, the IP Options header, and the average number of flows. The dataset is generated from feature extraction and computations from normal and attack packets and used with the classifier. Also, additional Machine Learning (ML) methods are used to enhance the detection process. If the model detects an attack, the technique mitigates DDoS effects by updating the user's trust value and blocking suspicious senders based on the trust value. The experimental results show that, compared with related techniques, the suggested method achieves higher accuracy and a lower false alarm rate.
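A compact sketch of an autoencoder coupled to a bidirectional GRU classifier, in the spirit of the AE-BGRU detector described above; layer sizes, the window length, and the exact AE/BGRU coupling are assumptions.

```python
# Sketch: autoencoder + bidirectional GRU for flow-window classification.
import tensorflow as tf
from tensorflow.keras import layers, Model

n_steps, n_feats = 10, 8   # traffic-statistics window (assumed dimensions)
inp = layers.Input(shape=(n_steps, n_feats))
enc = layers.TimeDistributed(layers.Dense(4, activation="relu"))(inp)  # AE encoder
dec = layers.TimeDistributed(layers.Dense(n_feats))(enc)               # AE decoder
h = layers.Bidirectional(layers.GRU(16))(enc)                          # BGRU
out = layers.Dense(1, activation="sigmoid")(h)    # attack vs. normal

model = Model(inp, [dec, out])
model.compile(optimizer="adam",
              loss=["mse", "binary_crossentropy"],  # reconstruction + detection
              loss_weights=[0.5, 1.0])
```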
... Studies of visual associative long-term memory have indicated the importance of the inferior temporal cortex (ITC) and the prefrontal cortex (PFC) in memory retrieval by visual cues and WM maintenance processes [1][2][3]. The ITC, which receives visual information via the ventral visual stream in the primate cerebral cortex, is important for object recognition [4][5][6] and contains many neurons that respond selectively to visual features and objects [4,7], as well as neurons involved in visual associative memory retrieval and its WM maintenance [8,9]. The PFC, which has reciprocal anatomical connections with the ITC [10,11], also contains neurons that are selective for attributes of visual stimuli [12][13][14][15] and is thought to receive bottom-up visual information from the ITC. ...
Article
Full-text available
Interaction between the inferotemporal (ITC) and prefrontal (PFC) cortices is critical for retrieving information from memory and maintaining it in working memory. Neural oscillations provide a mechanism for communication between brain regions. However, it remains unknown how information flow via neural oscillations is functionally organized in these cortices during these processes. In this study, we apply Granger causality analysis to electrocorticographic signals from both cortices of monkeys performing visual association tasks to map information flow. Our results reveal regions within the ITC where information flow to and from the PFC increases via specific frequency oscillations to form clusters during memory retrieval and maintenance. Theta-band information flow in both directions increases in similar regions in both cortices, suggesting reciprocal information exchange in those regions. These findings suggest that specific subregions function as nodes in the memory information-processing network between the ITC and the PFC.
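A minimal sketch of the pairwise Granger-causality computation on two band-limited signals, using statsmodels (its two-column convention tests whether the second column Granger-causes the first; the signal names are hypothetical):

```python
# Sketch: directed influence between an ITC site and a PFC site.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

def granger_p(target, source, maxlag=10):
    data = np.column_stack([target, source])   # col 2 -> col 1 is tested
    res = grangercausalitytests(data, maxlag=maxlag, verbose=False)
    return res[maxlag][0]["ssr_ftest"][1]      # F-test p-value at largest lag

# itc_to_pfc = granger_p(pfc_theta, itc_theta)   # hypothetical theta-band signals
# pfc_to_itc = granger_p(itc_theta, pfc_theta)
```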
... Humans interact with a multitude of different objects each day, recognizing and generalizing complex patterns of sensory information to identify and categorize them into concepts based on their perceived meaning (Logothetis and Sheinberg 1996). Vision and language are key to these tasks, since they allow us to identify objects from concrete or abstract representations, as well as to relay information about them to other human beings (Bonner and Epstein 2021). ...
Preprint
Full-text available
Spatio-temporal patterns of evoked brain activity contain information that can be used to decode and categorize the semantic content of visual stimuli. This procedure can be biased by statistical regularities which can be independent from the concepts that are represented in the stimuli, prompting the need to dissociate between the contributions of image statistics and semantics to decoding accuracy. We trained machine learning models to distinguish between concepts included in the THINGS-EEG dataset using electroencephalography (EEG) data acquired during a rapid serial visual presentation protocol. After systematic univariate feature selection in the temporal and spatial domains, we constructed simple models based on local signals which surpassed the accuracy of more complex classifiers based on distributed patterns of information. Simpler models were characterized by their sensitivity to biases in the statistics of visual stimuli, with some of them preserving their accuracy after random replacement of the training dataset while maintaining the overall statistics of the images. We conclude that model complexity impacts the sensitivity to confounding factors regardless of performance; therefore, the choice of EEG features for semantic decoding should ideally be informed by the underlying neurobiological mechanisms.
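A minimal sketch of the "simple model on locally selected features" approach described here, using univariate feature selection followed by a linear classifier (the feature layout is an assumption):

```python
# Sketch: simple decoder from locally selected EEG features.
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# X: (n_epochs, n_channels * n_times) flattened features; y: concept labels.
simple_model = make_pipeline(
    SelectKBest(f_classif, k=20),        # keep a few local time/channel signals
    LogisticRegression(max_iter=1000))
# scores = cross_val_score(simple_model, X, y, cv=5)
```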
... when a towel is dropped in a pile, or when a child traces patterns in sand. These shape changes are a complex challenge to our visual and cognitive systems, which have to solve two complementary and linked inferences: recognizing objects across transformations (Biederman, 1987; DiCarlo et al., 2012; Logothetis & Sheinberg, 1996; Pasupathy et al., 2018; Riesenhuber & Poggio, 2000), and recognizing transformations across objects (Arnheim, 1974; Chen et al., 2021; Leyton, 1989; Ons & Wagemans, 2012; Pinna, 2010; Pinna & Deiana, 2015; Schmidt & Fleming, 2018). ...
Article
Full-text available
Many objects and materials in our environment are subject to transformations that alter their shape. For example, branches bend in the wind, ice melts, and paper crumples. Still, we recognize objects and materials across these changes, suggesting we can distinguish an object's original features from those caused by the transformations ("shape scission"). Yet, if we truly understand transformations, we should not only be able to identify their signatures but also actively apply the transformations to new objects (i.e., through imagination or mental simulation). Here, we investigated this ability using a drawing task. On a tablet computer, participants viewed a sample contour and its transformed version, and were asked to apply the same transformation to a test contour by drawing what the transformed test shape should look like. Thus, they had to (i) infer the transformation from the shape differences, (ii) envisage its application to the test shape, and (iii) draw the result. Our findings show that drawings were more similar to the ground truth transformed test shape than to the original test shape-demonstrating the inference and reproduction of transformations from observation. However, this was only observed for relatively simple shapes. The ability was also modulated by transformation type and magnitude but not by the similarity between sample and test shapes. Together, our findings suggest that we can distinguish between representations of original object shapes and their transformations, and can use visual imagery to mentally apply nonrigid transformations to observed objects, showing how we not only perceive but also 'understand' shape.
... Object perception has long been an area of interest in the visual neurosciences [1][2][3][4][5][6][7][8][9][10]. When viewing a scene, our attention is typically captured by objects that are salient (which may be driven by low-level visual features such as contrast [11][12][13]) and guided by our expectations (based on prior knowledge of the objects themselves [13,14]). ...
Article
Full-text available
An enduring question in cognitive science is how perceptually novel objects are processed. Addressing this issue has been limited by the absence of a standardised set of object-like stimuli that appear realistic, but cannot possibly have been previously encountered. To this end, we created a dataset, at the core of which are images of 400 perceptually novel objects. These stimuli were created using Generative Adversarial Networks that integrated features of everyday stimuli to produce a set of synthetic objects that appear entirely plausible, yet do not in fact exist. We curated an accompanying dataset of 400 familiar stimuli, which were matched in terms of size, contrast, luminance, and colourfulness. For each object, we quantified their key visual properties (edge density, entropy, symmetry, complexity, and spectral signatures). We also confirmed that adult observers (N = 390) perceive the novel objects to be less familiar, yet similarly engaging, relative to the familiar objects. This dataset serves as an open resource to facilitate future studies on visual perception.
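Two of the quantified image properties, edge density and entropy, can be computed along these lines with scikit-image (a sketch, not necessarily the dataset's exact definitions):

```python
# Sketch: quantify edge density and Shannon entropy of an RGB image.
from skimage import io, color, feature, measure

def edge_density_and_entropy(path):
    img = color.rgb2gray(io.imread(path))      # RGB input assumed
    edges = feature.canny(img)                 # binary edge map
    density = edges.mean()                     # fraction of edge pixels
    entropy = measure.shannon_entropy(img)     # bits per pixel
    return density, entropy
```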
... In the psychological literature, it is commonly accepted that similarity might exist between pairs of stimuli presented within the same sensory modality (e.g., Blank and Mattes 1990; Ekman 1954; Ekman et al. 1964; Shepard 1962, 1974; Tversky 1977). The majority of the studies in the literature on sensory similarity reference vision (e.g., Logothetis and Sheinberg 1996; though see Spence 2022a, for an isolated exception); this bias is not unexpected, given the well-known primacy of vision in Western culture (Classen 1997; Hutmacher 2019; Jenks 2002; see also Levin 1993). At the same time, however, talking about sensory similarity between pairs of stimuli presented in different sensory modalities would appear to be a much more controversial topic (e.g., Helmholtz 1878/1971; Marks 1978). ...
Article
Full-text available
Perceptual similarity is one of the most fiercely debated topics in the philosophy and psychology of perception. The documented history of the issue spans all the way from Plato – who regarded similarity as a key factor for human perceptual experience and cognition – through to contemporary psychologists – who have tried to determine whether, and if so, how similarity relationships can be established between stimuli both within and across the senses. Recent research on cross-sensory associations, otherwise known as crossmodal correspondences – that is, the existence of observable consensual associations, or mappings, between stimuli across different senses – represents an especially interesting field in which to study perceptual similarity. In fact, most accounts of crossmodal association that have been put forward in the literature to date evoke perceptual similarity as a key explanatory factor mediating the underlying association. At the same time, however, these various accounts raise several important theoretical questions concerning the very nature of similarity, with, for example, the sensory, affective, or cognitive underpinnings of similarity judgements remaining unclear. We attempt to shed light on these questions by examining the various accounts of crossmodal associations that have been put forward in the literature. Our suggestion is that perceptual similarity varies from being phenomenologically-based to conceptually-based. In particular, we propose that the nature of the associations underlying similarity judgements – whether these associations are phenomenologically-, structurally-, emotionally-, or conceptually-based – may be represented in a two-dimensional space with associative strength on one axis, and cognitive penetrability on the other.
... Research in cognitive science has proposed the concept of 'object constancy' in the brain's object recognition behavior [19,28]. Object constancy refers to the human ability to recognize an object as having the same structure despite changes in its retinal projection. ...
Article
Full-text available
Hand pose estimation is a challenging task in hand-object interaction scenarios due to the uncertainty caused by object occlusions. Inspired by human reasoning from a hand-object interaction video sequence, we propose a hand pose estimation model. It uses three cascaded modules to imitate the human estimation and observation process. The first module predicts an initial pose based on the visible information and prior hand knowledge. The second module updates the hand shape memory based on the new information coming from the subsequent frames. Bone-length updating is triggered by the reliability of the predicted joints. The third module refines the coarse pose according to the hand-object contact state represented by the object's Signed Distance Function field. Our model achieves a mean joint estimation error of 21.3 mm, a Procrustes error of 9.9 mm, and a Trans & Scale error of 22.3 mm on HO3Dv2, and a root-relative error of 12.3 mm on DexYCB, which are superior to other state-of-the-art models.
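A rough sketch of the SDF-based refinement idea in the third module: query a precomputed signed-distance field at each estimated joint and push penetrating joints back to the surface along the numerical SDF gradient (the inputs and the refinement rule are assumptions, not the paper's method):

```python
# Sketch: SDF-based contact refinement of estimated joint positions.
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def refine_joints(joints, sdf, axes, eps=1e-3):
    # sdf: (nx, ny, nz) signed-distance grid; axes: (x, y, z) coordinate arrays
    f = RegularGridInterpolator(axes, sdf, bounds_error=False, fill_value=1.0)
    refined = joints.copy()
    for i, p in enumerate(joints):
        d = f(p)[0]
        if d < 0:                              # joint penetrates the object
            grad = np.array([(f(p + e)[0] - f(p - e)[0]) / (2 * eps)
                             for e in np.eye(3) * eps])  # numerical gradient
            refined[i] = p - d * grad / (np.linalg.norm(grad) + 1e-9)
    return refined
```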
... Conversely, for any given two-dimensional image on the retina-the proximal stimulus-there are an infinite number of potentially very different three-dimensional scenes-distal stimuli-whose projections would have resulted in the very same image (e.g., see DiCarlo & Cox, 2007; Pinto, Cox, & DiCarlo, 2008).¹ The human ability to recognize objects rapidly and effortlessly across a wide range of identity-preserving transformations has been termed core object recognition (see DiCarlo, Zoccolan, & Rust, 2012, for a review). The computational difficulty notwithstanding, human object recognition ability is not only subjectively effortless, but objectively often tremendously complex (e.g., Biederman, 1987, or see Logothetis & Sheinberg, 1996; Peissig & Tarr, 2007; Gauthier & Tarr, 2015, for reviews). ...
Article
Full-text available
In laboratory object recognition tasks based on undistorted photographs, both adult humans and deep neural networks (DNNs) perform close to ceiling. Unlike adults', whose object recognition performance is robust against a wide range of image distortions, DNNs trained on standard ImageNet (1.3M images) perform poorly on distorted images. However, the last 2 years have seen impressive gains in DNN distortion robustness, predominantly achieved through ever-increasing large-scale datasets-orders of magnitude larger than ImageNet. Although this simple brute-force approach is very effective in achieving human-level robustness in DNNs, it raises the question of whether human robustness, too, is simply due to extensive experience with (distorted) visual input during childhood and beyond. Here we investigate this question by comparing the core object recognition performance of 146 children (aged 4-15 years) against adults and against DNNs. We find, first, that already 4- to 6-year-olds show remarkable robustness to image distortions and outperform DNNs trained on ImageNet. Second, we estimated the number of images children had been exposed to during their lifetime. Compared with various DNNs, children's high robustness requires relatively little data. Third, when recognizing objects, children-like adults but unlike DNNs-rely heavily on shape but not on texture cues. Together our results suggest that the remarkable robustness to distortions emerges early in the developmental trajectory of human object recognition and is unlikely the result of a mere accumulation of experience with distorted visual input. Even though current DNNs match human performance regarding robustness, they seem to rely on different and more data-hungry strategies to do so.
... The reverse hierarchy theory (RHT) [41,42] posits that what is learned is dependent on the flow of information through feedback and feedforward cortical pathways. Like PCT, RHT considers the hierarchical organization of sensory-perceptual systems of the cerebral cortex to comprise bottom-up representations of increasing generality and non-invariance at higher cortical levels [43][44][45][46]. However, RHT offers an alternative mechanism for perceptual learning across both auditory and visual speech. ...
Article
Full-text available
Traditionally, speech perception training paradigms have not adequately taken into account the possibility that there may be modality-specific requirements for perceptual learning with auditory-only (AO) versus visual-only (VO) speech stimuli. The study reported here investigated the hypothesis that there are modality-specific differences in how prior information is used by normal-hearing participants during vocoded versus VO speech training. Two different experiments, one with vocoded AO speech (Experiment 1) and one with VO, lipread, speech (Experiment 2), investigated the effects of giving different types of prior information to trainees on each trial during training. The training was for four ~20 min sessions, during which participants learned to label novel visual images using novel spoken words. Participants were assigned to different types of prior information during training: Word Group trainees saw a printed version of each training word (e.g., “tethon”), and Consonant Group trainees saw only its consonants (e.g., “t_th_n”). Additional groups received no prior information (i.e., Experiment 1, AO Group; Experiment 2, VO Group) or a spoken version of the stimulus in a different modality from the training stimuli (Experiment 1, Lipread Group; Experiment 2, Vocoder Group). That is, in each experiment, there was a group that received prior information in the modality of the training stimuli from the other experiment. In both experiments, the Word Groups had difficulty retaining the novel words they attempted to learn during training. However, when the training stimuli were vocoded, the Word Group improved their phoneme identification. When the training stimuli were visual speech, the Consonant Group improved their phoneme identification and their open-set sentence lipreading. The results are considered in light of theoretical accounts of perceptual learning in relationship to perceptual modality.
... Humans show high proficiency in invariant object recognition, the ability to recognize the same objects from different viewpoints or in different scenes. This ability is supported by the ventral visual stream, the so-called what stream (Logothetis and Sheinberg, 1996). A question that is repeatedly addressed in vision studies is whether and how we can model this stream by means of animal models or computational models to further examine and quantify the representations along the ventral visual stream. ...
Preprint
Full-text available
Many species are able to recognize objects, but it has proven difficult to pinpoint and compare how different species solve this task. Recent research has suggested combining computational and animal modelling to obtain a more systematic understanding of task complexity and to compare strategies between species. In the present study, we created a large multidimensional stimulus set and designed a visual categorization task partially based upon modelling with a convolutional deep neural network (cDNN). Experiments included rats (N = 11; 1,115 daily sessions in total across all rats) and humans (N = 50). Each species was able to master the task and generalize to a variety of new images. Nevertheless, rats and humans showed very little convergence in terms of which object pairs were associated with high and low performance, suggesting the use of different strategies. There was an interaction between species and whether stimulus pairs favoured early or late processing in a cDNN. A direct comparison with cDNN representations revealed that rat performance was best captured by late convolutional layers, while human performance related more to the higher-up fully connected layers. These findings highlight the additional value of using a computational approach for the design of object recognition tasks. Overall, this computationally informed investigation of object recognition behaviour reveals a strong discrepancy in strategies between rodent and human vision.
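The layer-wise comparison can be sketched as follows; the backbone (AlexNet) and layer indices are stand-ins for the paper's cDNN, and hook-based feature extraction is one plausible implementation:

```python
# Hedged sketch of the layer comparison: extract features from an early
# convolutional layer and a late fully connected layer (AlexNet here stands in
# for the paper's cDNN), then ask which layer's pairwise feature distances
# better track each species' per-pair performance.
import torch
import torchvision.models as models
from scipy.stats import spearmanr

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()
features = {}

def save_to(name):
    def hook(module, inputs, output):
        features[name] = output.flatten(start_dim=1).detach()
    return hook

model.features[3].register_forward_hook(save_to("early_conv"))   # early conv layer
model.classifier[4].register_forward_hook(save_to("late_fc"))    # late fc layer

def layer_behaviour_correlation(image_pairs, accuracy_per_pair, layer):
    """Correlate per-pair feature distance in one layer with behaviour."""
    distances = []
    for img_a, img_b in image_pairs:                  # each img: (3, 224, 224)
        with torch.no_grad():
            model(torch.stack([img_a, img_b]))
        a, b = features[layer]
        distances.append(torch.dist(a, b).item())
    rho, _ = spearmanr(distances, accuracy_per_pair)
    return rho
```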
... How can the higher level representation of the object or group feed back to the locations and features that define the object in early visual areas? After all, the goal of higher level representations in object areas is to code for abstract properties and identity independently of size, orientation, pose or location (Logothetis & Sheinberg, 1996;Tanaka, 1996). However, more recent single cell studies have shown more sensitivity to position than was once thought (DiCarlo & Maunsell, 2003). ...
... Humans can parse cluttered visual scenes and recognize objects within about 150 ms 46 . If this process involves sequential computations across a hierarchy of brain areas in the visual pathway, as is commonly assumed 65,66 , it leaves only tens of milliseconds for computation at each processing stage 46 . Given the sparse activity of cortical pyramidal cells, individual neurons can only contribute a few spikes in these short intervals. ...
Article
Full-text available
Parallel multisite recordings in the visual cortex of trained monkeys revealed that the responses of spatially distributed neurons to natural scenes are ordered in sequences. The rank order of these sequences is stimulus-specific and maintained even if the absolute timing of the responses is modified by manipulating stimulus parameters. The stimulus specificity of these sequences was highest when they were evoked by natural stimuli and deteriorated for stimulus versions in which certain statistical regularities were removed. This suggests that the response sequences result from a matching operation between sensory evidence and priors stored in the cortical network. Decoders trained on sequence order performed as well as decoders trained on rate vectors but the former could decode stimulus identity from considerably shorter response intervals than the latter. A simulated recurrent network reproduced similarly structured stimulus-specific response sequences, particularly once it was familiarized with the stimuli through non-supervised Hebbian learning. We propose that recurrent processing transforms signals from stationary visual scenes into sequential responses whose rank order is the result of a Bayesian matching operation. If this temporal code were used by the visual system it would allow for ultrafast processing of visual scenes.
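A minimal sketch of rank-order decoding, under the assumption that per-neuron response latencies are available and that stored templates are average rank vectors per stimulus (the paper's decoders may differ in detail):

```python
# A minimal sketch of rank-order decoding, assuming per-neuron response
# latencies and stored templates (average rank vectors per stimulus).
import numpy as np
from scipy.stats import rankdata, spearmanr

def rank_code(latencies):
    """latencies: (n_neurons,) response latencies for one trial -> rank vector."""
    return rankdata(latencies)

def decode(trial_latencies, templates):
    """templates: dict mapping stimulus id -> average rank vector."""
    trial_ranks = rank_code(trial_latencies)
    scores = {stim: spearmanr(trial_ranks, tmpl)[0]   # rank-order similarity
              for stim, tmpl in templates.items()}
    return max(scores, key=scores.get)                # best-matching stimulus
```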
... As humans recognize novel objects, multiple views could be helpful [25]. At the technical level, we use rotational geometric transformation to extract additional perspectives of the input images and use the combined features during prediction and training. ...
Article
Full-text available
A traditional deep neural network-based classifier assumes that only training classes appear during testing in closed-world settings. In most real-world applications, an open-set environment is more realistic than a conventional approach where unseen classes are potentially present during the model’s lifetime. Open-set recognition (OSR) provides the model with the capability to address this issue by reducing open-set risk, in which unknown classes could be recognized as known classes. Unfortunately, many proposed open-set techniques evaluate performance using "toy" datasets and do not consider transfer learning, which has become common practice in deriving a strong performance from deep learning models. We propose a quad-channel contrastive prototype network (QC-CPN) using quad-channel views of the input with contrastive prototype loss for real-world applications. These open-set techniques also require the tuning of new hyperparameters to justify their performance, so we first employ evolutionary simulated annealing (EvoSA) to find good hyperparameters and evaluate their performance with our proposed approach. The comparison results show that QC-CPN effectively outperforms other state-of-the-art techniques in rejecting unseen classes in a domain-specific dataset using the same backbone (MNetV3-Large) and could become a strong baseline for future study.
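The contrastive prototype loss can be sketched generically; the following is one plausible formulation (pull toward the own-class prototype, hinge-push from the nearest other-class prototype), not necessarily QC-CPN's exact loss:

```python
# One plausible formulation of a contrastive prototype loss (not necessarily
# QC-CPN's exact loss): pull each embedding toward its class prototype and
# hinge-push it away from the nearest other-class prototype.
import torch
import torch.nn.functional as F

def contrastive_prototype_loss(embeddings, labels, prototypes, margin=1.0):
    """embeddings: (B, D); labels: (B,) class indices; prototypes: (C, D)."""
    dists = torch.cdist(embeddings, prototypes)          # (B, C) distances
    pull = dists[torch.arange(len(labels)), labels]      # own-prototype distance
    mask = F.one_hot(labels, prototypes.size(0)).bool()
    push = dists.masked_fill(mask, float("inf")).min(dim=1).values
    return (pull + F.relu(margin - push)).mean()
```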
... The prototype-distortion task was originally designed to study category learning (Posner & Keele, 1968), but the idea that the brain abstracts a wide variety of perceptual information soon became a key component of many object recognition theories (e.g., see Logothetis & Sheinberg, 1996). Therefore, models that assume categorization depends on the representation of prototypes are often tested with more complex stimuli, such as abstract objects (Riesenhuber & Poggio, 1999), artificial creatures (Riesenhuber & Poggio, 2002;Love & Gureckis, 2007), and real-world scenes (Serre, Oliva, & Poggio, 2007). ...
Chapter
Full-text available
The Cambridge Handbook of Computational Cognitive Sciences is a comprehensive reference for this rapidly developing and highly interdisciplinary field. Written with both newcomers and experts in mind, it provides an accessible introduction of paradigms, methodologies, approaches, and models, with ample detail and illustrated by examples. It should appeal to researchers and students working within the computational cognitive sciences, as well as those working in adjacent fields including philosophy, psychology, linguistics, anthropology, education, neuroscience, artificial intelligence, computer science, and more.
Preprint
Full-text available
Numerals, i.e., semantic expressions of numbers, enable us to have an exact representation of the amount of things. Visual processing of numerals plays an indispensable role in the recognition and interpretation of numbers. Here, we investigate how visual information from numerals is processed to achieve semantic understanding. We first found that partial occlusion of some digital numerals introduces bistable interpretations. Next, by using the visual adaptation method, we investigated the origin of this bistability in human participants. We showed that adaptation to digital and normal Arabic numerals, as well as homologous shapes, but not Chinese numerals, biases the interpretation of a partially occluded digital numeral. We suggest that this bistable interpretation is driven by intermediate shape processing stages of vision, i.e., by features more complex than local visual orientations but more basic than the abstract concepts of numerals.
Article
Full-text available
Images are a powerful way to share information in the digital world. Sources of images are everywhere: magazines, newspapers, healthcare, entertainment, education, social media, and electronic media. With the advancement of image-editing software and cheap camera-enabled mobile devices, image manipulation is easy even without prior knowledge or expertise, so image authenticity has come into question. Some people use forged images for fun, but others have bad intentions. Manipulated images may be used by political parties to spread false propaganda, and fake images are used to spread rumours and to stalk people. In addition to harming individuals, fake images can damage the credibility of media outlets and undermine public trust in them. There is therefore a need for reliable and efficient image forgery detection methods to combat misinformation, propaganda, hoaxes, and other malicious uses of manipulated images. Researchers, scientists, and image forensics experts are working on the development of fake-image detection and identification tools, and digital image forgery detection is presently a trending field of research. The main aim of this paper is to provide an exhaustive review of digital image forgery detection tools and techniques. It also discusses various machine learning techniques, such as supervised, unsupervised, and deep learning approaches, that can be employed for image forgery detection, and it demonstrates the challenges of the current state of the work.
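As one concrete example of the kind of technique such surveys cover, here is a sketch of error level analysis (ELA), a classic forgery cue based on JPEG recompression; the quality setting is an illustrative choice:

```python
# Sketch of error level analysis (ELA), one classic forgery cue covered by such
# surveys: recompress the image as JPEG and inspect where the recompression
# error differs, since spliced regions often recompress differently.
import io
from PIL import Image, ImageChops

def error_level_analysis(path, quality=90):
    original = Image.open(path).convert("RGB")
    buffer = io.BytesIO()
    original.save(buffer, "JPEG", quality=quality)        # recompress once
    buffer.seek(0)
    recompressed = Image.open(buffer).convert("RGB")
    return ImageChops.difference(original, recompressed)  # bright = suspicious
```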
Article
Sensory stimulation triggers synchronized bioelectrical activity in the brain across various frequencies. This study delves into network-level activities, focusing on local field potentials as a neural signature of visual category representation. Specifically, we studied the role of different local field potential frequency oscillation bands in visual stimulus category representation by presenting images of faces and objects to three monkeys while recording local field potentials from inferior temporal cortex. We found category-selective local field potential responses mainly for animate, but not inanimate, objects. Notably, face-selective local field potential responses were evident across all tested frequency bands, manifesting in both enhanced (above mean baseline activity) and suppressed (below mean baseline activity) local field potential powers. We observed four different local field potential response profiles based on frequency bands and face-selective excitatory and suppressive responses. Low-frequency local field potential bands (1–30 Hz) were more predominantly suppressed by face stimulation than the high-frequency (30–170 Hz) local field potential bands. Furthermore, the low-frequency local field potentials conveyed less face category information than the high-frequency local field potentials in both enhanced and suppressed conditions. We also observed a negative correlation between face/object d-prime values in all the tested local field potential frequency bands and the anterior–posterior position of the recording sites. In addition, the power of low-frequency local field potentials systematically declined across inferior temporal anterior–posterior positions, whereas high-frequency local field potentials did not exhibit such a pattern. In general, for most of the above-mentioned findings, somewhat similar results were observed for body stimuli, but not for other stimulus categories. These findings suggest that a balance of face-selective excitation and inhibition across time and cortical space shapes face category selectivity in inferior temporal cortex.
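The band-power analysis implied above can be sketched as follows, with assumed filter settings and a textbook d-prime; this is not the authors' pipeline:

```python
# A small sketch of the implied band-power analysis, with assumed filter
# settings and a textbook d-prime; not the authors' pipeline.
import numpy as np
from scipy.signal import butter, filtfilt

def band_power(lfp, fs, low, high, order=4):
    """lfp: (n_trials, n_samples) voltage traces; returns mean power per trial
    within the [low, high] Hz band."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, lfp, axis=-1)
    return (filtered ** 2).mean(axis=-1)

def d_prime(face_power, object_power):
    """Face-vs-object selectivity of band power, as a d-prime."""
    pooled_sd = np.sqrt(0.5 * (face_power.var() + object_power.var()))
    return (face_power.mean() - object_power.mean()) / pooled_sd
```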
Chapter
The Warping Object Detection Network (WPOD-Net) is a model designed to detect license plate contours in images. This study aims to enhance the performance of the original WPOD-Net model by incorporating knowledge about edges in the image through feature engineering. By leveraging edge information, the proposed approach improves the accuracy of license plate contour determination. The Sobel filter, selected experimentally, acts as a convolutional neural network layer; its edge information is combined with the existing information of the original network to create the final embedding vector. The proposed model was compared with the original model on a dataset we collected for evaluation. The results, evaluated with the Quadrilateral Intersection over Union value, demonstrate that the model achieves a significant improvement in performance.
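A minimal sketch of the core idea, a Sobel filter realized as a fixed convolutional layer, is shown below; the implementation details are illustrative rather than WPOD-Net's code:

```python
# Minimal sketch: a Sobel edge filter realized as a fixed, non-trainable
# convolutional layer whose output can be concatenated with the backbone's
# features. Details are illustrative, not WPOD-Net's code.
import torch
import torch.nn as nn

class SobelLayer(nn.Module):
    def __init__(self):
        super().__init__()
        gx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        kernel = torch.stack([gx, gx.t()]).unsqueeze(1)   # (2, 1, 3, 3): Gx, Gy
        self.conv = nn.Conv2d(1, 2, kernel_size=3, padding=1, bias=False)
        self.conv.weight = nn.Parameter(kernel, requires_grad=False)

    def forward(self, gray):                              # gray: (B, 1, H, W)
        g = self.conv(gray)
        return torch.sqrt(g[:, :1] ** 2 + g[:, 1:] ** 2)  # gradient magnitude
```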
Article
Full-text available
What are the causes of dyslexia? Decades of research reflect a determined search for a single cause where a common assumption is that dyslexia is a consequence of problems with converting phonological information into lexical codes. But reading is a highly complex activity requiring many well-functioning mechanisms, and several different visual problems have been documented in dyslexic readers. We critically review evidence from various sources for the role of visual factors in dyslexia, from magnocellular dysfunction through accounts based on abnormal eye movements and attentional processing, to recent proposals that problems with high-level vision contribute to dyslexia. We believe that the role of visual problems in dyslexia has been underestimated in the literature, to the detriment of the understanding and treatment of the disorder. We propose that rather than focusing on a single core cause, the role of visual factors in dyslexia fits well with risk and resilience models that assume that several variables interact throughout prenatal and postnatal development to either promote or hinder efficient reading.
Preprint
Deep neural networks (DNNs) are machine learning algorithms that have revolutionised computer vision due to their remarkable successes in tasks like object classification and segmentation. The success of DNNs as computer vision algorithms has led to the suggestion that DNNs may also be good models of human visual perception. We here review evidence regarding current DNNs as adequate behavioural models of human core object recognition. To this end, we argue that it is important to distinguish between statistical tools and computational models, and to understand model quality as a multidimensional concept where clarity about modelling goals is key. Reviewing a large number of psychophysical and computational explorations of core object recognition performance in humans and DNNs, we argue that DNNs are highly valuable scientific tools but that as of today DNNs should only be regarded as promising -- but not yet adequate -- computational models of human core object recognition behaviour. On the way we dispel a number of myths surrounding DNNs in vision science.
Article
Full-text available
Categorizations which humans make of the concrete world are not arbitrary but highly determined. In taxonomies of concrete objects, there is one level of abstraction at which the most basic category cuts are made. Basic categories are those which carry the most information, possess the highest category cue validity, and are, thus, the most differentiated from one another. The four experiments of Part I define basic objects by demonstrating that in taxonomies of common concrete nouns in English based on class inclusion, basic objects are the most inclusive categories whose members: (a) possess significant numbers of attributes in common, (b) have motor programs which are similar to one another, (c) have similar shapes, and (d) can be identified from averaged shapes of members of the class. The eight experiments of Part II explore implications of the structure of categories. Basic objects are shown to be the most inclusive categories for which a concrete image of the category as a whole can be formed, to be the first categorizations made during perception of the environment, to be the earliest categories sorted and earliest named by children, and to be the categories most codable, most coded, and most necessary in language.
Article
Full-text available
Trained 16 goldfish to discriminate between a regular square and a square containing an irregularity (either a protrusion or an indentation). Transfer tests were given with 32 new shapes. Transfer was poor to rotated shapes but remained good when the nature of the irregularity was altered, provided there was a sudden break on the side of the square on which the original irregularity occurred. It was demonstrated that goldfish can detect and discount interposition. Results are incompatible with the idea that goldfish recognize shapes by a process of template matching. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Five experiments were conducted to examine whether varying certain perceptual attributes of study and test items influences priming on pictorial tasks. Priming was found to be specific to the form of studied items; substantial priming occurred from studying pictures, whereas little or no priming occurred from studying the pictures' names (read or generated). Priming was specific to the exact contour presented at study. Studying the same fragment that was presented at test resulted in greater priming than did studying an intact image or a different fragment of the object. Priming was also specific to the viewing angle of studied objects. Same study–test views showed the greatest priming, whereas priming across different views was greater when Ss studied an unusual view of the object and were tested on a canonical view than when the reverse was true. Together, these data suggest that priming is mediated by representations that are presemantic, highly specific, structural descriptions of objects. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
If abstraction of information concerning the central tendency of a set of distortions occurs during learning, a time delay could lead to less forgetting of the schema than of the patterns which S memorized. 2 experiments, using 50 undergraduates, suggested that the schema pattern is less subject to loss over time than the learned instances. Results are consistent with the idea that information concerning the central tendency is abstracted during learning. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
In a study with 249 Ss, it was hypothesized that the effects associated with degree of typicality in natural semantic categories can be generated as a function of the structure of artificial categories. Three types of category were used: (a) dot patterns, in which typicality was defined as similarity to a prototype pattern in overall configuration; (b) stick figures, in which typicality was defined as closeness to a prototype figure possessing the means of attributes for the category; and (c) letter strings, in which typicality was defined as the degree of family resemblance (overlap of letters) among category members. For all 3 category types, it was found that structural typicality determined ease of item learning, speed of classification of items after learning, ratings of the typicality of items, order in which items were generated in a production task, and facilitation or inhibition of responses to items in a priming paradigm. These typicality effects were obtained both when frequency of items was equated and when rates of learning were equated by inverting the relation between frequency and typicality. (30 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Investigated the ease of prototype abstraction and the recognition of new instances belonging to the prototype as a function of the number of instances sorted together during original learning. 72 undergraduates sorted distorted dot patterns into groups of 3, 6, and 9 instances, each group containing distortions generated from a single prototype. Following the sorting task, 36 Ss were tested immediately on their ability to correctly classify old and new patterns as well as the prototype; the other 36 Ss were tested 4 days later. Correct classification of both the prototype and new instances increased as a function of the number of old instances sorted together in the original learning task. Old instances exhibited some forgetting over the delay, but neither the prototype nor new instances did. It is concluded that the abstraction of a prototype undergoes repeated change as a function of the number of instances which define it, and that the ability to correctly recognize new exemplars of a concept is dependent upon the number of instances as well. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Successful object recognition is essential for finding food, identifying kin, and avoiding danger, as well as many other adaptive behaviors. To accomplish this feat, the visual system must reconstruct 3-D interpretations from 2-D “snapshots” falling on the retina. Theories of recognition address this process by focusing on the question of how object representations are encoded with respect to viewpoint. Although empirical evidence has been equivocal on this question, a growing body of surprising results, including those obtained in the experiments presented in this case study, indicates that recognition is often viewpoint dependent. Such findings reveal a prominent role for viewpoint-dependent mechanisms and provide support for the multiple-views approach, in which objects are encoded as a set of view-specific representations that are matched to percepts using normalization procedures.
Article
Full-text available
Ss classified a test form as matching or not matching one of a set of memorized forms relative to which the test forms were rotated up to 90 deg. “Match” reaction times increased monotonically with both number of forms memorized and degrees of rotation. Disappearance of the rotation effect after practice was attributed to a change from considering rotational information during comparison to comparing rotation invariant features. The change in strategy is considered an indicant of the development of shape constancy. Time taken to memorize the patterns increased linearly with the size of the memory set, justifying interpretation of the RT effects in terms of comparison time differences rather than differences in memorial specification of the patterns.
Article
Full-text available
We describe a novel approach, based on ideal observer analysis, for measuring the ability of human observers to use image information for 3D object perception. We compute the statistical efficiency of subjects relative to an ideal observer for a 3D object classification task. After training to 11 different views of a randomly shaped thick wire object, subjects were asked which of a pair of noisy views of the object best matched the learned object. Efficiency relative to the actual information in the stimuli can be as high as 20%. Increases in object regularity (e.g. symmetry) lead to increases in the efficiency with which novel views of an object could be classified. Furthermore, such increases in regularity also lead to decreases in the effect of viewpoint on classification efficiency. Human statistical efficiencies relative to a 2D ideal observer exceeded 100%, thereby excluding all models which are sub-optimal relative to the 2D ideal.
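For reference, statistical efficiency is conventionally defined as the squared ratio of human to ideal sensitivity; a one-line version follows (the paper's exact estimator may differ from this textbook form):

```python
# Textbook definition of statistical efficiency relative to an ideal observer:
# the squared ratio of human to ideal sensitivity (d-prime). The paper's exact
# estimator may differ from this conventional form.
def statistical_efficiency(d_prime_human: float, d_prime_ideal: float) -> float:
    return (d_prime_human / d_prime_ideal) ** 2
```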
Article
Full-text available
Classic research on conceptual hierarchies has shown that the interaction between the human perceiver and objects in the environment specifies one level of abstraction for categorizing objects, called the basic level, which plays a primary role in cognition. The question of whether the special psychological status of the basic level can be modified by experience was addressed in three experiments comparing the performance of subjects in expert and novice domains. The main findings were that in the domain of expertise (a) subordinate-level categories were as differentiated as the basic-level categories, (b) subordinate-level names were used as frequently as basic-level names for identifying objects, and (c) subordinate-level categorizations were as fast as basic-level categorizations. Taken together, these results demonstrate that individual differences in domain-specific knowledge affect the extent that the basic level is central to categorization.
Article
Full-text available
Two experiments are reported in which subjects had to match pairs of pictures of objects. "Same" pairs could be either identical (Ps), pictures of different views of the same object (Pv), or pictures of different objects having the same name (Pd). With line drawings as stimuli, RTs for Condition Ps were shorter than for Condition Pv, which in turn were shorter than for Condition Pd. Visual similarity had no effect on Pd RTs. However, in Experiment II, where photographs of objects with high-frequency (HF) and low-frequency (LF) names were used, no difference was found between Conditions Ps(HF), Ps(LF) and Condition Pv(HF); and no difference occurred between Conditions Pd(HF), Pd(LF) and Condition Pv(LF), the latter set of conditions being associated with longer RTs than the former. This pattern of results was found with both a .25-sec and a 2-sec ISI. The results are discussed in terms of the levels of coding involved in processing information from picture stimuli. It is concluded that at least two levels are involved in matching photographs of real objects (an object-code level and a nonvisual semantic code level), while a third level may be used in matching tasks involving stylized line drawings (a picture-code level).
Article
Full-text available
The organization of cortical projections to the caudate nucleus was investigated in the rhesus monkey, using the autoradiographic tracing method. Following injections of tritiated leucine and proline into selected pre- and post-Rolandic association areas in the frontal, parietal, occipital and temporal lobes, widespread projections were observed to one, or more typically, more than one of the major subdivisions of the caudate nucleus. When cortical areas having strong reciprocal cortico-cortical connections were compared, a considerable communality of their cortico-caudate projections was noted; depending on the location of the cortical areas, the region of common distribution lay within the head, the body, or the tail of the caudate nucleus. This correlation between cortico-cortical and cortico-striate projections characterized all pairs of cases studied. It suggests a previously undescribed principle of organization within the telencephalon, namely, that areas of cerebral cortex having reciprocal cortico-cortical connections, while having unique overall patterns of projection to the caudate nucleus, project, in part, to one and the same region of the nucleus. This might imply that a given region of the caudate nucleus receives input not only from a particular area of cortex, but also from all other cortical areas reciprocally interconnected with that area.
Article
Full-text available
Unlike older children and adults, children of less than about 10 years of age remember photographs of faces presented upside down almost as well as those shown upright and are easily fooled by simple disguises. The development at age 10 of the ability to encode orientation-specific configurational aspects of a face may reflect completion of certain maturational changes in the right cerebral hemisphere.
Article
Full-text available
Cortical neurons that are selectively sensitive to faces, parts of faces and particular facial expressions are concentrated in the banks and floor of the superior temporal sulcus in macaque monkeys. Their existence has prompted suggestions that it is damage to such a region in the human brain that leads to prosopagnosia: the inability to recognize faces or to discriminate between faces. This was tested by removing the face-cell area in a group of monkeys. The animals learned to discriminate between pictures of faces or inanimate objects, to select the odd face from a group, to inspect a face then select the matching face from a pair of faces after a variable delay, to discriminate between novel and familiar faces, and to identify specific faces. Removing the face-cell area produced no or little impairment, which in the latter case was not specific to faces. In contrast, several prosopagnosic patients were impaired at several of these tasks. The animals were less able than before to discern the angle of regard in pictures of faces, suggesting that this area of the brain may be concerned with the perception of facial expression and bearing, which are important social signals in primates.
Article
Full-text available
Cells selectively responsive to the face have been found in several visual sub-areas of temporal cortex in the macaque brain. These include the lateral and ventral surfaces of inferior temporal cortex and the upper bank, lower bank and fundus of the superior temporal sulcus (STS). Cells in the different regions may contribute in different ways to the processing of the facial image. Within the upper bank of the STS different populations of cells are selective for different views of the face and head. These cells occur in functionally discrete patches (3-5 mm across) within the STS cortex. Studies of output connections from the STS also reveal a modular anatomical organization of repeating 3-5 mm patches connected to the parietal cortex, an area thought to be involved in spatial awareness and in the control of attention. The properties of some cells suggest a role in the discrimination of heads from other objects, and in the recognition of familiar individuals. The selectivity for view suggests that the neural operations underlying face or head recognition rely on parallel analyses of different characteristic views of the head, the outputs of these view-specific analyses being subsequently combined to support view-independent (object-centred) recognition. An alternative functional interpretation of the sensitivity to head view is that the cells enable an analysis of 'social attention', i.e. they signal where other individuals are directing their attention. A cell maximally responsive to the left profile thus provides a signal that the attention (of another individual) is directed to the observer's left. Such information is useful for analysing social interactions between other individuals.(ABSTRACT TRUNCATED AT 250 WORDS)
Article
Full-text available
The purpose of the experiments reported was to examine how novel, three-dimensional shapes are represented in long-term memory and how this might be differentially affected by monocular and binocular viewing. Three experiments were conducted. The first experiment established that slide projections of the novel objects could be recognized readily if seen in the same orientation as seen during learning. The second and third experiments examined generalization to novel depth rotations of the objects. The second experiment used slide projections of the objects. The results indicated that the representation of the objects seen during training was quite viewpoint-specific as recognition of objects in novel orientations was relatively poor. In the third experiment subjects were shown the real objects under monocular or binocular viewing. Overall, the results are consistent with a growing body of recent research showing that, at least under certain conditions, the visual system stores viewpoint-specific representations of objects.
Article
Patients with visual associative agnosia have a particular difficulty in identifying visually presented living things (plants and animals) as opposed to nonliving things. It has been claimed that this effect cannot be explained by differences in the inherent visual discriminability of living and nonliving things. To test this claim further, we performed two experiments with normal subjects. In Experiment 1 normal human observers were asked to identify objects in tachistoscopically presented line drawings. They made more errors with living things than with nonliving things. In Experiment 2 normal monkeys learned to discriminate among the same line drawings for food reward. They made many more errors in discriminating among living things than nonliving things. Agnosic patients' responses to the same line drawings were made available to us for correlative analysis with the subjects' responses to these drawings in Experiments 1 and 2. We conclude that a category-specific visual agnosia for living things can arise as a consequence of a modality-specific but not category-specific impairment in visual representation, since living things are more similar to each other visually than nonliving things are.
Article
Three experiments investigated the effects of familiarity, practice, and stimulus variability on naming latencies for photographs of objects. Latencies for pictures of objects having the same name decreased most with practice when the same picture was always used to represent a given object (Condition Ps-Ns), less if different views of the same object were used (Condition Pv-Ns), and least if pictures of different objects having the same name were used (Condition Pd-Ns). In all cases, however, the effect of practice was significant. The savings in naming latency associated with practice on Conditions Ps-Ns and Pv-Ns showed almost no transfer to Condition Pd-Ns, even though the same responses were being given before and after transfer. However, practice on Condition Ps-Ns transferred completely to Condition Pv-Ns. Name frequency affected latency in all conditions. The frequency effect decreased slightly with practice. These results are related to several alternative models of the coding processes involved in naming objects. It is concluded that at least three types of representation may be necessary: visual codes, nonverbal semantic codes, and name codes. A distinction is made between visual codes that characterize two-dimensional stimuli and those that characterize three-dimensional objects.
Article
In the rhesus monkey, the cortices of the parahippocampal gyrus are pivotal relay areas in a series of multisynaptic input pathways that connect the hippocampal formation to other areas of the cerebral cortex. Recent investigations now suggest that they play a similar role in relaying hippocampal formation output back to widespread areas of the cerebral cortex and, in particular, to the association cortices.
Article
Trained 6 adult cock bantams to discriminate slides of 2 conspecifics in various positions. All Ss learned the discrimination and generalized to new slides of the training birds in a transfer test. Only 1 bird showed any significant discrimination between S "positive" and S "negative" slides of an unfamiliar bird included in the transfer session to test for artifacts. Results support the hypothesis that the Ss achieved the discrimination by forming concepts of the individuals to be discriminated. (16 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Repeats 2 studies by M. I. Posner and S. W. Keele (see 42:10 and 44:4), using procedures and materials from the earlier of the 2 studies for comparing immediate and delayed recognition, as in their later study. Ss were 72 undergraduates. Accuracy of identification of prototypes and distortions over a 1-wk delay was compared both within Ss and across independent groups. Both comparisons showed that the old distortions, although initially recognized better than the prototypes, were forgotten at a faster rate. Results support the general conclusions of Posner and Keele's later study. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Four experiments are reported which attempt to determine how people make classifications when categories are defined by sets of exemplars and not by logical rules. College students classified schematic faces into one of two categories each composed of five faces. One probability model and three distance models were tested. The predominant strategy, as revealed by successful models, was to abstract a prototype representing each category and to compare the distance of novel patterns to each prototype, emphasizing those features which best discriminated the two categories.
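The winning prototype-distance strategy can be written down compactly; the feature weighting below stands in for the abstract's "emphasizing those features which best discriminated the two categories", and the details are illustrative, not the paper's fitted models:

```python
# A minimal sketch of the prototype-distance strategy described above: average
# each category's exemplars into a prototype, then classify a new item by its
# (optionally feature-weighted) distance to each prototype.
import numpy as np

def fit_prototypes(exemplars_by_category):
    """exemplars_by_category: dict mapping category -> array (n_items, n_features)."""
    return {c: x.mean(axis=0) for c, x in exemplars_by_category.items()}

def classify(item, prototypes, feature_weights=None):
    """Assign item to the category with the nearest prototype; feature_weights
    can emphasize the dimensions that best discriminate the categories."""
    w = 1.0 if feature_weights is None else np.asarray(feature_weights)
    dists = {c: np.sqrt(np.sum(w * (item - p) ** 2)) for c, p in prototypes.items()}
    return min(dists, key=dists.get)
```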
Article
Summarizing, in conclusion, the features that constitute the essence of prosopagnosia, one can say: it is the agnosia for the recognition of faces and of expressive phenomena in general. Although perception of the formal parts of physiognomies is undisturbed, the process of recognition fails to occur or, as we know from other agnosias as well, is achieved only imperfectly. Just as it belongs to the essence of agnosia to be confined to one visual category, so prosopagnosia is selectively confined to faces. Not only the fact of prosopagnosia itself, but also certain observations (flicker attacks, cerebral metamorphopsias for faces), indicate that it is the disturbance of a distinctive visual category that encompasses both the seeing and the recognizing of physiognomies and that occupies a quite specific place in the structure of the perceptual world. In principle we are dealing here with the same process as in the other categories for objects, meaningful contexts, colors, symbolic signs, and so on, in whose forms of disturbance the special visual categories underlying them come to expression. That agnosia as a clinical phenomenon falls apart into such individual categorial forms is usually simply accepted, but is at bottom deeply puzzling, and is as little explained by the assumption of disturbed special apparatuses in the brain as by the Gestalt-psychological theory that in all agnosia it is the disturbance of the apprehension of Gestalt that conditions the agnosia. Work on the question of whether the special visual categories appearing in the agnosias occur together merely by chance, or reveal an inner hierarchy, is still in its beginnings. A first clue can be gained from our demonstration that prosopagnosia must be the disturbance-form of a visual category in which the most original, genetically earliest function of perception and recognition presents itself. In the disturbance-type of prosopagnosia we see a regression to this earliest visual stage of grasping the environment, a basic and primal function of the visual world as such. In individual features of the agnosia, radicals as it were, we can still recognize, if in distorted form, the elements that originally constituted this basic category: in the ocula the primary perceptual field; in the fascination exerted by the eye of the fellow human the earliest visual act of experience; in the disturbance of the patient's own expressive capacity the pervasive existential reference of this visual category; and in the constant loss of visual retentiveness for faces the temporal precedence of expression recognition over object seeing. Concerning the question of the chronogenic engraving (v. Monakow) of the visual categories in the course of development, Pötzl has expressed the view that simultanagnosia represents a regression to the picture-book stage of children, to the phase of "and-connections" (Pick). Accordingly, simultanagnosia would be the disturbance-form of the visual category of meaning. Between the two, the expression category and the simultaneous category, there presumably lies the category of object and color apprehension, and beyond it the world of symbolic signs. These categories, wholly different in quality, of which for simplicity we name here only the expression stratum and the object, meaning, and symbol strata, do not emerge continuously from one another; rather, development proceeds in leaps. Each stratum is separated from the others by a hiatus irrationalis. With each of the named strata something categorially new begins.
That visually given contexts are recognized simultaneously is not simply a consequence of the preceding visual-gnostic apprehension of objects, any more than the recognition of symbolic signs has its ground in the apprehension of their forms. Here, from clinical experience, we encounter the same phenomenon that has led modern ontology, embodied most rigorously in Nicolai Hartmann, to ascribe to the world the character of stratification. In the stratified structure of the world each stratum has its own laws; none has an independent being; the higher always rests upon the lower, yet without impairment of its autonomous intracategorial freedom, for with each stratum a categorial novelty begins. These most general laws of strata apply to the visual categories as well, only it must never be overlooked that in the visual categories we do not have categories of being before us. For in the structure of knowledge, as the mere representation of the object, accomplished by approximation, in the knowing consciousness, only the images of the actual categories of being appear, namely the visual categories. Their stratification is only an indication of the stratification of the world, whose objects are never given to human beings in themselves, but always only as images. Thus, after the analysis of the phenomenon, the anthropological question of the essence of the human being and of his place in the world arises irresistibly, for in no other problem area than that of agnosia, aphasia, and apraxia does medical research into facts touch so closely upon philosophical reflection as the sustaining ground of all science.
Article
In three experiments, human observers made timed decisions about alphanumeric characters, displayed singly in different orientations and versions (normal vs. backward). Latency to identify the characters was longer for backward than for normal versions, regardless of angular orientation and even under conditions in which latency was independent of angular orientation. Subjects also took longer to respond to a target orientation (whatever the character) than to respond to a target character (whatever the orientation). The results suggest that the observer first induces a description of a character that is largely independent of orientation but not of version, although the representation of version is too weak at this stage to permit an overt decision about it. Next, the angular orientation of the character is determined. Finally, the observer might “mentally rotate” the representation to the standard upright, for matching against an internally generated template.
Article
The nature of form categories in 3 to 4-month infants was studied using the visual preference for novelty in the familiarization-novelty paradigm. Novelty preference indicates habituation to and recognition of the familiar. In a series of experiments employing three form categories composed of dot patterns, generalized habituation to new category members was used to assess categorization behavior in the recognition of visual forms. At 3 to 4 months of age, infants did not initially show any systematic preferences for “good” or symmetrical examples of a category relative to “distorted” examples (Experiment 1) and this was true for all three form categories used (i.e., square, triangle, and diamond). Evidence for categorization was seen in the recognition performance of 3- to 4-month infants (Experiment 2). Infants showed generalized habituation to the previously unseen category prototypes following exposure to six exemplars within each of the three form categories. Given evidence that infants could discriminate between the prototype and other category members (Experiment 3), “inability to discriminate” was ruled out as an explanation for this form categorization or generalized habituation effect. Four subsequent experiments were conducted to determine whether infants exhibit a prototypicality structure for their remembered categories and whether certain conditions which have been shown to enhance prototypicality effects with adults have similar effects with infants. No evidence of a prototypicality structure was found for the form categories of infants when the number of exemplars during familiarization was limited to 6 and the test for form recognition followed immediately (Experiment 4). However, a prototypicality structure for the remembered form categories was found when a 3-min delay was introduced between familiarization and tests for form recognition (Experiment 5), when 12 exemplars were presented during familiarization (Experiment 7), or when the prototype was included as one of the six exemplars during the familiarization period (Experiment 6).
Article
Neurobiological data from the cerebral cortex of the macaque monkey suggest a model of object recognition that is a series of four computational stages. These are executed in seven major hierarchically arranged areas of processing, each area with an input and an output layer of cells. The first computational stage occurs within early visual cortex and involves the first two cortical areas. Here it appears that boundaries between image regions and logical groupings of local oriented image elements that “belong” together are computed. These processes segregate image attributes that can then be treated as arising from the same object. The next three visual cortical areas execute the second computational stage and display sensitivity to an ever increasing complexity and variety of visual shape features (e.g., T junctions, concentric rings, spotted triangle shape). The third stage of processing seems to utilize combinations of these shape features to establish selectivity to what we refer to as object-feature instances (i.e., the approximate appearance of a small number of object attributes seen under particular viewing conditions). Cells in these areas tolerate change in position but show only limited generalization for change in retinal size, orientation, or perspective view. The fourth computational process occurs within the final cortical areas and gives rise to cell selectivity showing object constancy across size and orientation. This process probably occurs through pooling of the outputs of cells responsive to different instances of the same object view. Importantly, constancy across perspective view (i.e., the transition between viewer-centred and object-centred representation) does not seem to be completed except by a small percentage of cells. Synaptic changes encompassing various associative (e.g., Hebbian) and non-associative (e.g., decorrelating) procedures may allow cells throughout the stages of processing to become tuned to frequently experienced image attributes, shapes, and objects. Associative learning procedures operating over short time periods may underlie the progressive generalization over changing viewing conditions. Constancy across position, orientation, size, and, finally, perspective view and object parts is established slowly as each area pools the appropriate outputs of the less specific cells in the preceding area. After such learning procedures, the visual system can operate to resolve the appearance of unexpected objects primarily in a feedforward manner, without the need for lateral inhibition or feedback loops, a property few models embody. This feedforward processing does not deny the possibility that top-down influences, although poorly understood, may play a role in nulling image aspects that are predictable in appearance and/or not the object of attention such that only features containing relevant discriminatory information are processed further.
Article
Mechanisms by which the brain could perform invariant recognition of objects including faces are addressed neurophysiologically, and then a computational model of how this could occur is described. Some neurons that respond primarily to faces are found in the macaque cortex in the anterior part of the superior temporal sulcus (in which region neurons are especially likely to be tuned to facial expression, and to face movement involved in gesture). They are also found more ventrally in the TE areas which form the inferior temporal gyrus. Here the neurons are more likely to have responses related to the identity of faces. These areas project on to the amygdala and orbitofrontal cortex, in which face-selective neurons are also found. Quantitative studies of the responses of the neurons that respond differently to the faces of different individuals show that information about the identity of the individual is represented by the responses of a population of neurons, that is, ensemble encoding is used. The rather distributed encoding (within the class faces) about identity in these sensory cortical regions has the advantages of maximising the information in the representation useful for discrimination between stimuli, generalisation, and graceful degradation. In contrast, the more sparse representations in structures such as the hippocampus may be useful to maximise the number of different memories stored. There is evidence that the responses of some of these neurons are altered by experience so that new stimuli become incorporated in the network, in only a few seconds of experience with a new stimulus. It is shown that the representation that is built in temporal cortical areas shows considerable invariance for size, contrast, spatial frequency and translation. Thus the representation is in a form which is particularly useful for storage and as an output from the visual system. It is also shown that one of the representations which is built is view-invariant, which is suitable for recognition and as an input to associative memory. Another is viewer-centered, which is appropriate for conveying information about gesture. It is shown that these computational processes operate rapidly, in that in a backward masking paradigm, 20–40 ms of neuronal activity in a cortical area is sufficient to support face recognition. In a clinical application of these findings, it is shown that humans with ventral frontal lobe damage have in some cases impairments in face and voice expression identification. These impairments are correlated with and may contribute to the problems some of these patients have in emotional and social behaviour. To help provide an understanding of how the invariant recognition described could be performed by the brain, a neuronal network model of processing in the ventral visual system is described. The model uses a multistage feed-forward architecture, and is able to learn invariant representations of objects including faces by use of a Hebbian synaptic modification rule which incorporates a short memory trace (0.5 s) of preceding activity to enable the network to learn the properties of objects which are spatio-temporally invariant over this time scale.
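The trace-modified Hebbian rule mentioned at the end can be sketched as follows; parameter names and the weight renormalization are illustrative choices in the spirit of such models, not the paper's exact equations:

```python
# Sketch of a trace-modified Hebbian rule of the kind described above. The
# postsynaptic term is a decaying average ("trace") of recent activity, so the
# network links inputs that occur close together in time, such as successive
# views of one object. Parameter names and the renormalization are illustrative.
import numpy as np

def trace_hebbian_update(w, x, y, y_trace, eta=0.8, alpha=0.01):
    """w: (n_out, n_in) weights; x: (n_in,) presynaptic rates;
    y: (n_out,) current postsynaptic rates; y_trace: running trace of y."""
    y_trace = (1.0 - eta) * y + eta * y_trace          # short memory trace (~0.5 s)
    w = w + alpha * np.outer(y_trace, x)               # Hebbian update on the trace
    norms = np.linalg.norm(w, axis=1, keepdims=True)   # keep weights bounded
    return w / np.maximum(norms, 1e-12), y_trace
```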
Article
A theoretical framework for perceptual representation is presented which proposes that information is coded in hierarchical networks of nonverbal propositions. The hierarchical structure of the representations implies selective organization: Some subsets of a figure will be encoded as integral, structural units of that figure, while others will not. A context-sensitive metric for the “goodness” of a part within a figure is developed, corresponding to the probability that the subset will be encoded as a structural unit. Converging evidence supporting this position is presented from four different tasks using simple, straight-line figures. The tasks studied are (a) dividing figures into “natural” parts, (b) rating the “goodness” of parts within figures, (c) timed verification of parts within figures, and (d) timed mental synthesis of spatially separated parts into unitary figures. The results are discussed in terms of the proposed theory of representation, the processes that operate on those representations, and the general implications of the data for perceptual theories.
Article
To support our reasoning abilities perception must recover environmental regularities—e.g., rigidity, “objectness,” axes of symmetry—for later use by cognition. To create a theory of how our perceptual apparatus can produce meaningful cognitive primitives from an array of image intensities we require a representation whose elements may be lawfully related to important physical regularities, and that correctly describes the perceptual organization people impose on the stimulus. Unfortunately, the representations that are currently available were originally developed for other purposes (e.g., physics, engineering) and have so far proven unsuitable for the problems of perception or common-sense reasoning. In answer to this problem we present a representation that has proven competent to accurately describe an extensive variety of natural forms (e.g., people, mountains, clouds, trees), as well as man-made forms, in a succinct and natural manner. The approach taken in this representational system is to describe scene structure at a scale that is similar to our naive perceptual notion of “a part,” by use of descriptions that reflect a possible formative history of the object, e.g., how the object might have been constructed from lumps of clay. For this representation to be useful it must be possible to recover such descriptions from image data; we show that the primitive elements of such descriptions may be recovered in an overconstrained and therefore reliable manner. We believe that this descriptive system makes an important contribution towards solving current problems in perceiving and reasoning about natural forms by allowing us to construct accurate descriptions that are extremely compact and that capture people's intuitive notions about the part structure of three-dimensional forms.
Article
The effectiveness of depth cues such as occlusion and shading was examined in images defined by color, texture, binocular disparity or motion. Line drawings represented in any of these modalities were able to signal shape and occlusion, showing that contour occlusions are analyzed at a high level, following the reintegration of the separate representations of visual attributes such as color and motion. Subjective contours, on the other hand, could be seen only if the figures were defined by luminance differences. Figures whose depth depended on the interpretation of shadows also required luminance differences: shadow regions had to be darker than the surrounding, non-shadow regions. Shadow areas filled with colors or textures that could not occur in natural scenes were perceived as shadows as readily as real shadows. Even when shadow and non-shadow regions had different depths or had textures that moved in different directions, the depth from shading was still seen as long as there was an appropriate brightness difference. These findings indicate that a variety of mechanisms analyze cues to 3-dimensional structure. Occlusion cues in line drawings appear to be analyzed by a general purpose mechanism having access to all pathways of the visual system. Subjective contours and shadows appear to depend on special purpose processes accessing only the luminance pathway. Finally, although natural constraints have proved useful in solving many visual problems, they did not play a significant role in the interpretation of the depth cues examined here.
Article
How do we recognize objects despite differences in their retinal projections when they are seen at different orientations? Marr and Nishihara (1978) proposed that shapes are represented in memory as structural descriptions in object-centered coordinate systems, so that an object is represented identically regardless of its orientation. An alternative hypothesis is that an object is represented in memory in a single representation corresponding to a canonical orientation, and a mental rotation operation transforms an input shape into that orientation before input and memory are compared. A third possibility is that shapes are stored in a set of representations, each corresponding to a different orientation. In four experiments, subjects studied several objects each at a single orientation, and were given extensive practice at naming them quickly, or at classifying them as normal or mirror-reversed, at several orientations. At first, response times increased with departure from the study orientation, with a slope similar to those obtained in classic mental rotation experiments. This suggests that subjects made both judgments by mentally transforming the orientation of the input shape to the one they had initially studied. With practice, subjects recognized the objects almost equally quickly at all the familiar orientations. At that point they were probed with the same objects appearing at novel orientations. Response times for these probes increased with increasing disparity from the previously trained orientations. This indicates that subjects had stored representations of the shapes at each of the practice orientations and recognized shapes at the new orientations by rotating them to one of the stored orientations. The results are consistent with a hybrid of the second (mental transformation) and third (multiple-view) hypotheses of shape recognition: input shapes are transformed to a stored view, either the one at the nearest orientation or one at a canonical orientation. Interestingly, when mirror images of trained shapes were presented for naming, subjects took the same time at all orientations. This suggests that mental transformations of orientation can take the shortest path of rotation that will align an input shape and its memorized counterpart, in this case a rotation in depth about an axis in the picture plane.
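The hybrid account's quantitative claim can be seen compactly as a reaction-time model: RT grows with the shortest-path angular distance from the test view to the nearest stored view. The Python sketch below is a minimal illustration of that idea; base_rt and ms_per_degree are placeholder values we introduce for the example, not parameters reported in the experiments.

import math

def angular_distance(a, b):
    """Shortest rotation (degrees) between two orientations."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def predicted_rt(test_orientation, stored_orientations,
                 base_rt=600.0, ms_per_degree=2.5):
    """Predicted naming time under the transform-to-nearest-view
    account: the input shape is mentally rotated, along the shortest
    path, to the closest stored view."""
    nearest = min(angular_distance(test_orientation, s)
                  for s in stored_orientations)
    return base_rt + ms_per_degree * nearest

For example, predicted_rt(75, [0, 120, 240]) charges the rotation cost for the 45° gap to the stored 120° view rather than the 75° gap to the 0° view, mirroring the finding that probes near a practiced orientation are answered fastest.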
Article
Recent physiological findings are reviewed and synthesized into a model of shape processing and object recognition. Gestalt laws (e.g. good continuation, closure) and 'non-accidental' image properties (e.g. collinear terminating lines) are resolved in prestriate visual cortex (areas V2 and V3) to support the extraction of 2D shape boundaries. Processing of shape continues along a ventral route through inferior temporal (IT) cortex where a vast catalogue of 2D shape primitives is established. Each catalogue entry is size-specific (±0.5 log scale unit) and orientation-specific (±45°), but can generalize over position (±150 deg²). Several shape components are used to activate representations of the approximate appearance of one object type at one view, orientation and size. Subsequent generalization, first over orientation and size, then over view, and finally over object sub-component, is achieved in the anterior temporal cortex by combining descriptions of the same object from different orientations and views, through associative learning. This scheme provides a route to 3D object recognition through 2D shape description and reduces the problem of perceptual invariance to a series of independent analyses with an associative link established between the outputs. The system relies on parallel processing, with computations performed in a series of hierarchical steps and relatively simple operations at each stage.
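The tolerance figures quoted above lend themselves to a toy formalization. The Python sketch below treats each catalogue entry as a template with the stated ±0.5 log-unit size and ±45° orientation tolerances, and a later stage that generalizes by pooling (OR-ing) entries associatively linked as the same object; all names and structure here are our own illustration, not the model's published implementation.

from dataclasses import dataclass

@dataclass
class CatalogueEntry:
    """A 2D shape primitive tuned to one size and orientation."""
    log_size: float      # log10 of preferred size
    orientation: float   # preferred orientation, degrees

    def responds(self, log_size, orientation):
        ok_size = abs(log_size - self.log_size) <= 0.5   # +/- 0.5 log unit
        d = abs(orientation - self.orientation) % 360
        ok_orient = min(d, 360 - d) <= 45.0              # +/- 45 degrees
        return ok_size and ok_orient

def pooled_response(linked_entries, log_size, orientation):
    """A later stage generalizes over size, orientation and view by
    pooling entries that associative learning has linked as
    descriptions of the same object."""
    return any(e.responds(log_size, orientation) for e in linked_entries)

The design choice this illustrates is the abstract's central one: invariance is not computed in a single step but assembled from many narrowly tuned detectors whose outputs are merely associated.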
Article
Although it is generally assumed that vision is orientation invariant, that is, that shapes can be recognized regardless of viewing angle, there is little evidence that speaks directly to this issue, and what evidence there is fails to support orientation invariance. We propose an explanation for the previous results in terms of the kinds of shape primitives used by the visual system in achieving orientation invariance: Whereas contours are used at stages of vision that are not orientation invariant, surfaces and/or volumes are used at stages of vision that are orientation invariant. The stimuli in previously reported studies were wire forms, which can be represented only in terms of contour. In four experiments, testing both short-term and long-term memory for shape, we replicated the previous failures of orientation invariance using wire forms, but found relatively good or perfect orientation invariance with equivalently shaped surfaces.
Article
The human visual process can be studied by examining the computational problems associated with deriving useful information from retinal images. In this paper, we apply this approach to the problem of representing three-dimensional shapes for the purpose of recognition.
1. Three criteria (accessibility; scope and uniqueness; stability and sensitivity) are presented for judging the usefulness of a representation for shape recognition.
2. Three aspects of a representation's design are considered: (i) the representation's coordinate system, (ii) its primitives, which are the primary units of shape information used in the representation, and (iii) the organization the representation imposes on the information in its descriptions.
3. In terms of these design issues and the criteria presented, a shape representation for recognition should: (i) use an object-centred coordinate system, (ii) include volumetric primitives of varied sizes, and (iii) have a modular organization. A representation based on a shape's natural axes (for example, the axes identified by a stick figure) follows directly from these choices.
4. The basic process for deriving a shape description in this representation must involve: (i) a means for identifying the natural axes of a shape in its image and (ii) a mechanism for transforming viewer-centred axis specifications to specifications in an object-centred coordinate system.
5. Shape recognition involves: (i) a collection of stored shape descriptions, and (ii) various indexes into the collection that allow a newly derived description to be associated with an appropriate stored description. The most important of these indexes allows shape recognition to proceed conservatively from the general to the specific, based on the specificity of the information available from the image.
6. New constraints supplied by a conservative recognition process can be used to extract more information from the image. A relaxation process for carrying out this constraint analysis is described.
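Point 4(ii), the transformation from viewer-centred to object-centred axis specifications, is the one step concrete enough to sketch. The Python fragment below assumes the shape's natural axis has already been identified in viewer coordinates (the hard part, per point 4(i)) and merely re-expresses points in an axis-aligned frame; the rotation about the axis itself is left arbitrary here, whereas a full implementation would anchor it to a second feature of the shape. Names and the use of numpy are our own choices.

import numpy as np

def object_centred(points_viewer, axis_viewer, origin_viewer):
    """Re-express viewer-centred 3D points in a coordinate frame whose
    z-axis is the shape's natural axis, so the description no longer
    depends on viewpoint (up to rotation about that axis)."""
    z = axis_viewer / np.linalg.norm(axis_viewer)
    # Any vector not parallel to the axis yields a perpendicular direction.
    helper = np.array([1.0, 0.0, 0.0])
    if abs(np.dot(helper, z)) > 0.9:
        helper = np.array([0.0, 1.0, 0.0])
    x = np.cross(helper, z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    R = np.stack([x, y, z])              # rows: object-centred basis
    return (np.asarray(points_viewer) - origin_viewer) @ R.T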
Article
Rhesus monkeys were trained on a variety of simultaneous two-choice visual discrimination tasks to assess their ability to utilize pictures of other rhesus monkey faces as discriminative stimuli. The results revealed that this non-human primate is particularly adept at making such discriminations and is not confused by manipulations of orientation, posture, size, color, or illumination.
Article
Visual receptive fields and responsiveness of neurons to somesthetic and auditory stimuli were studied in the inferior temporal cortex and adjacent regions of immobilized macaques. Neurons throughout cytoarchitectonic area TE were responsive only to visual stimuli and had large receptive fields that almost always included the center of gaze and usually extended into both visual half-fields. There was no indication of any visuotopic organization within area TE. Neurons in an anterior and in a dorsal portion of TE tended to have larger receptive fields. By contrast, dorsal, ventral and anterior to area TE, units often responded to somesthetic and auditory as well as to visual stimuli. In these regions visual receptive fields were even larger than in TE and often included the entire visual field. Posterior to TE the neurons were exclusively visual and had much smaller receptive fields that were confined to the contralateral visual field and were topographically organized.
Article
Most neurons in the inferior temporal cortex of the rhesus monkeys have visual receptive fields that extend across the vertical meridian well into both the contralateral and ipsilateral visual half-fields. We examined the role of different portions of the forebrain commissures in providing the ipsilateral input with the following results. (1) Combined section of the splenium and anterior commissure eliminated visual activation from the ipsilateral visual half-field. (2) Section of the splenium, with sparing of the anterior commissure, reduced the incidence of ipsilateral activation by about one-half. (3) Section of the anterior commissure, with sparing of the splenium, did not alter the incidence of ipsilateral activation. (4) Section of the non-splenial portions of the corpus callosum had no effect on the laterality of the receptive fields. Thus, both the splenium and the anterior commissure but not the non-splenial callosum can provide information from the ipsilateral visual field to neurons in inferior temporal cortex. These results are interpreted as suggesting that the converging input from the two visual half-fields onto single inferior temporal neurons provided by the forebrain commissures may mediate interhemispheric transfer of visual habits.
Article
A cyto- and myeloarchitectonic parcellation of the superior temporal sulcus and surrounding cortex in the rhesus monkey has been correlated with the pattern of afferent cortical connections from ipsilateral temporal, parietal and occipital lobes, studied by both silver impregnation and autoradiographic techniques. The results suggest a definite organization of this region. Subdivisions of the superior temporal gyrus are tied together in a precise sequence of connections beginning in primary auditory cortex. The inferotemporal area, which receives input from the lateral peristriate region, can also be divided into architectonic divisions, each of which is related to the others in a specific pattern of connections. Within the superior temporal sulcus several distinct areas exist. In the caudal reaches is found a region that receives input from both primary visual and visual association cortices. This zone is similar to the Clare-Bishop area of the cat. Other superior temporal sulcus zones receive input primarily from one limited area of association cortex. A strip in the upper bank receives input exclusively from the superior temporal gyrus. An area in the rostral lower bank has afferent connections mainly with the inferotemporal area, and a zone in the depth of the superior temporal sulcus receives fibers from a region within the lower bank of the intraparietal sulcus. Two additional zones, in the upper bank of the superior temporal sulcus, however, have multiple sources of cortical input: the peristriate belt, inferior parietal lobule and caudal superior temporal gyrus.
Article
There is evidence that the inferotemporal visual cortex in the monkey projects to the amygdala, and evidence that damage to this region impairs the learning of associations between visual stimuli and reward or punishment. Recordings were therefore made in the amygdala to determine whether visual responses occur there and, if so, how they are affected by the significance of the visual stimuli. Neurons were found in the dorsolateral part of the amygdala with visual responses which in most cases were sustained while the animal looked at effective visual stimuli. The latency of the responses was 100 to 140 ms or more. The majority (85%) of these neurons responded more strongly to some stimuli than to others, but physical factors which accounted for the responses of the neurons, such as shape, size, orientation, color, or texture, could not usually be identified. Although 22 (19.5%) of these neurons responded primarily to food objects, the responses were not uniquely food-related. Furthermore, although some neurons responded in a visual discrimination test to a visual stimulus which indicated reward, and not to a visual stimulus which indicated saline, only minor modifications of the magnitude of the neuronal responses to the stimuli were obtained when the reward-related significance of the stimuli was reversed. The visual responses of these amygdaloid neurons were thus intermediate in some respects between those of neurons in the inferotemporal cortex, which are not affected by the significance of visual stimuli, and those of neurons in a region to which the amygdala projects, the lateral hypothalamus and substantia innominata, where neurons respond to visual stimuli associated with food reward.
Article
Conducted 5 experiments, using 4 pigeons in each. After being trained on an oak leaf pattern, Ss responded to other oak leaf patterns but not to leaf patterns of other species. Thus, graphic variation among the instances of a species is "transparent" to the visual system. At this taxonomic level, concept formation is spontaneous rather than inductive. It is argued that such immediate generalization may be critical to the survival of the organism.
Article
Perception of mirror images by three- to four-month-old infants was studied in five experiments using habituation paradigms. In the first experiment, babies discriminated right profiles of two different faces but not the left and right profiles of the same face. In the second, babies discriminated a 45° oblique from a vertical line, but not the oblique from its mirror image. In the third, babies discriminated oblique lines that differed by 50° and were not mirror images. In the final experiments, 90° rotations of a ⊏-shape were discriminated but not 180° rotations that formed lateral or vertical mirror images. These results demonstrated that although babies were able to discriminate differences in orientation (even among obliques), they tended to view mirror images, especially lateral mirror images, as equivalent stimuli. We propose that the perceptual equivalence of mirror images reflects an adaptive mode of visual processing; mirror images in nature are almost always aspects of the same object, and they usually need not be discriminated. The relations of the perceptual similarity of mirror images to the ontogeny of the object concept and to the development of reading are discussed.
Article
The temporal neocortical afferent connections to the amygdala were investigated in the rhesus monkey using the silver impregnation and autoradiographic tracing methods. A large topographically organized projection to the amygdala was found to originate from the anterior superior temporal gyrus (area TA), the anterior middle and inferior temporal gyri (area TE), and the medial and lateral aspects of the temporal pole (area TG). These projections terminate in discrete adjacent regions of the lateral and basal amygdaloid nuclei. The temporal pole projection terminates in the ventral two thirds of the medial one half of the lateral nucleus and in the accessory basal nucleus, the anterior superior temporal gyrus projection terminates in the ventral two thirds of the lateral one half of the lateral nucleus, and the anterior middle and inferior temporal gyri projection terminates in the dorsal parts of the lateral and lateral basal nuclei. A projection from the perirhinal cortex to the medial basal nucleus of the amygdala is also discussed. Our findings reveal that a far greater proportion of the temporal neocortex than previously described contributes afferents to the amygdala, further strengthening the view that the amygdala occupies a key anatomical position linking the neocortex with diencephalic structures.
Article
Three patients with prosopagnosia are described, of whom two had right occipital lesions. An analysis of visual and perceptual functions demonstrated a defect in perceptual classification which appeared to be stimulus-specific. A special mechanism for facial recognition is postulated, and the importance of the right-sided posterior lesion is stressed.
Article
An attempt has been made to elucidate in the rhesus monkey the role of intrahemispheric cortico-cortical connexions in the visual guidance of relatively independent hand and finger movements, which are governed mainly from the precentral motor cortex. These movements were tested by requiring the animals to retrieve with their fingers small food pellets from a special test board in which the pellets were easily visible but were more difficult to palpate. Unilateral occipital lobectomy combined with a commissurotomy impaired the performance of the contralateral hand. The same was true for the parietal leucotomy of Myers et al. (1962), which transects the bulk of the intrahemispheric occipitofrontal cortical fibres. Tests of the visual discrimination of the leucotomized hemisphere showed that the motor impairment after this leucotomy did not represent a visual defect. In control animals no impairment was found after ablation of the cortex on the surface of the postcentral gyrus and the superior parietal lobule. A mild impairment occurred, however, when the lesion either involved also the inferior parietal lobule or was accompanied by a white matter infarct deep under the postcentral gyrus. The findings make it likely that the intrahemispheric cortical fibres to the frontal lobe play a role in the visual guidance of relatively independent hand and finger movements. This conclusion is also supported by some preliminary findings after frontal lobe lesions, but further experiments are necessary to establish it firmly.
Article
We report four experiments that investigated the representation of novel three-dimensional (3D) objects by the human visual system. In the first experiment, canonical views were demonstrated for novel objects seen equally often from all test viewpoints. The next two experiments showed that the canonical views persisted under repeated testing, and in the presence of a variety of depth cues, including binocular stereo. The fourth experiment probed the ability of subjects to generalize recognition to unfamiliar views of objects previously seen at a limited range of attitudes. Both mono and stereo conditions yielded the same increase in the error rate with misorientation relative to the training attitude. Taken together, these results support the notion that 3D objects are represented by multiple specific views, possibly augmented by partial viewer-centered 3D information.
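One concrete reading of "multiple specific views," consistent with these results though not spelled out in the abstract itself, is a graded match to the nearest stored view. In the Python sketch below, each stored attitude contributes a Gaussian-tuned response, so recognition strength falls off smoothly with misorientation from the training attitudes, as the error rates here did; sigma is an illustrative tuning width of our choosing, not a fitted value.

import math

def view_match(test_attitude, stored_attitudes, sigma=20.0):
    """Recognition strength under a multiple-views account: the best
    Gaussian-tuned match among the stored views. Performance degrades
    smoothly as the test attitude departs from all training attitudes."""
    def dist(a, b):
        d = abs(a - b) % 360
        return min(d, 360 - d)
    return max(math.exp(-(dist(test_attitude, s) ** 2) / (2 * sigma ** 2))
               for s in stored_attitudes)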
Article
The ways in which information about faces is represented and stored in the temporal lobe visual areas of primates, as shown by recordings from single neurons in macaques, are considered. Some neurons that respond primarily to faces are found in the cortex in the anterior part of the superior temporal sulcus (in which neurons are especially likely to be tuned to facial expression and to face movement involved in gesture), and in the TE areas more ventrally forming the inferior temporal gyrus (in which neurons are more likely to have responses related to the identity of faces). Quantitative studies of the responses of the neurons that respond differently to the faces of different individuals show that information about the identity of the individual is represented by the responses of a population of neurons, that is, ensemble encoding rather than 'grandmother cell' encoding is used. It is argued that this type of tuning is a delicate compromise between very fine tuning, which has the advantage of low interference in neuronal network operations but the disadvantage of losing the useful properties (such as generalization, completion and graceful degradation) of storage in neuronal networks, and broad tuning, which has the advantage of allowing these properties of neuronal networks to be realized but the disadvantage of leading to interference between the different memories stored in an associative network. There is evidence that the responses of some of these neurons are altered by experience so that new stimuli become incorporated in the network. It is shown that the representation that is built in temporal cortical areas shows considerable invariance for size, contrast, spatial frequency and translation. Thus the representation is in a form which is particularly useful for storage and as an output from the visual system. It is also shown that one of the representations that is built is object based, which is suitable for recognition and as an input to associative memory, and that another is viewer centred, which is appropriate for conveying information about gesture. Ways are considered in which such cortical representations might be built by competitive self-organization aided by back projections in the multi-stage cortical processing hierarchy which has convergence from stage to stage.
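The contrast between ensemble and "grandmother cell" encoding, and the graceful degradation claimed for the former, can be made concrete with a toy population code. The Python sketch below is entirely hypothetical in its numbers: each of 10 face identities evokes graded firing across 50 broadly tuned neurons, identity is read out by correlating a noisy population vector against the stored mean responses, and the readout survives silencing a large fraction of the cells, which a one-neuron-per-face code would not.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ensemble code: 50 broadly tuned neurons with graded
# firing rates to 10 face identities (no dedicated cell per face).
rates = rng.gamma(shape=2.0, scale=5.0, size=(10, 50))

def decode(population_response, templates):
    """Read out identity by correlating the observed population vector
    with the stored mean response vector for each identity."""
    sims = [np.corrcoef(population_response, t)[0, 1] for t in templates]
    return int(np.argmax(sims))

# Graceful degradation: even after losing 40% of the neurons, the
# surviving population vector usually still picks out the right face.
identity = 3
response = rates[identity] + rng.normal(0.0, 2.0, size=50)  # one noisy trial
alive = rng.random(50) >= 0.4                               # random cell loss
print(decode(response[alive], rates[:, alive]))             # typically 3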