Article

Preparatory attention incorporates contextual expectations

Authors: Surya Gayet, Marius V. Peelen

Abstract

Humans are remarkably proficient at finding objects within complex visual scenes. According to current theories of attention [1–3], visual processing of an object of interest is favored through the preparatory activation of object-specific representations in visual cortex [4–15]. One key problem that is inherent to real-world visual search but is not accounted for by current theories is that a given object will produce a dramatically different retinal image depending on its location, which is unknown in advance. For instance, the color of the retinal image depends on the illumination on the object, its shape depends on the viewpoint, and (most critically) its size can vary by several orders of magnitude, depending on the distance to the observer. In order to benefit search, preparatory activity thus needs to incorporate contextual expectations. In the current study, we measured fMRI blood-oxygen-level-dependent (BOLD) activity in human observers while they prepared to search for objects at different distances in indoor-scene photographs. First, we established that observers instantiated preparatory object representations: activity patterns in object-selective cortex evoked during search preparation (while no objects were presented) resembled activity patterns evoked by viewing those objects in isolation. Second, we demonstrated that these preparatory object representations were systematically modulated by expectations derived from scene context: activity patterns reflected the predicted retinal image of the object at each distance (i.e., distant search evoking smaller object representations and nearby search evoking larger object representations). These findings reconcile current theories of attentional selection with the challenges of real-world vision.
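The core analysis logic described in the abstract, comparing preparatory activity patterns (recorded while no object is on screen) against patterns evoked by viewing the same objects in isolation, can be illustrated with a minimal pattern-similarity sketch. The toy data, variable names, and noise levels below are hypothetical and do not reproduce the authors' pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels = 200

# Hypothetical voxel patterns evoked by viewing each object in isolation.
viewing = {
    "small_object": rng.normal(size=n_voxels),
    "large_object": rng.normal(size=n_voxels),
}
# Hypothetical preparatory pattern: a noisy copy of one viewing template.
preparatory = viewing["small_object"] + rng.normal(scale=2.0, size=n_voxels)

def pattern_similarity(a, b):
    """Pearson correlation between two voxel patterns."""
    return np.corrcoef(a, b)[0, 1]

# Preparation "resembles" a template if it correlates more with that
# template than with the alternative.
sims = {name: pattern_similarity(preparatory, pat) for name, pat in viewing.items()}
print(sims, "->", max(sims, key=sims.get))
```

The same logic extends to the size manipulation reported in the paper: large and small retinal images of the same object would simply enter the template dictionary as separate entries.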


... Moreover, priming the upcoming target object with word cues or semantically congruent scenes benefits subsequent search (Malcolm & Henderson, 2009; Robbins & Hout, 2020), suggesting that observers adjust their attentional template to account for the provided context. Most specifically, we recently showed that when participants prepare to search for a target object nearby (compared to far away), patterns of neural activity emerge in the visual cortex that are similar to activity patterns evoked by viewing large (compared to small) images of this target object (Gayet & Peelen, 2022). This shows that the human visual system anticipates the size of an object depending on the viewing distance. ...
... Our conclusion stems from the observation that the category-specific effect on the target-dot report (discussed above) increased when silhouettes were of the expected size (within an experimental session) compared to the unexpected size. This is consistent with the idea that the attentional template is a visual representation of the object category that is scaled to the expected size of the target object (Gayet & Peelen, 2022). Can this finding explain how observers search for objects at different distances within a three-dimensional real-world environment? ...
... Importantly, these two aspects of the attentional template are codependent: object-selectivity is more pronounced for objects of the expected target size (Experiment 1) and size-selectivity is more pronounced for objects of the expected target shape (Experiment 2). This argues against the existence of two independent attentional templates (a size-specific template and a shape-specific template) and demonstrates that a single attentional template incorporates both category-specific shape information and context-dependent size information (see also Gayet & Peelen, 2022). The contributions of these two aspects to visual search performance seem asymmetrical, however. ...
Article
Full-text available
According to theories of visual search, observers generate a visual representation of the search target (the “attentional template”) that guides spatial attention toward target-like visual input. In real-world vision, however, objects produce vastly different visual input depending on their location: your car produces a retinal image that is 10 times smaller when it is parked 50 compared to 5 m away. Across four experiments, we investigated whether the attentional template incorporates viewing distance when observers search for familiar object categories. On each trial, participants were precued to search for a car or person in the near or far plane of an outdoor scene. In “search trials,” the scene reappeared and participants had to indicate whether the search target was present or absent. In intermixed “catch-trials,” two silhouettes were briefly presented on either side of fixation (matching the shape and/or predicted size of the search target), one of which was followed by a probe-stimulus. We found that participants were more accurate at reporting the location (Experiments 1 and 2) and orientation (Experiment 3) of probe stimuli when they were presented at the location of size-matching silhouettes. Thus, attentional templates incorporate the predicted size of an object based on the current viewing distance. This was only the case, however, when silhouettes also matched the shape of the search target (Experiment 2). We conclude that attentional templates for finding objects in scenes are shaped by a combination of category-specific attributes (shape) and context-dependent expectations about the likely appearance (size) of these objects at the current viewing location.
... Following previous work (Gayet & Peelen, 2022; Peelen & Kastner, 2011), our analyses focused on two visual cortex regions that may encode preparatory attentional templates: object-selective lateral occipital cortex (LOC) in the ventral visual stream and early visual cortex (EVC). Specifically, LOC has most consistently been implicated in visual search for real-world targets (Gayet & Peelen, 2022; Peelen & Kastner, 2011; Soon et al., 2013; van Loon et al., 2018) and may therefore also represent a guiding template for anchor objects. ...
... Our results show that preparatory activity patterns in LOC, previously implicated in encoding preparatory attentional templates for naturalistic search targets (Gayet & Peelen, 2022; Peelen & Kastner, 2011), support such searches by encoding the relevant guiding objects in the current scene context, rather than the search target per se. Importantly, in contrast to the target object in this or previous studies investigating preparatory activity, the guiding object did not have to be reported, nor was it explicitly cued. ...
Preprint
Full-text available
Efficient behavior requires the rapid attentional selection of task-relevant objects. Preparatory activity of target-selective neurons in visual cortex is thought to support attentional selection, guiding spatial attention and favoring processing of target-matching input. However, naturalistic searches are often guided by non-targets, including target-associated "anchor" objects. For instance, when looking for a pen, we may direct our attention to the office desk on which we expect to find it. Here, using fMRI and eyetracking in a context-guided search task, we tested whether preparatory activity in visual cortex reflected the target, the guiding anchor object, or both. Participants learned associations between targets and anchors, reversing across two scene contexts, before searching for these targets. Participants' first fixations were reliably guided by the associated anchor. Preparatory activity in lateral occipital cortex (LOC) and right intraparietal sulcus (IPS) represented the target-associated anchor rather than the target. These results shed light on the neural basis of context-guided search in structured environments.
... A number of studies have shown that preparatory attention to familiar objects evokes brain activity that resembles activity during perception of the same stimulus, which we refer to as a "sensory template" [8–10]. Other studies in the literature, however, showed that attention can tune neural responses to high-level information such as abstract categories [11–13], and also suggested flexible re-coding of sensory information into distinct formats during retention of mnemonic contents [14,15]. ...
... Yet in the context of other reliable decoding results, our results at least suggest that preparatory attention relies primarily on non-sensory representations. Three potential and non-mutually exclusive explanations may account for the divergence between our results and previous studies supporting a sensory template [8–10]. First, these previous studies used familiar objects (letters and natural objects and scenes), while we used a basic visual feature (moving dots). ...
... This difference parallels the distinction between attention and expectation: while attention is deployed based on the task relevance of sensory information, expectation reflects visual interpretations of stimuli due to sensory uncertainty [49–51]. It is known that the latter tends to evoke a sensory template [10,52], and that attention and expectation can have distinct effects on neural responses [53,54]. Thus, it is possible that previous work on preparatory attention introduced a component of expectation because participants likely expected to perceive the cued object on target-absent trials. ...
Article
Full-text available
Prior knowledge of behaviorally relevant information promotes preparatory attention before the appearance of stimuli. A key question is how our brain represents the attended information during preparation. A sensory template hypothesis assumes that preparatory signals evoke neural activity patterns that resemble the perception of the attended stimuli, whereas a non-sensory, abstract template hypothesis assumes that preparatory signals reflect an abstraction of the attended stimuli. To test these hypotheses, we used fMRI and multivariate analysis to characterize neural activity patterns when human participants prepared to attend to a feature and then select it from a compound stimulus. In an fMRI experiment using a basic visual feature (motion direction), we observed reliable decoding of the to-be-attended feature from the preparatory activity in both visual and frontoparietal areas. However, while the neural patterns constructed from a single-feature baseline task generalized to the activity patterns during stimulus selection, they did not generalize to the activity patterns during preparation. Our findings thus suggest that neural signals during attentional preparation are predominantly non-sensory in nature and may reflect an abstraction of the attended feature. Such a representation could provide efficient and stable guidance of attention.
... A technique called cross-decoding has been particularly useful for elucidating how mechanisms underlying high-level vision are implemented during other cognitive processes. For example, Gayet & Peelen (2022) investigated preparatory attention mechanisms in a visual search paradigm in which participants were cued to search for a melon or a box. Of key interest was the neural response to target-absent displays, where a cross-decoding approach showed that neural activation patterns within object selective cortex, but not early visual cortex, corresponded specifically to the target object that observers were holding in mind. ...
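As a hedged illustration of the cross-decoding approach described in the excerpt above, the sketch below trains a linear classifier on simulated responses to viewed objects and tests it on simulated target-absent (preparation-only) trials. All data, names ("melon", "box"), and effect sizes are assumptions; this is not the original analysis code:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n_voxels, n_trials = 100, 40

melon = rng.normal(size=n_voxels)   # simulated pattern for seeing a melon
box = rng.normal(size=n_voxels)     # simulated pattern for seeing a box

def trials(template, n):
    """Generate n noisy trials around a template pattern."""
    return template + rng.normal(scale=2.0, size=(n, n_voxels))

# Train on perception (viewing) trials...
X_train = np.vstack([trials(melon, n_trials), trials(box, n_trials)])
y = np.array([0] * n_trials + [1] * n_trials)
clf = LinearSVC().fit(X_train, y)

# ...then test on target-absent trials, where only the attentional template
# is held in mind (assumed here to be a weaker version of the same pattern).
X_test = np.vstack([trials(0.4 * melon, n_trials), trials(0.4 * box, n_trials)])
print("cross-decoding accuracy:", clf.score(X_test, y))
```

Above-chance transfer from viewing to preparation is what licenses the claim that the preparatory code is perception-like.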
... Representational modulation, by contrast, appears to involve recurrent processing within the visual system (Rajaei et al. 2019) and feedback processing from high-level frontoparietal regions that enhance or instantiate visual representations in a top-down manner (Dijkstra et al. 2017b, Keller et al. 2022). A converging line of research shows that perception-like stimulus representations can be induced in the absence of that stimulus under different circumstances, for example, in expectancy (Blom et al. 2020; Kok et al. 2014, 2017), working memory (Albers et al. 2013), attentional preparation (Gayet & Peelen 2022), imagery (Dijkstra et al. 2018, 2020), and occlusion (Teichmann et al. 2022). Efforts to understand the instantiations and changes in representational content associated with different internal states constitute an emerging field with great potential to elucidate how visual processing produces perception. ...
Article
Patterns of brain activity contain meaningful information about the perceived world. Recent decades have welcomed a new era in neural analyses, with computational techniques from machine learning applied to neural data to decode information represented in the brain. In this article, we review how decoding approaches have advanced our understanding of visual representations and discuss efforts to characterize both the complexity and the behavioral relevance of these representations. We outline the current consensus regarding the spatiotemporal structure of visual representations and review recent findings that suggest that visual representations are at once robust to perturbations, yet sensitive to different mental states. Beyond representations of the physical world, recent decoding work has shone a light on how the brain instantiates internally generated states, for example, during imagery and prediction. Going forward, decoding has remarkable potential to assess the functional relevance of visual representations for human behavior, reveal how representations change across development and during aging, and uncover their presentation in various mental disorders.
... For example, recent studies found that participants are sensitive to color statistics within specific object categories (Bahle et al., 2021), and that this sensitivity is context-dependent (Kershner & Hollingworth, 2022). In addition, relations between objects and the overall scene context (e.g., the expected retinal size of an object at a particular location) are encoded in the attentional template (Gayet & Peelen, 2022), all providing evidence for the adaptability of attentional templates. ...
Article
Full-text available
Visual search is supported by an internal representation of the target, the attentional template. However, which features are diagnostic of target presence critically depends on the distractors. Accordingly, previous research showed that consistent distractor context shapes the attentional template for simple targets, with the template emphasizing diagnostic dimensions (e.g., color or orientation) in blocks of trials. Here, we investigated how distractor expectations bias attentional templates for complex shapes, and tested whether such biases reflect intertrial priming or can be instantiated flexibly. Participants searched for novel shapes (cued by name) in two probabilistic distractor contexts: Either the target's orientation or rectilinearity was unique (80% validity). Across four experiments, performance was better when the distractor context was expected, indicating that target features in the expected diagnostic dimension were emphasized. Attentional templates were biased by distractor expectations when distractor context was blocked, also for participants reporting no awareness of the manipulation. Interestingly, attentional templates were also biased when distractor context was cued on a trial-by-trial basis, but only when the two contexts were consistently presented at distinct spatial locations. These results show that attentional templates can flexibly and adaptively incorporate expectations about target-distractor relations when looking for the same object in different contexts.
... However, the temporal order of predictive representations observed here is in line with hierarchical Bayesian brain theories such as predictive coding, which postulate that higher-level predictions act on, and therefore must precede, lower-level predictions [11,12,25]. Furthermore, such hierarchical prediction is also suggested by empirical findings from different sensory modalities [26–29] and complex contexts such as social perception and action observation [30,31]. In complex naturalistic stimuli as utilized here, one would similarly expect a causal relationship amongst the partly independent prediction streams, such that high-level prediction affects low-level prediction. ...
Article
Full-text available
Adaptive behavior such as social interaction requires our brain to predict unfolding external dynamics. While theories assume such dynamic prediction, empirical evidence is limited to static snapshots and indirect consequences of predictions. We present a dynamic extension to representational similarity analysis that uses temporally variable models to capture neural representations of unfolding events. We applied this approach to source-reconstructed magnetoencephalography (MEG) data of healthy human subjects and demonstrate both lagged and predictive neural representations of observed actions. Predictive representations exhibit a hierarchical pattern, such that high-level abstract stimulus features are predicted earlier in time, while low-level visual features are predicted closer in time to the actual sensory input. By quantifying the temporal forecast window of the brain, this approach allows investigating predictive processing of our dynamic world. It can be applied to other naturalistic stimuli (e.g., film, soundscapes, music, motor planning/execution, social interaction) and any biosignal with high temporal resolution.
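The dynamic extension to representational similarity analysis described above can be sketched, under strong simplifying assumptions, as a lagged correlation between a time-varying model RDM and the neural RDM at every temporal offset; the peak of the lag profile indicates whether a representation trails or precedes the input. Toy data only; this is not the authors' implementation:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n_cond, n_time = 8, 120
n_pairs = (n_cond * (n_cond - 1)) // 2   # vectorized upper-triangle RDM

# Hypothetical RDM time courses: (time, n_pairs).
model = rng.normal(size=(n_time, n_pairs))
# Neural data trail the model by 10 samples, i.e., the model is "predictive".
neural = np.roll(model, shift=10, axis=0) + rng.normal(size=(n_time, n_pairs))

def lagged_rsa(model, neural, max_lag=30):
    """Mean Spearman correlation of model[t] with neural[t + lag] per lag."""
    lags = np.arange(-max_lag, max_lag + 1)
    corr = []
    for lag in lags:
        if lag >= 0:
            m, n = model[: n_time - lag], neural[lag:]
        else:
            m, n = model[-lag:], neural[: n_time + lag]
        corr.append(np.mean([spearmanr(a, b)[0] for a, b in zip(m, n)]))
    return lags, np.array(corr)

lags, corr = lagged_rsa(model, neural)
print("peak lag (samples):", lags[np.argmax(corr)])   # ~ +10 here
```

A positive peak lag (model leading the data) is the signature of predictive representation the authors quantify as the brain's temporal forecast window.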
... J. Müller & Rabbitt, 1989; Posner, 1980), top-down attention is comparatively slower and requires more volitional effort on our part, hence the idiom "to pay attention" (Baluch & Itti, 2011). At the same time, prioritising what is most relevant to our real-world goals is often controlled by a dynamic interplay between bottom-up and top-down factors, wherein feature-based attentional sets (Maunsell & Treue, 2006) are subject to influence by expectations and knowledge about the world (Gayet & Peelen, 2022; Yeh & Peelen, 2022). Thus, if you lose your dog at the park, you might search for black, smallish objects, preferentially deploying this search template within the region of space where the target is most probable (e.g., on the grass, rather than in the lake). ...
Preprint
Full-text available
Observers can selectively deploy attention to regions of space, moments in time, specific visual features, and individual objects. One level higher, we can also attend to particular high-level categories – for example, when keeping an eye out for dogs while jogging. Here we exploited visual periodicity to examine how the focus of category-based attention serves to differentially enhance and suppress selective neural processing of face and non-face categories. We combined electroencephalography (EEG) with an adaptation of the established Fast Periodic Visual Stimulation (FPVS) paradigm designed to capture selective neural responses for multiple visual categories contained in the same rapid stream of object images (faces and birds in Exp 1; houses and birds in Exp 2). Our key finding was that the pattern of attentional enhancement and suppression for face-selective processing is unique compared to non-face object categories. Specifically, whereas attending to non-face objects provides a strong boost to their selective neural signals during the 300–500 ms window, attentional enhancement of face-selective processing is both earlier and comparatively more modest. Moreover, only the selective neural response for faces appears to be actively suppressed when observers attend towards an alternate visual category. These results underscore the special status that human faces hold within the human visual system, and highlight the utility of visual periodicity as a powerful tool for indexing selective neural processing of multiple visual categories contained within the same visual sequence.
... As Stewart et al. [3] note, their account joins other recent studies that have emphasized the "perspectival" character of visual perception. For example, Gayet et al. [10], reporting in Current Biology, showed that observers who expect to fixate a far-away object (versus a nearby object) anticipate its smaller two-dimensional retinal size from the perceiver's point-of-view, even when the objects have the same distal size. Similarly, Morales et al. [11] showed that observers who must locate a distally elliptical object in a search array get distracted by non-elliptical objects that have two-dimensional elliptical projections from the observer's perspective. ...
Article
Manipulating an object in one’s mind has long been thought to mirror physically manipulating that object in allocentric three-dimensional space. A new study revises and clarifies this foundational assumption, identifying a previously unknown role for the observer’s point-of-view.
... For example, the likely location of the target in a scene can be learned and used to facilitate search, both based on recent experience in controlled laboratory experiments ("contextual cueing"; Chun, 2000) and based on long-term daily-life experience (Castelhano & Krzyś, 2020; Võ et al., 2019): When searching for a computer mouse, we start searching to the right of the keyboard and below the monitor. Scene context also provides information about the features that characterize the target (Peelen & Kastner, 2014) or distinguish the target from the distractors (Geng & Witkowski, 2019): We look for a small target far away and a large target nearby (Gayet & Peelen, 2022). Finally, targets are recognized more quickly when embedded in context, reflecting the facilitatory influence of contextual expectations on object recognition (Bar, 2004; de Lange et al., 2018). ...
Article
Full-text available
Visual search is facilitated by knowledge of the relationship between the target and the distractors, including both where the target is likely to be among the distractors and how it differs from the distractors. Whether the statistical structure among distractors themselves, unrelated to target properties, facilitates search is less well understood. Here, we assessed the benefit of distractor structure using novel shapes whose relationship to each other was learned implicitly during visual search. Participants searched for target items in arrays of shapes that comprised either four pairs of co-occurring distractor shapes (structured scenes) or eight distractor shapes randomly partitioned into four pairs on each trial (unstructured scenes). Across five online experiments (N = 1,140), we found that after a period of search training, participants were more efficient when searching for targets in structured than unstructured scenes. This structure benefit emerged independently of whether the position of the shapes within each pair was fixed or variable and despite participants having no explicit knowledge of the structured pairs they had seen. These results show that implicitly learned co-occurrence statistics between distractor shapes increase search efficiency. Increased efficiency in the rejection of regularly co-occurring distractors may contribute to the efficiency of visual search in natural scenes, where such regularities are abundant.
... when searching for a computer mouse, we start searching to the right of the keyboard and below the monitor. Scene context also provides information about the features that characterize the target (Peelen & Kastner, 2014), or distinguish the target from the distractors (Geng & Witkowski, 2019): we look for a small target far away and a large target nearby (Gayet & Peelen, 2022). Finally, targets are recognized more quickly when embedded in context, reflecting the facilitatory influence of contextual expectations on object recognition (Bar, 2004; de Lange et al., 2018). ...
Preprint
Full-text available
Visual search depends on the relationship between the target and the distractors, including how the target differs from the distractors and where the target is likely to be amongst the distractors. Whether the statistical structure amongst distractors themselves facilitates search is less well understood. Here, we assessed the benefit of distractor structure using novel shapes whose relationship to each other was learned implicitly during visual search. Participants searched for target items in arrays of shapes that comprised either four pairs of co-occurring distractor shapes (structured scenes) or eight distractor shapes randomly partitioned into four pairs on each trial (unstructured scenes). Across five online experiments (N = 1140), we found that after a period of search training, participants were more efficient when searching for targets in structured than unstructured scenes. This structure benefit emerged independently of whether the position of the shapes within each pair was fixed or varied, and despite participants having no explicit knowledge of the structured pairs they had seen. These results show that learned co-occurrence statistics between distractor shapes increase search efficiency. An increased efficiency in the rejection of regular distractors may contribute to the efficiency of visual search in natural scenes, where such regularities are abundant.
... Factors biasing attention include priming (Li, Wolfe, & Chen, 2020), experience (Brockmole & Henderson, 2006; Goldfarb, Chun, & Phelps, 2016; Theeuwes, 2019; van Moorselaar, Daneshtalab, & Slagter, 2021), reward (Della Libera & Chelazzi, 2009; Failing & Theeuwes, 2018; Hickey, Chelazzi, & Theeuwes, 2010; Meyer, Sheridan, & Hopfinger, 2020; Peck, Jangraw, Suzuki, Efem, & Gottlieb, 2009), object meaning (Gayet & Peelen, 2022; Peacock, Cronin, Hayes, & Henderson, 2021) and high-level behavioral goals and motivations (Banerjee, Frey, Molholm, & Foxe, 2015; Lepsien, Thornton, & Nobre, 2011; Luck, Gaspelin, Folk, Remington, & Theeuwes, 2021; McMains & Kastner, 2011; Serences et al., 2005). When attention is voluntarily directed in the absence of explicit external cues, this has been referred to as internally-driven (Taylor, Rushworth, & Nobre, 2008) or self-initiated (Hopfinger, Camblin, & Parks, 2010) attention, or in our work as "willed attention" (Bengson, Kelley, Zhang, Wang, & Mangun, 2014; Bengson, Kelley, & Mangun, 2015; Bengson, Liu, Khodayari, & Mangun, 2020; Liu et al., 2017; Rajan et al., 2018). ...
Preprint
Full-text available
Most models of attention distinguish between voluntary and involuntary attention, the latter being driven in a bottom-up fashion by salient sensory signals. Studies of voluntary visual-spatial attention have used informational or instructional cues, such as arrows, to induce or instruct observers to direct selective attention to relevant locations in visual space in order to detect or discriminate subsequent target stimuli. In everyday vision, however, voluntary attention is influenced by a host of factors, most of which are quite different from the laboratory paradigms that utilize attention-directing cues. These factors include priming, experience, reward, meaning, motivations, and high-level behavioral goals. Attention that is endogenously directed in the absence of external cues has been referred to as self-initiated attention, or in our prior work as "willed attention". Such studies typically replace attention-directing cues with a "prompt" that signals the subject when to choose where they will attend in preparation for the upcoming target stimulus. We used a novel paradigm that was designed to minimize external influences (i.e., cues or prompts) as to where, as well as when, spatial attention would be shifted and focused. Participants were asked to view bilateral dynamic dot motion displays, and to shift their covert spatial attention to either the left or right visual field patch at a time of their own choosing, thus allowing the participants to control both when and where they attended on each trial. The task was to discriminate and respond to a pattern in the attended dot motion patch. Our goal was to identify patterns of neural activity in the scalp-recorded EEG that revealed when and where attention was focused. Using machine learning methods to decode attention-related EEG alpha band activity, we were able to identify the onset of voluntary (willed) shifts of visual-spatial attention, and to determine where attention was focused. This work contributes to our understanding of the neural antecedents of voluntary attention, opening the door for improved models of attentional control, and providing steps toward development of brain-computer interfaces using non-invasive electrical recordings of brain activity.
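A minimal sketch of the kind of alpha-band decoding pipeline described above: band-limited power per channel is extracted as the feature, and a linear classifier recovers the attended hemifield. The simulated lateralization effect, channel count, and sampling rate are illustrative assumptions, not the study's recording setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
fs, n_samp, n_chan, n_trials = 250, 500, 32, 100   # assumed 2-s epochs

def alpha_power(epoch):
    """Mean 8-12 Hz power per channel from an (n_chan, n_samp) epoch."""
    freqs = np.fft.rfftfreq(n_samp, d=1 / fs)
    spec = np.abs(np.fft.rfft(epoch, axis=1)) ** 2
    band = (freqs >= 8) & (freqs <= 12)
    return spec[:, band].mean(axis=1)

def simulate_trial(attend_left):
    """Simulate an epoch with alpha enhanced ipsilateral to the attended side."""
    epoch = rng.normal(size=(n_chan, n_samp))
    t = np.arange(n_samp) / fs
    boost = 2.0 * np.sin(2 * np.pi * 10 * t)        # 10 Hz alpha component
    chans = slice(0, 16) if attend_left else slice(16, 32)
    epoch[chans] += boost
    return alpha_power(epoch)

X = np.array([simulate_trial(i % 2 == 0) for i in range(n_trials)])
y = np.array([i % 2 for i in range(n_trials)])
clf = LogisticRegression(max_iter=1000)
print("decoding accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```

Applied in sliding windows, the same classifier output can also date the moment a self-initiated attention shift occurs, which is the "when" question the paradigm targets.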
Article
Observers can selectively deploy attention to regions of space, moments in time, specific visual features, individual objects, and even specific high-level categories—for example, when keeping an eye out for dogs while jogging. Here, we exploited visual periodicity to examine how category-based attention differentially modulates selective neural processing of face and non-face categories. We combined electroencephalography with a novel frequency-tagging paradigm capable of capturing selective neural responses for multiple visual categories contained within the same rapid image stream (faces/birds in Exp 1; houses/birds in Exp 2). We found that the pattern of attentional enhancement and suppression for face-selective processing is unique compared to other object categories: Where attending to non-face objects strongly enhances their selective neural signals during a later stage of processing (300–500 ms), attentional enhancement of face-selective processing is both earlier and comparatively more modest. Moreover, only the selective neural response for faces appears to be actively suppressed by attending towards an alternate visual category. These results underscore the special status that faces hold within the human visual system, and highlight the utility of visual periodicity as a powerful tool for indexing selective neural processing of multiple visual categories contained within the same image sequence.
Article
Multivariate pattern analysis (MVPA) has emerged as a powerful method for the analysis of functional magnetic resonance imaging, electroencephalography and magnetoencephalography data. The new approaches to experimental design and hypothesis testing afforded by MVPA have made it possible to address theories that describe cognition at the functional level. Here we review a selection of studies that have used MVPA to test cognitive theories from a range of domains, including perception, attention, memory, navigation, emotion, social cognition and motor control. This broad view reveals properties of MVPA that make it suitable for understanding the 'how' of human cognition, such as the ability to test predictions expressed at the item or event level. It also reveals limitations and points to future directions.
Article
As our viewpoint changes, the whole scene around us rotates coherently. This allows us to predict how one part of a scene (e.g., an object) will change by observing other parts (e.g., the scene background). While human object perception is known to be strongly context-dependent, previous research has largely focused on how scene context can disambiguate fixed object properties, such as identity (e.g., a car is easier to recognize on a road than on a beach). It remains an open question whether object representations are updated dynamically based on the surrounding scene context, for example across changes in viewpoint. Here, we tested whether human observers dynamically and automatically predict the appearance of objects based on the orientation of the background scene. In three behavioral experiments (N = 152), we temporarily occluded objects within scenes that rotated. Upon the objects' reappearance, participants had to perform a perceptual discrimination task, which did not require taking the scene rotation into account. Performance on this orthogonal task strongly depended on whether objects reappeared rotated coherently with the surrounding scene or not. This effect persisted even when a majority of trials violated this real-world contingency between scene and object, showcasing the automaticity of these scene-based predictions. These findings indicate that contextual information plays an important role in predicting object transformations in structured real-world environments.
Article
Full-text available
Studies of voluntary visual spatial attention have used attention-directing cues, such as arrows, to induce or instruct observers to focus selective attention on relevant locations in visual space to detect or discriminate subsequent target stimuli. In everyday vision, however, voluntary attention is influenced by a host of factors, most of which are quite different from the laboratory paradigms that use attention-directing cues. These factors include priming, experience, reward, meaning, motivations, and high-level behavioral goals. Attention that is endogenously directed in the absence of external attention-directing cues has been referred to as “self-initiated attention” or, as in our prior work, as “willed attention” where volunteers decide where to attend in response to a prompt to do so. Here, we used a novel paradigm that eliminated external influences (i.e., attention-directing cues and prompts) about where and/or when spatial attention should be directed. Using machine learning decoding methods, we showed that the well known lateralization of EEG alpha power during spatial attention was also present during purely self-generated attention. By eliminating explicit cues or prompts that affect the allocation of voluntary attention, this work advances our understanding of the neural correlates of attentional control and provides steps toward the development of EEG-based brain–computer interfaces that tap into human intentions.
Preprint
Full-text available
Adaptive behavior requires our brain to predict unfolding external dynamics. While theories assume such dynamic prediction, empirical evidence is limited to static snapshots and indirect consequences of predictions. We present a dynamic extension to representational similarity analysis that uses temporally variable models to capture neural representations of unfolding events. We applied this approach to source-reconstructed MEG data of healthy human subjects, and demonstrate predictive neural representations of future posture and motion of observed actions. Predictive representations exhibit a hierarchical pattern, such that high-level abstract stimulus features are predicted earlier, while low-level visual features are predicted closer in time to the actual sensory input. This new approach offers a window into predictive processing of dynamic events, and can be applied to other sensory modalities and biosignals. One-Sentence Summary: Dynamic representational similarity analysis unveils how and when our brain represents and predicts dynamics of the world.
Article
Full-text available
Used 3 converging procedures to determine whether pictures presented in a rapid sequence at rates comparable to eye fixations are understood and then quickly forgotten. In 2 experiments, with 96 and 16 college students, respectively, sequences of 16 color photographs were presented at rates of 113, 167, 250, or 333 msec/picture. In 1 group, Ss were given an immediate test of recognition memory for the pictures and in other groups they searched for a target picture. Even when the target had only been specified by a title (e.g., a boat), detection of a target was strikingly superior to recognition memory. Detection was slightly but significantly better for pictured than named targets. In Exp III, with 8 college students, pictures were presented for 50, 70, 90, or 120 msec preceded and followed by a visual mask; at 120 msec recognition memory was as accurate as detection had been. Results, taken together with those of M. C. Potter and E. I. Levy for slower rates of sequential presentation, suggest that on the average a scene is understood and so becomes immune to ordinary visual masking within about 100 msec but requires about 300 msec of further processing before the memory representation is resistant to conceptual masking from a following picture. Possible functions of a short-term conceptual memory (e.g., the control of eye fixations) are discussed.
Article
Full-text available
Arguably the most foundational principle in perception research is that our experience of the world goes beyond the retinal image; we perceive the distal environment itself, not the proximal stimulation it causes. Shape may be the paradigm case of such “unconscious inference”: When a coin is rotated in depth, we infer the circular object it truly is, discarding the perspectival ellipse projected on our eyes. But is this really the fate of such perspectival shapes? Or does a tilted coin retain an elliptical appearance even when we know it’s circular? This question has generated heated debate from Locke and Hume to the present; but whereas extant arguments rely primarily on introspection, this problem is also open to empirical test. If tilted coins bear a representational similarity to elliptical objects, then a circular coin should, when rotated, impair search for a distal ellipse. Here, nine experiments demonstrate that this is so, suggesting that perspectival shapes persist in the mind far longer than traditionally assumed. Subjects saw search arrays of three-dimensional “coins,” and simply had to locate a distally elliptical coin. Surprisingly, rotated circular coins slowed search for elliptical targets, even when subjects clearly knew the rotated coins were circular. This pattern arose with static and dynamic cues, couldn’t be explained by strategic responding or unfamiliarity, generalized across shape classes, and occurred even with sustained viewing. Finally, these effects extended beyond artificial displays to real-world objects viewed in naturalistic, full-cue conditions. We conclude that objects have a remarkably persistent dual character: their objective shape “out there,” and their perspectival shape “from here.”
Article
Full-text available
Neural activation in the early visual cortex (EVC) reflects the perceived rather than retinal size of stimuli, suggesting that feedback possibly from extrastriate regions modulates retinal size information in EVC. Meanwhile, the lateral occipital cortex (LOC) has been suggested to be critically involved in object size processing. To test for the potential contributions of feedback modulations on size representations in EVC, we investigated the dynamics of relevant processes using transcranial magnetic stimulation (TMS). Specifically, we briefly disrupted the neural activity of EVC and LOC at early, intermediate, and late time windows while participants performed size judgement tasks in either an illusory or neutral context. TMS over EVC and LOC allowed determining whether these two brain regions are relevant for generating phenomenological size impressions. Furthermore, the temporal order of TMS effects allowed inferences on the dynamics of information exchange between the two areas. Particularly, if feedback signals from LOC to EVC are crucial for generating altered size representations in EVC, then TMS effects over EVC should be observed simultaneously or later than the effects following LOC stimulation. The data from 20 humans (13 females) revealed that TMS over both EVC and LOC impaired illusory size perception. However, the strongest effects of TMS applied over EVC occurred later than those of LOC, supporting a functionally relevant feedback modulation from LOC to EVC for scaling size information. Our results suggest that context integration and the concomitant change of perceived size require LOC and result in modulating representations in EVC via recurrent processing.
Article
Full-text available
Memories are about the past, but they serve the future. Memory research often emphasizes the former aspect: focusing on the functions that re-constitute (re-member) experience and elucidating the various types of memories and their interrelations, timescales, and neural bases. Here we highlight the prospective nature of memory in guiding selective attention, focusing on functions that use previous experience to anticipate the relevant events about to unfold-to "premember" experience. Memories of various types and timescales play a fundamental role in guiding perception and performance adaptively, proactively, and dynamically. Consonant with this perspective, memories are often recorded according to expected future demands. Using working memory as an example, we consider how mnemonic content is selected and represented for future use. This perspective moves away from the traditional representational account of memory toward a functional account in which forward-looking memory traces are informationally and computationally tuned for interacting with incoming sensory signals to guide adaptive behavior.
Article
Full-text available
When searching for relevant objects in our environment (say, an apple), we create a memory template (a red sphere), which causes our visual system to favor template-matching visual input (applelike objects) at the expense of template-mismatching visual input (e.g., leaves). Although this principle seems straightforward in a lab setting, it poses a problem in naturalistic viewing: Two objects that have the same size on the retina will differ in real-world size if one is nearby and the other is far away. Using the Ponzo illusion to manipulate perceived size while keeping retinal size constant, we demonstrated across 71 participants that visual objects attract attention when their perceived size matches a memory template, compared with mismatching objects that have the same size on the retina. This shows that memory templates affect visual selection after object representations are modulated by scene context, thus providing a working mechanism for template-based search in naturalistic vision.
Article
Full-text available
Humans and many animals make frequent saccades requiring coordinated movements of the eyes. When landing on the new fixation point, the eyes must converge accurately or double images will be perceived. We asked whether the visual system uses statistical regularities in the natural environment to aid eye alignment at the end of saccades. We measured the distribution of naturally occurring disparities in different parts of the visual field. The central tendency of the distributions was crossed (nearer than fixation) in the lower field and uncrossed (farther) in the upper field in male and female participants. It was uncrossed in the left and right fields. We also measured horizontal vergence after completion of vertical, horizontal, and oblique saccades. When the eyes first landed near the eccentric target, vergence was quite consistent with the natural-disparity distribution. For example, when making an upward saccade, the eyes diverged to be aligned with the most probable uncrossed disparity in that part of the visual field. Likewise, when making a downward saccade, the eyes converged to enable alignment with crossed disparity in that part of the field. Our results show that rapid binocular eye movements are adapted to the statistics of the 3D environment, minimizing the need for large corrective vergence movements at the end of saccades. The results are relevant to the debate about whether eye movements are derived from separate saccadic and vergence neural commands that control both eyes or from separate monocular commands that control the eyes independently.
Article
Full-text available
Recent years have seen an increase in the popularity of multivariate pattern (MVP) analysis of functional magnetic resonance (fMRI) data, and, to a much lesser extent, magneto- and electro-encephalography (M/EEG) data. We present CoSMoMVPA, a lightweight MVPA (MVP analysis) toolbox implemented in the intersection of the Matlab and GNU Octave languages, that treats both fMRI and M/EEG data as first-class citizens. CoSMoMVPA supports all state-of-the-art MVP analysis techniques, including searchlight analyses, classification, correlations, representational similarity analysis, and the time generalization method. These can be used to address both data-driven and hypothesis-driven questions about neural organization and representations, both within and across: space, time, frequency bands, neuroimaging modalities, individuals, and species. It uses a uniform data representation of fMRI data in the volume or on the surface, and of M/EEG data at the sensor and source level. Through various external toolboxes, it directly supports reading and writing a variety of fMRI and M/EEG neuroimaging formats, and, where applicable, can convert between them. As a result, it can be integrated readily in existing pipelines and used with existing preprocessed datasets. CoSMoMVPA overloads the traditional volumetric searchlight concept to support neighborhoods for M/EEG and surface-based fMRI data, which supports localization of multivariate effects of interest across space, time, and frequency dimensions. CoSMoMVPA also provides a generalized approach to multiple comparison correction across these dimensions using Threshold-Free Cluster Enhancement with state-of-the-art clustering and permutation techniques. CoSMoMVPA is highly modular and uses abstractions to provide a uniform interface for a variety of MVP measures. Typical analyses require a few lines of code, making it accessible to beginner users. At the same time, expert programmers can easily extend its functionality. CoSMoMVPA comes with extensive documentation, including a variety of runnable demonstration scripts and analysis exercises (with example data and solutions). It uses best software engineering practices including version control, distributed development, an automated test suite, and continuous integration testing. It can be used with the proprietary Matlab and the free GNU Octave software, and it complies with open source distribution platforms such as NeuroDebian. CoSMoMVPA is Free/Open Source Software under the permissive MIT license. Website: http://cosmomvpa.org Source code: https://github.com/CoSMoMVPA/CoSMoMVPA
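CoSMoMVPA itself runs in Matlab/GNU Octave, so the following Python sketch only illustrates the searchlight concept the toolbox generalizes (a cross-validated classifier evaluated within a small neighborhood around every location); it does not use CoSMoMVPA's API, and all data and parameters are assumptions:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
shape, n_trials = (8, 8, 8), 40                        # toy 8x8x8 "volume"
coords = np.array(np.unravel_index(np.arange(np.prod(shape)), shape)).T
data = rng.normal(size=(n_trials, coords.shape[0]))    # trials x voxels
labels = np.repeat([0, 1], n_trials // 2)
data[labels == 1, :50] += 0.8    # plant signal in the first ~50 voxels

def searchlight(data, labels, coords, radius=2.0):
    """Cross-validated accuracy per center voxel over spherical neighborhoods."""
    acc = np.zeros(coords.shape[0])
    for i, center in enumerate(coords):
        nbrs = np.where(np.linalg.norm(coords - center, axis=1) <= radius)[0]
        acc[i] = cross_val_score(LinearSVC(), data[:, nbrs], labels, cv=4).mean()
    return acc

acc_map = searchlight(data, labels, coords)
print("peak searchlight accuracy:", acc_map.max())
```

CoSMoMVPA's contribution is to overload this neighborhood notion so the same loop runs over space, time, frequency bands, or sensors, with permutation-based multiple-comparison correction on the resulting map.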
Article
Full-text available
Humans and many animals have forward-facing eyes providing different views of the environment. Precise depth estimates can be derived from the resulting binocular disparities, but determining which parts of the two retinal images correspond to one another is computationally challenging. To aid the computation, the visual system focuses the search on a small range of disparities. We asked whether the disparities encountered in the natural environment match that range. We did this by simultaneously measuring binocular eye position and three-dimensional scene geometry during natural tasks. The natural distribution of disparities is indeed matched to the smaller range of correspondence search. Furthermore, the distribution explains the perception of some ambiguous stereograms. Finally, disparity preferences of macaque cortical neurons are consistent with the natural distribution.
Article
Full-text available
Size constancy is the result of cognitive scaling operations that enable us to perceive an object as having the same size when presented at different viewing distances. In this article, we review the literature on size and distance perception to form an overarching synthesis of how the brain might combine retinal images and distance cues of retinal and extra-retinal origin to produce a perceptual visual experience of a world where objects have a constant size. A convergence of evidence from visual psychophysics, neurophysiology, neuropsychology, electrophysiology and neuroimaging highlight the primary visual cortex (V1) as an important node in mediating size–distance scaling. It is now evident that this brain area is involved in the integration of multiple signals for the purposes of size perception and does much more than fulfil the role of an entry position in a series of hierarchical cortical events. We also discuss how information from other sensory modalities can also contribute to size–distance scaling and shape our perceptual visual experience.
Article
Full-text available
The multivariate analysis of brain signals has recently sparked a great amount of interest, yet accessible and versatile tools to carry out decoding analyses are scarce. Here we introduce The Decoding Toolbox (TDT) which represents a user-friendly, powerful and flexible package for multivariate analysis of functional brain imaging data. TDT is written in Matlab and equipped with an interface to the widely used brain data analysis package SPM. The toolbox allows running fast whole-brain analyses, region-of-interest analyses and searchlight analyses, using machine learning classifiers, pattern correlation analysis, or representational similarity analysis. It offers automatic creation and visualization of diverse cross-validation schemes, feature scaling, nested parameter selection, a variety of feature selection methods, multiclass capabilities, and pattern reconstruction from classifier weights. While basic users can implement a generic analysis in one line of code, advanced users can extend the toolbox to their needs or exploit the structure to combine it with external high-performance classification toolboxes. The toolbox comes with an example data set which can be used to try out the various analysis methods. Taken together, TDT offers a promising option for researchers who want to employ multivariate analyses of brain activity patterns.
Article
Full-text available
Sensory processing is strongly influenced by prior expectations. Valid expectations have been shown to lead to improvements in perception as well as in the quality of sensory representations in primary visual cortex. However, very little is known about the neural correlates of the expectations themselves. Previous studies have demonstrated increased activity in sensory cortex following the omission of an expected stimulus, yet it is unclear whether this increased activity constitutes a general surprise signal or rather has representational content. One intriguing possibility is that top-down expectation leads to the formation of a template of the expected stimulus in visual cortex, which can then be compared with subsequent bottom-up input. To test this hypothesis, we used fMRI to noninvasively measure neural activity patterns in early visual cortex of human participants during expected but omitted visual stimuli. Our results show that prior expectation of a specific visual stimulus evokes a feature-specific pattern of activity in the primary visual cortex (V1) similar to that evoked by the corresponding actual stimulus. These results are in line with the notion that prior expectation triggers the formation of specific stimulus templates to efficiently process expected sensory inputs.
Article
Full-text available
Perception is strongly influenced by expectations. Accordingly, perception has sometimes been cast as a process of inference, whereby sensory inputs are combined with prior knowledge. However, despite a wealth of behavioral literature supporting an account of perception as probabilistic inference, the neural mechanisms underlying this process remain largely unknown. One important question is whether top-down expectation biases stimulus representations in early sensory cortex, i.e., whether the integration of prior knowledge and bottom-up inputs is already observable at the earliest levels of sensory processing. Alternatively, early sensory processing may be unaffected by top-down expectations, and integration of prior knowledge and bottom-up input may take place in downstream association areas that are proposed to be involved in perceptual decision-making. Here, we implicitly manipulated human subjects' prior expectations about visual motion stimuli, and probed the effects on both perception and sensory representations in visual cortex. To this end, we measured neural activity noninvasively using functional magnetic resonance imaging, and applied a forward modeling approach to reconstruct the motion direction of the perceived stimuli from the signal in visual cortex. Our results show that top-down expectations bias representations in visual cortex, demonstrating that the integration of prior information and sensory input is reflected at the earliest stages of sensory processing.
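The forward modeling approach referred to here can be sketched as a two-step encoding/inversion procedure: hypothetical direction-tuned channels are first fit to voxel responses, and the fitted weights are then inverted on held-out data to reconstruct the represented motion direction. Everything below (basis functions, noise levels, dimensions) is an assumption for illustration, not the study's implementation:

```python
import numpy as np

rng = np.random.default_rng(5)
n_chan, n_voxels, n_trials = 8, 60, 160
centers = np.arange(0, 360, 360 / n_chan)   # channel preferred directions

def channel_responses(directions):
    """Half-wave-rectified cosine tuning (raised to a power) per channel."""
    d = np.deg2rad(directions[:, None] - centers[None, :])
    return np.maximum(np.cos(d / 2), 0) ** 6          # trials x channels

train_dirs = rng.uniform(0, 360, n_trials)
C_train = channel_responses(train_dirs)
W = rng.normal(size=(n_chan, n_voxels))               # true channel->voxel weights
B_train = C_train @ W + rng.normal(scale=0.5, size=(n_trials, n_voxels))

# Step 1: estimate channel-to-voxel weights by least squares.
W_hat = np.linalg.lstsq(C_train, B_train, rcond=None)[0]

# Step 2: invert the model on a test trial to recover channel responses.
test_dir = 135.0
b_test = channel_responses(np.array([test_dir])) @ W \
    + rng.normal(scale=0.5, size=(1, n_voxels))
c_hat = np.linalg.lstsq(W_hat.T, b_test.T, rcond=None)[0].ravel()

# Decode direction as the vector average of channel centers weighted by c_hat.
theta = np.deg2rad(centers)
decoded = np.rad2deg(np.angle(np.sum(c_hat * np.exp(1j * theta)))) % 360
print("true:", test_dir, "decoded:", round(decoded, 1))
```

Comparing the reconstructed direction with the physical stimulus direction is what reveals the expectation-induced bias the authors report.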
Article
Full-text available
Occipito-temporal cortex is known to house visual object representations, but the organization of the neural activation patterns along this cortex is still being discovered. Here we found a systematic, large-scale structure in the neural responses related to the interaction between two major cognitive dimensions of object representation: animacy and real-world size. Neural responses were measured with functional magnetic resonance imaging while human observers viewed images of big and small animals and big and small objects. We found that real-world size drives differential responses only in the object domain, not the animate domain, yielding a tripartite distinction in the space of object representation. Specifically, cortical zones with distinct response preferences for big objects, all animals, and small objects, are arranged in a spoked organization around the occipital pole, along a single ventromedial, to lateral, to dorsomedial axis. The preference zones are duplicated on the ventral and lateral surface of the brain. Such a duplication indicates that a yet unknown higher-order division of labor separates object processing into two substreams of the ventral visual pathway. Broadly, we suggest that these large-scale neural divisions reflect the major joints in the representational structure of objects and thus place informative constraints on the nature of the underlying cognitive architecture.
Article
Full-text available
Visual search involves the matching of visual input to a "search template," an internal representation of task-relevant information. The present study investigated the contents of the search template during visual search for object categories in natural scenes, for which low-level features do not reliably distinguish targets from nontargets. Subjects were cued to detect people or cars in diverse photographs of real-world scenes. On a subset of trials, the cue was followed by task-irrelevant stimuli instead of scenes, directly followed by a dot that subjects were instructed to detect. We hypothesized that stimuli that matched the active search template would capture attention, resulting in faster detection of the dot when presented at the location of a template-matching stimulus. Results revealed that silhouettes of cars and people captured attention irrespective of their orientation (0°, 90°, or 180°). Interestingly, strong capture was observed for silhouettes of category-diagnostic object parts, such as the wheel of a car. Finally, attentional capture was also observed for silhouettes presented at locations that were irrelevant to the search task. Together, these results indicate that search for familiar object categories in real-world scenes is mediated by spatially global search templates that consist of view-invariant shape representations of category-diagnostic object parts.
Article
Full-text available
When a sequence of pictures is presented at the rapid rate of 113 msec/picture, a viewer can detect a verbally specified target more than 60% of the time. In the present experiment, sequences of pictures were presented to 96 undergraduates at rates of 258, 172, and 114 msec/picture. A target was specified by name, superordinate category, or "negative" category (e.g., "the picture that is not of food"). Although the probability of detection decreased as cue specificity decreased, even in the most difficult condition (negative category cue at 114 msec/picture) 35% of the targets were detected. When the scores from the 3 detection tasks were compared with a control group's immediate recognition memory for the targets, immediate recognition memory was invariably lower than detection. Results are consistent with the hypothesis that rapidly presented pictures may be momentarily understood at the time of viewing and then quickly forgotten.
Article
Full-text available
A central issue for understanding visual object recognition is how the cortical hierarchy represents incoming sensory information and transforms it across successive processing stages. The format of object representation in the human brain has thus far mostly been studied using adaptation paradigms because the neuronal layout of object selectivities was thought to be beyond the resolution of conventional functional MRI (fMRI). Recently, however, multivariate pattern recognition succeeded in discriminating fMRI responses of object-selective cortex to different object exemplars within a given category. Here, we use increased spatial fMRI resolution to explore size sensitivity and tolerance to size change of response patterns evoked by object exemplars across a range of three sizes. Results from Support Vector Classification on responses of the human lateral occipital complex (LOC) show that discrimination of size (for a given object) and discrimination of objects across changes in size depended on the amount of size difference. Even across the largest amount of size change, accuracy for generalization was still significant in LOC, whereas the same comparison was at chance performance in early visual (calcarine) cortex. Analyzing subregions, we further found an anterior-posterior gradient in the degree of size sensitivity and size generalization within the posterior-dorsal and anterior-ventral parts of LOC. These results speak against fully size-invariant representation of object information in human LOC and are hence congruent with findings in monkeys showing object identity and size information in population activity of inferotemporal cortex. Moreover, these results provide evidence for a fine-grained functional heterogeneity within human LOC beyond the commonly used LO/fusiform subdivision.
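To make the cross-size generalization analysis above concrete, here is a minimal sketch in Python of training a linear support vector classifier on patterns evoked at one retinal size and testing it on another. All data are synthetic and the names (obj_signal, simulate) are illustrative placeholders, not the authors' pipeline:

```python
# Sketch of cross-size generalization decoding with a linear SVM.
# All data are synthetic; obj_signal / simulate are illustrative names.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_trials, n_voxels = 40, 200

# A shared per-object signal is embedded so that decoding can
# generalize across a (simulated) change in retinal size.
obj_signal = rng.normal(size=(2, n_voxels))

def simulate(obj, size_shift):
    noise = rng.normal(scale=2.0, size=(n_trials, n_voxels))
    return obj_signal[obj] + size_shift + noise

X_train = np.vstack([simulate(0, 0.0), simulate(1, 0.0)])  # one retinal size
X_test = np.vstack([simulate(0, 0.5), simulate(1, 0.5)])   # another size
y = np.repeat([0, 1], n_trials)

# Train on one size, test on the other: above-chance accuracy indicates
# size-tolerant object information in the response patterns.
clf = SVC(kernel="linear").fit(X_train, y)
print("cross-size decoding accuracy:", clf.score(X_test, y))
```

Above-chance cross-size accuracy is the signature of size-tolerant object information; at-chance accuracy, as reported for calcarine cortex, indicates size-bound representations.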
Article
Full-text available
Most theories of attention propose that we maintain attentional templates in visual working memory to control what information is selected. In the present study, we directly tested this proposal by measuring the contralateral-delay activity (CDA) of human event-related potentials during visual search tasks in which the target is cued on each trial. Here we show that the CDA can be used to measure the maintenance of attentional templates in visual working memory while processing complex visual scenes. In addition, this method allowed us to directly observe the shift from working memory to long-term memory representations controlling attention as learning occurred and experience accrued searching for the same target object. Our findings provide definitive support for several critical proposals made in theories of attention, learning, and automaticity.
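For readers unfamiliar with the measure, the sketch below illustrates how a CDA-like difference wave is computed: contralateral minus ipsilateral ERP amplitude relative to the cued side, averaged over a delay window. The channel layout, sampling rate, and time window are assumptions for illustration, and the ERPs are simulated noise:

```python
# Minimal sketch of a CDA-like difference wave: contralateral minus
# ipsilateral ERP amplitude relative to the cued side, averaged over a
# delay window. Channel layout, sampling rate, and window are assumed.
import numpy as np

rng = np.random.default_rng(1)
srate = 250                               # Hz (assumed)
times = np.arange(-0.2, 1.0, 1 / srate)

# Trial-averaged ERPs (n_channels x n_times) for cue-left and cue-right
# trials; channel 0 = left posterior site, channel 1 = right (assumed).
erp_left = rng.normal(size=(2, times.size))
erp_right = rng.normal(size=(2, times.size))

LEFT, RIGHT = 0, 1
# Contralateral: right channel on cue-left trials, left channel on
# cue-right trials (and vice versa for ipsilateral).
contra = (erp_left[RIGHT] + erp_right[LEFT]) / 2
ipsi = (erp_left[LEFT] + erp_right[RIGHT]) / 2
cda = contra - ipsi

window = (times >= 0.4) & (times <= 1.0)  # delay-period window (assumed)
print("mean CDA amplitude:", cda[window].mean())
```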
Article
Full-text available
The mechanisms of attention prioritize sensory input for efficient perceptual processing. Influential theories suggest that attentional biases are mediated via preparatory activation of task-relevant perceptual representations in visual cortex, but the neural evidence for a preparatory coding model of attention remains incomplete. In this experiment, we tested core assumptions underlying a preparatory coding model for attentional bias. Exploiting multivoxel pattern analysis of functional neuroimaging data obtained during a non-spatial attention task, we examined the locus, time-course, and functional significance of shape-specific preparatory attention in the human brain. Following an attentional cue, yet before the onset of a visual target, we observed selective activation of target-specific neural subpopulations within shape-processing visual cortex (lateral occipital complex). Target-specific modulation of baseline activity was sustained throughout the duration of the attention trial and the degree of target specificity that characterized preparatory activation patterns correlated with perceptual performance. We conclude that top-down attention selectively activates target-specific neural codes, providing a competitive bias favoring task-relevant representations over competing representations distributed within the same subregion of visual cortex.
Article
Full-text available
Selective visual attention directed to a location (even in the absence of a stimulus) increases activity in the corresponding regions of visual cortex and enhances the speed and accuracy of target perception. We further explored top-down influences on perceptual representations by manipulating observers’ expectations about the category of an upcoming target. Observers viewed a display in which an object (either a face or a house) gradually emerged from a state of phase-scrambled noise; a cue established expectation about the object category. Observers were faster to categorize faces (gender discrimination) or houses (structural discrimination) when the category of the partially scrambled object matched their expectation. Functional magnetic resonance imaging revealed that this expectation was associated with anticipatory increases in category-specific visual cortical activity, even in the absence of object- or category-specific visual information. Expecting a face evoked increased activity in face-selective cortical regions in the fusiform gyrus and superior temporal sulcus. Conversely, expecting a house increased activity in parahippocampal gyrus. These results suggest that visual anticipation facilitates subsequent perception by recruiting, in advance, the same cortical mechanisms as those involved in perception.
Article
Full-text available
Three converging procedures were used to determine whether pictures presented in a rapid sequence at rates comparable to eye fixations are understood and then quickly forgotten. In two experiments, sequences of 16 color photographs were presented at rates of 113, 167, or 333 msec per picture. In one group, subjects were given an immediate test of recognition memory for the pictures; in other groups, they searched for a target picture. Even when the target had only been specified by a title (e.g., a boat), detection of a target was strikingly superior to recognition memory. Detection was slightly but significantly better for pictured than named targets. In a third experiment, pictures were presented for 50, 70, 90, or 120 msec, preceded and followed by a visual mask; at 120 msec, recognition memory was as accurate as detection had been. The results, taken together with those of Potter and Levy (1969) for slower rates of sequential presentation, suggest that on average a scene is understood, and so becomes immune to ordinary visual masking, within about 100 msec, but requires about 300 msec of further processing before the memory representation is resistant to conceptual masking from a following picture. Possible functions of a short-term conceptual memory, such as the control of eye fixations, are discussed.
Article
Full-text available
Viewers briefly glimpsed pictures presented in a sequence at rates up to eight per second. They recognized a target picture as accurately and almost as rapidly when they knew only its meaning given by a name (for example, a boat) as when they had seen the picture itself in advance.
Article
Full-text available
Subjects searched sets of items for targets defined by conjunctions of color and form, color and orientation, or color and size. Set size was varied and reaction times (RT) were measured. For many unpracticed subjects, the slopes of the resulting RT × Set Size functions are too shallow to be consistent with Treisman's feature integration model, which proposes serial, self-terminating search for conjunctions. Searches for triple conjunctions (Color × Size × Form) are easier than searches for standard conjunctions and can be independent of set size. A guided search model similar to Hoffman's (1979) two-stage model can account for these data. In the model, parallel processes use information about simple features to guide attention in the search for conjunctions. Triple conjunctions are found more efficiently than standard conjunctions because three parallel processes can guide attention more effectively than two.
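The diagnostic quantity in this literature is the slope of the RT × Set Size function, in milliseconds per item. A minimal sketch, with made-up mean RTs, of how such slopes are estimated:

```python
# Sketch of estimating search-efficiency slopes (ms per item) from
# RT x Set Size data. All RT values here are hypothetical.
import numpy as np

set_sizes = np.array([4, 8, 16, 32])
rt_conjunction = np.array([560, 610, 700, 880])  # hypothetical mean RTs (ms)
rt_triple = np.array([540, 545, 555, 560])

for label, rts in [("conjunction", rt_conjunction), ("triple", rt_triple)]:
    slope, intercept = np.polyfit(set_sizes, rts, 1)
    print(f"{label}: {slope:.1f} ms/item, intercept {intercept:.0f} ms")

# Shallow slopes (near 0 ms/item) indicate efficient, set-size-independent
# search; steep slopes suggest capacity-limited, item-by-item processing.
```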
Article
Full-text available
A new theory of search and visual attention is presented. Results support neither a distinction between serial and parallel search nor one between search for features and conjunctions. For all search materials, difficulty instead increases with increased similarity of targets to nontargets and decreased similarity between nontargets, producing a continuum of search efficiency. A parallel stage of perceptual grouping and description is followed by competitive interaction between inputs, guiding selective access to awareness and action. An input gains weight to the extent that it matches an internal description of the information needed in current behavior (hence the effect of target-nontarget similarity). Perceptual grouping encourages input weights to change together (allowing "spreading suppression" of similar nontargets). The theory accounts for the harmful effects of nontargets resembling any possible target, the importance of local nontarget grouping, and many other findings.
Article
Full-text available
Hemimicropsia is a rare disorder of visual perception characterized by an apparent reduction of the size of objects when presented in one hemifield. We report two cases of hemimicropsia resulting from focal brain lesions. The first patient was an art teacher and could accurately depict his abnormal visual perception. He subsequently died and his brain was examined post mortem. In the second patient, micropsia was assessed by a quantified size comparison task. The size of a given object is normally perceived as constant across any spatial position. Hemimicropsia may thus be considered a limited violation of the size constancy principle. Behavioural and anatomical data are discussed in relation to the neural basis of visual object perception in humans.
Article
Full-text available
We often search for a face in a crowd or for a particular object in a cluttered environment. In this type of visual search, memory interacts with attention: the mediating neural mechanisms should include a stored representation of the object and a means for selecting that object from among others in the scene. Here we test whether neurons in inferior temporal cortex, an area known to be important for high-level visual processing, might provide these components. Monkeys were presented with a complex picture (the cue) to hold in memory during a delay period. The cue initiated activity that persisted through the delay among the neurons that were tuned to its features. The monkeys were then given 2-5 choice pictures and were required to make an eye movement to the one (the target) that matched the cue. About 90-120 milliseconds before the onset of the eye movement to the target, responses to non-targets were suppressed and the neuronal response was dominated by the target. The results suggest that inferior temporal cortex is involved in selecting the objects to which we attend and foveate.
Article
Full-text available
How long does it take for the human visual system to process a complex natural image? Subjectively, recognition of familiar objects and scenes appears to be virtually instantaneous, but measuring this processing time experimentally has proved difficult. Behavioural measures such as reaction times can be used, but these include not only visual processing but also the time required for response execution. However, event-related potentials (ERPs) can sometimes reveal signs of neural processing well before the motor output. Here we use a go/no-go categorization task in which subjects have to decide whether a previously unseen photograph, flashed on for just 20 ms, contains an animal. ERP analysis revealed a frontal negativity specific to no-go trials that develops roughly 150 ms after stimulus onset. We conclude that the visual processing needed to perform this highly demanding task can be achieved in under 150 ms.
Article
Although the cognitive sciences aim to ultimately understand behavior and brain function in the real world, for historical and practical reasons, the field has relied heavily on artificial stimuli, typically pictures. We review a growing body of evidence that both behavior and brain function differ between image proxies and real, tangible objects. We also propose a new framework for immersive neuroscience to combine two approaches: (i) the traditional build-up approach of gradually combining simplified stimuli, tasks, and processes; and (ii) a newer tear-down approach that begins with reality and compelling simulations such as virtual reality to determine which elements critically affect behavior and brain processing.
Article
In mammals with frontal eyes, optic-nerve fibers from nasal retina project to the contralateral hemisphere of the brain, and fibers from temporal retina project ipsilaterally. The division between crossed and uncrossed projections occurs at or near the vertical meridian. If the division was precise, a problem would arise. Small objects near midline, but nearer or farther than current fixation, would produce signals that travel to opposite hemispheres, making the binocular disparity of those objects difficult to compute. However, in species that have been studied, the division is not precise. Rather, there are overlapping crossed and uncrossed projections such that some fibers from nasal retina project ipsilaterally as well as contralaterally and some from temporal retina project contralaterally as well as ipsilaterally. This increases the probability that signals from an object near vertical midline travel to the same hemisphere, thereby aiding disparity estimation. We investigated whether there is a deficit in binocular vision near the vertical meridian in humans and found no evidence for one. We also investigated the effectiveness of the observed decussation pattern, quantified from anatomical data in monkeys and humans. We used measurements of naturally occurring disparities in humans to determine disparity distributions across the visual field. We then used those distributions to calculate the probability of natural disparities transmitting to the same hemisphere, thereby aiding disparity computation. We found that the pattern of overlapping projections is quite effective. Thus, crossed and uncrossed projections from the retinas are well designed for aiding disparity estimation and stereopsis.
Article
Cluster-based permutation tests are gaining almost universal acceptance as inferential procedures in cognitive neuroscience. They elegantly handle the multiple comparisons problem in high-dimensional magnetoencephalographic and EEG data. Unfortunately, the power of this procedure comes hand in hand with the allure of unwarranted interpretations of the inferential output, the most prominent of which is the overestimation of the temporal, spatial, and frequency precision of statistical claims. This leads researchers to statements about the onset or offset of a certain effect that are not supported by the permutation test. In this article, we outline problems and common pitfalls of using and interpreting cluster-based permutation tests, and we illustrate these with simulated data in order to promote a more intuitive understanding of the method. We hope that raising awareness about these issues will benefit common scientific practices while increasing the popularity of cluster-based permutation procedures.
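To make the underlying logic explicit, here is a didactic sketch of a one-dimensional cluster-based permutation test using sign-flipping of condition differences. It is not the exact procedure of any particular toolbox, and all data are simulated:

```python
# Didactic one-dimensional cluster-based permutation test: cluster mass
# of suprathreshold t-values, with a null built by sign-flipping the
# per-subject condition differences. Simulated data throughout.
import numpy as np

def cluster_masses(tvals, threshold):
    """Sum of |t| within each contiguous suprathreshold cluster."""
    above = np.abs(tvals) > threshold
    masses, current = [], 0.0
    for supra, t in zip(above, np.abs(tvals)):
        if supra:
            current += t
        elif current:
            masses.append(current)
            current = 0.0
    if current:
        masses.append(current)
    return masses

rng = np.random.default_rng(2)
diff = rng.normal(size=(20, 100))   # subjects x time: condition differences
diff[:, 40:60] += 0.8               # injected effect

tvals = diff.mean(0) / (diff.std(0, ddof=1) / np.sqrt(len(diff)))
observed = max(cluster_masses(tvals, threshold=2.0), default=0.0)

null = []
for _ in range(1000):
    flips = rng.choice([-1, 1], size=len(diff))[:, None]
    d = diff * flips
    t = d.mean(0) / (d.std(0, ddof=1) / np.sqrt(len(d)))
    null.append(max(cluster_masses(t, 2.0), default=0.0))

p = np.mean(np.array(null) >= observed)
print(f"largest cluster mass {observed:.1f}, p = {p:.3f}")

# Note: the p-value licenses only the claim that *some* effect exists
# somewhere; it does not certify the cluster's exact onset or offset,
# which is precisely the pitfall the article above warns against.
```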
Article
Perception and perceptual decision-making are strongly facilitated by prior knowledge about the probabilistic structure of the world. While the computational benefits of using prior expectation in perception are clear, there are myriad ways in which this computation can be realized. We review here recent advances in our understanding of the neural sources and targets of expectations in perception. Furthermore, we discuss Bayesian theories of perception that prescribe how an agent should integrate prior knowledge and sensory information, and investigate how current and future empirical data can inform and constrain computational frameworks that implement such probabilistic integration in perception.
Article
Significance: The way that we perceive the world is partly shaped by what we expect to see at any given moment. However, it is unclear how this process is neurally implemented. Recently, it has been proposed that the brain generates stimulus templates in sensory cortex to preempt expected inputs. Here, we provide evidence that a representation of the expected stimulus is present in the neural signal shortly before it is presented, showing that expectations can indeed induce the preactivation of stimulus templates. Importantly, these expectation signals resembled the neural signal evoked by an actually presented stimulus, suggesting that expectations induce similar patterns of activation in visual cortex as sensory stimuli.
Article
Top-down attention is the mechanism that allows us to selectively process goal-relevant aspects of a scene while ignoring irrelevant aspects. A large body of research has characterized the effects of attention on neural activity evoked by a visual stimulus. However, attention also includes a preparatory phase before stimulus onset in which the attended dimension is internally represented. Here, we review neurophysiological, functional magnetic resonance imaging, magnetoencephalography, electroencephalography, and transcranial magnetic stimulation (TMS) studies investigating the neural basis of preparatory attention, both when attention is directed to a location in space and when it is directed to nonspatial stimulus attributes (content-based attention) ranging from low-level features to object categories. Results show that both spatial and content-based attention lead to increased baseline activity in neural populations that selectively code for the attended attribute. TMS studies provide evidence that this preparatory activity is causally related to subsequent attentional selection and behavioral performance. Attention thus acts by preactivating selective neurons in the visual cortex before stimulus onset. This appears to be a general mechanism that can operate on multiple levels of representation. We discuss the functional relevance of this mechanism, its limitations, and its relation to working memory, imagery, and expectation. We conclude by outlining open questions and future directions.
Article
Representational similarity analysis of activation patterns has become an increasingly important tool for studying brain representations. The dissimilarity between two patterns is commonly quantified by the correlation distance or the accuracy of a linear classifier. However, there are many different ways to measure pattern dissimilarity and little is known about their relative reliability. Here, we compare the reliability of three classes of dissimilarity measure: classification accuracy, Euclidean/Mahalanobis distance, and Pearson correlation distance. Using simulations and four real functional magnetic resonance imaging (fMRI) datasets, we demonstrate that continuous dissimilarity measures are substantially more reliable than the classification accuracy. The difference in reliability can be explained by two characteristics of classifiers: discretization and susceptibility of the discriminant function to shifts of the pattern ensemble between runs. Reliability can be further improved through multivariate noise normalization for all measures. Finally, unlike conventional distance measures, crossvalidated distances provide unbiased estimates of pattern dissimilarity on a ratio scale, thus providing an interpretable zero point. Overall, our results indicate that the crossvalidated Mahalanobis distance is preferable to both the classification accuracy and the correlation distance for characterizing representational geometries.
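A minimal sketch of the crossvalidated distance advocated above, computed across independent runs. For simplicity the noise covariance is taken as the identity, which reduces the Mahalanobis distance to a crossvalidated Euclidean distance; multivariate noise normalization would whiten the difference patterns first. All data are synthetic:

```python
# Crossvalidated distance between two conditions, estimated across
# independent runs. Identity noise covariance assumed for simplicity.
import numpy as np
from itertools import permutations

rng = np.random.default_rng(3)
n_runs, n_voxels = 6, 100
true_diff = rng.normal(scale=0.3, size=n_voxels)

# Per-run pattern estimates for conditions A and B (synthetic).
A = rng.normal(size=(n_runs, n_voxels))
B = A - true_diff + rng.normal(scale=1.0, size=(n_runs, n_voxels))

deltas = A - B  # per-run condition-difference patterns
pairs = list(permutations(range(n_runs), 2))
d = np.mean([deltas[i] @ deltas[j] for i, j in pairs]) / n_voxels
print("crossvalidated distance:", d)

# Because the two factors of each product come from independent runs,
# noise does not inflate the estimate: the expected value is zero when
# the conditions do not differ, giving an interpretable zero point.
```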
Article
Prior expectations about the visual world facilitate perception by allowing us to quickly deduce plausible interpretations from noisy and ambiguous data. The neural mechanisms of this facilitation remain largely unclear. Here, we used functional magnetic resonance imaging (fMRI) and multivariate pattern analysis (MVPA) techniques to measure both the amplitude and representational content of neural activity in the early visual cortex of human volunteers. We find that while perceptual expectation reduces the neural response amplitude in the primary visual cortex (V1), it improves the stimulus representation in this area, as revealed by MVPA. This informational improvement was independent of attentional modulations by task relevance. Finally, the informational improvement in V1 correlated with subjects' behavioral improvement when the expected stimulus feature was relevant. These data suggest that expectation facilitates perception by sharpening sensory representations.
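The dissociation above, between lower response amplitude and improved stimulus information, can be illustrated with a toy simulation in which expectation reduces both response gain and pattern noise. Everything below (gains, noise levels, trial counts) is a hypothetical construction, not the study's data:

```python
# Toy simulation: expectation lowers response gain but also lowers
# pattern noise, so decoding can improve while mean amplitude drops.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_trials, n_voxels = 60, 150
template = rng.normal(size=(2, n_voxels))  # two stimulus alternatives

def simulate(gain, noise_sd):
    X = np.vstack([gain * (1.0 + template[c])  # gain scales the response
                   + rng.normal(scale=noise_sd, size=(n_trials, n_voxels))
                   for c in (0, 1)])
    y = np.repeat([0, 1], n_trials)
    return X, y

for label, gain, noise_sd in [("unexpected", 1.0, 4.0), ("expected", 0.6, 1.5)]:
    X, y = simulate(gain, noise_sd)
    acc = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
    print(f"{label}: mean amplitude {X.mean():.2f}, decoding accuracy {acc:.2f}")

# Lower overall amplitude need not imply poorer stimulus information:
# a less noisy ("sharpened") representation can decode better.
```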
Article
LIBSVM is a library for support vector machines (SVM). Its goal is to help users easily use SVM as a tool. In this document, we present all of its implementation details. For the use of LIBSVM, the README file included in the package and the LIBSVM FAQ provide further information.
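In practice, LIBSVM is often reached through wrappers; scikit-learn's SVC, for instance, is built on LIBSVM. A minimal usage sketch on a toy dataset:

```python
# Minimal LIBSVM usage via scikit-learn's SVC wrapper on a toy dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # RBF kernel, default params
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```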
Article
While there are selective regions of occipitotemporal cortex that respond to faces, letters, and bodies, the large-scale neural organization of most object categories remains unknown. Here, we find that object representations can be differentiated along the ventral temporal cortex by their real-world size. In a functional neuroimaging experiment, observers were shown pictures of big and small real-world objects (e.g., table, bathtub; paperclip, cup), presented at the same retinal size. We observed a consistent medial-to-lateral organization of big and small object preferences in the ventral temporal cortex, mirrored along the lateral surface. Regions in the lateral-occipital, inferotemporal, and parahippocampal cortices showed strong peaks of differential real-world size selectivity and maintained these preferences over changes in retinal size and in mental imagery. These data demonstrate that the real-world size of objects can provide insight into the spatial topography of object representation.
Article
The VideoToolbox is a free collection of two hundred C subroutines for Macintosh computers that calibrates and controls the computer-display interface to create accurately specified visual stimuli. High-level platform-independent languages like MATLAB are best for creating the numbers that describe the desired images. Low-level, computer-specific VideoToolbox routines control the hardware that transforms those numbers into a movie. Transcending the particular computer and language, we discuss the nature of the computer-display interface, and how to calibrate and control it.
Article
Mammals are highly skilled in rapidly detecting objects in cluttered natural environments, a skill necessary for survival. What are the neural mechanisms mediating detection of objects in natural scenes? Here, we use human brain imaging to address the role of top-down preparatory processes in the detection of familiar object categories in real-world environments. Brain activity was measured while participants were preparing to detect highly variable depictions of people or cars in natural scenes that were new to the participants. The preparation to detect objects of the target category, in the absence of visual input, evoked activity patterns in visual cortex that resembled the response to actual exemplars of the target category. Importantly, the selectivity of multivoxel preparatory activity patterns in object-selective cortex (OSC) predicted target detection performance. By contrast, preparatory activity in early visual cortex (V1) was negatively related to search performance. Additional behavioral results suggested that the dissociation between OSC and V1 reflected the use of different search strategies, linking OSC preparatory activity to relatively abstract search preparation and V1 to more specific imagery-like preparation. Finally, whole-brain searchlight analyses revealed that, in addition to OSC, response patterns in medial prefrontal cortex distinguished the target categories based on the search cues alone, suggesting that this region may constitute a top-down source of preparatory activity observed in visual cortex. These results indicate that in naturalistic situations, when the precise visual characteristics of target objects are not known in advance, preparatory activity at higher levels of the visual hierarchy selectively mediates visual search.
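The core pattern-similarity logic of this study can be sketched as follows: correlate the multivoxel pattern measured during search preparation (no stimulus on screen) with the patterns evoked by actually viewing each category, and take the difference as a selectivity index. The arrays below are synthetic placeholders, not the study's data:

```python
# Sketch of preparatory pattern-similarity analysis: compare the
# preparation-period pattern with viewing-evoked category patterns.
import numpy as np

rng = np.random.default_rng(4)
n_voxels = 300
viewing_person = rng.normal(size=n_voxels)  # pattern: viewing people
viewing_car = rng.normal(size=n_voxels)     # pattern: viewing cars

# Preparatory pattern while preparing to search for people, simulated
# here as a noisy copy of the person-viewing pattern.
prep_person = viewing_person + rng.normal(scale=1.5, size=n_voxels)

r_target = np.corrcoef(prep_person, viewing_person)[0, 1]
r_nontarget = np.corrcoef(prep_person, viewing_car)[0, 1]
print(f"preparatory selectivity index: {r_target - r_nontarget:.3f}")

# A positive index indicates that preparation reinstates the target
# category's perceptual pattern; across participants, such indices can
# then be related to detection performance.
```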
Article
Expectation of locations and low-level features increases activity in extrastriate visual areas even in the absence of a stimulus, but it is unclear whether or how expectation of higher-level stimulus properties affects visual responses. Here, we used event-related functional magnetic resonance imaging (fMRI) to test whether category expectation affects baseline and stimulus-evoked activity in higher-level, category-selective inferotemporal (IT) visual areas. Word cues indicating an image category (FACE or HOUSE) were followed by a delay, then a briefly presented image of a face or a house. On most trials, the cue correctly predicted the upcoming stimulus. Baseline activity in regions within the fusiform face area (FFA) and parahippocampal place area (PPA) was modulated such that activity was higher during expectation of the preferred (e.g., FACE for FFA) vs. non-preferred category. Stimulus-evoked responses reflected an initial bias (higher overall activity) followed by increased selectivity (greater difference between activity to a preferred vs. non-preferred stimulus) after expectation of the preferred vs. non-preferred category. Consistent with the putative role of a frontoparietal network in top-down modulation of activity in sensory cortex, expectation-related activity in several frontal and parietal areas correlated with the magnitude of baseline shifts in the FFA and PPA across subjects. Furthermore, expectation-related activity in lateral prefrontal cortex also correlated with the magnitude of expectation-based increases in stimulus selectivity in IT areas. These findings demonstrate that category expectation influences both baseline and stimulus-evoked activity in category-selective inferotemporal visual areas, and that these modulations may be driven by a frontoparietal attentional control network.
Article
The visual system has an extraordinary capability to extract categorical information from complex natural scenes. For example, subjects are able to rapidly detect the presence of object categories such as animals or vehicles in new scenes that are presented very briefly. This is even true when subjects do not pay attention to the scenes and simultaneously perform an unrelated attentionally demanding task, a stark contrast to the capacity limitations predicted by most theories of visual attention. Here we show a neural basis for rapid natural scene categorization in the visual cortex, using functional magnetic resonance imaging and an object categorization task in which subjects detected the presence of people or cars in briefly presented natural scenes. The multi-voxel pattern of neural activity in the object-selective cortex evoked by the natural scenes contained information about the presence of the target category, even when the scenes were task-irrelevant and presented outside the focus of spatial attention. These findings indicate that the rapid detection of categorical information in natural scenes is mediated by a category-specific biasing mechanism in object-selective cortex that operates in parallel across the visual field, and biases information processing in favour of objects belonging to the target object category.
Article
The biased competition theory of selective attention has been an influential neural theory of attention, motivating numerous animal and human studies of visual attention and visual representation. There is now neural evidence in favor of all three of its most basic principles: that representation in the visual system is competitive; that both top-down and bottom-up biasing mechanisms influence the ongoing competition; and that competition is integrated across brain systems. We review the evidence in favor of these three principles, and in particular, findings related to six more specific neural predictions derived from these original principles.
Article
Five classes of relations between an object and its setting can characterize the organization of objects into real-world scenes. The relations are (1) Interposition (objects interrupt their background), (2) Support (objects tend to rest on surfaces), (3) Probability (objects tend to be found in some scenes but not others), (4) Position (given an object is probable in a scene, it often is found in some positions and not others), and (5) familiar Size (objects have a limited set of size relations with other objects). In two experiments subjects viewed brief (150 msec) presentations of slides of scenes in which an object in a cued location in the scene was either in a normal relation to its background or violated from one to three of the relations. Such objects appear to (1) have the background pass through them, (2) float in air, (3) be unlikely in that particular scene, (4) be in an inappropriate position, and (5) be too large or too small relative to the other objects in the scene. In Experiment I, subjects attempted to determine whether the cued object corresponded to a target object which had been specified in advance by name. With the exception of the Interposition violation, violation costs were incurred in that the detection of objects undergoing violations was less accurate and slower than when those same objects were in normal relations to their setting. However, the detection of objects in normal relations to their setting (innocent bystanders) was unaffected by the presence of another object undergoing a violation in that same setting. This indicates that the violation costs were incurred not because of an unsuccessful elicitation of a frame or schema for the scene but because properly formed frames interfered with (or did not facilitate) the perceptibility of objects undergoing violations. As the number of violations increased, target detectability generally decreased. Thus, the relations were accessed from the results of a single fixation and were available sufficiently early during the time course of scene perception to affect the perception of the objects in the scene. Contrary to expectations from a bottom-up account of scene perception, violations of the pervasive physical relations of Support and Interposition were not more disruptive to object detection than the semantic violations of Probability, Position and Size. These are termed semantic because they require access to the referential meaning of the object. In Experiment II, subjects attempted to detect the presence of the violations themselves. Violations of the semantic relations were detected more accurately than violations of Interposition and at least as accurately as violations of Support. As the number of violations increased, the detectability of the incongruities between an object and its setting increased. These results provide converging evidence that semantic relations can be accessed from the results of a single fixation. In both experiments information about Position was accessed at least as quickly as information on Probability. Thus in Experiment I, the interference that resulted from placing a fire hydrant in a kitchen was not greater than the interference from placing it on top of a mailbox in a street scene. Similarly, violations of Probability in Experiment II were not more detectable than violations of Position.
Thus, the semantic relations which were accessed included information about the detailed interactions among the objects—information which is more specific than what can be inferred from the general setting. Access to the semantic relations among the entities in a scene is not deferred until the completion of spatial and depth processing and object identification. Instead, an object's semantic relations are accessed simultaneously with its physical relations as well as with its own identification.
Article
The Psychophysics Toolbox is a software package that supports visual psychophysics. Its routines provide an interface between a high-level interpreted language (MATLAB on the Macintosh) and the video display hardware. A set of example programs is included with the Toolbox distribution.
Article
Responses of neurons in inferior temporal cortex during memory-guided visual search. J. Neurophysiol. 80: 2918-2940, 1998. A typical scene will contain many different objects, few of which are relevant to behavior at any given moment. Thus attentional mechanisms are needed to select relevant objects for visual processing and control over behavior. We examined this role of attention in the inferior temporal cortex of macaque monkeys, using a visual search paradigm. While the monkey maintained fixation, a cue stimulus was presented at the center of gaze, followed by a blank delay period. After the delay, an array of two to five choice stimuli was presented extrafoveally, and the monkey was rewarded for detecting a target stimulus matching the cue. The behavioral response was a saccadic eye movement to the target in one version of the task and a lever release in another. The array was composed of one "good" stimulus (effective in driving the cell when presented alone) and one or more "poor" stimuli (ineffective in driving the cell when presented alone). Most cells showed higher delay activity after a good stimulus used as the cue than after a poor stimulus. The baseline activity of cells was also higher preceding a good cue, if the animal expected it to occur. This activity may depend on a top-down bias in favor of cells coding the relevant stimulus. When the choice array was presented, most cells showed suppressive interactions between the stimuli as well as strong attention effects. When the choice array was presented in the contralateral visual field, most cells initially responded the same, regardless of which stimulus was the target. However, within 150-200 ms of array onset, responses were determined by the target stimulus. If the target was the good stimulus, the response to the array became equal to the response to the good stimulus presented alone. If the target was a poor stimulus, the response approached the response to that stimulus presented alone. Thus the influence of the nontarget stimulus was eliminated. These effects occurred well in advance of the behavioral response. When the array was positioned with stimuli on opposite sides of the vertical meridian, the contralateral stimulus appeared to dominate the response, and this dominant effect could not be overcome by attention. Overall, the results support a "biased competition" model of attention, according to which 1) objects in the visual field compete for representation in the cortex, and 2) this competition is biased in favor of the behaviorally relevant object by virtue of "top-down" feedback from structures involved in working memory.