The psychometric function: I. Fitting, sampling, and goodness of fit

Abstract and Figures

The psychometric function relates an observer’s performance to an independent variable, usually some physical quantity of a stimulus in a psychophysical task. This paper, together with its companion paper (Wichmann & Hill, 2001), describes an integrated approach to (1) fitting psychometric functions, (2) assessing the goodness of fit, and (3) providing confidence intervals for the function’s parameters and other estimates derived from them, for the purposes of hypothesis testing. The present paper deals with the first two topics, describing a constrained maximum-likelihood method of parameter estimation and developing several goodness-of-fit tests. Using Monte Carlo simulations, we deal with two specific difficulties that arise when fitting functions to psychophysical data. First, we note that human observers are prone to stimulus-independent errors (or lapses). We show that failure to account for this can lead to serious biases in estimates of the psychometric function’s parameters and illustrate how the problem may be overcome. Second, we note that psychophysical data sets are usually rather small by the standards required by most of the commonly applied statistical tests. We demonstrate the potential errors of applying traditional χ² methods to psychophysical data and advocate use of Monte Carlo resampling techniques that do not rely on asymptotic theory. We have made available the software to implement our methods.
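The two ideas in the abstract, a constrained maximum-likelihood fit that includes a lapse rate and a Monte Carlo goodness-of-fit check, can be illustrated with a minimal sketch. This is not the authors' software. It assumes a logistic core function, a two-alternative forced-choice guess rate of 0.5, an illustrative upper bound of 0.06 on the lapse rate, a coarse grid search in place of a proper optimizer, and made-up data. All function names are hypothetical, and the simulated deviances are computed against the generating function without refitting, which is a simplification.

```python
import math
import random

def psi(x, alpha, beta, gamma, lam):
    """Psychometric function: gamma + (1 - gamma - lam) * F(x), logistic F (assumed)."""
    core = 1.0 / (1.0 + math.exp(-(x - alpha) / beta))
    return gamma + (1.0 - gamma - lam) * core

def neg_log_likelihood(params, xs, ks, ns, gamma):
    """Binomial negative log-likelihood (constant terms dropped)."""
    alpha, beta, lam = params
    total = 0.0
    for x, k, n in zip(xs, ks, ns):
        p = min(max(psi(x, alpha, beta, gamma, lam), 1e-9), 1.0 - 1e-9)
        total -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total

def fit(xs, ks, ns, gamma=0.5, lam_max=0.06):
    """Constrained ML via exhaustive grid search; lam is confined to [0, lam_max]."""
    lo, hi = min(xs), max(xs)
    span = hi - lo
    alphas = [lo + span * i / 40 for i in range(41)]
    betas = [span * (0.02 + 0.5 * i / 40) for i in range(41)]
    lams = [lam_max * i / 6 for i in range(7)]
    candidates = ((a, b, l) for a in alphas for b in betas for l in lams)
    return min(candidates, key=lambda p: neg_log_likelihood(p, xs, ks, ns, gamma))

def deviance(ks, ns, ps):
    """Deviance of observed counts relative to predicted probabilities ps."""
    d = 0.0
    for k, n, p in zip(ks, ns, ps):
        p = min(max(p, 1e-9), 1.0 - 1e-9)
        if k > 0:
            d += k * math.log(k / (n * p))
        if k < n:
            d += (n - k) * math.log((n - k) / (n * (1.0 - p)))
    return 2.0 * d

def monte_carlo_p(xs, ks, ns, params, gamma=0.5, n_sim=2000, seed=0):
    """Share of simulated data sets whose deviance is at least the observed one."""
    rng = random.Random(seed)
    alpha, beta, lam = params
    ps = [psi(x, alpha, beta, gamma, lam) for x in xs]
    observed = deviance(ks, ns, ps)
    exceed = 0
    for _ in range(n_sim):
        # Draw binomial counts from the fitted function, block by block.
        sim = [sum(rng.random() < p for _trial in range(n)) for p, n in zip(ps, ns)]
        if deviance(sim, ns, ps) >= observed:
            exceed += 1
    return observed, exceed / n_sim

# Made-up 2AFC data: stimulus levels, correct responses, trials per block.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ks = [26, 30, 38, 45, 48, 49]
ns = [50] * 6

params = fit(xs, ks, ns)
observed_dev, p_value = monte_carlo_p(xs, ks, ns, params)
print("alpha, beta, lambda:", params)
print("deviance:", observed_dev, "Monte Carlo p:", p_value)
```

In this sketch, fixing `lam_max=0.0` reproduces a fit that ignores lapses, so comparing the two fits on data with a few errors at high stimulus levels gives a feel for the bias the abstract describes. The Monte Carlo p-value replaces a lookup in the asymptotic χ² distribution, which is the point of the resampling approach for small data sets.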
... Crucially, the REM allows for mistranslations of these internal states I ∈ {I_xy, I_si, I_yx} into the responses R ∈ {R_xy, R_si, R_yx}: Each internal state I (e.g., I_xy) is misreported with some probability, and given a misreport of I, each non-corresponding response R (e.g., R_si and R_yx) is provided with a certain probability. This resembles modeling approaches in visual psychophysics to account for finger errors and attentional lapses (e.g., Swanson & Birch, 1992; Wichmann & Hill, 2001). ...
... Because G² is only asymptotically χ²-distributed, chi-square goodness-of-fit tests yield an inflation of significant results for small expected frequencies (García-Pérez, 1994; García-Pérez & Núñez-Antón, 2001), which were prevalent in many of the data sets considered here (e.g., Lahkar et al., 2023). Furthermore, estimation of confidence intervals via parametric bootstrapping (see Alcalá-Quintana & García-Pérez, 2013; García-Pérez & Alcalá-Quintana, 2012b; Wichmann & Hill, 2001) lead to large differences in the width of the confidence intervals for G² across the three models, which exhibited considerable differences in their best-fitting variability parameter values. It therefore seemed most sensible not to subject G² to a chi-square or parametric-bootstrap goodness-of-fit test, but instead to first assess the goodness of fit qualitatively based on visual inspection and then to offset goodness of fit (G²) against model complexity in a quantitative model comparison. ...
Article
The perception of temporal order or simultaneity of stimuli is almost always explained in terms of independent-channels models, such as perceptual-moment, triggered-moment, and attention-switching models. Independent-channels models generally posit that stimuli are processed in separate peripheral channels and that their arrival-time difference at a central location is translated into an internal state of order (simultaneity) if it reaches (misses) a certain threshold. Non-monotonic and non-parallel psychometric functions in a ternary-response task provided critical evidence against a wide range of independent-channels models. However, two independent-channels models have been introduced in the last decades that can account for such shapes by considering misreports of internal states (response-error model) or by assuming that simultaneity and order judgments rely on distinct sensory and decisional processes (two-stage model). Based on previous ideas, we also consider a two-threshold model, according to which the same arrival-time difference may need to reach a higher threshold for order detection than for successiveness detection. All three models were fitted to various data sets collected over a period of more than a century. The two-threshold model provided the best balance between goodness of fit and parsimony. This preference for the two-threshold model over the two-stage model and the response-error model aligns well with several lines of evidence from cognitive modeling, psychophysics, mental chronometry, and psychophysiology. We conclude that the seemingly deviant shapes of psychometric functions can be explained within the framework of independent-channels models in a simpler way than previously assumed.
... Second, a cumulative Gaussian function was fitted for each psychometric data point to determine the threshold for each subject for each test session using the psignifit toolbox (ver. 2.5.6) for MATLAB [88]. All trials were included for the estimation of threshold SOA except for trials in which the letter task was incorrect, as incorrect trials for the letter task suggest that participants did not effectively maintain eye fixation during these trials. ...
Article
Individuals experience difficulty falling asleep in a new environment, termed the first night effect (FNE). However, the impact of the FNE on sleep-induced brain plasticity remains unclear. Here, using a within-subject design, we found that the FNE significantly reduces visual plasticity during sleep in young adults. Sleep-onset latency (SOL), an indicator of the FNE, was significantly longer during the first sleep session than the second session, confirming the FNE. We assessed performance gains in visual perceptual learning after sleep and increases in the excitatory-to-inhibitory neurotransmitter (E/I) ratio in early visual areas during sleep using magnetic resonance spectroscopy and polysomnography. These parameters were significantly smaller in sleep with the FNE than in sleep without the FNE; however, these parameters were not correlated with SOL. These results suggest that while the neural mechanisms of the FNE and brain plasticity are independent, sleep disturbances temporarily block the neurochemical process fundamental for brain plasticity.
... How discriminable are these two statistical properties given a brief exposure time? We quantify the required signal strength for reliable judgments using psychometric modeling techniques [39]. RQ2: Does the choice of colormap impact ensemble estimates? ...
Preprint
Visualizations support rapid analysis of scientific datasets, allowing viewers to glean aggregate information (e.g., the mean) within split-seconds. While prior research has explored this ability in conventional charts, it is unclear if spatial visualizations used by computational scientists afford a similar ensemble perception capacity. We investigate people's ability to estimate two summary statistics, mean and variance, from pseudocolor scalar fields. In a crowdsourced experiment, we find that participants can reliably characterize both statistics, although variance discrimination requires a much stronger signal. Multi-hue and diverging colormaps outperformed monochromatic, luminance ramps in aiding this extraction. Analysis of qualitative responses suggests that participants often estimate the distribution of hotspots and valleys as visual proxies for data statistics. These findings suggest that people's summary interpretation of spatial datasets is likely driven by the appearance of discrete color segments, rather than assessments of overall luminance. Implicit color segmentation in quantitative displays could thus prove more useful than previously assumed by facilitating quick, gist-level judgments about color-coded visualizations.
... Once they reliably scored 70% in that task, F1 and F2 were set to vary in a range of frequencies going from 0 to 90 Hz, defining the stimulus space (Fig. 1d). To quantify the relationship between the stimulus and the sensory capabilities of mice, we fitted the probability of F1 being called higher with a psychometric function [14], which reveals a reliable dependence of the choice side on the stimulation frequency difference ΔF = F1 − F2 (Fig. 1e) across animals. ΔF is the dimension explaining most performance across the stimulus space (Fig. S1). ...
Article
During perceptually guided decisions, correlates of choice are found as upstream as in the primary sensory areas. However, how well these choice signals align with early sensory representations, a prerequisite for their interpretation as feedforward substrates of perception, remains an open question. We designed a two alternative forced choice task (2AFC) in which male mice compared stimulation frequencies applied to two adjacent vibrissae. The optogenetic silencing of individual columns in the primary somatosensory cortex (wS1) resulted in predicted shifts of psychometric functions, demonstrating that perception depends on focal, early sensory representations. Functional imaging of layer II/III single neurons revealed mixed coding of stimuli, choices and engagement in the task. Neurons with multi-whisker suppression display improved sensory discrimination and had their activity increased during engagement in the task, enhancing selectively representation of the signals relevant to solving the task. From trial to trial, representation of stimuli and choice varied substantially, but mostly orthogonally to each other, suggesting that perceptual variability does not originate from wS1 fluctuations but rather from downstream areas. Together, our results highlight the role of primary sensory areas in forming a reliable sensory substrate that could be used for flexible downstream decision processes.
Preprint
Animals and humans are endowed with an adaptive ability to rapidly extract approximate numerical information from sets, yet the underlying visual mechanisms are poorly understood. Evidence suggests that visual approximate numerosity relies on segmented perceptual units, modulated by grouping cues. Indeed, perceived numerosity decreases when objects are connected by irrelevant-lines without varying low-level features. However, approximate numerosity perception has been largely studied with physical objects. Illusory contours (ICs) are crucial psychophysical tools for uncovering segmentation mechanisms built into the visual cortex. Strikingly, “illusory” objects are subjected to several perceptual biases (e.g., tilt aftereffect) akin to physical objects, indicating a common processing mechanism. Here, to unveil further similarities between real and ICs processing, we tested whether approximate numerical ICs perception is affected by connectedness grouping. In a forced-choice task, participants compared pairs of stimuli containing Ehrenstein-like ICs with varying numerosity, interspersed with four physical task-irrelevant lines. We manipulated the number of connected pairs (0, 2, or 4), aligning the lines to the gaps triggering ICs, while keeping low-level features constant across connectedness levels. Results revealed a monotonic numerosity underestimation as connections increased, and a constant precision implying a Weber-like encoding of numerosity. Furthermore, connectedness causes a proportional cost in reaction-times. These results clearly show that numerical processing of ICs ensembles is subjected to the same connectedness effect observed with real objects, suggesting a shared visual segmentation/grouping mechanism for approximate numerosity extraction from both real and ICs objects. Results are discussed in light of their significance for artificial intelligence models of visual perception.
Preprint
Parkinson's disease (PD) is characterized by the degeneration of dopaminergic neurons in the striatum, predominantly associated with motor symptoms. However, non-motor deficits, particularly sensory symptoms, often precede motor manifestations, offering a potential early diagnostic window. The impact of non-motor deficits on sensation behavior and the underlying mechanisms remains poorly understood. In this study, we examined changes in tactile sensation within a Parkinsonian state by employing a mouse model of PD induced by 6-hydroxydopamine (6-OHDA) to deplete striatal dopamine (DA). Leveraging the conserved mouse whisker system as a model for tactile-sensory stimulation, we conducted psychophysical experiments to assess sensory-driven behavioral performance during a tactile detection task in both the healthy and Parkinson-like states. Our findings reveal that DA depletion induces pronounced alterations in tactile sensation behavior, extending beyond expected motor impairments. We observed diverse behavioral deficits, spanning detection performance, task engagement, and reward accumulation, among lesioned individuals. While subjects with extreme DA depletion consistently showed severe sensory behavioral deficits, others with substantial DA depletion displayed minimal changes in sensory behavior performance. Moreover, some exhibited moderate degradation of behavioral performance, likely stemming from sensory signaling loss rather than motor impairment. The implementation of a sensory detection task is a promising approach to quantify the extent of impairments associated with DA depletion in the animal model. This facilitates the exploration of early non-motor deficits in PD, emphasizing the importance of incorporating sensory assessments in understanding the diverse spectrum of PD symptoms.
Article
Researchers have been focusing on perceptual characteristics of autism spectrum disorder (ASD) in terms of sensory hyperreactivity. Previously, we demonstrated that temporal resolution, which is the accuracy to differentiate the order of two successive vibrotactile stimuli, is associated with the severity of sensory hyperreactivity. We currently examined whether an increase in the perceptual intensity of a tactile stimulus, despite its short duration, is derived from high temporal resolution and high frequency of sensory temporal summation. Twenty ASD and 22 typically developing (TD) participants conducted two psychophysical experimental tasks to evaluate the detectable duration of a vibrotactile stimulus with the same amplitude and to evaluate temporal resolution. The sensory hyperreactivity was estimated using a self-reported questionnaire. There was no relationship between the temporal resolution and the duration of detectable stimuli in either group. However, the ASD group showed more severe sensory hyperreactivity in daily life than the TD group, and the ASD participants with severe sensory hyperreactivity tended to have high temporal resolution, not high sensitivity of detectable duration. Contrary to the hypothesis, there might be different processing between temporal resolution and sensitivity for stimulus detection. We suggested that the atypical temporal processing would affect sensory reactivity in ASD.
Article
When interacting with the environment, humans typically shift their gaze to where information is to be found that is useful for the upcoming action. With increasing age, people become slower both in processing sensory information and in performing their movements. One way to compensate for this slowing down could be to rely more on predictive strategies. To examine whether we could find evidence for this, we asked younger (19–29 years) and older (55–72 years) healthy adults to perform a reaching task wherein they hit a visual target that appeared at one of two possible locations. In separate blocks of trials, the target could appear always at the same location (predictable), mainly at one of the locations (biased), or at either location randomly (unpredictable). As one might expect, saccades toward predictable targets had shorter latencies than those toward less predictable targets, irrespective of age. Older adults took longer to initiate saccades toward the target location than younger adults, even when the likely target location could be deduced. Thus we found no evidence of them relying more on predictive gaze. Moreover, both younger and older participants performed more saccades when the target location was less predictable, but again no age-related differences were found. Thus we found no tendency for older adults to rely more on prediction.
Article
Infant primates see poorly, and most perceptual functions mature steadily beyond early infancy. Behavioral studies on human and macaque infants show that global form perception, as measured by the ability to integrate contour information into a coherent percept, improves dramatically throughout the first several years after birth. However, it is unknown when sensitivity to curvature and shape emerges in early life or how it develops. We studied the development of shape sensitivity in 18 macaques, aged 2 months to 10 years. Using radial frequency stimuli, circular targets whose radii are modulated sinusoidally, we tested monkeys' ability to discriminate radial frequency stimuli from circles as a function of the depth and frequency of sinusoidal modulation. We implemented a new four-choice oddity task and compared the resulting data with that from a traditional two-alternative forced choice task. We found that radial frequency pattern perception was measurable at the youngest age tested (2 months). Behavioral performance at all radial frequencies improved with age. Performance was better for higher radial frequencies, suggesting the developing visual system prioritizes processing of fine visual details that are ecologically relevant. By using two complementary methods, we were able to capture a comprehensive developmental trajectory for shape perception.
Article
Even a transient period of hearing loss during the developmental critical period can induce long-lasting deficits in temporal and spectral perception. These perceptual deficits correlate with speech perception in humans. In gerbils, these hearing loss–induced perceptual deficits are correlated with a reduction of both ionotropic GABA A and metabotropic GABA B receptor–mediated synaptic inhibition in auditory cortex, but most research on critical period plasticity has focused on GABA A receptors. Therefore, we developed viral vectors to express proteins that would upregulate gerbil postsynaptic inhibitory receptor subunits (GABA A, Gabra1; GABA B, Gabbr1b) in pyramidal neurons, and an enzyme that mediates GABA synthesis (GAD65) presynaptically in parvalbumin-expressing interneurons. A transient period of developmental hearing loss during the auditory critical period significantly impaired perceptual performance on two auditory tasks: amplitude modulation depth detection and spectral modulation depth detection. We then tested the capacity of each vector to restore perceptual performance on these auditory tasks. While both GABA receptor vectors increased the amplitude of cortical inhibitory postsynaptic potentials, only viral expression of postsynaptic GABA B receptors improved perceptual thresholds to control levels. Similarly, presynaptic GAD65 expression improved perceptual performance on spectral modulation detection. These findings suggest that recovering performance on auditory perceptual tasks depends on GABA B receptor-dependent transmission at the auditory cortex parvalbumin to pyramidal synapse and point to potential therapeutic targets for developmental sensory disorders.
Article
Data analysis methods in psychology still emphasize statistical significance testing, despite numerous articles demonstrating its severe deficiencies. It is now possible to use meta-analysis to show that reliance on significance testing retards the development of cumulative knowledge. But reform of teaching and practice will also require that researchers learn that the benefits that they believe flow from use of significance testing are illusory. Teachers must revamp their courses to bring students to understand that (a) reliance on significance testing retards the growth of cumulative research knowledge; (b) benefits widely believed to flow from significance testing do not in fact exist; and (c) significance testing methods must be replaced with point estimates and confidence intervals in individual studies and with meta-analyses in the integration of multiple studies. This reform is essential to the future progress of cumulative knowledge in psychological research.