Article

Some Recent Work on Resampling Methods for Complex Surveys

... There exist bootstrap weight methods under complex survey sampling, but the proposed bootstrap method differs from them in the following aspects. Rao et al. (1992) proposed a bootstrap weight method to estimate the variance of a function of population total estimators under stratified random sampling, but did not investigate the theoretical properties of their bootstrap method. Chipperfield and Preston (2007) proposed a without-replacement scaled bootstrap to achieve the same goal as Rao et al. (1992) under stratified random sampling, but their method is only applicable when the parameter of interest is a smooth function of population totals. ...

... Rao et al. (1992) proposed a bootstrap weight method to estimate the variance of a function of population total estimators under stratified random sampling, but did not investigate the theoretical properties of their bootstrap method. Chipperfield and Preston (2007) proposed a without-replacement scaled bootstrap to achieve the same goal as Rao et al. (1992) under stratified random sampling, but their method is only applicable when the parameter of interest is a smooth function of population totals. Moreover, neither method is applicable to other complex survey sampling designs. ...
... There indeed exist some papers discussing bootstrap confidence intervals under survey sampling, but these do not apply to general hypothesis testing problems. For example, Rao et al. (1992) proposed a bootstrap-t confidence interval for the parameter of interest, but their bootstrap confidence interval does not apply when the parameter of interest is multi-dimensional. In addition, their bootstrap method is only valid under stratified simple random sampling. Beaumont and Patak (2012) and Bertail and Combris (1997) discussed bootstrap confidence intervals in their simulation studies, but it is not clear how the corresponding intervals are constructed. ...
Article
Full-text available
Standard statistical methods that do not take proper account of the complexity of a survey design can lead to erroneous inferences when applied to survey data, due to unequal selection probabilities, clustering, and other design features. In particular, the type I error rates of hypothesis tests using standard methods can be much larger than the nominal significance level. Methods incorporating design features in testing hypotheses have been proposed, including Wald tests and quasi-score tests that involve estimated covariance matrices of parameter estimates. In this paper, we present a unified approach to hypothesis testing that requires neither estimated covariance matrices nor design effects, by constructing bootstrap approximations to quasi-likelihood ratio statistics and quasi-score statistics and establishing their asymptotic validity. The proposed method can be easily implemented without specialized software designed for complex survey sampling. We also consider hypothesis testing for categorical data and present a bootstrap procedure for testing simple goodness of fit and independence in a two-way table. In simulation studies, the type I error rates of the proposed approach are much closer to the nominal significance level compared with the naive likelihood ratio test and quasi-score test. An application to an educational survey under a logistic regression model is also presented.
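To make the flavor of such a procedure concrete, here is a minimal sketch of a bootstrap-calibrated goodness-of-fit test. It is not the authors' exact algorithm: it assumes a set of rescaled (m = n − 1) bootstrap replicate weights is already available, and the data, function names, and replicate-weight construction are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def weighted_props(codes, w, k):
    """Design-weighted category proportions."""
    totals = np.bincount(codes, weights=w, minlength=k)
    return totals / totals.sum()

def bootstrap_gof_pvalue(codes, w, rep_weights, p0):
    """Bootstrap-calibrated goodness-of-fit test (sketch)."""
    k, n = len(p0), len(codes)
    p_hat = weighted_props(codes, w, k)
    # observed chi-square-type statistic under H0: p = p0
    t_obs = n * np.sum((p_hat - p0) ** 2 / p0)
    # bootstrap statistics centered at the full-sample estimate,
    # approximating the null distribution under the design
    t_boot = np.array([
        n * np.sum((weighted_props(codes, wb, k) - p_hat) ** 2 / p_hat)
        for wb in rep_weights
    ])
    return np.mean(t_boot >= t_obs)

# toy data with simple rescaled (m = n - 1) bootstrap replicate weights
n, k, B = 500, 3, 500
codes = rng.integers(0, k, size=n)
w = rng.uniform(0.5, 2.0, size=n)
rep_weights = np.array([
    w * rng.multinomial(n - 1, np.full(n, 1 / n)) * n / (n - 1)
    for _ in range(B)
])
print("bootstrap p-value:",
      bootstrap_gof_pvalue(codes, w, rep_weights, np.full(k, 1 / k)))
```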
... Most of Statistics Canada's surveys that implement the bootstrap method have a complex stratified two-stage or three-stage sampling design. The Rao-Wu-Yue bootstrap weights [1] are often computed in those surveys. The Rao-Wu-Yue bootstrap weights are applicable when the first-stage sample is drawn with replacement within strata. ...
... The first term on the right-hand side of (5), $V_1 = \mathrm{var}_p(\hat{\theta})$, is the variance under single-stage cluster sampling given in (1). The second term, $V_2 = E_p\big(\sum_{k \in s} w_{1k}^2 V_{2k}\big)$, reflects the increase in variance due to the second stage of sampling. ...
... Rao, Wu and Yue [1] did not provide bootstrap weights for this design. However, [9] showed that the bootstrap method of [7] can be implemented by using the following bootstrap weight adjustment: ...
Article
Full-text available
The bootstrap method is often used for variance estimation in sample surveys with a stratified multistage sampling design. It is typically implemented by producing a set of bootstrap weights that is made available to users and that accounts for the complexity of the sampling design. The Rao–Wu–Yue method is often used to produce the required bootstrap weights. It is valid under stratified with-replacement sampling at the first stage or fixed-size without-replacement sampling provided the first-stage sampling fractions are negligible. Some surveys use designs that do not satisfy these conditions. We propose a simple and unified bootstrap method that addresses this limitation of the Rao–Wu–Yue bootstrap weights. This method is applicable to any multistage sampling design as long as valid bootstrap weights can be produced for each distinct stage of sampling. Our method is also applicable to two-phase sampling designs provided that Poisson sampling is used at the second phase. We use this design to model survey nonresponse and derive bootstrap weights that account for nonresponse weighting. The properties of our bootstrap method are evaluated in three limited simulation studies.
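As a rough illustration of how such replicate weights arise in the simplest case the Rao–Wu–Yue method covers, the sketch below generates rescaled bootstrap weights for a stratified design with with-replacement sampling of PSUs at the first stage, using the common choice of m_h = n_h − 1 resamples per stratum; with that choice the general rescaling reduces to multiplying each weight by n_h/(n_h − 1) times the number of times the PSU is resampled. Names and toy data are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def rwy_bootstrap_weights(strata, w, B):
    """Rao-Wu-Yue-type rescaled bootstrap weights with m_h = n_h - 1.

    strata: stratum label per first-stage unit (PSU)
    w: design weight per PSU
    Returns a (B, n) array of bootstrap weights.
    """
    n = len(w)
    out = np.empty((B, n))
    for b in range(B):
        adj = np.zeros(n)
        for h in np.unique(strata):
            idx = np.flatnonzero(strata == h)
            n_h = len(idx)
            # resample m_h = n_h - 1 PSUs with replacement within the stratum
            counts = rng.multinomial(n_h - 1, np.full(n_h, 1 / n_h))
            # with m_h = n_h - 1, the general rescaling reduces to
            # w* = w * n_h / (n_h - 1) * (number of times selected)
            adj[idx] = counts * n_h / (n_h - 1)
        out[b] = w * adj
    return out

# bootstrap variance of a weighted total on toy data
strata = np.repeat([0, 1], [6, 8])
w = rng.uniform(10, 30, size=strata.size)
y = rng.normal(100, 15, size=strata.size)
totals = rwy_bootstrap_weights(strata, w, B=1000) @ y
print("bootstrap variance estimate:", totals.var(ddof=1))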
... Rao and Wu [2] applied a scale adjustment directly to the survey data values so as to recover the usual variance formulae. Rao et al. [4] presented a modification of the method of Rao and Wu [2], where the scale adjustment is applied to the survey weights rather than to the data values. The second group of procedures consists of first creating a pseudo-population from the original sample. ...
... Rao and Wu [2] showed that in the case of a population total, the above algorithm matches the standard variance estimator (3). Rao et al. [4] proposed a weighted version of the Rao-Wu method, whereby the rescaling is applied to the sampling weights rather than the y-values; see also [10]. The method of Rao et al. [4] is described in Section 4. ...
... Rao et al. [4] proposed a weighted version of the Rao-Wu method, whereby the rescaling is applied to the sampling weights rather than the y-values; see also [10]. The method of Rao et al. [4] is described in Section 4. ...
Article
Full-text available
Multi-stage sampling designs are often used in household surveys because a sampling frame of elements may not be available or for cost considerations when data collection involves face-to-face interviews. In this context, variance estimation is a complex task as it relies on the availability of second-order inclusion probabilities at each stage. To cope with this issue, several bootstrap algorithms have been proposed in the literature in the context of a two-stage sampling design. In this paper, we describe some of these algorithms and compare them empirically in terms of bias, stability, and coverage probability.
... of (14), where $\hat{\sigma}^2(\cdot)$ is an estimator of the design variance $\mathrm{var}_p(\cdot)$. Unfortunately, estimator (15) can be very unstable and take negative values for individual small domains. Therefore, the straightforward estimation of the optimal weights (13) is avoided. ...
... and [2]. However, this estimator has the same drawbacks as (15). Another general method is to assume that the estimator $\hat{\theta}^C_i$ defined by (12) approximates the optimal combination $\hat{\theta}^{\mathrm{opt}}_i = \hat{\theta}^C_i(\lambda^*_i)$ quite well and derive the approximation [3] ...
... Then we smooth these $\hat{\psi}^d_i$ to obtain $\hat{\psi}_i = \hat{\psi}^{sD}_i$ according to (8) and use the smoothed estimates in (6), (7), (10), (18), and in the synthetic parts of (22) and $\hat{\theta}^{\mathrm{opt}}_i$. We apply the bootstrap method of [15] to evaluate the estimators of the design variances in (17), (18), and (21). Denote by $\hat{\theta}_i$ any estimator for which we need to estimate the design variance. ...
Preprint
Full-text available
Traditional direct estimation methods are not efficient for domains of a survey population with small sample sizes. To estimate the domain proportions, we combine the direct estimators and the regression-synthetic estimators based on domain-level auxiliary information. For the case of small true proportions, we introduce the design-based linear combination that is a robust alternative to the empirical best linear unbiased predictor (EBLUP) based on the Fay–Herriot model. We also consider an adaptive procedure optimizing a sample-size-dependent composite estimator, which depends on a single parameter for all domains. We imitate the Lithuanian Labor Force Survey, where we estimate the proportions of the unemployed and employed in municipalities. We show where the considered design-based compositions and estimators of their mean square errors are competitive with EBLUP and its accuracy estimation.
... We propose a rescaled bootstrap method tailored for simple random sampling without replacement in each dimension, drawing inspiration from the approach proposed in Rao et al. (1992). The ...
... (1) Rao et al. (1992) ...
Preprint
Full-text available
We investigate the family of cross-classified sampling (CCS) designs across an arbitrary number of dimensions. We introduce a variance decomposition that enables the derivation of general asymptotic properties for these designs and the development of straightforward and asymptotically unbiased variance estimators. Additionally, we demonstrate the suitability of weighted bootstrap techniques for CCS, given the availability of a weighted bootstrap technique in each dimension. Our conclusions are supported by an extensive simulation study. Finally, we apply the proposed methods to a French longitudinal survey conducted among children.
... All analyses were weighted to account for the complex sampling design, nonresponse bias, population frame calibration, and age range, and used a resampling-based variance estimation employing the (n-1) rescaling bootstrap (RBS) with 200 replicate weights (Kolenikov, 2010; Rao et al., 1992). Separate weights were constructed for veterans and non-veterans. ...

... The proportions of veterans and non-veterans endorsing each study outcome were assessed using frequency statistics and were unadjusted (i.e., crude). To assess group differences among outcomes, Poisson regression analyses were conducted with the bootstrap-based method to produce robust standard errors (Kolenikov, 2010; Rao et al., 1992). Veteran status was included as the primary predictor, and analyses were stratified by sex. ...
Article
Full-text available
Background Prior research has examined how the post-military health and well-being of both the larger veteran population and earlier veteran cohorts differs from non-veterans. However, no study has yet provided a holistic examination of how the health, vocational, financial, and social well-being of the newest generation of post-9/11 U.S. military veterans compares with their non-veteran peers. This is a significant oversight, as accurate knowledge of the strengths and vulnerabilities of post-9/11 veterans is required to ensure that the needs of this population are adequately addressed, as well as to counter inaccurate veteran stereotypes. Methods Post-9/11 U.S. veterans (N = 15,160) and non-veterans (N = 4,533) reported on their health and broader well-being as part of a confidential web-based survey in 2018. Participants were drawn from probability-based sampling frames, and sex-stratified weighted logistic regressions were conducted to examine differences in veterans’ and non-veterans’ reports of health, vocational, financial, and social outcomes. Results Although both men and women post-9/11 veterans endorsed poorer health status than non-veterans, they reported greater engagement in a number of positive health behaviors (healthy eating and exercise) and were more likely to indicate having access to health care. Veterans also endorsed greater social well-being than non-veterans on several outcomes, whereas few differences were observed in vocational and financial well-being. Conclusion Despite their greater vulnerability to experiencing health conditions, the newest generation of post-9/11 U.S. veterans report experiencing similar or better outcomes than non-veterans in many aspects of their lives. Findings underscore the value of examining a wider range of health and well-being outcomes in veteran research and highlight a number of important directions for intervention, public health education, policy, and research related to the reintegration of military veterans within broader civilian society.
... bootstrap are properly rescaled, as well as in [5,6]; cf. also the review in [7]. In [8] a "rescaled bootstrap process" based on asymptotic arguments is proposed. ...
... where the $N^*_i$'s are integer-valued random variables with (joint) probability distribution $P_{\mathrm{pred}}$. In practice, Equation (7) means that $N^*_i I_i$ population units are predicted to have $y$-value equal to $y_i$ and $x$-value equal to $x_i$, for each sample unit $i$. ...
Article
Full-text available
In the present paper, resampling for finite populations under an iid sampling design is reviewed. Our attention is mainly focused on pseudo-population-based resampling due to its properties. A principled appraisal of the main theoretical foundations and results is given and discussed, together with important computational aspects. Finally, a discussion on open problems and research perspectives is provided.
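A minimal sketch of the pseudo-population idea under simple random sampling without replacement, assuming the design weight N/n is an integer so each sampled unit can be copied exactly N/n times (the methods reviewed above treat the fractional part more carefully); all names and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def pseudo_pop_bootstrap(y, N, B):
    """Pseudo-population bootstrap for SRSWOR (sketch).

    Each sampled unit is copied N/n times (assumed integer here) to
    build a pseudo-population, from which B new SRSWOR samples of
    size n are drawn and the statistic is recomputed.
    """
    n = len(y)
    copies = N // n                 # integer copies of each sample unit
    pseudo = np.repeat(y, copies)   # the pseudo-population
    stats = np.empty(B)
    for b in range(B):
        s = rng.choice(pseudo, size=n, replace=False)
        stats[b] = s.mean()
    return stats

y = rng.normal(50, 10, size=100)
boot = pseudo_pop_bootstrap(y, N=2000, B=1000)
print("bootstrap variance of the mean:", boot.var(ddof=1))
```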
... To address the complex sampling design, we used main population weights and rescaling bootstrap and replication weights (Rao et al., 1992). First, we conducted chi-square tests to examine the unadjusted prevalence of experiencing PTEs and other stressors, the prevalence of self-reported lifetime diagnoses, and mental health treatment among LGBTQ+ veterans and LGBTQ+ nonveterans. ...
Article
Full-text available
Objective: The purpose of the study was to compare lesbian, gay, bisexual, transgender, queer+ (LGBTQ+) veterans’ and nonveterans’ prevalence of potentially traumatic events (PTEs) and other stressor exposures, mental health concerns, and mental health treatment. Method: A subsample of veterans and nonveterans who identified as LGBTQ+ (N = 1,291; 851 veterans; 440 nonveterans) was identified from a national cohort of post-9/11 veterans and matched nonveterans. The majority of the sample identified as White (59.7%), men (40.4%), and gay or lesbian (48.6%). Measures included PTEs and other stressors, depression, anxiety, posttraumatic stress disorder (PTSD), and receipt of mental health treatment. Logistic regressions compared the likelihood of experiencing PTEs and other stressors, self-reported mental health diagnoses, and mental health treatment between LGBTQ+ veterans and nonveterans. Results: Compared with LGBTQ+ nonveterans, LGBTQ+ veterans were more likely to report financial strain, divorce, discrimination, witnessing the sudden death of a friend or family member, and experiencing a serious accident or disaster. LGBTQ+ veterans reported greater depression, anxiety, and PTSD symptom severity than LGBTQ+ nonveterans. However, LGBTQ+ veterans were only more likely to receive psychotherapy for PTSD and did not differ from nonveterans in the likelihood of receiving any other types of mental health treatment. Conclusions: The study was the first to demonstrate that LGBTQ+ veterans have a greater prevalence of PTEs and other stressors and report worse mental health symptoms. These findings suggest that LGBTQ+ veterans may have unmet mental health treatment needs and need interventions to increase engagement in needed mental health services, especially for depression and anxiety.
... First, we draw an independent sample of households with replacement from the original sample in each Autonomous Community. Second, the cross-sectional weights are adjusted as Rao et al. (1992) and Rust and Rao (1996) proposed. For instance, the adjusted weight for household $i$ in Autonomous Community $j$, $w^*_{ij}$, is given by $w^*_{ij} = \frac{n_j}{n_j - 1} r_i w_{ij}$, where $w_{ij}$ is the original cross-sectional weight, $r_i$ is the number of times the $i$-th household in Autonomous Community $j$ is selected in the bootstrap sample, and $n_j$ is the original sample size of Autonomous Community $j$. ...
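Assuming the standard Rao et al. (1992) choice of n_j − 1 with-replacement draws per Autonomous Community (the number of draws is not stated in the excerpt, so this is an assumption that matches the adjustment above), the replicate weights for one Community can be generated in a few lines; this is an illustrative sketch, not the survey's production code.

```python
import numpy as np

rng = np.random.default_rng(21)

def bootstrap_weights_one_community(w_j):
    """One bootstrap replicate of adjusted weights within a Community.

    Draws n_j - 1 households with replacement (assumed) and applies
    w*_ij = n_j / (n_j - 1) * r_i * w_ij, with r_i the selection count.
    """
    n_j = len(w_j)
    r = rng.multinomial(n_j - 1, np.full(n_j, 1 / n_j))  # selection counts
    return w_j * r * n_j / (n_j - 1)

w_j = rng.uniform(100, 500, size=50)  # original cross-sectional weights
print(bootstrap_weights_one_community(w_j)[:5])
```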
Article
Full-text available
The AROPE rate is a multidimensional indicator to monitor poverty in the European Union which combines income, work intensity and material deprivation. However, it misses the possible relationship between its components. To overcome this drawback, some authors proposed to complement the AROPE rate with measures of the dependence between its dimensions, since higher dependence can exacerbate poverty. In this paper, we follow this approach and measure such dependence in the Spanish regions over the period 2008-2018 using three multivariate versions of Spearman’s rank correlation coefficient. Our results reveal an asymmetric effect of the economic cycle on the dependence between poverty dimensions, as this dependence, in many Spanish regions, substantially increased during the Great Recession but dropped little during the economic recovery. Moreover, regions with higher AROPE rates also tend to experience more dependence between their dimensions.
... All analyses were weighted using the main population weights (Rao et al., 1992) to account for the sampling design. Because age and race/ethnicity are related to negative mental health outcomes and mental health treatment seeking, we controlled for these variables in all adjusted analyses (Benjet et al., 2016;Roberts et al., 2011). ...
Article
Full-text available
Sexual minority veterans are at heightened risk for mental health conditions compared with their heterosexual peers. Subpopulations of the sexual minority community, including veterans, are at even greater risk for mental health conditions. Despite this heightened risk, little is known about mental health treatment seeking among sexual minority veterans, especially in under-researched sexual minority subpopulations (e.g., bisexual men and women). This study examined sexual orientation-based differences in mental health symptom severity and past-year mental health treatment among a national sample of post-9/11 veteran men and women (N = 14,968). Results indicated that bisexual veteran women had greater mental health symptom severity compared with lesbian/gay and heterosexual veteran women. Gay and bisexual veteran men had greater depression and anxiety symptom severity than heterosexual veteran men. However, among individuals who reported receiving a mental health diagnosis (posttraumatic stress disorder, depression, anxiety) there were no significant differences in odds of receiving mental health treatment between lesbian/gay and bisexual veteran men and women compared to their heterosexual counterparts. These results suggest the need for additional research on facilitators and barriers to accessing and engaging in mental health care among sexual minority veterans, especially bisexual veteran women who experience disproportionate psychological burden compared to their lesbian/gay and heterosexual peers.
... Estimated variances for these two surveys are computed via the Rao-Wu bootstrap procedure. This procedure constructs bootstrap weights that reflect the sample details: see Rao and Wu (1988) or Rao et al. (1992) for details on how the bootstrap weights are computed. ...
Article
Full-text available
Sampling variance smoothing is an important topic in small area estimation. In this article, we propose sampling variance smoothing methods for small area proportion estimation. In particular, we consider the generalized variance function and design effect methods for sampling variance smoothing. We evaluate and compare the smoothed sampling variances and small area estimates based on the smoothed variance estimates through analysis of survey data from Statistics Canada. The results from real data analysis and simulation study indicate that the proposed sampling variance smoothing methods perform very well for small area estimation.
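The paper's exact smoothing models are not reproduced in this listing, but one common generalized variance function (GVF) specification is log-linear in the point estimate. The sketch below fits that form by ordinary least squares; the functional form and all data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

def gvf_smooth(theta_hat, v_direct):
    """Smooth direct sampling variances with a log-linear GVF.

    Fits log(v_i) = a + b * log(theta_i) by OLS and returns fitted
    (smoothed) variances; the cited paper may use a different form.
    """
    X = np.column_stack([np.ones_like(theta_hat), np.log(theta_hat)])
    coef, *_ = np.linalg.lstsq(X, np.log(v_direct), rcond=None)
    return np.exp(X @ coef)

theta = rng.uniform(0.05, 0.4, size=30)   # direct proportion estimates
v = theta * (1 - theta) / 100 * rng.uniform(0.5, 2.0, size=30)  # noisy variances
print(gvf_smooth(theta, v)[:5])
```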
... In order for estimator (8) to efficiently correct the selection bias, the propensity score model has to be well specified. The variance estimator $\hat{V}_{\mathrm{IPW}}$ for (8) may be obtained by using resampling methods, for example, the bootstrap procedure from [10]. ...
Article
Full-text available
We aim to find a way to effectively integrate a non-probability (voluntary) sample under the data framework, where the study variable is also observed in a probability sample of some statistical survey. The selection bias that arises from voluntary participation in the survey is corrected by estimating the probabilities of inclusion into the sample (propensity scores) for the units in the non-probability sample. The estimators for the propensity scores are constructed using a parametric logistic regression model. We consider two modeling scenarios: one assuming that the willingness to participate in the voluntary survey does not depend on the survey variable itself, and one in which that variable does contribute to whether the individual responds or not. The maximum likelihood method is applied in both scenarios to estimate the propensity scores. The estimators of the population mean based on the estimated propensity scores are linearly combined with the unbiased estimator using the probability sample data. We compare the constructed estimators in the simulation study, where we estimate the population proportions using data from the Population and Housing Census surveys.
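A stripped-down sketch of the first modeling scenario: stack the two samples, fit a logistic model for membership in the volunteer sample, and weight volunteers by their estimated odds of non-participation. This is a generic pseudo-weighting recipe with an unweighted reference sample and simulated data, not the paper's exact estimator; it assumes scikit-learn is available.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(11)

# x: auxiliary variable observed in both samples; y observed only in
# the non-probability (volunteer) sample
n_np, n_p = 800, 400
x_np = rng.normal(1.0, 1.0, size=n_np)   # volunteers skew high on x
x_p = rng.normal(0.0, 1.0, size=n_p)
y_np = 2.0 + 1.5 * x_np + rng.normal(0, 1, size=n_np)

# logistic model for inclusion in the non-probability sample,
# fit on the stacked samples (delta = 1 marks volunteer units)
X = np.concatenate([x_np, x_p]).reshape(-1, 1)
delta = np.concatenate([np.ones(n_np), np.zeros(n_p)])
fit = LogisticRegression().fit(X, delta)
p = fit.predict_proba(x_np.reshape(-1, 1))[:, 1]  # propensity scores

# inverse-propensity-style mean using odds weights, one simple choice
w = (1 - p) / p
print("pseudo-weighted mean:", np.sum(w * y_np) / np.sum(w))
```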
... Shao and Tu [37] presented the jackknife variance estimator for $\hat{\theta}$ when multistage sampling is used. Resampling techniques in the context of survey sampling have thus been widely studied, and the original idea of jackknife variance estimation has been extended to stratified and multistage sampling designs by a number of statisticians, such as Jones [17], Kish and Frankel [18], Krewski and Rao [20], Kovar et al. [19], Rao et al. [32], and Shao and Tu [37]. ...

... The two major classes of estimation approaches under this framework are Taylor linearization (sometimes called the infinitesimal jackknife) and replication methods. We describe each at a high level but point the interested reader to the rich literature on comparisons and variations of these approaches applied to survey-weighted estimating functions (see, for example, Binder, 1996; Rao et al., 1992, as a starting point). ...
Preprint
Full-text available
We present csSampling, an R package for estimation of Bayesian models for data collected from complex survey samples. csSampling combines functionality from the probabilistic programming language Stan (via the rstan and brms R packages) and the handling of complex survey data from the survey R package. Under this approach, the user creates a survey-weighted model in brms or provides a custom weighted model via rstan. Survey design information is provided via the svydesign function of the survey package. The cs_sampling function of csSampling estimates the weighted Stan model and provides an asymptotic covariance correction for model mis-specification due to using survey sampling weights as plug-in values in the likelihood. This is often known as a "design effect", which is the ratio between the variance from a complex survey sample and a simple random sample of the same size. The resulting adjusted posterior draws can then be used for the usual Bayesian inference while also achieving frequentist properties of asymptotic consistency and correct uncertainty (e.g. coverage).
... Alternatively, a with-replacement bootstrap variance estimation can also be used here [43]. To illustrate, we consider a single-stage probability proportional to size sampling with negligible sampling ratios. ...
... Alternatively, a with-replacement bootstrap variance estimation can also be used here [43]. To illustrate, we consider a single-stage probability proportional to size sampling with negligible sampling ratios. ...
Preprint
Full-text available
Multiple heterogeneous data sources are becoming increasingly available for statistical analyses in the era of big data. As an important example in finite-population inference, we develop a unified framework of the test-and-pool approach to general parameter estimation by combining gold-standard probability and non-probability samples. We focus on the case when the study variable is observed in both datasets for estimating the target parameters, and each contains other auxiliary variables. Utilizing the probability design, we conduct a pretest procedure to determine the comparability of the non-probability data with the probability data and decide whether or not to leverage the non-probability data in a pooled analysis. When the probability and non-probability data are comparable, our approach combines both data for efficient estimation. Otherwise, we retain only the probability data for estimation. We also characterize the asymptotic distribution of the proposed test-and-pool estimator under a local alternative and provide a data-adaptive procedure to select the critical tuning parameters that target the smallest mean square error of the test-and-pool estimator. Lastly, to deal with the non-regularity of the test-and-pool estimator, we construct a robust confidence interval that has a good finite-sample coverage property.
... Adjustments are thus required to apply these methods to finite populations (Quatember, 2015). For finite populations, the rescaled bootstrap technique (Rao et al., 1992) can be used for bias correction of a given empirical version of . This method has been used in many research studies (Berger and Muñoz, 2015; Moya et al., 2020; etc.) in many areas (see Yang et al., 2010; Muñoz et al., 2018; etc.). ...
Article
Full-text available
The Gini index is probably the most commonly used indicator to measure inequality. For continuous distributions, the Gini index can be computed using several equivalent formulations. However, this is not the case with discrete distributions, where controversy remains regarding the expression to be used to estimate the Gini index. We attempt to bring a better understanding of the underlying problem by regrouping and classifying the most common estimators of the Gini index proposed in both infinite and finite populations, and focusing on the biases. We use Monte Carlo simulation studies to analyse the bias of the various estimators under a wide range of scenarios. Extremely large biases are observed in heavy-tailed distributions with high Gini indices, and bias corrections are recommended in this situation. We propose the use of some (new and traditional) bootstrap-based and jackknife-based strategies to mitigate this bias problem. Results are based on continuous distributions often used in the modelling of income distributions. We describe a simulation-based criterion for deciding when to use bias corrections. Various real data sets are used to illustrate the practical application of the suggested bias corrected procedures.
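One generic version of the bootstrap-based strategies mentioned above is the standard additive bias correction, 2·(estimate) − mean(bootstrap replicates), applied to a plug-in Gini estimator. The sketch below shows only that basic pattern on simulated heavy-tailed incomes under iid resampling; the paper evaluates several, more refined corrections.

```python
import numpy as np

rng = np.random.default_rng(5)

def gini(y):
    """Plug-in Gini index for a sample of non-negative incomes."""
    y = np.sort(y)
    n = len(y)
    i = np.arange(1, n + 1)
    return 2 * np.sum(i * y) / (n * np.sum(y)) - (n + 1) / n

def gini_bias_corrected(y, B=1000):
    """Standard bootstrap bias correction: 2*estimate - mean(replicates)."""
    g = gini(y)
    reps = np.array([gini(rng.choice(y, size=len(y), replace=True))
                     for _ in range(B)])
    return 2 * g - reps.mean()

y = rng.pareto(2.0, size=300) + 1   # heavy-tailed simulated incomes
print("plug-in:", gini(y), "bias-corrected:", gini_bias_corrected(y))
```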
... We apply the bootstrap method of Rao et al. [24] to evaluate the estimators of the design variances used in (18), (19), and (22). Let us estimate the variance of any estimator $\hat{\theta}_i$. ...
Article
Full-text available
Traditional direct estimation methods are inefficient for domains of a survey population with small sample sizes. To estimate the domain proportions, we combine the direct estimators and the regression-synthetic estimators based on domain-level auxiliary information. For the case of small true proportions, we propose the design-based linear combination that is a robust alternative to the empirical best linear unbiased predictor (EBLUP) based on the Fay–Herriot model. We imitate the Lithuanian Labor Force Survey, where we estimate the proportions of the unemployed and employed in municipalities. We show where the proposed design-based composition and estimator of its mean square error are competitive with EBLUP and its accuracy estimation.
... The 95% confidence intervals for the NSUM prevalence estimates were produced using the rescaled bootstrap procedure (J. Rao et al., 1992; J. N. Rao & Wu, 1988; Rust & Rao, 1996) with 50,000 resamples, as these have been found to perform better, in terms of coverage rates, than those based on the usual NSUM standard error calculations (Feehan & Salganik, 2016a) and provide insights into the sampling distribution of the NSUM estimator. ...
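For reference, a percentile-type interval built from such replicate estimates can be computed as below. Whether the study used percentile endpoints or another interval form is not stated in the excerpt, so this is a hedged illustration with placeholder values.

```python
import numpy as np

def percentile_ci(replicates, level=0.95):
    """Percentile confidence interval from bootstrap replicate estimates."""
    alpha = 1 - level
    lo, hi = np.quantile(replicates, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# `replicates` would hold the NSUM prevalence estimate recomputed
# under each set of rescaled bootstrap weights; placeholder values here
rng = np.random.default_rng(9)
replicates = rng.normal(0.012, 0.002, size=50_000)
print(percentile_ci(replicates))
```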
Article
Full-text available
The goal of this paper is to compare a traditional survey method with the network scale-up method (NSUM) for the prevalence estimation of child trafficking in Sierra Leone in 2020. The traditional survey method involved a probability-based, stratified, and clustered multistage sampling design in which adult respondents in 3,070 households were interviewed about trafficking of children who reside in their household in three selected districts. This paper details the first attempt to estimate the prevalence of child trafficking using NSUM, which entailed questioning the same adult respondents about the trafficking-related activities of children in their personal networks. Findings and interpretation of these results are presented, along with implications and recommendations for future studies.
... Its idea is to assume the relation $\psi_i \approx K N_i^{\gamma}$ and then estimate the parameters $K > 0$ and $\gamma \in \mathbb{R}$ through a log-log regression model. We estimate the design variances of all synthetic and composite estimators using the rescaling bootstrap from Rao et al. (1992). Let $\hat{\mu}^{(r)}_i$, $r = 1, \ldots, R$, be the realizations of any estimator $\hat{\mu}_i$ of the parameter $\mu_i$, where $\mu_i$ is a proportion or the MSE of the estimator of the proportion. ...
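The log-log regression step admits a direct sketch: regress log ψ_i on log N_i by ordinary least squares and exponentiate the intercept to recover K. The data and names below are simulated and illustrative.

```python
import numpy as np

rng = np.random.default_rng(13)

def fit_power_law(N, psi):
    """Estimate K and gamma in psi_i ≈ K * N_i**gamma by log-log OLS."""
    X = np.column_stack([np.ones_like(N, dtype=float), np.log(N)])
    (logK, gamma), *_ = np.linalg.lstsq(X, np.log(psi), rcond=None)
    return np.exp(logK), gamma

N = rng.integers(200, 5000, size=40)                 # domain sizes
psi = 0.8 * N ** 0.6 * rng.lognormal(0, 0.2, size=40)  # noisy power law
K, gamma = fit_power_law(N, psi)
print("K:", K, "gamma:", gamma)
```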
Article
Small area estimation methods are used in surveys, where sample sizes are too small to get reliable direct estimates of parameters in some population domains. We consider design‐based linear combinations of direct and synthetic estimators and propose a two‐step procedure to approach the optimal combination. We construct the mean square error estimator suitable for this and any other linear composition that estimates the optimal one. We apply the theory to two design‐based compositions analogous to the empirical best linear unbiased predictors (EBLUPs) based on the basic area‐ and unit‐level models. The simulation study shows that the new methods are efficient compared to estimation using EBLUP.
... The presented approximate normal confidence intervals for postestimation predictions are based on a bootstrapping approach that accounts for the complex survey design through a scale adjustment applied to 1000 sets of replicate weights [25][26][27]. We used R version 4.0.4 for our statistical analysis. ...
Article
Background: Diabetes is a growing concern in South Asia but few nationally representative studies identify factors behind this rising disease burden. We studied the nationwide change in diabetes prevalence in Bangladesh, subpopulations disproportionately affected, and the contribution of rising unhealthy weight to the change in diabetes prevalence. Methods: Based on a sample of 13,959 adults aged 35 years and older with biomarker measurements from the 2011 and 2017/2018 Bangladesh Demographic and Health Surveys, we estimated how the prevalence of diabetes changed nationally and across socioeconomic/geographic groups. Using counterfactual decomposition, we assessed how much the prevalence of diabetes would have grown if BMI had not changed between 2011 and 2017. Results: Diabetes prevalence increased from 12.1 (11.1, 13.1) to 14.4% (13.3, 15.5) between 2011 and 2017/2018. Diabetes grew disproportionately quickly among population groups with higher household wealth, more education, and in three regions. Over this same period, mean BMI increased from 20.9 (20.8, 21.1) to 22.5 kg/m2 (22.4, 22.7) and overweight from 25.8 (24.4, 27.3) to 42.1% (40.4, 43.7). Under the counterfactual scenario of constant BMI, diabetes would have risen by only 1.0 (-0.4, 2.4) instead of 2.3 percentage points (0.8, 3.7) nationally, corresponding to a contribution of 58% (-106.3, 221.7). Similarly, group-specific trends were largely attributable to increasing BMI. Conclusions: Diabetes prevalence in Bangladesh has increased rapidly between 2011 and 2017/2018. Decomposition analysis estimates have wide confidence intervals but are consistent with the hypothesis that this change was driven by the dramatic rise in body weights.
... Thus, estimates from univariate and bivariable analyses involving these weights were partially adjusted via nonveteran-to-veteran standardization weighting. A resampling-based variance estimation approach, employing the (n-1) rescaling bootstrap method, was applied using 200 replicate weights (Kolenikov, 2010; Rao et al., 1992). Weights for the 127 persons who were removed were set to zero. ...
Article
Full-text available
Large-scale epidemiological studies suggest that veterans may have poorer physical health than nonveterans, but this has been largely unexamined in post-9/11 veterans despite research indicating their high levels of disability and healthcare utilization. Additionally, little investigation has been conducted on sex-based differences and interactions by veteran status. Notably, few studies have explored veteran physical health in relation to national health guidelines. Self-reported, weighted data were analyzed on post-9/11 U.S. veterans and nonveterans (n = 19,693; 6,992 women, 12,701 men; 15,160 veterans, 4,533 nonveterans). Prevalence was estimated for 24 physical health conditions classified by Healthy People 2020 targeted topic areas. Associations between physical health outcomes and veteran status were evaluated using bivariable and multivariable analyses. Back/neck pain was most reported by veterans (49.3 %), twice that of nonveterans (22.8 %)(p < 0.001). Adjusted odds ratios (AORs) for musculoskeletal and hearing disorders, traumatic brain injury, and chronic fatigue syndrome (CFS) were 3-6 times higher in veterans versus nonveterans (p < 0.001). Women versus men had the greatest adjusted odds for bladder infections (males:females, AOR = 0.08, 95 % CI:0.04-0.18)(p < 0.001), and greater odds than men for multiple sclerosis, CFS, cancer, irritable bowel syndrome/colitis, respiratory disease, some musculoskeletal disorders, and vision loss (p < 0.05). Cardiovascular-related conditions were most prominent for men (p < 0.001). Veteran status by sex interactions were found for obesity (p < 0.03; greater for male veterans) and migraine (p < 0.01; greater for females). Healthy People 2020 targeted topic areas exclude some important physical health conditions that are associated with being a veteran. National health guidelines for Americans should provide greater consideration of veterans in their design.
... We follow the same procedure here except the resampling procedure must account for the survey weights. To do so, we use the approach in Kolenikov (2010), which is based on the rescaling bootstrap procedure developed in Rao et al. (1992). Second, inference is conducted by using the same bootstrap approach to account for survey weights along with the Imbens-Manski (2004) correction to obtain 90% confidence intervals (CIs). ...
Article
We examine economic mobility in India while accounting for misclassification to better understand the welfare effects of the rise in inequality. To proceed, we extend recently developed methods on the partial identification of transition matrices. Allowing for modest misclassification, we find overall mobility has been remarkably low: at least 65% of poor households remained poor or at-risk of being poor between 2005 and 2012. We also find Muslims, lower caste groups, and rural households are in a more disadvantageous position compared to Hindus, upper caste groups, and urban households. These findings cast doubt on the conventional wisdom that marginalized households in India are catching up.
... All statistical analyses were performed using Stata 17 SE. To consider the CCHS sampling plan and protect the confidentiality of respondents, all results were computed using bootstraps and sampling weights [37,38]. ...
Article
Full-text available
Life course exposure to neighbourhood deprivation may have a previously unstudied relationship with health disparities. This study examined the association between neighbourhood deprivation trajectories (NDTs) and poor reported self-perceived health (SPH) among Quebec’s adult population. Data of 45,990 adults with complete residential address histories from the Care-Trajectories-Enriched Data cohort, which links Canadian Community Health Survey respondents to health administrative data, were used. Accordingly, participants were categorised into nine NDTs (T1 (Privileged Stable)–T9 (Deprived Stable)). Using multivariate logistic regression, the association between trajectory groups and poor SPH was estimated. Of the participants, 10.3% (95% confidence interval [CI]: 9.9–10.8) had poor SPH status. This proportion varied considerably across NDTs: from 6.4% (95% CI: 5.7–7.2) for Privileged Stable (most advantaged) to 16.4% (95% CI: 15.0–17.8) for Deprived Stable (most disadvantaged) trajectories. After adjustment, the likelihood of reporting poor SPH was significantly higher among participants assigned to a Deprived Upward (odds ratio [OR]: 1.77; 95% CI: 1.48–2.12), Average Downward (OR: 1.75; CI: 1.08–2.84) or Deprived trajectory (OR: 1.81; CI: 1.45–2.86), compared to the Privileged trajectory. Long-term exposure to neighbourhood deprivation may be a risk factor for poor SPH. Thus, NDT measures should be considered when selecting a target population for public-health-related interventions.
... Many resampling methods have been proposed to capture the variation in a probability sample (see, e.g., Rao et al. 1992). To flexibly incorporate researchers' understanding of the inclusion mechanism of the nonprobability sample and allow the possibility of considering dependency between the two samples, we implement a pseudo-population bootstrap. ...
Article
Full-text available
Nonprobability samples, for example observational studies, online opt-in surveys, or register data, do not come from a sampling design and therefore may suffer from selection bias. To correct for selection bias, Elliott and Valliant (EV) proposed a pseudo-weight estimation method that applies a two-sample setup for a probability sample and a nonprobability sample drawn from the same population, sharing some common auxiliary variables. By estimating the propensities of inclusion in the nonprobability sample given the two samples, we may correct the selection bias by (pseudo) design-based approaches. This paper expands the original method, allowing for large sampling fractions in either sample or for high expected overlap between selected units in each sample, conditions often present in administrative data sets and more frequently occurring with Big Data.
... Analyses were conducted using SAS Enterprise Guide 7.1 and SUDAAN 11.0.0 software. To account for the complex survey design, p-values, 95% confidence intervals, and coefficients of variation (CV) were estimated using the bootstrap technique with 22 degrees of freedom (Rao 1992; Rust and Rao 1996). Statistical significance was specified as a p-value of less than 0.05. ...
Article
Full-text available
Objective: To examine the association between individual and cumulative leisure noise exposure, in addition to acceptable yearly exposure (AYE), and hearing outcomes among a nationally representative sample of Canadians. Design: Audiometry, distortion-product otoacoustic emissions (DPOAEs) and in-person questionnaires were used to evaluate hearing and leisure noise exposure across age, sex, and household income/education level. High-risk cumulative leisure noise exposure was defined as 85 dBA or greater for 40 h or more per week, with AYE calculations also based on this occupational limit. Study sample: A randomised sample of 10,460 respondents, aged 6–79, completed questionnaires and hearing evaluations between 2012 and 2015. Results: Among 50–79 year olds, high-risk cumulative leisure noise was associated with increased odds of a notch, while high exposure to farming/construction equipment noise was associated with hearing loss, notches and absent DPOAEs. No associations with hearing loss were found; however, non-significant tendencies observed included higher mean hearing thresholds, notches and hearing loss odds. Conclusion: Educational outreach and monitoring of hearing among young and middle-aged populations exposed to hazardous leisure noise would be beneficial.
... Due to the complexity of the sampling designs, some modifications and adjustments to the jackknife are often required. Various versions of the jackknife have been developed and studied both theoretically and empirically, for example, by Krewski and Rao (1981), Rao and Wu (1985), Kovar, Rao, and Wu (1988), and Rao, Wu, and Yue (1992), in the context of stratified multistage sampling. For the case of unistage stratified sampling without replacement with unequal probabilities, Berger (2007) proposed a novel jackknife estimator for the variance of a point estimator that is a function of Hájek estimators. ...
Article
The generalized regression estimator (GREG) is a well-known procedure for using auxiliary data to estimate means or totals using a sample selected from a finite population. The GREG estimator is motivated by an assumed linear superpopulation model and it is known to be asymptotically unbiased regardless of whether the model is correctly specified or not. When the sample size is small and/or when the linear model does not fit the sample data well, the GREG estimator may have nonnegligible bias. In this paper, we use the jackknife procedure to correct the bias of the GREG. We evaluate, both theoretically and by simulation, the performance of the jackknife bias-corrected regression estimator (GREG-JK) under unistage sampling without replacement with unequal probabilities. A jackknife mean squared error estimator is proposed that naturally includes a finite population correction, which is usually absent in the standard jackknife methods for variance estimation. A simulation study shows that the empirical bias of GREG-JK is negligible for all sample sizes and generated populations. Furthermore, the proposed jackknife mean squared error estimator demonstrates improvements over the customary estimator.
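The basic delete-one jackknife bias correction that GREG-JK builds on can be sketched generically as below; the paper's estimator additionally handles unequal inclusion probabilities and a finite population correction, which this illustration omits. Names and the example statistic are illustrative.

```python
import numpy as np

def jackknife_bias_corrected(estimator, data):
    """Delete-one jackknife bias correction for a generic estimator.

    Returns theta_jk = n*theta_hat - (n - 1)*mean(delete-one estimates).
    The cited GREG-JK additionally accounts for unequal inclusion
    probabilities and a finite population correction.
    """
    n = len(data)
    theta = estimator(data)
    loo = np.array([estimator(np.delete(data, i, axis=0))
                    for i in range(n)])
    return n * theta - (n - 1) * loo.mean()

rng = np.random.default_rng(17)
y = rng.lognormal(0, 1, size=50)
# a biased, non-linear statistic: squared coefficient of variation
cv_squared = lambda d: d.var(ddof=1) / d.mean() ** 2
print(jackknife_bias_corrected(cv_squared, y))
```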
... From these three (sets of) models, we calculated the indirect effect of each social support indicator and the total indirect effect by summing the four indirect effects. The statistical significance of indirect effects was estimated in 10,000 bootstrap samples using Rao, Wu, and Yue's bootstrap weighting method for complex surveys as implemented in SAS [41]. Due to concerns about model convergence problems in at least some of the bootstrap samples (which were smaller than the original sample) for uncommon disorders or disorder groups, we examined possible mediation only for any mental disorder and for > 1 mental disorder. ...
Article
Full-text available
Purpose Lesbian, gay, and bisexual (LGB) individuals, and LB women specifically, have an increased risk for psychiatric morbidity, theorized to result from stigma-based discrimination. To date, no study has investigated the mental health disparities between LGB and heterosexual individuals in a large cross-national population-based comparison. The current study addresses this gap by examining differences between LGB and heterosexual participants in 13 cross-national surveys, and by exploring whether these disparities were associated with country-level LGBT acceptance. Since lower social support has been suggested as a mediator of sexual orientation-based differences in psychiatric morbidity, our secondary aim was to examine whether mental health disparities were partially explained by general social support from family and friends. Methods Twelve-month prevalence of DSM-IV anxiety, mood, eating, disruptive behavior, and substance disorders was assessed with the WHO Composite International Diagnostic Interview in a general population sample across 13 countries as part of the World Mental Health Surveys. Participants were 46,889 adults (19,887 males; 807 LGB-identified). Results Male and female LGB participants were more likely to report any 12-month disorder (OR 2.2, p < 0.001 and OR 2.7, p < 0.001, respectively) and most individual disorders than heterosexual participants. We found no evidence for an association between country-level LGBT acceptance and rates of psychiatric morbidity between LGB and heterosexual participants. However, among LB women, the increased risk for mental disorders was partially explained by lower general openness with family, although most of the increased risk remained unexplained. Conclusion These results provide cross-national evidence for an association between sexual minority status and psychiatric morbidity, and highlight that for women, but not men, this association was partially mediated by perceived openness with family. Future research into individual-level and cross-national sexual minority stressors is needed.
... Due to the scaling issue under the finite population setting, the naïve bootstrap technique (Efron, 1979) may not be able to give unbiased variance estimates. Therefore, to overcome this scaling problem, rescaling bootstrap with-replacement techniques (Rao and Wu, 1988; Rao et al., 1992) and rescaling bootstrap without-replacement techniques (Ahmad, 1997) have been developed to obtain unbiased variance estimation of the estimators of finite population parameters. Chen et al. (2004) and Modarres et al. (2006) developed bootstrapping techniques for the RSS design in an infinite population setting without the use of any rescaling and/or finite correction factor. ...
Article
Full-text available
McIntyre (1952) introduced Ranked Set Sampling (RSS) to advance upon Simple Random Sampling (SRS) for circumstances where a preliminary ranking of sampled units is possible for the variable of interest using visual inspection or some other means without physically measuring the units. Further, RSS was classified into three sampling protocols named Level-0, Level-1 and Level-2 (Deshpande et al., 2006). The Level-0 sampling protocol of RSS is considered in this article. Estimating the variance of the Level-0 RSS estimator under the finite population framework was found to be cumbersome. In this article, two distinct rescaling bootstrap with-replacement methods, known as the Strata-based rescaling bootstrap with-replacement (SRBWR) method and the Cluster-based rescaling bootstrap with-replacement (CRBWR) method, are proposed to unbiasedly estimate the variance of the Level-0 RSS estimator of the finite population mean. Rescaling factors are obtained for both proposed methods to estimate the variance of the Level-0 RSS estimator unbiasedly. The results of the simulation analysis, together with a real data application, support that the proposed methods are capable of estimating the variance of the Level-0 RSS estimator almost unbiasedly. The developed SRBWR method performs better than the CRBWR method in terms of relative stability (RS) and percentage relative bias (%RB) for various combinations of set size (m) and number of cycles (r).
... Also see Rao and Wu (1984) and Chipperfield and Preston (2007). Rao et al. (1992) proposed a rescaling bootstrap method to cover non-smooth statistics, but did not discuss second-order accuracy. Sitter (1992b) proposed a mirror-match bootstrap method for complex sample designs, including stratified random sampling and two-stage cluster sampling. ...
Article
Full-text available
Bootstrap is a useful computational tool for statistical inference, but it may lead to erroneous analysis under complex survey sampling. In this paper, we propose a unified bootstrap method for stratified multi‐stage cluster sampling, Poisson sampling, simple random sampling without replacement and probability proportional to size sampling with replacement. In the proposed bootstrap method, we first generate bootstrap finite populations, apply the same sampling design to each bootstrap population to get a bootstrap sample, and then apply studentization. The second‐order accuracy of the proposed bootstrap method is established by the Edgeworth expansion. Simulation studies confirm that the proposed bootstrap method outperforms the commonly used Wald‐type method in terms of coverage, especially when the sample size is not large.
Article
Our work was motivated by the question whether, and to what extent, well‐established risk factors mediate the racial disparity observed for colorectal cancer (CRC) incidence in the United States. Mediation analysis examines the relationships between an exposure, a mediator and an outcome. All available methods require access to a single complete data set with these three variables. However, because population‐based studies usually include few non‐White participants, these approaches have limited utility in answering our motivating question. Recently, we developed novel methods to integrate several data sets with incomplete information for mediation analysis. These methods have two limitations: (i) they only consider a single mediator and (ii) they require a data set containing individual‐level data on the mediator and exposure (and possibly confounders) obtained by independent and identically distributed sampling from the target population. Here, we propose a new method for mediation analysis with several different data sets that accommodates complex survey and registry data, and allows for multiple mediators. The proposed approach yields unbiased causal effects estimates and confidence intervals with nominal coverage in simulations. We apply our method to data from U.S. cancer registries, a U.S.‐population‐representative survey and summary level odds‐ratio estimates, to rigorously evaluate what proportion of the difference in CRC risk between non‐Hispanic Whites and Blacks is mediated by three potentially modifiable risk factors (CRC screening history, body mass index, and regular aspirin use).
Article
We present a practical approach for computing the sandwich variance estimator in two-stage regression model settings. As a motivating example for two-stage regression, we consider regression calibration, a popular approach for addressing covariate measurement error. The sandwich variance approach has rarely been applied in regression calibration, despite requiring less computation time than popular resampling approaches for variance estimation, specifically the bootstrap. This is likely due to the specialized statistical coding required. We first outline the steps needed to compute the sandwich variance estimator. We then develop a convenient method of computation in R for sandwich variance estimation, which leverages standard regression model outputs and existing R functions and can be applied in the case of a simple random sample or complex survey design. We use a simulation study to compare the sandwich to a resampling variance approach for both settings. Finally, we further compare these two variance estimation approaches for data examples from the Women’s Health Initiative (WHI) and Hispanic Community Health Study/Study of Latinos (HCHS/SOL). The sandwich variance estimator typically had good numerical performance, but simple Wald bootstrap confidence intervals were unstable or over-covered in certain settings, particularly when there was high correlation between covariates or large measurement error.
Article
Background: Population-based seroprevalence studies are crucial to understand community transmission of COVID-19 and guide responses to the pandemic. Seroprevalence is typically measured from diagnostic tests with imperfect sensitivity and specificity. Failing to account for measurement error can lead to biased estimates of seroprevalence. Methods to adjust seroprevalence estimates for the sensitivity and specificity of the diagnostic test have largely focused on estimation in the context of convenience sampling. Many existing methods are inappropriate when data are collected using a complex sample design. Methods: We present methods for seroprevalence point estimation and confidence interval construction that account for imperfect test performance for use with complex sample data. We apply these methods to data from the Chatham County COVID-19 Cohort (C4), a longitudinal seroprevalence study conducted in central North Carolina. Using simulations, we evaluate bias and confidence interval coverage for the proposed estimator compared with a standard estimator under a stratified, three-stage cluster sample design. Results: We obtained estimates of seroprevalence and corresponding confidence intervals for the C4 study. SARS-CoV-2 seroprevalence increased rapidly from 10.4% in January to 95.6% in July 2021 in Chatham County, North Carolina. In simulation, the proposed estimator demonstrates desirable confidence interval coverage and minimal bias under a wide range of scenarios. Conclusion: We propose a straightforward method for producing valid estimates and confidence intervals when data are based on a complex sample design. The method can be applied to estimate the prevalence of other infections when estimates of test sensitivity and specificity are available.
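The classical ingredient behind such adjustments is the Rogan–Gladen estimator, which maps the apparent prevalence to a misclassification-corrected one; the paper's contribution is to make this kind of estimator and its confidence intervals valid under complex sample designs, which this minimal sketch does not attempt.

```python
def rogan_gladen(p_obs, sensitivity, specificity):
    """Classical misclassification-adjusted prevalence (Rogan-Gladen).

    p_obs is the raw (apparent) seropositive proportion. The paper's
    estimator builds on this idea while handling complex sample designs.
    """
    p = (p_obs + specificity - 1) / (sensitivity + specificity - 1)
    return min(max(p, 0.0), 1.0)  # truncate to the [0, 1] range

print(rogan_gladen(p_obs=0.12, sensitivity=0.90, specificity=0.98))
```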
Article
Full-text available
In this short paper we sketch how survey sampling changed during the last 50 years. We describe the development and use of model-assisted survey sampling and model-assisted estimators, such as the generalized regression estimator. We also discuss the development of complex survey designs, in particular mixed-mode survey designs and adaptive survey designs. These latter two kinds of survey designs were mainly developed to increase response rates and decrease survey costs. A third topic that we discuss is the estimation of sampling variance. The increased computing power of computers has made it possible to estimate sampling variance of an estimator by means of replication methods, such as the bootstrap. Finally, we briefly discuss current and future developments in survey sampling, such as the increased interest in using nonprobability samples.
Article
Statistical inference in the presence of nuisance functionals with complex survey data is an important topic in social and economic studies. The Gini index, Lorenz curves and quantile shares are among the commonly encountered examples. The nuisance functionals are usually handled by a plug-in nonparametric estimator and the main inferential procedure can be carried out through a two-step generalized empirical likelihood method. Unfortunately, the resulting inference is not efficient and the nonparametric version of the Wilks’ theorem breaks down even under simple random sampling. We propose an augmented estimating equations method with nuisance functionals and complex surveys. The second-step augmented estimating functions obey the Neyman orthogonality condition and automatically handle the impact of the first-step plug-in estimator, and the resulting estimator of the main parameters of interest is invariant to the first step method. More importantly, the generalized empirical likelihood based Wilks’ theorem holds for the main parameters of interest under the design-based framework for commonly used survey designs, and the maximum generalized empirical likelihood estimators achieve the semiparametric efficiency bound. Performances of the proposed methods are demonstrated through simulation studies and an application using the dataset from the New York City Social Indicators Survey.
Article
The changing values of the indicators obtained from national labour force surveys provide analysts and planners with valuable information on fluctuations in a country's labour market. Labour force surveys in many countries follow the standards established by the International Labour Organization and, as a result, tend to be similar in various respects. Given these similarities, this paper examines the procedures used by the statistical organizations of Canada and the European Union to develop variance estimates for changes in the labour force indicators of Iran. While the survey in Iran and those in the countries under study have many similarities, they also differ in certain respects, namely in the periodicity of the survey, the rotation pattern and the unit of rotation, and the possible existence of non-response among the primary sampling units. First, the methodologies of Statistics Canada and Eurostat are modified and adapted to the particularities of the labour force survey in Iran; the results are then compared. Among the four methods examined, the bootstrap methodology of Statistics Canada, after some modifications and adaptations, is found to be especially suitable for the labour force survey of Iran and, perhaps, for other countries with similar conditions. The proposed methodology is particularly well suited to capturing the impact of the various weight-calculation steps on the variance estimates of change in the main labour force indicators.
Chapter
Survey data provide a key source for calculating point and variance estimates for the population of interest. In this chapter, we discuss several factors that guide the choice of an appropriate variance formula for measuring the precision of point estimates, especially for surveys of establishments. Specific examples are taken from establishment surveys conducted around the world for additional background. A critical factor is the protocol used to obtain the sample members: probability-based or nonprobability sampling. Variance estimation for probability surveys, where the sample inclusion probabilities are defined for all units on the sampling frame, relies on well-developed design-based theory that accounts for inclusion probabilities and design features such as stratification and clustering. Conversely, inclusion probabilities for nonprobability surveys are unknown, and design- or model-based variance estimation methods are used under a set of strict assumptions. Another important factor is the form of the point estimate. Design-based estimates are calculated with survey analysis weights, whereas model-based estimates rely on a set of strong model covariates. We provide an overview of different approaches to statistical inference and survey weighting for probability and nonprobability surveys, citing additional references where appropriate. For example, ratio point estimators, such as a mean, are a function of two (weighted) survey estimates, each with an associated measure of precision; unlike that of an estimated total, the variance formula for a ratio estimate does not have a closed form and must be approximated. Moreover, additional complexities must be addressed when the data include statistical imputation to treat missing values. Consequently, we discuss the pros and cons of variance estimation with linearization, replication, and model-based techniques for probability and nonprobability establishment surveys under a variety of analytic needs.
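As an illustration of the approximation mentioned above, the standard first-order Taylor (linearization) variance of a ratio estimator \(\hat{R} = \hat{Y}/\hat{X}\) of \(R = Y/X\) is

\[
\operatorname{Var}(\hat{R}) \;\approx\; \frac{1}{X^{2}}\left[\operatorname{Var}(\hat{Y}) + R^{2}\operatorname{Var}(\hat{X}) - 2R\operatorname{Cov}(\hat{Y},\hat{X})\right],
\]

with the design-based variances and covariance of the weighted totals \(\hat{Y}\) and \(\hat{X}\) plugged in from the survey design at hand.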
Article
In observational cohort studies, there is frequently interest in modeling longitudinal change in a biomarker (ie, physiological measure indicative of metabolic dysregulation or disease; eg, blood pressure) in the absence of treatment (ie, medication), and its association with modifiable risk factors expected to affect health (eg, body mass index). However, individuals may start treatment during the study period, and consequently biomarker values observed while on treatment may be different than those that would have been observed in the absence of treatment. If treated individuals are excluded from analysis, then effect estimates may be biased if treated individuals differ systematically from untreated individuals. We addressed this concern in the setting of the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), an observational cohort study that employed a complex survey sampling design to enable inference to a finite target population. We considered biomarker values measured while on treatment to be missing data, and applied missing data methodology (inverse probability weighting (IPW) and doubly robust estimation) to this problem. The proposed methods leverage information collected between study visits on when individuals started treatment, by adapting IPW and doubly robust approaches to model the treatment mechanism using survival analysis methods. This methodology also incorporates sampling weights and uses a bootstrap approach to estimate standard errors accounting for the complex survey sampling design. We investigated variance estimation for these methods, conducted simulation studies to assess statistical performance in finite samples, and applied the methodology to model temporal change in blood pressure in HCHS/SOL.
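As a rough illustration of the inverse probability weighting step only (a minimal sketch: the authors model the treatment mechanism with survival-analysis methods and use a design-based bootstrap for standard errors, neither of which is shown here; all variable names and the toy data are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)                                   # modifiable risk factor (e.g., BMI)
treated = rng.binomial(1, 1 / (1 + np.exp(-0.5 * x)))    # treatment uptake depends on x
y = 120 + 2.0 * x + rng.normal(scale=5, size=n)          # off-treatment biomarker value
w_survey = rng.uniform(1, 3, size=n)                     # survey sampling weights

# Step 1: model each individual's probability of remaining untreated given covariates.
p_untreated = (LogisticRegression()
               .fit(x.reshape(-1, 1), 1 - treated)
               .predict_proba(x.reshape(-1, 1))[:, 1])

# Step 2: keep untreated records; combined weight = survey weight / P(untreated).
keep = treated == 0
w = w_survey[keep] / p_untreated[keep]

# Step 3: weighted least squares for the off-treatment biomarker model.
X = sm.add_constant(x[keep])
fit = sm.WLS(y[keep], X, weights=w).fit()
print(fit.params)
```

In the actual analysis the standard errors would come from a bootstrap that respects the complex sampling design rather than from the WLS output.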
Technical Report
Full-text available
This statistical dossier publishes the first results of the Household Survey carried out in the city of Rosario during the last quarter of 2021 by the Usina de Datos UNR. First, data are presented on household types, dwellings and tenure arrangements, and other socio-housing conditions of the households. In addition, the EHR provides, for the first time, valuable information on pet ownership. The report also contains data on the population by place of birth, on migrants, and on persons with long-term difficulties, and provides information on education, health, environment and sources of income. The methodological considerations concerning the methodology, sample design and fieldwork are then presented, followed by a glossary of the categories relevant to this dossier. Subsequent releases will publish further indicators collected by the EHR that deepen and extend the information presented here.
Article
Using five diet quality indices, we estimated the mortality and loss of life expectancy attributable to poor dietary patterns at the national level, which had previously been largely unknown. The Canadian Community Health Survey 2004 linked to vital statistics was used (n = 16,212 adults, representing 22,898,880 Canadians). After a median follow-up of 7.5 years, 1,722 mortality cases were recorded. Population attributable fractions were calculated to estimate the mortality burden of poor dietary patterns (Dietary Guidelines for Americans Adherence Index 2015, Dietary Approaches to Stop Hypertension, Healthy Eating Index, Alternative HEI, and Mediterranean Style Dietary Pattern Score). Better diet quality was associated with a 32-51% and 21-43% reduction in all-cause mortality among adults 45-80 years and ≥20 years, respectively. Projected life expectancy at 45 years was longer for Canadians adhering to a healthy dietary pattern (on average 5.2-8.0 years for males and 1.6-4.1 years for females). At the population level, 26.5-38.9% (males) and 8.9-22.9% (females) of deaths were attributable to poor dietary patterns. The survival benefit was greater for individuals with higher scores on all diet indices, even with relatively small intake differences. The large attributable burden likely stems from assessing overall dietary patterns instead of a limited range of foods and nutrients.
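The population attributable fractions mentioned above are conventionally computed with the standard multi-category formula (shown here for orientation; the paper's exact estimator may differ):

\[
\mathrm{PAF} \;=\; \frac{\sum_{i} p_i\,(\mathrm{RR}_i - 1)}{1 + \sum_{i} p_i\,(\mathrm{RR}_i - 1)},
\]

where \(p_i\) is the prevalence of exposure category \(i\) and \(\mathrm{RR}_i\) its relative risk of death.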
Article
The generalized regression estimator (GREG) uses auxiliary data available for the finite population to improve the efficiency of the estimator of a total (or mean). Estimators of the variance of the GREG proposed in the sampling literature include those based on Taylor linearization and jackknife techniques. Approximations based on Taylor expansions are reasonable for large samples; however, when the sample size is small, the Taylor-based variance estimator has a large negative bias, while jackknife variance estimators overestimate the variance of the GREG. We address these shortcomings with a bootstrap procedure for estimating the variance of the GREG. The method uses a bootstrap population constructed from the model underlying the GREG estimator. Repeated samples are selected from the bootstrap population according to the design used to select the initial sample, and the variability of these bootstrap samples is used to compute the proposed bootstrap variance estimator. Simulations show that the new bootstrap estimator has small bias for samples with few observations.
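A minimal sketch of the bootstrap-population idea, assuming simple random sampling and a no-intercept linear working model (the paper's procedure accommodates general designs; function and variable names are illustrative):

```python
import numpy as np

def greg_total(y, x, tx, N):
    """GREG estimator of the total of y given the known population total tx of x
    (simple random sampling; no-intercept linear working model)."""
    n = len(y)
    beta = np.sum(x * y) / np.sum(x * x)       # fitted slope of the working model
    ht_y = N / n * np.sum(y)                   # Horvitz-Thompson total of y
    ht_x = N / n * np.sum(x)                   # Horvitz-Thompson total of x
    return ht_y + beta * (tx - ht_x)           # regression adjustment

def greg_bootstrap_var(y, x, N, B=1000, seed=1):
    """Build a pseudo-population from the working model, then re-select SRS
    samples from it and recompute the GREG each time."""
    rng = np.random.default_rng(seed)
    n = len(y)
    beta = np.sum(x * y) / np.sum(x * x)
    resid = y - beta * x
    reps = max(N // n, 1)
    x_pop = np.tile(x, reps)                                  # pseudo-population x values
    y_pop = beta * x_pop + rng.choice(resid, size=reps * n)   # model fit + resampled residuals
    tx_pop = np.sum(x_pop)                                    # "true" x total in pseudo-population
    est = np.empty(B)
    for b in range(B):
        i = rng.choice(reps * n, size=n, replace=False)       # redraw by the original SRS design
        est[b] = greg_total(y_pop[i], x_pop[i], tx_pop, reps * n)
    return est.var(ddof=1)
```

For a stratified or clustered design, the resampling line would be replaced by a draw that mimics the original design within the pseudo-population.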
Article
Imputation is commonly used to deal with item nonresponse in surveys. Treating the imputed values as true observations can lead to serious underestimation of the variance of point estimators. In this article, we propose a new bootstrap method, within the rescaling bootstrap approach, for estimating the variance of an imputed estimator obtained after applying deterministic regression or random hot-deck imputation. A novel technique rescales the original data set by solving certain systems of linear equations. The proposed procedure can handle unequal response probabilities and large sampling fractions. Simulation studies demonstrate the strong performance of the proposed method in terms of relative bias, relative efficiency, and coverage probability, for both the population mean and the median.
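The rescaling bootstrap family this work builds on goes back to the Rao-Wu construction of bootstrap weights. A minimal sketch of the classical weight rescaling for stratified samples, with resample size n_h - 1 per stratum and ignoring finite-population corrections, is shown below; it illustrates the baseline technique only, not the paper's extension to imputed data, and the names are illustrative:

```python
import numpy as np

def rao_wu_weights(w, strata, B=500, seed=2):
    """Rao-Wu rescaling bootstrap weights (resample size n_h - 1 per stratum).
    Each stratum needs n_h >= 2. Returns an array of shape (B, n)."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    boot = np.empty((B, len(w)))
    for h in np.unique(strata):
        idx = np.flatnonzero(strata == h)
        n_h = len(idx)
        # Multinomial counts: how often each sampled unit is redrawn (SRSWR, size n_h - 1).
        m = rng.multinomial(n_h - 1, np.full(n_h, 1 / n_h), size=B)
        # With resample size n_h - 1, the rescaled weight simplifies to w * n_h/(n_h-1) * m.
        boot[:, idx] = w[idx] * (n_h / (n_h - 1)) * m
    return boot

# Usage: bootstrap variance of a weighted mean from the replicate weights.
w = np.array([2.0, 2.0, 2.0, 3.0, 3.0, 3.0])
strata = np.array([1, 1, 1, 2, 2, 2])
y = np.array([5.0, 7.0, 6.0, 10.0, 12.0, 11.0])
bw = rao_wu_weights(w, strata)
means = (bw * y).sum(axis=1) / bw.sum(axis=1)
print(means.var(ddof=1))   # bootstrap variance estimate of the weighted mean
```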
Article
We propose two synthetic microdata approaches to generating private tabular survey data products for public release. We adapt to tabular survey products a pseudo posterior mechanism that downweights each record's likelihood contribution by a weight in [0, 1] determined by its identification disclosure risk. Our method, applied to an observed survey database, achieves an asymptotic global probabilistic differential privacy guarantee. Our two approaches synthesize the observed sample distribution of the outcome and the survey weights jointly, so that both quantities together possess a privacy guarantee. The privacy-protected outcome and survey weights are used to construct tabular cell estimates (with the cell inclusion indicators treated as known and public) and associated standard errors that correct for survey sampling bias. Through a real data application to the Survey of Doctorate Recipients public use file and simulation studies motivated by the application, we demonstrate that our two microdata synthesis approaches provide superior utility preservation compared with the additive-noise approach of the Laplace Mechanism. Moreover, our approaches allow the release of microdata to the public, enabling additional analyses at no extra privacy cost.
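The pseudo posterior mechanism referenced above takes the general form (a standard formulation in this literature, shown for orientation):

\[
p_{\alpha}(\theta \mid \mathbf{x}) \;\propto\; \left[\prod_{i=1}^{n} p(x_i \mid \theta)^{\alpha_i}\right]\pi(\theta), \qquad \alpha_i \in [0,1],
\]

where records with higher identification disclosure risk receive smaller weights \(\alpha_i\), shrinking their likelihood contributions toward the prior.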
Article
Single-cell RNA sequencing (scRNA-seq) data exhibit an unusual abundance of zero counts, with a considerable fraction due to dropout events, which complicates differential expression analysis. To correct biases in differential expression due to informative dropouts, an inverse non-dropout-probability weighting method is proposed, exploiting the fact that the dropout rate is negatively dependent on the underlying gene expression magnitude in scRNA-seq data. The weights are estimated by maximum likelihood, with dropout values integrated out using Gauss-Hermite quadrature. Linear, generalized linear, and mixed regressions with the estimated weights are fitted to original or transformed scRNA-seq data, and the variances of coefficient estimators from the weighted regressions are estimated using the jackknife method. Extensive simulation studies compare the proposed method with five cutting-edge methods (Limma, edgeR, MAST, ZIAQ, and scImpute); the proposed method performs among the best under all scenarios in terms of AUC, sensitivity, specificity, and FDR. The rate of detecting true positives is examined for all six methods using mouse embryonic stem cells and fibroblasts, where differentially expressed (DE) genes detected in bulk RNA-seq data on the same set of genes under the same conditions from an independent source serve as true positives; specificity is compared on true-negative data obtained by randomly splitting a real dataset. Furthermore, the proposed method is illustrated on a lineage study in which cells in the same embryo are correlated and genes differentially expressed between cell division lineages are identified.
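A generic sketch of the delete-one jackknife used for the coefficient variances (the standard jackknife applied to weighted least squares, not the authors' full weighted-regression pipeline; names are illustrative):

```python
import numpy as np

def wls(y, X, w):
    """Weighted least squares via the normal equations (W = diag(w))."""
    XtW = X.T * w                      # X^T W without forming the diagonal matrix
    return np.linalg.solve(XtW @ X, XtW @ y)

def jackknife_se(y, X, w):
    """Delete-one jackknife standard errors for WLS coefficients."""
    n = len(y)
    reps = np.array([wls(np.delete(y, i), np.delete(X, i, axis=0), np.delete(w, i))
                     for i in range(n)])
    # Jackknife variance: (n - 1)/n times the sum of squared deviations of the replicates.
    var = (n - 1) / n * ((reps - reps.mean(axis=0)) ** 2).sum(axis=0)
    return wls(y, X, w), np.sqrt(var)
```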
Article
Background: Hybrid methodologies have gained continuing interest as unique data reduction techniques for establishing a direct link between dietary exposures and clinical outcomes. Objectives: We aimed to compare partial least squares (PLS) and reduced rank regression (RRR) in identifying a dietary pattern associated with high cardiovascular disease (CVD) risk in Canadian adults, to construct PLS- and RRR-based simplified dietary patterns, and to assess associations between the four dietary pattern scores and CVD risk. Design: Data were collected from 24-hour dietary recalls of adult respondents in two cycles of the nationally representative Canadian Community Health Survey (CCHS)-Nutrition: CCHS 2004 linked to health administrative databases (n = 12,313) and CCHS 2015 (n = 14,020). Using 39 food groups, PLS and RRR were applied to identify an energy-dense (ED), high-saturated-fat (HSF), low-fiber-density (LFD) dietary pattern. Associations of the derived dietary pattern scores with lifestyle characteristics and CVD risk were examined using weighted multivariate regression and weighted multivariable-adjusted Cox proportional hazards models, respectively. Results: PLS and RRR identified highly similar ED, HSF, LFD dietary patterns, with common high positive loadings for fast food, carbonated drinks, salty snacks, and solid fats, and high negative loadings for fruit, dark green vegetables, red and orange vegetables, other vegetables, whole grains, legumes, and soy (≥|0.17|). The food groups with the highest loadings were summed to form simplified pattern scores. Although the dietary patterns were not significantly associated with CVD risk, they were positively associated with energy intake (402 kcal/d higher in the fourth quartile; P-trends <0.05) and with obesity risk in the fourth quartile [PLS (OR: 2.09; 95% CI: 1.62, 2.7) and RRR (OR: 1.76; 95% CI: 1.44, 2.17)] (P-trends <0.0001). Conclusion: PLS and RRR were equally effective for deriving a high-CVD-risk dietary pattern among Canadian adults. Further research is warranted on the role of major dietary components in cardiovascular health.
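A minimal sketch of the PLS step, assuming X is the n × 39 matrix of standardized food-group intakes and Y holds the three pattern-defining responses (energy density, saturated-fat share, fiber density); all data here are simulated and the names are illustrative:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=(n, 39))                   # 39 standardized food-group intakes
Y = X[:, :3] @ rng.normal(size=(3, 3)) + rng.normal(size=(n, 3))   # toy responses

pls = PLSRegression(n_components=1).fit(X, Y)
loadings = pls.x_loadings_[:, 0]               # food-group loadings on the first factor
scores = pls.transform(X)[:, 0]                # per-person dietary pattern score
top = np.argsort(np.abs(loadings))[::-1][:8]   # highest-|loading| food groups
print(top, loadings[top])
```

RRR differs only in the criterion optimized: it maximizes explained variation in the responses rather than the covariance between the X and Y factors, which is why the two methods can yield similar but not identical loadings.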
Article
Modelling survey data often requires knowledge of the design and weighting variables. With public-use survey data, some of these variables may be unavailable for confidentiality reasons. The proposed approach can be used in this situation, as long as calibrated weights and variables specifying the strata and primary sampling units are available. It gives consistent point estimation and a pivotal statistic for testing and confidence intervals. The proposed approach does not rely on with-replacement sampling, single-stage designs, negligible sampling fractions, or non-informative sampling. Adjustments based on design effects, eigenvalues, joint-inclusion probabilities, or the bootstrap are not needed. The inclusion probabilities and auxiliary variables do not have to be known. Multi-stage designs with unequal selection of primary sampling units are considered, and non-response can easily be accommodated if the calibrated weights include a re-weighting adjustment for non-response. We use an unconditional approach, in which the variables and the sample are random; the design can be informative.
Article
This study aimed to determine whether higher intakes of sodium, added sugars, and saturated fat are prospectively associated with all-cause mortality and with cardiovascular disease (CVD) incidence and mortality in a diverse population. The nationally representative Canadian Community Health Survey (CCHS)-Nutrition 2004 was linked with the Canadian Vital Statistics – Death Database and the Discharge Abstract Database (2004-2011). The outcomes were all-cause mortality and CVD incidence and mortality. There were 1,722 mortality cases within 115,566 person-years of follow-up (median (IQR) of 7.48 (7.22-7.70) years). There was no statistically significant association between sodium density or energy from saturated fat and all-cause mortality or CVD events for any of the models investigated. The association between the usual percentage of energy from added sugars and all-cause mortality was significant in the base model, with participants consuming 11.47% of energy from added sugars having 1.34 (95% CI: 1.01-1.77) times higher risk of all-cause mortality than those consuming 4.17% of energy from added sugars. Overall, our results did not show statistically significant associations between the three nutrients and the risk of all-cause mortality or CVD events at the population level in Canada. Large-scale linked national nutrition datasets may not have the discriminatory power to identify prospective impacts of nutrients on health measures.