
SCF Complex Sample Specification for Stata

Abstract

For new SCF users, the process of combining the five implicates of the main data file to produce estimates that account for the Federal Reserve Board’s multiple imputation process, coupled with merging the supplemental replicate weight file to produce estimates that also account for the survey’s dual-frame complex sample, can be challenging. This technical note offers an example, using Stata, of how to complete this process to produce estimates that account for both sources of error. Though the example uses 2013 data, other cross-sectional SCF datasets (the triennial data) use similar methods. The example relies on a Stata macro described by the University of Wisconsin-Madison’s Center for Financial Security.
December 2015
SCF Complex Sample Specification for Stata
Robert B. Nielsen1
The Federal Reserve’s Survey of Consumer Finances (SCF) is derived from a dual-frame complex
sample design. One frame includes households selected via a standard area probability sample; the
second includes households from a list provided by the Internal Revenue Service, with an oversampling
of households likely to be wealthy.
Because the SCF is a complex sample, its variance estimates should be adjusted to reflect households’
unequal probability of selection (Federal Reserve, 2014a; Nielsen et al., 2009; Nielsen & Seay, 2014).
To allow public users to make these adjustments, the Federal Reserve releases a set of 999 replicate
weights to be downloaded and used in a bootstrapping routine that adjusts variance estimates in a
manner consistent with internal procedures designed to protect the identity of respondents while
accounting for unequal probability of selection (Federal Reserve, 2014b).
Like most survey data, SCF data contain missing values. Unlike with most datasets, the Federal Reserve
imputes replacement values for users beforehand and releases five implicates (five versions of the data)
that include these multiply-imputed values. Thus, the apparent sample size of approximately 30,000 cases
is actually five implicates of the approximately 6,000 households (Federal Reserve, 2014b). These five
implicates must be combined to account for the uncertainty associated with this multiple imputation
process. See the SCF Codebook for resources that describe this (Federal Reserve, 2014a). Also
consult the excellent resources from Hanna and Lindamood (e.g., Lindamood, Hanna, and Bi, 2007).
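As a rough illustration of what combining the implicates involves, the sketch below applies the standard repeated-imputation inference formulas to a single coefficient: the combined estimate is the mean of the five implicate estimates, and the total variance is the average within-implicate variance plus (1 + 1/5) times the between-implicate variance. The numbers and the names Q, U, qbar, ubar, bvar, and tvar are made up for illustration. This is not the scfcombo code, which also bootstraps the sampling variance from the replicate weights.

```stata
* Hypothetical estimates of one coefficient from each of the five implicates
matrix Q = (2.10, 2.05, 2.20, 2.15, 2.00)
* Hypothetical squared standard errors (within-implicate variances)
matrix U = (0.040, 0.042, 0.038, 0.041, 0.039)
local m = colsof(Q)

* Combined point estimate and average within-implicate variance
scalar qbar = 0
scalar ubar = 0
forvalues i = 1/`m' {
    scalar qbar = qbar + Q[1, `i']/`m'
    scalar ubar = ubar + U[1, `i']/`m'
}

* Between-implicate variance of the point estimates
scalar bvar = 0
forvalues i = 1/`m' {
    scalar bvar = bvar + (Q[1, `i'] - qbar)^2/(`m' - 1)
}

* Total variance and combined standard error
scalar tvar = ubar + (1 + 1/`m')*bvar
display "combined estimate  = " qbar
display "combined std. err. = " sqrt(tvar)
```

Analyzing a single implicate, or pooling all 30,000 records as if they were independent households, omits the between-implicate term and will generally understate the standard error.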
For new SCF users, the process of combining the five implicates of the main data file to produce
estimates that account for the Federal Reserve Board’s multiple imputation process, coupled with
merging the supplemental replicate weight file to produce estimates that also account for the survey’s
dual-frame complex sample, can be challenging. This technical note offers an example, using Stata, of
how to complete this process to produce estimates that account for both sources of error. Though the
example uses 2013 data, other cross-sectional SCF datasets (the triennial data) use similar
methods. The example relies on a Stata macro described by the University of
Wisconsin-Madison’s Center for Financial Security (Center for Financial Security, 2015).
Example steps using 2013 SCF data:
1. Install the scfcombo macro in Stata
   a. In the Stata command window, type ssc install scfcombo
   b. Read the scfcombo help file and keep it for later reference
   c. Send thank-you notes to Jane Brittingham (U. Wisconsin) for providing this macro and Karen
      Pence (Federal Reserve) for developing its predecessors, scfimp and scfboot
2. Download and prepare the main dataset
   a. Create an id variable to identify households
   b. Create an implicate variable to identify implicates
   c. Prepare for multiple imputation analyses (the MI commands in Stata)
3. Download the replicate weight dataset and merge it with the main dataset per the codebook and
   scfcombo macro instructions
4. Prepare variables for analyses (recode, transform, generate new, etc.)
5. Run analyses; when estimating multivariate models, invoke the scfcombo macro to adjust standard
   errors for both imputation and complex sample design
_______________________
1 Associate Professor, Department of Financial Planning, Housing and Consumer Economics, University of
Georgia, 205 Consumer Research Center, Athens, GA 30602. rnielsen@uga.edu
Example of the above steps using Stata:
* STEP 1
* install scfcombo macro in Stata (needs to be done only once):
ssc install scfcombo
* STEP 2
* Get original main SCF dataset.
use "C:\___your file path___\p13i6.dta", clear
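* Prepare main dataset for use with Stata's multiple imputation (MI) commands
mi query
* Y1 is 10*YY1 plus the implicate number, so Y1 - 10*YY1 is the implicate (1 to 5)
generate IMP = Y1 - 10*YY1
* Same calculation in two steps: YY1a holds 10*YY1, Y1a holds the implicate number
generate YY1a = YY1*10
generate Y1a = Y1 - YY1a
* rep is a copy of the implicate number
generate rep = Y1a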
* Save the mi-ready dataset
save "C:\___your file path___\SCF2013imps.dta", replace
mi import flong, m(Y1a) id(YY1)
sort Y1a
tab Y1a
mi describe
save "C:\___your file path___\SCF2013imps.dta", replace
* the above code preps the data to have the same unique identifier Y1
* and a new indicator of the implicate Y1a (min 1, max 5)
sum Y1 Y1a rep
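* (added check, a sketch only) confirm the implicate indicator before merging:
* every value of Y1a should lie between 1 and 5 and should equal IMP created above;
* assert stops the do-file with an error if either condition fails
assert inrange(Y1a, 1, 5)
assert Y1a == IMP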
* STEP 3
* download replicate weight file and merge with main file...note that
* the order of the variables MM999-MM1 and WT1B1-WT1B999 matters for how the
* scfcombo macro proceeds so don't ever sort on those variables
use "C:\___your file path___\p13_rw1.dta", clear
* the scfcombo macro uses lower case variable names for the replicate weights
rename (MM999-MM1), lower
rename (WT1B1-WT1B999), lower
sort Y1 // sorts replicate weight file by the unique id //
save "C:\___your file path___\SCF2013reps.dta", replace
use "C:\___your file path___\SCF2013imps.dta", clear
sort Y1 // sorts main data file by the unique id //
save "C:\___your file path___\SCF2013imps.dta", replace
merge 1:1 Y1 using "C:\___your file path___\SCF2013reps.dta"
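* (added check, a sketch only) merge creates the variable _merge; tabulate it to see
* whether every record in the main file found its replicate weights
* (matched records are coded 3, unmatched records are coded 1 or 2)
tabulate _merge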
* save as a complete file with the main data and the bootstrap variables
save "C:\___your file path___\SCF2013complete.dta", replace
clear
* STEP 4
* prepare variables by renaming, transformations, etc. prior to analyses, for example
use "C:\___your file path___\SCF2013complete.dta"
rename X5729 income
rename X3024 foodhome
rename X14 age
* STEP 5
* Use the scfcombo macro; OLS regression is shown, but other compatible commands also work.
* Note that this may be done weighted or unweighted depending on one’s needs.
* reps is the number of bootstrap replications (default is 200); imps should be 5.
* Stata variable names are case-sensitive, so reference the weight exactly as it is
* stored in your file (x42001 or X42001).
* General form: scfcombo DEPVAR INDVARS [aw=x42001], command(regress) reps(200) imps(5)
* I strongly recommend exploring the number of bootstraps used as results tend to vary slightly!
scfcombo foodhome age income [aw=x42001], command(regress) reps(200) imps(5)
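* (added sketch) one way to explore the number of bootstraps: rerun the same model with
* several replication counts and compare the reported standard errors. The counts
* 200, 500, and 999 are illustrative choices; 999 is the number of replicate weights released.
foreach r in 200 500 999 {
    display as text "reps = `r'"
    scfcombo foodhome age income [aw=x42001], command(regress) reps(`r') imps(5)
}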
References
Center for Financial Security. (2015). CFS promotes Stata program for using Survey of Consumer Finances
data. Issue Brief 2016-6.1. University of Wisconsin-Madison Center for Financial Security, Madison,
WI. Available: http://cfs.wisc.edu/presentations/scf_combo_brief.pdf
Federal Reserve. (2014a). Codebook for 2013 Survey of Consumer Finances. Board of Governors of the
Federal Reserve System. Available:
http://www.federalreserve.gov/econresdata/scf/files/codebk2013.txt
Federal Reserve. (2014b). 2013 Survey of Consumer Finances. Board of Governors of the Federal Reserve
System. Available: http://www.federalreserve.gov/econresdata/scf/scfindex.htm
Lindamood, S., Hanna, S. D., & Bi, L. (2007). Using the Survey of Consumer Finances: Some
methodological considerations and issues. Journal of Consumer Affairs, 41(2), 195-222.
doi:10.1111/j.1745-6606.2007.00075.x
Nielsen, R. B., Davern, M., Jones, A. Jr., & Boies, J. L. (2009). Complex sample design effects and health
insurance variance estimation. Journal of Consumer Affairs, 43(2), 346-366.
doi:10.1111/j.1745-6606.2009.01143.x
Nielsen, R. B., & Seay, M. C. (2014). Complex samples and regression-based inference: Considerations for
consumer researchers. Journal of Consumer Affairs, 48(3), 606-619. doi:10.1111/joca.12038
Comments and Corrections:
This technical note is provided as a service to anyone who might find it helpful, particularly graduate students
who are using the Survey of Consumer Finances. Comments and corrections are appreciated!
Suggested citation: Nielsen, R. B. (2015). SCF complex sample specification for Stata. Technical note,
Department of Financial Planning, Housing and Consumer Economics, University of Georgia, Athens, GA.
doi: 10.13140/RG.2.1.4126.8240