December 2015
SCF Complex Sample Specification for Stata
Robert B. Nielsen1
The Federal Reserve’s Survey of Consumer Finances (SCF) is derived from a dual-frame complex
sample design. One frame includes households selected via a standard area probability sample; the
second includes households from a list provided by the Internal Revenue Service, with an oversampling
of households likely to be wealthy.
Because the SCF is a complex sample, variance estimates should be adjusted to reflect households’
unequal probability of selection (Federal Reserve, 2014a; Nielsen et al., 2009; Nielsen & Seay, 2014).
To allow public users to make these adjustments, the Federal Reserve releases a set of 999 replicate
weights that can be downloaded and used in a bootstrapping routine. The routine adjusts variance
estimates in a manner consistent with internal procedures designed to protect the identity of
respondents while accounting for unequal probability of selection (Federal Reserve, 2014b).
Like most survey data, the SCF contains missing values. Unlike with most datasets, however, the Federal
Reserve imputes replacement values beforehand and releases five replicate datasets (implicates) that
include these multiply-imputed values. Thus, the apparent sample size of approximately 30,000 cases is
actually five implicates (five versions) of approximately 6,000 households (Federal Reserve, 2014b).
These five implicates must be combined to account for the uncertainty associated with the multiple
imputation process. See the SCF Codebook for resources that describe this (Federal Reserve, 2014a).
Also consult the excellent resources from Hanna and Lindamood (e.g., Lindamood, Hanna, & Bi, 2007).
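As a reference point, the standard combining rules for multiply-imputed data (often called Rubin's
rules) can be written as below. This is a sketch of the general approach, not a statement of what any
particular program computes internally. The symbols are notation introduced here: $\hat{Q}_m$ is the
estimate from implicate $m$, $U_m$ is its sampling variance (for example, one obtained from the
replicate weights), and $M$ is the number of implicates ($M = 5$ for the SCF).

```latex
\begin{align*}
\bar{Q} &= \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m
  && \text{(combined point estimate)}\\
\bar{U} &= \frac{1}{M}\sum_{m=1}^{M}U_m
  && \text{(average within-implicate variance)}\\
B &= \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{Q}_m-\bar{Q}\right)^2
  && \text{(between-implicate variance)}\\
T &= \bar{U} + \left(1+\frac{1}{M}\right)B
  && \text{(total variance)}
\end{align*}
```

With $M = 5$ the multiplier on $B$ is $6/5$, and the combined standard error is $\sqrt{T}$.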
For new SCF users, two tasks can be challenging: combining the five implicates of the main data file to
produce estimates that account for the Federal Reserve Board’s multiple imputation process, and
merging the supplemental replicate weight file to produce estimates that also account for the survey’s
dual-frame complex sample. This technical note offers an example, using Stata, of how to complete this
process and produce estimates that account for both sources of error. Though the example uses 2013
data, other cross-sectional SCF datasets (the triennial data) use similar methods. The example relies
on a Stata macro described by the University of Wisconsin-Madison’s Center for Financial Security
(Center for Financial Security, 2015).
Example steps using 2013 SCF data:
1. Install scfcombo macro in Stata
a. In Stata command window, type ssc install scfcombo
b. Read the scfcombo help file and keep for later reference
c. Send thank you notes to Jane Brittingham (U. Wisconsin) for providing this macro and Karen
Pence (Federal Reserve) for developing its predecessors, scfimp and scfboot
2. Download and prepare the main dataset
a. Create an id variable to identify households
b. Create implicate variable to identify implicates
c. Prepare for multiple imputation analyses (the MI commands in Stata)
3. Download replicate weight dataset and merge with the main dataset per codebook and scfcombo
macro instructions
4. Prepare variables for analyses (recode, transform, generate new, etc.)
5. Run analyses; when estimating multivariate models, invoke the scfcombo macro to adjust standard
errors for both imputation and complex sample design
_______________________
1 Associate Professor, Department of Financial Planning, Housing and Consumer Economics, University of
Georgia, 205 Consumer Research Center, Athens, GA 30602. rnielsen@uga.edu
Example of the above steps using Stata:
* STEP 1
* install scfcombo macro in Stata (needs to be done only once):
ssc install scfcombo
* STEP 2
* Get original main SCF dataset.
use "C:\___your file path___\p13i6.dta", clear
* Prepare main dataset for use with Stata's multiple imputation (MI) commands
mi query
* Y1 = 10*YY1 + implicate number, so Y1 - 10*YY1 recovers the implicate (1 to 5)
generate IMP = Y1 - 10*YY1
generate YY1a = YY1*10
generate Y1a = Y1 - YY1a
generate rep = Y1a
* Save the mi-ready dataset
save "C:\___your file path___\SCF2013imps.dta", replace
mi import flong, m(Y1a) id(YY1)
sort Y1a
tab Y1a
mi describe
save "C:\___your file path___\SCF2013imps.dta", replace
* the above code keeps the unique record identifier Y1 and adds
* a new indicator of the implicate, Y1a (min 1, max 5)
sum Y1 Y1a rep
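* Optional sanity check, a sketch that is not part of the original note:
* confirm that the implicate indicator takes only the values 1-5 and that
* every household (YY1) appears exactly five times. nimps is a hypothetical
* helper variable; preserve/restore leaves the data in memory unchanged.
assert inlist(Y1a, 1, 2, 3, 4, 5)
preserve
bysort YY1: generate nimps = _N
assert nimps == 5
restore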
* STEP 3
* download replicate weight file and merge with main file...note that
* the order of the variables MM999-MM1 and WT1B1-WT1B999 matters for how the
* scfcombo macro proceeds so don't ever sort on those variables
use "C:\___your file path___\p13_rw1.dta", clear
* the scfcombo macro uses lower case variable names for the replicate weights
rename (MM999-MM1), lower
rename (WT1B1-WT1B999), lower
sort Y1 // sorts replicate weight file by the unique id //
save "C:\___your file path___\SCF2013reps.dta", replace
use "C:\___your file path___\SCF2013imps.dta", clear
sort Y1 // sorts main data file by the unique id //
save "C:\___your file path___\SCF2013imps.dta", replace
merge 1:1 Y1 using "C:\___your file path___\SCF2013reps.dta"
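* Optional check, a sketch that is not part of the original note: merge adds
* the variable _merge, where 3 means the record was found in both files.
* If every record is expected to match (an assumption here), the count
* below should be zero.
tab _merge
count if _merge != 3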
* save as a complete file with the main data and the bootstrap variables
save "C:\___your file path___\SCF2013complete.dta", replace
clear
* STEP 4
* prepare variables by renaming, transformations, etc. prior to analyses, for example
use "C:\___your file path___\SCF2013complete.dta"
rename X5729 income
rename X3024 foodhome
rename X14 age
* STEP 5
* Use the scfcombo macro. OLS regression is shown, but other compatible commands also work.
* Note that this may be done weighted or unweighted depending on one's needs.
* reps is the number of bootstrap replications (default is 200); imps should be 5.
* Stata variable names are case-sensitive: if the weight is stored as X42001 in your file,
* rename it to x42001 (or change the weight expression) before running the command.
* General form: scfcombo DEPVAR INDVARS [aw=x42001], command(regress) reps(200) imps(5)
* I strongly recommend exploring the number of bootstraps used, as results tend to vary slightly!
scfcombo foodhome age income [aw=x42001], command(regress) reps(200) imps(5)
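* A sketch of the exploration suggested above, not part of the original note:
* rerun the same model with more replicates and compare the standard errors.
* Assumes scfcombo accepts reps() values up to the 999 replicate weights in
* the merged file; check the scfcombo help file for the allowed range.
scfcombo foodhome age income [aw=x42001], command(regress) reps(999) imps(5)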
References
Center for Financial Security. (2015). CFS promotes Stata program for using Survey of Consumer Finances
data. Issue Brief 2016-6.1. University of Wisconsin-Madison Center for Financial Security, Madison,
WI. Available: http://cfs.wisc.edu/presentations/scf_combo_brief.pdf
Federal Reserve. (2014a). Codebook for 2013 Survey of Consumer Finances. Board of Governors of the
Federal Reserve System. Available:
http://www.federalreserve.gov/econresdata/scf/files/codebk2013.txt
Federal Reserve. (2014b). 2013 Survey of Consumer Finances. Board of Governors of the Federal Reserve
System. Available: http://www.federalreserve.gov/econresdata/scf/scfindex.htm
Lindamood, S., Hanna, S. D., & Bi, L. (2007). Using the Survey of Consumer Finances: Some
methodological considerations and issues. Journal of Consumer Affairs, 41(2), 195-222.
doi:10.1111/j.1745-6606.2007.00075.x
Nielsen, R. B., Davern, M., Jones, A. Jr., & Boies, J. L. (2009). Complex sample design effects and health
insurance variance estimation. Journal of Consumer Affairs, 43(2), 346-366.
doi:10.1111/j.1745-6606.2009.01143.x
Nielsen, R. B., & Seay, M. C. (2014). Complex samples and regression-based inference: Considerations for
consumer researchers. Journal of Consumer Affairs, 48(3), 606-619. doi:10.1111/joca.12038
Comments and Corrections:
This technical note is provided as a service to anyone who might find it helpful, particularly graduate students
who are using the Survey of Consumer Finances. Comments and corrections are appreciated!
Suggested citation: Nielsen, R. B. (2015). SCF complex sample specification for Stata. Technical note,
Department of Financial Planning, Housing and Consumer Economics, University of Georgia, Athens, GA.
doi: 10.13140/RG.2.1.4126.8240