
SCF Complex Sample Specification for Stata

January 2016


Authors:

University of Alabama



December 2015

SCF Complex Sample Specification for Stata

Robert B. Nielsen¹

The Federal Reserve’s Survey of Consumer Finances (SCF) is derived from a dual-frame complex sample design. One frame includes households selected via a standard area probability sample; the second includes households from a list provided by the Internal Revenue Service, with an oversampling of households likely to be wealthy.

Because the SCF is a complex sample, its variance estimates should be adjusted to reflect households’ unequal probability of selection (Federal Reserve, 2014a; Nielsen et al., 2009; Nielsen & Seay, 2014). To allow public users to make these adjustments, the Federal Reserve releases a set of 999 replicate weights that can be downloaded and used in a bootstrapping routine. The routine adjusts variance estimates in a manner consistent with internal procedures designed to protect the identity of respondents while accounting for unequal probability of selection (Federal Reserve, 2014b).
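As a rough illustration of how replicate weights can yield a variance estimate, the Stata sketch below re-estimates a weighted mean once per replicate weight and treats the standard deviation of those estimates as a bootstrap standard error. The outcome y, the three weights named wt1b1 to wt1b3, and all values are hypothetical toy data. The sketch shows only the general idea and is not the Federal Reserve's procedure or the scfcombo routine.

```stata
* Toy data: three hypothetical households and three hypothetical replicate weights
clear
input double y double wt1b1 double wt1b2 double wt1b3
10 2 1 1
20 1 2 1
30 1 1 2
end

* Re-estimate the weighted mean once per replicate weight
generate double est = .
forvalues r = 1/3 {
    quietly summarize y [aweight = wt1b`r']
    quietly replace est = r(mean) in `r'
}

* The spread of the replicate estimates approximates the sampling error
quietly summarize est
scalar boot_se = r(sd)
display "bootstrap SE of the mean = " boot_se
```

With the real SCF files the same loop would run over up to 999 replicate weights, which is why the number of replications requested can change the reported standard errors slightly.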

Like most survey data, the SCF contains missing values. Unlike the producers of most datasets, the Federal Reserve imputes replacement values beforehand and releases five versions of the data, called implicates, that contain these multiply-imputed values. Thus, the apparent sample size of approximately 30,000 cases is actually five implicates of approximately 6,000 households (Federal Reserve, 2014b). These five implicates must be combined to account for the uncertainty associated with the multiple imputation process. See the SCF Codebook for resources that describe this process (Federal Reserve, 2014a), and consult the excellent resources from Hanna and Lindamood (e.g., Lindamood, Hanna, & Bi, 2007).
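One common way to combine estimates across implicates is repeated-imputation inference (Rubin's rules): average the five point estimates, then add the average within-implicate variance to the between-implicate variance inflated by (1 + 1/M). The Stata sketch below works through that arithmetic with hypothetical estimates b and sampling variances v for five implicates. The numbers are invented for illustration, and a routine such as scfcombo may obtain the sampling-variance component differently (for example, from the replicate weights).

```stata
* Hypothetical point estimates (b) and sampling variances (v) from five implicates
clear
input byte rep double b double v
1 10.2 0.50
2 10.6 0.48
3  9.9 0.52
4 10.4 0.47
5 10.1 0.53
end

* Combined point estimate and between-implicate variance
quietly summarize b
scalar M    = r(N)
scalar qbar = r(mean)
scalar B    = r(Var)

* Average within-implicate (sampling) variance
quietly summarize v
scalar W = r(mean)

* Total variance under Rubin's rules
scalar T = W + (1 + 1/M)*B
display "combined estimate = " qbar
display "combined SE       = " sqrt(T)
```

Here the between-implicate term is small relative to the sampling variance; ignoring it would still understate the standard error.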

For new SCF users, the process of combining the five implicates of the main data file to produce estimates that account for the Federal Reserve Board’s multiple imputation process, coupled with merging the supplemental replicate weight file to produce estimates that also account for the survey’s dual-frame complex sample, can be challenging. This technical note offers an example, using Stata, of how to complete this process to produce estimates that account for both sources of error. Though the example uses 2013 data, other cross-sectional SCF datasets (the triennial data) use similar methods. The example relies on a Stata macro described by the University of Wisconsin-Madison’s Center for Financial Security (Center for Financial Security, 2015).

Example steps using 2013 SCF data:

1. Install the scfcombo macro in Stata
   a. In the Stata command window, type ssc install scfcombo
   b. Read the scfcombo help file and keep it for later reference
   c. Send thank-you notes to Jane Brittingham (U. Wisconsin) for providing this macro and Karen Pence (Federal Reserve) for developing its predecessors, scfimp and scfboot
2. Download and prepare the main dataset
   a. Create an id variable to identify households
   b. Create an implicate variable to identify implicates
   c. Prepare for multiple imputation analyses (the MI commands in Stata)
3. Download the replicate weight dataset and merge it with the main dataset per the codebook and scfcombo macro instructions
4. Prepare variables for analyses (recode, transform, generate new, etc.)
5. Run analyses; when estimating multivariate models, invoke the scfcombo macro to adjust standard errors for both imputation and complex sample design

_______________________

¹ Associate Professor, Department of Financial Planning, Housing and Consumer Economics, University of Georgia, 205 Consumer Research Center, Athens, GA 30602. rnielsen@uga.edu


Example of the above steps using Stata:

* STEP 1
* install the scfcombo macro in Stata (needs to be done only once):
ssc install scfcombo

* STEP 2
* Get the original main SCF dataset.
use "C:\___your file path___\p13i6.dta", clear
* Prepare the main dataset for use with Stata's multiple imputation (MI) commands
mi query
* IMP is the implicate number: the case id Y1 minus 10 times the household id YY1
generate IMP = Y1 - 10*YY1
* YY1a is the household id scaled by 10
generate YY1a = YY1*10
* Y1a is the same implicate number, computed from YY1a
generate Y1a = Y1 - YY1a
* rep is a copy of the implicate indicator
generate rep = Y1a
* Save the mi-ready dataset
save "C:\___your file path___\SCF2013imps.dta", replace
mi import flong, m(Y1a) id(YY1)
sort Y1a
tab Y1a
mi describe
save "C:\___your file path___\SCF2013imps.dta", replace
* the above code preps the data to have the same unique identifier Y1
* and a new indicator of the implicate Y1a (min 1, max 5)
sum Y1 Y1a rep
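The implicate arithmetic in STEP 2 can be checked on a small toy dataset before it is applied to the real file. The sketch below builds two hypothetical households with five implicates each, assuming (as the subtraction in STEP 2 implies) that Y1 equals 10 times YY1 plus the implicate number. All ids are invented, and the checks are only a suggestion.

```stata
* Toy data: two hypothetical households (YY1 = 1, 2), five implicates each
clear
set obs 10
generate long YY1 = ceil(_n/5)
generate long Y1  = 10*YY1 + mod(_n - 1, 5) + 1

* Recover the implicate number exactly as in STEP 2
generate byte rep = Y1 - 10*YY1

* Checks: implicates run from 1 to 5, and each household appears five times
assert inrange(rep, 1, 5)
bysort YY1: assert _N == 5
isid YY1 rep
tabulate rep
```

The same three checks (assert, bysort with _N, and isid) can be run on the real data right after rep is generated.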

* STEP 3
* download the replicate weight file and merge it with the main file. Note that
* the order of the variables MM999-MM1 and WT1B1-WT1B999 matters for how the
* scfcombo macro proceeds, so don't ever sort on those variables
use "C:\___your file path___\p13_rw1.dta", clear
* the scfcombo macro uses lower case variable names for the replicate weights
rename (MM999-MM1), lower
rename (WT1B1-WT1B999), lower
sort Y1 // sort the replicate weight file by the unique id
save "C:\___your file path___\SCF2013reps.dta", replace
use "C:\___your file path___\SCF2013imps.dta", clear
sort Y1 // sort the main data file by the unique id
save "C:\___your file path___\SCF2013imps.dta", replace
merge 1:1 Y1 using "C:\___your file path___\SCF2013reps.dta"
* save as a complete file with the main data and the bootstrap variables
save "C:\___your file path___\SCF2013complete.dta", replace
clear
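After a merge like the one in STEP 3, it is worth confirming that every record found a match. The sketch below rehearses the pattern on toy data: it builds a hypothetical replicate-weight file and a hypothetical main file that share the id Y1, merges them 1:1, and stops if any record is unmatched. The file contents and the single weight wt1b1 are invented, and the layout (one row per Y1 in both files) is an assumption taken from the merge command above. Run it as a do-file in one pass so the temporary file name stays defined.

```stata
* Hypothetical replicate-weight file: one row per Y1
clear
set obs 10
generate long Y1 = 10*ceil(_n/5) + mod(_n - 1, 5) + 1
generate double wt1b1 = 1
tempfile reps
save `reps'

* Hypothetical main file with the same ids
clear
set obs 10
generate long Y1 = 10*ceil(_n/5) + mod(_n - 1, 5) + 1
generate double income = 1000*_n

* Y1 must uniquely identify rows for a 1:1 merge
isid Y1
merge 1:1 Y1 using `reps'

* _merge == 3 means the row was found in both files
tabulate _merge
assert _merge == 3
drop _merge
```

If the assert fails on real data, tabulate _merge shows how many records came from only one of the two files.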


* STEP 4
* prepare variables by renaming, transforming, etc. prior to analyses, for example
use "C:\___your file path___\SCF2013complete.dta", clear
rename X5729 income
rename X3024 foodhome
rename X14 age

* STEP 5
* Use the scfcombo macro. OLS regression is shown, but other compatible commands also work.
* Note that this may be done weighted or unweighted depending on one's needs.
* reps is the number of bootstrap replications (default is 200); imps should be 5.
* General form:
* scfcombo DEPVAR INDVARS [aw=x42001], command(regress) reps(200) imps(5)
* Stata variable names are case-sensitive, so x42001 must match the capitalization
* of the weight variable as it is stored in the data.
* I strongly recommend exploring the number of bootstraps used, as results tend to vary slightly!
scfcombo foodhome age income [aw=x42001], command(regress) reps(200) imps(5)

References

Center for Financial Security. (2015). CFS promotes Stata program for using Survey of Consumer Finances data. Issue Brief 2016-6.1. University of Wisconsin-Madison Center for Financial Security, Madison, WI. Available: http://cfs.wisc.edu/presentations/scf_combo_brief.pdf

Federal Reserve. (2014a). Codebook for 2013 Survey of Consumer Finances. Board of Governors of the Federal Reserve System. Available: http://www.federalreserve.gov/econresdata/scf/files/codebk2013.txt

Federal Reserve. (2014b). 2013 Survey of Consumer Finances. Board of Governors of the Federal Reserve System. Available: http://www.federalreserve.gov/econresdata/scf/scfindex.htm

Lindamood, S., Hanna, S. D., & Bi, L. (2007). Using the Survey of Consumer Finances: Some methodological considerations and issues. Journal of Consumer Affairs, 41(2), 195-222. doi:10.1111/j.1745-6606.2007.00075.x

Nielsen, R. B., Davern, M., Jones, A. Jr., & Boies, J. L. (2009). Complex sample design effects and health insurance variance estimation. Journal of Consumer Affairs, 43(2), 346-366. doi:10.1111/j.1745-6606.2009.01143.x

Nielsen, R. B., & Seay, M. C. (2014). Complex samples and regression-based inference: Considerations for consumer researchers. Journal of Consumer Affairs, 48(3), 606-619. doi:10.1111/joca.12038

Comments and Corrections:

This technical note is provided as a service to anyone who might find it helpful, particularly graduate students who are using the Survey of Consumer Finances. Comments and corrections are appreciated!

Suggested citation: Nielsen, R. B. (2015). SCF complex sample specification for Stata. Technical note, Department of Financial Planning, Housing and Consumer Economics, University of Georgia, Athens, GA. doi:10.13140/RG.2.1.4126.8240
