Applied Survey Data Analysis



           

Analysis Examples Replication     

The analysis examples replication materials cover Chapters 5-12 of ASDA, but not every software package covers all 8 chapters.  A missing link for a given chapter indicates that the software package cannot perform that type of analysis.


SAS v9.2 Code and Results

Chapter 5 Analysis Examples

Chapter 6 Analysis Examples

Chapter 7 Analysis Examples

Chapter 8 Analysis Examples

Chapter 9 Analysis Examples

Chapter 10 Analysis Examples

Chapter 11 Analysis Examples

 

Sudaan 10.0 Code and Results

Chapter 5 Analysis Examples

Chapter 6 Analysis Examples

Chapter 7 Analysis Examples

Chapter 8 Analysis Examples

Chapter 9 Analysis Examples

Chapter 10 Analysis Examples

Chapter 11 Analysis Examples

 

SPSS/PASW V18.0 Code and Results

Chapter 5 Analysis Examples

Chapter 6 Analysis Examples

Chapter 7 Analysis Examples

Chapter 8 Analysis Examples

Chapter 9 Analysis Examples

Chapter 10 Analysis Examples

 

IVEware Code and Results

Chapter 5 Analysis Examples

Chapter 6 Analysis Examples

Chapter 7 Analysis Examples

Chapter 8 Analysis Examples

Chapter 9 Analysis Examples

Chapter 10 Analysis Examples

Chapter 11 Analysis Examples

 

WesVar 4.3 Code and Results

Chapter 5 Analysis Examples

Chapter 6 Analysis Examples

Chapter 7 Analysis Examples

Chapter 8 Analysis Examples

Chapter 9 Analysis Examples

Chapter 10 Analysis Examples

 

R Survey 3.2 Code and Results

Chapter 5 Analysis Examples

Chapter 6 Analysis Examples

Chapter 7 Analysis Examples

Chapter 8 Analysis Examples

Chapter 9 Analysis Examples

Chapter 10 Analysis Examples

Chapter 11 Analysis Examples

 

Mplus 5.2 Code and Results

Chapter 7 Analysis Examples

Chapter 8 Analysis Examples

Chapter 9 Analysis Examples

Chapter 10 Analysis Examples

Chapter 11 Analysis Examples

Chapter 12 Analysis Examples

 

Stata v10.1 Code and Results

Chapter 5 Analysis Examples

Chapter 6 Analysis Examples

Chapter 7 Analysis Examples

Chapter 8 Analysis Examples

Chapter 9 Analysis Examples

Chapter 10 Analysis Examples

Chapter 11 Analysis Examples

Chapter 12 Analysis Examples

 

Site Overview

This site contains information about the text "Applied Survey Data Analysis" including author biographies, links to public release data sets and related sites, code and output for analysis examples replicated in current software packages, and information about new publications of interest to survey data analysts.  Other features include a FAQ log and links to other software and statistical sites.  We plan to intermittently update this site with news about ongoing statistical and software advances in the field of analysis of survey data.  

 

Special Note from Authors

The most recent printing of Applied Survey Data Analysis, as of March 7, 2013, has a font issue that causes some symbols to be missing from the text.  This problem is being corrected for all future printings.  Please accept our apologies.

 

Project Overview

Applied Survey Data Analysis is the product of many years of teaching applied survey data analysis classes and practical experience analyzing survey data. We have taught various versions of this course in the ISR/SRC Summer Institute Program, as part of University of Michigan/CSCAR, and within the Survey Methodology Program at University of Michigan and University of Maryland.  Our goal has been to integrate teaching materials and practical analysis knowledge into a textbook geared to a level accessible for graduate students and working analysts who may have varying levels of statistical and analytic expertise.  We intend to update the materials on this website as statistical and software improvements emerge with the goal of assisting analysts and researchers performing survey data analysis.

 

Information About Authors

Patricia A. Berglund is a Senior Research Associate in the Survey Methodology Program at the Institute for Social Research.  She has extensive experience in the use of computing systems for data management and complex sample survey data analysis. She works on research projects in youth substance abuse, adult mental health, and survey methodology using data from Army STARRS, Monitoring the Future, the National Comorbidity Surveys, World Mental Health Surveys, Collaborative Psychiatric Epidemiology Surveys, and various other national and international surveys. In addition, she is involved in development, implementation, and teaching of analysis courses and computer training programs at the Survey Research Center-Institute for Social Research.  She also lectures in the SAS® Institute-Business Knowledge Series.  Email: pberg@umich.edu

Steven G. Heeringa is a Research Scientist in the Survey Methodology Program, the Director of the Statistical and Research Design Group in the Survey Research Center, and the Director of the Summer Institute in Survey Research Techniques at the Institute for Social Research. He has over 25 years of statistical sampling experience directing the development of the SRC National Sample design, as well as sample designs for SRC's major longitudinal and cross-sectional survey programs. During this period he has been actively involved in research and publication on sample design methods and procedures such as weighting, variance estimation, and the imputation of missing data that are required in the analysis of sample survey data. He has been a teacher of survey sampling methods to U.S. and international students and has served as a sample design consultant to a wide variety of international research programs based in countries such as Russia, the Ukraine, Uzbekistan, Kazakhstan, India, Nepal, China, Egypt, Iran, and Chile.  Email: sheering@umich.edu

Brady T. West is an Assistant Research Professor in the Survey Methodology Program at the University of Michigan and an Assistant Research Scientist at the Center for Statistical Consultation and Research (CSCAR) on the University of Michigan campus. He earned a PhD in Survey Methodology from the Michigan Program in Survey Methodology, and also received an MA in Applied Statistics from the University of Michigan Statistics Department.  His primary research interests revolve around regression models for clustered and longitudinal data, and he has authored a book, "Linear Mixed Models: A Practical Guide Using Statistical Software" (www.umich.edu/~bwest/almmussp.html) comparing different statistical software packages in terms of their mixed modeling procedures (Chapman Hall/CRC Press, 2007). He specializes in applications of statistical software and analysis of survey data, and through CSCAR teaches several yearly short courses on statistical methodology and software.  Email: bwest@umich.edu

Professional Reviews of ASDA

    1. Review/Summary of ASDA from the Stata Bookstore: Stata Review of ASDA

    2. Review posted on Amazon.com:   

5.0 out of 5 stars Simply a Great Book, December 25, 2010 By Dennis Hanseman (Cincinnati, OH United States)

"Applied Survey Data Analysis (ASDA) is a crystal-clear survey of modern techniques for analyzing complex survey data. Note the word "analyzing".

 

This is not a text on sampling methods per se. Rather, it is a guide to using existing data sets that result from a complex survey design that employs weighting, clustering, and stratification. The authors demonstrate how a correct analysis should be undertaken. In doing so, they review descriptive statistics, categorical methods, regression analysis (linear and logistic), survival analysis, and multiple imputation. Most examples use Stata, but some are in SAS.

 

The level of mathematical sophistication is not high, although "theory boxes" are interspersed to add additional detail. Anyone who is challenged by the mathematical level of this book probably should not be working with survey data in the first place.

 

In sum, this is an important -- and very well written -- contribution to the literature on survey data analysis."

 

    3. Review from International Statistical Review (2010), 78, 3, 445–482 (page 463 extracted here). © 2010 The Authors. International Statistical Review © 2010 International Statistical Institute.  To read this review click here: Review of ASDA.

 

    4. Review from "Applied Quantitative Methods Network" Newsletter in the UK: Review of ASDA.

    

     5.  Review from Amazon.com:

5.0 out of 5 stars: "A must-have for anyone analyzing survey data", (Kristen Olson, Lincoln NE): May 2, 2011.    This review is from: Applied Survey Data Analysis (Chapman & Hall/CRC Statistics in the Social and Behavioral Sciences) (Hardcover)
"This book is unique in the extensive market of books on analysis of survey data. Most data collected on finite populations are selected with unequal probabilities of selection, strata and clusters, but most regression textbooks assume a simple random sample. This is the first full-length textbook that deals with subclass analyses, categorical data analysis, and various generalized linear models (from linear regression through hierarchical models) and complex survey designs at a statistical level accessible to most graduate students or data analysts. Weight creation and multiple imputation are also covered.

Readers will not be scared off by 'too many' formulas in this book. Although formulas are used throughout the book, there is not a great deal of detailed statistical theory presented; additionally, the 'Theory Boxes' provide enough information for more statistically inclined readers to know where to turn for more information. The book is best used for a more advanced statistical models class, after students have taken their basic regression/correlation class (and possibly after a categorical data class). The homework assignments at the end of each chapter are a useful addition to the text, with the required data sets available on the book's website. I find that the homework assignments are best supplemented with additional examples with other data sets, especially for classes taught repeatedly, but they are a great starting place.

I strongly recommend this book to anyone wanting a 'how to' book for conducting and interpreting analyses on complex survey data, supplemented with extensive documentation on model fitting and diagnostics under a complex survey design (where they are available). It is immediately useful, with Stata code for all of the analyses provided in the text and SAS, Stata, MPlus, R, SUDAAN, WesVar, and SPSS code on the website."

 

     6. Review posted on Amazon.com:

5.0 out of 5 stars: from Sophia (August 23, 2011):

"I bought this book in preparation for analyzing NHANES data for the first time.   To give an idea of my background: My formal biostatistics training is quite limited to what I got through experience on different research projects and through medical school. I use Stata 12 in a very basic way, and this was the first project using a large dataset that I did my own analysis for from start to finish. I've worked with weighted survey data before, but another statistician was doing all the actual coding.

I thought this book was pretty balanced between theory and practical issues. A lot of the theory was over my head, but certain parts were extremely illuminating and useful to read through. They do specify at least two semester of graduate level statistics or something like that as a prerequisite, but obviously I don't have that and I still found the book useful. The real value of the book lies in its numerous and detailed examples. The authors actually use NHANES data (and other national datasets) to work through their examples. They walk you through many different types of analyses, include multiple linear regression, multiple logistic regression, etc. Many of these examples are very detailed, and they build the whole model step-by-step and explain the rationale behind each step and decision. The book is extremely well organized, so that by flipping through the table of contents you can immediately find the relevant section for what you want to do with the data, read through the example, and apply the Stata code directly to your own analysis.

NHANES does have tutorials about how to work through their data; although I also found those to be useful and essential, I think that this book is superior because it does give more background about why you need to run certain types of analyses and tests rather than others.   I can't believe my university library didn't have this book--I think it's a totally worthwhile purchase for anyone preparing to work with large national datasets."

   

    7. Link to Chapman Hall Bestsellers List: (see ASDA on the list!): Link to BestSellers

 

    8. Link to ASDA review from The American Statistician: http://pubs.amstat.org/toc/tas/65/4       

"Overall, the book is clearly written and easy to follow, and well equipped with real data examples and a book web site. The program codes used in the example are also available, mostly written in Stata. I like the presentations with real survey examples and, in particular, the unified four-step approach to the regression analysis in different models. Anyone working on survey data analysis would find the book very helpful and instructive. The book website seems to be a good complement, with additional resources on this book." (Partial review).

    9.  Link to review from Journal of Official Statistics, 2011: http://www.jos.nu/Articles/abstract.asp?article=271139

"Many survey data analysts have a good general understanding of the theory and application of statistical analysis to basic behavioral science data. However, many analysts do not receive specialized training in the specific aspects of complex survey design and its implications for the statistical analysis of survey data. Applied Survey Data Analysis is a great remedy to fill this gap."

Links to Data Sets

National Comorbidity Survey-Replication (Collaborative Psychiatric Epidemiology Surveys)

        http://www.icpsr.umich.edu/cpes (for online documentation tools and data download) 

        http://www.hcp.med.harvard.edu/ncs (for NCS-R specific information)

National Health and Nutrition Examination Survey (National Center for Health Statistics)    

        http://www.cdc.gov/nchs/

Health and Retirement Study (Institute for Social Research-University of Michigan)

        http://hrsonline.isr.umich.edu

United States Census Bureau

        http://www.census.gov/

 

Chapter Exercises Data Sets

        These data sets are subsets of the original data and are designed for use with the chapter exercises in ASDA.

        Chapter Exercises Data Sets (Stata and SAS Format)      Chapter Exercises Data Sets (R Format)

Analysis Example Data Sets

        These data sets are subsets of the original data and are designed for use with the analysis examples in ASDA.  We have included the raw variables used in the variable recodes and constructed variables used in the analysis examples. 

        Analysis Examples Data Sets (Stata and SAS Format)    

 

Frequently Asked Questions

        This document contains frequently asked questions and brief answers.  Click here: FAQ Document  

        This working paper by Brady West addresses accounting for multi-stage sample designs in complex sample variance estimation.  Click here to download: Multi-Stage Sample Designs

 

Links to Additional Sites

Data Archive

        University of Michigan (ICPSR) Data Archive http://www.icpsr.umich.edu

Software for Survey Data Analysis

        SAS® software     http://www.sas.com

        Stata® software     http://www.stata.com

        Sudaan® software     http://www.rti.org

        SPSS® software     http://www.spss.com

        Mplus® software     http://statmodel.com

        R software     http://www.r-project.org/

        WesVar software     http://www.westat.com/westat/statistical_software/wesvar

        IVEware     http://www.isr.umich.edu/src/smp/ive

        SDA from ICPSR  http://www.icpsr.umich.edu (online analysis system with survey correction capabilities)  

Software Updates

        Stata - V13.1 is current as of April 2014 

        IBM/SPSS- SPSS 21 is current as of April 2014  

        SAS - v9.4 is current as of April 2014 

        See websites for additional software updates and versions

 

Supplemental Code

This section provides key updates to software for analysis of survey data.

1. Stata v11-Example of new "factor" coding for categorical variables: Example of Factor Coding in Stata 11    

2. Stata v11.1-Some key updates:

    1. Survey estimation commands now support survey bootstrap SEs, with user-supplied bootstrap replicate weights: Example of svy bootstrap

    2. Survey estimation commands now support successive difference replicate  (SDR) weights, common in data sets supplied by the U.S. Census Bureau: Example of sdr method

    3. Standard goodness-of-fit (GOF) tests are now available after svy: probit and svy: logistic: Example of gof test

    4. Design-based estimates of the coefficient of variation (CV) can now be computed using svy: commands: Example of cv option

3. SAS v9.2- Example of how to use replicate weights using NHANES data: SAS Replicate Weights Example

4. Stata v11.0-Example of how to use replicate weights using NHANES data: Stata Replicate Weights Example

          5. Stata v10.1-Code to produce Table 8.4 and Figure 8.3: Non-Linear Comparisons of Logits     

          6. SAS v9.2 (TS2M3)-Example of PROC SURVEYPHREG (Cox Model): PROC SURVEYPHREG Example        

          7. Stata v11.1-Example of Mediation analysis with survey data and subpopulation indicator: Stata sgmediation example    

          8. R-Example of Quantile Regression with Bootstrap Method: R Quantile Regression Example

          9. Stata 11.1-Example of use of mi suite of commands: Stata 11.1 MI Example

         10. SAS v9.22-Example of use of NOMCAR option with PROC SURVEYMEANS: SAS NOMCAR Example

         11. Stata 11.1-Example of use of svy: logistic with estat gof post-estimation command: Stata estat gof Example

         12. Example of How to Create a Delimited Text File in SAS and Read Text File in R: Text File SAS to R Example  

         13. An Example of Fuller’s (1984) Method for Testing the Bias of Unweighted Estimates of Regression Parameters in a Linear Regression Model: Fuller's Method

 

         14. SAS code to implement Wilcoxon rank sum test for complex sample survey data: http://www.blackwellpublishing.com/rss

 

         15. SAS Macro for Difference Between Means (addition to PROC SURVEYMEANS): SAS Macro smsub.sas

        

         16. SAS Paper with Examples of ODS Graphics and SG Procedures with Examples of Weighted Frequency Plots: SAS Paper with ODS Graphics and SG Procedures Examples

 

         17. Note on How SPSS handles Strata with A Single or "Lonely" PSU: http://www-01.ibm.com/support/docview.wss?uid=swg21479202

 

         18. Link to Stata command for calculation of Population Attributable Risk proportions (user written "punaf" command): http://www.imperial.ac.uk/nhli/r.newson/usergp/uk2012/newson_ohp1.pdf

 

         19. Link to information about use of Stata 12.1 with the postestimation command estat gof after svy: logistic with subpopulations: http://www.stata.com/statalist/archive/2011-03/msg00550.html

 

         20. SAS PROC MI - FCS imputation method with analysis of complex sample data: SAS PROC MI FCS Example.  Right click here to save SAS data set: Data set for FCS example

 

         21. SAS v9.3 PROC SURVEYMEANS with RATIO and DOMAIN statements for Example 5.9: SAS Example 5.9

 

         22. SAS v9.4 Example of How to Obtain the 2nd Order Rao-Scott Chi-Square Test in PROC SURVEYFREQ: PROC SURVEYFREQ with 2nd Order Rao-Scott Chi-Square  

 

         23. Example of using PROC EXPORT to convert SAS data set to Stata (.dta) and SPSS (.sav): SAS PROC EXPORT Example      

        

Statistical Resources for Analysis of Survey Data

        University of Michigan      

        Institute for Social Research-Summer Institute     www.isr.umich.edu/src/si

        IVEware (Imputation and Variance Estimation software)     www.isr.umich.edu/src/smp/ive

        ICPSR summer institute     http://www.icpsr.umich.edu/icpsrweb/sumprog/

        Center for Statistical Consulting and Research     www.umich.edu/~cscar/

        University of California-Los Angeles

        Survey Data Analysis     http://statcomp.ats.ucla.edu/survey/

        University of North Carolina-Chapel Hill

        Population Center     http://www.cpc.unc.edu/

        American Statistical Association 

        Home Page     http://www.amstat.org/

 

Survey Data Analysis Publications

This section provides information about key new publications on survey data analysis.  We will add to the list as new publications emerge.

1. Carle, A.C., Fitting multilevel models in complex survey data with design weights: Recommendations, BMC Medical Research Methodology, 1471-2288-9-49, 2009. http://www.biomedcentral.com/1471-2288/9/49   

          Abstract (Background)

Multilevel models (MLM) offer complex survey data analysts a unique approach to understanding individual and contextual determinants of public health. However, little summarized guidance exists with regard to fitting MLM in complex survey data with design weights. Simulation work suggests that analysts should scale design weights using two methods and fit the MLM using unweighted and scaled-weighted data. This article examines the performance of scaled-weighted and unweighted analyses across a variety of MLM and software programs.

2. Lumley, T.S., Complex Surveys: a guide to analysis using R, John Wiley & Sons, New York, 2010.

          Synopsis

A complete guide to carrying out complex survey analysis using R.  As survey analysis continues to serve as a core component of sociological research, researchers are increasingly relying upon data gathered from complex surveys to carry out traditional analyses. Complex Surveys is a practical guide to the analysis of this kind of data using R, the freely available and downloadable statistical programming language.

3. Liao, Dan, Collinearity Diagnostics for Complex Survey Data.  Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, Maryland, (2010).

4. Asparouhov, T. & Muthen, B. (2006). Multilevel modeling of complex survey data. Proceedings of the Joint Statistical Meeting in Seattle, August 2006. ASA section on Survey Research Methods, 2718-2726. Paper can be downloaded from here.

5. Berglund, Patricia, (2010).  An Introduction to Multiple Imputation of Complex Sample Data Using SAS v9.2, SAS Global Forum 2010, Paper 265-2010. Paper can be downloaded from here.

6. Kolenikov, S., Resampling Variance Estimation for Complex Survey Data, Stata Journal, sj10-2: pp. 165–199. http://www.stata-journal.com/

7. Valliant, R., The Effect of Multiple Weighting Steps on Variance Estimation, Journal of Official Statistics, Vol. 20, No. 1, 2004, pp. 1–18.

          Abstract

Multiple weight adjustments are common in surveys to account for ineligible units on a frame, nonresponse by some units, and the use of auxiliary data in estimation. A practical question is whether all of these steps need to be accounted for when estimating variances. Linearization variance estimators and related estimators in commercial software packages that use squared residuals usually account only for the last step in estimation, which is the incorporation of auxiliary data through poststratification, regression estimation, or similar methods. Replication variance estimators can explicitly account for all of the steps in estimation by repeating each adjustment separately for each replicate subsample. Through simulation, this article studies the difference in these methods for some specific sample designs, estimators of totals, and rates of ineligibility and nonresponse. In the simulations reported here, the linearization variance estimators are negatively biased and produce confidence intervals for a population total that cover at less than the nominal rate, especially at smaller sample sizes. The jackknife replication estimator generally yields confidence intervals that cover at or above the nominal rate but do so at the expense of considerably overestimating empirical mean squared errors. A leverage-adjusted variance estimator, which is related to the jackknife estimator, has small positive bias and nearly nominal coverage. The leverage-adjusted estimator is less computationally burdensome than the jackknife but works well in the situations studied here where multiple weighting steps are used.

8. Valliant, R. and Rust, K.F., Degrees of Freedom Approximations and Rules-of-Thumb, Journal of Official Statistics, Vol. 26, No. 4, 2010, pp. 585–602.

Abstract

 

In complex samples, t-distributions are used when performing hypothesis tests and constructing confidence intervals. Rules-of-thumb are typically used to approximate degrees of freedom for the t-distributions. The standard rule is to set the degrees of freedom equal to the number of primary sampling units minus the number of strata. We illustrate some circumstances where these rules can be poor. A simple estimate of degrees of freedom is presented that leads to improved confidence interval coverage.
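
The standard rule quoted in this abstract (number of primary sampling units minus number of strata) can be sketched in a few lines of Python. This is only an illustration of the rule of thumb, not the improved estimate proposed by Valliant and Rust. The function name `design_df` and the stratum/PSU labels are hypothetical and chosen for the example.

```python
from collections import defaultdict


def design_df(stratum_psu_pairs):
    """Rule-of-thumb design degrees of freedom: (# PSUs) - (# strata).

    `stratum_psu_pairs` is an iterable of (stratum, psu) labels, one pair
    per sampled element. PSU labels are assumed to be nested within strata,
    so the same PSU label in two strata counts as two distinct PSUs.
    """
    psus_by_stratum = defaultdict(set)
    for stratum, psu in stratum_psu_pairs:
        psus_by_stratum[stratum].add(psu)
    n_strata = len(psus_by_stratum)
    n_psus = sum(len(psus) for psus in psus_by_stratum.values())
    return n_psus - n_strata


# Hypothetical design: 3 strata with 2 PSUs each, one or more elements per PSU.
sample = [
    (1, 1), (1, 1), (1, 2),
    (2, 1), (2, 2), (2, 2),
    (3, 1), (3, 2),
]
df = design_df(sample)
print(df)  # 6 PSUs - 3 strata = 3
```

With a two-PSU-per-stratum design such as this one, the rule gives one degree of freedom per stratum, which is why small designs can leave very few degrees of freedom for t-based intervals.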

9.  Brumback, B. and He, Z., The Mantel–Haenszel estimator adapted for complex survey designs is not dually consistent, Statistics & Probability Letters Volume 81, Issue 9, September 2011, Pages 1465-1470.

10.  Brumback, B. and He, Z., Adjusting for confounding by neighborhood using complex survey data,  Statistics in Medicine, Volume 30, Issue 9, pages 965–972, 30 April 2011.

11. Liao, D. (2011). Variance Inflation Factors in the Analysis of Complex Survey Data. Paper presented at the 2011 Joint Statistical Meetings, Miami Beach, FL. Currently under review for publication in Survey Methodology.

12. Li, J. and Valliant, R., Linear Regression Influence Diagnostics for Unclustered Survey Data, Journal of Official Statistics, Vol. 27, No. 1, 2011, pp. 99–119.  Click here to view abstract: Link to Information about Paper

Abstract
Diagnostics for linear regression models have largely been developed to handle nonsurvey data. The models and the sampling plans used for finite populations often entail stratification, clustering, and survey weights. In this article we adapt some influence diagnostics that have been formulated for ordinary or weighted least squares for use with unclustered survey data. The statistics considered here include DFBETAS, DFFITS, and Cook’s D. The differences in the performance of ordinary least squares and survey-weighted diagnostics are compared in an empirical study where the values of weights, response variables, and covariates vary substantially.

Keywords
Complex sample, Cook’s D, DFBETAS, DFFITS, influence, outlier, residual analysis

13. Wagstaff, D.A. and Harel, O., A Closer Examination of Three Small-Sample Approximations to the Multiple-Imputation Degrees of Freedom. The Stata Journal (2011) 11, Number 3, pp. 403–419.  http://www.stata-journal.com/

 

14. Binder, D.A., Estimating Model Parameters from a Complex Survey under a Model-Design Randomization Framework, Pak. J. Statist., 2011 Vol. 27(4), 371-390.  Link to Paper

Abstract

When an analyst faces the problem of estimating model parameters to data from a complex survey, one of the first questions he often asks is whether or not to use the survey weights. The appropriate question to ask, however, is whether the survey design information itself is relevant, and if so, how should it be incorporated in the analysis. The debate between the design-based and the model-based schools for making inferences on model parameters can be explained and clarified using a model-design randomization framework to describe how the observations for the sampled units have been obtained.

 

Keywords 

Complex survey data; Design-based inference; Model-design-based framework; Informativeness; Ignorability.

15. Li, J. and Valliant, R., Detecting Groups of Influential Observations in Linear Regression Using Survey Data: Adapting the Forward Search Method, Pak. J. Statist., 2011 Vol. 27(4), 507-528.  Link to Paper

Abstract

The forward search is an effective and efficient approach when analyzing non-survey data to detect a group of influential observations which affect regression estimates greatly if they were removed from the model fitting. It has the advantages of avoiding masked effects among the outliers, as well as automatically identifying influential points. Compared to multiple-case deletion diagnostic statistics, this method reduces computational burden, especially when the dataset is very large. In this research we adapted the forward search to linear regression diagnostics for some types of complex survey data. While keeping the existing advantages of this method, we incorporate sample weights and the effects of stratification. A case study is conducted to illustrate the advantages of the adapted method.

 

Keywords

Cook’s distance, diagnostics for survey data, influence, linear regression, outliers, survey data.

 

16. Multiple authors, Journal of Statistical Software, Vol. 45, Issue 1-7, Dec 2011. Various articles on multiple imputation are included in this volume.

Description

The current issue of Journal of Statistical Software has several articles devoted to multiple imputation, including implementations in R, SAS, and Stata.  There is also an article devoted to imputation in multilevel structures.  Click here for more information and links to articles: http://www.jstatsoft.org/v45

17. Mplus Notes area with many articles about survey data analysis: http://statmodel.com/resrchpap.shtml.

18. Kott, P. and Liao, D., Providing double protection for unit nonresponse with a nonlinear calibration-weighting routine, Survey Research Methods (2012), Vol. 6, No. 2, pp. 105-111.  Link to paper: Kott and Liao 2012

 

19. Sundar Natarajan, Stuart R. Lipsitz, Garrett M. Fitzmaurice, Debajyoti Sinha, Joseph G. Ibrahim, Jennifer Haas, Walid Gellad, An Extension of the Wilcoxon rank sum test for complex sample survey data. Journal of the Royal Statistical Society: Series C (Applied Statistics), Volume 61, Issue 4, pages 653-664, August 2012.

 

20. Czaplewski, Raymond L.  2010.  Complex sample survey estimation in static state-space.  Gen. Tech. Rep. RMRS-GTR-239. Fort Collins, CO: U.S. Department of Agriculture, Forest Service, Rocky Mountain Research Station. 124 p. http://treesearch.fs.fed.us/pubs/36115

 21. Czaplewski, Raymond L.  2010.  Recursive restriction estimation: an alternative to post-stratification in surveys of land and forest cover.   Res. Pap. RMRS-RP-81. Fort Collins, CO: U.S. Department of Agriculture, Forest Service, Rocky Mountain Research Station. 32 p. http://treesearch.fs.fed.us/pubs/36116

 22. Owen, A., and Eckles, D. Bootstrapping data arrays of arbitrary order. Annals of Applied Statistics, Volume 6, Number 3 (2012), 895-927. Available from http://arxiv.org/abs/1106.2125.

 

23.  A. Veiga, P. W. F. Smith and J. J. Brown, The use of sample weights in multivariate multilevel models with an application to income data collected by using a rotating panel survey.  Forthcoming in the Journal of the Royal Statistical Society, 2013.  Link to paper: Veiga et al

Summary. Longitudinal data from labour force surveys permit the investigation of income dynamics at the individual level. However, the data often originate from surveys with a complex multistage sampling scheme. In addition, the hierarchical structure of the data that is imposed by the different stages of the sampling scheme often represents the natural grouping in the population. Motivated by how income dynamics differ between the formal and informal sectors of the Brazilian economy and the data structure of the Brazilian Labour Force Survey, we extend the probability-weighted iterative generalized least squares estimation method. Our method is used to fit multivariate multilevel models to the Brazilian Labour Force Survey data where the covariance structure between occasions at the individual level is modelled. We conclude that there are significant income differentials and that incorporating the weights in the parameter estimation has some effect on the estimated coefficients and standard errors.

 

Keywords: Design weights; Labour force surveys; Longitudinal data; Multivariate multilevel models; Non-response weights; Probability-weighted iterative generalized least squares

24.  Newson R. Confidence intervals for rank statistics: Somers' D and extensions. The Stata Journal 2006; 6(3): 309-334.  Prepublication draft at: http://www.imperial.ac.uk/nhli/r.newson/papers/somdext.pdf.

Abstract. Somers’ D is an asymmetric measure of association between two variables, which plays a central role as a parameter behind rank or “non–parametric” statistical methods. Given a predictor variable X and an outcome variable Y, we may estimate D_YX as a measure of the effect of X on Y, or we may estimate D_XY as a performance indicator of X as a predictor of Y. The somersd package allows the estimation of Somers’ D and Kendall’s τ_a with confidence limits as well as P-values. The Stata 9 version of somersd can estimate extended versions of Somers’ D not previously available, including the Gini index, the parameter tested by the sign test, and extensions to left– or right–censored data. It can also estimate stratified versions of Somers’ D, restricted to pairs in the same stratum. Therefore, it is possible to define strata by grouping values of a confounder, or of a propensity score based on multiple confounders, and to estimate versions of Somers’ D which measure the association between the outcome and the predictor, adjusted for the confounders. The Stata 9 version of somersd uses the Mata language for improved computational efficiency with large datasets.

Keywords: st0001, Somers’ D, Kendall’s tau, Harrell’s c, ROC area, Gini index, population attributable risk, rank correlation, rank–sum test, Wilcoxon test, sign test, confidence intervals, non–parametric methods, propensity score.

25.  Presentation on AIC and BIC for Survey Data by Thomas Lumley and Alastair Scott: Link to Presentation

 

26. T. Lumley and A.J. Scott (2013). Partial likelihood-ratio tests for the Cox model under complex sampling. Statistics in Medicine, 32, 110-123.

 

27. T. Lumley and A.J. Scott (2012). Fitting GLMs with survey data. Proceedings of the Survey Research Methods Section, Amer. Statist. Assoc, 5174-5181.

 

28. T. Lumley and A.J. Scott (2013). Two-sample rank tests under complex sampling. Biometrika, 100, to appear shortly.   

 

29. V. Landsman and B. I. Graubard, Efficient analysis of case-control studies with sample weights: Link to Paper

 

30. J.N.K. Rao, F. Verret, and M.A. Hidiroglou, A weighted estimating equations approach to inference for two-level models from survey data: Link to Paper

 

31. Pfeffermann, Danny (2011) Modelling of complex survey data: why is it a problem? How should we approach it? Survey Methodology, 37, (2), 115-136. Link to Paper

 

32. Norton, E.C., Miller, M.M., Kleinman, L.C. (2013) Computing adjusted risk ratios and risk differences in Stata, Stata Journal, Volume 13, Number 3, 492-509. Link to Paper

 

33. Bieler, G.S., Brown, G.G., Williams, R.L., & Brogan, D.J. (2010). Estimating model-adjusted risks, risk differences, and risk ratios from complex survey data. American Journal of Epidemiology, 171 (5):618-623. Link to Paper
 

34. Beaumont, J.F., Bocci, C. (2009) A Practical Bootstrap Method for Testing Hypotheses from Survey Data. Survey Methodology, 35, 25-35. Link to Paper

 

35. Lumley, T., and Scott, A.J. (2014). Tests for Regression Models Fitted to Survey Data.  Australian & New Zealand Journal of Statistics, 56, 1-14.  Link to Paper

 

36. Berglund, P.A., and Heeringa, S.G., Multiple Imputation of Missing Data Using SAS.  SAS Publishing 2014.  Link to Book

 

37. Min Zhu, SAS Institute Inc. Paper SAS026-2014, Analyzing Multilevel Models with the GLIMMIX Procedure.  Link to Paper

Errata

Please check this link for corrections to ASDA: ASDA Errata

 

www.isr.umich.edu/src/smp/asda
 For problems or questions regarding this Web site contact [pberg@umich.edu].
Last updated: 09/04/14
