Applied Survey Data Analysis



Analysis Examples Replication-First Edition     

The replication materials for the analysis examples cover Chapters 5-12 of ASDA First Edition, but not every software package covers all 8 chapters. If no link appears for a given chapter, that software package cannot perform the type of analysis the chapter presents.

SAS v9.2 Code and Results

Chapter 5 Analysis Examples

Chapter 6 Analysis Examples

Chapter 7 Analysis Examples

Chapter 8 Analysis Examples

Chapter 9 Analysis Examples

Chapter 10 Analysis Examples

Chapter 11 Analysis Examples


Sudaan 10.0 Code and Results

Chapter 5 Analysis Examples

Chapter 6 Analysis Examples

Chapter 7 Analysis Examples

Chapter 8 Analysis Examples

Chapter 9 Analysis Examples

Chapter 10 Analysis Examples

Chapter 11 Analysis Examples


SPSS/PASW V18.0 Code and Results

Chapter 5 Analysis Examples

Chapter 6 Analysis Examples

Chapter 7 Analysis Examples

Chapter 8 Analysis Examples

Chapter 9 Analysis Examples

Chapter 10 Analysis Examples


IVEware Code and Results

Chapter 5 Analysis Examples

Chapter 6 Analysis Examples

Chapter 7 Analysis Examples

Chapter 8 Analysis Examples

Chapter 9 Analysis Examples

Chapter 10 Analysis Examples

Chapter 11 Analysis Examples


WesVar 4.3 Code and Results

Chapter 5 Analysis Examples

Chapter 6 Analysis Examples

Chapter 7 Analysis Examples

Chapter 8 Analysis Examples

Chapter 9 Analysis Examples

Chapter 10 Analysis Examples


R Survey 3.2 Code and Results

Chapter 5 Analysis Examples

Chapter 6 Analysis Examples

Chapter 7 Analysis Examples

Chapter 8 Analysis Examples

Chapter 9 Analysis Examples

Chapter 10 Analysis Examples

Chapter 11 Analysis Examples


Mplus 5.2 Code and Results

Chapter 7 Analysis Examples

Chapter 8 Analysis Examples

Chapter 9 Analysis Examples

Chapter 10 Analysis Examples

Chapter 11 Analysis Examples

Chapter 12 Analysis Examples


Stata v10.1 Code and Results

Chapter 5 Analysis Examples

Chapter 6 Analysis Examples

Chapter 7 Analysis Examples

Chapter 8 Analysis Examples

Chapter 9 Analysis Examples

Chapter 10 Analysis Examples

Chapter 11 Analysis Examples

Chapter 12 Analysis Examples


Site Overview

This site contains information about the text "Applied Survey Data Analysis", including author biographies, links to public release data sets and related sites, code and output for analysis examples replicated in current software packages, and information about new publications of interest to survey data analysts. Other features include a FAQ log and links to other software and statistical sites. We plan to update this site intermittently with news about ongoing statistical and software advances in the analysis of survey data.


Special Notes from Authors

Coming Soon: ASDA-Second Edition! We anticipate that the 2nd edition of ASDA will be available in late May or early June, 2017!

The printing of Applied Survey Data Analysis-First Edition from March 7, 2013 has a font issue where some symbols appear to be missing in the text.  This problem was corrected in subsequent printings.  Please accept our apologies for this issue.  


Project Overview

Applied Survey Data Analysis is the product of many years of teaching applied survey data analysis classes and of practical experience analyzing survey data. We have taught various versions of this course in the ISR/SRC Summer Institute Program, as part of University of Michigan/CSCAR, and within the Survey Methodology Program at University of Michigan and University of Maryland. Our goal has been to integrate teaching materials and practical analysis knowledge into a textbook accessible to graduate students and working analysts with varying levels of statistical and analytic expertise. We intend to update the materials on this website as statistical and software improvements emerge, with the goal of assisting analysts and researchers performing survey data analysis.


Information About Authors

Patricia A. Berglund is a Senior Research Associate in the Survey Methodology Program at the Institute for Social Research.  She has extensive experience in the use of computing systems for data management and complex sample survey data analysis. She works on research projects in youth substance abuse, adult mental health, and survey methodology using data from Army STARRS, Monitoring the Future, the National Comorbidity Surveys, World Mental Health Surveys, Collaborative Psychiatric Epidemiology Surveys, and various other national and international surveys. In addition, she is involved in development, implementation, and teaching of analysis courses and computer training programs at the Survey Research Center-Institute for Social Research.  She also lectures in the SAS® Institute-Business Knowledge Series.

Steven G. Heeringa is a Research Scientist in the Survey Methodology Program, the Director of the Statistical and Research Design Group in the Survey Research Center, and the Director of the Summer Institute in Survey Research Techniques at the Institute for Social Research. He has over 25 years of statistical sampling experience directing the development of the SRC National Sample design, as well as sample designs for SRC's major longitudinal and cross-sectional survey programs. During this period he has been actively involved in research and publication on sample design methods and procedures such as weighting, variance estimation, and the imputation of missing data that are required in the analysis of sample survey data. He has been a teacher of survey sampling methods to U.S. and international students and has served as a sample design consultant to a wide variety of international research programs based in countries such as Russia, the Ukraine, Uzbekistan, Kazakhstan, India, Nepal, China, Egypt, Iran, and Chile.

Brady T. West is a Research Associate Professor in the Survey Methodology Program, located within the Survey Research Center at the Institute for Social Research on the University of Michigan-Ann Arbor (U-M) campus. He also serves as a Statistical Consultant on the U-M Consulting for Statistics, Computing, and Analytics Research (CSCAR) team. He earned his PhD from the Michigan Program in Survey Methodology in 2011. Before that, he received an MA in Applied Statistics from the U-M Statistics Department in 2002, being recognized as an Outstanding First-year Applied Masters student, and a BS in Statistics with Highest Honors and Highest Distinction from the U-M Statistics Department in 2001. His current research interests include the implications of measurement error in auxiliary variables and survey paradata for survey estimation, survey nonresponse, interviewer variance, and multilevel regression models for clustered and longitudinal data. He is the lead author of a book comparing different statistical software packages in terms of their mixed-effects modeling procedures (Linear Mixed Models: A Practical Guide using Statistical Software, Second Edition, Chapman Hall/CRC Press, 2014), and he is a co-author of a second book entitled Applied Survey Data Analysis (with Steven Heeringa and Pat Berglund), which was published by Chapman Hall in April 2010 and has a second edition in press that will be available in mid-2017. Brady lives in Dexter, MI with his wife Laura, his son Carter, his daughter Everleigh, and his American Cocker Spaniel Bailey.

Professional Reviews of ASDA

    1. Review/Summary of ASDA from the Stata Bookstore: Stata Review of ASDA

    2. Review posted on   

5.0 out of 5 stars Simply a Great Book, December 25, 2010 By Dennis Hanseman (Cincinnati, OH United States)

"Applied Survey Data Analysis (ASDA) is a crystal-clear survey of modern techniques for analyzing complex survey data. Note the word "analyzing".


This is not a text on sampling methods per se. Rather, it is a guide to using existing data sets that result from a complex survey design that employs weighting, clustering, and stratification. The authors demonstrate how a correct analysis should be undertaken. In doing so, they review descriptive statistics, categorical methods, regression analysis (linear and logistic), survival analysis, and multiple imputation. Most examples use Stata, but some are in SAS.


The level of mathematical sophistication is not high, although "theory boxes" are interspersed to add additional detail. Anyone who is challenged by the mathematical level of this book probably should not be working with survey data in the first place.


In sum, this is an important -- and very well written -- contribution to the literature on survey data analysis."


    3. Review from International Statistical Review (2010), 78, 3, 445–482 (page 463 extracted here). © 2010 The Authors. International Statistical Review © 2010 International Statistical Institute. To read this review click here: Review of ASDA.


    4. Review from "Applied Quantitative Methods Network" Newsletter in the UK: Review of ASDA.


     5.  Review from

5.0 out of 5 stars: "A must-have for anyone analyzing survey data", (Kristen Olson, Lincoln NE): May 2, 2011.    This review is from: Applied Survey Data Analysis (Chapman & Hall/CRC Statistics in the Social and Behavioral Sciences) (Hardcover)
"This book is unique in the extensive market of books on analysis of survey data. Most data collected on finite populations are selected with unequal probabilities of selection, strata and clusters, but most regression textbooks assume a simple random sample. This is the first full-length textbook that deals with subclass analyses, categorical data analysis, and various generalized linear models (from linear regression through hierarchical models) and complex survey designs at a statistical level accessible to most graduate students or data analysts. Weight creation and multiple imputation are also covered.

Readers will not be scared off by 'too many' formulas in this book. Although formulas are used throughout the book, there is not a great deal of detailed statistical theory presented; additionally, the 'Theory Boxes' provide enough information for more statistically inclined readers to know where to turn for more information. The book is best used for a more advanced statistical models class, after students have taken their basic regression/correlation class (and possibly after a categorical data class). The homework assignments at the end of each chapter are a useful addition to the text, with the required data sets available on the book's website. I find that the homework assignments are best supplemented with additional examples with other data sets, especially for classes taught repeatedly, but they are a great starting place.

I strongly recommend this book to anyone wanting a 'how to' book for conducting and interpreting analyses on complex survey data, supplemented with extensive documentation on model fitting and diagnostics under a complex survey design (where they are available). It is immediately useful, with Stata code for all of the analyses provided in the text and SAS, Stata, MPlus, R, SUDAAN, WesVar, and SPSS code on the website."


     6. Review posted on

5.0 out of 5 stars: from Sophia (August 23, 2011):

"I bought this book in preparation for analyzing NHANES data for the first time.   To give an idea of my background: My formal biostatistics training is quite limited to what I got through experience on different research projects and through medical school. I use Stata 12 in a very basic way, and this was the first project using a large dataset that I did my own analysis for from start to finish. I've worked with weighted survey data before, but another statistician was doing all the actual coding.

I thought this book was pretty balanced between theory and practical issues. A lot of the theory was over my head, but certain parts were extremely illuminating and useful to read through. They do specify at least two semester of graduate level statistics or something like that as a prerequisite, but obviously I don't have that and I still found the book useful. The real value of the book lies in its numerous and detailed examples. The authors actually use NHANES data (and other national datasets) to work through their examples. They walk you through many different types of analyses, include multiple linear regression, multiple logistic regression, etc. Many of these examples are very detailed, and they build the whole model step-by-step and explain the rationale behind each step and decision. The book is extremely well organized, so that by flipping through the table of contents you can immediately find the relevant section for what you want to do with the data, read through the example, and apply the Stata code directly to your own analysis.

NHANES does have tutorials about how to work through their data; although I also found those to be useful and essential, I think that this book is superior because it does give more background about why you need to run certain types of analyses and tests rather than others.   I can't believe my university library didn't have this book--I think it's a totally worthwhile purchase for anyone preparing to work with large national datasets."


    7. Link to Chapman Hall Bestsellers List: (see ASDA on the list!): Link to BestSellers


    8. Link to ASDA review from The American Statistician:       

"Overall, the book is clearly written and easy to follow, and well equipped with real data examples and a book web site. The program codes used in the example are also available, mostly written in Stata. I like the presentations with real survey examples and, in particular, the unified four-step approach to the regression analysis in different models. Anyone working on survey data analysis would find the book very helpful and instructive. The book website seems to be a good complement, with additional resources on this book." (Partial review).

    9. Link to review from Journal of Statistics, 2011:

"Many survey data analysts have a good general understanding of the theory and application of statistical analysis to basic behavioral science data. However, many analysts do not receive specialized training in the specific aspects of complex survey design and its implications for the statistical analysis of survey data. Applied Survey Data Analysis is a great remedy to fill this gap."

Links to Data Sets

National Comorbidity Survey-Replication (Collaborative Psychiatric Epidemiology Surveys) (for online documentation tools and data download)  (for NCS-R specific information)

National Health and Nutrition Examination Survey (National Center for Health Statistics)    

Health and Retirement Study (Institute for Social Research-University of Michigan)

United States Census Bureau


Chapter Exercises Data Sets

        These data sets are subsets of the original data and are designed for use with the chapter exercises in ASDA.

        Chapter Exercises Data Sets (Stata and SAS Format)      Chapter Exercises Data Sets (R Format)

Analysis Example Data Sets

        These data sets are subsets of the original data and are designed for use with the analysis examples in ASDA.  We have included the raw variables used in the variable recodes and constructed variables used in the analysis examples. 

        Analysis Examples Data Sets (Stata and SAS Format)    


Frequently Asked Questions

        This document contains frequently asked questions and brief answers.  Click here: FAQ Document  

        This working paper by Brady West addresses accounting for multi-stage sample designs in complex sample variance estimation.  Click here to download: Multi-Stage Sample Designs


Links to Additional Sites

Data Archive

        University of Michigan (ICPSR) Data Archive

Software for Survey Data Analysis

        SAS® software

        Stata® software

        Sudaan® software

        SPSS® software

        Mplus® software

        R software

        WesVar software


        SDA from ICPSR (online analysis system with survey correction capabilities)  

        Manual for Package ‘svydiags’ from R, Linear Regression Model Diagnostics for Survey Data  Link to Manual

Software Updates

        Stata - V13.1 is current as of April 2014 

        IBM/SPSS- SPSS 21 is current as of April 2014  

        SAS - v9.4 is current as of April 2014 

        See websites for additional software updates and versions


Supplemental Code

This section provides key updates to software for analysis of survey data.

1. Stata v11-Example of new "factor" coding for categorical variables: Example of Factor Coding in Stata 11    

2. Stata v11.1-Some key updates:

    1. Survey estimation commands now support survey bootstrap SEs, with user-supplied bootstrap replicate weights: Example of svy bootstrap

    2. Survey estimation commands now support successive difference replicate  (SDR) weights, common in data sets supplied by the U.S. Census Bureau: Example of sdr method

    3. Standard goodness-of-fit (GOF) tests are now available after svy: probit and svy: logistic: Example of gof test

    4. Design-based estimates of the coefficient of variation (CV) can now be computed using svy: commands: Example of cv option

3. SAS v9.2- Example of how to use replicate weights using NHANES data: SAS Replicate Weights Example

4. Stata v11.0-Example of how to use replicate weights using NHANES data: Stata Replicate Weights Example

5. Stata v10.1-Code to produce Table 8.4 and Figure 8.3: Non-Linear Comparisons of Logits

6. SAS v9.2 (TS2M3)-Example of PROC SURVEYPHREG (Cox Model): PROC SURVEYPHREG Example

7. Stata v11.1-Example of Mediation analysis with survey data and subpopulation indicator: Stata sgmediation example

8. R-Example of Quantile Regression with Bootstrap Method: R Quantile Regression Example

9. Stata 11.1-Example of use of mi suite of commands: Stata 11.1 MI Example

10. SAS v9.22-Example of use of NOMCAR option with PROC SURVEYMEANS: SAS NOMCAR Example

11. Stata 11.1-Example of use of svy: logistic with estat gof post-estimation command: Stata estat gof Example

12. Example of How to Create a Delimited Text File in SAS and Read Text File in R: Text File SAS to R Example

13. An Example of Fuller’s (1984) Method for Testing the Bias of Unweighted Estimates of Regression Parameters in a Linear Regression Model: Fuller's Method

14. SAS code to implement Wilcoxon rank sum test for complex sample survey data:

15. SAS Macro for Difference Between Means (addition to PROC SURVEYMEANS): SAS Macro

16. SAS Paper with Examples of ODS Graphics and SG Procedures with Examples of Weighted Frequency Plots: SAS Paper with ODS Graphics and SG Procedures Examples

17. Note on How SPSS handles Strata with A Single or "Lonely" PSU:

18. Link to Stata command for calculation of Population Attributable Risk proportions (user written "punaf" command):

19. Link to information about use of Stata 12.1 with the postestimation command estat gof after svy: logistic with subpopulations

20. SAS PROC MI - FCS imputation method with analysis of complex sample data: SAS PROC MI FCS Example.  Right click here to save SAS data set: Data set for FCS example

21. SAS v9.3 PROC SURVEYMEANS with RATIO and DOMAIN statements for Example 5.9: SAS Example 5.9

22. SAS v9.4 Example of How to Obtain the 2nd Order Rao-Scott Chi-Square Test in PROC SURVEYFREQ: PROC SURVEYFREQ with 2nd Order Rao-Scott Chi-Square

23. Example of using PROC EXPORT to convert SAS data set to Stata (.dta) and SPSS (.sav): SAS PROC EXPORT Example

24. Multiple Imputation Using the Fully Conditional Specification Method: A Comparison of SAS, Stata, IVEware, and R: Link to Presentation

25. Analysis of Survey Data Using the SAS SURVEY Procedures: A Primer: Link to Presentation

26. Link to Web Site with Information about Free Tools for Survey Data Analysis and Map Production: Link to full code for Map Examples:

27. SAS Repeated Replication Macro to do Design-Based Poisson Regression (with a comparison to Stata svy: poisson command): Link to Code and Results


Statistical Resources for Analysis of Survey Data

        University of Michigan

            Institute for Social Research-Summer Institute

            IVEware (Imputation and Variance Estimation software)

            ICPSR summer institute

            Center for Statistical Consulting and Research

        University of California-Los Angeles

            Statistical and Survey Data Analysis

        University of North Carolina-Chapel Hill

            Population Center

        American Statistical Association

            Home Page


Survey Data Analysis Publications-General Survey Data Analysis Topics

This section provides information about key publications on survey data analysis.  We will add to the list as new publications emerge.

1. Carle, A.C., Fitting multilevel models in complex survey data with design weights: Recommendations, BMC Medical Research Methodology, 1471-2288-9-49, 2009.   

          Abstract (Background)

Multilevel models (MLM) offer complex survey data analysts a unique approach to understanding individual and contextual determinants of public health. However, little summarized guidance exists with regard to fitting MLM in complex survey data with design weights. Simulation work suggests that analysts should scale design weights using two methods and fit the MLM using unweighted and scaled-weighted data. This article examines the performance of scaled-weighted and unweighted analyses across a variety of MLM and software programs.

2. Lumley, T.S., Complex Surveys: a guide to analysis using R, John Wiley & Sons, New York, 2010.


A complete guide to carrying out complex survey analysis using R.  As survey analysis continues to serve as a core component of sociological research, researchers are increasingly relying upon data gathered from complex surveys to carry out traditional analyses. Complex Surveys is a practical guide to the analysis of this kind of data using R, the freely available and downloadable statistical programming language.

3. Liao, Dan, Collinearity Diagnostics for Complex Survey Data.  Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, Maryland, 2010.

4. Asparouhov, T. & Muthen, B. (2006). Multilevel modeling of complex survey data. Proceedings of the Joint Statistical Meeting in Seattle, August 2006. ASA section on Survey Research Methods, 2718-2726. Paper can be downloaded from here.

5. Berglund, Patricia, (2010).  An Introduction to Multiple Imputation of Complex Sample Data Using SAS v9.2, SAS Global Forum 2010, Paper 265-2010. Paper can be downloaded from here.

6. Kolenikov, S., Resampling Variance Estimation for Complex Survey Data, Stata Journal, sj10-2: pp. 165–199.

7. Valliant, R., The Effect of Multiple Weighting Steps on Variance Estimation, Journal of Official Statistics, Vol. 20, No. 1, 2004, pp. 1–18.


Multiple weight adjustments are common in surveys to account for ineligible units on a frame, nonresponse by some units, and the use of auxiliary data in estimation. A practical question is whether all of these steps need to be accounted for when estimating variances. Linearization variance estimators and related estimators in commercial software packages that use squared residuals usually account only for the last step in estimation, which is the incorporation of auxiliary data through poststratification, regression estimation, or similar methods. Replication variance estimators can explicitly account for all of the steps in estimation by repeating each adjustment separately for each replicate subsample. Through simulation, this article studies the difference in these methods for some specific sample designs, estimators of totals, and rates of ineligibility and nonresponse. In the simulations reported here, the linearization variance estimators are negatively biased and produce confidence intervals for a population total that cover at less than the nominal rate, especially at smaller sample sizes. The jackknife replication estimator generally yields confidence intervals that cover at or above the nominal rate but do so at the expense of considerably overestimating empirical mean squared errors. A leverage-adjusted variance estimator, which is related to the jackknife estimator, has small positive bias and nearly nominal coverage. The leverage-adjusted estimator is less computationally burdensome than the jackknife but works well in the situations studied here where multiple weighting steps are used.
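
As a rough illustration of the replication idea discussed in the abstract above, here is a minimal sketch of a delete-one-PSU stratified jackknife for a weighted total. It is not code from the paper: the function name and the record layout are assumptions made for this example, and no nonresponse or poststratification adjustments are included. In the setting the abstract describes, each weighting adjustment would be recomputed within every replicate; this sketch only rescales the remaining PSUs in the affected stratum by n_h/(n_h - 1).

```python
from collections import defaultdict


def jackknife_total_variance(records):
    """Delete-one-PSU stratified jackknife for a weighted total.

    records: iterable of (stratum, psu, weight, y) tuples, one per respondent.
    This layout is a hypothetical one chosen for the example.
    Returns (estimated_total, jackknife_variance).
    """
    # Weighted PSU totals, grouped by stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, weight, y in records:
        psu_totals[stratum][psu] += weight * y

    full_total = sum(t for psus in psu_totals.values() for t in psus.values())

    variance = 0.0
    for stratum, psus in psu_totals.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than two PSUs")
        stratum_total = sum(psus.values())
        for psu_total in psus.values():
            # Drop one PSU and scale the rest of its stratum by n_h / (n_h - 1).
            replicate = (full_total - stratum_total
                         + (stratum_total - psu_total) * n_h / (n_h - 1))
            variance += (n_h - 1) / n_h * (replicate - full_total) ** 2
    return full_total, variance


if __name__ == "__main__":
    example = [
        ("A", "a1", 2.0, 5.0), ("A", "a2", 2.0, 7.0),
        ("B", "b1", 1.0, 6.0), ("B", "b2", 1.0, 10.0), ("B", "b3", 2.0, 4.0),
    ]
    print(jackknife_total_variance(example))
```

For a simple weighted total with fixed weights, this jackknife reproduces the usual with-replacement linearization variance, so any difference between the two approaches in practice comes from the weighting adjustments that are repeated (or not) within each replicate.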

8. Valliant, R. and Rust, K.F., Degrees of Freedom Approximations and Rules-of-Thumb, Journal of Official Statistics, Vol. 26, No. 4, 2010, pp. 585–602.



In complex samples, t-distributions are used when performing hypothesis tests and constructing confidence intervals. Rules-of-thumb are typically used to approximate degrees of freedom for the t-distributions. The standard rule is to set the degrees of freedom equal to the number of primary sampling units minus the number of strata. We illustrate some circumstances where these rules can be poor. A simple estimate of degrees of freedom is presented that leads to improved confidence interval coverage.
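
The standard rule of thumb stated in the abstract above (number of primary sampling units minus number of strata) can be sketched in a few lines. This is only an illustration of that rule, not the improved estimate the authors propose; the function name and the input layout (one (stratum, PSU) pair per sampled element) are assumptions made for the example.

```python
def design_degrees_of_freedom(design):
    """Rule-of-thumb design degrees of freedom: (# PSUs) - (# strata).

    design: iterable of (stratum, psu) pairs, one per sampled element.
    PSU labels are treated as unique only within their stratum.
    """
    psus = {(stratum, psu) for stratum, psu in design}
    strata = {stratum for stratum, _ in psus}
    return len(psus) - len(strata)


if __name__ == "__main__":
    # Three strata with two PSUs each; repeated pairs are respondents
    # who share a PSU.
    sample = [
        ("S1", "p1"), ("S1", "p1"), ("S1", "p2"),
        ("S2", "p1"), ("S2", "p2"),
        ("S3", "p1"), ("S3", "p2"),
    ]
    print(design_degrees_of_freedom(sample))
```

Keying PSUs on the (stratum, PSU) pair matters because many public-use files reuse PSU codes such as 1 and 2 in every stratum.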

9.  Brumback, B. and He, Z., The Mantel–Haenszel estimator adapted for complex survey designs is not dually consistent, Statistics & Probability Letters Volume 81, Issue 9, September 2011, Pages 1465-1470.

10.  Brumback, B. and He, Z., Adjusting for confounding by neighborhood using complex survey data,  Statistics in Medicine, Volume 30, Issue 9, pages 965–972, 30 April 2011.

11. Liao, D. (2011). Variance Inflation Factors in the Analysis of Complex Survey Data. Paper presented at the 2011 Joint Statistical Meetings, Miami Beach, FL. Currently under review for publication in Survey Methodology.

12. Li, J. and Valliant, R., Linear Regression Influence Diagnostics for Unclustered Survey Data, Journal of Official Statistics, Vol. 27, No. 1, 2011, pp. 99–119.  Click here to view abstract: Link to Information about Paper

Diagnostics for linear regression models have largely been developed to handle nonsurvey data. The models and the sampling plans used for finite populations often entail stratification, clustering, and survey weights. In this article we adapt some influence diagnostics that have been formulated for ordinary or weighted least squares for use with unclustered survey data. The statistics considered here include DFBETAS, DFFITS, and Cook’s D. The differences in the performance of ordinary least squares and survey-weighted diagnostics are compared in an empirical study where the values of weights, response variables, and covariates vary substantially.

Keywords: Complex sample, Cook’s D, DFBETAS, DFFITS, influence, outlier, residual analysis

13. Wagstaff, D.A. and Harel, O., A Closer Examination of Three Small-Sample Approximations to the Multiple-Imputation Degrees of Freedom. The Stata Journal (2011) 11, Number 3, pp. 403–419.




When an analyst faces the problem of estimating model parameters to data from a complex survey, one of the first questions he often asks is whether or not to use the survey weights. The appropriate question to ask, however, is whether the survey design information itself is relevant, and if so, how should it be incorporated in the analysis. The debate between the design-based and the model-based schools for making inferences on model parameters can be explained and clarified using a model-design randomization framework to describe how the observations for the sampled units have been obtained.



Keywords: Complex survey data; Design-based inference; Model-design-based framework; Informativeness; Ignorability.



The forward search is an effective and efficient approach when analyzing non-survey data to detect a group of influential observations which affect regression estimates greatly if they were removed from the model fitting. It has the advantages of avoiding masked effects among the outliers, as well as automatically identifying influential points. Compared to multiple-case deletion diagnostic statistics, this method reduces computational burden, especially when the dataset is very large. In this research we adapted the forward search to linear regression diagnostics for some types of complex survey data. While keeping the existing advantages of this method, we incorporate sample weights and the effects of stratification. A case study is conducted to illustrate the advantages of the adapted method.

Keywords: Cook’s distance, diagnostics for survey data, influence, linear regression, outliers, survey data.


16. Multiple authors, Journal of Statistical Software, Vol. 45, Issue 1-7, Dec 2011. Various articles on multiple imputation are included in this volume.


The current issue of Journal of Statistical Software has several articles devoted to multiple imputation, including implementations in R, SAS, and Stata.  There is also an article devoted to imputation in multilevel structures.  Click here for more information and links to articles:

17. Mplus Notes area with many articles about survey data analysis:

18. Kott, P. and Liao, D. Providing double protection for unit nonresponse with a nonlinear calibration-weighting routine, Survey Research Methods (2012), Vol. 6, No. 2, pp. 105-111.  Link to paper: Kott and Liao 2012


19. Sundar Natarajan, Stuart R. Lipsitz, Garrett M. Fitzmaurice, Debajyoti Sinha, Joseph G. Ibrahim, Jennifer Haas, Walid Gellad, An Extension of the Wilcoxon rank sum test for complex sample survey data. Journal of the Royal Statistical Society: Series C (Applied Statistics), Volume 61, Issue 4, pages 653-664, August 2012.

20. Czaplewski, Raymond L. 2010. Complex sample survey estimation in static state-space. Gen. Tech. Rep. RMRS-GTR-239. Fort Collins, CO: U.S. Department of Agriculture, Forest Service, Rocky Mountain Research Station. 124 p.

21. Czaplewski, Raymond L. 2010. Recursive restriction estimation: an alternative to post-stratification in surveys of land and forest cover. Res. Pap. RMRS-RP-81. Fort Collins, CO: U.S. Department of Agriculture, Forest Service, Rocky Mountain Research Station. 32 p.

22. Owen, A., and Eckles, D. Bootstrapping data arrays of arbitrary order. Annals of Applied Statistics, Volume 6, Number 3 (2012), 895-927. Available from

23. A. Veiga, P. W. F. Smith and J. J. Brown, The use of sample weights in multivariate multilevel models with an application to income data collected by using a rotating panel survey.  Forthcoming in the Journal of the Royal Statistical Society, 2013.  Link to paper: Veiga et al

Summary. Longitudinal data from labour force surveys permit the investigation of income dynamics at the individual level. However, the data often originate from surveys with a complex multistage sampling scheme. In addition, the hierarchical structure of the data that is imposed by the different stages of the sampling scheme often represents the natural grouping in the population. Motivated by how income dynamics differ between the formal and informal sectors of the Brazilian economy and the data structure of the Brazilian Labour Force Survey, we extend the probability-weighted iterative generalized least squares estimation method. Our method is used to fit multivariate multilevel models to the Brazilian Labour Force Survey data where the covariance structure between occasions at the individual level is modelled. We conclude that there are significant income differentials and that incorporating the weights in the parameter estimation has some effect on the estimated coefficients and standard errors.


Keywords: Design weights; Labour force surveys; Longitudinal data; Multivariate multilevel models; Non-response weights; Probability-weighted iterative generalized least squares

 24.  Newson R. Confidence intervals for rank statistics: Somers' D and extensions. The Stata Journal 2006; 6(3): 309-334.  Prepublication draft at:

Abstract. Somers’ D is an asymmetric measure of association between two variables, which plays a central role as a parameter behind rank or “non–parametric” statistical methods. Given a predictor variable X and an outcome variable Y, we may estimate D_YX as a measure of the effect of X on Y, or we may estimate D_XY as a performance indicator of X as a predictor of Y. The somersd package allows the estimation of Somers’ D and Kendall’s τ_a with confidence limits as well as P-values. The Stata 9 version of somersd can estimate extended versions of Somers’ D not previously available, including the Gini index, the parameter tested by the sign test, and extensions to left– or right–censored data. It can also estimate stratified versions of Somers’ D, restricted to pairs in the same stratum. Therefore, it is possible to define strata by grouping values of a confounder, or of a propensity score based on multiple confounders, and to estimate versions of Somers’ D which measure the association between the outcome and the predictor, adjusted for the confounders. The Stata 9 version of somersd uses the Mata language for improved computational efficiency with large datasets.

Keywords: st0001, Somers’ D, Kendall’s tau, Harrell’s c, ROC area, Gini index, population attributable risk, rank correlation, rank–sum test, Wilcoxon test, sign test, confidence intervals, non–parametric methods, propensity score.

 25.  Presentation on AIC and BIC for Survey Data by Thomas Lumley and Alastair Scott: Link to Presentation


 26. T. Lumley and A.J. Scott (2013). Partial likelihood-ratio tests for the Cox model under complex sampling. Statistics in Medicine, 32, 110-123.


 27. T. Lumley and A.J. Scott (2012). Fitting GLMs with survey data. Proceedings of the Survey Research Methods Section, Amer. Statist. Assoc, 5174-5181.


 28. T. Lumley and A.J. Scott (2013). Two-sample rank tests under complex sampling. Biometrika, 100, to appear shortly.   


 29. V. Landsman and B. I. Graubard, Efficient analysis of case-control studies with sample weights: Link to Paper




 31. Pfeffermann, Danny (2011) Modelling of complex survey data: why is it a problem? How should we approach it? Survey Methodology, 37, (2), 115-136. Link to Paper


 32. Norton, E.C., Miller, M.M., Kleinman, L.C. (2013) Computing adjusted risk ratios and risk differences in Stata, Stata Journal, Volume 13, Number 3, 492-509. Link to Paper


 33. Bieler, G.S., Brown, G.G., Williams, R.L., & Brogan, D.J. (2010). Estimating model-adjusted risks, risk differences, and risk ratios from complex survey data. American Journal of Epidemiology, 171 (5):618-623. Link to Paper

 34. Beaumont, J.F., Bocci, C. (2009) A Practical Bootstrap Method for Testing Hypotheses from Survey Data. Survey Methodology, 35, 25-35. Link to Paper


35. Lumley, T., and Scott, A.J. (2014). Tests for Regression Models Fitted to Survey Data.  Australian & New Zealand Journal of Statistics, 56, 1-14.  Link to Paper


36. Berglund, P.A, and Heeringa, S.G., Multiple Imputation of Missing Data Using SAS.  SAS Publishing 2014.  Link to Book


37. Min Zhu, SAS Institute Inc. Paper SAS026-2014, Analyzing Multilevel Models with the GLIMMIX Procedure.  Link to Paper


38. Yao, Wenliang, Ph.D., Estimation of ROC Curve with Complex Survey Data, Dissertation, The George Washington University, 2013, Link to Paper

Abstract: Receiver Operating Characteristic (ROC) curve analysis has gained an increased interest in past decades. It has been widely used to evaluate the performance of diagnostic tests. The area under the ROC curve (denoted by AUC) is the most commonly used summary index of a ROC curve. A larger AUC value for a diagnostic test usually means that the test has better discriminating ability between diseased and non-diseased populations. Both parametric and nonparametric methods have been developed to estimate and compare AUCs. However, these methods are standardly used for simple random samples, not complex samples.

In surveys, complex sample designs with cluster sampling are commonly implemented. The Hispanic Health and Nutrition Examination Survey (HHANES) was conducted to assess the health and nutritional status of the population of Hispanic individuals aged 6 months to 74 years in specific areas in the U.S. by using a multistage, stratified, probability design with complex weight calculation. Analyses without accounting for weighting and clustering effect that is induced by the complex survey sampling can be biased. Thus, standard statistical methods of estimation of AUC for the population and its variance are not applicable to complex survey data.

In this dissertation, we propose an extension of the nonparametric method in the estimation of the population AUC that accounts for sample weighting under differing complex survey designs. We provide and study the accuracy of a jackknife method, along with balanced repeated replication (BRR), for variance estimation of our proposed estimator of AUC. We also discuss informative sample designs where the selection probabilities are related to the parameter of interest, so that the standard analyses that ignore the sample weights can be seriously biased. Finally, our proposed methods are then applied to the Mexican-American portion of the HHANES to compare the classification accuracy of three predictors for overweight/obese using measured BMI as a gold standard.

39. Lumley and Scott, AIC and BIC for Modeling with Complex Survey Data, Journal of Survey Statistics and Methodology, 2015, Link to Paper

40. Thompson, Mary E., Using Longitudinal Complex Survey Data, Annual Review of Statistics and Its Application, 2015, 2:305–20, Link to Paper

41. Bridget L. Ryan, John Koval, Bradley Corbett, Amardeep Thind, M. Karen Campbell, and Moira Stewart, Assessing the impact of potentially influential observations in weighted logistic regression, The Research Data Centres Information and Technical Bulletin, Catalogue no. 12-002‑X —No. 2015001, Link to Paper

42. Jianzhu Li and Richard Valliant, Linear Regression Diagnostics in Cluster Samples, Journal of Official Statistics, Vol. 31, No. 1, 2015, pp. 61–75, Link to Paper

43. Miles, Andrew, Obtaining Predictions from Models Fit to Multiply Imputed Data, Sociological Methods & Research, pp. 1-11, 2015, Link to Paper

44. Luchman, J.N., Determining Subgroup Difference Importance with Complex Survey Designs: An Application of Weighted Dominance Analysis, Survey Practice, Vol. 8, No. 4, 2015, Link to Paper

45. Oya Kalaycioglu, Andrew Copas, Michael King and Rumana Z. Omar, A comparison of multiple-imputation methods for handling missing data in repeated measurements observational studies, Journal of the Royal Statistical Society, June 2015, Link to Paper

46. Natalie Dean, Marcello Pagano, Evaluating Confidence Interval Methods for Binomial Proportions in Clustered Surveys, Journal of Survey Statistics and Methodology, October 2015, Link to Paper

47. Zhou, H., Elliott, M.R., Raghunathan, T.E. (2015). "Synthetic Multiple Imputation Procedure For Multi-Stage Complex Samples," to appear in Journal of Official Statistics soon.

48. Zhou, H., Elliott, M.R., Raghunathan, T.E. (2015). "A Two-Step Semiparametric Method to Accommodate Sampling Weights in Multiple Imputation," in Biometrics 2015 Sep 22. Link to Paper

49. Zhou, H., Elliott, M.R., Raghunathan, T.E. (2015). "Multiple Imputation In Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap," to appear in Journal of Survey Statistics and Methodology soon.

50. Stapleton, L. and Kang, Y. (2016). "Design Effects of Multilevel Estimates From National Probability Samples", Sociological Methods & Research 0049124116630563, first published on February 11, 2016 as doi:10.1177/0049124116630563, Link to Paper

51. Daoying Lin, Lingxiao Wang, and Yan Li, "Haplotype-Based Statistical Inference for Population-Based Case–Control and Cross-Sectional Studies with Complex Sample Designs", J Surv Stat Methodol published 25 April 2016, 10.1093/jssam/smv040. Link to Paper

52. Bollen, K., Biemer, P., Karr, A., Tueller, S., Berzofsky, M., "Are Survey Weights Needed? A Review of Diagnostic Tests in Regression Analysis", Annual Review of Statistics and Its Application Vol. 3: 375-392 (Volume publication date June 2016). Link to Paper

53. Hanzhi Zhou, Michael R. Elliott, and Trivellore E. Raghunathan, "Multiple Imputation in Two-stage Cluster Samples Using the Weighted Finite Population Bayesian Bootstrap", J Surv Stat Methodol 2016 4: 139-170. Link to Paper

54. Minsun Kim Riddles, Jae Kwang Kim, and Jongho Im, "A Propensity-score-adjustment Method for Nonignorable Nonresponse", J Surv Stat Methodol 2016 4: 215-245. Link to Paper

55. Brady T. West, Joseph W. Sakshaug, Guy Alain S. Aurelien, "How Big of a Problem is Analytic Error in Secondary Analyses of Survey Data?", Published: June 29, Link to Paper

56. Ismael Flores Cervantes and J. Michael Brick, "Nonresponse adjustments with misspecified models in stratified designs", Survey Methodology, Catalogue no. 12-001-X, Release date: June 22, 2016. Link to Paper

57. Xiaying Zheng and Ji Seung Yang, "Using Sample Weights in Item Response Data Analysis Under Complex Sample Designs", L.A. van der Ark et al. (eds.), Quantitative Psychology Research, Springer, Proceedings in Mathematics & Statistics 167, DOI 10.1007/978-3-319-38759-8_10. Link to Paper

58. Xing Liu, "Fitting Proportional Odds Models for Complex Sample Survey Data with SAS, IBM SPSS, Stata, and R", General Linear Model Journal, 2016, Vol. 42(2). Link to Paper

59. Toth, Daniel, Bureau of Labor Statistics, "An R Package for Modeling Survey Data with Regression Trees", WSS Seminar, 2017. Link to Presentation

Survey Data Analysis Publications-Bayes Related

1. Elliott, M.R., Little, R.J.A. (2000). “Model-based Alternatives to Trimming Survey Weights,” Journal of Official Statistics, 16, 191-209.

2. Elliott, M.R., Sammel, M.D. (2002).  "Discussion of 'Latent Class Analysis of Complex Sample Survey Data: Application to Dietary Data'," Journal of the American Statistical Association, 97, 732-734.

3. Elliott, M.R. (2007).  “Bayesian Weight Trimming for Generalized Linear Regression Models,” Survey Methodology, 33, 23-34.

4. Elliott, M.R. (2008).  “Model Averaging Methods for Weight Trimming,” Journal of Official Statistics, 24, 517-540.

5. Elliott, M.R. (2009).  “Model Averaging Methods for Weight Trimming in Generalized Linear Regression Models,” Journal of Official Statistics, 25, 1-20.

6. Chen, Q., Elliott, M.R., Little, R.J.A. (2010).  “Bayesian Penalized Spline Model-Based Inference for Finite Population Proportions in Unequal Probability Sampling,” Survey Methodology, 36, 22-34.

7. Chen, Q., Elliott, M.R., Little, R.J.A. (2012). “Bayesian Inference for Finite Population Quantiles from Unequal Probability Samples,” Survey Methodology, 38, 203-214.

8. Dong, Q., Elliott, M.R., Raghunathan, T.E. (2014). “A Nonparametric Method to Generate Synthetic Populations to Adjust for Complex Sample Designs,” Survey Methodology, 40, 29-46.

9. West, B.T., Elliott, M.R. (2014). “Frequentist and Bayesian Approaches for Comparing Interviewer Variance Components in Two Groups of Survey Interviewers,” Survey Methodology, 40, 163-188.

10. Dong, Q., Elliott, M.R., Raghunathan, T.E. (2014). “Combining Information from Multiple Complex Surveys,”  Survey Methodology, 40, 347-354.



Please check this link for corrections to ASDA: ASDA Errata
Last updated: April 11, 2017
