Propensity Score Analysis in the Context of Complex Survey Data

Researchers and statisticians frequently use propensity score analysis (PSA) to analyze observational data and reduce the impact of confounding due to observed covariates. Many of these applied studies use nationally representative, population-based complex survey datasets. Most of these studies, however, incorrectly ignore the complex survey design features, partly because there are no clear guidelines on how PSA should be implemented in a complex survey data analysis context. Only a few relatively recent studies have examined how to incorporate PSA in this context, and some of their recommendations are contradictory, inconclusive, or not generalizable to all types of PSA. This workshop will help participants recognize some of the challenges and open questions in this ‘big data’ analysis setting. It is aimed at practitioners and focuses on demonstrating the implementation of PSA in a complex survey data analysis context through an illustrative data analysis exercise.


Background in causal inference or survey data analysis is not required. Attendees should have prerequisite knowledge of multiple regression analysis and a working knowledge of R (e.g., basic data manipulation and regression fitting). R will be the primary software package used in the workshop to demonstrate the implementations. The provided code will be annotated, and the basic steps will be explained for those who prefer to use other software packages.

Outline (tentative; order may change):
  • an introduction to PSA
  • complex survey data analysis
  • explanation of some of the real-world challenges of applying PSA in a complex survey data analysis context
  • familiarization with some of the recommendations outlined in the recent literature
  • demonstration of the corresponding PSA implementation strategies through an illustrative data analysis example
  • discussion of resources and future directions
Workshop Materials:
  • Past workshop participants can contact me directly with any relevant questions (please use the email address with which you registered). The links provided during those workshops (for online viewing and download) are still live.
  • The link to the slides will be made live during the upcoming workshop.
  • The download link for offline viewing will also be made live during the upcoming workshop.
Pre-reading: I recommend looking at the following sections for background knowledge:
  • "Methods" section of Austin, P. C. (2011). A tutorial and case study in propensity score analysis: an application to estimating the effect of in-hospital smoking cessation counseling on mortality. Multivariate Behavioral Research, 46(1), 119–151. (link)
  • (optional) "Section 4" of Zanutto, E. L. (2006). A comparison of propensity score and linear regression analysis of complex survey data. Journal of Data Science, 4, 67–91. (link)
Pre-workshop Quiz:
Sample Data Source:
  • National Health and Nutrition Examination Survey (NHANES) cycle 2007-08
  • Sample data in R format (download): the object DT9a contains the analytic data for our example.
  • R logbook showing how the analytic data were created (for those interested)
Software Requirements: It is assumed that you have the following software installed; the workshop does not provide installation support. Bringing a laptop is not mandatory, but participants who would like to browse the workshop slides and other materials on their own laptop are welcome to bring one, adequately charged (there may not be enough power outlets).
  • R from CRAN or MRAN (installing either one is fine)
  • RStudio Desktop (installation necessary)
  • Sign up for an online account (no installation necessary; a supported browser is sufficient)
Sample Code Chunks: Code for some of the analyses shown in the workshop. Feel free to look at them beforehand if you wish. Code for installing the required packages is provided at the top of each code chunk.
  • chunk1 (data download), chunk2 (table 1 & logistic regression fit), chunk3 (basic propensity score matching), chunk4 (basic survey data analysis), chunk5 (basic propensity score weighting)
  • Markdown logfiles for the above code chunks: log1, log2, log3, log4, log5
  • Brainstorming exercise (optional): After reviewing the above code for propensity score analysis (chunk3) and survey data analysis (chunk4), can you think of ways to combine the two analyses (i.e., to apply propensity score analysis in the context of complex surveys, incorporating survey features such as interview weights, strata, and PSUs in your propensity score analysis)? We will discuss various options in detail during the workshop.
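There is no single answer to the brainstorming question above. As one possible starting point, the base-R sketch below illustrates an option discussed in the literature: include the survey weight as a covariate in the propensity score model, then multiply the resulting inverse probability weights by the survey weight. The data are simulated, and the variable names (age, smoker, survey.wt, final.wt) are hypothetical, not the NHANES variables in DT9a. The sketch is not necessarily the approach recommended in the workshop.

```r
# Hypothetical simulated data standing in for a survey extract (NOT NHANES / DT9a)
set.seed(123)
n <- 2000
age <- rnorm(n, mean = 50, sd = 10)            # covariate
smoker <- rbinom(n, 1, 0.3)                    # covariate
survey.wt <- runif(n, min = 500, max = 5000)   # hypothetical interview weight
# Treatment assignment depends on the covariates (confounding)
treat <- rbinom(n, 1, plogis(-2 + 0.03 * age + 0.8 * smoker))
# Binary outcome depends on treatment and the same covariates
y <- rbinom(n, 1, plogis(-3 + 0.5 * treat + 0.02 * age + 0.7 * smoker))
dat <- data.frame(y, treat, age, smoker, survey.wt)

# Step 1: propensity score model; the survey weight enters as a covariate (one option)
ps.fit <- glm(treat ~ age + smoker + survey.wt, family = binomial, data = dat)
dat$ps <- fitted(ps.fit)

# Step 2: inverse probability of treatment weights (targets the ATE)
dat$ipw <- ifelse(dat$treat == 1, 1 / dat$ps, 1 / (1 - dat$ps))

# Step 3: combine the treatment weights with the survey weight
dat$final.wt <- dat$ipw * dat$survey.wt

# Step 4: weighted risk difference (point estimate only)
p1 <- with(subset(dat, treat == 1), weighted.mean(y, final.wt))
p0 <- with(subset(dat, treat == 0), weighted.mean(y, final.wt))
risk.diff <- p1 - p0
print(risk.diff)
```

The sketch stops at a point estimate. For standard errors that respect the strata and PSUs, one option is to pass the combined weight to the survey package, e.g. `svydesign(ids = ~psu, strata = ~strata, weights = ~final.wt, nest = TRUE, data = dat)` followed by `svyglm(y ~ treat, design = ..., family = quasibinomial())`, where `psu` and `strata` are hypothetical column names for the design variables.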
In-workshop Quiz: 
  • Response link: (will be activated during the workshop if internet access is available; the eduroam network should be available at North American universities).
  • If responding from your smartphone, you may get the relevant app. Downloading the app is not necessary, however; a text option is also available for North American cell phone carriers.
Related Web Apps: (optional; not directly related to the workshop)
  • Teaching propensity scores by example: PS Oracle
  • Teaching inverse probability weights by example: IPW Oracle