
Workshop 1

Propensity Score Analysis in the Context of Complex Survey Data

Researchers and statisticians frequently use propensity score analysis (PSA) to analyze observational datasets and reduce the impact of confounding due to observed covariates. Many of these applied studies rely on nationally representative, population-based complex survey datasets. Most of them incorrectly ignore the complex survey design features, partly because clear guidelines on how to implement PSA in a complex survey data analysis context are lacking. Only a few relatively recent studies have examined how to incorporate PSA in this context, and some of their recommendations are contradictory, inconclusive, or not generalizable to all types of PSA. This workshop will help participants recognize some of the challenges and open questions in the ‘big data’ analysis setting. It is aimed at practitioners and focuses on demonstrating the implementation of PSA in a complex survey data analysis context through an illustrative data analysis exercise.


A background in causal inference or survey data analysis is not required. Attendees should have prior knowledge of multiple regression analysis and a working knowledge of R (e.g., basic data manipulation and regression fitting). R will be the primary software package used to demonstrate the implementations. The provided code will be annotated, and the basic steps will be explained for those who prefer to use other software packages.

  • Live links:
    • The link to the slides will be live during the workshop event.
    • Past workshop participants can contact me directly with any relevant questions (please use the email address with which you registered). The links provided during those workshops (for online viewing and download) are still live.
Pre-reading: the following sections are recommended for background knowledge:
  • "Methods" section of Austin, P. C. (2011). A tutorial and case study in propensity score analysis: an application to estimating the effect of in-hospital smoking cessation counseling on mortality. Multivariate Behavioral Research, 46(1), 119–151. (link)
  • (optional) "Section 4" of Zanutto, E. L. (2006). A comparison of propensity score and linear regression analysis of complex survey data. Journal of Data Science, 4, 67–91. (link)
Pre-workshop Quiz:
Sample Data Source:
  • National Health and Nutrition Examination Survey (NHANES) cycle 2007-08
  • Sample data in R format (download): object DT9a contains our analytic data example.
  • How was the analytic data created? (for those who are interested)
  • Estimates shown in the slides may differ from those in the markdown/Kaggle notebook files, because the analyses in the markdown/Kaggle notebook files use slightly more complex eligibility criteria (see the R Markdown file for the exact dataset used in the slides). See the 'Sample Code Chunks' below.
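For readers less familiar with R-format data files, the following is a minimal sketch of loading such a file and inspecting an object in it. Only the object name DT9a comes from the workshop material. The file name, the toy columns, and the assumption that the download is an .RData file read with load() are hypothetical stand-ins so that the sketch runs without the real data.

```r
# Hypothetical stand-in for the downloaded file: build a toy DT9a and save it,
# so this sketch runs without the real workshop data.
DT9a <- data.frame(id = 1:5, exposed = c(0, 1, 0, 1, 1))  # toy columns, not NHANES
rdata_path <- file.path(tempdir(), "toy_workshop_data.RData")  # hypothetical file name
save(DT9a, file = rdata_path)
rm(DT9a)

# load() restores the saved objects under their original names and
# invisibly returns those names as a character vector.
loaded_names <- load(rdata_path)
print(loaded_names)

# Inspect the analytic data before running any analysis.
str(DT9a)
dim(DT9a)
```

With the real download, only the path passed to load() would change; the object would still appear in the workspace as DT9a.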
Software Requirements: It is assumed that you have the following software installed; the workshop does not provide any installation support. Bringing a laptop is not mandatory, but participants who would like to browse the workshop slides (and other materials) on their own laptop are welcome to bring one. Please make sure it is adequately charged, as there may not be enough power outlets.
  • R from CRAN or MRAN (installing either one is fine)
  • RStudio desktop (installation necessary)
  • Online accounts (no installation necessary, a supported browser is fine) 
Sample Code Chunks: Code for some of the analyses shown in the workshop. Feel free to take a look beforehand if you wish. Code for installing the required packages is provided at the top of each code chunk.
  • Codes/Markdowns
    • chunk1 (data download)
      • Markdown logfile for the above code chunk: log1
    • chunk2 (table 1 & logistic regression fit)
    • chunk3 (basic propensity score matching)
      • Markdown logfile for the above code chunk: log3
      • Kaggle notebook of the above code chunk
      • Note: matching estimates may vary from run to run because of randomness. Set a seed to make the results reproducible.
    • chunk4 (basic survey data analysis)
    • chunk5 (basic propensity score weighting)
  • Brainstorming exercise: (optional) After reviewing the code above for propensity score analysis (chunk3) and survey data analysis (chunk4), can you think of ways to combine the two analyses (i.e., applying propensity score analysis in the context of complex surveys, where you may want to incorporate survey features such as interview weights, strata, and PSU in your propensity score analysis)? We will discuss the various options in detail during the workshop.
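As a starting point for the brainstorming exercise, here is a minimal sketch of one possible way to combine propensity score weighting with survey weights: fit the propensity score model using the survey weights, then multiply the resulting inverse probability weights by the survey weights. This is only one option among several and is not necessarily the approach recommended in the workshop. The data are simulated (not NHANES), and every variable name below is a hypothetical assumption.

```r
set.seed(123)  # fix the seed so the simulated data are reproducible

# Simulated toy data (NOT NHANES); all variable names are hypothetical.
n <- 2000
age    <- rnorm(n, mean = 50, sd = 10)
sex    <- rbinom(n, 1, 0.5)
svy_wt <- runif(n, 0.5, 3)  # stand-in for a survey (interview) weight
# Treatment depends on the covariates, which induces confounding.
treat <- rbinom(n, 1, plogis(-0.5 + 0.03 * (age - 50) + 0.4 * sex))
# Outcome simulated with a true treatment effect of 2.
outcome <- 1 + 2 * treat + 0.1 * (age - 50) + 1.5 * sex + rnorm(n)
toy <- data.frame(age, sex, svy_wt, treat, outcome)

# Step 1: propensity score model, here fitted with the survey weights.
# quasibinomial() avoids the non-integer-weights warning from binomial().
ps_fit <- glm(treat ~ age + sex, data = toy,
              family = quasibinomial(), weights = svy_wt)
toy$ps <- fitted(ps_fit)

# Step 2: inverse probability of treatment weights (targeting the ATE).
toy$iptw <- ifelse(toy$treat == 1, 1 / toy$ps, 1 / (1 - toy$ps))

# Step 3: combine the two sets of weights by multiplication.
toy$combined_wt <- toy$iptw * toy$svy_wt

# Step 4: weighted difference in mean outcome between the two groups.
wmean <- function(x, w) sum(x * w) / sum(w)
est <- with(toy,
            wmean(outcome[treat == 1], combined_wt[treat == 1]) -
            wmean(outcome[treat == 0], combined_wt[treat == 0]))
naive <- with(toy, mean(outcome[treat == 1]) - mean(outcome[treat == 0]))
print(c(naive = naive, weighted = est))
```

The sketch produces a point estimate only. Valid standard errors would also need to account for the design features (strata and PSU), for example through a design object from the survey package (svydesign() and svyglm()), which is not shown here.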
In-workshop Quiz: 
  • Response link: (will be activated during the workshop if internet access is available; the eduroam network should be available to participants from North American universities).
  • Get the relevant app if responding from your smartphone. Downloading the app is not necessary, however: a text option is also available for North American cell phone carriers.