Modeling multiple-response categorical data from complex surveys

Christopher R. Bilder, Thomas M. Loughin

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Although "choose all that apply" questions are common in modern surveys, methods for analyzing associations among responses to such questions have only recently been developed. These methods are generally valid only for simple random sampling, but these types of questions often appear in surveys conducted under more complex sampling plans. The purpose of this article is to provide statistical analysis methods that can be applied to "choose all that apply" questions in complex survey sampling situations. Loglinear models are developed to incorporate the multiple responses inherent in these types of questions. Statistics to compare models and to measure association are proposed and their asymptotic distributions are derived. Monte Carlo simulations show that tests based on adjusted Pearson statistics generally hold their correct size when comparing models. These simulations also show that confidence intervals for odds ratios estimated from loglinear models have good coverage properties, while being shorter than those constructed using empirical estimates. Furthermore, the methods are shown to be applicable to more general problems of modeling associations between elements of two or more binary vectors. The proposed analysis methods are applied to data from the National Health and Nutrition Examination Survey.
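To make the setting concrete: in a "pick any/c" question each respondent's answer is a binary vector with one indicator per item, and association between two such questions can be examined through the 2x2 marginal table for each item pair (written here as W_i and Y_j, our own notation). The Python sketch below is illustrative only and is not the authors' loglinear-model procedure. It assumes hypothetical 0/1 indicator lists and survey weights, builds one weighted marginal table, computes the empirical odds ratio (the kind of estimate the abstract compares against), and applies a first-order Rao-Scott correction by dividing a Pearson statistic by an estimated mean design effect. The design effect is assumed to be supplied by the user (estimating it from the survey design is outside this sketch), and the p-value helper covers only the 1-degree-of-freedom case of a single 2x2 table.

```python
from math import erfc, sqrt
from typing import List, Sequence

Table = List[List[float]]


def weighted_marginal_table(w_item: Sequence[int], y_item: Sequence[int],
                            weights: Sequence[float]) -> Table:
    """Weighted 2x2 marginal table for one item pair (W_i, Y_j).

    w_item[k], y_item[k]: 0/1 indicators that respondent k picked each item.
    weights[k]: hypothetical survey weight for respondent k.
    table[a][b] is the total weight of respondents with W_i = a and Y_j = b.
    """
    table = [[0.0, 0.0], [0.0, 0.0]]
    for a, b, wt in zip(w_item, y_item, weights):
        table[a][b] += wt
    return table


def empirical_odds_ratio(table: Table) -> float:
    """Cross-product odds ratio; assumes no zero cells."""
    return (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])


def pearson_x2(table: Table, n: int) -> float:
    """Pearson independence statistic from estimated cell proportions.

    Proportions come from the weighted table; the statistic is scaled by
    the nominal number of respondents n rather than by the weight total.
    """
    total = sum(sum(row) for row in table)
    p = [[cell / total for cell in row] for row in table]
    row_m = [p[0][0] + p[0][1], p[1][0] + p[1][1]]
    col_m = [p[0][0] + p[1][0], p[0][1] + p[1][1]]
    x2 = 0.0
    for a in range(2):
        for b in range(2):
            expected = row_m[a] * col_m[b]
            x2 += (p[a][b] - expected) ** 2 / expected
    return n * x2


def rao_scott_first_order(x2: float, design_effects: Sequence[float]) -> float:
    """First-order Rao-Scott correction: divide by the mean design effect.

    design_effects are assumed to have been estimated elsewhere.
    """
    return x2 / (sum(design_effects) / len(design_effects))


def chi2_df1_pvalue(x: float) -> float:
    """Upper-tail probability of a chi-square with 1 df, i.e. P(Z^2 > x)."""
    return erfc(sqrt(x / 2.0))


if __name__ == "__main__":
    # Toy data for 8 hypothetical respondents (not NHANES data).
    w = [1, 1, 0, 0, 1, 0, 1, 0]
    y = [1, 0, 1, 0, 1, 0, 1, 1]
    wts = [2.0, 1.0, 1.5, 1.0, 2.0, 1.0, 1.0, 0.5]
    tab = weighted_marginal_table(w, y, wts)
    x2 = pearson_x2(tab, n=len(w))
    x2_rs = rao_scott_first_order(x2, [2.0])  # assumed design effect of 2
    print(tab, empirical_odds_ratio(tab), round(x2, 4), round(x2_rs, 4),
          round(chi2_df1_pvalue(x2_rs), 4))
```

With the toy data, the weighted table is [[2.0, 2.0], [1.0, 5.0]], the empirical odds ratio is 5.0, and the Pearson statistic of about 1.0159 is halved to about 0.5079 by the assumed design effect of 2. A first-order correction only rescales the statistic; a second-order Rao-Scott correction would also adjust the degrees of freedom.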

Original language: English (US)
Pages (from-to): 553-570
Number of pages: 18
Journal: Canadian Journal of Statistics
Volume: 37
Issue number: 4
DOI: 10.1002/cjs.10040
State: Published - Dec 1 2009

Keywords

  • Choose all that apply
  • Correlated binary data
  • Loglinear model
  • NHANES
  • Pearson statistic
  • Pick any/c
  • Rao-Scott adjustments

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Modeling multiple-response categorical data from complex surveys. / Bilder, Christopher R.; Loughin, Thomas M.

In: Canadian Journal of Statistics, Vol. 37, No. 4, 01.12.2009, p. 553-570.

@article{67c68fad6ab24267bac9f59a51d3aa6a,
title = "Modeling multiple-response categorical data from complex surveys",
abstract = "Although choose all that apply questions are common in modern surveys, methods for analyzing associations among responses to such questions have only recently been developed. These methods are generally valid only for simple random sampling, but these types of questions often appear in surveys conducted under more complex sampling plans. The purpose of this article is to provide statistical analysis methods that can be applied to choose all that apply questions in complex survey sampling situations. Loglinear models are developed to incorporate the multiple responses inherent in these types of questions. Statistics to compare models and to measure association are proposed and their asymptotic distributions are derived. Monte Carlo simulations show that tests based on adjusted Pearson statistics generally hold their correct size when comparing models. These simulations also show that confidence intervals for odds ratios estimated from loglinear models have good coverage properties, while being shorter than those constructed using empirical estimates. Furthermore, the methods are shown to be applicable to more general problems of modeling associations between elements of two or more binary vectors. The proposed analysis methods are applied to data from the National Health and Nutrition Examination Survey.",
keywords = "Choose all that apply, Correlated binary data, Loglinear model, NHANES, Pearson statistic, Pick any/c, Rao-Scott adjustments",
author = "Bilder, {Christopher R.} and Loughin, {Thomas M.}",
year = "2009",
month = "12",
day = "1",
doi = "10.1002/cjs.10040",
language = "English (US)",
volume = "37",
pages = "553--570",
journal = "Canadian Journal of Statistics",
issn = "0319-5724",
publisher = "Statistical Society of Canada",
number = "4",

}

TY  - JOUR
T1  - Modeling multiple-response categorical data from complex surveys
AU  - Bilder, Christopher R.
AU  - Loughin, Thomas M.
PY  - 2009/12/1
Y1  - 2009/12/1
N2  - Although choose all that apply questions are common in modern surveys, methods for analyzing associations among responses to such questions have only recently been developed. These methods are generally valid only for simple random sampling, but these types of questions often appear in surveys conducted under more complex sampling plans. The purpose of this article is to provide statistical analysis methods that can be applied to choose all that apply questions in complex survey sampling situations. Loglinear models are developed to incorporate the multiple responses inherent in these types of questions. Statistics to compare models and to measure association are proposed and their asymptotic distributions are derived. Monte Carlo simulations show that tests based on adjusted Pearson statistics generally hold their correct size when comparing models. These simulations also show that confidence intervals for odds ratios estimated from loglinear models have good coverage properties, while being shorter than those constructed using empirical estimates. Furthermore, the methods are shown to be applicable to more general problems of modeling associations between elements of two or more binary vectors. The proposed analysis methods are applied to data from the National Health and Nutrition Examination Survey.
AB  - Although choose all that apply questions are common in modern surveys, methods for analyzing associations among responses to such questions have only recently been developed. These methods are generally valid only for simple random sampling, but these types of questions often appear in surveys conducted under more complex sampling plans. The purpose of this article is to provide statistical analysis methods that can be applied to choose all that apply questions in complex survey sampling situations. Loglinear models are developed to incorporate the multiple responses inherent in these types of questions. Statistics to compare models and to measure association are proposed and their asymptotic distributions are derived. Monte Carlo simulations show that tests based on adjusted Pearson statistics generally hold their correct size when comparing models. These simulations also show that confidence intervals for odds ratios estimated from loglinear models have good coverage properties, while being shorter than those constructed using empirical estimates. Furthermore, the methods are shown to be applicable to more general problems of modeling associations between elements of two or more binary vectors. The proposed analysis methods are applied to data from the National Health and Nutrition Examination Survey.
KW  - Choose all that apply
KW  - Correlated binary data
KW  - Loglinear model
KW  - NHANES
KW  - Pearson statistic
KW  - Pick any/c
KW  - Rao-Scott adjustments
UR  - http://www.scopus.com/inward/record.url?scp=74049131265&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=74049131265&partnerID=8YFLogxK
U2  - 10.1002/cjs.10040
DO  - 10.1002/cjs.10040
M3  - Article
VL  - 37
SP  - 553
EP  - 570
JO  - Canadian Journal of Statistics
JF  - Canadian Journal of Statistics
SN  - 0319-5724
IS  - 4
ER  - 