Scientific
understanding of the relationship among health factors at the individual
level and those at the level of social and spatial aggregates has been
severely hampered by the lack of analytic tools.
Now that such tools are widely available, though, it is
clear that an important source of data about health has not produced the
kind of contextual information needed to understand the interplay of
individuals and groups.
One source of contextual information that has not been fully exploited
is the aggregation of the survey data themselves to the higher-level
sampling units from which the survey subjects were selected.
Many health surveys on different health topics are conducted in
the same primary sampling units (PSUs), yet there is little opportunity
to link the sampling units across sample designs.
For example, the National Health and Nutrition Examination Survey
(NHANES) is conducted in a sample of the National Health Interview
Survey’s (NHIS) primary sampling units.
NHIS data could be used to estimate neighborhood structural
characteristics at the PSU level, such as ease of access to a health
care facility, which can then be related to individual health risk
factors such as cholesterol level. What is lacking are methods to
exploit the existence of coordinated survey designs for the measurement
of ecological influences.
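As a minimal sketch of this kind of linkage, assume two record sets that
share a PSU identifier. The field names `psu`, `has_nearby_clinic`, and
`cholesterol` are invented for illustration and are not NHIS or NHANES
variable names, and the toy records are hypothetical. A PSU-level
proportion is computed from one survey and attached to individuals in
the other:

```python
from collections import defaultdict

def psu_proportions(records, flag):
    """Return {psu: share of records in that PSU where `flag` is true}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        totals[rec["psu"]] += 1
        hits[rec["psu"]] += 1 if rec[flag] else 0
    return {psu: hits[psu] / totals[psu] for psu in totals}

def attach_context(records, context, name):
    """Copy each record, adding the PSU-level value under `name`
    (None when the PSU does not appear in `context`)."""
    return [{**rec, name: context.get(rec["psu"])} for rec in records]

# Hypothetical toy data standing in for two coordinated surveys.
survey_a = [
    {"psu": 1, "has_nearby_clinic": True},
    {"psu": 1, "has_nearby_clinic": False},
    {"psu": 2, "has_nearby_clinic": True},
    {"psu": 2, "has_nearby_clinic": True},
]
survey_b = [
    {"psu": 1, "cholesterol": 210.0},
    {"psu": 2, "cholesterol": 185.0},
]

access = psu_proportions(survey_a, "has_nearby_clinic")
linked = attach_context(survey_b, access, "psu_access")
```

The sketch uses unweighted proportions; an analysis of real survey data
would also need to account for sampling weights and for the uncertainty
in the PSU-level estimate, which is what the hierarchical models
described in this abstract are meant to address.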
The objective of this project is to use multi-level or hierarchical
models to develop statistical methods and associated software that bring
community and neighborhood foundations of health and development into
the analysis of individual health characteristics.
Recent
developments in Bayesian computation (e.g., Markov chain Monte Carlo
methods) have made it possible to apply hierarchical models to both
continuous and categorical outcome data (Geman and Geman, 1984; Gelfand
and Smith, 1991). Further,
Bayesian methods have the desirable property that they can use more of
the information available more efficiently than traditional frequentist
procedures. The proposed
project will use the Bayesian computational framework to combine
information from neighborhood and individual characteristics in a sample
survey to examine individual health outcomes through a set of random
effect hierarchical models. The
random effects estimated from one level of the model are used as
predictors in the next level of the model.
Let $y_{ij}$ denote an indicator variable taking the value 1 if person
$i$ in neighborhood $j$ is below the federally defined poverty level,
based on a detailed assessment in a large survey such as NHIS. A
random effect logistic regression model may be used to specify the
relationship between $y_{ij}$ and a set of predictors such as region or
urban residence, denoted as $x_{ij}$, as follows:

$$\operatorname{logit} \Pr(y_{ij} = 1) = x_{ij}^{T}\beta + u_{j}.$$

Here the $u_{j}$ are random regression coefficients, the adjusted
community-level log odds of being below the poverty level, and are
assumed to be normally distributed with mean 0 and variance
$\sigma_{u}^{2}$. $\beta$ is a vector of fixed effect regression
coefficients. Suppose that $z_{kj}$ is an individual health outcome of
interest such as blood glucose for subject $k$ from the same
neighborhood, but $z_{kj}$ is measured in another survey using the same
neighborhoods, or primary sampling units. A second-stage model
regresses this health outcome on the unobserved random effect ($u_{j}$)
and individual-level variables $w_{kj}$,

$$z_{kj} = \gamma u_{j} + w_{kj}^{T}\delta + \varepsilon_{kj}.$$

Here the $\varepsilon_{kj}$ are assumed to be normally distributed with
mean 0 and variance $\sigma_{\varepsilon}^{2}$. The object of the
inference is $\gamma$, the adjusted effect of the neighborhood
characteristic.
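To make the two-stage structure concrete, the following sketch
simulates data from a model of this form. All names and numeric values
are assumptions chosen for illustration: `u` is the neighborhood random
effect, `beta` holds a first-stage intercept and one slope, `delta`
holds a second-stage intercept and one slope, and `gamma` is the
coefficient on the random effect in the second stage. It is a sketch
under those assumptions, not the project's specification.

```python
import math
import random

def simulate_two_surveys(n_psu=40, n_per_psu=25, beta=(-1.0, 0.5),
                         gamma=2.0, delta=(100.0, 3.0),
                         sd_u=1.0, sd_e=5.0, seed=0):
    """Simulate linked data from two surveys that share PSUs (illustrative)."""
    rng = random.Random(seed)
    # One latent effect per PSU, shared by both surveys.
    u = [rng.gauss(0.0, sd_u) for _ in range(n_psu)]
    first, second = [], []
    for j in range(n_psu):
        for _ in range(n_per_psu):
            # First survey: binary poverty indicator from a logistic model.
            x = rng.randint(0, 1)  # hypothetical binary predictor
            eta = beta[0] + beta[1] * x + u[j]
            p = 1.0 / (1.0 + math.exp(-eta))
            first.append((j, x, 1 if rng.random() < p else 0))
            # Second survey: different subjects, continuous health outcome.
            w = rng.gauss(0.0, 1.0)  # hypothetical individual covariate
            z = delta[0] + delta[1] * w + gamma * u[j] + rng.gauss(0.0, sd_e)
            second.append((j, w, z))
    return u, first, second

u, first, second = simulate_two_surveys()
```

The only link between the two simulated surveys is the PSU index `j`
and the shared effect `u[j]`; no individual appears in both, which
mirrors the coordinated-design setting described above.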
Gibbs sampling and other Markov chain Monte Carlo algorithms will be
used to construct the posterior distribution of the parameter of
interest, the adjusted neighborhood effect, and of the other parameters
in the model. As a first step,
algorithms will be developed for drawing values of the first-stage
parameters, conditional on the parameters in the second stage of the
model and on the data. Next,
procedures will be developed for drawing values from the posterior
distribution of the parameters of the second stage of the model,
conditional on the first stage parameters and on the data.
These two sampling algorithms will be combined into a single
general-purpose software system to implement the procedure.
The software will be implemented in SAS using facilities such as the
macro language, PROC IML (an interactive matrix language), and
SAS/ASSIST features to present screens that allow users unfamiliar with
the complexities of Bayesian methods, Gibbs sampling, and Markov chain
Monte Carlo methods to specify substantively suitable models.
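The alternation between the two sampling steps can be sketched in a few
lines. The example below is written in Python rather than SAS and is
heavily simplified, so it should be read as an illustration of the
two-block idea and not as the project's algorithm. Assumptions: there
are no covariates, the first-stage intercept `b0` and both standard
deviations (`sd_u`, `sd_e`) are treated as known, the second-stage
coefficient `g` has a normal prior with standard deviation `prior_sd`,
and the PSU effects are updated with a random-walk Metropolis step
inside the Gibbs cycle. All function and parameter names are
hypothetical, and the toy data are invented.

```python
import math
import random

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def log_cond_u(u, ys, zs, b0, g, sd_u, sd_e):
    """Log full conditional of one PSU effect u, up to an additive constant."""
    eta = b0 + u
    lp = -0.5 * (u / sd_u) ** 2                              # normal prior on u
    lp += sum(y * eta - softplus(eta) for y in ys)           # first stage (logistic)
    lp += sum(-0.5 * ((z - g * u) / sd_e) ** 2 for z in zs)  # second stage (normal)
    return lp

def sample_gamma(y_by_psu, z_by_psu, b0, sd_u, sd_e,
                 n_iter=1000, step=0.5, prior_sd=10.0, seed=0):
    """Return posterior draws of g from a Metropolis-within-Gibbs sampler."""
    rng = random.Random(seed)
    u = [0.0] * len(y_by_psu)
    g = 0.0
    draws = []
    for _ in range(n_iter):
        # Block 1: PSU effects given g and the data (random-walk Metropolis).
        for j, (ys, zs) in enumerate(zip(y_by_psu, z_by_psu)):
            prop = u[j] + rng.gauss(0.0, step)
            log_ratio = (log_cond_u(prop, ys, zs, b0, g, sd_u, sd_e)
                         - log_cond_u(u[j], ys, zs, b0, g, sd_u, sd_e))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                u[j] = prop
        # Block 2: g given the PSU effects and the data (conjugate normal draw).
        suu = sum(len(zs) * uj ** 2 for uj, zs in zip(u, z_by_psu))
        suz = sum(uj * z for uj, zs in zip(u, z_by_psu) for z in zs)
        prec = suu / sd_e ** 2 + 1.0 / prior_sd ** 2
        g = rng.gauss((suz / sd_e ** 2) / prec, 1.0 / math.sqrt(prec))
        draws.append(g)
    return draws

# Hypothetical toy data: three PSUs, outcomes already centered.
y_by_psu = [[1, 1, 0, 1], [0, 0, 0, 1], [1, 0, 1, 1]]
z_by_psu = [[2.1, 1.7], [-1.9, -2.4], [0.8, 1.1]]
draws = sample_gamma(y_by_psu, z_by_psu, b0=0.0, sd_u=1.0, sd_e=1.0, n_iter=500)
```

A full implementation would also sample the fixed effects and the
variance components, discard an initial burn-in, and check convergence
before summarizing the draws.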
The project will also explore whether “neighborhood” characteristics
estimated using NHIS data may become part of the public use NHANES (or
NSFG) files with suitable random recodes of the primary sampling unit
characteristics to assure confidentiality. Analysts outside of NCHS
would then have access to the random effect coefficients representing
neighborhood or primary sampling unit characteristics that would
ordinarily be inaccessible.
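One simple reading of a random recode, assumed here purely for
illustration, is to replace each real PSU identifier with an arbitrary
label before release, so that records can still be grouped by PSU and
carry the PSU-level estimate without exposing which PSU it is. The
function names, field names, and values below are hypothetical, and an
actual disclosure-limitation procedure would involve considerably more
than this.

```python
import random

def random_recode(psu_ids, seed=None):
    """Map each distinct PSU id to a distinct random label in 1..K."""
    rng = random.Random(seed)
    distinct = sorted(set(psu_ids))
    labels = list(range(1, len(distinct) + 1))
    rng.shuffle(labels)
    return dict(zip(distinct, labels))

def release_file(records, effects, mapping):
    """Build public-use rows: the recoded PSU label and the PSU-level
    estimate are added, and the original PSU id is dropped."""
    rows = []
    for rec in records:
        row = {key: value for key, value in rec.items() if key != "psu"}
        row["psu_code"] = mapping[rec["psu"]]
        row["psu_effect"] = effects[rec["psu"]]
        rows.append(row)
    return rows

# Hypothetical restricted-use records and PSU-level estimates.
records = [
    {"psu": 101, "glucose": 95.0},
    {"psu": 101, "glucose": 110.0},
    {"psu": 205, "glucose": 88.0},
]
effects = {101: 0.31, 205: -0.12}

mapping = random_recode([rec["psu"] for rec in records], seed=1)
public = release_file(records, effects, mapping)
```

Because an exact estimate could itself act as an identifier, a real
release would likely also need to coarsen or perturb the PSU-level
values; that step is not shown.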
Publications: