TURNOCK READING ROOM
Assessing the Reliability of Public Health Performance Measures
Bernard J. Turnock MD, MPH; UIC School of Public Health; 2001
Introduction
The overall aim of this project was to examine the reliability of measures of
local public health performance (Local Instrument) that were under development
by the National Public Health Performance Standards Program, Public Health
Practice Program Office, Centers for Disease Control and Prevention (CDC)
beginning in 1998. The assessment of reliability reported here was based on
field-testing of a panel of public health performance measures that was
conducted in the states of Minnesota and Mississippi early in the year 2000. A
previous panel of these performance measures had been assessed as to their
validity as part of a statewide pilot test conducted in Florida in 1999.
The Local Instrument included questions that were organized in a complex
hierarchical structure with several major measures identified for each of the
ten essential public health services. Each essential public health service was
operationally described by 2-7 major measures; these measures were presented in
narrative form as a model performance standard. There were 33 such first-order
measures for which respondents could indicate whether or not (yes/no) the
standard was met. For each of these 33 major measures, two additional questions
were included, asking the extent to which the local public health system and
the local public health agency met needs associated with that standard. These
99 questions (33 yes/no items plus 66 scaled items) and their responses were
the focus of this reliability assessment.
Early on, it was apparent that reliability testing of these public health performance
measures faced serious methodological challenges. Reliability commonly refers to
the extent to which the results obtained from a measurement tool or instrument
can be relied upon. That is, if the same variable were to be measured in the
same circumstances over and over again, would the results differ? A somewhat
more formal definition views reliability as the degree of consistency with which
an instrument measures an attribute or the ability of an instrument or indicator
to produce similar scores on repeated testing occasions that occur under similar
circumstances.
The performance measures in the Local Instrument under development by CDC were
intended for eventual deployment in a self-assessment and quality improvement
process that was to be part of the new Mobilizing for Action through Planning
and Partnerships (MAPP) initiative. However, the early field-testing of these
measures in various states (including Texas, Florida, Hawaii, Missouri,
Minnesota, and Mississippi) did not generally replicate the circumstances in
which the final panel of measures would be used. Field test sites did not, for
example, use a broadly participatory community-driven process that would
examine each measure and then assess and report its status. The variability of
the field-testing circumstances also precluded a repeat measurement, so the
most common form of reliability testing, a test-retest approach, would not be
appropriate. Repeated use of the same test is a widely accepted check on the
stability of a measure or instrument, but concerns over learning bias also
argued against a test-retest strategy here.
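As a rough illustration of what a test-retest check on yes/no items involves,
the sketch below computes simple percent agreement between two administrations
of the same questions. It was not part of this project; the function name and
the sample answers are hypothetical, and percent agreement is only one of
several statistics that could be used for this purpose.

```python
def percent_agreement(first, second):
    """Share of items answered identically on two administrations.

    first, second: equal-length lists of answers (e.g. "Yes"/"No"),
    one entry per question, in the same question order.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty answer lists of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return matches / len(first)


# Made-up answers for four questions, for illustration only.
test_round = ["Yes", "No", "Yes", "Yes"]
retest_round = ["Yes", "No", "No", "Yes"]
agreement = percent_agreement(test_round, retest_round)
```

A value near 1.0 would indicate stable answers across occasions, provided the
two administrations took place under similar circumstances.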
Other forms of reliability testing examine the internal consistency and
equivalence of an instrument. Common tests for internal consistency (such as
Cronbach's alpha, the Kuder-Richardson formulas, and split-half analysis)
examine whether subsets of an entire instrument correlate with the overall
score. These tests were not appropriate for the Public Health Performance
Standards Local Instrument since widely varying concepts, embodied by the ten
essential public health services, were measured in different parts of the
instrument. Equivalence is commonly examined by using alternate forms of the
same instrument and by inter-rater reliability. Alternate forms would also be
subject to considerable learning bias, and inter-rater reliability would
require that the reporter (or reporters) be equally familiar with performance
of the concepts being measured. These circumstances could not be controlled for
in this assessment.
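For readers unfamiliar with the internal consistency tests named above, the
following is a minimal sketch of Cronbach's alpha using the standard formula
alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). It
was not applied in this project, for the reasons given above; the function
name and input layout are assumptions made for illustration.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents-by-items score table.

    item_scores: list of respondents, each a list of k numeric item scores.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of each respondent's total score.
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    # Population variance is used throughout; the n/(n-1) factor of the
    # sample variance would cancel in the ratio, so the result is the same.
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)
```

The statistic assumes that all items tap a single underlying concept, which is
why it fits poorly with an instrument spanning ten distinct services.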
As a result of these considerations, direct assessment of the reliability of
responses related to local public health performance measures in the Minnesota
and Mississippi field tests was not possible. No statistical test was
identified that could assess reliability within such a set of measures. To
approximate reliability within these sets of measures, patterns of responses
were examined to determine whether a consistent relationship exists between a
yes or no response and a scaled response for the same performance standard.
Although this approach is not grounded in an established scientific method, we
attempted to ascertain whether a local public health system's or local public
health agency's response to a yes/no question is correlated with its response
to a scaled question measuring performance on the same standard. Such an
analysis permits certain predictions; however, there is no credible basis on
which to examine "scientifically" whether the results are significant, which
seriously weakens the credibility of any such conclusions and predictions.
Study Objectives and Results
In the end, it was not possible to go beyond a simple examination of the
relationship between yes/no responses for key public health performance
measures and scaled responses related to the same concepts. There were 33 major
performance measures for which a yes/no response was elicited. For each of
these measures, two additional questions asked the extent to which the local
public health system and the local public health agency achieved that standard
(1 = not at all or minimally; 2 = partially; 3 = substantially; 4 = fully or
almost fully).
Appendix 1 suggests that a Yes response, meaning that a particular performance
standard was achieved, was associated with a mean scaled score of 2.82 for
Minnesota local health jurisdictions and 2.87 for Mississippi local health
jurisdictions (out of a possible 4.00) for local public health system
achievement of that standard. The range for means was 1.82 to 3.78 for the
local health jurisdictions in these two states. A Yes response for local public
health agency achievement of a standard was associated with a mean scaled score
of 2.70 for Minnesota local health jurisdictions and 2.72 for local health
jurisdictions in Mississippi, with the overall range between 1.36 and 3.83.
A No response, meaning that a performance standard was not achieved, was
associated with a mean scaled score of 1.49 in Minnesota and 1.41 in
Mississippi for local public health system achievement, with an overall range
between 1.00 and 2.50. For local public health agency achievement of a
standard, a No response was associated with a mean scaled score of 1.42 in
Minnesota and 1.37 in Mississippi (range 1.00 to 3.00). The greater variability
for No responses may be related to the smaller number of overall No responses
than Yes responses for the 33 standards that are the focus of this assessment.
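The comparison reported above amounts to grouping scaled scores by the paired
Yes or No answer and averaging each group. A minimal sketch of that
calculation follows; the function name, the record layout, and the sample
values are hypothetical and are not taken from the field-test data.

```python
from collections import defaultdict
from statistics import mean


def mean_scaled_by_response(records):
    """Average scaled score (1-4) for each yes/no answer.

    records: iterable of (answer, scaled) pairs, where answer is "Yes" or
    "No" and scaled is the 1-4 rating given for the same standard.
    """
    groups = defaultdict(list)
    for answer, scaled in records:
        groups[answer].append(scaled)
    return {answer: mean(scores) for answer, scores in groups.items()}


# Made-up records for illustration only.
sample = [("Yes", 3), ("Yes", 2), ("Yes", 4), ("No", 1), ("No", 2)]
group_means = mean_scaled_by_response(sample)
```

With the made-up records, the Yes group averages 3 and the No group 1.5; the
same grouping could be repeated per state, per standard, or separately for the
system and agency questions.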
In general, Yes responses coincided with "partially," "substantially," and
"fully or almost fully" responses (i.e., scaled responses 2, 3, and 4). The
mean scaled scores (2.82 and 2.87 for local public health systems and 2.70 and
2.72 for local public health agencies) fell between "partially" and
"substantially."
No responses coincided with "not at all or minimally," "partially," and
"substantially," with the mean scaled scores (1.49 and 1.41 for local public
health systems and 1.42 and 1.37 for local public health agencies) lying
between "not at all or minimally" and "partially."
One potential measure of the reliability of a Yes or No answer is the
proportion of responses in which Yes is paired with "not at all or minimally"
(scaled score = 1) or No is paired with "fully or almost fully" (scaled score =
4). Appendix 2 provides this information. Notably, 11 percent of Yes responses
were associated with "not at all or minimally" responses; there were also many
No responses that were accompanied by "substantially" met scaled scores.
However, there were only 3 instances (0.3 percent) in which No responses were
associated with a "fully or almost fully" met scaled response.
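The discordance check described above can be sketched as a simple count of
contradictory pairs. The function name, record layout, and sample values below
are hypothetical. The choice of denominators is also an assumption: the text
reports one figure as a share of Yes responses, while the base for the 0.3
percent figure is not stated, so the sketch uses all responses for it.

```python
def discordant_shares(records):
    """Shares of contradictory yes/no and scaled-score pairs.

    records: list of (answer, scaled) pairs; answer is "Yes" or "No",
    scaled is the 1-4 rating for the same standard.
    """
    total = len(records)
    if total == 0:
        raise ValueError("no records supplied")
    yes_scores = [s for a, s in records if a == "Yes"]
    # Yes paired with "not at all or minimally" (score 1).
    yes_with_1 = sum(1 for s in yes_scores if s == 1)
    # No paired with "fully or almost fully" (score 4).
    no_with_4 = sum(1 for a, s in records if a == "No" and s == 4)
    return {
        # Assumed base: Yes responses only.
        "yes_with_1_share_of_yes": (
            yes_with_1 / len(yes_scores) if yes_scores else None
        ),
        # Assumed base: all responses.
        "no_with_4_share_of_all": no_with_4 / total,
    }


# Made-up records for illustration only.
sample_pairs = [("Yes", 1), ("Yes", 3), ("Yes", 4), ("Yes", 2),
                ("No", 1), ("No", 2), ("No", 4), ("No", 1)]
shares = discordant_shares(sample_pairs)
```

Lower shares indicate that the yes/no answers and the scaled ratings rarely
contradict each other outright.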
Discussion
There appears to be a fairly consistent relationship between a Yes or No
response and the scaled ratings for local public health systems and local
public health agencies in the field-testing of the Local Instrument in these
two states. Responses of Yes to a performance standard were more likely to be
linked to a higher scaled rating. This finding appears consistent across
counties, both urban and rural, and it also held in counties that had more No
responses than Yes responses (e.g., Tunica, Mississippi). Responses were also
consistent across counties answering the same question: counties that answered
Yes to a specific question were more likely to give a higher numerical score,
and counties that answered No were more likely to give a lower one. This
relationship was also consistent across all ten essential public health
services (EPHS). For each of the EPHS, a Yes response predicted a higher scaled
rating score.
Limitations
This study has many limitations, most importantly the lack of an established
scientific basis for the assessment. However, given that no scientific test
exists for this type of analysis, this assessment was conducted in order to
approximate reliability. This examination focused on performance measures that
were included in the version of the Local Instrument field tested in Minnesota
and Mississippi in early 2000. The instrument has been modified substantially
since that time.
Conclusions
Reliability of the measures and instrument used to assess local public health
practice performance could only be estimated by this project. Future efforts to
assess reliability should be undertaken after the instruments are finalized and
can be administered in a more controlled set of circumstances using a
test-retest strategy for reliability assessment.