TURNOCK READING ROOM


Assessing the Reliability of Public Health Performance Measures

Bernard J. Turnock MD, MPH; UIC School of Public Health; 2001


Introduction

The overall aim of this project was to examine the reliability of measures of local public health performance (the Local Instrument) that had been under development since 1998 by the National Public Health Performance Standards Program, Public Health Practice Program Office, Centers for Disease Control and Prevention (CDC). The assessment of reliability reported here was based on field-testing of a panel of public health performance measures conducted in Minnesota and Mississippi in early 2000. An earlier panel of these performance measures had been assessed for validity as part of a statewide pilot test conducted in Florida in 1999.

The Local Instrument included questions organized in a complex hierarchical structure, with several major measures identified for each of the ten essential public health services. Each essential public health service was operationally described by 2-7 major measures, each presented in narrative form as a model performance standard. There were 33 such first-order measures, for which respondents could indicate whether or not (yes/no) the standard was met. For each of these 33 major measures, two additional questions asked the extent to which the local public health system and the local public health agency met needs associated with that standard. These 99 questions (three for each of the 33 measures) and their responses were the focus of this reliability assessment.

Early on it was apparent that reliability testing of these public health performance measures faced serious methodological challenges. Reliability commonly refers to the extent to which the results obtained from a measurement tool or instrument can be relied upon. That is, if the same variable were to be measured in the same circumstances over and over again, would the results differ? A somewhat more formal definition views reliability as the degree of consistency with which an instrument measures an attribute or the ability of an instrument or indicator to produce similar scores on repeated testing occasions that occur under similar circumstances.

The performance measures in the Local Instrument under development by CDC were intended for eventual deployment in a self-assessment and quality improvement process that was to be part of the new Mobilizing for Action through Planning and Partnerships (MAPP) initiative. However, the early field-testing of these measures in various states (including Texas, Florida, Hawaii, Missouri, Minnesota, and Mississippi) did not generally replicate the circumstances in which the final panel of measures would be used. Field test sites did not, for example, use a broadly participatory, community-driven process that would examine each measure and then assess and report its status. Repeated use of the same test (a test-retest approach) is the most common form of reliability testing and a widely accepted test of the stability of a measure or instrument. The variability of the field-testing circumstances precluded a repeat measurement, however, so a test-retest approach would not have been appropriate. Concerns over learning bias also argued against a test-retest strategy.

Other forms of reliability testing examine the internal consistency and equivalence of an instrument. Common tests for internal consistency (such as Cronbach's alpha, the Kuder-Richardson formula, and split-half analysis) examine whether subsets of an instrument correlate with the overall score. These tests were not appropriate for the Public Health Performance Standards Local Instrument because widely varying concepts, embodied by the ten essential public health services, were measured in different parts of the instrument. Equivalence is commonly examined by using alternate forms of the same instrument and by inter-rater reliability. Alternate forms would also be subject to considerable learning bias, and inter-rater reliability would require that the reporter (or reporters) be equally familiar with performance of the concepts being measured. These circumstances could not be controlled for in this assessment.

As a result of these considerations, direct assessment of the reliability of responses related to local public health performance measures in the Minnesota and Mississippi field tests was not possible. No statistical tests exist that test reliability within such a set of measures. To approximate reliability within these sets of measures, we identified patterns of responses and examined whether a consistent relationship exists between a yes or no response and a scaled response for the same performance standard. Although this approach is not grounded in any sound scientific base, we used it to ascertain whether the response to a yes/no question for a local public health system or local public health agency is correlated with the response to a scaled question measuring performance of the same standard. Such an analysis allows certain predictions to be made; however, there is no credible basis on which to "scientifically" examine whether the results are significant, which seriously weakens the credibility of such conclusions and predictions.
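As an illustration only, the relationship between a yes/no response and a scaled response can be summarized with a point-biserial correlation, which is simply the Pearson correlation between a 0/1 variable and a numeric one. This is a minimal sketch and not the method used in this project; the function name `point_biserial` and the sample responses are hypothetical, and the caution above still applies: the coefficient describes association and does not establish reliability.

```python
from math import sqrt


def point_biserial(yes_no, scaled):
    """Pearson correlation between a 0/1 variable and a numeric variable.

    yes_no: list of 0 (No) / 1 (Yes) answers.
    scaled: list of scaled scores (1-4) for the same standards, same order.
    """
    if len(yes_no) != len(scaled) or len(yes_no) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(yes_no)
    mean_x = sum(yes_no) / n
    mean_y = sum(scaled) / n
    # Sums of cross-products and squared deviations (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(yes_no, scaled))
    var_x = sum((x - mean_x) ** 2 for x in yes_no)
    var_y = sum((y - mean_y) ** 2 for y in scaled)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical responses for eight standards (NOT field-test data).
example_yes_no = [1, 1, 1, 0, 0, 1, 0, 1]
example_scaled = [3, 4, 2, 1, 2, 3, 1, 3]
r = point_biserial(example_yes_no, example_scaled)
print(f"point-biserial r = {r:.2f}")
```

A value near +1 would indicate that Yes answers go with higher scaled scores; the sketch deliberately omits any significance test.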

Study Objectives and Results

In the end, it was not possible to go beyond a simple examination of the relationship between yes/no responses for key public health performance measures and scaled responses related to the same concepts. There were 33 major performance measures for which a yes/no response was elicited. For each of these measures, two additional questions asked the extent to which the local public health system and the local public health agency achieved that standard (1 = not at all or minimally; 2 = partially; 3 = substantially; 4 = fully or almost fully).

Appendix 1 suggests that a Yes response, meaning that a particular performance standard was achieved, was associated with a mean scaled score of 2.82 for Minnesota local health jurisdictions and 2.87 for Mississippi local health jurisdictions (out of a possible 4.00) for local public health system achievement of that standard. The range for means was 1.82 to 3.78 for the local health jurisdictions in these two states. A Yes response for local public health agency achievement of a standard was associated with a mean scaled score of 2.70 for Minnesota local health jurisdictions and 2.72 for local health jurisdictions in Mississippi, with an overall range between 1.36 and 3.83.

A No response, meaning that a performance standard was not achieved, was associated with a mean scaled score of 1.49 in Minnesota and 1.41 in Mississippi for local public health system achievement, with an overall range between 1.00 and 2.50. For local public health agency achievement of a standard, a No response was associated with a mean scaled score of 1.42 in Minnesota and 1.37 in Mississippi (range 1.00 to 3.00). The greater variability for No responses may be related to the smaller number of overall No responses than Yes responses for the 33 standards that are the focus of this assessment.

In general, Yes responses coincided with "partially," "substantially," and "fully or almost fully" responses (i.e., scaled responses 2, 3, and 4). The mean scaled scores (2.82 and 2.87 for local public health systems and 2.70 and 2.72 for local public health agencies) fell between "partially" and "substantially."

No responses coincided with "not at all or minimally," "partially," and "substantially" responses, with the mean scaled scores (1.49 and 1.41 for local public health systems and 1.42 and 1.37 for local public health agencies) lying between "not at all or minimally" and "partially."
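The kind of summary reported above can be sketched as follows: scaled scores are grouped by the accompanying Yes or No answer, and the mean and range are computed for each group. This is a minimal sketch under stated assumptions; the function name `summarize_by_answer`, the record layout, and the sample data are hypothetical, and the sketch pools individual responses rather than reproducing the per-jurisdiction means in Appendix 1.

```python
from collections import defaultdict

# Scale used for the two follow-up questions on each standard.
SCALE_LABELS = {
    1: "not at all or minimally",
    2: "partially",
    3: "substantially",
    4: "fully or almost fully",
}


def summarize_by_answer(responses):
    """Group scaled scores by Yes/No answer and summarize each group.

    responses: iterable of (answer, score) pairs, where answer is
    "Yes" or "No" and score is an integer from 1 to 4.
    Returns {answer: {"n", "mean", "min", "max"}}.
    """
    groups = defaultdict(list)
    for answer, score in responses:
        if answer not in ("Yes", "No"):
            raise ValueError(f"unexpected answer: {answer!r}")
        if score not in SCALE_LABELS:
            raise ValueError(f"score out of range: {score!r}")
        groups[answer].append(score)
    return {
        answer: {
            "n": len(scores),
            "mean": sum(scores) / len(scores),
            "min": min(scores),
            "max": max(scores),
        }
        for answer, scores in groups.items()
    }


# Hypothetical responses (NOT field-test data).
sample = [("Yes", 3), ("Yes", 4), ("Yes", 2), ("No", 1), ("No", 2), ("No", 1)]
for answer, stats in summarize_by_answer(sample).items():
    print(answer, stats)
```

Running the same summary separately for the system question and the agency question, and separately by state, would give the four pairs of means discussed in the text.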

One potential measure of the reliability of a Yes or No answer is the proportion of responses in which Yes is paired with "not at all or minimally" (scaled score = 1) or No is paired with "fully or almost fully" (scaled score = 4). Appendix 2 provides this information. Notably, 11 percent of Yes responses were associated with "not at all or minimally" responses; there were also many No responses that were accompanied by "substantially" met scaled scores. However, there were only 3 instances (0.3 percent) in which No responses were associated with a "fully or almost fully" met scaled response.
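The discordance check described above can be sketched as a simple count of contradictory pairs. This is an illustrative sketch only; the function name `discordance_rates` and the sample data are hypothetical, and the choice of denominator (all Yes responses for the first rate, all No responses for the second) is an assumption, since the denominators behind the Appendix 2 percentages are not stated here.

```python
def discordance_rates(responses):
    """Share of contradictory Yes/No and scaled-score pairs.

    responses: iterable of (answer, score) pairs, where answer is
    "Yes" or "No" and score is an integer from 1 to 4.
    Returns the fraction of Yes answers scored 1 ("not at all or
    minimally") and the fraction of No answers scored 4 ("fully or
    almost fully"). A rate is None when that answer never occurs.
    """
    yes_scores = [score for answer, score in responses if answer == "Yes"]
    no_scores = [score for answer, score in responses if answer == "No"]
    yes_minimal = sum(1 for score in yes_scores if score == 1)
    no_full = sum(1 for score in no_scores if score == 4)
    return {
        "yes_with_minimal": yes_minimal / len(yes_scores) if yes_scores else None,
        "no_with_full": no_full / len(no_scores) if no_scores else None,
    }


# Hypothetical responses (NOT field-test data).
sample_pairs = [("Yes", 1), ("Yes", 3), ("Yes", 4), ("Yes", 2), ("No", 1), ("No", 2)]
rates = discordance_rates(sample_pairs)
print(rates)
```

Lower rates suggest that the yes/no answer and the scaled answer are telling the same story for a given standard.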

Discussion

There appears to be a fairly consistent relationship between a Yes or No response and the scaled ratings for local public health systems and local public health agencies in the field-testing of the Local Instrument in these two states. Responses of Yes to a performance standard were more likely to be linked to a higher scaled rating. This finding appears consistent across counties, both urban and rural, including counties that had more No responses than Yes responses (e.g., Tunica, Mississippi). Responses were also consistent across counties answering the same question: counties that answered Yes to a specific question were more likely to give a higher scaled score, and counties that answered No were more likely to give a lower one. This relationship held across all ten essential public health services (EPHS). For each of the EPHS, a Yes response predicted a higher scaled rating.

Limitations

This study has many limitations, the most important being the lack of a scientific basis for the assessment. However, given that no scientific test exists for this type of analysis, this assessment was conducted in order to approximate reliability. This examination focused on performance measures that were included in the version of the Local Instrument field-tested in Minnesota and Mississippi in early 2000. The instrument has been modified substantially since that time.

Conclusions

The reliability of the measures and instrument used to assess local public health practice performance could only be approximated in this project. Future efforts to assess reliability should be undertaken after the instruments are finalized and can be administered under more controlled circumstances, using a test-retest strategy for reliability assessment.

 

Copyright © 2003 The Board of Trustees of the University of Illinois