EPSY 546 - Educational Measurement

4 Credit Hours
CRN: 10855

Professor: George Karabatsos
E-mail: georgek@uic.edu
Phone: 312-413-1816

Semester: Fall 2011
Class Time: Monday 5:00-8:00pm
Room: 3427 EPASW, 1040 W. Harrison St.
Computer lab: Room 2027 EPASW
Office Hours: Monday 2-4 (EPASW 1034)


Course Description:
This course teaches psychometrics, the practice of constructing scales to measure psychological traits (e.g., ability on an examination, or attitudes) as they manifest in responses to a set of multiple-choice test items, rating-scale items, or judges' ratings of persons performing various tasks. In particular, the course covers classical and contemporary methods of psychometric analysis for multiple-choice and rating-scale test items.

Methods include:
-- parametric and semiparametric approaches to Rasch modeling;
-- models of Item Response Theory (IRT);
-- exploratory factor analysis;
-- confirmatory factor analysis;
-- kernel regression approaches to IRT;
-- Hierarchical Linear Model (HLM) approaches to psychometric modeling;
-- classical test theory, including reliability analysis;
-- extended reliability analysis with generalizability theory;
-- methods for equating examinee scores from different tests (given a score on Test X, what is the equivalent score on Test Y?);
-- methods for analyzing person fit (which test respondents give aberrant item responses due to cheating, lucky guessing, carelessness, etc.?);
-- methods for analyzing item fit (which items draw surprising responses because of poor wording, irrelevance to what the test intends to measure, etc.?).

Practical examples will be drawn primarily from education, psychology, and health care. We will work through many of them in class using appropriate software for psychometric data analysis, and this in-class work will count as credit toward the midterm exam.

While this course focuses primarily on practical applications, that focus will not come at the expense of rigor. Students will also learn the basic ideas of reliability and test validity, the key properties and characteristics of various psychometric models, and the maximum likelihood and Bayesian approaches to estimating the parameters of such models. These concepts are taught so that students understand what they are doing when they apply psychometric methods to data. Still, the course does not require an extensive mathematical background.

Prerequisite: Any introductory statistics course or equivalent, or consent of the instructor.

Readings and Software:
Suggested readings are listed as "Relevant References" within the COURSE SCHEDULE, below.

COURSE SCHEDULE

Date Topic

Aug22

 

The four scales of measurement (nominal, ordinal, interval, and ratio scales).
Test reliability, test validity.
What is a psychometric model?
-- The item response function (IRF), the item-step response function (ISRF), and the item category-response function.
-- The three properties of all psychometric models (unidimensionality, local independence, monotonicity of the IRF/ISRF).
-- Invariant item ordering.
Relevant References:
Borsboom, D., Mellenbergh, G.J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061-1071.
Kline, P. (1993). Reliability of tests: Practical issues. In Ch 1, The Handbook of Psychological Testing, 5-15.
Messick, S. (1995). Validity of Psychological Assessment. American Psychologist, 50, 741-749.

Aug29

Kernel regression analysis of multiple-choice and rating-scale items.
-- Estimating the Item Response Function (IRF), the Item-Step Response Function (ISRF), and the category response function, from real data.
-- Investigating the unidimensionality of the measurement scale (i.e., investigating the monotonicity of each IRF/ISRF).
-- Estimating the abilities of each test respondent, and the easiness (difficulty) of each test item.
-- Investigating person fit: Did any test respondent give aberrant item responses, due to cheating, lucky-guessing, carelessness, etc.?
-- Analyzing the reliability of the test.
-- Comparing distributions of test scores using density estimation.
-- Investigating Item bias (i.e., investigating Differential Item Functioning (DIF)).
Relevant References:
Ramsay, J.O. (1991). Kernel smoothing approaches to nonparametric item characteristic curve estimation. Psychometrika, 56(4), 611-630.
Sijtsma, K., & Molenaar, I.W. (2002). Introduction to Nonparametric IRT. Thousand Oaks, CA: Sage.
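
As a rough illustration of the kernel regression idea in this session, the sketch below estimates an item response function with a Nadaraya-Watson smoother and then checks whether the estimated curve is monotone. It is a simplified stand-in, not the procedure of any particular software package: the Gaussian kernel, the bandwidth value, the ability proxy, the toy data, and the function names are all assumptions made for the example.

```python
import math

def gaussian_kernel(u):
    """Unnormalized Gaussian kernel weight."""
    return math.exp(-0.5 * u * u)

def kernel_irf(proxy, responses, grid, bandwidth):
    """Nadaraya-Watson estimate of P(correct | ability) at each grid point.

    proxy     -- an ability proxy for each respondent (assumed given)
    responses -- 0/1 scores of the same respondents on one item
    """
    estimates = []
    for theta in grid:
        weights = [gaussian_kernel((theta - p) / bandwidth) for p in proxy]
        weighted_correct = sum(w * y for w, y in zip(weights, responses))
        estimates.append(weighted_correct / sum(weights))
    return estimates

# Toy data (made up for illustration only)
proxy = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
responses = [0, 0, 0, 1, 0, 1, 1, 1, 1]

curve = kernel_irf(proxy, responses, grid=[-1.0, 0.0, 1.0], bandwidth=0.8)

# A non-decreasing estimated curve is consistent with IRF monotonicity
is_monotone = all(a <= b for a, b in zip(curve, curve[1:]))
```

A larger bandwidth gives a smoother but flatter curve; a smaller one follows the data more closely but is noisier.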
Sep5 No Class. Labor Day.
Sep12

Rasch models for binary item scores.
-- The definition of the Rasch model (for dichotomous item scores).
-- Rasch model implies strict forms of IRF monotonicity, and invariant item ordering (parallel IRFs).
-- The specific objectivity property of the Rasch model.
-- Estimating the person ability and item difficulty parameters of the Rasch model (maximum likelihood method; marginal maximum likelihood method).
-- Investigating item fit and person fit.
-- Analyzing the reliability of the test.
Bond, T., & Fox, C.M. (2007). Applying the Rasch Model: Fundamental Measurement in the Human Sciences, Second Edition. Lawrence Erlbaum.
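
A minimal sketch of the dichotomous Rasch model from this session, assuming its standard logistic form P(X = 1) = exp(theta - b) / (1 + exp(theta - b)), where theta is person ability and b is item difficulty. The item difficulties and function names below are hypothetical. The loop illustrates why the IRFs are parallel on the log-odds scale: the difference between two items does not depend on the ability level.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

# Hypothetical item difficulties (illustrative values only)
easy_item, hard_item = -1.0, 0.5

# Log-odds gap between the two items at several ability levels
gaps = []
for theta in (-2.0, 0.0, 2.0):
    gap = logit(rasch_prob(theta, easy_item)) - logit(rasch_prob(theta, hard_item))
    gaps.append(gap)

# Every gap equals hard_item - easy_item, whatever the ability level
```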

Sep19

Rasch models for the analysis of rating scales items, and the analysis of judge ratings.
-- Rasch rating scale model, Rasch partial credit model, and the (FACETS) Rasch model for judge ratings.
-- Rasch model implies strict forms of ISRF monotonicity, and invariant item ordering (parallel ISRFs).
-- Estimating the person ability and item difficulty parameters of the Rasch model (maximum likelihood method; marginal maximum likelihood method).
-- Investigating item fit and person fit.
-- Analyzing the reliability of the test.

Sep26

Item Response Theory Models.
Dichotomous item scores: 2-parameter logistic model, 3-parameter logistic model, Rasch model with guessing parameter.
Polytomous item scores: graded response models, generalized partial credit models.
Embretson, S., & Reise, S.P. (2000). Item Response Theory for Psychologists. Lawrence Erlbaum.
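
The dichotomous models listed for this session can be viewed as special cases of one function. The sketch below assumes the usual logistic parameterization with discrimination a, difficulty b, and lower asymptote (guessing) c; the function name and parameter values are hypothetical.

```python
import math

def irf_3pl(theta, a, b, c=0.0):
    """3-parameter logistic IRF: c + (1 - c) * logistic(a * (theta - b)).

    c = 0          -> 2-parameter logistic model
    a = 1, c = 0   -> Rasch model
    a = 1, c > 0   -> Rasch model with a guessing parameter
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Illustrative values only
p_rasch = irf_3pl(theta=0.0, a=1.0, b=0.0)           # ability equals difficulty
p_2pl = irf_3pl(theta=1.0, a=2.0, b=0.0)             # steeper item
p_3pl_low = irf_3pl(theta=-50.0, a=1.2, b=0.0, c=0.2)  # very low ability
```

With a guessing parameter, the probability of a correct response approaches c rather than 0 as ability decreases.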

Oct3 Hierarchical Linear Models
-- Any Rasch model is a (special) Hierarchical Linear Model.
-- (Rasch) analysis of test items, rating scales, and judge ratings
-- Investigating Item Bias (Differential Item Functioning),
-- Comparing test performance across different groups of respondents.
-- Incorporating additional predictor variables in psychometric analysis.
Raudenbush, S.W., & Bryk, A.S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods (2nd ed.). Newbury Park, CA: Sage. (especially Chapters 10 and 11, pp. 365-371).
Raudenbush, S.W., Bryk, A.S., Cheong, Y.F., & Congdon, R.T. (2004). HLM 6: Hierarchical Linear and Nonlinear Modeling. Lincolnwood, IL: Scientific Software International.

Oct10

Hierarchical Linear Models (continued)
Bayesian semiparametric inference of Rasch and IRT models
Kleinman, K.P., & Ibrahim, J.G. (1998b). A semi-parametric Bayesian approach to generalized linear mixed models. Statistics in Medicine, 17, 2579-2596.
-- Illustrative applications of HLM on real data.

Oct17

Exploratory factor analysis of test items.
Kline, P. (1993). An easy guide to factor analysis. Routledge.
MIDTERM EXAM IS DUE.

Oct24

Confirmatory factor analysis of test items.
Kline, P. (1993). An easy guide to factor analysis. Routledge.
Oct31 Generalizability Theory: A comprehensive approach to reliability analysis.
Brennan, R. (2001). Generalizability Theory. New York: Springer.
Shavelson, R., & Webb, N. (1991). Generalizability Theory: A Primer. Sage Publications.
Nov7 Equating Test Scores: Given a score on Test X, what is the equivalent score on Test Y?
-- Equating designs.
-- Methods of score equating under various designs.
-- Rasch item equating.
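
To make the equating question concrete, here is a simplified sketch of equipercentile equating: a score on Test X is mapped to the Test Y score that has the same percentile rank. The mid-percentile convention, the linear interpolation between observed Test Y scores, the toy data, and the function names are assumptions for illustration; operational equating typically adds smoothing and depends on the equating design.

```python
def percentile_rank(scores, x):
    """Share of scores below x plus half the share equal to x."""
    below = sum(s < x for s in scores)
    equal = sum(s == x for s in scores)
    return (below + 0.5 * equal) / len(scores)

def equipercentile_equate(x, scores_x, scores_y):
    """Test Y score whose percentile rank matches that of x on Test X."""
    target = percentile_rank(scores_x, x)
    points = sorted(set(scores_y))
    ranks = [percentile_rank(scores_y, y) for y in points]
    if target <= ranks[0]:
        return float(points[0])
    # Interpolate linearly between neighboring observed Test Y scores
    for i in range(len(points) - 1):
        r0, r1 = ranks[i], ranks[i + 1]
        if r0 <= target <= r1:
            y0, y1 = points[i], points[i + 1]
            return y0 + (y1 - y0) * (target - r0) / (r1 - r0)
    return float(points[-1])

# Toy data (made up): Test Y scores run two points higher than Test X scores
scores_x = [1, 2, 3, 4, 5]
scores_y = [3, 4, 5, 6, 7]

equated_middle = equipercentile_equate(3, scores_x, scores_y)
equated_low = equipercentile_equate(1, scores_x, scores_y)
```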
Livingston, S.A. (2004). Equating Test Scores (without IRT). Princeton: Educational Testing Service.
Karabatsos, G., & Walker, S.G. (2009). A Bayesian nonparametric approach to test equating. Psychometrika.
Computer adaptive testing (CAT), Item banking, and Standard Setting.
Cizek, G.J. (1996). Setting passing scores. Educational Measurement: Issues and Practice, 15, 20-31.
Cizek, G. J., Bunch, M. B., & Koons, H. (2004). Setting performance standards: Contemporary methods. Educational Measurement: Issues and Practice, 23(4), 31-50.
Meijer, R.R., & Nering, M.L. (1999). Computer adaptive testing: Overview and introduction. Applied Psychological Measurement, 23(3), 187-194. Special issue on computerized adaptive testing.
Ward, A.W., & Murray-Ward, M. (1994). Guidelines for the development of item banks. An NCME instructional module. Educational Measurement: Issues and Practice, 13 (1), 34-39.

Nov14

Student Presentations of final paper

Nov21

Student Presentations of final paper

Nov28 Student Presentations of final paper
Dec5 FINAL PAPER DUE (Exam week)
Please leave paper in my mailbox in Room 3233, or under my office door at Room 1034.

Grading Policy:
The final grade is based on the performance on the Midterm exam (40% of final grade), a data analysis presentation and paper (50% of final grade), and class participation (10% of final grade; includes attendance and contributions to in-class discussion).
Final grades will be assigned according to the following grading scale:

A   90% - 100%
B   79% - 89%
C   68% - 78%
D   57% - 67%
F   56% or lower

Students will spend substantial amounts of time reading, and on the computer. It is assumed that students will exert individual initiative in solving computing/analysis problems as they arise.
I can accept only hard copies of the completed exam and completed paper (please, no electronic copies).


ASSIGNMENTS:
A) Mid-Term: Computer-Based (Take-Home) Exam (40% of total grade)
B) Data Analysis Presentation (25% of total grade)
C) Data Analysis Paper (25% of total grade)

A. Computer-Based Exam (40% total):
You will be tested on your ability to perform psychometric analyses of real data sets, and answer questions concerning the interpretation of these analyses.

B,C. Data Analysis Presentation and Paper
-- The data analyses and paper will consist of the relevant output from the software programs and a complete report stating the results.
-- You may supply your own data or you may solicit faculty (education or other) for data.
-- The paper must be 10-15 double-spaced pages, with 1-inch margins, in APA format (computer-generated output must be placed in the Appendix and does not count toward the 10-15 page limit).
-- The presentation has a limit of 25 minutes (about 15 PowerPoint slides).

Both the presentation and paper must include:

Introduction -
Describe in detail the substantive problem you will be solving in this research study,
and describe the rationale/theory underpinning the data you will analyze (5 points).

Methods - (not necessarily in the following order).
-- Describe sample characteristics (5 points).
-- Describe the items on your test(s) (including their number and scoring format) (5 points).
-- Describe the unidimensional variable(s) you intend to measure with the test(s) (5 points).
-- For data analysis, use one or more psychometric models. (5 points)
-- Fully describe the model(s) you are using (15 points).
-- Fully describe the methods you will use to investigate the unidimensionality,
reliability, validity, and (possibly) item bias of each of your test(s) (15 points).
-- Also, if you intend to equate test scores, fully describe the equating methods you will implement
(use either equipercentile equating, Rasch item equating, or both)

Results - (not necessarily in the following order).
-- Discuss the amount of evidence for unidimensionality (10 points), reliability (10 points) and validity of your test(s) (10 points),
and justify any modifications you make to your test (removing items, removing persons, etc.).

Discussion - (not necessarily in the following order).
-- What modifications (if any) would improve the instrument? (3 points)
-- What are the implications of your study, with respect to the measurement and applications in the field of interest? (3 points)

I will deduct points from a section if you interpret your results incorrectly, or if you omit or only partially describe any information covered in class that is relevant to your particular investigation. Please provide appropriate handouts and develop meaningful overheads for your presentation.

Disability Services:
UIC strives to ensure the accessibility of programs, classes, and services to students with disabilities. Reasonable accommodations can be arranged for students with various types of disabilities, such as documented learning disabilities, vision or hearing impairments, and emotional or physical disabilities. If you need accommodations for this class, please let your instructor know your needs and he/she will help you obtain the assistance you need in conjunction with the Office of Disability Services (1190 SSB, 413-2183).