EPSY 546 - Educational Measurement

4 Credit Hours

Semester: Fall 2008
Professor: George Karabatsos
Time: Tuesday 5:00-8:00pm
Room: TBA
Phone: 312-413-1816
E-mail: georgek@uic.edu
Office Hours: Mon 2-4 (EPASW 1034)
CRN: 10855


Course Description:
This course teaches Psychometrics, the practice that aims to establish scales for the measurement of psychological traits (e.g., ability in an examination, or attitudes),
as they manifest from responses on a set of multiple-choice test items, rating-scale items, or judges' ratings of persons who perform on various tasks.

In particular, the course will cover contemporary approaches to psychometric analysis, for:
(1) The nonparametric regression analysis of multiple-choice and rating-scale test items.
(2) Rasch model analysis (using the Winsteps, FACETS, and WinBugs software for data analysis).
(3) Hierarchical Linear Model (HLM) analysis, as a practical extension of Rasch model analysis (using the HLM software).
(4) The analysis of test reliability.
(5) The measurement of person ability, and the measurement of item difficulty.
(6) Comparing two or more distributions of test scores, through the use of density estimation, and through the use of HLM.
(7) The analysis of item bias and invariant item ordering.
(8) Equating scores from different tests (Given a score on Test X, what is the equivalent score on Test Y?).
(9) The analysis of person fit (Which test respondents are giving aberrant item responses due to cheating, lucky-guessing, carelessness, etc.?).
(10) Cognitive modeling with Multinomial Processing Tree models (using the S-Plus software).

Practical examples based on real data will be drawn primarily from the fields of education, psychology, and health care.
Using the appropriate software for psychometric data analysis, we will work through many practical examples in class,
and this in-class work will count as credit toward the midterm exam.

Also, students who take this course will receive, free of charge, computer programs
for psychometric analysis, to perform tasks 1, 4, 5, 6, 7, 8, 9, and 10 listed above.
(These programs are written by the instructor, and are user-friendly).

While this course focuses primarily on practical applications, this focus will not come at the expense of rigor
(i.e., this course is not just about how to use psychometric software).
In particular, students who take this course will also learn:
(a) The precise definition of a psychometric model;
(b) The key properties and characteristics of various psychometric models;
(c) The (maximum likelihood and Bayesian) approaches to estimating the parameters of such models;
(d) The techniques that underlie approaches to person fit analysis, item bias analysis, test equating, and comparing distributions of test scores.
These four aspects will be communicated in a straightforward manner (so as not to
require students to have an extensive mathematical background).
These aspects are taught so students become fully aware of what they are doing when applying
psychometric methods to the analysis of real data.

Prerequisite: Any introductory statistics course or its equivalent, or consent of the instructor.

Readings and Software:
Suggested readings are listed as "Relevant References" within the COURSE SCHEDULE, below.
The software for psychometric data analysis includes R, HLM, Winsteps, FACETS, and S-Plus. Either student or full versions of the software
can be easily downloaded by clicking the links (given in the previous sentence), though we will use the full versions of the software in class.

 

COURSE SCHEDULE

Date Topic

Aug28

 

The four scales of measurement (nominal, ordinal, interval, and ratio scales).
Test reliability, test validity.
What is a psychometric model?
-- The item response function (IRF), the item-step response function (ISRF), and the item category-response function.
-- The three properties of all psychometric models (unidimensionality, local independence, monotonicity of the IRF/ISRF).
-- Invariant item ordering.
Course Notes; Theory & Methods
Relevant References:
Borsboom, D., Mellenbergh, G.J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061-1071.
Kline, P. (1993). Reliability of tests: Practical issues. In Ch 1, The Handbook of Psychological Testing, 5-15.
Messick, S. (1995). Validity of Psychological Assessment. American Psychologist, 50, 741-749.
Sijtsma, K., & Molenaar, I.W. (2002). Introduction to Nonparametric IRT. Thousand Oaks, CA: Sage.

Sep4

Nonparametric (kernel) regression for the analysis of multiple-choice and rating-scale items.
-- Estimating the Item Response Function (IRF), the Item-Step Response Function (ISRF), and the category response function, from real data.
-- Investigating the unidimensionality of the measurement scale (i.e., investigating the monotonicity of each IRF/ISRF).
-- Estimating the abilities of each test respondent, and the easiness (difficulty) of each test item.
-- Analyzing the reliability of the test.
Relevant Reference:
Ramsay, J.O. (1991). Kernel smoothing approaches to nonparametric item characteristic curve estimation. Psychometrika, 56(4), 611-630.
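
The idea of estimating an IRF by kernel regression can be sketched in a few lines. The example below is a minimal illustration, not the instructor's software: it assumes a Gaussian kernel, a Nadaraya-Watson weighted average, and an ability proxy (for example, the total score on the remaining items). The function names, the toy data, and the bandwidth are all hypothetical.

```python
import math

def kernel_irf(proxy, responses, grid, bandwidth):
    """Nadaraya-Watson estimate of P(correct | ability proxy) at each grid point.

    proxy     : one ability proxy value per respondent (assumed, e.g. a rest score)
    responses : 0/1 scores of the same respondents on one item
    grid      : proxy values at which the IRF is evaluated
    bandwidth : smoothing parameter of the Gaussian kernel (assumed choice)
    """
    estimates = []
    for g in grid:
        # Gaussian kernel weight: respondents near g count the most.
        weights = [math.exp(-0.5 * ((p - g) / bandwidth) ** 2) for p in proxy]
        weighted_correct = sum(w * x for w, x in zip(weights, responses))
        estimates.append(weighted_correct / sum(weights))
    return estimates

def is_monotone(values, tol=1e-9):
    """True if the estimated curve never decreases (within a small tolerance)."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))

# Hypothetical toy data: ten respondents, one dichotomous item.
proxy = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
grid = [1.0, 3.0, 5.0, 7.0, 9.0]

irf = kernel_irf(proxy, item, grid, bandwidth=1.5)
for g, p in zip(grid, irf):
    print(f"proxy={g:.1f}  estimated P(correct)={p:.3f}")
print("monotone:", is_monotone(irf))
```

A larger bandwidth gives a smoother but flatter curve, and a smaller one follows the data more closely. Checking monotonicity on the estimated curve is only an informal diagnostic here.
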
Sep11

Nonparametric regression for the analysis of multiple-choice and rating-scale items (continued).
-- Investigating Invariant Item Ordering.
-- Investigating person fit: Did any test respondent give aberrant item responses, due to cheating, lucky-guessing, carelessness, etc.?
-- Comparing distributions of test scores using density estimation.
-- Investigating Item bias (i.e., investigating Differential Item Functioning (DIF)).
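
As one illustration of person fit analysis, the sketch below counts Guttman errors, a simple nonparametric index. It is an assumed example and not necessarily the statistic used in class. A Guttman error occurs when a respondent fails an easier item but passes a harder one, so a high count flags a response pattern as potentially aberrant. The data matrix and function names are hypothetical.

```python
def easiness_order(data):
    """Item indices sorted from easiest to hardest by proportion correct."""
    n_items = len(data[0])
    p_correct = [sum(row[j] for row in data) / len(data) for j in range(n_items)]
    return sorted(range(n_items), key=lambda j: -p_correct[j])

def guttman_errors(responses, order):
    """Count pairs (easier item failed, harder item passed) for one respondent."""
    errors = 0
    failures_so_far = 0
    for j in order:
        if responses[j] == 0:
            failures_so_far += 1
        else:
            # Each earlier (easier) failure paired with this pass is one error.
            errors += failures_so_far
    return errors

# Hypothetical 0/1 data: rows are respondents, columns are items.
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],  # fails the two easiest items, passes the two hardest
]

order = easiness_order(data)
counts = [guttman_errors(row, order) for row in data]
print("item order (easiest first):", order)
print("Guttman errors per respondent:", counts)
```

In practice the count would be compared against a reference distribution or normalized by its maximum for the respondent's total score. The raw count is shown here only to make the definition concrete.
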

Sep18

Nonparametric regression for the analysis of multiple-choice and rating-scale items (continued).
-- More illustrative applications of nonparametric regression on real test data.

Sep25

Rasch models for binary item scores.
-- The definition of the Rasch model (for dichotomous item scores).
-- The model implies strict monotonicity of the IRF, and strict invariant item ordering (i.e., parallel IRFs).
-- The specific objectivity property of the Rasch model.
-- Estimating the person ability and item difficulty parameters of the Rasch model (maximum likelihood method; marginal maximum likelihood method).
-- Investigating item fit and person fit.
-- Analyzing the reliability of the test.
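
For reference, the dichotomous Rasch model in its standard form gives the probability of a correct response as exp(theta - b) / (1 + exp(theta - b)), where theta is the person ability and b is the item difficulty. The sketch below evaluates this formula for a few hypothetical abilities and difficulties, to illustrate monotonicity and invariant item ordering. The parameter values and function names are made up for illustration.

```python
import math

def rasch_prob(theta, b):
    """P(correct) under the dichotomous Rasch model for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

abilities = [-2.0, 0.0, 2.0]      # hypothetical person abilities (logits)
difficulties = [-1.0, 0.0, 1.5]   # hypothetical item difficulties, easiest first

table = [[rasch_prob(t, b) for b in difficulties] for t in abilities]
for t, row in zip(abilities, table):
    print(f"theta={t:+.1f}:", [round(p, 3) for p in row])

# On the logit scale, the gap between two items is the same at every ability
# level: the IRFs are parallel, so the item ordering never changes.
gaps = [logit(rasch_prob(t, difficulties[0])) - logit(rasch_prob(t, difficulties[2]))
        for t in abilities]
print("logit gap between easiest and hardest item:", [round(g, 3) for g in gaps])
```

Each row of the printed table decreases from left to right (harder items are less likely to be answered correctly at every ability level), and each column increases from top to bottom (monotonicity in ability).
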

Oct2

Rasch models for the analysis of rating-scale items, and the analysis of judge ratings.
-- The definition of the Rasch rating scale model, Rasch partial credit model, and the (FACETS) Rasch model for judge ratings.
-- The model implies strict monotonicity of the ISRF, and strict invariant item ordering (i.e., parallel ISRFs).
-- Estimating the person ability and item difficulty parameters of the Rasch model (maximum likelihood method; marginal maximum likelihood method).
-- Investigating item fit and person fit.
-- Analyzing the reliability of the test.

Oct9

Hierarchical Linear Models
-- Any Rasch model is a (special) Hierarchical Linear Model.
-- (Rasch) analysis of test items, rating scales, and judge ratings.
-- Investigating Item Bias (Differential Item Functioning).
-- Comparing test performance across different groups of respondents.
-- Incorporating additional predictor variables in psychometric analysis.
Relevant References:
Raudenbush, S.W., & Bryk, A.S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods (2nd ed.). Newbury Park, CA: Sage.
(especially Chapter 10, and Chapter 11, pp. 365-371).
Raudenbush, S.W., Bryk, A.S., Cheong, Y.F., & Congdon, R.T. (2004). HLM 6: Hierarchical Linear and Nonlinear Modeling. Lincolnwood, IL: Scientific Software International.

Oct16

Hierarchical Linear Models (continued)
-- More illustrative applications of HLM on real data.
MIDTERM EXAM IS DUE.

Oct23

Hierarchical Linear Models (continued)

Oct30

Equating Test Scores: Given a score on Test X, what is the equivalent score on Test Y?
-- The equipercentile approach to test equating (and using the bootstrap to infer the 95% probability interval of an equated score).
-- Rasch item equating.
Readings: Livingston, S.A. (2004). Equating Test Scores (without IRT). Princeton: Educational Testing Service.
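
The equipercentile idea can be sketched as follows: find the percentile rank of a score in the Test X distribution, then return the Test Y score with the same percentile rank. The example is a simplified illustration under stated assumptions, not the course software: it uses mid-percentile ranks, linearly interpolated quantiles, no presmoothing, and a percentile bootstrap for the interval. The scores and function names are hypothetical.

```python
import random

def percentile_rank(scores, x):
    """Proportion of scores below x, plus half the proportion equal to x."""
    below = sum(1 for s in scores if s < x)
    equal = sum(1 for s in scores if s == x)
    return (below + 0.5 * equal) / len(scores)

def quantile(scores, p):
    """Linearly interpolated sample quantile at proportion p in [0, 1]."""
    ordered = sorted(scores)
    pos = p * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

def equate(x_scores, y_scores, x):
    """Test Y score with the same percentile rank as score x on Test X."""
    return quantile(y_scores, percentile_rank(x_scores, x))

def bootstrap_interval(x_scores, y_scores, x, n_boot=1000, level=0.95, seed=0):
    """Percentile-bootstrap interval for the equated score (assumed method)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        # Resample each test's scores with replacement, then re-equate.
        xs = rng.choices(x_scores, k=len(x_scores))
        ys = rng.choices(y_scores, k=len(y_scores))
        draws.append(equate(xs, ys, x))
    alpha = (1.0 - level) / 2.0
    return quantile(draws, alpha), quantile(draws, 1.0 - alpha)

# Hypothetical scores: Test Y is Test X shifted up by 5 points.
x_scores = list(range(0, 21))
y_scores = [s + 5 for s in x_scores]

equated = equate(x_scores, y_scores, 10)
low, high = bootstrap_interval(x_scores, y_scores, 10)
print("score 10 on Test X equates to", equated, "on Test Y")
print("bootstrap 95% interval:", (round(low, 2), round(high, 2)))
```

With real data the two score distributions would usually be smoothed first, and the bootstrap would resample examinees according to the data collection design.
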

Nov6

Bayesian inference of Rasch and IRT models (WinBugs files: lsat.odc, bones.odc)

Nov13

Bayesian inference of Cognitive models
-- Multinomial Processing Tree (MPT) models.
Relevant Reference:

Batchelder, W.H. & Riefer, D.M. (1999). Theoretical and empirical review of multinomial process tree modeling. Psychonomic Bulletin & Review, 6, 57-86.

Nov20

 

Computer adaptive testing (CAT), Item banking, and Standard Setting.
Relevant References:
Cizek, G.J. (1996). Setting passing scores. Educational Measurement: Issues and Practice, 15, 20-31.
Cizek, G.J., Bunch, M.B., & Koons, H. (2004). Setting performance standards: Contemporary methods. Educational Measurement: Issues and Practice, 23(4), 31-50.
Meijer, R.R., & Nering, M.L. (1999). Computer adaptive testing: Overview and introduction. Applied Psychological Measurement, 23(3), 187-194. Special Issue on Computerized adaptive testing.
Ward, A.W., & Murray-Ward, M. (1994). Guidelines for the development of item banks. An NCME instructional module. Educational Measurement: Issues and Practice, 13(1), 34-39.

Nov27 Student Presentations of final paper
Dec4 Student Presentations of final paper
Dec12 FINAL PAPER DUE (Exam week)
Please leave paper in my mailbox in Room 3233, or under my office door at Room 1034.

Grading Policy:
The (take-home) Midterm Exam is worth 40% of the final grade, and the data analysis presentation and paper are worth the remaining 60%. Final grades will be given out according to the following scale:

A    90% - 100%
B    79% - 89%
C    68% - 78%
D    57% - 67%
F    56% or lower

Students will spend substantial amounts of time reading, and on the computer. It is assumed that students will exert individual initiative in solving computing/analysis problems as they arise. There are no exceptions to the above grading scale, and no extra credit work will be accepted. Incompletes will be considered for students with extenuating circumstances. Poor performance on assignments will not be considered in a request for an incomplete.

ASSIGNMENTS:
A) Mid-Term: Computer-Based (Take-Home) Exam (40% of total grade)
B) Data Analysis Presentation (30% of total grade)
C) Data Analysis Paper (30% of total grade)

A. Computer-Based Exam (40% total):
You will be tested on your ability to use R, WINSTEPS, FACETS, SPSS, and S-PLUS to perform psychometric analyses of real data sets, and answer questions concerning the interpretation of these analyses.

B,C. Data Analysis Presentation (30%) and Paper (30%)
-- The data analyses and paper will consist of the relevant output from the software programs and a complete report stating the results.
-- You may supply your own data or you may solicit faculty (education or other) for data.
-- The paper must be 10-15 double-spaced pages, using 1-inch margins, and in APA format (computer-generated output must be
placed in the Appendix, and is not part of the 10-15 page limit).
-- The presentation has a limit of 20 minutes (about 15 PowerPoint slides).

Both the presentation and paper must include:

Introduction -
Describe in detail the substantive problem you will be solving in this research study,
and describe the rationale/theory underpinning the data you will analyze (5 points).

Methods - (not necessarily in the following order).
-- Describe sample characteristics (3 points).
-- Describe the items on your test(s) (including their number and scoring format) (3 points).
-- Describe the unidimensional variable(s) you intend to measure with the test(s) (3 points).
-- For data analysis, use two of the following psychometric models:
(1) Nonparametric regression model.
(2) Hierarchical Linear Model. For example, the Rasch model or the FACETS model.
-- Fully describe these models (15 points).
-- Fully describe the methods you will use to investigate the unidimensionality,
invariant item ordering, reliability, and item bias of each of your test(s) (15 points).
-- Also, if you intend to equate test scores, fully describe the equating methods you will implement
(use either equipercentile equating, Rasch item equating, or both).

Results - (not necessarily in the following order).
-- Discuss the amount of evidence for unidimensionality (10 points), reliability (10 points) and validity of your test(s) (10 points),
and justify any modifications you make to your test (removing items, removing persons, etc…).

Discussion - (not necessarily in the following order).
-- What modifications (if any) would improve the instrument? (3 points)
-- What are the implications of your study, with respect to the measurement and applications in the field of interest? (3 points)

Everyone starts with 80 points. I will deduct points from each section if you interpret your results incorrectly,
or if you omit or only partially report any of the information we have covered in class that is relevant to your particular investigation.
Please provide appropriate handouts and develop meaningful overheads for your presentation.

Disability Services:
UIC strives to ensure the accessibility of programs, classes, and services to students with disabilities. Reasonable accommodations can be arranged for students
with various types of disabilities, such as documented learning disabilities, vision, or hearing impairments, and emotional or physical disabilities. If you need
accommodations for this class, please let your instructor know your needs and he/she will help you obtain the assistance you need in conjunction with the
Office of Disability Services (1190 SSB, 413-2183).