EPSY 514 - Nonparametric Modeling

4 credit hours

Semester: Fall 2010
Time: Monday 5:00-8:00pm
Room: EPASW 3427 (1040 W. Harrison St.)
Office Hours: Tue 2-4pm (EPASW 1034)

Professor: George Karabatsos
Phone: 312-413-1816
E-mail: georgek@uic.edu
CRN: 29638

Course Description:
Nonparametric models can provide accurate data analyses because they make minimal assumptions about the data-generating process.
This course covers nonparametric models for two related tasks that are central to statistical inference: density estimation and regression.
Nonparametric density estimation provides a method to infer the (unknown) true population distribution underlying a sample of data, without making restrictive assumptions about the shape of this distribution (for example, that it is normal, has only one mode, or is not skewed).
Nonparametric regression provides a method to estimate the (unknown) true regression curve, which represents the actual relationship between a covariate and the outcome variable, without making assumptions about the shape of that curve in the population of subjects (e.g., without assuming that it is linear).
Finally, the course covers semiparametric regression, which provides a flexible way to model many predictor variables while relaxing the assumptions of linear models and generalized linear mixed models. These assumptions include normally distributed errors, normally distributed random effects, and a link function defined by a known (e.g., logistic) cumulative distribution function.
In contrast, parametric models usually make strong assumptions about the true data-generating process. For example, the ANOVA model assumes that the data arise from normal distributions; the classical regression model assumes that each covariate has a linear relationship with the outcome variable and that the errors are normally distributed; and the linear mixed model additionally assumes that the random effects are normally distributed. While such simplifying assumptions provide mathematical elegance, they can be incorrect in practice (e.g., the true distribution of the data is often not normal, and the true relationship between two variables is often not linear).

Students who complete this course will be able to use a range of flexible statistical models to analyze almost all types of data sets that arise in the social and health sciences. Specifically, the course covers nonparametric density estimation, nonparametric regression, and semiparametric regression, using estimation methods based on the kernel approach, the spline approach, or a nonparametric prior distribution assigned to the space of all distributions (e.g., a Dirichlet Process prior) that defines an infinite-mixture model. The regression models covered include additive and generalized additive models, linear and generalized linear mixed models, and median regression models.

This course places strong emphasis on the practical application of nonparametric and semiparametric models to the analysis of various data sets, while providing the necessary theoretical background for these models. Data analysis applications will use appropriate packages for the R software, including mgcv (for generalized additive modeling) and DPpackage (for Bayesian nonparametric density estimation and regression).
Course readings (listed below) are all available at no cost, and can be downloaded from the UIC library web site.

Prerequisites: At least two graduate courses in statistics.

Background Readings:
Escobar, M.D. (2007). Applied Bayesian Methods. Course notes in PowerPoint (many thanks to Michael Escobar for sharing this material).
Gutiérrez-Peña, E., & Walker, S.G. (2005). Statistical Decision Problems and Bayesian Nonparametric Methods. International Statistical Review, 3, 309-330.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd Ed.). New York: Springer-Verlag. (available by Google search)
Müller, P., & Quintana, F.A. (2004). Nonparametric Bayesian Data Analysis. Statistical Science, 19(1), 95-110.
Parzen, E. (2004). Quantile Probability and Statistical Data Modeling. Statistical Science, 19, 652-662.
Schucany, W.R. (2004). Kernel smoothers: An overview of curve estimators for the first graduate course in nonparametric statistics. Statistical Science, 19(4), 663-675.
Sheather, S.J. (2004). Density estimation. Statistical Science, 19(4), 588-597.
Walker, S.G., Damien, P., Laud, P.W., & Smith, A.F.M. (1999). Bayesian nonparametric inference for random distributions and related functions. Journal of the Royal Statistical Society, Series B, 61, 485-527.
Wood, S.N. (2004). Stable and efficient multiple smoothing parameter estimation for generalized additive models. Journal of the American Statistical Association, 99, 673-686.
Wood, S.N. (2008). Fast stable direct fitting and smoothness selection for generalized additive models. Journal of the Royal Statistical Society, Series B, 70, 495-518.

Additional references for course readings are provided in the documentation for the R packages.

Graded assignments:
-- Two take-home exams (mid-term and final exam) involving applications of nonparametric models to analyze various data sets.
-- One in-class presentation that describes an application of one or more nonparametric models to the analysis of a data set (a suggested outline for the presentation is provided below, after the course schedule).

COURSE SCHEDULE

Date | Topic | Suggested Readings

Aug23

Overview of assignments and course tasks. Introduction and motivation for Nonparametric Statistical Inference.
Review of Theory of Probability and Statistical Inference

Course Notes

Aug30

Review of Theory of Probability and Statistical Inference (continued)

Course Notes
Sep6 Labor Day, no class.
Sep13

Density Estimation, frequentist approaches.
-- The histogram.
-- Kernel density estimation.
-- Bootstrap approaches to quantify the uncertainty of kernel density estimates (classical bootstrap and Bayesian bootstrap).
-- Comparing densities, cumulative distribution functions, and hazard rates between two or more independent samples.
-- Quantile estimation.
-- Multivariate Density Estimation

Sheather
Parzen
Course notes
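To make the kernel density estimator listed for Sep13 concrete, here is a minimal sketch in Python (the course itself uses R; Python is used only for illustration). It assumes a Gaussian kernel, so the estimate is f_hat(x) = (1 / (n h)) * sum_i K((x - x_i) / h), and Silverman's rule-of-thumb bandwidth h = 0.9 * min(s, IQR/1.34) * n^(-1/5), the rule-of-thumb form behind R's default bandwidth (bw.nrd0). The function names and the toy sample are hypothetical, and these defaults are not necessarily the choices made in the course notes.

```python
import math
import statistics

def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(data)
    sd = statistics.stdev(data)
    # 'inclusive' interpolation for the quartiles (needs Python 3.8+).
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)

def gaussian_kde(data, h=None):
    """Return f_hat(x) = (1 / (n h)) * sum_i phi((x - x_i) / h), phi = N(0, 1) density."""
    if h is None:
        h = silverman_bandwidth(data)
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def f_hat(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return f_hat

if __name__ == "__main__":
    sample = [1.2, 1.9, 2.1, 2.4, 3.0, 5.5, 5.8, 6.1]  # hypothetical toy sample
    f = gaussian_kde(sample)
    for x in (0.0, 2.0, 4.0, 6.0):
        print(f"f_hat({x}) = {f(x):.4f}")
```

The bandwidth h controls the smoothness of the estimate: a smaller h reveals more modes, and a larger h smooths them away, which is why bandwidth selection is a central topic in density estimation.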

Sep20

Density Estimation, hierarchical Bayesian nonparametric approaches.
-- The Dirichlet Process prior and Dirichlet Process mixtures.
-- Pólya Tree prior.
-- Multivariate Density Estimation

Müller & Quintana
Walker et al.

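The Dirichlet Process prior listed for Sep20 can be made concrete with Sethuraman's stick-breaking representation: a draw G ~ DP(alpha, G0) is G = sum_k w_k * delta(theta_k), where the atoms theta_k are drawn independently from the base measure G0, v_k ~ Beta(1, alpha), and w_k = v_k * prod_{j<k} (1 - v_j). The Python sketch below truncates the sum at K atoms. The truncation level, the standard normal base measure, and the function names are illustrative assumptions; this is not DPpackage's implementation.

```python
import random

def stick_breaking_dp(alpha, base_sampler, K=200, rng=None):
    """Truncated stick-breaking draw G = sum_k w_k * delta(theta_k) from DP(alpha, G0)."""
    rng = rng or random.Random()
    weights, atoms = [], []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(K - 1):
        v = rng.betavariate(1.0, alpha)  # v_k ~ Beta(1, alpha)
        weights.append(remaining * v)    # w_k = v_k * prod_{j<k} (1 - v_j)
        atoms.append(base_sampler(rng))  # theta_k ~ G0
        remaining *= 1.0 - v
    # Give the leftover stick to a final atom so the weights sum to 1 (up to rounding).
    weights.append(remaining)
    atoms.append(base_sampler(rng))
    return weights, atoms

def sample_from_G(weights, atoms, n, rng=None):
    """Draw n values from the discrete random distribution G."""
    rng = rng or random.Random()
    return rng.choices(atoms, weights=weights, k=n)

def standard_normal(r):
    """Assumed base measure G0 = N(0, 1)."""
    return r.gauss(0.0, 1.0)

if __name__ == "__main__":
    rng = random.Random(2010)
    w, theta = stick_breaking_dp(alpha=2.0, base_sampler=standard_normal, K=200, rng=rng)
    draws = sample_from_G(w, theta, n=20, rng=rng)
    print("largest weights:", sorted(w, reverse=True)[:5])
    print("distinct values among 20 draws:", len(set(draws)))
```

Because G is discrete, repeated draws from it share values; a Dirichlet Process mixture exploits this clustering by pairing each atom with a kernel (e.g., a normal density), and a smaller alpha concentrates the weight on fewer atoms.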
Sep27

Nonparametric Regression, kernel-based approaches.
-- The Kernel approach.
-- Locally-weighted Scatter-Plot Smoothing (LOESS)
-- Isotonic regression using the pooled-adjacent-violators algorithm.
-- Bootstrap approaches to quantify the uncertainty of regression estimates (classical bootstrap and Bayesian bootstrap).

Schucany
Hastie et al.
Course Notes
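The pooled-adjacent-violators algorithm listed for Sep27 finds the nondecreasing sequence closest, in (weighted) least squares, to the responses ordered by the covariate. It scans left to right and, whenever a block's mean falls below the mean of the block before it, pools the two blocks into their weighted mean, repeating backwards until the sequence is monotone. A minimal Python sketch follows; it assumes the responses are already sorted by the covariate and uses equal weights by default. The function name is hypothetical.

```python
def pava(y, w=None):
    """Isotonic (nondecreasing) least-squares fit via pooled adjacent violators.

    Assumes y is ordered by the covariate; w are optional positive weights.
    """
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the block means back to one fitted value per observation.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

if __name__ == "__main__":
    y = [1.0, 3.0, 2.0, 4.0, 3.5, 5.0]
    print(pava(y))  # → [1.0, 2.5, 2.5, 3.75, 3.75, 5.0]
```

Each observation is pushed once and each pooling step removes a block, so the algorithm runs in linear time after sorting.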

Oct4

Nonparametric Regression, spline-based approaches.
-- Generalized additive modeling (frequentist and Bayesian approaches).
-- Regression for a dependent variable that is either continuous-valued, binary-valued (0-1), or count-valued.

Hastie et al.
Wood
Course Notes

Oct11

Semiparametric Regression, Bayesian approaches.
-- Semiparametric generalized linear mixed-effects modeling using Dirichlet Process mixtures.
-- Regression for a dependent variable that is either continuous-valued, binary-valued (0-1), ordinal-valued, or count-valued.
-- Regression using the Dependent Dirichlet Process.

Course Notes
Oct18

Semiparametric Regression, Bayesian approaches.
-- Linear median regression modeling with a mixture of Pólya Trees prior for the error distribution.
-- Applications of the model to longitudinal data analysis and to meta-analysis.
TAKE-HOME MIDTERM EXAM DUE

Course Notes
Oct25

Semiparametric Regression, the Bayesian hierarchical approach.
-- Linear median regression modeling with a mixture of Pólya Trees prior for the error distribution.

Course Notes

Nov1 Psychometric applications: infinite-mixture Rasch modeling using the Dirichlet Process.
Nov8 Student Presentations (also, I will discuss other topics)
Nov15 Student Presentations (also, I will discuss other topics)
Nov22 Student Presentations (also, I will discuss other topics)
Nov29 Student Presentations (also, I will discuss other topics)
Dec7 TAKE-HOME FINAL EXAM DUE (Exam week)
Please leave your exam in my mailbox in Room 3233, or under my office door in Room 1034.


Grading Policy:
Of the final grade, the Midterm Exam, the class presentation, and the Final Exam are each worth 30%, and class participation is worth 10%. Final grades will be assigned according to the following scale:

A  90% - 100%
B  79% - 89%
C  68% - 78%
D  57% - 67%
F  56% and lower

Students will spend substantial amounts of time reading, and using the computer.
I can only accept hard copies of the completed exams (please, no electronic copies).
Incomplete grades will be considered for students with extenuating circumstances (poor performance on assignments will not be considered in a request for an incomplete).


Outline For Data Analyses Presentation:
The presentation, which should be 25 minutes long (about 15 PowerPoint slides, no more), will present an application of one or more nonparametric models to the analysis of a real data set. The presentation should include at least the following:

INTRODUCTION
-- Describe in detail the substantive problem you will be solving in this research study, and the rationale/theory underpinning the data you will analyze. (10 points)

METHODS -
-- Describe sample characteristics. (5 points)
-- Fully describe the nonparametric model(s) you will use to answer your research questions (in words and mathematical notation), and discuss the assumptions of your model. (10 points)
-- Describe the parameters you will interpret to answer your research questions. (10 points)

RESULTS - Fully describe the results of your nonparametric model(s). (25 points)

DISCUSSION - What are the implications of the results of your study, and potential future directions with this research? (10 points).


Disability Services:
UIC strives to ensure the accessibility of programs, classes, and services to students with disabilities. Reasonable accommodations can be arranged for students with various types of disabilities, such as documented learning disabilities, vision or hearing impairments, and emotional or physical disabilities. If you need accommodations for this class, please let your instructor know your needs, and he/she will help you obtain the assistance you need in conjunction with the Office of Disability Services (1190 SSB, 413-2183).