Semester: Fall 2010
Professor: George Karabatsos
Course Description:
Nonparametric models can provide accurate methods of data analysis, because
they make minimal assumptions about the data-generating process.
This course covers nonparametric models for two related tasks that are central
to statistical inference: density estimation and regression.
Nonparametric density estimation provides a method to infer the (unknown)
true population distribution underlying a sample of data, without making
restrictive assumptions about the shape of this true distribution (for example,
that it is normal, has only one mode, or is not skewed).
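As a rough illustration of the density estimation idea described above, the following is a minimal sketch of a kernel density estimate, f_hat(x) = (1 / (n h)) * sum of K((x - x_i) / h) over the sample. The Gaussian kernel, the bandwidth value, and the small bimodal sample are assumptions chosen for illustration only; they are not taken from the course materials.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function x -> estimated density at x, using a Gaussian kernel."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian bumps centered at each observation.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


# Hypothetical bimodal sample (made up for this sketch).
sample = [-2.1, -1.9, -2.0, -1.8, 2.0, 2.2, 1.9, 2.1]
f_hat = gaussian_kde(sample, bandwidth=0.5)
```

The estimate places mass near both clusters without assuming a single mode; a smaller bandwidth gives a rougher estimate and a larger one a smoother estimate.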
Nonparametric regression provides a method to estimate the (unknown)
true regression curve representing the actual relationship between a covariate
and the outcome variable, without making assumptions about the shape of the
true curve in the population of subjects (e.g., without making assumptions that
the true curve in the population is linear).
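One simple way to estimate a regression curve without assuming linearity is a kernel-weighted local average (the Nadaraya-Watson estimator). The sketch below is only an illustration under assumed settings: the Gaussian weights, the bandwidth, and the noise-free sine data are hypothetical choices, not the course's own example.

```python
import math


def nadaraya_watson(xs, ys, bandwidth):
    """Return a function x0 -> kernel-weighted average of ys near x0."""

    def predict(x0):
        # Observations close to x0 receive larger Gaussian weights.
        weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
        total = sum(weights)
        return sum(w * y for w, y in zip(weights, ys)) / total

    return predict


# Hypothetical data: a nonlinear (sine) relationship on an even grid.
xs = [i / 10 for i in range(63)]
ys = [math.sin(x) for x in xs]
m_hat = nadaraya_watson(xs, ys, bandwidth=0.3)
```

Calling `m_hat(x0)` returns the smoothed value at `x0`; the estimate tracks the curved relationship, with some flattening of peaks that grows with the bandwidth.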
Finally, the course also covers Semiparametric Regression, which provides
a flexible way to model many predictor variables, while relaxing the assumptions
of linear models and generalized linear mixed models. These assumptions include
normally-distributed errors, normally-distributed random effects, and the assumption
that the link function is defined by a known (e.g., logistic) cumulative distribution
function.
In contrast, parametric models of statistical inference usually make strong
assumptions about the true data-generating process. For example, the ANOVA model
assumes that the data arise from normal distributions, and the classical regression
model assumes that each covariate has a linear relationship with the outcome
variable, that the random effects are normally distributed, and that the errors
are normally distributed. While such simplifying assumptions provide mathematical
elegance, they can be incorrect in practice (e.g., the true distribution of the
data is usually not normal, and the true relationship between two variables is
usually not linear).
Students who complete this course will be able to use various flexible statistical
models to analyze almost all types of data sets that arise in the social and health
sciences. Specifically, this course covers nonparametric density estimation,
nonparametric regression, and semiparametric regression, using estimation methods
based on the kernel approach, the spline approach, or a nonparametric prior
distribution assigned to the space of all distributions (e.g., Dirichlet Process
priors) to define an infinite-mixture model. Importantly, the regression models
this course covers include additive and generalized additive models, linear and
generalized linear mixed models, and median regression models.
This course places strong emphasis on the practical applications of nonparametric
and semiparametric models for the analysis of various data sets, while providing the necessary theoretical background for these models. Data analysis applications will involve the use of appropriate packages available for the R software, including the mgcv package (for generalized additive modeling), and the DPpackage (for Bayesian nonparametric density estimation and regression).
Course readings (listed below) are all available at no cost, and can be downloaded from the UIC library web site.
Prerequisites: At least two graduate courses in statistics.
Background Readings:
Escobar, M.D. (2007). Applied Bayesian Methods. Course notes in PowerPoint
(many thanks to Michael Escobar for sharing this information).
Gutiérrez-Peña, E., & Walker, S.G. (2005). Statistical Decision
Problems and Bayesian Nonparametric Methods. International Statistical Review,
3, 309-330.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd Ed.). New York: Springer-Verlag. (available by Google search)
Müller, P., & Quintana, F.A. (2004). Nonparametric Bayesian Data Analysis.
Statistical Science, 19 (1), 95-110.
Parzen, E. (2004). Quantile Probability and Statistical Data Modeling. Statistical Science, 19, 652-662.
Schucany, W.R. (2004). Kernel smoothers: An overview of curve estimators for
the first graduate course in nonparametric statistics. Statistical Science,
19(4), 663-675.
Sheather, S.J. (2004). Density estimation. Statistical Science, 19(4),
588-597.
Walker, S.G., Damien, P., Laud, P.W., & Smith, A.F.M. (1999). Bayesian nonparametric
inference for random distributions and related functions.
Journal of the Royal Statistical Society, Series B, 61, 485-527.
Wood, S.N. (2004). Stable and efficient multiple smoothing parameter estimation for generalized additive models. Journal of the American Statistical Association, 99, 673-686.
Wood, S.N. (2008). Fast stable direct fitting and smoothness selection for generalized additive models. Journal of the Royal Statistical Society, Series B, 70, 495-518.
Additional references for course readings are provided in the documentation for the R packages.
Assignments to earn a grade:
-- Two take-home exams (mid-term and final exam) involving applications of nonparametric models to analyze various data sets.
-- One in-class presentation that describes an application of one or more nonparametric models for the analysis of a data set (a suggested outline for the presentation is provided below, under the course schedule).
COURSE SCHEDULE
| Date | Topic | Suggested Readings |
| Aug23 | Assignments/tasks. Introduction and motivation for Nonparametric Statistical Inference. Review of Theory of Probability and Statistical Inference | Course Notes |
| Aug30 | Review of Theory of Probability and Statistical Inference (continued) | Course Notes |
| Sep6 | Labor Day, no class. | |
| Sep13 | Density Estimation, frequentist approaches. | Sheather |
| Sep20 | Density Estimation, hierarchical Bayesian nonparametric approaches. | Müller & Quintana; Walker et al. |
| Sep27 | Nonparametric Regression, kernel-based approaches. -- The Kernel approach. -- Locally-weighted Scatter-Plot Smoothing (LOESS) -- Isotonic regression using the pooled-adjacent-violators algorithm. -- Bootstrap approaches to learn the uncertainty of regression estimates (Classical bootstrap and Bayesian bootstrap). | Schucany |
| Oct4 | Nonparametric Regression, spline-based approaches. | Hastie et al. |
| Oct11 | Semiparametric Regression, Bayesian approaches. | Course Notes |
| Oct18 | Semiparametric Regression, Bayesian approaches. -- Linear median regression modeling with a mixture of Pólya Trees prior for the error distribution. -- Applications of the model to longitudinal data analysis, and for meta-analysis. TAKE-HOME MIDTERM EXAM DUE | Course Notes |
| Oct25 | Semiparametric Regression, the Bayesian hierarchical approach. | Course Notes |
| Nov1 | Psychometric applications: Infinite mixture Rasch modeling using the Dirichlet Process. | |
| Nov8 | Student Presentations (also, I will discuss other topics) | |
| Nov15 | Student Presentations (also, I will discuss other topics) | |
| Nov22 | Student Presentations (also, I will discuss other topics) | |
| Nov29 | Student Presentations (also, I will discuss other topics) | |
| Dec7 | TAKE-HOME FINAL EXAM DUE (Exam week). Please leave exam in my mailbox in Room 3233, or under my office door at Room 1034. | |
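The schedule above lists isotonic regression using the pooled-adjacent-violators algorithm among the kernel-week topics. As a minimal sketch of that algorithm (the function name `pava`, the optional weights argument, and the toy data are assumptions made for illustration, not course code), the following computes a nondecreasing least-squares fit by repeatedly pooling adjacent values that violate the ordering:

```python
def pava(y, w=None):
    """Nondecreasing least-squares fit to y via pooled adjacent violators."""
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool backwards while the last two blocks are out of order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    # Expand each pooled block back to one fitted value per observation.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


# Hypothetical outcomes, ordered by increasing covariate value.
fitted = pava([1, 3, 2, 4, 3, 5])
```

Here the out-of-order pairs (3, 2) and (4, 3) are each replaced by their average, so `fitted` is `[1.0, 2.5, 2.5, 3.5, 3.5, 5.0]`.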
Grading Policy:
Of the final grade, the Midterm Exam, the class presentation, and the Final Exam are each worth 30% (I can only accept hard copies of the completed exams), and class participation is worth 10%. Final grades will be assigned according to the following scale:
| A | 90% - 100% |
| B | 79% - 89% |
| C | 68% - 78% |
| D | 57% - 67% |
| F | 56% or lower |
Students will spend substantial amounts of time reading, and using the computer.
I can only accept hard copies of the completed exams (please, no electronic
copies).
Incomplete grades will be considered for students with extenuating circumstances
(poor performance on assignments will not be considered in a request for an
incomplete).
Outline For Data Analyses Presentation:
The presentation, which should be 25 minutes in length (about 15 PowerPoint slides; no more), will deal with an application of one or more nonparametric models for the analysis of a real data set. The presentation should (at least) include:
INTRODUCTION
-- Describe in detail the substantive problem you will be solving in this research study,
and the rationale/theory underpinning the data you will analyze. (10 points)
METHODS
-- Describe sample characteristics. (5 points)
-- Fully describe the nonparametric model(s) you will use to answer your research questions
(using words and mathematical notation),
and include a discussion of the assumptions of your model. (10 points)
-- Describe the parameters you will interpret to answer your research questions. (10 points)
RESULTS - Fully describe the results of your nonparametric model(s). (25 points)
DISCUSSION - What are the implications of the results of your study, and what are potential future directions for this research? (10 points)
Disability Services:
UIC strives to ensure the accessibility of programs, classes, and services to
students with disabilities. Reasonable accommodations can be arranged for students
with various types of disabilities, such as documented learning disabilities,
vision or hearing impairments, and emotional or physical disabilities. If you
need accommodations for this class, please let your instructor know your needs
and he/she will help you obtain the assistance you need in conjunction with
the Office of Disability Services (1190 SSB, 413-2183).