Fall 2006, Call 24370, Wednesday 3:00-5:30 PM, DH210
Yair M. Babad, UH 2403, Phone 312-996-8094, Cell 310-431-6729, Fax 312-413-0385
e-mail: ybabad@uic.edu, URL: http://www.uic.edu/~ybabad
Office Hours Wednesday 2:00-3:00 PM
Updated: 8/29/2006 20:09:12
COURSE OBJECTIVE & PHILOSOPHY
One of the most profound results of the information technology revolution is the
explosion in data and information availability. Effective use of this
information, including the operational use of the information and its use for
prediction, planning and control, is for many organizations a critical need.
This course is devoted to discovering meaningful patterns in the data so
that it can be effectively used in business intelligence. Data Mining (DM) is a
user-centric, interactive process that leverages analytical and statistical
technologies and computing power. It is widely used in business, e.g., for
Customer Relationships Management (
This course is not a course in statistics, nor is it a course in information
technology. It is a survey course of the techniques and tools used to extract
meaning and glean useful patterns from available data. The objective is to
increase your awareness of these and make them an indispensable element
of your professional “personality”.
TEXTBOOKS
Two Wiley books by Michael J. A. Berry and Gordon S. Linoff are the required texts for the course: Data Mining Techniques for Marketing, Sales and Customer Support, 2nd Edition (2004, ISBN 0-471-47064-3) [DMT], and Mining the Web: Transforming Customer Data into Customer Value (2001, ISBN 0-471-41609-6) [Web].
A recommended text is Data Mining: Concepts and Techniques, 2nd Edition by J. Han and M. Kamber (2006, Morgan Kaufmann Publishers, ISBN 1-55860-901-6) [Han].
To demonstrate some of the discussed techniques, we will use
Clementine, a data mining package by
Other worthwhile resources, should you be interested, include:
For additional resources, look at the DMT web site at www.data-miners.com/companion, and at http://www.data-miners.com/resources/suggested.html.
My web page has PowerPoint presentations
for all the material that I will introduce in class. These summarize the
contents of the textbook, in addition to other material that will be discussed in
class. You are advised to print these presentations (probably with 3 or 6
slides per page, framed, in black and white printing format) prior to class, so
that you can use them in class in lieu of notes. You are responsible for
knowing the contents of these transparencies as well as the textbook’s material
(and of course whatever is discussed in class).
COMMUNICATIONS & PREREQUISITES
I believe that open communications
channels between all of us add significantly to the value of the class. You are
welcome to contact me – preferably via e-mail. In particular, ALL questions and
comments are welcome. All communications between us will use electronic mail.
The assignments and other course materials can be printed out from the World
Wide Web, at my URL given above.
All assignments and other submissions sent to me will have a filename in the format 572_AssignmentDescription_LastName_MMDDYY.extension, where “MMDDYY” is the submission date. Similarly, all e-mail messages to me should have as the subject line 572_LastName_SubjectDescription.
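The naming convention above can be sketched in a few lines of Python. This is only an illustration: the function names, the sample description "RiskHomework", and the last name "Smith" are hypothetical, and the only parts taken from the syllabus are the 572 prefix, the field order, and the MMDDYY date format.

```python
from datetime import date


def submission_filename(description, last_name, submitted, extension, course="572"):
    """Build 572_AssignmentDescription_LastName_MMDDYY.extension."""
    # %m%d%y gives a zero-padded month, day, and two-digit year (MMDDYY).
    stamp = submitted.strftime("%m%d%y")
    return f"{course}_{description}_{last_name}_{stamp}.{extension}"


def email_subject(last_name, subject, course="572"):
    """Build the e-mail subject line 572_LastName_SubjectDescription."""
    return f"{course}_{last_name}_{subject}"


# Hypothetical example: a homework submitted on October 30, 2006.
print(submission_filename("RiskHomework", "Smith", date(2006, 10, 30), "doc"))
print(email_subject("Smith", "QuestionOnProject"))
```

The date is passed in explicitly rather than read from the system clock, so the filename reflects the actual submission date.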
The approach taken in this course is
pragmatic, rather than theoretical or technical, with the objective of
increasing your familiarity with the course topics on the one hand, and your
critical understanding of the material on the other. I do not intend to
"read the text in class". Rather, I will emphasize certain issues,
and will respond to your questions. You must read on your own and be familiar
IN ADVANCE OF EACH CLASS with the assigned material as given in the schedule, and
with the class notes available on my web page. The course will be
discussion-oriented, with emphasis on discussions geared to the case studies at
the end of each chapter.
A common theme in my courses is the
development of your communications skills and use of available computer
technology and common software tools. Assignments should all be typed (using
computerized office tools) and be professionally presentable; hand-written
assignments will not be graded. Your work must follow the
standards specified in the PRESHINT.DOC
file on my web site. You are expected to submit your work using word-processing
and spreadsheet tools.
All homework will be submitted
electronically via e-mail. It must be in my reader by midnight Monday
preceding the class in which it must be submitted, at the latest. Assignment
due-dates as given above or in class will be strictly adhered to and late
assignments will not be accepted, unless prearranged with me.
Virus-infected submissions will be deleted and not graded, with no
opportunity for resubmission.
I maintain a web page for this class. To reach it, go to my URL listed above and select this class; you will find yourself in an "announcement file" for this course. This file includes references to related documents, such as this syllabus, homework, and PowerPoint presentations of class material, in addition to the latest announcements related to the class.
The course assumes that different students have different levels of understanding and background of the course's topics, yet we will present the topics at an advanced level. Students with little familiarity with the material are expected to prepare themselves to fully understand the material and contribute to course work and discussions. You are always welcome to discuss this (and all other issues) with me.
ASSIGNMENTS, QUIZZES
Assignments will usually be based on the
case studies at the end of the text's chapters, and will be announced in class.
Homework solutions will be discussed in class on the date they are due;
therefore, late submissions of homework assignments will not be accepted. Note
that homework will be based, to a large extent, on material you are supposed to
read for the next class, and will be discussed in class only after you submit
the homework, in order to let you exercise your own judgment and understanding.
There will be a team-oriented data mining course project. The project will include
data collection and scrubbing, model building, and data analysis and
presentation. The project, and its various segments, will also be discussed
in class. Each team will provide a final project report, in addition to
intermediate reports at the end of various segments of the project. The last
class will be devoted to presentations of the projects. Following that, a
presentation will be made to the public and to members of the Center for
Research in Information Management (CRIM) [it is a required element of the
course].
There will be no exams in this course. Rather, each class session (except the first
one) may include a brief open book quiz, which stresses understanding of the
required reading material and the material covered in the last class. This
system allows timely grade progress feedback, and motivates you to prepare for each
session (and thus increases the probability of quality participation and getting
the most from the class sessions).
CLASS ATTENDANCE
You are expected to attend all classes, and are responsible for all announcements made in class or in the announcement file. Makeup of quizzes or reports will be given only by approval PRIOR to the quiz or report, except for extreme circumstances. Punctuality is highly regarded; no student, if arriving late, will be given any extra time to complete a quiz, nor will makeup quizzes be offered.
The university's honor code will be adhered to. Submitted reports and homework may from time to time be checked for plagiarism. Cheating, plagiarism or copying will result in an automatic failing grade for the problem, quiz, exam or project for all those participating in the cheating or copying, and may lead to a failing grade in the course for all those students who are deemed to have consciously contributed to the cheating. To help you in maintaining the anti-plagiarism policy, you will be required to submit all your homework and reports to TurnItIn, a plagiarism assessment program, from which I will download your homework and reports. Note also that since I will be downloading this material only once a week, you must adhere to the submission timing requirements.
GRADING
Grades will be based on homework assignments and quizzes (equally weighted, and
possibly dropping the worst assignment and/or quiz), as well as the project.
The homework and quizzes will account for 50% of the final grade, and the
project for 50%. The Risk homework will count for 15% of the final grade; no
other single homework or quiz will count for more than 5% of the final grade.
If there is an insufficient number of homework assignments and quizzes, the
allocation of percentages between the project and the homework will be
adjusted. Final grades will be assigned on a curve, and I will exercise my
judgment as to the cut points, as well as to the grading of students who miss
or come late to many of the classes.
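The weighting arithmetic can be illustrated with a short Python sketch. It is not the official grading formula. It takes the 50% homework-and-quiz share, the 15% Risk homework, and the 5% cap from the text, and it assumes (this is not stated in the syllabus) that any percentage left over when there are too few items is moved to the project. The function name and parameters are hypothetical.

```python
from fractions import Fraction


def grade_weights(n_other_items, risk_weight=15, hw_quiz_total=50, cap=5):
    """Return (weight of each non-Risk item, project weight), in percent.

    n_other_items counts the homework assignments and quizzes other than
    the Risk homework that remain after any dropped item.
    """
    if n_other_items <= 0:
        raise ValueError("need at least one homework or quiz besides Risk")
    # Share left for the equally weighted items: 50% - 15% = 35%.
    remaining = hw_quiz_total - risk_weight
    # Equal split, but no single item may exceed the 5% cap.
    per_item = min(Fraction(cap), Fraction(remaining, n_other_items))
    # ASSUMPTION: whatever the cap leaves unallocated goes to the project.
    unallocated = remaining - per_item * n_other_items
    project = (100 - hw_quiz_total) + unallocated
    return per_item, project


# With 10 other items each counts 3.5% and the project stays at 50%.
print(grade_weights(10))
# With only 5 other items each is capped at 5%, leaving 10% to reassign.
print(grade_weights(5))
```

Exact fractions are used so that the weights always sum to 100% without rounding error.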
Don't nitpick about the grading. Persons who complain will not be rewarded for it; those who have the decency not to complain would deserve the same break. A request to look at one problem leads to re-grading of the whole paper, which often leads to a lower grade.
No "extra credit" opportunities will be offered or
assigned to specific individuals under any circumstances; all students' grades will
be based on the same components - this is an equal opportunity course.
TENTATIVE & APPROXIMATE COURSE SCHEDULE
(actual schedule will be determined by class advancement, and changes will be announced)
| Class Number | Class Date | Topic | Chapter (in week topic started) |
|---|---|---|---|
| 1 | Aug 30 | Introduction, DM Applications | DMT 1, 4; Web 1; Han 1 |
| 2 | Sep 6 | DM Methodology, CRISP-DM, Course Projects | DMT 2, 3 |
| 3 | Sep 13 | Cont. | |
| 4 | Sep 20 | Preparing Data for Mining, Clementine Overview | DMT 17; Han 2 |
| 5 | Sep 27 | Clementine Overview – cont.; Project – Initial Understanding Presentation | |
| 6 | Oct 4 | DM and Statistics, Hazard Functions, Survival Analysis | DMT 5, 12; Web 8 |
| 7 | Oct 11 | Hazard Functions, Survival Analysis – cont.; Project – Final Understanding Presentation; Risk Homework (submission Monday midnight before class 10) | |
| 8 | Oct 18 | Memory Based Reasoning, Market Basket Analysis, Link Analysis | DMT 8-10; Han 5 |
| 9 | Oct 25 | Decision Trees, Clustering; Project – Model Presentation | DMT 6, 11; Han 6-7 |
| 10 | Nov 1 | Cont.; Risk – Presentation | |
| 11 | Nov 8 | Neural Networks, Genetic Algorithms | DMT 7, 13 |
| 12 | Nov 15 | Mining Stream, Data Series and Sequences; Project – Evaluation Presentation | Han 8 |
| 13 | Nov 22 | DM and CRM | DMT 14; Web 7 |
| 14 | Nov 29 | Privacy and Societal Issues, the DM Environment, Putting DM to Work; Project – Final Report submission | DMT 15, 16, 18; Han 11 |
| 15 | Dec 6 | Project Presentations in Class | |
| | Thu, Dec 7 | Project presentation in the CRIM Students Projects Exposition | |
| *** | | *** Exams Week - No Final Exam *** | |