The University of Illinois at Chicago
Economics 346: Econometrics
Helen Roberts
Review Questions for the Midterm and Study
Plan
e-mail: hroberts@uic
Phone: (312) 355-0378 (I check messages from home).
These questions are worth 5 points each. Answers
in italics. For 5-point questions, I
look for 2-3 sentences, generally. One
to state your answer, and one to explain why.
I give partial credit, so the worst thing you can do is to leave the question
blank. 5 points means you gave a great
answer, 3-4 means you were basically right but left something out, 1-2 means
you wrote something that is correct, but didn’t get the answer. So write something relevant, even if you
aren’t sure you know the answer.
Sometimes, this helps you figure out the answer.
1. What should the slope of a regression line through the residuals be?
The slope
should be zero – the regression residuals are supposed (by assumption) to be
uncorrelated with each X variable. The
residual plots have an X variable on the horizontal axis and the residuals on
the Y axis, so a positive or negative slope would indicate that they are
correlated with that X variable, breaking one of the classical assumptions.
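To see this numerically, here is a minimal sketch in Python (the course itself uses Excel and SAS, so the language is an assumption, and the data are made up for illustration). The helper name ols is hypothetical. It fits a line, computes the residuals, and then fits a second line through the residuals against the same X.

```python
def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

# Made-up illustrative data (not from the course).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]

a, b = ols(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Regress the residuals on the same X variable.
_, resid_slope = ols(x, residuals)
print(f"slope of residuals on X: {resid_slope:.2e}")
```

The printed slope is zero up to floating-point rounding. Within the sample this holds by construction of least squares, so a visible slope or pattern in a residual plot is a sign that something else is wrong, such as a mis-specified functional form.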
2. True or false and explain: The
correlation between 2 variables is the same as a regression between 2
variables.
False (the T/F part is worth 1 point; the other 4 points come from the
explanation). When the statement is false, you need only give one difference
between correlation and regression. Here are a few (but not all the
possibilities):
1) Correlation is symmetric, regression is not: the correlation of X with Y is
the same as the correlation of Y with X, but regression gives a different
answer if you switch the dependent and independent variables (like in PS 2).
2) Correlation gives you information on the closeness of the relationship
between 2 variables, while regression gives you the sensitivity of Y to X
through the regression equation, as well as information on the closeness
through the R-squared value.
3) Regression uses the R-squared value and correlation uses r. For simple
regression, r times r is the same as the regression R-squared, but this is not
true for a multiple regression.
4) Regression can handle relationships with multiple factors influencing
another variable (our Y), but correlation works only with pairs.
For an open-ended question like this, I will give credit for right answers
even if they are not the answer I was originally thinking of when I wrote the
problem. So don’t worry, necessarily, if you have a different answer from your
friend.
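Two of the differences in the answer to question 2 (symmetry, and r times r equalling R-squared in simple regression) can be checked with a short sketch. It is written in Python with made-up data, and the function names fit and corr are hypothetical helpers, not anything from the course materials.

```python
import math

def fit(x, y):
    """Least-squares fit of y on x. Returns (intercept, slope, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    tss = sum((yi - my) ** 2 for yi in y)                       # total sum of squares
    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # sum of squared residuals
    return a, b, 1 - ssr / tss

def corr(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]

r_xy, r_yx = corr(x, y), corr(y, x)   # same number either way
_, b_yx, r2 = fit(x, y)               # slope from regressing Y on X
_, b_xy, _ = fit(y, x)                # slope from regressing X on Y

print(f"r(X,Y) = {r_xy:.4f}, r(Y,X) = {r_yx:.4f}")
print(f"slope of Y on X = {b_yx:.4f}, slope of X on Y = {b_xy:.4f}")
print(f"r*r = {r_xy ** 2:.4f}, R-squared = {r2:.4f}")
```

Swapping the variables leaves r unchanged but gives a different slope, and r times r matches the R-squared of the simple regression.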
3. You have run a regression, with 100
observations, on the relationship between prices paid and quantity bought, and
the t-statistic on the slope coefficient is -5.86. What does this tell you?
That the relationship between price and quantity is statistically significant:
the slope is different from zero (negative). This is a demand curve. Since the
t-distribution is similar to the normal distribution, you can use those rules,
especially with 100 observations, a large sample. So 95% of the probability is
within 2 standard deviations of the mean, etc. This t-stat of –5.86 is almost
6 standard deviations away from the supposed mean of zero, so the probability
of getting a value this far out if the true slope were zero is very small,
certainly smaller than the common significance levels of .05 or .01.
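As a rough check on the size of that probability, the sketch below uses the normal approximation the answer describes. It is a Python illustration under that stated assumption, not the exact t-test; the variable names are made up.

```python
import math

t_stat = -5.86  # t-statistic from the question

# Two-sided tail probability under the standard normal approximation:
# P(|Z| >= |t|) = erfc(|t| / sqrt(2)).
p_value = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"approximate two-sided p-value: {p_value:.1e}")
print("significant at .05:", p_value < 0.05)
print("significant at .01:", p_value < 0.01)
```

The exact t-distribution with 98 degrees of freedom has slightly fatter tails than the normal, so the exact p-value is somewhat larger, but it is still far below .01.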
4. How does the population regression
function differ from the sample regression function?
This is a population versus sample problem. The population regression function
is the ideal or “truth” that we are using the regression to discover. The
sample is a part of the population, the piece we have, a subset. The sample
regression function will approach the population regression function as the
sample size increases.
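A small simulation can illustrate the last point. This is a sketch in Python under assumed values: the "population" line Y = 1 + 2X, the error standard deviation of 1, the range of X, and the seed are all made up for the example, and sample_fit is a hypothetical helper.

```python
import random

def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def sample_fit(n, rng, alpha=1.0, beta=2.0, sigma=1.0):
    """Draw n observations from the assumed population line and fit a sample line."""
    x = [rng.uniform(0, 10) for _ in range(n)]
    y = [alpha + beta * xi + rng.gauss(0, sigma) for xi in x]
    return ols(x, y)

rng = random.Random(346)  # fixed seed so the run is repeatable
fits = {n: sample_fit(n, rng) for n in (10, 100, 10000)}

for n, (a, b) in fits.items():
    print(f"n = {n:>5}: intercept = {a:.3f}, slope = {b:.3f}")
```

Any single small sample can land close to the truth by luck, but on average the estimates tighten around the population values (1 and 2 here) as n grows.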
These questions are each worth 10 points. For more points, add in a few more sentences.
5. What is R-squared and what does it measure? (Include in your answer the
relationship of R-squared to something else, as well.)
Here you can give other names for R-squared (coefficient of determination,
square of the correlation coefficient in simple regression) and the formula
for R-squared: the ratio of explained sum of squares over total sum of
squares. You can also call it the ratio of regression sum of squares over
total sum of squares, so if you write RSS/TSS, make sure you tell me that the
R stands for Regression and not Residual (or, if you write ESS/TSS, that the E
stands for Explained and not Error). Or you can write the formula as
1-((Sum of squared residuals)/TSS). TSS is the sum of squared (Y-Ymean).
R-squared measures the proportion of the variation in Y explained by changes
in the X variables (the “interpret the regression” part relating to
R-squared).
Relation to
something else could be: correlation coefficient, TSS, RSS, ESS, variance of Y,
variance of residuals, how different R-squareds look in a plot (closer/farther
from the regression line) and so on.
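The two ways of writing the formula in the answer to question 5 can be compared directly. The sketch below is in Python with made-up data. To sidestep the naming ambiguity mentioned above, the variable ess means the explained (regression) sum of squares and ssr means the sum of squared residuals; both names are choices made for this example.

```python
# Made-up illustrative data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [3.0, 2.5, 5.1, 4.8, 7.9, 6.5, 9.2, 8.4]

n = len(x)
mx = sum(x) / n
my = sum(y) / n

# Least-squares slope and intercept.
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx
fitted = [a + b * xi for xi in x]

tss = sum((yi - my) ** 2 for yi in y)                   # total: sum of squared (Y - Ymean)
ess = sum((fi - my) ** 2 for fi in fitted)              # explained (regression) sum of squares
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # sum of squared residuals

r2_explained = ess / tss
r2_residual = 1 - ssr / tss

print(f"explained / total:    {r2_explained:.4f}")
print(f"1 - residual / total: {r2_residual:.4f}")
```

Both lines print the same value because, for a regression with an intercept, the total sum of squares splits exactly into the explained part plus the residual part.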
6. Why do we assume that the ordinary least squares estimators are normally
distributed, and why do we care?
We may not have a large enough sample to ensure that the Central Limit Theorem
result, normally-distributed coefficients, will go through. If the errors are
distributed normally, that is enough even with small samples. We need the
normal distribution relationship for our statistical testing, both the t and F
tests. So without the normal distribution, we don’t know the probabilities,
and can’t do our tests or make confidence intervals.
7. Discuss the 6 assumptions underlying the classical
linear regression model, method of least squares.
“Discuss” means list the assumptions, then give an example or tell how you can
identify if they are broken. The assumptions are on page 85. We discussed
breaking them in PS 3, but several can be identified through the residual and
data or line fit plots.
Review also the problem set questions and the end-of-chapter questions.
Further study suggestions:
The first 2 end-of-chapter questions and
the chapter summary are places I look for ideas for problems. I will also take problems from the
CourseInfo multiple choice midterm pool.
I have tried to cover in class the most important points. You need to know the formulas for mean,
standard deviation, correlation, t statistic, F statistic, R-squared,
regression equation, and confidence interval.
So of course you must understand at least these concepts. Further concepts are in the first 2 end-of-chapter questions.
You are better prepared than you probably feel at this point. Problem Set 3
and the in-class Rents problem were midterm-style Excel exercises. The
CourseInfo practice exams (extra credit for taking them, also) will help with
the multiple choice part, and I will take problems from there for your
midterms. The short answer part has partial credit. I curve the grades,
remember, especially downwards if necessary.
What won’t be on the exam: Excel commands, SAS commands, proofs,
formula for adjusting R-squared,
formula for calculating slope or intercept coefficients.
I will be watching my e-mail and office
phone messages and responding regularly if you have questions.
Good luck!