The University of Illinois at Chicago 

       Economics 346: Econometrics

Helen Roberts

Review Questions for the Midterm and Study Plan

e-mail: hroberts@uic

Phone: (312) 355-0378  (I check messages from home).

These questions are worth 5 points each. Answers are in italics. For 5-point questions, I generally look for 2-3 sentences: one to state your answer and one to explain why. I give partial credit, so the worst thing you can do is leave the question blank. A score of 5 points means you gave a great answer, 3-4 means you were basically right but left something out, and 1-2 means you wrote something correct but didn’t get the answer. So write something relevant, even if you aren’t sure you know the answer; sometimes this helps you figure it out.

 1.        What should the slope of a regression line through the residuals be?

 The slope should be zero. The error term is assumed to be uncorrelated with each X variable, and the regression residuals, which are our estimates of the errors, should show the same property. A residual plot has an X variable on the horizontal axis and the residuals on the vertical axis, so a positive or negative slope would indicate that the residuals are correlated with that X variable, which breaks one of the classical assumptions.
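
A quick way to see this is a small simulation. The sketch below (Python with NumPy, using made-up data rather than anything from the course) fits a line, then regresses the residuals on X. When the model includes an intercept, least squares makes the residuals uncorrelated with each included X by construction, so the slope comes out as zero up to rounding error.

```python
import numpy as np

# Simulated data for illustration only: Y = 3 + 1.5 X + noise
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=100)
y = 3.0 + 1.5 * x + rng.normal(0, 2, size=100)

# Fit the regression line; np.polyfit returns [slope, intercept] for deg=1
slope, intercept = np.polyfit(x, y, deg=1)
resid = y - (intercept + slope * x)

# Regress the residuals on X: the slope should be (numerically) zero
resid_slope, resid_intercept = np.polyfit(x, resid, deg=1)
print(f"slope of residuals on x: {resid_slope:.2e}")
```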

2.         True or false and explain: The correlation between 2 variables is the same as a regression between 2 variables.

False (the T/F part is worth 1 point; the other 4 points come from the explanation). When the answer is false, you need only give one difference between correlation and regression. Here are a few of the possibilities (but not all of them):
1) Correlation is symmetric and regression is not: the correlation of X with Y is the same as the correlation of Y with X, but regression gives a different answer if you switch the dependent and independent variables (as in PS 2).
2) Correlation tells you how closely 2 variables are related, while regression gives you the sensitivity of Y to X through the regression equation, as well as information on the closeness of the relationship through the R-squared value.
3) Regression reports R-squared, while correlation reports r. For simple regression, r times r equals the regression R-squared, but this is not true of the pairwise correlations in a multiple regression.
4) Regression can handle several factors influencing another variable (our Y), but correlation only looks at pairs of variables.
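
Points 1 and 3 can be checked numerically. This sketch uses simulated data (an assumption for illustration): the two correlations match, the two regression slopes do not (one is not simply the reciprocal of the other), and in this simple regression r times r equals R-squared.

```python
import numpy as np

# Simulated data for illustration only
rng = np.random.default_rng(1)
x = rng.normal(50, 10, size=200)
y = 20 + 0.8 * x + rng.normal(0, 8, size=200)

# Correlation is symmetric
r_xy = np.corrcoef(x, y)[0, 1]
r_yx = np.corrcoef(y, x)[0, 1]

# Regression is not: switching Y and X changes the slope
slope_y_on_x = np.polyfit(x, y, deg=1)[0]
slope_x_on_y = np.polyfit(y, x, deg=1)[0]

# R-squared of the simple regression of Y on X
fitted = np.polyval(np.polyfit(x, y, deg=1), x)
r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"r(x,y) = {r_xy:.4f}, r(y,x) = {r_yx:.4f}")
print(f"slope of y on x = {slope_y_on_x:.4f}, 1 / slope of x on y = {1 / slope_x_on_y:.4f}")
print(f"r * r = {r_xy ** 2:.4f}, R-squared = {r2:.4f}")
```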

For an open-ended question like this, I will give credit for right answers even if they are not the answer I was originally thinking of when I wrote the problem. So don’t worry if your answer differs from your friend’s.

3.         You have run a regression, with 100 observations, on the relationship between prices paid and quantity bought, and the t-statistic on the slope coefficient is -5.86. What does this tell you?

 The relationship between price and quantity is negative and statistically significantly different from zero. This is a demand curve. Since the t-distribution is similar to the normal distribution, you can use the normal rules of thumb, especially with 100 observations, which is a large sample: about 95% of the probability lies within 2 (more precisely, 1.96) standard errors of the mean, and so on. A t-statistic of -5.86 means the estimate is almost 6 standard errors away from the hypothesized value of zero, so if the true slope were zero, the probability of getting an estimate this far away would be very small, certainly smaller than the common significance levels of .05 or .01.
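
To put a number on “very small,” here is a sketch that computes an approximate two-sided p-value with the normal approximation, assuming a simple regression so the degrees of freedom are 100 - 2 = 98. If SciPy is installed, `2 * scipy.stats.t.sf(5.86, 98)` gives the exact t-based value, which is slightly larger (the t has heavier tails) but still far below .01.

```python
import math

t_stat = -5.86
n = 100
df = n - 2  # assumed simple regression: one intercept and one slope

# Two-sided p-value under the normal approximation: P(|Z| > |t|) = erfc(|t| / sqrt(2))
p_normal = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"df = {df}, approximate two-sided p-value = {p_normal:.1e}")
print("reject H0 at 5%:", p_normal < 0.05, "| at 1%:", p_normal < 0.01)
```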

4.         How does the population regression function differ from the sample regression function?

 This is a population versus sample problem. The population regression function is the ideal, or “truth,” that we are using the regression to discover. The sample is the part of the population that we actually have, a subset. As the sample size increases, the sample regression function tends to get closer to the population regression function.
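
A small simulation makes the population versus sample distinction concrete. Here the population regression function is set by hand (beta0 = 1 and beta1 = 2 are hypothetical values), and we estimate the slope from samples of increasing size. The estimation error tends to shrink as n grows, though any single draw can be off.

```python
import numpy as np

rng = np.random.default_rng(7)
beta0, beta1 = 1.0, 2.0  # hypothetical population regression function: E[Y|X] = 1 + 2X

def sample_slope(n):
    """Draw a sample of size n from the population and return the OLS slope."""
    x = rng.normal(0, 1, size=n)
    y = beta0 + beta1 * x + rng.normal(0, 1, size=n)
    return np.polyfit(x, y, deg=1)[0]

for n in (10, 100, 1_000, 100_000):
    b1 = sample_slope(n)
    print(f"n = {n:>7}: estimated slope = {b1:.4f}, error = {abs(b1 - beta1):.4f}")
```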

These questions are each worth 10 points. Since they are worth more points, add a few more sentences.

5.         What is R-squared and what does it measure? (Include in your answer the relationship of R-squared to something else, as well.)

 Here you can give other names for R-squared (coefficient of determination, or the square of the correlation coefficient in simple regression) and the formula for R-squared: the ratio of the explained sum of squares to the total sum of squares. You can also call it the ratio of the regression sum of squares to the total sum of squares, so if you write RSS/TSS, make sure you tell me that the R stands for Regression and not Residual (or, if you write ESS/TSS, that the E stands for Explained and not Error). Or you can write the formula as 1 - (sum of squared residuals)/TSS, where TSS is the sum of squared (Y - Ymean). R-squared measures the fraction of the variation in Y that is explained by the X variables (the “interpret the regression” part relating to R-squared).
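
Both versions of the formula give the same number when the regression includes an intercept. The sketch below checks this on simulated data with two X variables (the data are made up). It also computes the squared correlation between Y and the fitted values, which equals R-squared in a multiple regression, even though the squared correlation between Y and any one X generally does not.

```python
import numpy as np

# Simulated data for illustration only
rng = np.random.default_rng(3)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 - 0.5 * x2 + rng.normal(size=n)

# Multiple regression with an intercept, fit by least squares
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef
resid = y - y_hat

tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)  # explained (regression) sum of squares
ssr = np.sum(resid ** 2)               # sum of squared residuals

r2_explained = ess / tss
r2_residual = 1 - ssr / tss
r_y_yhat = np.corrcoef(y, y_hat)[0, 1]  # correlation of Y with fitted Y

print(f"ESS/TSS = {r2_explained:.4f}, 1 - SSR/TSS = {r2_residual:.4f}, r(Y, Yhat)^2 = {r_y_yhat ** 2:.4f}")
```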

Relation to something else could be: the correlation coefficient, TSS, RSS, ESS, the variance of Y, the variance of the residuals, how plots look at different R-squared values (points closer to or farther from the regression line), and so on.

6.                  Why do we assume that the ordinary least squares estimators are normally distributed, and why do we care?

We may not have a large enough sample to ensure that the Central Limit Theorem result (normally distributed coefficient estimates) holds. If the errors are normally distributed, the OLS estimators are normally distributed even in small samples. We need this normality for our statistical testing, both the t and F tests. Without the normal distribution, we don’t know the probabilities, so we can’t do our tests or build confidence intervals.
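
Why we care can be shown by simulation. With normal errors, the usual 95% confidence interval for the slope (estimate plus or minus the t critical value times the standard error) should contain the true slope about 95% of the time, even with only 10 observations. The sketch below uses made-up parameter values and a t critical value of 2.306 (8 degrees of freedom) taken from a t table.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 20_000
beta0, beta1, sigma = 2.0, 0.5, 1.0  # hypothetical true parameters
t_crit = 2.306  # 97.5th percentile of t with n - 2 = 8 df, from a t table

x = rng.uniform(0, 10, size=n)  # X values held fixed across samples
x_dev = x - x.mean()
sxx = np.sum(x_dev ** 2)

# Each row is one simulated small sample with normally distributed errors
u = rng.normal(0, sigma, size=(reps, n))
y = beta0 + beta1 * x + u

# OLS slope, intercept, and slope standard error for every sample at once
b1 = (y - y.mean(axis=1, keepdims=True)) @ x_dev / sxx
b0 = y.mean(axis=1) - b1 * x.mean()
resid = y - (b0[:, None] + b1[:, None] * x)
s2 = np.sum(resid ** 2, axis=1) / (n - 2)
se_b1 = np.sqrt(s2 / sxx)

# Share of intervals that contain the true slope
coverage = np.mean(np.abs(b1 - beta1) <= t_crit * se_b1)
print(f"coverage of nominal 95% intervals: {coverage:.3f}")
```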

7.      Discuss the 6 assumptions underlying the classical linear regression model and the method of least squares.

 “Discuss” means list the assumptions, then give an example of each or explain how you can tell if it is broken. The assumptions are on page 85. We discussed breaking them in PS 3, and several can be identified through the residual plots and the data or line fit plots.
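
As an illustration of reading a residual plot numerically, the sketch below compares the spread of the residuals for low versus high values of X. The helper `residual_spread_ratio` is hypothetical, a rough check loosely in the spirit of the Goldfeld-Quandt test rather than a method from the course, and the data are simulated. A ratio far above 1 suggests the error variance is not constant (heteroskedasticity), which is what a fan-shaped residual plot shows.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x = np.sort(rng.uniform(1, 10, size=n))  # sorted so the first half has the low X values

def residual_spread_ratio(y, x):
    """Hypothetical helper: residual variance for high X divided by that for low X."""
    b1, b0 = np.polyfit(x, y, deg=1)
    resid = y - (b0 + b1 * x)
    half = len(x) // 2
    return resid[half:].var() / resid[:half].var()

y_const = 2 + 0.5 * x + rng.normal(0, 1, size=n)  # constant error variance
y_growing = 2 + 0.5 * x + rng.normal(0, 0.3 * x)  # error s.d. grows with X

ratio_const = residual_spread_ratio(y_const, x)
ratio_growing = residual_spread_ratio(y_growing, x)
print(f"constant variance: ratio = {ratio_const:.2f}")
print(f"growing variance:  ratio = {ratio_growing:.2f}")
```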

Review also the problem set questions and the end of chapter questions. Further study suggestions:

The first 2 end-of-chapter questions and the chapter summary are places I look for ideas for problems. I will also take problems from the CourseInfo multiple-choice midterm pool. I have tried to cover the most important points in class. You need to know the formulas for the mean, standard deviation, correlation, t statistic, F statistic, R-squared, regression equation, and confidence interval, so of course you must understand at least these concepts. Further concepts are in the first 2 end-of-chapter questions.
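
Most of these formulas fit in a few lines of code. The sketch below computes each one for a small hypothetical data set (the numbers are invented for illustration). It uses `np.polyfit` for the coefficients, since the slope and intercept formulas are not on the exam, and a t critical value of 2.447 (6 degrees of freedom) from a t table. For a simple regression, the F statistic equals the t statistic squared.

```python
import numpy as np

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.3, 8.9])
n = len(x)
k = 1  # number of slope coefficients

mean_y = y.mean()
sd_y = y.std(ddof=1)         # sample standard deviation (divides by n - 1)
r = np.corrcoef(x, y)[0, 1]  # correlation coefficient

b1, b0 = np.polyfit(x, y, deg=1)  # regression equation: y_hat = b0 + b1 * x
y_hat = b0 + b1 * x
resid = y - y_hat

tss = np.sum((y - mean_y) ** 2)  # total sum of squares
ssr = np.sum(resid ** 2)         # sum of squared residuals
ess = tss - ssr                  # explained sum of squares (valid with an intercept)
r2 = ess / tss

s2 = ssr / (n - k - 1)                              # estimated error variance
se_b1 = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))   # standard error of the slope
t_stat = b1 / se_b1                                 # t statistic for H0: slope = 0
f_stat = (ess / k) / (ssr / (n - k - 1))            # F statistic for H0: all slopes = 0

t_crit = 2.447  # 97.5th percentile of t with n - 2 = 6 df, from a t table
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)     # 95% confidence interval for the slope

print(f"mean = {mean_y:.3f}, sd = {sd_y:.3f}, r = {r:.4f}")
print(f"y_hat = {b0:.3f} + {b1:.3f} x, R-squared = {r2:.4f}")
print(f"t = {t_stat:.2f}, F = {f_stat:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```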

You are better prepared than you probably feel at this point. Problem Set 3 and the in-class Rents problem were midterm-style Excel exercises. The CourseInfo practice exams (which also earn extra credit) help with the multiple-choice part, and I will take problems from there for your midterms. The short-answer part has partial credit. I curve the grades, remember, especially downwards if necessary.

What won’t be on the exam: Excel commands, SAS commands, proofs, the formula for adjusted R-squared, and the formulas for calculating the slope or intercept coefficients.

I will be watching my e-mail and office phone messages and responding regularly if you have questions. 

Good luck!