TU Wien:Econometrics for Business Informatics VU (Schneider)/Midterm (2022 S)
(10 points) Assume that you are given data values $y_1, \dots, y_n$ for the dependent variable. Now consider a simple linear regression model with just an intercept and no explanatory variable, that is, $y_i = \beta_0 + u_i$ for $i = 1, \dots, n$.
(a) What is k in this model?
In this simple linear regression model with just an intercept (no explanatory variable), $k$ represents the number of parameters in the model. In this case, since there is only the intercept $\beta_0$, we have $k = 1$.
(b) Putting this in the notation of the multiple linear regression model, what exactly does the regressor matrix look like for this model? What are its dimensions?
For this model, the regressor matrix $X$ consists solely of a column of ones, because there is only an intercept and no other explanatory variables. Each element corresponds to the intercept term for one observation. Hence, if there are $n$ observations, the matrix has dimensions $n \times 1$ and looks like this:

$$X = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}$$
(c) Compute the LS estimator for this model. Be careful about the dimensions of this object. (You can use the general formula for the LS estimator in the multiple linear model.)
The least squares (LS) estimator for the intercept in a model without any explanatory variables (other than the intercept) is given by the general formula

$$\hat\beta = (X'X)^{-1} X' y.$$

With $X = (1, 1, \dots, 1)'$ we have $X'X = n$ and $X'y = \sum_{i=1}^n y_i$, so the term simplifies to

$$\hat\beta_0 = \frac{1}{n} \sum_{i=1}^n y_i = \bar y,$$

a $1 \times 1$ scalar: the sample mean of the dependent variable.
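As a quick numerical check (made-up data, and using Python/NumPy here rather than R), the general LS formula indeed reduces to the sample mean when the regressor matrix is a single column of ones:

```python
import numpy as np

# Intercept-only model: X is an n x 1 column of ones, so the general
# LS formula (X'X)^{-1} X'y collapses to the sample mean of y.
y = np.array([3.0, 5.0, 7.0, 9.0])   # illustrative data
X = np.ones((len(y), 1))             # n x 1 regressor matrix
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta_hat[0])                   # 6.0, which equals y.mean()
```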
(d) Define the appropriate coefficient of determination you would use for this model.
Because there are no explanatory variables, the adjusted $R^2$ wouldn't make any sense here; the ordinary coefficient of determination

$$R^2 = 1 - \frac{SSR}{SST} = 1 - \frac{\sum_{i}(y_i - \hat y_i)^2}{\sum_{i}(y_i - \bar y)^2}$$

is the appropriate choice.
(e) (extra credit 1 pt) Can you compute the value of the coefficient of determination here? What is your explanation for the outcome?
Since $\hat y_i = \hat\beta_0 = \bar y$ for every observation, the term from (d) simplifies to

$$R^2 = 1 - \frac{\sum_{i}(y_i - \bar y)^2}{\sum_{i}(y_i - \bar y)^2} = 1 - 1 = 0.$$

This outcome makes sense: a model with no explanatory variables explains none of the variation in $y$, because the intercept-only fit $\bar y$ is exactly the benchmark against which $R^2$ measures explanatory power.
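A small numerical illustration (made-up data, Python/NumPy) of why $R^2$ is exactly zero here: the fitted values all equal $\bar y$, so the residual and total sums of squares coincide.

```python
import numpy as np

# Intercept-only fit: every fitted value equals y-bar,
# so SSR = SST and R^2 = 1 - SSR/SST = 0.
y = np.array([3.0, 5.0, 7.0, 9.0])   # illustrative data
y_hat = np.full_like(y, y.mean())    # fitted values, all equal to y-bar
ssr = np.sum((y - y_hat) ** 2)       # residual sum of squares
sst = np.sum((y - y.mean()) ** 2)    # total sum of squares
r2 = 1 - ssr / sst
print(r2)                            # 0.0
```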
(10 points) Consider the (population) multiple linear regression model $y = X\beta + u$. Explain what each quantity is and specify its dimension. What is the sample equivalent to this model? Make sure that you properly define each term. Which properties of the terms involved in your sample model do you know?
- $y$ is the dependent variable vector of dimension $n \times 1$, where $n$ is the number of observations.
- $X$ is the matrix of explanatory variables, including a column of ones for the intercept, with dimension $n \times k$, where $k$ is the number of explanatory variables plus the intercept.
- $\beta$ is the parameter vector of dimension $k \times 1$.
- $u$ is the vector of error terms of dimension $n \times 1$.

The sample equivalent is $y = X\hat\beta + \hat u$, where $\hat\beta = (X'X)^{-1}X'y$ is the OLS estimator and $\hat u = y - X\hat\beta$ is the vector of residuals. By construction, the residuals are orthogonal to the regressors, $X'\hat u = 0$; since $X$ contains a column of ones, this implies $\sum_i \hat u_i = 0$.
Properties
- Linearity in parameters.
- Random sampling.
- No perfect multicollinearity.
- Expectation of the error term is zero conditional on the regressors.
- Homoscedasticity of errors.
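The mechanical properties of the sample model can be illustrated with simulated data (a Python/NumPy sketch; the data-generating values are made up): the OLS residuals are orthogonal to every column of the regressor matrix.

```python
import numpy as np

# Simulate y = X beta + u for an n x k design with an intercept column,
# fit by least squares, and verify the orthogonality condition X' u_hat = 0.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n),
                     rng.normal(size=n),
                     rng.normal(size=n)])        # n x k, k = 3
beta = np.array([1.0, 2.0, -0.5])                # illustrative parameters
y = X @ beta + rng.normal(size=n)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None) # OLS estimate
u_hat = y - X @ beta_hat                         # residual vector
print(X.T @ u_hat)   # ~ [0, 0, 0] up to floating-point error
```

Because the first column of X is all ones, the first entry of `X.T @ u_hat` being zero is exactly the statement that the residuals sum to zero.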
(10 points) Consider a data set with 526 observations of wage (wage, average hourly earnings in $), education (educ, in years) and experience (exper, in years) of individuals. Regressing the variable wage on the variables educ and exper, and then the variable log(wage) on educ and exper, gave the following output in R:
----------------------------------------------------------------------------------------------
Call:
lm(formula = wage ~ educ + exper, data = wagedata)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.39054    0.76657  -4.423 1.18e-05 ***
educ         0.64427    0.05381  11.974  < 2e-16 ***
exper        0.07010    0.01098   6.385 3.78e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.257 on 523 degrees of freedom
Multiple R-squared:  0.2252,  Adjusted R-squared:  0.2222
F-statistic: 75.99 on 2 and 523 DF,  p-value: < 2.2e-16
----------------------------------------------------------------------------------------------
Call:
lm(formula = log(wage) ~ educ + exper, data = wagedata)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.216854   0.108595   1.977   0.0464 *
educ        0.097936   0.007622  12.848  < 2e-16 ***
exper       0.010347   0.001555   6.653 7.24e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4614 on 523 degrees of freedom
Multiple R-squared:  0.2493,  Adjusted R-squared:  0.2465
F-statistic: 86.86 on 2 and 523 DF,  p-value: < 2.2e-16
----------------------------------------------------------------------------------------------
(a) Which one of the two models gives the better fit? Why?
The model with log(wage) as the dependent variable gives a better fit based on its higher $R^2$ (0.2493 compared to 0.2252 in the wage model). This implies that the log-wage model explains a greater proportion of the variance in its dependent variable than the linear model does in wage. In general, the closer $R^2$ is to 1, the better the fit. (Strictly speaking, $R^2$ values are only directly comparable when the dependent variable is the same; here they measure the explained variation in wage and in log(wage), respectively.)
(b) Looking at the first regression: How much more salary (wage) can a person expect if he or she increases his or her education by one year?
From the first regression, the coefficient for educ is 0.64427. This implies that for each additional year of education, the expected increase in wage is approximately $0.64427 per hour, holding other factors constant.
(c) Looking at the second regression: How can you interpret the coefficient of educ in this fit quantitatively? If a person currently earns $10 an hour, roughly how much salary can he or she expect with one additional year of education?
In the second regression, where log(wage) is regressed on educ and exper, the coefficient of educ is 0.097936. However, the interpretation differs because the dependent variable is the natural logarithm of wage. In this context, each additional year of education is associated with an expected increase in wage by a factor of $e^{0.097936} \approx 1.103$, i.e. roughly 9.8% (the coefficient itself, 0.097936, is the approximate proportional change), holding experience constant.
For a person earning $10 per hour:
- The natural logarithm of 10 is approximately 2.3026.
- Adding 0.097936 to this gives approximately 2.4005.
- Converting back from the log scale ($e^{2.4005}$) gives approximately $11.03 per hour, i.e. an expected raise of about $1.03.
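The same back-conversion can be done in one line (shown here in Python rather than R; the coefficient is taken from the regression output above):

```python
import math

coef_educ = 0.097936      # estimated coefficient on educ, log-wage regression
wage_now = 10.0           # current hourly wage in $
# exp(log(w) + b) = w * exp(b): the exact multiplicative effect of one more year
new_wage = math.exp(math.log(wage_now) + coef_educ)
print(round(new_wage, 2))  # 11.03
```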