Panel Data Analysis Homework 1 - jose.fajardo · Panel Data Analysis Homework 1 Prf. José Fajardo...


Panel Data Analysis Homework 1

Prof. José Fajardo (FGV/EBAPE)

1. Using the two datasets time_var.dta and time_invar.dta, consider the following wage equation:

$$\text{lwage}_{it} = \alpha_0 + \alpha_1\,\text{ability}_i + \alpha_2\,\text{medu}_i + \alpha_3\,\text{fedu}_i + \alpha_4\,d_i + \alpha_5\,\text{siblings}_i + \beta_1\,\text{ed}_{it} + \beta_2\,\text{pexp}_{it} + \varepsilon_{it}$$

Notice that all the α coefficients are associated with time-invariant cross-section data, while the β coefficients are associated with time-varying panel data series.

a) Formulate, estimate, and compare the pooled (population-averaged) model, estimated by OLS with conventional and with panel-robust standard errors. In addition to the pooled model, three variable transformations should be considered and compared: (1) first-difference, (2) between (or group means), and (3) within (or deviations from group means). Note: not all coefficients can be estimated for all models. Why?

b) Formulate, estimate, and compare the fixed-effects and random-effects panel data models, estimated by OLS with conventional and with panel-robust standard errors. Set up and perform hypothesis tests to choose a proper panel data model: (1) pool or not to pool? (2) fixed effects or random effects?

If you are interested in the original paper below, read it, but note that we are not attempting to replicate its results (see also Joshua C. C. Chan, "Replication of the Results in 'Learning about Heterogeneity in Returns to Schooling'", Journal of Applied Econometrics, Vol. 20, No. 3, 2005, pp. 439-443).

[Koop, G. and J. Tobias, "Learning About Heterogeneity in Returns to Schooling." Journal of Applied Econometrics, 19, 2004, 827-849]
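The "not all coefficients can be estimated" note in a) comes down to what each transformation does to a regressor that never changes over time. The sketch below is a minimal numpy illustration under stated assumptions: rows are sorted by individual and then by time, the two-person panel uses made-up values (not time_var.dta or time_invar.dta), and the helper names `within`, `between` and `first_diff` are hypothetical. A time-invariant column such as ability_i becomes identically zero under the within and first-difference transformations, so α1,...,α5 (and the intercept α0) are not identified there; the between transformation keeps them but reduces the sample to one observation per individual.

```python
import numpy as np

def within(x, ids):
    """Within transformation: x_it - xbar_i (deviation from each individual's mean)."""
    x, ids = np.asarray(x, dtype=float), np.asarray(ids)
    out = np.empty_like(x)
    for g in np.unique(ids):
        m = ids == g
        out[m] = x[m] - x[m].mean(axis=0)
    return out

def between(x, ids):
    """Between transformation: one row per individual holding xbar_i (ids in sorted order)."""
    x, ids = np.asarray(x, dtype=float), np.asarray(ids)
    return np.array([x[ids == g].mean(axis=0) for g in np.unique(ids)])

def first_diff(x, ids):
    """First differences x_it - x_i,t-1 within each individual.
    Assumes rows are sorted by individual, then time; drops one row per individual."""
    x, ids = np.asarray(x, dtype=float), np.asarray(ids)
    return np.concatenate([np.diff(x[ids == g], axis=0) for g in np.unique(ids)])

# Toy balanced panel: 2 individuals x 3 periods (hypothetical values, not the homework data)
ids = np.array([1, 1, 1, 2, 2, 2])
ability = np.array([0.5, 0.5, 0.5, -1.0, -1.0, -1.0])  # time-invariant, like ability_i
ed = np.array([12, 13, 14, 10, 10, 11])                 # time-varying, like ed_it

print("within  ability:", within(ability, ids))      # all zeros: alpha_1 not identified
print("FD      ability:", first_diff(ability, ids))  # all zeros as well
print("between ability:", between(ability, ids))     # survives, one value per person
print("within  ed     :", within(ed, ids))           # time variation survives
```

In practice the transformed lwage and regressors would then be passed to OLS; the pooled model is simply OLS on the untransformed data.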


2. The data in the file produtivity.txt are a panel on the following variables for the lower 48 states over 17 years:

STATE = state name
YR = year, 1970, ..., 1986
P_CAP = public capital
HWY = highway capital
WATER = water utility capital
UTIL = utility capital
PC = private capital
GSP = gross state product
EMP = employment
UNEMP = unemployment rate

The basic model of interest is

$$Y_{it} = \beta_1 X_{1it} + \beta_2 X_{2it} + \beta_3 X_{3it} + \beta_4 X_{4it} + \beta_5 X_{5it} + c_i + \varepsilon_{it}$$

where Y is log GSP, X1 is log PC, X2 is log HWY, X3 is log WATER, X4 is log UTIL and X5 is log EMP. This is a Cobb-Douglas production function.

a) Fit the “pooled” model and report your results.

b) Fit a random-effects model and a fixed-effects model. Use your model results to decide which is the preferable model. If you find that neither panel data model is preferred to the pooled model, show how you reached that conclusion.

[Munnell, A. "Why Has Productivity Declined? Productivity and Public Investment." New England Economic Review, 1990, 3-22.]
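For b), and for the analogous tests in problem 1 b), the model comparison usually rests on two statistics: an F test of the pooled model against fixed effects, and a Hausman test of fixed against random effects. The sketch below is an illustration under assumptions: hypothetical helper names, a balanced panel, a pooled model with a single common intercept, and the classic Hausman form (which presumes random effects is efficient under the null, so it is not meant to be fed panel-robust covariance matrices). It also computes the random-effects quasi-demeaning weight θ, which makes explicit that random effects lies between pooled OLS (θ = 0) and fixed effects (θ approaching 1); the variance components are taken as given here, whereas in practice they must be estimated.

```python
import math
import numpy as np

def re_theta(sigma2_eps, sigma2_c, T):
    """Random-effects quasi-demeaning weight for a balanced panel with T periods:
    theta = 1 - sqrt(sigma2_eps / (sigma2_eps + T * sigma2_c)).
    RE is OLS of y_it - theta*ybar_i on x_it - theta*xbar_i.
    theta = 0 reproduces pooled OLS; theta -> 1 approaches the within (FE) estimator."""
    return 1.0 - math.sqrt(sigma2_eps / (sigma2_eps + T * sigma2_c))

def pooling_f(rss_pooled, rss_fe, n, T, k):
    """F statistic for H0: c_1 = ... = c_n (pooled OLS with one common intercept is adequate).
    Unrestricted model: LSDV/within with n individual effects and k slopes.
    Under H0 with normal errors, F ~ F(n - 1, n*T - n - k) in a balanced panel."""
    num = (rss_pooled - rss_fe) / (n - 1)
    den = rss_fe / (n * T - n - k)
    return num / den

def hausman(b_fe, b_re, V_fe, V_re):
    """Classic Hausman statistic H = d' (V_fe - V_re)^{-1} d, with d = b_fe - b_re,
    over the slope coefficients common to both models. Under H0 (RE consistent),
    H ~ chi-squared with len(d) degrees of freedom. V_fe - V_re need not be
    positive definite in finite samples, in which case H can even be negative."""
    d = np.atleast_1d(np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float))
    V = np.atleast_2d(np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float))
    return float(d @ np.linalg.solve(V, d))

# Illustrative inputs only (not estimates from produtivity.txt)
print("theta:", re_theta(1.0, 0.75, T=4))
print("F    :", pooling_f(150.0, 100.0, n=11, T=10, k=9))
print("H    :", hausman([0.5], [0.3], [[0.02]], [[0.01]]))
```

For produtivity.txt, n = 48 and T = 17; with the five slopes of the model, the F test has (47, 763) degrees of freedom and the Hausman test has 5. If SciPy is available, p-values follow from `scipy.stats.f.sf(F, 47, 763)` and `scipy.stats.chi2.sf(H, 5)`.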


3. Charitable Contributions

We analyze individual income tax return data from the 1979-1988 Statistics of Income (SOI) Panel of Individual Returns. The SOI Panel is a subset of the IRS Individual Tax Model File (charity.txt) and represents a simple random sample of individual income tax returns filed each year. Based on the individual returns data, the goal is to investigate, first, whether a taxpayer's marginal tax rate affects private charitable contributions and, second, whether the tax revenue lost through charitable contribution deductions is less than the gain to charitable organizations. To address these issues, we consider a price and income model of charitable contributions, following Banerjee and Frees (1997). They define price as the complement of an individual's federal marginal tax rate, using taxable income prior to contributions. The income of an individual is defined as the adjusted gross income. The dependent variable is total charitable contributions, measured as the sum of cash and other property contributions, excluding carryovers from previous years. Other covariates included in the model are age, marital status and the number of dependents of an individual taxpayer. Age is a dichotomous variable indicating whether or not a taxpayer is over sixty-four years old. Similarly, marital status indicates whether an individual is married or single.

The population consists of all U.S. taxpayers who itemize their deductions. Specifically, these are the individuals who are likely to have, and to record, charitable contribution deductions in a given year. Among the 1,413 taxpayers in our subset of the SOI Panel, approximately 22% itemized their deductions each year during the period 1979-1988. A random sample of 47 individuals was selected from the latter group. These data are analyzed in Banerjee and Frees (1997).

Taxpayer characteristics:

Variable   Description
SUBJECT    Subject identifier, 1-47.
TIME       Time identifier, 1-10.
CHARITY    The sum of cash and other property contributions, excluding carryovers from previous years.
INCOME     Adjusted gross income.


PRICE      One minus the marginal tax rate. Here, the marginal tax rate is defined on income prior to contributions.
AGE        An indicator variable that equals one if a taxpayer is over sixty-four years old and zero otherwise.
MS         An indicator variable that equals one if a taxpayer is married and zero otherwise.
DEPS       Number of dependents claimed on the taxpayer’s form.

a) Basic summary statistics
I. Summarize each variable. For the binary variables, AGE and MS, provide only averages. For the other variables, CHARITY, INCOME, PRICE and DEPS, provide the mean, median, standard deviation, minimum and maximum. Further, summarize the average of the response variable CHARITY over TIME.
II. Create a multivariate time series plot of CHARITY versus TIME.
III. Summarize the relationships among CHARITY, INCOME, PRICE, DEPS and TIME by computing the correlation and drawing a scatter plot for each pair.

b) Basic fixed-effects model
I. Run a one-way fixed-effects model of CHARITY on INCOME, PRICE, DEPS, AGE and MS. State which variables are statistically significant and justify your conclusions.
II. Produce an added-variable plot of CHARITY versus INCOME, controlling for the effects of PRICE, DEPS, AGE and MS. Interpret this plot.

c) Incorporating temporal effects. Is there an important time pattern?
I. Re-run the model in b(i) and include TIME as an additional explanatory (continuous) variable.
II. Re-run the model in b(i) and include TIME through dummy variables, one for each year.
III. Re-run the model in b(i) and include an AR(1) component for the error.


IV. Which of the three methods for incorporating temporal effects do you prefer? Be sure to justify your conclusion.

d) Random effects: Find the random-effects (R.E.) estimates in item (a). Which do you choose, F.E. or R.E.?

[Banerjee, M. and E. W. Frees, “Influence diagnostics for longitudinal models.” Journal of the American Statistical Association, 92, 1997, 999-1005.]
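Item b) II relies on the Frisch-Waugh-Lovell result: an added-variable plot is a scatter of the part of CHARITY not explained by the controls against the part of INCOME not explained by the same controls, and its least-squares slope reproduces the INCOME coefficient from the full regression. For the one-way fixed-effects model, the controls should include one dummy per SUBJECT (equivalently, within-transform every variable first). The minimal numpy sketch below uses hypothetical helper names and simulated data standing in for the charity.txt variables; it illustrates the technique and is not presented as the Banerjee and Frees (1997) procedure.

```python
import numpy as np

def residualize(v, Z):
    """Residuals from the OLS regression of v on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

def added_variable(y, x, Z):
    """Added-variable (partial regression) plot coordinates for y versus x, controlling for Z.
    By Frisch-Waugh-Lovell, the OLS slope through (ex, ey) equals the coefficient
    on x in the full regression of y on [x, Z]."""
    ey = residualize(y, Z)
    ex = residualize(x, Z)
    slope = (ex @ ey) / (ex @ ex)
    return ex, ey, slope

# Simulated panel (hypothetical stand-ins for CHARITY, INCOME and the other covariates)
rng = np.random.default_rng(0)
n, T = 5, 4
subject = np.repeat(np.arange(n), T)
D = (subject[:, None] == np.arange(n)).astype(float)  # one dummy per subject (absorbs the intercept)
W = rng.normal(size=(n * T, 2))                        # stand-ins for the other controls
x = rng.normal(size=n * T)                             # stand-in for INCOME
y = (1.5 * x + W @ np.array([0.3, -0.2])               # true slope on x is 1.5
     + D @ rng.normal(size=n)                          # subject-specific effects
     + rng.normal(scale=0.1, size=n * T))              # idiosyncratic noise

Z = np.column_stack([D, W])
ex, ey, slope = added_variable(y, x, Z)
full_coef, *_ = np.linalg.lstsq(np.column_stack([x, Z]), y, rcond=None)
print("AV-plot slope:", slope, " full-regression (LSDV) coefficient on x:", full_coef[0])
```

With matplotlib available, `plt.scatter(ex, ey)` draws the plot itself. `np.linalg.lstsq` returns a least-squares solution even if Z is rank deficient (for instance, if AGE or MS never changes for a subject and is therefore collinear with the subject dummies), and the residuals do not depend on which solution it picks.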