
ECON 312 Problem Set 1 – Suggested Solutions, Spring Semester 2014

1.

a) Show that $v_i = u_i + w_i$.

The original regression model in terms of the true but unobservable $Y_i$ is given by:

$Y_i = \beta_0 + \beta_1 X_i + u_i$

and the regression model we actually estimate using the observable variable $\tilde{Y}_i$ is given by:

$\tilde{Y}_i = \beta_0 + \beta_1 X_i + v_i$

By assumption, $\tilde{Y}_i = Y_i + w_i$ and so $Y_i = \tilde{Y}_i - w_i$. Substituting this into the first regression model gives:

$\tilde{Y}_i - w_i = \beta_0 + \beta_1 X_i + u_i$
$\tilde{Y}_i = \beta_0 + \beta_1 X_i + u_i + w_i$

and so $v_i = u_i + w_i$.

b) Show that the regression $\tilde{Y}_i = \beta_0 + \beta_1 X_i + v_i$ satisfies the standard LS assumptions in Key Concept 4.3. (Assume that $w_i$ is independent of $X_i$ and $Y_i$ for all $i$ and has finite 4th moment.)

i) We must show that $E(v_i \mid X_i) = 0$, which in turn implies $\mathrm{cov}(v_i, X_i) = 0$.

• From part (a), $v_i = u_i + w_i$ and so:

$E(v_i \mid X_i) = E(u_i + w_i \mid X_i) = E(u_i \mid X_i) + E(w_i \mid X_i)$

• We are told to assume that $w_i$ is independent of $X_i$, which implies $E(w_i \mid X_i) = E(w_i)$. Taking the measurement error to have mean zero (a nonzero mean would simply be absorbed into the intercept $\beta_0$), this term is zero. We are also told that the regression model $Y_i = \beta_0 + \beta_1 X_i + u_i$ satisfies the LS assumptions, which implies $E(u_i \mid X_i) = 0$. These facts imply that $E(v_i \mid X_i) = 0$, and hence also $\mathrm{cov}(v_i, X_i) = 0$, so the first assumption is satisfied.

ii) We must show that $(X_i, \tilde{Y}_i)$ are i.i.d. draws from their joint distribution.

• Given that the regression model $Y_i = \beta_0 + \beta_1 X_i + u_i$ satisfies the LS assumptions, $(X_i, Y_i)$ must be i.i.d.

• $\tilde{Y}_i = Y_i + w_i$ and by assumption the measurement errors $w_i$ are i.i.d. and are independent of $Y_i$. This implies that $\tilde{Y}_i$ must also be i.i.d.

• Finally, $\tilde{Y}_i$ and $X_j$ must be independent for $i \neq j$, since both $Y_i$ and $w_i$ are independent of $X_j$.

• Therefore, $(X_i, \tilde{Y}_i)$ must be i.i.d.

iii) We must show that $\tilde{Y}_i$ and $X_i$ have finite fourth moments (no large outliers).

• $X_i$ and $Y_i$ must have finite fourth moments because the regression model $Y_i = \beta_0 + \beta_1 X_i + u_i$ satisfies the LS assumptions.

• Again, by definition $\tilde{Y}_i = Y_i + w_i$. Both $Y_i$ and $w_i$ have finite fourth moments and are mutually independent, so $\tilde{Y}_i$ has a finite fourth moment.
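The last step in (iii) can be made explicit with a standard inequality. This is a sketch added for illustration, not part of the original solutions, writing $\tilde{Y}_i$ for the observed (mismeasured) dependent variable. For any real numbers $a$ and $b$, convexity of $x^4$ gives $(a+b)^4 \le 8(a^4 + b^4)$, so:

```latex
\[
E\big[\tilde{Y}_i^{4}\big]
  = E\big[(Y_i + w_i)^{4}\big]
  \le 8\left( E\big[Y_i^{4}\big] + E\big[w_i^{4}\big] \right)
  < \infty .
\]
```

Note that this bound only uses the finiteness of the two fourth moments; independence of $Y_i$ and $w_i$ is not needed for this particular step.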

c) The OLS estimators are consistent, because the standard LS assumptions are satisfied.
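The consistency claim in (c) can be checked informally by simulation. The following is a minimal Python sketch (standard library only), not part of the original solutions: the true coefficients, the normal distributions and the function names `ols_slope` and `simulate_slope` are all assumptions chosen for illustration. It adds i.i.d. measurement error to the dependent variable only and compares the OLS slope with and without that error as the sample size grows.

```python
import random


def ols_slope(xs, ys):
    """OLS slope from a regression of ys on xs (with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate_slope(n, beta0=1.0, beta1=2.0, sd_w=3.0, seed=0):
    """Return (slope using true Y, slope using Y measured with error).

    All parameter values are hypothetical. The measurement error w is
    drawn independently of X and of the regression error u.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y_true = [beta0 + beta1 * x + rng.gauss(0.0, 1.0) for x in xs]
    y_obs = [y + rng.gauss(0.0, sd_w) for y in y_true]  # Y-tilde = Y + w
    return ols_slope(xs, y_true), ols_slope(xs, y_obs)


for n in (100, 10_000, 100_000):
    clean, noisy = simulate_slope(n)
    print(n, round(clean, 3), round(noisy, 3))
```

Under these assumptions both slopes should settle near the assumed true value of 2 as n grows, with the mismeasured version typically fluctuating more in small samples, which is the loss of precision discussed in part (e).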

d) Yes, confidence intervals and hypothesis tests can be performed in the normal way, because the LS assumptions are satisfied.

e) In the case of measurement error that is i.i.d. (i.e. classical measurement error), measurement error in the independent variable makes the OLS estimator biased and inconsistent, but if there is measurement error in the dependent variable the OLS estimator is still unbiased and consistent. In this sense, the statement is partially true. However, measurement error in the dependent variable still makes the OLS estimator less efficient (its variance is larger) than in the case with no measurement error. Furthermore, if the measurement error in the dependent variable takes a different form (for example, if it is correlated with the regressor), then this result may no longer hold and OLS may be biased and inconsistent.

2. As we discussed in class, the effects of missing data on the OLS estimator depend on why the data are missing. We saw three possible cases: (1) data are missing at random, (2) data are missing based on the value of one of the regressors, and (3) data are missing based on the value of the dependent variable. In cases 1 and 2, the OLS estimator remains unbiased and consistent, but is less efficient than it would be with the full sample (you can explain intuitively why this is the case for some extra credit). In the final case, however, the OLS estimator will typically be biased and inconsistent; this is known as sample selection bias. Again, you can try to explain why or give some brief examples for additional credit.

3.

a) The STATA commands to generate the log wage, perform the regression and display the adjusted R² are:

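    * create the log wage
    generate lwage = ln(wage)
    * log-linear regression with heteroskedasticity-robust standard errors
    regress lwage exper, robust
    * _result(8) holds the adjusted R-squared (also stored as e(r2_a))
    display "Adjusted R-squared = " _result(8)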

NOTE: we should include the robust option in the regress command, to tell STATA to use heteroskedasticity-robust standard errors.

The positive sign of the estimated coefficient on exper is consistent with economic theory, since on average a positive relationship would be expected between years of working experience and the (log) wage. However, the coefficient estimate above will only be a reliable estimate of the true causal effect if the model is internally valid. There are several possible threats to internal validity. In particular, given the very simple single-regressor model used, it is quite possible that there is significant omitted variables bias, which would make the OLS estimator biased and inconsistent.

b) For the log-log specification, the relevant STATA commands are:

    * note: ln(exper) is missing where exper = 0, so those observations drop out
    generate lexper = ln(exper)
    regress lwage lexper, robust
    display "Adjusted R-squared = " _result(8)

For the second, quadratic functional form:

    generate exper2 = exper^2
    * exper3 is only needed for the optional cubic check mentioned below
    generate exper3 = exper^3
    regress lwage exper exper2, robust
    display "Adjusted R-squared = " _result(8)

The adjusted R² values imply that both the log-log and quadratic specifications provide an improvement in predictive ability compared to the log-linear specification in (a). The quadratic specification for exper, however, results in a higher adjusted R² than the log-log specification and so we select the quadratic model. You could also check a cubic specification for exper, but you should find that the simpler quadratic specification is actually preferable.

NOTE: it is typically better to use the adjusted R² to compare different models and not the standard R². This is true generally and not only for this specific question.

c) Extending the quadratic model above by adding years of education, educ:

regress lwage exper exper2 educ, robust  
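Two small calculations come up when interpreting a log-wage regression with a quadratic in exper like this one: converting a coefficient into an exact percentage change, and evaluating the effect of extra experience as a difference between two predicted values. The Python sketch below is an illustration added here, not part of the original solutions. The value 0.073 is the educ coefficient reported in these solutions; the exper coefficients `B_EXPER` and `B_EXPER2` and both function names are hypothetical placeholders, since the estimation output is not reproduced in this text.

```python
import math


def pct_change_from_log_coef(beta):
    """Exact percentage change in the wage implied by a log-wage coefficient."""
    return 100.0 * (math.exp(beta) - 1.0)


def quadratic_effect(b_exper, b_exper2, exper, delta=1.0):
    """Predicted change in log wage when exper rises from exper to exper + delta.

    Computed as the difference between two predicted values, holding all
    other regressors fixed, so the intercept and other terms cancel.
    """
    return b_exper * delta + b_exper2 * ((exper + delta) ** 2 - exper ** 2)


# 0.073 is the educ coefficient quoted in the solutions.
print(round(pct_change_from_log_coef(0.073), 2))  # 7.57

# HYPOTHETICAL coefficients on exper and exper^2, for illustration only.
B_EXPER, B_EXPER2 = 0.04, -0.0007
for years in (5, 20, 35):
    print(years, round(quadratic_effect(B_EXPER, B_EXPER2, years), 4))
```

The first line of output shows why "7.3%" is best read as an approximation: the exact implied change is slightly larger. The loop shows that, with a quadratic term, the predicted effect of one more year of experience depends on the starting level of experience.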


The estimated coefficients on all included regressors are statistically significant. The estimated coefficient on educ is positive (consistent with economic theory) and the value of 0.073 implies that (holding exper constant) an extra year of education is associated with a predicted increase of approximately 7.3% in wages (remember we are using the log wage). We can also calculate the predicted effect of a change in years of working experience (holding educ constant), but because exper enters the regression model nonlinearly we have to use the method discussed in Section 8.1 of the textbook. Note however that the coefficient on the squared value of exper is very small, suggesting that the relationship between working experience and the log wage is close to linear. Compared to the earlier model in (b), the estimated coefficients on exper and exper squared have the same signs, but their sizes change quite substantially. Combined with the fact that the estimated coefficient on educ is statistically significant, this suggests that the model in (b) probably suffers from some omitted variable bias and so is not internally valid.

d) The minimum you should check is whether any of the binary variables appear to be relevant when included linearly in the regression model as additional regressors. Given that the number of binary variables is small relative to the sample size, the best way to begin would be to include all of the binary variables as regressors and check which are statistically significant:

regress lwage exper exper2 educ female married nonwhite south union, robust

The estimated coefficients on female, nonwhite and union are all statistically significant, suggesting that gender, race and trade union membership all seem to be relevant explanatory variables for the log wage. The estimated coefficients on married and south, however, are not individually or jointly significant and so we can probably safely remove these variables and re-estimate the model.

When we remove married and south from the regression model, the estimated coefficients on the remaining variables do not change substantially. Combined with the statistically insignificant estimated coefficients we obtained above for married and south, this suggests that we can probably safely remove these variables without introducing any substantial omitted variables bias.

We could also check whether any interaction terms are relevant. One obvious example is to interact educ and exper with female, to allow the expected returns to working experience and education to differ according to gender. To do this we create the interaction terms femeduc and femexper and then include them in the previous regression model:

    * interactions of female with educ and exper
    generate femeduc = female*educ
    generate femexper = female*exper
    regress lwage exper exper2 educ female femeduc femexper nonwhite union, robust

From these estimation results, there is some evidence that the returns to working experience are lower for women than for men, but the returns to education do not appear to vary significantly with gender.
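To read the interaction coefficients, it may help to write out the partial effects implied by the regression with femeduc and femexper. In the sketch below, each $\beta$ is simply a label introduced here for the coefficient on the correspondingly named regressor; this notation is not taken from the original solutions.

```latex
\[
\frac{\partial\,\mathit{lwage}}{\partial\,\mathit{educ}}
  = \beta_{educ} + \beta_{femeduc}\,\mathit{female},
\qquad
\frac{\partial\,\mathit{lwage}}{\partial\,\mathit{exper}}
  = \beta_{exper} + 2\,\beta_{exper2}\,\mathit{exper} + \beta_{femexper}\,\mathit{female}.
\]
```

So $\beta_{femeduc}$ is the difference in the return to education between women (female = 1) and men (female = 0), and $\beta_{femexper}$ is the corresponding difference in the return to experience. The pattern described above would correspond to a negative and statistically significant $\beta_{femexper}$ together with a statistically insignificant $\beta_{femeduc}$.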


A second obvious example is to allow the expected returns to working experience and education to differ according to race. There are of course many other possible interaction terms you could include, some of which may turn out to be statistically significant. However, checking all possible interaction terms is not really practical and any additional interaction terms you do check should be justifiable in some way according to economic or other theory (such as the examples above).

e) Depending on which binary variables and interaction terms you looked at in part (d), you may obtain a different final model for part (e). As a result, there is not really a single ‘correct’ regression model for this part of the question. What is important is whether you can correctly interpret and discuss the estimation results for the specific model you used. For example, what is the expected effect on wages of an additional year of education? Does this vary according to gender or race? If it does, how much does it differ? All other things being equal, what is the difference in expected wage between men and women? What about white and non-white individuals? All of these questions can be answered using the material from ECON311 and so they will not be discussed in detail here.

f) Here you can briefly consider the various threats to internal validity and discuss whether you feel each is likely to be a problem in the current context. We have controlled for several qualitative factors that are typically found to be important determinants of wages (such as gender and race), but it is possible that omitted variables bias is still a problem. If you think this may be true, try to suggest one or two potentially relevant variables that are not included in the data set and explain why they could satisfy the conditions for omitted variable bias. You are told at the start of the question that the data are from the US Current Population Survey (CPS) and so were collected and processed by the US government. This makes simple recording errors unlikely, and the sampling system for the survey has probably been designed to avoid serious sample selection problems. The data are however still survey based and so could possibly suffer from measurement error if survey respondents give inaccurate responses. Whether or not this is a problem depends on the form of measurement error.

g) Again, the exact estimation results you obtain will depend on the final model specification you selected in part (e). Compare your estimation results from 1978 and 1985 using identical specifications for the regression function. If the estimated coefficient values and their statistical significance are similar for the two time periods, then this provides some evidence of external validity over time. If there are large differences between the results for 1978 and 1985, then this suggests that the results from 1978 may not be generalisable to other time periods. Of course, obtaining similar estimation results for 1978 and 1985 does not provide any guarantee of external validity more generally, either across time (for example, to other years like 2005 or 1965) or to other countries.
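For part (g), one informal way to judge whether a coefficient is "similar" in the two years is to compare the difference between the two estimates with its standard error. The Python sketch below is an illustration added here, not a method prescribed in the original solutions. It assumes the 1978 and 1985 samples are independent, so that the variance of the difference is the sum of the two squared standard errors; the function name `coef_difference_z` and all numbers are hypothetical.

```python
import math


def coef_difference_z(b1, se1, b2, se2):
    """z-statistic for H0: the two coefficients are equal.

    Assumes the two estimates come from independent samples, so the
    standard error of the difference is sqrt(se1^2 + se2^2).
    """
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)


# HYPOTHETICAL estimates and robust standard errors for the same
# regressor in two separate years, for illustration only.
z = coef_difference_z(0.10, 0.03, 0.06, 0.04)
verdict = "differ" if abs(z) > 1.96 else "do not differ"
print(round(z, 2), "-> estimates", verdict, "significantly at the 5% level")
```

A small |z| is only weak evidence of stability: as noted above, similar estimates in two years do not guarantee external validity more generally.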