Wednesday, November 9, 2011

Proc QLIM to fit Tobit Models

Standard Tobit Model:

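yi = yi^star*I(yi^star > 0), where yi^star = xi*beta + ei and ei ~ N(0, sigma^2). In other words, yi equals the latent value yi^star when yi^star is positive and is censored at 0 otherwise.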
The inverse Mills ratio (IMR), also called the selection hazard, is used to account for possible selection bias.
IMratio = f(xi*beta/sigma)/Pi(xi*beta/sigma), where f is the pdf and Pi is the cdf of the standard normal distribution. Because Pi(xi*beta/sigma) is the probability that observation i is not censored, the ratio increases as the probability of censoring increases. It approaches infinity as that probability approaches 1 and approaches 0 as that probability approaches 0.
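To see how the ratio behaves, the following DATA step is a small sketch (not part of the original analysis) that evaluates IMratio = f(z)/Pi(z) over a grid of hypothetical values z = xi*beta/sigma with the SAS PDF and CDF functions. The data set and variable names are illustrative only.

```sas
* Sketch: inverse Mills ratio f(z)/Pi(z) on a grid of hypothetical z = x*beta/sigma values;
data imr_grid;
   do z = -3 to 3 by 1;
      p_uncensored = cdf('NORMAL', z);              /* Pi(z): probability that y* > 0 */
      p_censored   = 1 - p_uncensored;              /* probability of censoring       */
      imratio      = pdf('NORMAL', z) / p_uncensored;
      output;
   end;
run;

proc print data=imr_grid noobs;
   title 'Inverse Mills Ratio versus Probability of Censoring';
run;
```

The imratio column falls as z rises (that is, as p_censored falls), from about 3.28 at z = -3 to about 0.0044 at z = 3.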

The likelihood function of the Tobit model has two parts: each uncensored observation contributes a normal density term, as in OLS regression, and each censored observation contributes the probability of being censored, Pi(-xi*beta/sigma), as in a Probit model. If no observations are censored, the likelihood reduces to that of the normal linear regression model, and the Tobit estimates of beta equal the OLS estimates. If there is censoring, OLS on the observed values is generally biased, whereas the Tobit model predicts both the values of the non-censored response and the probability that each observation is censored.
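The two-part structure can be made concrete with a DATA step that computes each observation's log-likelihood contribution for a Tobit model censored at 0. This is a sketch with made-up values of y and xi*beta, and sigma = 2 is a hypothetical value rather than an estimate. PROC QLIM fits the model by maximum likelihood, choosing beta and sigma to maximize the sum of such contributions.

```sas
* Sketch: Tobit log-likelihood contributions, censored at 0 (hypothetical data);
data tobit_ll;
   sigma = 2;                      /* hypothetical error standard deviation */
   input y xbeta;
   if y <= 0 then
      /* censored: Probit-type term, log Pr(y* <= 0) = log Pi(-x*beta/sigma) */
      loglik = log(cdf('NORMAL', -xbeta / sigma));
   else
      /* uncensored: normal (OLS-type) density term for y given x*beta */
      loglik = log(pdf('NORMAL', (y - xbeta) / sigma) / sigma);
   datalines;
0    -1.0
0     0.5
3.2   2.0
1.1   1.5
;
run;

proc means data=tobit_ll sum;
   var loglik;
   title 'Total Tobit Log-Likelihood for the Hypothetical Data';
run;
```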

Probit Model:
Pi^(-1)(pi) = xi*beta, where pi is the probability that the binary response equals 1.
The probit link function transforms a probability into the standard normal z-score whose left-tail probability equals that probability. For example, Pi^(-1)(0.5) = 0 and Pi^(-1)(0.975) = 1.96.
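In SAS, the PROBIT function computes Pi^(-1) and CDF('NORMAL', ...) computes Pi, so the probit link can be checked directly. This small DATA step is only an illustration; for example, Pi^(-1)(0.975) is about 1.96.

```sas
* Sketch: the probit link Pi^(-1)(p) and its inverse Pi(z);
data probit_check;
   do p = 0.5, 0.975;
      z      = probit(p);            /* Pi^(-1)(p): standard normal quantile */
      p_back = cdf('NORMAL', z);     /* Pi(z) recovers p                     */
      output;
   end;
run;

proc print data=probit_check noobs;
run;
```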

PROC QLIM (Qualitative and LImited dependent variable Model) analyzes univariate and multivariate limited dependent variable models, in which the dependent variables are observed only in a limited range of values.
The CENSORED option of the ENDOGENOUS statement specifies that the dependent variable is censored.
The HETERO statement specifies the variables related to the heteroscedasticity of the residuals and how they enter the error variance.
The TRUNCATED option of the ENDOGENOUS statement specifies that the dependent variable is truncated. In a truncated sample, the values of the predictor variables are available only when the response variable is observed; observations outside the bounds are missing entirely. This differs from censoring, where the predictor variables are available even when only the bound, not the exact value, of the response is known.
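The difference between censoring and truncation can be illustrated by simulation. In this sketch the seed, sample size, and the latent model y* = 1 + 2x + e are arbitrary assumptions. The censored data set keeps every row and records y = 0 whenever y* <= 0, while the truncated data set drops those rows altogether; each is then fitted with the matching ENDOGENOUS option.

```sas
* Sketch: censored versus truncated samples from the same latent model (simulated data);
data censored_sample truncated_sample;
   call streaminit(2011);                   /* arbitrary seed */
   do i = 1 to 1000;
      x     = rand('NORMAL');
      ystar = 1 + 2*x + rand('NORMAL');     /* latent response, sigma = 1 */
      /* censoring: x is always kept, y is recorded as 0 when ystar <= 0 */
      y = max(ystar, 0);
      output censored_sample;
      /* truncation: the whole observation (x and y) is lost when ystar <= 0 */
      if ystar > 0 then output truncated_sample;
   end;
   keep i x y;
run;

proc qlim data=censored_sample;
   model y = x;
   endogenous y ~ censored(lb=0);
run;

proc qlim data=truncated_sample;
   model y = x;
   endogenous y ~ truncated(lb=0);
run;
```

With 1000 simulated rows, both fits should give estimates close to the intercept 1 and slope 2 used to generate the data, even though the truncated sample is smaller.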

* Tobit Model in PROC QLIM;
** ENDOGENOUS CENSORED (lower bound=0);
proc qlim data=pva;
class gender;
format gender $gender.;
model donation = gender age income_group wealth_rating
pep_star months_since_last_gift
median_home_value recent_avg_gift_amt
recent_card_response_prop card_prom_12
recent_response_count;
endogenous donation ~ censored(lb=0);
* CONDITIONAL/EXPECTED: conditional and unconditional expected values, MARGINAL: marginal effects, XBETA: x*beta, MILLS: inverse Mills ratio;
output out=selection conditional expected marginal predicted xbeta mills;
title 'Tobit Model in PROC QLIM';
run;

* Tobit Model Corrected for Heteroscedasticity;
** HETERO specifies the variables used to model the error variance Var(ei);
proc qlim data=pva;
class gender;
format gender $gender.;
model donation= gender age income_group wealth_rating
pep_star months_since_last_gift
median_home_value recent_avg_gift_amt
recent_card_response_prop card_prom_12
recent_response_count;
endogenous donation ~ censored(lb=0);
hetero donation ~ pep_star months_since_last_gift
recent_avg_gift_amt card_prom_12
recent_response_count;

output out=selection xbeta;
title "Tobit Model Corrected for Heteroscedasticity";
run;

* Tobit Model in PROC QLIM;
** ENDOGENOUS CENSORED (lower bound=500 upper bound=1500);
proc qlim data=censored_aids;
model basecd4 = age cigarettes drug partners depression;
endogenous basecd4 ~ censored(lb=500 ub=1500);
title 'Censored Distribution in PROC QLIM';
run;

* Tobit Model in PROC QLIM;
** ENDOGENOUS CENSORED (lower bound=500 upper bound=1500), corrected for heteroscedasticity;
proc qlim data=censored_aids;
model basecd4 = age cigarettes drug partners depression;
endogenous basecd4 ~ censored(lb=500 ub=1500);
hetero basecd4 ~ cigarettes;
output out=selection xbeta;
title 'Censored Distribution in PROC QLIM Accounting for Heteroscedasticity';
run;

* Tobit Model in PROC QLIM;
**Truncated Regression Model;
proc qlim data=housing;
model median_value = crime_rate large_lots nonretail charles_river nitric_oxide
average_number_rooms percent_lower_status
distance_Boston teacher_pupil_ratio access_highway;
endogenous median_value ~ truncated (ub=25);
output out=selection conditional marginal predicted mills xbeta errstd;
title "Truncated Regression Model for the Boston Housing Data";
run;

* Tobit Model in PROC QLIM;
** Truncated Regression Model Accounting for Heteroscedasticity;
proc qlim data=housing;
model median_value = crime_rate large_lots nonretail charles_river nitric_oxide
average_number_rooms percent_lower_status
distance_Boston teacher_pupil_ratio access_highway;
endogenous median_value ~ truncated (ub=25);
hetero median_value ~ percent_lower_status distance_Boston;
output out=selection xbeta;
title "Truncated Regression Model for the Boston Housing Data Accounting for Heteroscedasticity";
run;

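Sample Selection Models:
In a sample selection model, the response variable is observed only when a selection criterion is met. The selection equation has a latent response variable defined as a linear function of predictor variables and their coefficients plus an error term; the response is observed only when this latent variable is positive. When the selection error is correlated with the error of the response equation, fitting OLS to the selected observations alone gives biased estimates.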
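A simulation sketch of this data-generating process may help. The names selection_sim, lf_star, in_lf, rho, and all coefficients below are hypothetical. The selection error u and the wage error e are given correlation rho, and wage is set to missing whenever the latent selection variable is not positive; the PROC QLIM step then uses the same DISCRETE/SELECT pattern as the Mroz example.

```sas
* Sketch: simulated sample selection data (all values hypothetical);
data selection_sim;
   call streaminit(1109);                          /* arbitrary seed */
   rho = 0.6;                                      /* assumed corr(u, e) */
   do i = 1 to 1000;
      z = rand('NORMAL');                          /* predictor in the selection equation only */
      x = rand('NORMAL');                          /* predictor in the wage equation */
      u = rand('NORMAL');                          /* selection error */
      e = rho*u + sqrt(1 - rho**2)*rand('NORMAL'); /* wage error, var 1, corr(u, e) = rho */
      lf_star = 0.5 + 1.0*z + u;                   /* latent selection variable */
      in_lf   = (lf_star > 0);                     /* observed 0/1 selection indicator */
      if in_lf then wage = 2 + 1.5*x + e;          /* wage observed only if selected */
      else wage = .;
      output;
   end;
   keep i z x in_lf wage;
run;

proc qlim data=selection_sim;
   model in_lf = z / discrete;
   model wage = x / select(in_lf=1);
run;
```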

* Tobit Model in PROC QLIM;
**Sample Selection Model ;
proc qlim data=mroz;
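* selection equation: probit model for labor force participation;
model labor_force = age education experience kidslt6 income / discrete;
* outcome equation: wage is observed only when labor_force=1;
model wage = experience experience_sq education marg_tax_rate / select(labor_force=1);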
hetero wage ~ experience education;
title "Sample Selection Model of Married Female Wages in Labor Force";
run; 
