Wednesday, November 9, 2011

Proc Autoreg to Model Time Series Errors

Proc Autoreg calculates OLS estimates for the linear regression component of your model and lets you choose one of the following methods to estimate the auto-regressive component: Yule-Walker, iterated Yule-Walker, unconditional least squares (ULS), or maximum likelihood (ML).
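As a reminder of what the two components are, the model can be written out as below. This is a sketch that follows the sign convention I recall from the SAS documentation for AUTOREG; the symbols are not from the original post, and m is the auto-regressive order set by NLAGS=.

```latex
\begin{aligned}
y_t   &= \mathbf{x}_t'\boldsymbol{\beta} + \nu_t
        && \text{(linear regression component)} \\
\nu_t &= -\varphi_1 \nu_{t-1} - \varphi_2 \nu_{t-2} - \dots - \varphi_m \nu_{t-m} + \varepsilon_t
        && \text{(auto-regressive error component)} \\
\varepsilon_t &\sim \text{independent } N(0, \sigma^2)
\end{aligned}
```

Note the minus signs on the auto-regressive terms: under this convention, positive auto-correlation shows up as a negative estimate of the first AR parameter.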

The advantage of AUTOREG is that the procedure automatically fits an auto-regressive error structure to the residuals. The disadvantage is that forecasting requires the future values of all independent variables to be present in the data. Without them, forecasts of the variable severity beyond the end of the historical data cannot be produced. In addition, the procedure gives no overall measure of the adequacy of the model.
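To make the point about future values concrete, here is a minimal sketch of the usual workaround: append rows for the future months that carry values for the regressor but a missing dependent variable, then read the predictions from the OUTPUT data set. The data set future_counts and the output names are hypothetical; failure_60, dlseverity and dlcount are the names used in this post.

```sas
/* Hypothetical data set future_counts: one row per future month, holding
   assumed values of dlcount and no dlseverity, so dlseverity is missing
   for those rows after the SET below. */
data failure_60_ext;
set failure_60 future_counts;
run;

proc autoreg data=failure_60_ext;
model dlseverity=dlcount/nlags=1 method=ml;
/* P= is filled in for every row with a non-missing regressor,
   including the appended future rows. */
output out=severity_pred p=pred_dlseverity lcl=lower95 ucl=upper95;
run;
```

The forecasts are only as good as the assumed future values of dlcount, which is exactly the limitation described above.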

Proc Autoreg results contain the Durbin-Watson statistic, which tests the assumption of no first-order auto-correlation. Auto-correlation is either positive, where errors of one sign tend to be followed by errors of the same sign, or negative, where errors of one sign tend to be followed by errors of the opposite sign. The Durbin-Watson statistic is a ratio with a range of 0 to 4. If no auto-correlation is present, the expected value of the statistic is approximately 2. Values significantly less than 2 indicate positive auto-correlation, while values significantly larger than 2 indicate negative auto-correlation.
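The range of 0 to 4 and the reference value of 2 follow from the definition of the statistic. The notation below is mine, not from the SAS output: the e-hat terms are the OLS residuals, n is the number of observations, and rho-hat is the lag-1 sample auto-correlation of the residuals.

```latex
d \;=\; \frac{\sum_{t=2}^{n} \left(\hat{e}_t - \hat{e}_{t-1}\right)^2}
             {\sum_{t=1}^{n} \hat{e}_t^{\,2}}
  \;\approx\; 2\,(1 - \hat{\rho}_1)
```

With no auto-correlation (rho-hat near 0) d is near 2; as rho-hat approaches +1 d approaches 0, and as rho-hat approaches -1 d approaches 4.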

proc sql;
create table failure_60 as
select
year(failure_date) as failure_year,
month(failure_date) as failure_month,
mdy(month(failure_date),1,year(failure_date)) as failure,
count(claim_id) as total_claim_cnt,
sum(claim_amt) as total_claim_amt,
sum(claim_amt)/count(claim_id) as severity
from claim_grp9
group by 1,2,3 /* include the month-start date so the summaries are not remerged onto every claim row */
order by 1,2;
quit;

data failure_60;
set failure_60;
lseverity=log(severity);
dlseverity=dif(lseverity);
ldlseverity1=lag(dlseverity);
ldlseverity2=lag2(dlseverity);
ldlseverity3=lag3(dlseverity);

lcount=log(total_claim_cnt);
dlcount=dif(lcount);
ldlcount1=lag(dlcount);
ldlcount2=lag2(dlcount);
ldlcount3=lag3(dlcount);
run;

proc autoreg data=failure_60;
model dlseverity=dlcount/nlags=12 backstep method=ml;
run;

The BACKSTEP option in the MODEL statement performs backward elimination of non-significant auto-regressive parameter estimates, starting from the order given by NLAGS=. It is useful when you are unsure of the order of the model.
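How strict the elimination is can be controlled with SLSTAY=. The sketch below is an illustration rather than a step from the original analysis; it assumes the failure_60 data set built above and spells out 0.05, which as far as I know is the documented default.

```sas
proc autoreg data=failure_60;
/* SLSTAY= is the significance level an auto-regressive lag must meet to
   stay in the model during BACKSTEP elimination. */
model dlseverity=dlcount/nlags=12 backstep slstay=0.05 method=ml;
run;
```

A smaller SLSTAY= value drops more lags and gives a more parsimonious error model; a larger value keeps more of the 12 candidate lags.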

*alternative model;
proc autoreg data=failure_60;
model dlseverity=dlcount ldlcount1 ldlcount2 ldlcount3/nlags=12 backstep method=ml;
run;
*Both AIC and BIC are smaller for the alternative model;

*Fitting an equivalent Model using Proc ARIMA;
When using Proc Autoreg, you implicitly assume that the error process for the dependent time series variable is auto-regressive or that it can be approximated by an auto-regressive process. Proc Arima enables you to identify the error process for your time series data. You can also use the ARIMA procedure to calculate the cross-correlation between two time series by specifying the CROSSCOR= option in the IDENTIFY statement.
proc arima data=failure_60;
identify var=severity(1) crosscor=(total_claim_cnt(1)) noprint;
estimate input=(1$(1)total_claim_cnt) p=1 q=1 method=ml plot;
forecast lead=60 interval=month id=failure out=predict_severity ;
run;

axis1 width=1 offset=(1 pct) label=(a=90 r=0 'Severity');
axis2 width=1 offset=(1 pct) label=('Failure Date') value=(h=1.25);
symbol1 v=star ci=red height=1 cells interpol=join l=1 w=2;
symbol2 v=dot ci=green height=1 cells interpol=join l=1 w=2;
legend1 label=('') value=('Actual Claim Severity' 'Predicted Claim Severity') across=2 mode=protect position=(top center inside);
title 'Severity Prediction';
proc gplot data=predict_severity;
format failure monyy. severity dollar.;
plot (severity forecast)*failure/ overlay
caxis = BLACK
ctext = BLACK
vaxis = axis1
haxis = axis2
legend= legend1
grid
hminor=0
;
run;
quit;

*Put training and test data sets together;

data failure;
set train test;
run;

*Use msrp and failure_time information to fit the data;
proc autoreg data=failure;
model avg_claim=msrp failure_year failure_month/nlags=2 method=ml noprint;
output out=forecast p=forecast lcl=lower95 ucl=upper95;
run;

axis1 width=1 offset=(1 pct) label=(a=90 r=0 'Severity') order=(-200 to 1000 by 500);
axis2 width=1 offset=(1 pct) label=('Failure Date') value=(h=1.25);
symbol1 v=star ci=red height=1 cells interpol=dot l=1 w=2;
symbol2 v=dot ci=green height=1 cells interpol=dot l=1 w=2;
symbol3 v=dot ci=blue height=1 cells interpol=dot l=1 w=2;
symbol4 v=dot ci=blue height=1 cells interpol=dot l=1 w=2;
legend1 label=('') value=('Actual Claim Severity' 'Predicted Claim Severity' 'Lower 95% Limit' 'Upper 95% Limit') across=2 mode=protect position=(top center inside);
title 'Severity Prediction';
proc gplot data=forecast;
format failure monyy. avg_claim dollar.;
plot (avg_claim forecast lower95 upper95)*failure/ overlay
caxis = BLACK
ctext = BLACK
vaxis = axis1
haxis = axis2
legend= legend1
grid
hminor=0
;
run;
quit;
