My DS Coding Blog: PROC ARIMA to Fit an AR(p) Model

Autoregressive models are a class of models that estimate the stochastic process underlying a time series whose values exhibit nonzero auto-correlation.
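As a reference point for the rest of the post, an AR(p) model can be written as below. The symbols are my own notation, not from the original notes: \mu is the series mean, \phi_1, ..., \phi_p are the autoregressive parameters, and \varepsilon_t is a white noise error term.

```latex
Y_t - \mu = \phi_1 (Y_{t-1} - \mu) + \phi_2 (Y_{t-2} - \mu) + \cdots + \phi_p (Y_{t-p} - \mu) + \varepsilon_t
```

Each value depends linearly on the previous p values, which is what produces the nonzero auto-correlation.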

The Box-Jenkins approach to analyzing time series data is an analytical method involving three phases: identification of the process, estimation of the model parameters, and diagnostic checking.

*Identification Phase
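The first phase is the identification phase. Invoke it by issuing an IDENTIFY statement in PROC ARIMA.

proc arima data=predicted_series;
/* Identification: print the ACF, IACF, and PACF of severity; SCAN suggests tentative ARMA orders */
identify var=severity scan;
run;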

ACF lists the estimated auto-correlation coefficients at each lag, corr(Y_t, Y_{t+i}), i = 1, 2, .... A plot of the ACF shows the pattern of auto-correlations as the number of lags increases.

IACF is the auto-correlation function of an inverted model, in which the autoregressive and moving-average parts swap roles. The IACF of an AR(p) process is therefore equivalent to the ACF of the corresponding MA(p) process.

PACF is the auto-correlation function at lag k after accounting for the effects of all intervening observations. Thus, the PACF at lag 1 is identical to the auto-correlation at lag 1, but the PACF and ACF generally differ at higher lags. Plots of the PACF reveal patterns that are useful in identifying the order of auto-regressive processes.
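The ACF and PACF definitions above can be stated a bit more precisely. This is a sketch in my own notation (\rho_k, \phi_{kk}, and e_t are symbols I am introducing here): the ACF at lag k is a normalized covariance, and the PACF at lag k is the last coefficient when the series is regressed on its previous k values.

```latex
\rho_k = \operatorname{corr}(Y_t, Y_{t+k}) = \frac{\operatorname{cov}(Y_t, Y_{t+k})}{\operatorname{var}(Y_t)}, \qquad k = 1, 2, \ldots

Y_t - \mu = \phi_{k1}(Y_{t-1} - \mu) + \cdots + \phi_{kk}(Y_{t-k} - \mu) + e_t, \qquad \text{PACF at lag } k = \phi_{kk}
```

With k = 1 the regression has a single coefficient, so \phi_{11} = \rho_1, which is why the two functions agree at lag 1.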

For an AR(p) process, the correlation plots should display the following patterns:
1. The ACF tails off exponentially. If the tail-off behavior is combined with positive auto-correlations alternating with negative auto-correlations on successive lags, it is said to be contained within an exponential envelope.
2. The PACF and IACF drop to 0 after lag p, where p is the order of the AR process.
3. If the ACF drops off very slowly, the time series is likely non-stationary and will require further processing, such as differencing.
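When the ACF decays slowly, one common follow-up is to test for stationarity and then difference the series. The sketch below reuses the data set and variable from this post; the choice of augmenting lags (0, 1, 2) and of first-order differencing is only an assumption for illustration, not something the original analysis prescribes.

```sas
proc arima data=predicted_series;
   /* Augmented Dickey-Fuller tests on the original series at augmenting lags 0, 1, and 2 */
   identify var=severity stationarity=(adf=(0,1,2));
   /* Re-run identification on the first difference, written as severity(1) */
   identify var=severity(1) scan;
run;
quit;
```

If the differenced series shows an ACF that tails off quickly, the later ESTIMATE statement applies to the differenced series from the most recent IDENTIFY statement.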

*Estimation Phase
The second phase of the ARIMA procedure is the estimation phase.

proc arima data=predicted_series;
identify var=severity scan;
/* Fit an AR(6) model to the series named in the IDENTIFY statement */
estimate p=6;
run;

Three estimation techniques are available in PROC ARIMA:
Conditional least squares (CLS) is conditional upon the assumption that all values of the series prior to the first observation in the analysis data set are equal to the mean of the series.
Unconditional least squares (ULS) is exact least squares, which makes no such assumption about the values before the first observation.
Maximum likelihood (ML) chooses the parameter estimates that maximize the log of the likelihood function.
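The technique is selected with the METHOD= option of the ESTIMATE statement. A minimal sketch, assuming the same AR(6) specification used in this post, that fits the model once with each technique so the parameter estimates can be compared side by side:

```sas
proc arima data=predicted_series;
   /* NOPRINT suppresses the identification output; only the fits are of interest here */
   identify var=severity noprint;
   estimate p=6 method=cls;   /* conditional least squares (the default) */
   estimate p=6 method=uls;   /* unconditional least squares */
   estimate p=6 method=ml;    /* maximum likelihood */
run;
quit;
```

For long series the three sets of estimates are usually close; larger differences tend to show up with short series or parameters near the stationarity boundary.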

The output from the estimation phase lists model parameter estimates and associated significance tests, goodness-of-fit statistics, and other estimation diagnostics.
Use the ESTIMATE statement to invoke the estimation phase of PROC ARIMA. An IDENTIFY statement must precede the ESTIMATE statement.

*Diagnostic Checking Phase
The third phase is diagnostic checking of the fitted model.

General guidelines for checking diagnostics:
1. Review the auto-correlation check of residuals. The Q statistics test the white noise hypothesis for the residuals of the fitted model. Each Q statistic is a chi-square statistic produced by the Ljung-Box calculation, whose null hypothesis is that the series is white noise. If the Q statistics are significant, you can conclude that the estimated model provides a poor fit.
2. Use the goodness-of-fit criteria (variance estimate, AIC, BIC) to compare the fit of different models estimated with the same estimation method (conditional least squares, unconditional least squares, or maximum likelihood).
3. Check the parameter estimates. Estimates that are not significantly different from 0 may indicate that the model has too many parameters.
4. If the absolute value of the last (highest-order) parameter estimate is close to 1, then the series may be non-stationary.
5. Perform a spectral analysis on the residuals to test the white noise hypothesis.
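Guidelines 1 and 5 can be sketched as follows. This is one possible setup, not the only one: the output data set name resid is hypothetical, METHOD=ML is an arbitrary choice, and the example assumes the residual series has no missing values.

```sas
proc arima data=predicted_series;
   identify var=severity noprint;
   /* PLOT adds the residual ACF, IACF, and PACF to the estimation output */
   estimate p=6 method=ml plot;
   /* LEAD=0 produces no forecasts beyond the data; OUT= still contains a RESIDUAL variable */
   forecast lead=0 out=resid noprint;
run;
quit;

/* WHITETEST prints Fisher's kappa and the Bartlett Kolmogorov-Smirnov white noise tests */
proc spectra data=resid whitetest;
   var residual;
run;
```

Small p-values from the white noise tests would point the same way as significant Q statistics: the residuals still contain structure that the model has not captured.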

PROC ARIMA is also used to forecast from autoregressive models. ARIMA models are best suited to short-term forecasting: for a stationary model, forecasts far into the future converge to the mean of the time series, so long-term forecasts are no better than using that mean.

proc arima data=predicted_series ;
identify var=severity scan;
estimate p=6;
forecast lead=60 interval=month id=failure_date out=outlead;
run;
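The point about long-term forecasts is easiest to see for a stationary AR(1) model. This is a worked illustration in my own notation, not part of the original notes: \mu is the series mean, \phi_1 is the autoregressive parameter with |\phi_1| < 1, \sigma^2 is the error variance, Y_T is the last observed value, and h is the forecast horizon.

```latex
\hat{Y}_{T+h} = \mu + \phi_1^{h} (Y_T - \mu) \;\longrightarrow\; \mu \quad \text{as } h \to \infty

\operatorname{var}\!\left(Y_{T+h} - \hat{Y}_{T+h}\right) = \sigma^2 \, \frac{1 - \phi_1^{2h}}{1 - \phi_1^{2}} \;\longrightarrow\; \frac{\sigma^2}{1 - \phi_1^{2}} = \operatorname{var}(Y_t)
```

With LEAD=60, most of the monthly forecasts in the OUT= data set will therefore sit near the series mean, with confidence limits about as wide as the unconditional spread of the series.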

axis1 width=1 offset=(1 pct) label=(a=90 r=0 'Severity');
axis2 width=1 offset=(1 pct) label=('Failure Date') value=(h=1.25);
symbol1 v=star ci=red height=1 cells interpol=join l=1 w=2;
symbol2 v=dot ci=green height=1 cells interpol=join l=1 w=2;
symbol3 v=dot ci=blue height=1 cells interpol=join l=2 w=1;
symbol4 v=dot ci=blue height=1 cells interpol=join l=2 w=1;
legend1 label=('') value=('Actual Claim Severity' 'Predicted Claim Severity') across=2 mode=protect position=(top center inside);
title 'Severity Prediction';
proc gplot data=outlead;
format failure_date monyy. severity dollar.;
plot (severity forecast L95 U95)*failure_date/ overlay
caxis = BLACK
ctext = BLACK
vaxis = axis1
haxis = axis2
legend= legend1
grid
hminor=0
;
run;

Wednesday, November 9, 2011
