
Tuesday, August 19, 2014

Time Series Analysis in R

##################################################################
## Second Order Summaries
##################################################################
The theory of time series is based on the assumption of second-order stationarity after removing any trends (including seasonal trends). A time series is said to be stationary if its mean, variance, and lag covariances are constant over time. Three common ways of dealing with a non-stationary series are differencing, log transformations, and sinusoidal components; a quick sketch of the first two follows.
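
A quick R illustration of the first two methods (the built-in AirPassengers series is used purely as an example):

y  <- log(AirPassengers)        ## log transform stabilizes the growing variance
dy <- diff(y)                   ## first difference removes the trend
sy <- diff(dy, lag = 12)        ## span-12 difference removes the monthly seasonality
plot(sy)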

Use the acf and pacf functions to explore the auto-covariance and auto-correlation. Note that for a series with a non-unit frequency, the lag axis is labeled in units of cycles (lag/frequency) rather than observations.

##################################################################
## Checking acf and pacf
##################################################################
par(mfrow = c(1, 2), pty="s")
acf(Sample$embroidered_top)
pacf(Sample$embroidered_top)


The plots for the series above show the pattern typical of seasonal series: the auto-correlations do not damp down at large lags.

How to identify an AR(p) process
- The ACF tails off exponentially.
- The PACF drops to 0 after lag p, where p is the order of the AR process.
- If the ACF tails off very slowly, the time series may not be stationary.

How to identify an MA(q) process (see the simulation sketch below)
- The ACF drops to 0 after lag q, where q is the order of the MA process.
- The PACF tails off exponentially.
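
A minimal simulation sketch of these signatures (the coefficients are arbitrary illustrative choices):

set.seed(42)
ar2 <- arima.sim(model = list(ar = c(0.6, -0.3)), n = 500)   ## an AR(2)
ma2 <- arima.sim(model = list(ma = c(0.6, -0.3)), n = 500)   ## an MA(2)
par(mfrow = c(2, 2))
acf(ar2);  pacf(ar2)   ## ACF tails off, PACF cuts off after lag 2
acf(ma2);  pacf(ma2)   ## ACF cuts off after lag 2, PACF tails off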

##################################################################
## Spectral Analysis
##################################################################

The basic tool for estimating the spectral density is the periodogram: it effectively computes the squared correlation between the series and sine/cosine waves at each frequency. The raw periodogram is noisy, so it is smoothed with repeated modified Daniell smoothers, which are moving averages giving half weight to the end values of the span. The bandwidth is a measure of the size of the smoothing window. Trial and error is needed to choose the spans: they should be odd integers, and it helps to produce a smooth plot if they are different and at least two are used.

spectrum(Sample$embroidered_top)

## Cumulative periodogram (cpgram is in the MASS package)
library(MASS)
par(mfrow = c(1, 1), pty="s")
cpgram(logtraindt)
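
Following the guidance above, a smoothed spectral density estimate might look like this (the two spans are an assumption found by trial and error):

spectrum(Sample$embroidered_top, spans = c(5, 7))   ## two modified Daniell smoothers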


##################################################################
## Fit ARIMA models
##################################################################
ARIMA models are best for short-term forecasting because long-term forecasting is no better than using the mean of the time series.

## split train and test data set
train <- Sample[1:(nrow(Sample)-4)]
test <- Sample[(nrow(Sample)-3):nrow(Sample)]

## transformation
logtraindt <- log(1+train$embroidered_top)/log(100)
logtestdt <- log(1+test$embroidered_top)/log(100)

## AR model investigation
## All likelihoods assume a Gaussian distribution; the model with the smallest AIC is chosen.
ar1 <- ar(logtraindt, aic=T, order.max = 10); summary(ar1)
cpgram(ar1$resid, main="AR Max Model")

## case 1 (y denotes the series to model; avoid naming it "ts", which masks the base function)
fit <- arima(y, order=c(1,0,0), seasonal=list(order=c(2,1,0), period=5))
fore <- predict(fit, n.ahead=15)
## error bounds at the approximate 95% confidence level
U <- fore$pred + 2*fore$se
L <- fore$pred - 2*fore$se
ts.plot(y, fore$pred, U, L, col=c(1,2,4,4), lty=c(1,1,2,2))
legend("topleft", c("Actual", "Forecast", "Error Bounds (95% Confidence)"), col=c(1,2,4), lty=c(1,1,2))

## case 2
## grid search over p and q (d fixed at 1); choose the order with the smallest AIC
for (i in 1:4){
  for (k in 1:4){
    arima1 <- arima(logtraindt, order=c(i,1,k))
    cat("p =", i, " q =", k, " AIC =", arima1$aic, "\n")
  }
}
arima1 <- arima(logtraindt, order=c(4,4,3))
tsdiag(arima1)
arima1.fore <- predict(arima1, 8)
ts.plot(c(train$embroidered_top,arima1.fore$pred))
ts.plot(arima1.fore$pred,
        arima1.fore$pred - 2*arima1.fore$se,
        arima1.fore$pred + 2*arima1.fore$se)
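
A small sketch for judging the fit on the held-out data (logtestdt has 4 observations, so only the first 4 of the 8 forecast steps are compared):

err <- logtestdt - arima1.fore$pred[1:length(logtestdt)]
sqrt(mean(err^2))   ## RMSE on the transformed (log) scale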


##################################################################
## Plot ARIMA model Prediction
##################################################################
library(data.table); library(ggplot2); library(scales)   ## data.table(), ggplot(), date_format()
train$Date = as.Date(train$Date, "%Y-%m-%d")
pred.dt <- train$Date[length(train$Date)] + seq_len(8)*7   ## 8 weekly steps ahead
plot_dt=c(train$Date, pred.dt)
plot_pd=c(train$embroidered_top, test$embroidered_top, rep(NA,4))
plot_pd2=c(rep(NA, length(train$embroidered_top)), exp(arima1.fore$pred*log(100))-1)   ## invert the log(1+x)/log(100) transform
plot_data=data.table(plot_dt, plot_pd, plot_pd2)
n=nrow(plot_data)
plot_data=plot_data[(n-52):n]

ggplot(plot_data, aes(x=plot_dt, y=plot_pd)) +
  geom_point(aes(x=plot_dt, y=plot_pd), col = "blue", size = 1) +
  geom_line(aes(x=plot_dt, y=plot_pd), size=1, linetype=1, color="blue") +
  geom_line(aes(x=plot_dt, y=plot_pd2), size=1, linetype=4, color="darkgreen") +
  geom_vline(xintercept = as.integer(as.Date("2016-10-23", "%Y-%m-%d")), linetype=4) +
  scale_x_date(labels=date_format("%b %y"), breaks=date_breaks("12 week")) +
  stat_smooth(color="red") +
  xlab("Search Index") +
  ylab("Date") +
  ggtitle("Forcast of 8 Weeks from ARIMA for Embroidered Top") +
  theme(plot.title = element_text(face = "bold", size = 20)) +
  theme(axis.text.x = element_text(face = "bold", size = 14)) +
  theme(axis.text.y = element_text(face = "bold", size = 14)) +
  theme(axis.title.x = element_text(face = "bold", size = 16)) +
  theme(strip.text.x = element_text(face = "bold", size = 16)) +
  theme(axis.title.y = element_text(face = "bold", size = 16, angle=90)) +
  guides(fill=FALSE)


##################################################################
## Exponential Smoothing: Fit a Simple Exponential Model
##################################################################
library(forecast)
ts.mail <- ts(data$count)
ts.mail.forecast <- HoltWinters(ts.mail, beta=FALSE, gamma=FALSE)   ## simple exponential smoothing
ts.mail.forecast
plot(ts.mail.forecast)
ts.mail.forecast2 <- forecast(ts.mail.forecast, h=100)   ## the generic forecast() replaces the old forecast.HoltWinters()
plot(ts.mail.forecast2)

##################################################################
## Double Exponential Smoothing with Linear Trend
##################################################################
mHolt <- holt(tslogcnt)   ## holt() and ses() are in the forecast package
summary(mHolt)
plot(mHolt)
attributes(mHolt)
mHolt$fitted

fit1 <- ses(tslogcnt, h=5)
fit2 <- holt(tslogcnt, h=5)
fit3 <- holt(tslogcnt, exponential=TRUE, h=5)
fit4 <- holt(tslogcnt, damped=TRUE, h=5)
fit5 <- holt(tslogcnt, exponential=TRUE, damped=TRUE, h=5)

plot(fit3, plot.conf=FALSE)
lines(fit1$mean, col=2)
lines(fit2$mean, col=3)
lines(fit4$mean, col=5)
lines(fit5$mean, col=6)
legend("topleft", lty=1, pch=1, col=1:6,
       c("Data", "SES", "Holt's", "Exponential", "Additive Damped", "Multiplicative Damped"))


##################################################################
## Stationarity Tests
##################################################################
## stl function to extract components
stl_dau <- stl(ts_dau, s.window="periodic", robust=T)
names(stl_dau)
ts.plot(stl_dau$time.series)

## stationarity test (adf.test is in the tseries package)
## In the ADF test, a p-value below 0.05 rejects the null hypothesis of a unit root, i.e., concludes the series is stationary
library(tseries)
adf.test(ts_dau)

## de-trend and de-seasonalize using the stl components (columns: seasonal, trend, remainder)
head(stl_dau$time.series)
ts_dau_dt <- ts_dau - stl_dau$time.series[,2]                             ## trend removed
adf.test(ts_dau_dt)
ts_dau_sa <- ts_dau - stl_dau$time.series[,1] - stl_dau$time.series[,2]   ## seasonal and trend removed
adf.test(ts_dau_sa)
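
As a cross-check, the KPSS test (also in tseries) reverses the hypotheses, taking stationarity as the null, so agreement between the two tests strengthens the conclusion:

kpss.test(ts_dau_sa)   ## null: stationary; a small p-value here would contradict the ADF result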

## lags, leads, ACF, PACF and CCF (Acf and Pacf are the forecast-package variants of acf/pacf)
ts_dau_lag1 <- dplyr::lag(ts_dau_sa, 1)
ts_dau_lead1 <- dplyr::lead(ts_dau_sa, 1)
acf_result <- Acf(ts_dau_sa)
pacf_result <- Pacf(ts_dau_sa)
## ccf(ts1, ts2)

##################################################################
## State Space Model
##################################################################
## tbats
ts_dau <- ts(log(data[,amount]), frequency=365.25, start=c(2015, 17))   ## start = (year, period): day 17 of 2015
daily_data <- tbats(ts_dau) 
summary(daily_data)
t.forecast <- forecast(daily_data, h=14)
plot(t.forecast)
t.forecast
summary(t.forecast)
Box.test(t.forecast$residuals, type="Ljung-Box")   ## Ljung-Box white-noise check on the residuals

Wednesday, November 9, 2011

Proc Timeseries and Proc Expand

Special Date and Time Interval Functions

INTCK returns the number of time intervals in a given time span.
duration=intck('month',policy_sale_date,failure_date);

INTNX generates a date that is a specified number of intervals from a starting value.
new_date=intnx('month',date,12);

Proc TIMESERIES
Proc TIMESERIES analyzes time-stamped transactional data with respect to trend and seasonality. Time-stamped series are often recorded at no fixed interval, yet analysts often want to apply techniques that require fixed time intervals. Therefore, the transaction data must be accumulated into a fixed-interval time series.

Further analysis can be performed by proc timeseries, including descriptive (global) statistics, seasonal decomposition/adjustment analysis, correlation analysis, and cross-correlation analysis.

proc timeseries data=failure_60 out=predict_series plot=(series corr decomp);
Id failure interval=month accumulate=median;
var severity;
run;

The out=predict_series option specifies the output data set containing the accumulated series.
The interval=month option specifies that transactions are accumulated on a monthly basis.
The accumulate=median option specifies that the median of the transactions in each interval is calculated.
After the data are accumulated into time series format, many other procedures can be used to analyze the resulting series.

Proc EXPAND
proc expand is able to interpolate missing values in time series data. It provides three interpolation methods (see the R comparison below):
1. Cubic spline interpolation joins segments of third-degree polynomial curves through the non-missing input values so that the whole curve and its first and second derivatives are continuous. Specify method=spline in the convert statement to request this method.
2. Linear spline interpolation fits a continuous curve by connecting successive straight-line segments through the non-missing input values. Specify method=join in the convert statement to request this method.
3. Step function interpolation fits a discontinuous step function to the non-missing input values. Specify method=step in the convert statement to request this method.
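
For comparison, the same three interpolation styles in R's zoo package (a minimal sketch on a toy series):

library(zoo)
x <- zoo(c(1, NA, NA, 4, NA, 6))
na.spline(x)   ## cubic spline       ~ method=spline
na.approx(x)   ## linear             ~ method=join
na.locf(x)     ## step (carry last)  ~ method=step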

%let start_date=mdy(10,1,2002);
%let end_date=mdy(6,1,2011);
data structure;
length=intck('month',&start_date,&end_date);
do i=0 to length;
failure_date=intnx('month',&start_date,i);
format failure_date date.;
output;
end;
run;

proc sort data=series;
by failure_date;
run;

proc sort data=structure;
by failure_date;
run;

data series1;
merge structure(in=a) series;
by failure_date;
if a;
run;

proc expand data=series1 out=predicted_series from=month;
convert claim_amt=severity/ observed=average;
id failure_date;
run;
*used for interpolation.

The convert statement lists the variables to be processed (here claim_amt, renamed to severity).
The id statement names a numeric variable that is used to identify observations in the input and output data sets.
The from= option specifies the time interval between observations of the input data set; the interval can be year, semiyear, qtr, month, week, day, etc.
The to= option specifies the time interval between observations of the output data set.
The observed= option indicates the attributes of the input and output series. The attribute can be one of the following: beginning (beginning-of-period values), middle (midpoint values), end (end-of-period values), total (period totals), or average (period averages).
The method= option specifies the method used to convert the data series. The supported methods are spline, join, step, and aggregate.

Proc ARIMA to Fit AR(p) model

Autoregressive models are a class of models that estimate the stochastic process underlying a time series, where the time series values exhibit nonzero auto-correlation.


The Box-Jenkins approach to analyzing time series data is an analytical method involving three phases: identification of the process, estimation of the model parameters, and diagnostic checking.

*Identification Phase
The first phase is identification, performed by issuing the IDENTIFY statement in Proc ARIMA.

proc arima data=predicted_series ;
identify var=severity scan;
run;

The ACF lists the estimated auto-correlation coefficients at each lag, corr(Y_t, Y_{t+i}), i=1,2,.... A plot of the ACF shows the pattern of auto-correlations as the number of lags increases.

The IACF is the auto-correlation function of the inverted model: the IACF of an AR process is equivalent to the ACF of the same process modeled as an MA process.

The PACF is the auto-correlation at lag p after accounting for the effects of all intervening observations. Thus the PACF at lag 1 is identical to the ACF at lag 1, but the PACF and ACF differ at all higher lags. Plots of the PACF reveal patterns useful for identifying the order of auto-regressive processes.

For an AR(p) process, the plots should display the following patterns:
1. The ACF tails off exponentially.
2. The PACF and IACF drop to 0 after lag p, where p is the order of the AR process. If the tail-off behavior in the PACF plot combines positive auto-correlations alternating with negative auto-correlations on successive lags, it is said to be contained within an exponential envelope.
3. If the ACF drops off very slowly, the time series is said to be non-stationary and will require further processing (e.g., differencing).

*Estimation Phase
The second phase of the ARIMA procedure is the estimation phase.

proc arima data=predicted_series ;
identify var=severity scan;
estimate p=6;
run;

Three estimation techniques are available in Proc ARIMA:
Conditional least squares (CLS) is conditional on the assumption that all values of the series prior to the first observation in the analysis data set are equal to the mean of the series.
Unconditional least squares (ULS) is exact least squares and makes no such assumption.
Maximum likelihood (ML) maximizes the log of the likelihood function.

The output from the estimation phase lists model parameter estimates with associated significance tests, goodness-of-fit statistics, and other estimation diagnostics.
Use the ESTIMATE statement to invoke the estimation phase of PROC ARIMA. An IDENTIFY statement must precede the ESTIMATE statement.

*Diagnostic Checking Phase
The third phase is diagnostic checking of the model.

General guidelines for checking diagnostics (an R analogue of check 1 follows the list):
1. Check the auto-correlation of the residuals. The Q statistic tests the white-noise hypothesis for the residuals of the fitted model; each Q statistic is a chi-square computed with the Ljung-Box formula, under the null hypothesis that the residual series is white noise. If the Q statistics are significant, the estimated model provides a poor fit.
2. Use the goodness-of-fit criteria (variance estimate, AIC, BIC) to compare different models estimated with the same estimation method (conditional least squares, unconditional least squares, or maximum likelihood).
3. Check the parameter estimates. Non-significant estimates may indicate that the model has too many parameters.
4. If the absolute value of the last (highest-order) parameter estimate is close to 1, the series may be non-stationary.
5. Perform a spectral analysis on the residuals to test the white-noise hypothesis.
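
For reference, check 1 in R (a sketch; the simulated series and ARMA order are hypothetical, and fitdf discounts the estimated parameters):

set.seed(1)
y <- arima.sim(model = list(ar = 0.5), n = 200)   ## stand-in series for illustration
p <- 1; q <- 0                                    ## hypothetical ARMA order
fit <- arima(y, order = c(p, 0, q))
Box.test(residuals(fit), lag = 12, type = "Ljung-Box", fitdf = p + q)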

Proc arima is also used to forecast autoregressive models. ARIMA models are best for short-term forecasting because long-term forecasting is no better than using the mean of the time series.

proc arima data=predicted_series ;
identify var=severity scan;
estimate p=6;
forecast lead=60 interval=month id=failure_date out=outlead;
run;

axis1 width=1 offset=(1 pct) label=(a=90 r=0 'Severity');
axis2 width=1 offset=(1 pct) label=('Failure Date') value=(h=1.25);
symbol1 v=star ci=red height=1 cells interpol=join l=1 w=2;
symbol2 v=dot ci=green height=1 cells interpol=join l=1 w=2;
symbol3 v=dot ci=blue height=1 cells interpol=join l=2 w=1;
symbol4 v=dot ci=blue height=1 cells interpol=join l=2 w=1;
legend1 label=('') value=('Actual Claim Severity' 'Predicted Claim Severity') across=2 mode=protect position=(top center inside);
title 'Severity Prediction';
proc gplot data=outlead;
format failure_date monyy. severity dollar.;
plot (severity forecast L95 U95)*failure_date/ overlay
caxis = BLACK
ctext = BLACK
vaxis = axis1
haxis = axis2
legend= legend1
grid
hminor=0
;
run;

Proc ARIMA to Fit MA(q) model

Moving average models are a class of models that estimate the process underlying a time series, where the current time series value is related to the random errors from previous time periods. In contrast, auto-regressive models estimate a process where the current time series value is related to the actual time series values from previous time periods. Like auto-regressive processes, time series with moving average processes also exhibit nonzero auto-correlation.

The Box-Jenkins approach provides a framework for identifying the moving average process underlying a time series and for estimating model parameters. For an MA(q) process, the auto-correlation plots should display the following patterns:
1. The ACF drops to 0 after lag q.
2. The IACF and PACF tail off exponentially.

A plot of the data shows an upward trend; with this type of trend, the time series is non-stationary. A time series is said to be stationary if its mean, variance, and lag covariances are constant over time. One common way of achieving stationarity is to difference the series: differencing involves calculations of the type Y_t - Y_{t-1} for all observations.
*take first differences to remove non-stationary trend;
data predicted_series;
set predicted_series;
severity_diff=dif(severity);
run;

The first phase is the identification phase, which you perform by issuing the IDENTIFY statement.
proc arima data=predicted_series ;
identify var=severity_diff nlag=12 scan;
run;

The second phase of the ARIMA procedure is the estimation phase.
proc arima data=predicted_series ;
identify var=severity_diff scan;
estimate q=2 method=ml;
run;

The ACF drops to 0 after the sixth lag, indicating MA(6).
The IACF drops to 0 after the fourth lag, indicating AR(4).
The PACF has a spike at the sixth lag, indicating AR(6).

The third phase is to forecast the time series.
proc arima data=predicted_series ;
identify var=severity_diff;
estimate q=4 method=ml;
forecast lead=60 interval=month id=failure_date out=outlead;
run;

axis1 width=1 offset=(1 pct) label=(a=90 r=0 'Severity');
axis2 width=1 offset=(1 pct) label=('Failure Date') value=(h=1.25);
symbol1 v=star ci=red height=1 cells interpol=join l=1 w=2;
symbol2 v=dot ci=green height=1 cells interpol=join l=1 w=2;
legend1 label=('') value=('Actual Claim Severity' 'Predicted Claim Severity') across=2 mode=protect position=(top center inside);
title 'Severity Prediction';
proc gplot data=outlead;
format failure_date monyy. severity dollar.;
plot (severity_diff forecast)*failure_date/ overlay
caxis = BLACK
ctext = BLACK
vaxis = axis1
haxis = axis2
legend= legend1
grid
hminor=0
;
run;
quit; 

Proc ARIMA to Test Stationarity

Stationarity is a mathematical and statistical property of time series data. Much of the probability theory of time series analysis is based on the assumption that time series are stationary. If either the mean or the variance is not constant, then the time series is not stationary.

If the characteristic equation has a unit root, e.g., 1 - phi1*B - phi2*B^2 = 0 with B=1 as a solution for an AR(2), then the mean drops out of the model and the series is non-stationary. For ARIMA models, non-stationarity is equivalent to one or more unit roots in the characteristic polynomial.

First, a systematic change in mean: if a series does not seem to have a constant mean when graphed (part of the definition of stationarity), that is a visible symptom of non-stationarity.

Second, a systematic change in variance: a log transformation can often make the variance constant. Log transformations also remove exponential trends and other types of nonlinear behavior in time series data.

Third, a seasonal or periodic variation. This symptom shows in the plot of the lag-j correlation against the lag number j: symbolizing the series value at time t by Y_t, the lag-j correlation is the correlation between Y_{t-j} and Y_t. If the plot of the estimated auto-correlations dies down very slowly with increasing j, that is a second visual symptom of non-stationarity. Additionally, if the original time series has a sine-wave pattern, it may be useful to use sinusoidal components as predictor variables; spectral analysis can detect the presence of such components.

There are three methods of dealing with the non-stationary behaviors listed above: differencing, log transformations, and sinusoidal components. Differencing is the most common method of handling non-stationary means; however, other methods are preferable for some types of series.

Differencing includes the first difference, second difference, span-12 difference (for monthly data), span-4 difference (for quarterly data), etc. In many cases the series of first differences Y_t - Y_{t-1} does turn out to be stationary. Sometimes you may mistakenly take a difference when the data do not require it, which is over-differencing; if the inverse auto-correlation function (IACF) plot from the identification phase of PROC ARIMA tails off very slowly, you may have over-differenced. The corresponding R calls are sketched below.
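
The differencing variants in R (AirPassengers is a stand-in monthly series):

y   <- log(AirPassengers)          ## stand-in monthly series
d1  <- diff(y)                     ## first difference
d2  <- diff(y, differences = 2)    ## second difference
d12 <- diff(y, lag = 12)           ## span-12 difference for monthly data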

Note that if Y_t is a pure AR series, then the IACF of Y_t is the same as the ACF of a pure MA process and will cut off after a few lags, i.e., the IACF behaves like the PACF of Y_t. On the other hand, if Y_t is a pure MA series, then the IACF of Y_t is the same as the ACF of a pure AR process and will decline exponentially; again the IACF behaves like the PACF of Y_t.

The (augmented) Dickey-Fuller unit root test regresses delta_Y_t on Y_{t-1} - Ybar and the lagged differences delta_Y_{t-1}, ..., delta_Y_{t-p}, where delta denotes a first difference: that is, Y_t - Y_{t-1} is regressed on Y_{t-1} - Ybar, Y_{t-1} - Y_{t-2}, Y_{t-2} - Y_{t-3}, ..., Y_{t-p} - Y_{t-p-1}. The t statistic for the coefficient of Y_{t-1} - Ybar provides the statistical test of the stationarity hypothesis, and its distribution is called tau. If the coefficient estimate is negative and significant, the series is stationary; otherwise the series is non-stationary, i.e., it has a unit root and needs differencing. A hand-rolled version in R is sketched below.
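
A hand-rolled version of this regression in R (a sketch with p = 2 lagged differences and a stand-in series; tseries::adf.test packages the same idea):

y   <- as.numeric(log(AirPassengers))   ## stand-in series
dy  <- diff(y)                          ## delta Y_t
n   <- length(dy)
dat <- data.frame(dy   = dy[3:n],       ## response: delta Y_t
                  ylag = y[3:n],        ## Y_{t-1}; lm's intercept gives the single-mean case
                  dl1  = dy[2:(n-1)],   ## delta Y_{t-1}
                  dl2  = dy[1:(n-2)])   ## delta Y_{t-2}
summary(lm(dy ~ ylag + dl1 + dl2, data = dat))
## the t statistic on ylag is compared to the tau distribution, not ordinary t tables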

proc arima data=predicted_series ;
i var=severity nlag=12 stationarity=(ADF=(0,1));
run;

/*
Augmented Dickey-Fuller Unit Root Tests

Type          Lags       Rho   Pr < Rho     Tau   Pr < Tau       F   Pr > F
Zero Mean        0    0.1562     0.7175    0.11     0.7167
                 1    0.7651     0.8663    0.82     0.8868
Single Mean      0   -7.2238     0.2519   -2.10     0.2460    2.72   0.3774
                 1   -3.7934     0.5563   -1.56     0.4996    2.42   0.4546
Trend            0  -66.2436     0.0004   -6.92     <.0001   24.02   0.0010
                 1  -54.6745     0.0003   -5.24     0.0002   13.79   0.0010
*/

Zero Mean regresses Y_t - Y_{t-1} on Y_{t-1}, ... with no intercept (as with the NOINT option of Proc REG); it assumes the series mean is zero whether or not there are unit roots. The null hypothesis is that the time series is non-stationary; the alternative is that it is stationary. A non-significant value indicates non-stationarity.
Single Mean regresses Y_t - Y_{t-1} on Y_{t-1}, ... with an intercept, because under the alternative the model has a single nonzero mean. (If you simply subtract the estimated mean from all observations, you can then check the Zero Mean results.)
Trend regresses Y_t - Y_{t-1} on t, Y_{t-1}, ... with an intercept and a time trend.

In each case it is the coefficient of Y_{t-1} that indicates whether differencing needs to be done, with a zero coefficient indicating differencing. In each case the usual t formula applies, but the distribution is non-standard. If the error process is already stationary, differencing would not be appropriate, as it would induce a unit root in the moving average term; on the other hand, if the error is a unit root process, then differencing is appropriate.

The remaining question is whether the non-standard distribution is the same in each case. Unfortunately the answer is no, but fortunately the distributional results needed are available. For the SAS user, it is enough to know that the test computations in PROC ARIMA use the proper distributions, so their p-values can be trusted.

Proc LOGISTIC to fit Multinomial Logistic Models

The multinomial logit model, or generalized logit model, makes the parameters specific to each nominal response category. With subject-specific covariates only, the probability of response j = 0, ..., J-1 is pi_ij = exp(x_i'beta_j) / sum_{k=0}^{J-1} exp(x_i'beta_k), with beta_0 = 0 for identification; an intercept is included in each beta_j. The multinomial logit model has the property of independence from irrelevant alternatives, because pi_ij / pi_ik = exp(x_i'(beta_j - beta_k)). Having many parameters is a serious drawback of the model.

The multinomial logit model can be estimated in proc logistic and proc glimmix using link=glogit.

data multinomial_response;
beta1_0 = .2;
beta1_1 = .5;
beta2_0 = -.2;
beta2_1 = .7;
seed = 1234579;
do id = 1 to 500;
x = rannor(seed);
eta1 = beta1_0 + beta1_1*x;
eta2 = beta2_0 + beta2_1*x;
prob1 = exp(eta1) / (1 + exp(eta1) + exp(eta2));
prob2 = exp(eta2) / (1 + exp(eta1) + exp(eta2));
prob3 = 1 / (1 + exp(eta1) + exp(eta2));
call rantbl(seed, prob1, prob2, prob3, y);
output;
end;
keep id x y;
run;

proc logistic data = multinomial_response;
model y = x / link = glogit;
run;
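
For comparison, a hedged R sketch of the same simulation and fit using nnet::multinom (note that multinom takes the first factor level as the reference category):

library(nnet)
set.seed(1234579)
n    <- 500
x    <- rnorm(n)
eta1 <- 0.2 + 0.5 * x
eta2 <- -0.2 + 0.7 * x
p1   <- exp(eta1) / (1 + exp(eta1) + exp(eta2))
p2   <- exp(eta2) / (1 + exp(eta1) + exp(eta2))
y    <- sapply(seq_len(n), function(i)
          sample(1:3, 1, prob = c(p1[i], p2[i], 1 - p1[i] - p2[i])))
summary(multinom(factor(y) ~ x))   ## generalized logits, reference = category 1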

Proc ARIMA to Fit ARIMA(p,d,q) model

Characteristics of Time Series Data:

1. Trend.
Trend is a long-term, consistent change in the time series values from beginning to end. It often represents a steady increase or a steady decrease in the values of time series, although other trend patterns are possible, e.g., quadratic or cubic trends in the data.

2. Seasonal Components.
Seasonal components involve a regular change in the data values that occurs at the same time in a given time period, e.g., Turkey sales are highest in November of every year.

3. Cyclical Components.
Cyclical components are long-term patterns of rising and falling data values in the time series. Many time series follow the business cycle of growth and recession in the economy.

4. Random Fluctuations.
Random fluctuations are what is 'left over' in a time series after the other three components have been accounted for. They can represent measurement error or other unpredictable factors that affect the behavior of a time series.

Check the ACF, IACF, and PACF plots to identify p and q:
1. All three plots tail off exponentially for an ARIMA(p,d,q) process.
2. For the ACF, there are r = max(p-1, q) beginning values, followed by the tailing-off behavior characteristic of an AR(p).
3. For the IACF and PACF, there are r = max(p-1, q) beginning values, followed by the tailing-off behavior characteristic of an MA(q).

Also check the seasonal pattern of the time series as follows:
1. For a seasonal AR process, the seasonal lags s, 2s, ..., Ns die off exponentially in the ACF plot. In the IACF and PACF plots, there are spikes at each seasonal lag up to lag Ps, where P is the number of seasonal auto-regressive parameters necessary to model the data.
2. For a seasonal MA process, the seasonal lags s, 2s, ..., Ns die off exponentially in the IACF and PACF plots. In the ACF, there are spikes at each seasonal lag up to lag Qs, where Q is the number of seasonal moving average parameters necessary to model the data.
3. For a seasonal ARMA process, the seasonal lags in the ACF tail off exponentially after lag Qs, where Q is the number of seasonal MA parameters necessary to model the data. The seasonal lags in the PACF and IACF tail off exponentially after lag Ps, where P is the number of seasonal AR parameters necessary to model the data.

proc arima data=predicted_series ;
identify var=severity(1) nlag=24 stationarity=(ADF=(0,1));
estimate p=(1,2) (12) q=(1,2) (12) method=ml;
forecast lead=60 interval=month id=failure_date out=outlead;
run;

axis1 width=1 offset=(1 pct) label=(a=90 r=0 'Severity');
axis2 width=1 offset=(1 pct) label=('Failure Date') value=(h=1.25);
symbol1 v=star ci=red height=1 cells interpol=join l=1 w=2;
symbol2 v=dot ci=green height=1 cells interpol=join l=1 w=2;
legend1 label=('') value=('Actual Claim Severity' 'Predicted Claim Severity') across=2 mode=protect position=(top center inside);
title 'Severity Prediction';
proc gplot data=outlead;
format failure_date monyy. severity dollar.;
plot (severity forecast)*failure_date/ overlay
caxis = BLACK
ctext = BLACK
vaxis = axis1
haxis = axis2
legend= legend1
grid
hminor=0
;
run;
quit; 

Proc FORECAST for Automatic Forecasting

The FORECAST procedure is based on extrapolation: the forecast for a series is solely a function of time and the past values of the series. It is a one-step method that can automatically generate forecasts for hundreds of series at a time.

A time series plot is able to visually identify the presence of three possible components: level, trend, and seasonality.
First, the plot may be flat, showing only random fluctuation about some constant level.
Second, the plot may show an upward (or downward) trend over time.
Third, the plot may show some obvious seasonality.

The trend= option specifies the smoothing model:
trend=1 specifies single exponential smoothing (a constant model).
trend=2 specifies double exponential smoothing (a linear trend model).
trend=3 specifies triple exponential smoothing (a quadratic trend model).

There are three main exponential smoothing methods:
First, single exponential for level component.
Second, double exponential for trend component.
Third, stepar/winters/addwinters for seasonal component.

method=expo forecasts based on an exponential smoothing model. In fitting the trend, the parameters are allowed to change gradually over time, and earlier observations are given exponentially declining weights. It works for long-term, deterministic change.
method=stepar forecasts based on a stepwise autoregressive model. It combines time-trend regression with an autoregressive model and uses a stepwise method to select the lags used in the autoregressive process. It works for short-term fluctuation.
method=winters forecasts based on a multiplicative seasonal exponential smoothing model. It combines a time trend with multiplicative seasonal factors to account for regular seasonal fluctuations in a series.
method=addwinters forecasts based on an additive seasonal exponential smoothing model. It combines a time trend with additive seasonal factors to account for regular seasonal fluctuations in a series.

If unspecified, the weight defaults to 1 - 0.8^(1/trend), where trend is the value of the trend= option (verified in the R sketch after this list):
weight=.2 for trend=1.
weight=.10557 for trend=2.
weight=.07168 for trend=3.
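
The default-weight formula is easy to verify in R:

sapply(1:3, function(trend) 1 - 0.8^(1/trend))
## 0.20000000 0.10557281 0.07168116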

Single exponential smoothing parallels an ARIMA(0,1,1) model with the moving-average parameter tied to the smoothing constant, because the smoothed values S_t and S_{t-1} are the forecasts of Y_{t+1} and Y_t respectively.
Double exponential smoothing parallels ARIMA(0,2,2): Proc ARIMA reproduces the trend=2 results of Proc FORECAST by suppressing estimation and setting the MA= option to 1 - weight (0.89442719099991 for the default weight).
Triple exponential smoothing likewise parallels ARIMA(0,3,3). A small R sketch of the single-smoothing case follows.
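
A minimal R sketch of the SES/ARIMA(0,1,1) parallel (LakeHuron is a stand-in series; note that R's arima() writes the MA polynomial as 1 + theta*B, so theta = alpha - 1):

y     <- ts(as.numeric(LakeHuron))   ## stand-in series
alpha <- 0.2
hw  <- HoltWinters(y, alpha = alpha, beta = FALSE, gamma = FALSE)   ## SES with fixed alpha
fit <- arima(y, order = c(0, 1, 1), fixed = alpha - 1, transform.pars = FALSE)
c(predict(hw, n.ahead = 1), predict(fit, n.ahead = 1)$pred)   ## one-step forecasts nearly agree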

proc forecast data=predicted_series interval=month method=expo trend=2 lead=70 out=predict_severity1 outfull outest=est;
id failure_date;
var severity;
run;

data severity1;
set predict_severity1;
if _type_='FORECAST';
pred1= severity;
run;

proc sort data=severity1;
by failure_date;
run;

*replicate the first observation;
data predicted_series2;
set predicted_series ;
if _n_=1 then do;
failure_date=intnx('month', failure_date,-1);
output;
failure_date=intnx('month', failure_date,1);
end;
output;
run;

*1-weight for MA;
proc arima data=predicted_series2 ;
identify var=severity(1,1) noprint;
estimate q=(1) (1) noconstant noest
ma=0.89442719099991 0.89442719099991;
forecast lead=60 interval=month id=failure_date out=predict_severity2 noprint;
run;

data severity2 ;
set predict_severity2 ;
pred2=forecast;
obs=severity;
keep failure_date pred2 obs;
run;

proc sort data=severity2;
by failure_date;
run;

data mergedata;
merge severity1 severity2;
by failure_date;
run;

proc compare data=mergedata MAXPRINT=200;
var pred1;
with pred2;
id failure_date;
run;

axis1 width=1 offset=(1 pct) label=(a=90 r=0 'Severity');
axis2 width=1 offset=(1 pct) label=('Failure Date') value=(h=1.25);
symbol1 v=dot ci=red height=1 cells interpol=join l=1 w=2;
symbol2 v=dot ci=green height=1 cells interpol=join l=2 w=2;
symbol3 v=dot ci=blue height=1 cells interpol=join l=2 w=2;
legend1 label=('') across=3 mode=protect position=(top center inside);
title 'Severity Prediction';
proc gplot data=mergedata;
format failure_date monyy. ;
plot (pred1 pred2 obs)*failure_date/ overlay
caxis = BLACK
ctext = BLACK
vaxis = axis1
haxis = axis2
legend= legend1
grid
hminor=0
;
run;
quit;

Proc Autoreg to Model Time Series Errors

Proc Autoreg calculates OLS estimates for the linear regression component of your model and can use one of the following methods to estimate the auto-regressive component: Yule-Walker, iterated Yule-Walker, unconditional least squares (ULS), or maximum likelihood (ML).

The advantage of AUTOREG is that the procedure automatically fits an auto-regressive error structure to the residuals. The disadvantages are that it requires all future values of the independent variables to be present in the data, so forecasts beyond the end of the history data for the variable severity cannot be performed, and that it provides no overall measure of the adequacy of the model.

The Proc Autoreg results contain the Durbin-Watson statistic, which tests the assumption of no first-order auto-correlation. Auto-correlation is either positive, where errors of one sign tend to be followed by errors of the same sign, or negative, where errors of one sign tend to be followed by errors of the opposite sign. The Durbin-Watson statistic is a ratio with a range of 0 to 4; if no auto-correlation is present, its expected value is 2. Values significantly less than 2 indicate positive auto-correlation, while values significantly greater than 2 indicate negative auto-correlation.
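
The same first-order check in R (a sketch with simulated data; assumes the lmtest package is installed):

library(lmtest)
set.seed(1)
d   <- data.frame(x = rnorm(100))
d$y <- 1 + 2 * d$x + as.numeric(arima.sim(model = list(ar = 0.6), n = 100))   ## autocorrelated errors
dwtest(lm(y ~ x, data = d))   ## DW << 2 here, flagging positive first-order autocorrelation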

proc sql;
create table failure_60 as
select
year(failure_date) as failure_year,
month(failure_date) as failure_month,
mdy(month(failure_date),1,year(failure_date)) as failure,
count(claim_id) as total_claim_cnt,
sum(claim_amt) as total_claim_amt,
sum(claim_amt)/count(claim_id) as severity
from claim_grp9
group by 1,2
order by 1,2;
quit;

data failure_60;
set failure_60;
lseverity=log(severity);
dlseverity=dif(lseverity);
ldlseverity1=lag(dlseverity);
ldlseverity2=lag2(dlseverity);
ldlseverity3=lag3(dlseverity);

lcount=log(total_claim_cnt);
dlcount=dif(lcount);
ldlcount1=lag(dlcount);
ldlcount2=lag2(dlcount);
ldlcount3=lag3(dlcount);
run;

proc autoreg data=failure_grp4;
model dlseverity=dlcount/nlags=12 backstep method=ml;
run;
The backstep option in the model statement performs backward elimination of non-significant auto-regressive parameter estimates. It is useful when you are unsure of the order of the model.

*alternative model;
proc autoreg data=failure_grp4;
model dlseverity=dlcount ldlcount1 ldlcount2 ldlcount3/nlags=12 backstep method=ml;
run;
*Both AIC and BIC are smaller for the alternative model;

*Fitting an equivalent Model using Proc ARIMA;
When using Proc Autoreg, you implicitly assume that the error process for the dependent time series variable is auto-regressive, or that it can be approximated by an auto-regressive process. Proc ARIMA enables you to identify the error processes in your time series data; you can also use the ARIMA procedure to calculate the correlation between two time series by specifying the CROSSCOR= option in the IDENTIFY statement.
proc arima data=failure_60;
identify var=severity(1) crosscor=(total_claim_cnt(1)) noprint;
estimate input=(1$(1)total_claim_cnt) p=1 q=1 method=ml plot;
forecast lead=60 interval=month id=failure out=predict_severity ;
run;

axis1 width=1 offset=(1 pct) label=(a=90 r=0 'Severity');
axis2 width=1 offset=(1 pct) label=('Failure Date') value=(h=1.25);
symbol1 v=star ci=red height=1 cells interpol=join l=1 w=2;
symbol2 v=dot ci=green height=1 cells interpol=join l=1 w=2;
legend1 label=('') value=('Actual Claim Severity' 'Predicted Claim Severity') across=2 mode=protect position=(top center inside);
title 'Severity Prediction';
proc gplot data=predict_severity;
format failure monyy. severity dollar.;
plot (severity forecast)*failure/ overlay
caxis = BLACK
ctext = BLACK
vaxis = axis1
haxis = axis2
legend= legend1
grid
hminor=0
;
run;
quit;

*Put training and test data sets together;

data failure;
set train test;
run;

*Use msrp and failure_time information to fit the data;
proc autoreg data=failure;
model avg_claim=msrp failure_year failure_month/nlags=2 method=ml noprint;
output out=forecast p=forecast lcl=lower95 ucl=upper95;
run;

axis1 width=1 offset=(1 pct) label=(a=90 r=0 'Severity') order=(-200 to 1000 by 500);
axis2 width=1 offset=(1 pct) label=('Failure Date') value=(h=1.25);
symbol1 v=star ci=red height=1 cells interpol=dot l=1 w=2;
symbol2 v=dot ci=green height=1 cells interpol=dot l=1 w=2;
symbol3 v=dot ci=blue height=1 cells interpol=dot l=1 w=2;
symbol4 v=dot ci=blue height=1 cells interpol=dot l=1 w=2;
legend1 label=('') value=('Actual Claim Severity' 'Predicted Claim Severity') across=2 mode=protect position=(top center inside);
title 'Severity Prediction';
proc gplot data=forecast;
format failure monyy. avg_claim dollar.;
plot (avg_claim forecast lower95 upper95)*failure/ overlay
caxis = BLACK
ctext = BLACK
vaxis = axis1
haxis = axis2
legend= legend1
grid
hminor=0
;
run;
quit;

Proc UCM

Proc UCM uses structural models to analyze and forecast a response series by decomposing it into components such as trend, seasonals, cycles, and regression effects due to predictor series.

ods graphics on;
proc ucm data=predict_series;
id failure interval=month;
model severity;
irregular;
level variance=0 noest;
slope variance=0 noest;
season length=12 type=trig;
cycle plot=smooth;
estimate back=1 plot=(residual normal acf);
forecast back=1 lead=72 outfor=predict_severity;
run;
ods graphics off;

The id statement specifies a date variable.
The model statement specifies the response series.
The irregular statement specifies the error term.
Trend is specified by the level and slope statements; here the variances are set to zero to create a model with a fixed linear trend, and the noest option fixes the values of those parameters.
The season statement specifies the seasonality, e.g., length=12 for monthly seasonality, here of the trigonometric type.
In the cycle statement, the plot= option plots the smoothed estimate of the cycle component.
The estimate statement specifies the span of data used in parameter estimation; its back= option specifies a holdout sample that is omitted from estimation.
The forecast statement requests the printing of the smoothed trend plus seasonal component; its back=n option begins the multi-step forecast n observations back from the end of the historical data (back=12 would start 12 observations back).
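
A rough R analogue of this kind of component model is stats::StructTS (a sketch using the built-in AirPassengers series):

fit <- StructTS(log(AirPassengers), type = "BSM")   ## basic structural model: level + slope + seasonal
plot(fitted(fit))                                   ## filtered component estimates
predict(fit, n.ahead = 12)                          ## multi-step forecast with standard errors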

*Plot the forecast series with confidence interval;
axis1 width=1 offset=(1 pct) label=(a=90 r=0 'Severity');
axis2 width=1 offset=(1 pct) label=('Failure Date') value=(h=1.25);
symbol1 v=star ci=red height=1 cells interpol=join l=1 w=2;
symbol2 v=dot ci=green height=1 cells interpol=join l=2 w=2;
symbol3 v=dot ci=blue height=1 cells interpol=join l=2 w=2;
symbol4 v=dot ci=blue height=1 cells interpol=join l=2 w=2;
legend1 label=('') value=('Actual Claim Severity' 'Predicted Claim Severity') across=2 mode=protect position=(top center inside);
title 'Severity Prediction';
proc gplot data=predict_severity;
format failure monyy. severity dollar.;
plot severity*failure forecast*failure LCL*failure UCL*failure/ overlay
caxis = BLACK
ctext = BLACK
vaxis = axis1
haxis = axis2
legend= legend1
grid
hminor=0
;
run;
quit;

Proc STATESPACE

Proc Statespace is useful for automatic modeling and forecasting of several interrelated multiple time series, with or without a feedback relationship, using a state space model and Kalman filtering.

Proc statespace performs the following steps in its analysis of multivariate time series data:
1. It fits a sequence of multivariate auto-regressive models for lags 0 to 10 and calculates AIC for each; the minimum AIC determines the order of the auto-regressive model used in further analysis.
2. It tries to improve the fit of the selected AR model by adding moving-average (MA) terms and removing some of the AR terms; the best-fitting revised model is again selected by AIC.
3. It estimates the parameters of the final model, using the estimated covariance matrix and a maximum likelihood estimation method.

Note that Proc statespace provides valid analysis only for stationary series. If the data are not stationary, make them stationary before using proc statespace.

proc sql;
create table failure_grp4 as
select
year(repair_visit_dt) as failure_year,
month(repair_visit_dt)as failure_month,
mdy(month(min(repair_visit_dt)),1,year(min(repair_visit_dt))) as failure,
count(*) as total_claim_cnt,
sum(repair_gross_amt)/count(*) as severity
from claim_grp4
group by 1,2
order by 1,2;
quit;

proc statespace data=failure_grp4 cancorr out=predict_severity lead=74 interval=month;
id failure;
var severity total_claim_cnt;
run;

The cancorr option prints the sequence of canonical correlations during model selection.
Output from proc statespace includes AIC for the sequence of auto-regressive models, a schematic representation of the auto-correlation matrices, a schematic representation of the partial auto-regression matrices, Yule-Walker estimates for the minimum-AIC model, and a canonical correlation analysis for each state vector (obtained with the cancorr option).

axis1 width=1 offset=(1 pct) label=(a=90 r=0 'Severity');
axis2 width=1 offset=(1 pct) label=('Failure Date') value=(h=1.25);
symbol1 v=star ci=red height=1 cells interpol=join l=1 w=2;
symbol2 v=dot ci=green height=1 cells interpol=join l=1 w=2;
legend1 label=('') value=('Actual Claim Severity' 'Predicted Claim Severity') across=2 mode=protect position=(top center inside);
title 'Severity Prediction';
proc gplot data=predict_severity;
format failure monyy. severity dollar. FOR1 dollar.;
plot severity*failure FOR1*failure/ overlay
caxis = BLACK
ctext = BLACK
vaxis = axis1
haxis = axis2
legend= legend1
grid
hminor=0
;
run;
quit; 

Proc Spectra

Spectral analysis is a statistical approach to detecting regular cyclical patterns, or periodicities, in time series data. It can be used to compute periodogram ordinates, plot periodogram ordinates and test the white noise hypothesis.

It can perform the following tasks:
1 Produce periodogram ordinates, calculate spectral density estimates, and plot periodograms and spectral density estimates. The approach works as follows:
1.1 Transform the data with a finite Fourier transform, which decomposes the data into a sum of sine and cosine waves of varying wavelengths and amplitudes.
1.2 Regress the time series on the sine and cosine components for varying frequencies.
1.3 Calculate the sums of squares for each regression model.
1.4 Use the significant sinusoidal components detected by this technique to model the original series.

The periodogram plots the sums of squares from each regression model against frequency. You can interpret the ordinates of the periodogram as the amount of variation in Y at each frequency; the regression sum of squares for each model is proportional to the sum of the squared sine and cosine coefficients for that model.

The periodogram is an estimate of a theoretical quantity called the spectrum. Because periodograms are rough estimates of the true spectrum, weights are often used to smooth them; the weighted moving averages of periodogram ordinates are called spectral density estimates.

proc spectra data=failure_grp out=speclead p s adjmean;
var severity;
weights 1 2 3 4;
run;

axis1 width=1 offset=(1 pct) label=(a=90 r=0 'Spectral Density');
axis2 width=1 offset=(1 pct) label=('Period') value=(h=1.25);
symbol1 v=star ci=red height=1 cells interpol=join l=1 w=2;
symbol2 v=dot ci=green height=1 cells interpol=join l=1 w=2;
legend1 label=('') across=2 mode=protect position=(top center inside);
title 'Severity Prediction';
proc gplot data=speclead;
plot s_01*freq/
caxis = BLACK
ctext = BLACK
vaxis = axis1
haxis = axis2
legend= legend1
grid
hminor=0
;
run;
quit;

2 Test the white-noise hypothesis.
Proc Spectra can test the null hypothesis that a time series is white noise. It provides two tests (an R visual analogue follows the list):
2.1 The Fisher-Kappa test is the ratio of the largest periodogram ordinate to the average of all the ordinates. It is designed to detect one sinusoidal component buried in white noise. The large first ordinate is excluded from this analysis.
2.2 The Bartlett Kolmogorov-Smirnov test compares the normalized cumulative sum of periodogram ordinates at each frequency with the sum of all periodogram ordinates. It is more powerful than the Fisher-Kappa test for detecting departures from the white-noise hypothesis over the whole range of frequencies.
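
An R visual analogue of the Bartlett Kolmogorov-Smirnov idea is the cumulative periodogram with 95% bands (MASS::cpgram):

library(MASS)
cpgram(rnorm(200))   ## a white-noise series should stay inside the confidence bands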

proc statespace data=failure_grp3 cancorr out=predict_severity lead=74 interval=month;
id failure;
var severity total_claim_cnt;
run;

proc spectra data=predict_severity adjmean whitetest out=resid ;
var res1;
run;

axis1 width=1 offset=(1 pct) label=(a=90 r=0 'Severity');
axis2 width=1 offset=(1 pct) label=('Resid') value=(h=1.25);
symbol1 v=star ci=red height=1 cells interpol=join l=1 w=2;
title 'Periodogram';
proc gplot data=resid;
where (int(period) not in (3,12) ) and period<=20;
plot p_01*period/ overlay
caxis = BLACK
ctext = BLACK
vaxis = axis1
haxis = axis2
legend= legend1
grid
hminor=0
;
run;
quit; 

PROC MODEL to Estimate ARMA-GARCH

data garch;
lu = 0;
lh = 0;
ly = 0;
do i = 1 to 1000;
h = 0.3 + 0.4 * lu ** 2 + 0.5 * lh;
u = sqrt(h) * rannor(1);
y = 1 + 0.6 * (ly - 1) + u - 0.7 * lu;
lu = u;
lh = h;
ly = y;
output;
end;
run;
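
The same simulation in R, for readers following along outside SAS (a sketch; rnorm replaces rannor):

set.seed(1)
n  <- 1000
u  <- h <- y <- numeric(n)
lu <- lh <- ly <- 0
for (i in 1:n) {
  h[i] <- 0.3 + 0.4 * lu^2 + 0.5 * lh              ## GARCH(1,1) conditional variance
  u[i] <- sqrt(h[i]) * rnorm(1)                    ## heteroscedastic innovation
  y[i] <- 1 + 0.6 * (ly - 1) + u[i] - 0.7 * lu     ## ARMA(1,1) mean equation
  lu <- u[i]; lh <- h[i]; ly <- y[i]
}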

proc model data = garch;
parms ar1 ma1 mu;
y = mu + ar1 * zlag1(y - mu) + ma1 * zlag1(resid.y);
fit y / method = marquardt fiml white;
/* Nonlinear FIML Parameter Estimates

Parameter   Estimate   Approx Std Err   t Value   Approx Pr > |t|
ar1         0.777183           0.0702     11.07            <.0001
ma1         0.841133           0.0639     13.16            <.0001
mu          1.020479           0.0392     26.01            <.0001

RESULT: ESTIMATION OF ARMA(1, 1) WITHOUT GARCH COMPONENT.

Heteroscedasticity Test
Equation   Test           Statistic   DF   Pr > ChiSq
y          White's Test       89.85    9       <.0001

RESULT: WHITE'S TEST SUGGESTS HETEROSCEDASTICITY.
*/
test "AR1 = 0.6" ar1 = 0.6;
test "MA1 = 0.7" ma1 = 0.7;
/* Test Results
Test         Type   Statistic   Pr > ChiSq
AR1 = 0.6    Wald        6.37       0.0116
MA1 = 0.7    Wald        4.88       0.0272

RESULT: ESTIMATES ARE DIFFERENT FROM SIMULATION PARAMETERS.
*/
run;
quit;

proc model data = garch;
parms ar1 ma1 mu arch0 arch1 garch1;
y = mu + ar1 * zlag1(y - mu) + ma1 * zlag1(resid.y);
h.y = arch0 + arch1 * xlag(resid.y ** 2, mse.y) +
garch1 * xlag(h.y, mse.y);
fit y / method = marquardt fiml out = forecast outall;
/* Nonlinear FIML Parameter Estimates

Parameter   Estimate   Approx Std Err   t Value   Approx Pr > |t|
ar1         0.51709            0.1036      4.99            <.0001
ma1         0.658866           0.0866      7.61            <.0001
mu          0.989957           0.0257     38.54            <.0001
arch0       0.391227           0.0702      5.58            <.0001
arch1       0.398557           0.0539      7.39            <.0001
garch1      0.464724           0.0530      8.76            <.0001

RESULT: ESTIMATION OF ARMA(1, 1) WITH GARCH(1, 1).
*/
test "AR1 = 0.6" ar1 = 0.6;
test "MA1 = 0.7" ma1 = 0.7;
test "ARCH1 = 0.4" arch1 = 0.4;
test "GARCH1 = 0.5" garch1 = 0.5;
/* Test Results
Test           Type   Statistic   Pr > ChiSq
AR1 = 0.6      Wald        0.64       0.4237
MA1 = 0.7      Wald        0.23       0.6347
ARCH1 = 0.4    Wald        0.00       0.9786
GARCH1 = 0.5   Wald        0.44       0.5059

RESULT: ESTIMATES ARE IN LINE WITH SIMULATION PARAMETERS.
*/
run;
quit;