Wednesday, November 9, 2011

Proc Timeseries and Proc Expand

Special Date and Time Interval Functions

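Intck returns the number of time intervals between two dates. By default it counts the interval boundaries crossed (for 'month', the first day of each month), not the number of complete intervals elapsed.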
duration=intck('month',policy_sale_date,failure_date);
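Because intck counts boundaries, two dates one day apart can be one "month" apart. A minimal sketch with made-up dates (the variable names are illustrative only; the optional fourth argument is assumed to be available in your SAS release):

```sas
data _null_;
   /* default method: counts first-of-month boundaries crossed */
   one_day  = intck('month', '31jan2011'd, '01feb2011'd);   /* 1 */
   same_mon = intck('month', '01jan2011'd, '31jan2011'd);   /* 0 */
   /* 'continuous' method: counts complete months measured from the start date */
   full_mon = intck('month', '15jan2011'd, '14feb2011'd, 'continuous');   /* 0 */
   put one_day= same_mon= full_mon=;
run;
```

If duration should mean complete months elapsed since the policy sale date, the continuous method is probably the closer match.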

Intnx generates a date that is a specified number of intervals from a starting value. By default the result is aligned to the beginning of the interval.
new_date=intnx('month',date,12);
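The optional fourth argument of intnx controls where in the target interval the result falls. A small sketch with an arbitrary starting date (variable names are illustrative only):

```sas
data _null_;
   date = '15nov2011'd;
   d_begin = intnx('month', date, 12);            /* default alignment: 01NOV2012 */
   d_same  = intnx('month', date, 12, 'same');    /* same day of month: 15NOV2012 */
   d_end   = intnx('month', date, 12, 'end');     /* last day of month: 30NOV2012 */
   put d_begin= date9. d_same= date9. d_end= date9.;
run;
```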

Proc TIMESERIES
Proc timeseries analyzes time-stamped transactional data with respect to trend and seasonality. Time-stamped transactions are often recorded at no fixed interval, while analysts often want to use time series techniques that require fixed time intervals. Therefore, the transactional data must be converted to a higher frequency or accumulated to a lower frequency.

Further analysis can be performed by proc timeseries, including descriptive (global) statistics, seasonal decomposition/adjustment analysis, correlation analysis, and cross-correlation analysis.

proc timeseries data=failure_60 out=predict_series plots=(series corr decomp);
id failure interval=month accumulate=median;
var severity;
run;

out=predict_series option names the output data set that holds the accumulated time series.
interval=month option specifies that the transactions are to be accumulated on a monthly basis.
accumulate=median option specifies that the median of the transactions within each month becomes that month's value.
After the data is accumulated into a time series format, other time series procedures can be used to analyze the resulting data.
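To see what the accumulation does, here is a minimal sketch on a hypothetical data set (claims, its dates, and its values are invented for illustration and are not the failure_60 data above):

```sas
/* hypothetical transactions recorded at irregular dates */
data claims;
   input failure_date :date9. severity;
   format failure_date date9.;
   datalines;
03jan2011 100
17jan2011 300
28jan2011 200
09mar2011 500
;
run;

/* accumulate to one observation per month, using the median */
proc timeseries data=claims out=monthly;
   id failure_date interval=month accumulate=median;
   var severity;
run;

proc print data=monthly;
run;
```

The output should contain one row per month from January to March 2011: the median of the three January transactions, a missing value for February (no transactions), and the single March value. That missing February value is the kind of gap that proc expand can later fill in.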

Proc EXPAND
proc expand is able to interpolate missing values in time series data. It provides three ways of interpolating:
1. cubic spline interpolation. It involves joining segments of third degree polynomial curves to the non-missing input values, so that the whole curve and its first and second derivatives are continuous. Specify Method=spline in the convert statement to request this method; it is also the default.
2. linear spline interpolation. It fits a continuous curve by connecting successive straight line segments to the non-missing input values. Specify Method=join in the convert statement to request this method.
3. step function interpolation. It fits a discontinuous step function to the non-missing input values. Specify Method=step in the convert statement to use this method.
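The three methods can be compared side by side by converting the same variable three times. This is only a sketch on made-up data (gaps, y, and the output names are hypothetical):

```sas
/* hypothetical monthly series with missing values */
data gaps;
   input month :monyy7. y;
   format month monyy7.;
   datalines;
jan2011 10
feb2011 .
mar2011 30
apr2011 .
may2011 20
jun2011 40
;
run;

/* no to= option: the interval stays monthly and only the gaps are filled */
proc expand data=gaps out=filled from=month;
   id month;
   convert y=y_spline / method=spline;   /* cubic spline */
   convert y=y_join   / method=join;     /* linear spline */
   convert y=y_step   / method=step;     /* step function */
run;

proc print data=filled;
run;
```

The step method should simply carry the previous non-missing value forward, while join and spline produce values between the neighboring observations. Note that the interpolation is done in time, so months of different lengths can make the join values differ slightly from a simple midpoint.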

%let start_date=mdy(10,1,2002);
%let end_date=mdy(6,1,2011);
* build one row per month from start_date to end_date;
data structure;
n_months=intck('month',&start_date,&end_date);
do i=0 to n_months;
failure_date=intnx('month',&start_date,i);
output;
end;
format failure_date date9.;
drop i n_months;
run;

proc sort data=series;
by failure_date;
run;

proc sort data=structure;
by failure_date;
run;

data series1;
merge structure(in=a) series;
by failure_date;
if a;
run;

proc expand data=series1 out=predicted_series from=month;
convert claim_amt=severity/ observed=average;
id failure_date;
run;
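Here proc expand is used for interpolation only: no to= option is given, so the output stays monthly and the missing months introduced by the merge are filled in.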

convert statement lists the variables to be processed. Here claim_amt=severity writes the converted claim_amt series to a new variable named severity.
id statement names a numeric variable that identifies the observations in the input and output data sets.
from= option specifies the type of time interval between observations of the input data set. The interval specification can be year, semiyear, qtr, month, week, day, etc.
to= option specifies the type of time interval between observations of the output data set.
observed= option indicates the attributes of the input time series and the output series. The attribute can be one of the following: beginning (beginning-of-period values), middle (midpoint values), end (end-of-period values), total (period totals), average (period averages).
method= option specifies the method used to convert the data series. The methods supported include spline, join, step, and aggregate.
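When both from= and to= are given, proc expand changes the frequency instead of only filling gaps. A minimal sketch that converts a made-up monthly series of period averages to quarterly averages (monthly_avg and severity_qtr are hypothetical names, and the values are invented):

```sas
/* hypothetical monthly averages for one year */
data monthly_avg;
   do i = 0 to 11;
      failure_date = intnx('month', '01jan2010'd, i);
      claim_amt = 100 + 10*i;
      output;
   end;
   format failure_date date9.;
   drop i;
run;

/* monthly averages in, quarterly averages out */
proc expand data=monthly_avg out=quarterly from=month to=qtr;
   id failure_date;
   convert claim_amt=severity_qtr / observed=average;
run;

proc print data=quarterly;
run;
```

The observed=average attribute tells the procedure to treat each value as an average over its period rather than a point-in-time reading, which affects how the monthly values are combined into quarters.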
