Time Series Analysis in Python with statsmodels

Wes McKinney, Josef Perktold, Skipper Seabold

Abstract —We introduce the new time series analysis features of scikits.statsmodels. This includes descriptive statistics, statistical tests and several linear model classes: autoregressive (AR), autoregressive moving-average (ARMA), and vector autoregressive (VAR) models.

Index Terms —time series analysis, statistics, econometrics, AR, ARMA, VAR, GLSAR, filtering, benchmarking

Josef Perktold is with the University of North Carolina, Chapel Hill. Wes McKinney is with Duke University. Skipper Seabold is with the American University. E-mail: [email protected]. (c) 2011 Wes McKinney et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Introduction

Statsmodels is a Python package that provides a complement to SciPy for statistical computations, including descriptive statistics and estimation of statistical models. Besides the initial models (linear regression, robust linear models, generalized linear models and models for discrete data), the latest release of scikits.statsmodels includes some basic tools and models for time series analysis: descriptive statistics, statistical tests and several linear model classes, namely autoregressive (AR), autoregressive moving-average (ARMA), and vector autoregressive (VAR) models. In this article we introduce and provide an overview of the new time series analysis features of statsmodels. In the outlook at the end we point to some extensions and new models that are under development.

Time series data comprises observations that are ordered along one dimension, namely time, which imposes specific stochastic structures on the data. Our current models assume that observations are continuous, that time is discrete and equally spaced, and that there are no missing observations. This type of data is very common in many fields, for example in economics and finance: national output, labor force, prices, stock market values, sales volumes, just to name a few.

In the following we briefly discuss some statistical properties of estimation with time series data, and then illustrate and summarize what is currently available in statsmodels.

Ordinary Least Squares (OLS)

The simplest linear model assumes that we observe an endogenous variable y and a set of regressors or explanatory variables x, where y and x are linked through a simple linear relationship plus a noise or error term:

    y_t = x_t \beta + \epsilon_t

In the simplest case, the errors are independently and identically distributed. Unbiasedness of OLS requires that the regressors and errors be uncorrelated. If the errors are additionally normally distributed and the regressors are non-random, then the resulting OLS or maximum likelihood estimator (MLE) of \beta is also normally distributed in small samples. We obtain the same result if we consider the distributions as conditional on x_t when the regressors are exogenous random variables. So far this holds regardless of whether t indexes time or any other ordering of the observations.
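The following minimal sketch (not part of the original article) simulates data from the linear model above and recovers \beta with OLS. It uses the scikits.statsmodels import convention of the rest of the paper; with current statsmodels the import would be statsmodels.api instead.

import numpy as np
import scikits.statsmodels.api as sm   # current releases: import statsmodels.api as sm

np.random.seed(12345)
nobs = 200
x = np.column_stack([np.ones(nobs), np.random.randn(nobs)])  # constant plus one regressor
beta = np.array([1.0, 0.5])
eps = np.random.randn(nobs)                                   # i.i.d. errors
y = np.dot(x, beta) + eps

res = sm.OLS(y, x).fit()
res.params    # should be close to the true values [1.0, 0.5]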
When we have time series, there are two possible extensions that come from the intertemporal linkage of observations. In the first case, past values of the endogenous variable influence the expectation or distribution of the current endogenous variable; in the second case, the errors \epsilon_t are correlated over time. If we have either one of these cases, we can still use OLS or generalized least squares (GLS) to get a consistent estimate of the parameters. If we have both at the same time, then OLS is no longer consistent, and we need to use a non-linear estimator. This case is essentially what ARMA does.

Linear Model with autocorrelated error (GLSAR)

This model assumes that the explanatory variables, the regressors, are uncorrelated with the error term, but that the error term follows an autoregressive process, i.e.

    E(x_t \epsilon_t) = 0
    \epsilon_t = a_1 \epsilon_{t-1} + a_2 \epsilon_{t-2} + ... + a_k \epsilon_{t-k}

An example will be presented in the next section.

Linear Model with lagged dependent variables (OLS, AR, VAR)

This group of models assumes that past dependent variables, y_{t-i}, are included among the regressors, but that the error term is not serially correlated:

    E(\epsilon_t \epsilon_s) = 0, for t \neq s
    y_t = a_1 y_{t-1} + a_2 y_{t-2} + ... + a_k y_{t-k} + x_t \beta + \epsilon_t

Dynamic processes like autoregressive processes depend on observations in the past. This means that we have to decide what to do with the initial observations in our sample, for which we do not observe any past values.

The simplest way is to treat the first observations as fixed, and to analyse our sample starting with the k-th observation. This leads to conditional least squares or conditional maximum likelihood estimation. For conditional least squares we can just use OLS, adding the past endog to the exog. The vector autoregressive model (VAR) has the same basic statistical structure, except that we now consider a vector of endogenous variables at each point in time; it can also be estimated with OLS conditional on the initial information. (The stochastic structure of VAR is richer, because we now also need to take into account that there can be contemporaneous correlation of the errors, i.e. correlation at the same time point but across equations, while the errors are still uncorrelated across time.)

The second estimation method that is currently available in statsmodels is maximum likelihood estimation. Following the same approach, we can use the likelihood function that is conditional on the first observations. If the errors are normally distributed, then this is essentially equivalent to least squares. However, we can easily extend conditional maximum likelihood to other models, for example GARCH, linear models with generalized autoregressive conditional heteroscedasticity, where the variance depends on the past, or models where the errors follow a non-normal distribution, for example a Student-t distribution, which has heavier tails and is sometimes more appropriate in finance.

The second way to treat the problem of initial conditions is to model them together with the other observations, usually under the assumption that the process has started far in the past and that the initial observations are distributed according to the long-run, i.e. stationary, distribution of the observations. This exact maximum likelihood estimator is implemented in statsmodels for the autoregressive process in statsmodels.tsa.AR, and for the ARMA process in statsmodels.tsa.ARMA.
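To make the conditional least squares approach described above concrete, the following sketch (our own illustration, with simulated data) builds the lagged endogenous variables by hand for an AR(2) and fits them with OLS, treating the first observations as fixed:

import numpy as np
import scikits.statsmodels.api as sm   # current releases: import statsmodels.api as sm

np.random.seed(54321)
nobs, k = 500, 2
# simulate a stationary AR(2): y_t = 0.6 y_{t-1} - 0.2 y_{t-2} + e_t
y = np.zeros(nobs)
eps = np.random.randn(nobs)
for t in range(k, nobs):
    y[t] = 0.6 * y[t-1] - 0.2 * y[t-2] + eps[t]

# conditional least squares: regress y_t on its first k lags,
# dropping the first k observations for which no lags are observed
lags = np.column_stack([y[k-1:-1], y[k-2:-2]])   # y_{t-1}, y_{t-2}
exog = sm.add_constant(lags, prepend=True)
res_ar = sm.OLS(y[k:], exog).fit()
res_ar.params    # should be close to the true values (0, 0.6, -0.2)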
Autoregressive Moving Average model (ARMA)

ARMA combines an autoregressive process of the dependent variable with an error term, moving-average or MA, that includes the present error and a linear combination of past error terms. An ARMA(p,q) process is defined as

    E(\epsilon_t \epsilon_s) = 0, for t \neq s
    y_t = \mu + a_1 y_{t-1} + ... + a_p y_{t-p} + \epsilon_t + b_1 \epsilon_{t-1} + ... + b_q \epsilon_{t-q}

As a simplified notation, this is often expressed in terms of lag-polynomials as

    \phi(L) y_t = \psi(L) \epsilon_t

where

    \phi(L) = 1 - a_1 L - a_2 L^2 - ... - a_p L^p
    \psi(L) = 1 + b_1 L + b_2 L^2 + ... + b_q L^q

L is the lag or shift operator, L^i x_t = x_{t-i}, L^0 = 1. This is the same process that scipy.signal.lfilter uses. Forecasting with ARMA models has become popular since the 1970s as the Box-Jenkins methodology, since it often showed better forecast performance than more complex, structural models.

Using OLS to estimate this process, i.e. regressing y_t on past y_{t-i}, does not provide a consistent estimator. The process can be consistently estimated using either conditional least squares, which in this case is a non-linear estimator, conditional maximum likelihood, or exact maximum likelihood. The difference between the conditional methods and exact MLE is the same as described before. statsmodels provides estimators for both approaches in tsa.ARMA, which will be described in more detail below.

Time series analysis is a vast field in econometrics with a large range of models that extend the basic linear models with normally distributed errors in many ways, and it provides a range of statistical tests to identify an appropriate model specification or to test the underlying assumptions. Besides estimation of the main linear time series models, statsmodels also provides a range of descriptive statistics for time series data and associated statistical tests. We include an overview in the next section before describing AR, ARMA and VAR in more detail. Additional results that facilitate the usage and interpretation of the estimated models, for example impulse response functions, are also available.

OLS, GLSAR and serial correlation

Suppose we want to model a simple linear model that links the stock of money in the economy to real GDP and the consumer price index (CPI), an example from Greene (2003, ch. 12). We import numpy and statsmodels, load the variables from the example dataset included in statsmodels, transform the data and fit the model with OLS:

import numpy as np
import scikits.statsmodels.api as sm
tsa = sm.tsa   # as shorthand

mdata = sm.datasets.macrodata.load().data
endog = np.log(mdata['m1'])
exog = np.column_stack([np.log(mdata['realgdp']),
                        np.log(mdata['cpi'])])
exog = sm.add_constant(exog, prepend=True)
res1 = sm.OLS(endog, exog).fit()

print res1.summary() provides a basic overview of the regression results; we skip it here to save space. The Durbin-Watson statistic that is included in the summary is very low, indicating strong autocorrelation in the residuals.
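For reference, the Durbin-Watson statistic can also be computed directly from its definition using the residuals of the fit above; these few lines are our own illustration and not part of the original example (recent statsmodels releases also expose it as a function in the stats module):

# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); values near 0 indicate strong
# positive autocorrelation, values near 2 indicate none
resid = res1.resid
dw = np.sum(np.diff(resid)**2) / np.sum(resid**2)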
Plotting the residuals shows a similarly strong autocorrelation.

As a more formal test we can calculate the autocorrelation, the Ljung-Box Q-statistic for the test of zero autocorrelation, and the associated p-values:

acf, ci, Q, pvalue = tsa.acf(res1.resid, nlags=4,
                             confint=95, qstat=True,
                             unbiased=True)
acf
#array([1., 0.982, 0.948, 0.904, 0.85])
pvalue
#array([3.811e-045, 2.892e-084, 6.949e-120, 2.192e-151])

To see how many autoregressive coefficients might be relevant, we can also look at the partial autocorrelation coefficients:

tsa.pacf(res1.resid, nlags=4)
#array([1., 0.982, -0.497, -0.062, -0.227])

Similar regression diagnostics, for example for heteroscedasticity, are available in statsmodels.stats.diagnostic. Details on these functions and their options can be found in the documentation and docstrings.

The strong autocorrelation indicates that either our model is misspecified or there is strong autocorrelation in the errors. If we assume that the second is correct, then we can estimate the model with GLSAR. As an example, let us assume we consider four lags in the autoregressive error.

mod2 = sm.GLSAR(endog, exog, rho=4)
res2 = mod2.iterative_fit()

iterative_fit alternates between estimating the autoregressive process of the error term using tsa.yule_walker and fitting with feasible sm.GLS. Looking at the estimation results shows two things: the parameter estimates are very different between OLS and GLS, and the autocorrelation in the residuals is close to a random walk:

res1.params
#array([-1.502, 0.43, 0.886])
res2.params
#array([-0.015, 0.01, 0.034])
mod2.rho
#array([ 1.009, -0.003, 0.015, -0.028])

This indicates that the short-run and long-run dynamics might be very different, that we should consider a richer dynamic model, and that the variables might not be stationary, i.e. that there might be unit roots.
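To make the feasible-GLS iteration behind GLSAR more concrete, the sketch below performs a single iteration by hand: it estimates the AR coefficients of the OLS residuals with tsa.yule_walker (we assume it returns the AR coefficient vector and the innovation standard deviation), quasi-differences endog and exog with those coefficients, and refits with OLS. The quasi_difference helper is our own simplified illustration that drops the first four observations rather than applying the full GLS transform; it is not the library's implementation.

# one hand-rolled feasible-GLS step, reusing endog, exog and res1 from above
rho, sigma = tsa.yule_walker(res1.resid, order=4)

def quasi_difference(a, rho):
    # subtract the AR-weighted past values; the first len(rho) rows are dropped
    a = np.asarray(a, dtype=float)
    p = len(rho)
    out = a[p:].copy()
    for i, r in enumerate(rho):
        out -= r * a[p - 1 - i:-(1 + i)]
    return out

endog_star = quasi_difference(endog, rho)
exog_star = quasi_difference(exog, rho)
res_fgls = sm.OLS(endog_star, exog_star).fit()   # one step toward res2.params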
Stationarity, Unit Roots and Cointegration

Loosely speaking, stationarity means here that the mean, variance and intertemporal correlation structure remain constant over time. Non-stationarities can come either from deterministic changes like trend or seasonal fluctuations, or from the stochastic properties of the process, if for example the autoregressive process has a unit root, that is, one of the roots of the lag polynomial lies on the unit circle. In the first case, we can remove the deterministic component by detrending or deseasonalization. In the second case we can take first differences of the process.

Differencing is a common approach in the Box-Jenkins methodology and gives rise to ARIMA, where the I stands for integrated processes, which are made stationary by differencing. This has led to a large literature in econometrics on unit-root testing that tries to distinguish deterministic trends from unit roots or stochastic trends. statsmodels provides the augmented Dickey-Fuller test. Monte Carlo studies have shown that it is often the most powerful of all unit root tests.

To illustrate, we just show two results. Testing the log of the stock of money with a null hypothesis of a unit root against an alternative of stationarity around a linear trend shows an adf-statistic of -1.5 and a p-value of 0.8, so we are far away from rejecting the unit root hypothesis:

tsa.adfuller(endog, regression="ct")[:2]
(-1.561, 0.807)

If we test the differenced series, that is the growth rate of the money stock, with a null hypothesis of a random walk with drift, then we can strongly reject the hypothesis that the growth rate has a unit root (p-value 0.0002):

tsa.adfuller(np.diff(endog), regression="c")[:2]
(-4.451, 0.00024)

ARMA processes and data

The identification of ARIMA(p,d,q) processes, especially choosing the number of lagged terms, p and q, to include, remains partially an art. One recommendation in the Box-Jenkins methodology is to look at the pattern in the autocorrelation (acf) and partial autocorrelation (pacf) functions.

scikits.statsmodels.tsa.arima_process contains a class that provides several properties of ARMA processes and a random process generator. As an example, statsmodels/examples/tsa/arma_plots.py can be used to plot autocorrelation and partial autocorrelation functions for different ARMA models.

Figure 1: ACF and PACF for ARMA(p,q). The figure illustrates that the pacf is zero after p terms for AR(p) processes and the acf is zero after q terms for MA(q) processes.

This allows easy comparison of the theoretical properties of an ARMA process with their empirical counterparts. For example, define the lag coefficients for an ARMA(2,2) process, generate a random process and compare the observed and theoretical pacf:

import scikits.statsmodels.tsa.arima_process as tsp
ar = np.r_[1., -0.5, -0.2]
ma = np.r_[1., 0.2, -0.2]
np.random.seed(123)
x = tsp.arma_generate_sample(ar, ma, 20000, burnin=1000)
sm.tsa.pacf(x, 5)
array([1., 0.675, -0.053, 0.138, -0.018, 0.038])
ap = tsp.ArmaProcess(ar, ma)
ap.pacf(5)
array([1., 0.666, -0.035, 0.137, -0.034, 0.034])

We can see that they are very close in a large generated sample like this. ArmaProcess defines several additional methods that calculate properties of ARMA processes and work with lag-polynomials: acf, acovf, ar, ar_roots, arcoefs, arma2ar, arma2ma, arpoly, from_coeffs, from_estimation, generate_sample, impulse_response, invertroots, isinvertible, isstationary, ma, ma_roots, macoefs, mapoly, nobs, pacf, periodogram. The sandbox has an FFT version of some of this to look at the frequency domain properties.

ARMA Modeling

Statsmodels provides several helpful routines and models for working with Autoregressive Moving Average (ARMA) time series models, including simulation and estimation code. For example, after importing arima_process as ap from scikits.statsmodels.tsa, we can simulate a series:

>>> ar_coef = [1, .75, -.25]
>>> ma_coef = [1, -.5]
>>> nobs = 100
>>> y = ap.arma_generate_sample(ar_coef, ma_coef, nobs)
>>> y += 4   # add in constant

We can then estimate an ARMA model of the series:

>>> mod = tsa.ARMA(y)
>>> res = mod.fit(order=(2, 1), trend='c',
...               method='css-mle', disp=-1)
>>> res.params
array([ 4.0092, -0.7747,  0.2062, -0.5563])

The estimation method 'css-mle' indicates that the starting parameters for the optimization are obtained from the conditional sum of squares estimator, and then the exact likelihood is optimized. The exact likelihood is implemented using the Kalman filter.
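A natural follow-up check, not shown in the paper, is to verify that the fitted ARMA model has removed the serial correlation. Assuming the result object exposes the model residuals as res.resid, we can reuse tsa.acf with the Ljung-Box statistic from the earlier section:

>>> # acf of the ARMA residuals; check that the p-values are large,
>>> # i.e. that no significant autocorrelation remains
>>> r_acf, r_ci, r_q, r_pvalue = tsa.acf(res.resid, nlags=4,
...                                      confint=95, qstat=True,
...                                      unbiased=True)
>>> r_pvalue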
Filtering

We have recently implemented several filters that are commonly used in economics and finance applications. The three most popular are the Hodrick-Prescott, the Baxter-King, and the Christiano-Fitzgerald filters. These can all be viewed as approximations of the ideal band-pass filter; however, a discussion of the ideal band-pass filter is beyond the scope of this paper. We give an overview of each of the methods and then present some usage examples.

The Hodrick-Prescott filter was proposed by Hodrick and Prescott [HPres], though the method itself has been in use across the sciences since at least 1876 [Stigler]. The idea is to separate a time series y_t into a trend \tau_t and a cyclical component \zeta_t:

    y_t = \tau_t + \zeta_t

The components are determined by minimizing the following quadratic loss function

    \min_{\{\tau_t\}} \sum_t \zeta_t^2 + \lambda \sum_{t=1}^{T} [(\tau_t - \tau_{t-1}) - (\tau_{t-1} - \tau_{t-2})]^2

where \tau_t = y_t - \zeta_t and \lambda is the weight placed on the penalty for roughness. Hodrick and Prescott suggest using \lambda = 1600 for quarterly data. Ravn and Uhlig [RUhlig] suggest \lambda = 6.25 and \lambda = 129600 for annual and monthly data, respectively. While there are numerous methods for solving the loss function, our implementation uses scipy.sparse.linalg.spsolve to find the solution to the generalized ridge regression suggested in Danthine and Girardin [DGirard].
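As an illustration of that ridge-regression formulation, the sketch below builds the sparse second-difference operator K and solves (I + \lambda K'K)\tau = y for the trend. It is a bare-bones stand-in for the library's hpfilter, written only to show the linear algebra involved:

import numpy as np
import scipy.sparse as sparse
from scipy.sparse.linalg import spsolve

def hp_trend(y, lamb=1600.0):
    # trend component of the HP filter as the solution of the
    # generalized ridge regression (I + lamb * K'K) tau = y
    y = np.asarray(y, dtype=float)
    nobs = len(y)
    ones = np.ones(nobs - 2)
    # K is the (nobs-2) x nobs second-difference operator
    K = sparse.diags([ones, -2 * ones, ones], offsets=[0, 1, 2],
                     shape=(nobs - 2, nobs), format='csc')
    eye = sparse.eye(nobs, format='csc')
    tau = spsolve((eye + lamb * K.T.dot(K)).tocsc(), y)
    return tau, y - tau   # trend, cycle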
Baxter and King [BKing] propose an approximate band-pass filter that deals explicitly with the periodicity of the business cycle. By applying their band-pass filter to a time series y_t, they produce a series y*_t that does not contain fluctuations at frequencies higher or lower than those of the business cycle. Specifically, in the time domain the Baxter-King filter takes the form of a symmetric moving average

    y*_t = \sum_{k=-K}^{K} a_k y_{t-k}

where a_k = a_{-k} for symmetry and \sum_{k=-K}^{K} a_k = 0, such that the filter has trend elimination properties. That is, series that contain quadratic deterministic trends or stochastic processes that are integrated of order 1 or 2 are rendered stationary by application of the filter. The filter weights a_k are given as follows:

    a_j = B_j + \theta   for j = 0, \pm 1, \pm 2, ..., \pm K
    B_0 = (\omega_2 - \omega_1) / \pi
    B_j = (1 / (\pi j)) (\sin(\omega_2 j) - \sin(\omega_1 j))   for j = \pm 1, \pm 2, ..., \pm K

where \theta is a normalizing constant such that the weights sum to zero,

    \theta = - \sum_{j=-K}^{K} B_j / (2K + 1)

and

    \omega_1 = 2\pi / P_H,   \omega_2 = 2\pi / P_L

with the periodicity of the low and high cut-off frequencies given by P_L and P_H, respectively. Following Burns and Mitchell's [] pioneering work, which suggests that US business cycles last from 1.5 to 8 years, Baxter and King suggest using P_L = 6 and P_H = 32 for quarterly data, or 1.5 and 8 for annual data. The authors suggest setting the lead-lag length of the filter, K, to 12 for quarterly data. The transformed series will be truncated on either end by K observations. Naturally the choice of these parameters depends on the available sample and the frequency band of interest.

The last filter that we currently provide is that of Christiano and Fitzgerald [CFitz]. The Christiano-Fitzgerald filter is again a weighted moving average. However, their filter is asymmetric about t and operates under the (generally false) assumption that y_t follows a random walk. This assumption allows their filter to approximate the ideal filter even if the exact time series model of y_t is not known. The implementation of their filter involves the calculation of the weights in

    y*_t = B_0 y_t + B_1 y_{t+1} + ... + B_{T-1-t} y_{T-1} + \tilde{B}_{T-t} y_T
           + B_1 y_{t-1} + ... + B_{t-2} y_2 + \tilde{B}_{t-1} y_1

for t = 3, 4, ..., T-2, where

    B_j = (\sin(jb) - \sin(ja)) / (\pi j),   j \geq 1
    B_0 = (b - a) / \pi,   a = 2\pi / P_U,   b = 2\pi / P_L

\tilde{B}_{T-t} and \tilde{B}_{t-1} are linear functions of the B_j's, and the values for t = 1, 2, T-1 and T are calculated in much the same way. See the authors' paper or our code for the details. P_U and P_L are as described above (P_U corresponds to P_H), with the same interpretation.

Figure 2: Unfiltered Inflation and Unemployment Rates 1959Q4-2009Q1

Moving on to some examples, the code below demonstrates the API and the resulting filtered series for each method. We use series for unemployment and inflation, which are traditionally thought to have a negative relationship at business cycle frequencies.

>>> from scipy.signal import lfilter
>>> data = sm.datasets.macrodata.load()
>>> infl = data.data.infl[1:]
>>> # get 4 qtr moving average
>>> infl = lfilter(np.ones(4)/4, 1, infl)[4:]
>>> unemp = data.data.unemp[1:]

To apply the Hodrick-Prescott filter to the data, we can do

>>> infl_c, infl_t = tsa.filters.hpfilter(infl)
>>> unemp_c, unemp_t = tsa.filters.hpfilter(unemp)

The Baxter-King filter is applied as

>>> infl_c = tsa.filters.bkfilter(infl)
>>> unemp_c = tsa.filters.bkfilter(unemp)

The Christiano-Fitzgerald filter is similarly applied

>>> infl_c, infl_t = tsa.filters.cffilter(infl)
>>> unemp_c, unemp_t = tsa.filters.cffilter(unemp)

Statistical Benchmarking

We also provide for another frequent need of those who work with time series data of varying observational frequency: benchmarking. Benchmarking is a kind of interpolation that involves creating a high-frequency dataset from a low-frequency one in a consistent way. The need for benchmarking arises when one has a low-frequency series, perhaps annual, that is thought to be reliable, and also has a higher-frequency series, perhaps quarterly or monthly. A benchmarked series is a high-frequency series consistent with the benchmark of the low-frequency series.

We have implemented Denton's modified method, originally proposed by Denton [Denton] and improved by Cholette [Cholette]. To take the example of turning an annual series into a quarterly one, Denton's method entails finding a benchmarked series X_t that solves

    \min_{\{X_t\}} \sum_t \left( \frac{X_t}{I_t} - \frac{X_{t-1}}{I_{t-1}} \right)^2
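The transcript breaks off at this objective function. A common way to complete the problem is to add the constraint that the high-frequency values within each low-frequency period sum to the corresponding benchmark, which is what "consistent with the benchmark" means above. Under that assumption, the following sketch solves the resulting equality-constrained least squares problem directly via its KKT system; it is our own illustration, not the library's benchmarking code.

import numpy as np

def denton_benchmark(indicator, benchmark, k=4):
    # minimize sum_t (X_t/I_t - X_{t-1}/I_{t-1})^2 subject to the assumed
    # constraint that each block of k high-frequency values sums to the
    # corresponding low-frequency benchmark (k=4 for annual -> quarterly)
    I = np.asarray(indicator, dtype=float)
    A = np.asarray(benchmark, dtype=float)
    T, Y = len(I), len(A)
    assert T == k * Y, "indicator length must be k times the benchmark length"
    D = np.diff(np.eye(T), axis=0)               # (T-1, T) first-difference operator
    B = D.dot(np.diag(1.0 / I))                  # rows give X_t/I_t - X_{t-1}/I_{t-1}
    M = B.T.dot(B)
    C = np.kron(np.eye(Y), np.ones((1, k)))      # (Y, T) aggregation constraints
    # KKT system of the equality-constrained quadratic program
    K = np.block([[2.0 * M, C.T], [C, np.zeros((Y, Y))]])
    rhs = np.concatenate([np.zeros(T), A])
    return np.linalg.solve(K, rhs)[:T]

For example, denton_benchmark(quarterly_indicator, annual_totals) returns a quarterly series whose annual sums match the benchmarks while following the quarter-to-quarter movements of the indicator as closely as possible.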