StataNow 18 help for arima

[TS] arima -- ARIMA, ARMAX, and other dynamic regression models

Syntax
Basic syntax for a regression model with ARMA disturbances
arima depvar [indepvars], ar(numlist) ma(numlist)

Basic syntax for an ARIMA(p,d,q) model
arima depvar, arima(#p,#d,#q)

Basic syntax for a multiplicative seasonal ARIMA(p,d,q)*(P,D,Q)s model
arima depvar, arima(#p,#d,#q) sarima(#P,#D,#Q,#s)

Full syntax
arima depvar [indepvars] [if] [in] [weight] [, options]
    options                     Description
    -------------------------------------------------------------------------
    Model
      noconstant                suppress constant term
      arima(#p,#d,#q)           specify ARIMA(p,d,q) model for dependent variable
      ar(numlist)               autoregressive terms of the structural model disturbance
      ma(numlist)               moving-average terms of the structural model disturbance
      constraints(constraints)  apply specified linear constraints

    Model 2
      sarima(#P,#D,#Q,#s)       specify period-#s multiplicative seasonal ARIMA term
      mar(numlist, #s)          multiplicative seasonal autoregressive terms; may be repeated
      mma(numlist, #s)          multiplicative seasonal moving-average terms; may be repeated

    Model 3
      condition                 use conditional MLE instead of full MLE
      savespace                 conserve memory during estimation
      diffuse                   use diffuse prior for starting Kalman filter recursions
      p0(#|matname)             use alternate prior for starting Kalman recursions; seldom used
      state0(#|matname)         use alternate state vector for starting Kalman filter recursions

    SE/Robust
      vce(vcetype)              vcetype may be opg, robust, or oim

    Reporting
      level(#)                  set confidence level; default is level(95)
      detail                    report list of gaps in time series
      nocnsreport               do not display constraints
      display_options           control columns and column formats, row spacing, and line width

    Maximization
      maximize_options          control the maximization process; seldom used

      collinear                 keep collinear variables
      coeflegend                display legend instead of statistics
    -------------------------------------------------------------------------
    You must tsset your data before using arima; see [TS] tsset.
    depvar and indepvars may contain time-series operators; see tsvarlist.
    by, collect, fp, rolling, statsby, and xi are allowed; see prefix.
    iweights are allowed; see weights.
    collinear and coeflegend do not appear in the dialog box.
    See [TS] arima postestimation for features available after estimation.

Menu
Statistics > Time series > ARIMA and ARMAX > ARIMA and ARMAX models

Description
arima fits univariate models for a time series, where the disturbances are allowed to follow a linear autoregressive moving-average (ARMA) specification. When independent variables are included in the specification, such models are often called ARMAX models; and when independent variables are not specified, they reduce to Box-Jenkins autoregressive integrated moving-average (ARIMA) models in the dependent variable.

Options
        +-------+
    ----+ Model +------------------------------------------------------------
noconstant; see [R] Estimation options.
arima(#p,#d,#q) is an alternative, shorthand notation for specifying models with ARMA disturbances. The dependent variable and any independent variables are differenced #d times, and lags 1 through #p of the autoregressive terms and lags 1 through #q of the moving-average terms are included in the model. For example, the specification
. arima D.y, ar(1/2) ma(1/3)
is equivalent to
. arima y, arima(2,1,3)
The latter is easier to write for simple ARMAX and ARIMA models, but if gaps in the AR or MA lags are to be modeled, or if different operators are to be applied to independent variables, the first syntax is required.
ar(numlist) specifies the autoregressive terms of the structural model disturbance to be included in the model. For example, ar(1/3) specifies that lags of 1, 2, and 3 of the structural disturbance be included in the model; ar(1 4) specifies that lags 1 and 4 be included, perhaps to account for additive quarterly effects.
If the model does not contain regressors, these terms can also be considered autoregressive terms for the dependent variable.
ma(numlist) specifies the moving-average terms to be included in the model. These are the terms for the lagged innovations (white-noise disturbances).
constraints(constraints); see [R] Estimation options.
If constraints are placed between structural model parameters and ARMA terms, the first few iterations may attempt steps into nonstationary areas. This process can be ignored if the final solution is well within the bounds of stationary solutions.
        +---------+
    ----+ Model 2 +----------------------------------------------------------
sarima(#P,#D,#Q,#s) is an alternative, shorthand notation for specifying the multiplicative seasonal components of models with ARMA disturbances. The dependent variable and any independent variables are lag-#s seasonally differenced #D times, and 1 through #P seasonal lags of autoregressive terms and 1 through #Q seasonal lags of moving-average terms are included in the model. For example, the specification
. arima DS12.y, ar(1/2) ma(1/3) mar(1/2,12) mma(1/2,12)
is equivalent to
. arima y, arima(2,1,3) sarima(2,1,2,12)
mar(numlist, #s) specifies the lag-#s multiplicative seasonal autoregressive terms. For example, mar(1/2,12) requests that the first two lag-12 multiplicative seasonal autoregressive terms be included in the model.
mma(numlist, #s) specifies the lag-#s multiplicative seasonal moving-average terms. For example, mma(1 3,12) requests that the first and third (but not the second) lag-12 multiplicative seasonal moving-average terms be included in the model.
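The relationship between the shorthand and explicit seasonal notations can be sketched with the air2 dataset used in the Examples section. This is only an illustration: the second command is derived from the equivalence described under sarima() above, and the two fits are expected to agree only up to the convergence tolerance.

```stata
* Sketch: two ways of writing the same multiplicative seasonal model
webuse air2, clear
generate lnair = ln(air)

* Shorthand form: differencing and seasonal terms given by arima() and sarima()
arima lnair, arima(0,1,1) sarima(0,1,1,12) noconstant

* Explicit form: difference by hand with DS12. and list each term
arima DS12.lnair, ma(1) mma(1,12) noconstant
```

The explicit form is the one to use when some seasonal lags are to be skipped, for example mma(1 3,12).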
        +---------+
    ----+ Model 3 +----------------------------------------------------------
condition specifies that conditional, rather than full, maximum likelihood estimates be produced. The presample values for epsilon_t and mu_t are taken to be their expected value of zero, and the estimate of the variance of epsilon_t is taken to be constant over the entire sample; see Hamilton (1994, 132). This estimation method is not appropriate for nonstationary series but may be preferable for long series or for models that have one or more long AR or MA lags. diffuse, p0(), and state0() have no meaning for models fit from the conditional likelihood and may not be specified with condition.
If the series is long and stationary and the underlying data-generating process does not have a long memory, estimates will be similar, whether estimated by unconditional maximum likelihood (the default), conditional maximum likelihood (condition), or maximum likelihood from a diffuse prior (diffuse).
In small samples, however, results of conditional and unconditional maximum likelihood may differ substantially; see Ansley and Newbold (1980). Whereas the default unconditional maximum likelihood estimates make the most use of sample information when all the assumptions of the model are met, Harvey (1989) and Ansley and Kohn (1985) argue for the use of diffuse priors in many cases, particularly in ARIMA models corresponding to an underlying structural model.
The condition or diffuse options may also be preferred when the model contains one or more long AR or MA lags; this avoids inverting potentially large matrices (see diffuse below).
When condition is specified, estimation is performed by the arch command (see [TS] arch), and more control of the estimation process can be obtained using arch directly.
condition cannot be specified if the model contains any multiplicative seasonal terms.
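One way to see how much the choice of estimator matters for a given series is to fit the same model by all three methods and compare the results side by side. The following is a minimal sketch using the wpi1 dataset from the Examples section; the stored-estimate names (full, cond, dprior) are arbitrary labels chosen for illustration.

```stata
* Sketch: compare unconditional, conditional, and diffuse-prior estimates
webuse wpi1, clear

* Default: unconditional (full) maximum likelihood
arima D.wpi, ar(1) ma(1)
estimates store full

* Conditional maximum likelihood
arima D.wpi, ar(1) ma(1) condition
estimates store cond

* Maximum likelihood from a diffuse prior
arima D.wpi, ar(1) ma(1) diffuse
estimates store dprior

* Coefficients and standard errors side by side
estimates table full cond dprior, se stats(N ll)
```

Large differences across the columns suggest that the sample is too short, or the process too persistent, for the three methods to be treated as interchangeable.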
savespace specifies that memory use be conserved by retaining only those variables required for estimation. The original dataset is restored after estimation. This option is rarely used and should be used only if there is not enough space to fit a model without the option. However, arima requires considerably more temporary storage during estimation than most estimation commands in Stata.
diffuse specifies that a diffuse prior (see Harvey 1989 or 1993) be used as a starting point for the Kalman filter recursions. Using diffuse, nonstationary models may be fit with arima (see the p0() option below; diffuse is equivalent to specifying p0(1e9)). See [TS] arima for details.
p0(#|matname) is a rarely specified option that can be used for nonstationary series or when an alternate prior for starting the Kalman recursions is desired; see [TS] arima for details.
state0(#|matname) is a rarely used option that specifies an alternate initial state vector for starting the Kalman filter recursions. If # is specified, all elements of the vector are taken to be #. The default initial state vector is state0(0).
        +-----------+
    ----+ SE/Robust +--------------------------------------------------------
vce(vcetype) specifies the type of standard error reported, which includes types that are robust to some kinds of misspecification (robust) and that are derived from asymptotic theory (oim, opg); see [R] vce_option.
For state-space models in general and ARMAX and ARIMA models in particular, the robust or quasimaximum likelihood estimates (QMLEs) of variance are robust to symmetric nonnormality in the disturbances, including, as a special case, heteroskedasticity. The robust variance estimates are not generally robust to functional misspecification of the structural or ARMA components of the model; see Hamilton (1994, 389) for a brief discussion.
        +-----------+
    ----+ Reporting +--------------------------------------------------------
level(#); see [R] Estimation options.
detail specifies that a detailed list of any gaps in the series be reported, including gaps due to missing observations or missing data for the dependent variable or independent variables.
nocnsreport; see [R] Estimation options.
display_options: noci, nopvalues, vsquish, cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] Estimation options.
        +--------------+
    ----+ Maximization +-----------------------------------------------------
maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), nrtolerance(#), gtolerance(#), nonrtolerance(#), and from(init_specs); see [R] Maximize for all options except gtolerance(), and see below for information on gtolerance().
These options are sometimes more important for ARIMA models than for most maximum likelihood models because of potential convergence problems with ARIMA models, particularly if the specified model and the sample data imply a nonstationary model.
Several alternate optimization methods, such as Berndt-Hall-Hall-Hausman (BHHH) and Broyden-Fletcher-Goldfarb-Shanno (BFGS), are provided for ARIMA models. Although ARIMA models are not as difficult to optimize as ARCH models, their likelihoods are nevertheless generally not quadratic and often pose optimization difficulties; this is particularly true if a model is nonstationary or nearly nonstationary. Because each method approaches optimization differently, some problems can be successfully optimized by an alternate method when one method fails.
Setting technique() to something other than the default or BHHH changes the vcetype to vce(oim).
The following options are all related to maximization and are either particularly important in fitting ARIMA models or not available for most other estimators.
technique(algorithm_spec) specifies the optimization technique to use to maximize the likelihood function.
technique(bhhh) specifies the Berndt-Hall-Hall-Hausman (BHHH) algorithm.
technique(dfp) specifies the Davidon-Fletcher-Powell (DFP) algorithm.
technique(bfgs) specifies the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm.
technique(nr) specifies Stata's modified Newton-Raphson (NR) algorithm.
You can specify multiple optimization methods. For example, technique(bhhh 10 nr 20) requests that the optimizer perform 10 BHHH iterations, switch to Newton-Raphson for 20 iterations, switch back to BHHH for 10 more iterations, and so on.
The default for arima is technique(bhhh 5 bfgs 10).
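As a rough illustration of switching techniques, the sketch below refits one of the models from the Examples section with an alternating BHHH and Newton-Raphson schedule and then displays the technique-related stored results. The choice of model and of the step counts is arbitrary.

```stata
* Sketch: alternate 10 BHHH iterations with 20 Newton-Raphson iterations
webuse wpi1, clear
arima D.wpi, ar(1) ma(1 4) technique(bhhh 10 nr 20)

* Inspect which techniques and step counts were recorded
display "technique:  `e(technique)'"
display "tech_steps: `e(tech_steps)'"

* Inspect the variance estimator that was used
display "vce:        `e(vce)'"
```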
gtolerance(#) specifies the tolerance for the gradient relative to the coefficients. When |g_i*b_i| < gtolerance() for all parameters b_i and the corresponding elements of the gradient g_i, the gradient tolerance criterion is met. The default gradient tolerance for arima is gtolerance(.05).
gtolerance(999) may be specified to disable the gradient criterion. If the optimizer becomes stuck with repeated "(backed up)" messages, the gradient probably still contains substantial values, but an uphill direction cannot be found for the likelihood. With this option, results can often be obtained, but whether the global maximum likelihood has been found is unclear.
When the maximization is not going well, it is also possible to set the maximum number of iterations (see [R] Maximize) to the point where the optimizer appears to be stuck and to inspect the estimation results at that point.
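The two troubleshooting approaches just described might look like the following sketch. The model and the iteration limit of 5 are arbitrary choices for illustration. capture noisily is used here on the assumption that stopping before convergence may produce a nonzero return code, which would otherwise halt a do-file.

```stata
* Sketch: two ways to investigate a difficult maximization
webuse wpi1, clear

* 1. Disable the gradient criterion
arima D.wpi, ar(1) ma(1 4) gtolerance(999)

* 2. Stop after a fixed number of iterations and inspect the results
capture noisily arima D.wpi, ar(1) ma(1 4) iterate(5)
display "converged:  " e(converged)
display "iterations: " e(ic)
matrix list e(gradient)
```

Large entries in e(gradient) at the stopping point indicate that the optimizer was still far from a stationary point of the likelihood.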
from(init_specs) allows you to set the starting values of the model coefficients; see [R] Maximize for a general discussion and syntax options.
The standard syntax for from() accepts a matrix, a list of values, or coefficient name-value pairs; see [R] Maximize. arima also accepts from(armab0), which sets the starting value for all ARMA parameters in the model to zero prior to optimization.
ARIMA models may be sensitive to initial conditions and may have coefficient values that correspond to local maximums. The default starting values for arima are generally good, particularly in large samples for stationary series.
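A simple check for sensitivity to starting values is to fit the model from two different starting points and compare the maximized log likelihoods. The sketch below does this with the default starting values and with from(armab0); the scalar names ll_default and ll_zero are hypothetical labels introduced for the illustration.

```stata
* Sketch: compare log likelihoods reached from two sets of starting values
webuse wpi1, clear

* Default starting values
arima D.wpi, ar(1) ma(1 4)
scalar ll_default = e(ll)

* All ARMA parameters started at zero
arima D.wpi, ar(1) ma(1 4) from(armab0)
scalar ll_zero = e(ll)

display "default start: " ll_default "   armab0 start: " ll_zero
```

If the two values differ by more than the convergence tolerance would explain, at least one of the runs has probably stopped at a local maximum.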
The following options are available with arima but are not shown in the dialog box:
collinear, coeflegend; see [R] Estimation options.

Examples
---------------------------------------------------------------------------
Setup
    . webuse wpi1

Simple ARIMA model with differencing and autoregressive and moving-average components
    . arima wpi, arima(1,1,1)

Same as above
    . arima D.wpi, ar(1) ma(1)

ARIMA model with additive seasonal effects
    . arima D.wpi, ar(1) ma(1 4)

---------------------------------------------------------------------------
Setup
    . webuse air2
    . generate lnair = ln(air)

Multiplicative SARIMA model
    . arima lnair, arima(0,1,1) sarima(0,1,1,12) noconstant

---------------------------------------------------------------------------
Setup
    . webuse friedman2, clear

ARMAX model
    . arima consump m2 if tin(, 1981q4), ar(1) ma(1)

ARMAX model with robust standard errors
    . arima consump m2 if tin(, 1981q4), ar(1) ma(1) vce(robust)
---------------------------------------------------------------------------

Video example
Introduction to ARMA/ARIMA models

Stored results
arima stores the following in e():
Scalars
    e(N)               number of observations
    e(N_gaps)          number of gaps
    e(k)               number of parameters
    e(k_eq)            number of equations in e(b)
    e(k_eq_model)      number of equations in overall model test
    e(k_dv)            number of dependent variables
    e(k1)              number of variables in first equation
    e(df_m)            model degrees of freedom
    e(ll)              log likelihood
    e(sigma)           sigma
    e(chi2)            chi-squared
    e(p)               p-value for model test
    e(tmin)            minimum time
    e(tmax)            maximum time
    e(ar_max)          maximum AR lag
    e(ma_max)          maximum MA lag
    e(rank)            rank of e(V)
    e(ic)              number of iterations
    e(rc)              return code
    e(converged)       1 if converged, 0 otherwise

Macros
    e(cmd)             arima
    e(cmdline)         command as typed
    e(depvar)          name of dependent variable
    e(covariates)      list of covariates
    e(eqnames)         names of equations
    e(wtype)           weight type
    e(wexp)            weight expression
    e(title)           title in estimation output
    e(tmins)           formatted minimum time
    e(tmaxs)           formatted maximum time
    e(chi2type)        Wald; type of model chi-squared test
    e(vce)             vcetype specified in vce()
    e(vcetype)         title used to label Std. err.
    e(ma)              lags for moving-average terms
    e(ar)              lags for autoregressive terms
    e(mari)            multiplicative AR terms and lag i=1... (# seasonal AR terms)
    e(mmai)            multiplicative MA terms and lag i=1... (# seasonal MA terms)
    e(seasons)         seasonal lags in model
    e(opt)             type of optimization
    e(ml_method)       type of ml method
    e(user)            name of likelihood-evaluator program
    e(technique)       maximization technique
    e(tech_steps)      number of iterations performed before switching techniques
    e(properties)      b V
    e(estat_cmd)       program used to implement estat
    e(predict)         program used to implement predict
    e(marginsok)       predictions allowed by margins
    e(marginsnotok)    predictions disallowed by margins

Matrices
    e(b)               coefficient vector
    e(Cns)             constraints matrix
    e(ilog)            iteration log (up to 20 iterations)
    e(gradient)        gradient vector
    e(V)               variance-covariance matrix of the estimators
    e(V_modelbased)    model-based variance

Functions
    e(sample)          marks estimation sample
In addition to the above, the following is stored in r():
Matrices
    r(table)           matrix containing the coefficients with their standard errors, test statistics, p-values, and confidence intervals
Note that results stored in r() are updated when the command is replayed and will be replaced when any r-class command is run after the estimation command.
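A short sketch of how these results might be accessed after estimation follows. The matrix name T is a hypothetical label; r(table) is copied immediately after estimation because, as noted above, later r-class commands would replace it.

```stata
* Sketch: retrieve stored results after fitting a model
webuse wpi1, clear
arima wpi, arima(1,1,1)

* Copy r(table) right away, before another r-class command overwrites it
matrix T = r(table)

* Scalars and macros from e()
display "log likelihood = " e(ll)
display "max AR lag = " e(ar_max) ", max MA lag = " e(ma_max)
display "command as typed: `e(cmdline)'"

* Coefficient vector and the saved coefficient table
matrix list e(b)
matrix list T

* Full listing of everything stored in e()
ereturn list
```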

References
Ansley, C. F., and R. J. Kohn. 1985. Estimation, filtering, and smoothing in state space models with incompletely specified initial conditions. Annals of Statistics 13: 1286-1316.
Ansley, C. F., and P. Newbold. 1980. Finite sample properties of estimators for autoregressive moving average models. Journal of Econometrics 13: 159-183.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Harvey, A. C. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
------. 1993. Time Series Models. 2nd ed. Cambridge, MA: MIT Press.