Multiple Time Series Analysis - Science topic

Questions related to Multiple Time Series Analysis
  • asked a question related to Multiple Time Series Analysis
Question
4 answers
I'm using multiple time series measured daily from 2015 to 2021, but records for some days are missing across all series. How can I impute the records for the missing days?
Relevant answer
Answer
I disagree with Dinesh Kumar. Under the assumption of missing completely at random (MCAR) or missing at random (MAR) data, full information maximum likelihood (FIML) or multiple imputation may be used; under missing not at random (MNAR) data, these techniques can lead to bias. Also, mean imputation and LOCF (last observation carried forward) are no longer recommended in the missing data literature (e.g., Enders, 2022): mean imputation deflates the variance, and LOCF can obviously introduce bias in longitudinal data.
Enders, C. K. (2022). Applied missing data analysis. Guilford Publications.
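For concreteness, a minimal sketch of the multiple-imputation route in R with the mice package, assuming the daily series are columns of a data frame df (one row per day, NA on the missing days; all names here are hypothetical):

# Multiple imputation by chained equations; each series helps impute the others.
library(mice)
imp <- mice(df, m = 5, method = "pmm", seed = 123, printFlag = FALSE)
df_complete <- complete(imp, 1)  # first of the five completed datasets

# For time series it can help to add lagged copies of each column as auxiliary
# predictors before calling mice(), so imputations respect temporal structure
# (an extra assumption, not part of the answer above).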
  • asked a question related to Multiple Time Series Analysis
Question
8 answers
I am using the Mann-Kendall test and Sen's slope to assess trends in monthly rainfall datasets spanning 64 years, e.g., Jan 1957, Jan 1958, ..., Jan 2020. Since the region is a semi-arid one, there are a lot of zero values (NOT missing values) in the time series. For example, the January rainfall series has only 15 non-zero values out of 64 data points. My question is: how will this affect the trend test (Mann-Kendall) and the trend slope (Theil-Sen)?
Relevant answer
Answer
My suggestion is to delete the months that have zero data and analyze the remaining data (the 15 non-zero values).
These tests are sensitive to zeros, which strongly affect the results.
You need at least 10 observations for the normal approximation to be appropriate.
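For concreteness, a minimal sketch in R with the trend package, assuming jan is the numeric vector of the 64 January rainfall totals, zeros included (the name is hypothetical):

library(trend)
mk.test(jan)           # Mann-Kendall test; the many zeros create ties, which
                       # the tie-corrected variance only partly accounts for
sens.slope(jan)        # Theil-Sen slope; with >50% zeros the median pairwise
                       # slope can collapse to exactly 0
mk.test(jan[jan > 0])  # compare with the non-zero subset to see how much
                       # the zeros drive the result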
  • asked a question related to Multiple Time Series Analysis
Question
4 answers
TLDR: How many variables can I have in a VAR or VECM model?
I am writing my thesis and I am using a VECM (VAR model with error correction for cointegration) model for analyzing the relationship between the prices of an energy exchange and some other factors. So far I have 4 variables in my model and I am thinking of adding more.
My question is: after how many variables does the model become unusable and unstable, or can I add as many as I like?
Thank you for your answers in advance!
Relevant answer
Answer
If you are not constrained by degrees of freedom, you should be guided by theory. Include all the relevant variables suggested by theory; that constitutes the minimum set of variables you should include. Should you add more variables after that? That depends: what is the justification for those additional variables? You should be able to justify each added variable, and you should avoid a "kitchen sink" approach.
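As a quick way to check the degrees-of-freedom constraint in practice, a hedged sketch in R with the vars package, where Y is a T x K matrix of your series (names hypothetical):

library(vars)
VARselect(Y, lag.max = 8, type = "const")  # AIC/BIC/HQ lag selection
fit <- VAR(Y, p = 2, type = "const")
# Each of the K equations estimates K*p + 1 coefficients, so with K variables
# and p lags you need T to be comfortably larger than K*p + 1.
roots(fit)  # all moduli < 1 => the estimated VAR is stable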
  • asked a question related to Multiple Time Series Analysis
Question
17 answers
Dear Colleagues,
I estimated the OLS models and checked them with several tests; however, the instability in CUSUMSQ persists, as shown in the photo. What should I do in this case?
Best
Ibrahim
Relevant answer
Answer
I presume that your data is quarterly or monthly as otherwise, you have too few observations to make any reasonable inferences.
If you are trying to make causal inferences (e.g., you have an economic model that implies that x causes y and you wish to measure that effect), the CUSUMSQ is one test that indicates that your model is not stable: either the coefficients or the variance of the residuals is not stable. You have indicated that there is no heteroskedasticity, so it is possible that the model coefficients are the problem. The test itself only indicates that there is instability; it does not say what the instability is or what causes it. There are many possible causes of instability (omitted variables, functional form, heteroskedasticity, autocorrelation, varying coefficients, etc.). Your best procedure is to return to your economics and work out how your theory might lead to stability problems. Are there possible breaks in your data caused by policy changes, strikes, technological innovations, and the like, that might be captured with a dummy variable or a step dummy?
If you are doing forecasting (or projections) I would not be too concerned about specification tests. It is very unlikely that an unstable model will forecast well. You may achieve good forecasting results with a very simple model that need not be fully theory compliant.
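For the diagnostic side, a hedged sketch in R with the strucchange package; note it implements OLS/recursive CUSUM-type tests rather than CUSUMSQ itself, and y ~ x1 + x2 with data frame df stands in for your model:

library(strucchange)
cus <- efp(y ~ x1 + x2, data = df, type = "OLS-CUSUM")
plot(cus); sctest(cus)                    # fluctuation test for instability

bp <- breakpoints(y ~ x1 + x2, data = df) # date the candidate breaks
summary(bp)
# A located break can then be handled with a step dummy, e.g.
# df$step <- as.numeric(seq_len(nrow(df)) > bp$breakpoints[1])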
  • asked a question related to Multiple Time Series Analysis
Question
3 answers
Hi,
I am having trouble with a problem, in the field of Optimal Control and the generation of optimal time-series.
Let's consider a system, whose dynamics are represented by dx/dt = f(t,x(t),u(t),p(t)), x and u being respectively the state and control vectors for the system. p is a vector of parameters which have a direct influence on a system's dynamics.
An example illustrating this would be considering a drone, going from point A to point B, in minimum time, but subject to a windy environment (the wind being represented by the time-dependent variable p(t)).
I have generated, by solving an Optimal Control Problem, optimal time-series for x(t) and u(t), for several values of p=p(t)=constant.
I would now like to interpolate, for any given value of p(t) at time t, the "nearly-optimal" control u(t) to be applied to the system between time t and time t+1, based on the OCP results previously computed.
Would you know if this is even possible? I have not really been able to find published work on this topic; if you have any suggestions, I would be grateful.
Thanks,
Relevant answer
Answer
Hi,
Thanks for your reply. Sadly, the disturbance does not appear linearly in the state equation; it directly influences the system dynamics.
  • asked a question related to Multiple Time Series Analysis
Question
19 answers
Dear Colleagues,
If I have 10 variables in my dataset (time series), of which 9 are explanatory and 1 is dependent, and if I establish that all the variables are non-stationary, should I take the first difference of the dependent variable as well?
Best
Ibrahim
Relevant answer
Answer
Econometric models estimated with non-stationary data are profoundly invalid and misleading (Greene, 2002). Consider a simple scenario: in a regression with one regressor, there are three variables that could be stationary or non-stationary, namely the dependent variable (Y), the regressor (X), and the disturbance term (u). A suitable econometric treatment of such a model depends critically on the pattern of stationarity and non-stationarity of these three variables (including the dependent variable). Since variables are quite often non-stationary in levels, it is important to understand the forces behind such non-stationarity, which largely include structural breaks, deterministic trends, and stochastic trends. Differencing (including the explained variable, as in your case) is a common treatment of non-stationary models, and it is often correct (Granger & Newbold, 1974; Greene, 2002; Stock & Watson, 2011).
Granger, C. W. J., and Paul Newbold. 1974. Spurious Regressions in Econometrics. Journal of Econometrics, 2(2):111-120.
Greene, W. 2002. Time Series Models (pp. 608-662). In Econometric Analysis, 5th edition. Prentice Hall, Upper Saddle River, NJ.
Stock, James H., and Mark Watson. 2011. Introduction to Econometrics. 3rd ed. Boston: Pearson Education/Addison Wesley.
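A minimal sketch of the usual workflow in R, where y and x are hypothetical ts objects:

library(tseries)
adf.test(y)        # H0: unit root; a large p-value suggests treating y as I(1)
adf.test(diff(y))  # stationary after first differencing => y is I(1)

# If all variables are I(1), either difference all of them consistently...
dy <- diff(y); dx <- diff(x)
# ...or first test for cointegration and, if it is present, keep the levels
# in an error-correction model rather than differencing the long run away.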
  • asked a question related to Multiple Time Series Analysis
Question
7 answers
Dear colleagues,
I am capable of estimating linear relationships between X and Y variables via OLS or 2SLS (in EViews, for example); however, I also need to learn how to estimate/model non-linear relationships. If you know any source that explains this in simple language for time series, your recommendations are most welcome. Thank you in advance.
Best
Ibrahim
Relevant answer
Answer
Dear Ibrahim,
I also recommend Greg N. Gregoriou and Razvan Pascalau, @Hamid Muili, and a Google search for more material on non-linear relationships.
Regards
  • asked a question related to Multiple Time Series Analysis
Question
1 answer
Hi,
When estimating the DCC-GARCH in Stata, pairwise quasi-correlations are given at the end of the output. What do they mean in practice? Are they the mean values of the dynamic correlations, or something else?
Much appreciated if anybody could clarify this.
Kind regards
Thushara
  • asked a question related to Multiple Time Series Analysis
Question
2 answers
Hi
I've estimated a DCC-GARCH(1,1) model using Stata. At the end of the Stata output, a correlation matrix is given, which is also called the quasi-correlation matrix. Is it the conditional correlation matrix or a different one? If so, is it the average/mean value of the dynamic conditional correlations?
Much appreciated if anybody clarifies this.
(I've herewith attached the output)
Kind regards
Thushara
Relevant answer
Answer
Hi. The answer is in the Stata documentation: "When Qt is stationary, the R matrix in (1) is a weighted average of the unconditional covariance matrix of the standardized residuals et, denoted by R̄, and the unconditional mean of Qt, denoted by Q̄. Because R̄ is not equal to Q̄, as shown by Aielli (2009), R is neither the unconditional correlation matrix nor the unconditional mean of Qt. For this reason, the parameters in R are known as quasicorrelations; see Aielli (2009) and Engle (2009) for discussions." Type "DCC-GARCH Stata quasi correlation matrix" into your favorite search engine and you will find it on page 5. I used Google.
  • asked a question related to Multiple Time Series Analysis
Question
5 answers
What's the best open-source (i.e., free) approach/library/tool for unsupervised/semi-supervised (i.e., with limited to no training data) time-series anomaly detection? Example data: https://github.com/numenta/nupic/blob/master/src/nupic/datafiles/extra/nycTaxi/nycTaxi.csv
Relevant answer
Answer
R and Python
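To make that concrete, a minimal R baseline with the forecast package, assuming y holds the series (e.g., the taxi counts from the linked CSV) as a ts object; tsoutliers/tsclean are simple residual-based detectors rather than a full anomaly-detection framework:

library(forecast)
out <- tsoutliers(y)   # flags points that deviate strongly from a smooth fit
out$index              # positions of candidate anomalies
out$replacements       # suggested replacement values
y_clean <- tsclean(y)  # series with outliers replaced and missing values filled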
  • asked a question related to Multiple Time Series Analysis
Question
6 answers
Hi,
In a DCC-GARCH(1,1) model (dependent variable is first difference of logarithm of the series) based on monthly data,
1. How do you interpret unconditional and conditional correlation in a DCC-GARCH model?
2. Is it possible to get a correlation matrix for the conditional correlations (like the unconditional correlation matrix, rather than a correlation for each month and pair)? Or do we just need to present the data using a conditional variance graph for each pair?
Much appreciated your comments/advice on this.
Kind regards
Thushara
  • asked a question related to Multiple Time Series Analysis
Question
5 answers
My clinical study measured blood biomarkers (glucose, insulin, glucagon, GLP-1, GIP, amino acids, etc.) at baseline before an intervention meal and at multiple time points after the meal.
We did this three times, using three different intervention meals on three different days. My main objective is to compare whether there is any difference in the change in blood biomarkers after the different intervention meals.
There are several AUC calculation methods, such as total AUC, incremental AUC (ignoring the area under baseline) and net incremental AUC (subtracting the area under baseline). How do I determine which one to use, and what is the rationale?
Relevant answer
Answer
iAUC (incremental AUC) is the most recommended and approved method; follow the iAUC.
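For reference, the three variants differ only in how the baseline is handled. A minimal base-R sketch for one subject and one biomarker; t, conc and the trapezoid helper are hypothetical, and the iAUC line is a simple approximation of the "ignore area under baseline" rule:

trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

t    <- c(0, 30, 60, 90, 120)       # minutes after the meal
conc <- c(5.0, 7.8, 7.1, 5.9, 4.6)  # biomarker values; conc[1] is baseline
base <- conc[1]

auc_total <- trapz(t, conc)                  # total AUC
auc_net   <- trapz(t, conc - base)           # net incremental AUC
auc_inc   <- trapz(t, pmax(conc - base, 0))  # iAUC, area below baseline ignored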
  • asked a question related to Multiple Time Series Analysis
Question
3 answers
I am modelling the volatility of international tourist arrivals from several source markets, using mainly ARIMA-GARCH/ARIMA-GJR models and SARIMA-GARCH/SARIMA-GJR models. The initial estimates suggest that the error terms of some models are not normally distributed, even though the estimation assumed normality. In that case I used Bollerslev-Wooldridge standard errors, as they are said to be better than ordinary standard errors. Since some models do not have normally distributed errors, I re-estimated all the models assuming a Student-t distribution, as is recommended when the error term is non-normal. However, Bollerslev-Wooldridge standard errors are not available with the Student-t distribution (in EViews 10); Huber-White standard errors are available instead. I am wondering whether these are better than Bollerslev-Wooldridge standard errors or produce approximately similar outcomes. Any advice is much appreciated!
Relevant answer
Answer
Hello S.C Thushara,
There is no magic formula for making the errors normal and identically distributed; in fact, no model will give you errors that are exactly white noise. What you must ensure is that the residuals show no autocorrelation, which is checked with a portmanteau test on the lags, and that the squared residuals show no remaining ARCH effects, using the Ljung-Box test. I recommend using distributions that allow for skewness and kurtosis to model the innovations of conditional volatility models; for that you should migrate from EViews to R or Matlab.
Regards,
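Following that advice, a minimal sketch in R with the rugarch package: a GJR-GARCH(1,1) with skewed Student-t innovations, with robust standard errors reported alongside the ordinary ones (r is a hypothetical return series):

library(rugarch)
spec <- ugarchspec(
  variance.model = list(model = "gjrGARCH", garchOrder = c(1, 1)),
  mean.model     = list(armaOrder = c(1, 0)),
  distribution.model = "sstd"   # skewed t: allows skewness and fat tails
)
fit <- ugarchfit(spec, data = r)
show(fit)  # the output includes both standard and robust standard errors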
  • asked a question related to Multiple Time Series Analysis
Question
3 answers
Hi,
can anyone recommend some literature and/or software for multi-level non-hierarchical dynamic factor models?
Relevant answer
Answer
Have you created an account on Kaggle? There might be something on there.
  • asked a question related to Multiple Time Series Analysis
Question
3 answers
I am trying to compare two time series and am assessing different methodologies for characterizing their relationship.
If you have used the Granger test, would you be willing to share some literature on the topic, please?
Relevant answer
Answer
Hi,
Granger causality is mainly used in economics and finance research, but it can be used in other disciplines as well; therefore, most of the literature is in those two fields. Since you have only two time series, you can go for pairwise Granger causality tests to assess the nature of causality, i.e., whether the causality is unidirectional or bidirectional, if causality exists between the two variables.
Please see below for some articles in your field.
Good luck with your research.
Kind regards
Thushara
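A minimal sketch of the pairwise test in R, where x and y are hypothetical stationary series in a data frame df:

library(lmtest)
grangertest(y ~ x, order = 2, data = df)  # H0: x does not Granger-cause y
grangertest(x ~ y, order = 2, data = df)  # the reverse direction
# Rejecting in one direction only suggests unidirectional causality;
# rejecting in both suggests bidirectional causality.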
  • asked a question related to Multiple Time Series Analysis
Question
1 answer
Hi!
We are trying to estimate body mass (W) heritability and cross-sex genetic correlation using MCMCglmm. Our data matrix consists of three columns: ID, sex, and W. Body mass data is NOT normally distributed.
Following previous advice, we first separated weight data into two columns, WF and WM. WF listed weight data for female specimens and “NA” for males, and vice-versa in the WM column. We used the following prior and model combination:
prior1 <- list(R=list(V=diag(2)/2, nu=2), G=list(G1=list(V=diag(2)/2, nu=2)))
modelmulti <- MCMCglmm(cbind(WF,WM)~trait-1, random=~us(trait):animal, rcov=~us(trait):units, prior=prior1, pedigree=Ped, data=Data1, nitt=100000, burnin=10000, thin=10)
The resulting posterior means were suspiciously low (e.g., 0.00002). We calculated heritability values anyway, using the following:
herit1 <- modelmulti$VCV[,'traitWF:traitWF.animal']/
  (modelmulti$VCV[,'traitWF:traitWF.animal'] + modelmulti$VCV[,'traitWF:traitWF.units'])
herit2 <- modelmulti$VCV[,'traitWM:traitWM.animal']/
  (modelmulti$VCV[,'traitWM:traitWM.animal'] + modelmulti$VCV[,'traitWM:traitWM.units'])
corr.gen <- modelmulti$VCV[,'traitWF:traitWM.animal']/
  sqrt(modelmulti$VCV[,'traitWF:traitWF.animal'] * modelmulti$VCV[,'traitWM:traitWM.animal'])
We get heritability estimates of about 50%, which is reasonable, but correlation estimates were extremely low, about 0.04%.
Suspecting the model was wrong, we used the original dataset with all weight data in a single column and tried the following model:
prior2 <- list(R=list(V=1, nu=0.02), G=list(G1=list(V=1, nu=1, alpha.mu=0, alpha.V=1000)))
model <- MCMCglmm(W~sex, random=~us(sex):animal, rcov=~us(sex):units, prior=prior2, pedigree=Ped, data=Data1, nitt=100000, burnin=10000, thin=10)
The model runs, but it refuses to calculate “herit” values, with the error message “subscript out of bounds”. We’d also add that in this case, the posterior density graph for sex2:sex.animal is not shaped like a bell.
What are we doing wrong? Are we even using the correct models?
Eva and Simona
Relevant answer
Answer
See our published paper on the topic: Cross-sex genetic correlation does not extend to sexual size dimorphism in spiders
  • asked a question related to Multiple Time Series Analysis
Question
5 answers
Hey, dears!
I am looking for a mathematical model with chaotic bursting outside of neural dynamics, and have not had success.
In particular, I am interested in whether (using couplings) it is possible to force the Lorenz or Rössler systems, for example, to exhibit this behavior.
I would be glad to receive suggestions of articles or approaches to this.
Thank you.
Regards,
Relevant answer
Answer
This article may be useful:
Chaotic burst in the dynamics of f(z) = λ sinh(z)/z. https://doi.org/10.1070/RD2005v010n01ABEH000301
  • asked a question related to Multiple Time Series Analysis
Question
2 answers
I want to look at the temporal variability of community composition.
I've been reading about methods such as redundancy analysis on principal coordinates of neighbour matrices (RDA-PCNM) and asymmetric eigenvector maps (AEM) (Borcard & Legendre 2002, Legendre 2014).
Ultimately I want to:
1) plot a graph with X axis =  time, Y axis = some 'univariate measure' of composition (I've seen Jaccard, or RDA x-axis score etc) 
2) Calculate a 'univariate measure' of the temporal variability of composition (i.e a multivariate measure of Coefficient of Variation - CV) to plot against other x axis such as diversity etc.. 
My questions are 
1) If you use RDA-PCNM or AEM, i.e., the Borcard & Legendre 2002 / Legendre 2014 methods, what is the need for conducting the PCNM first? Why can't you just use the RDA scores based on the original data?
2) An output of the RDA-PCNM can be the RDA x-axis score plotted over time. But this still only shows one dimension of the variability, so you still need to plot the RDA y-axis score too.
Isn't there some way to create a single measure that incorporates the multidimensionality of possible composition changes, such as the Euclidean distance between Time 0 and Time t for each year (perhaps from a PCNM), plotted as the y-score?
What is the advantage of the RDA-PCNM or AEM methods over the Euclidean distance method?
3) To calculate a compositional measure of temporal variability, can I use the Euclidean distances as above and then calculate the CV of these distances?
Thanks for your suggestions.
Data structure: 
12 time points (not all sites sampled in all years)
environmental variable = habitat type
3 habitat types with a gradient of vegetation cover from A-C
3 replicate sites in each habitat
multivariate response variable = abundance data across multiple species (community)
Relevant answer
Answer
Many researchers regard PCNM as a suitable technique for transforming a truncated distance matrix into the rectangular data suitable for constrained ordinations such as RDA. If instead you feed the truncated distance matrix directly into a constrained ordination, you are treating truncated data as a regular distance matrix, which it is not.
Once the distances are in rectangular form, they behave like normal explanatory variables in, e.g., constrained ordination (rda and cca). Therefore, I have directly used the RDA scores of the temporal gradient (please see the attached file).
2) "An output of the RDA-PCNM can be the RDA x axis score plotted over time. But ……"
If you are concerned only with the temporal variation, why would you need to analyse the variation along other axes (say, along a spatial gradient)? In that case, I would simply go for partial constrained ordination (say, pRDA) and treat all the non-temporal gradients as covariates. In my opinion, it is then safer to use the RDA scores of only the temporal gradient.
(I am not an expert in such analyses, but I have been practicing these techniques for a while.)
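A hedged sketch of that workflow in R with vegan, where comm is the site-by-species matrix and yr the sampling times (both names hypothetical):

library(vegan)
pcv <- as.data.frame(pcnm(dist(yr))$vectors)  # temporal eigenfunctions
mod <- rda(comm ~ ., data = pcv)
anova(mod)                                    # permutation test of the model
sit <- scores(mod, display = "sites", choices = 1)
plot(yr, sit[, 1], type = "b", ylab = "RDA axis 1 site score")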
  • asked a question related to Multiple Time Series Analysis
Question
8 answers
Using the Engle-Granger method, I found a cointegrating relationship between my variables. I then estimated the long-run and error-correction models. Which model should I use to check the main assumptions like normality, heteroskedasticity, etc.?
Relevant answer
Answer
Thank you, my good friend but don't work too hard!
Regards,
Prof. Arize
  • asked a question related to Multiple Time Series Analysis
Question
4 answers
Hi all,
I just wanted to ask: if I test how IVs A, B, C, D, E and F predict dependent variable Y, given that:
A is point in time (before or after manipulation).
B is the group to which participants belong (control or experiment).
C and D are two measures of well being. 
E and F are sex and age.
Y is measure for resilience.
Now, here is where it becomes a bit complex. I assume that after the manipulation there will be an increase in scores on C and D for the experimental group but not for the control group. Also, C and D scores will positively predict Y scores for both the control and experimental groups. Importantly, I assume that C and D will predict the same or a higher share of the variance in Y scores, and that Y scores will therefore be higher for the experimental group after the manipulation (with no change in the control). There are no special predictions for sex and age; they are very much covariates.
So, what analysis should I use?
Relevant answer
Answer
Time series analysis using software such as SPSS is helpful in this case.
My suggestion is to predict the value of Y using only the C and D variables. (Parameter A can be defined as the time parameter, for example: Y(t) = C(t) + D(t).)
You need to run the prediction for the control group and the experimental group separately.
The same goes for parameters E and F. For E you have only MALE and FEMALE, so it is easy to deal with; for age, depending on the variety of ages, you can divide them into groups, for example 0-10, 10-20, and so on.
Thus you can extract a lot of information from your data. Don't limit yourself to one global formula predicting Y from all the mentioned parameters.
Good luck
  • asked a question related to Multiple Time Series Analysis
Question
10 answers
I would like to find the long-run relationship between domestic and international prices. I have run the Johansen cointegration test on levels with 2 and 6 lags, but I got mixed results. With 2 lags the trace test confirms one or more cointegrating vectors, but the max-eigenvalue test does not confirm cointegration. However, with the 6 lags suggested by the AIC and HQ information criteria, there is no cointegration at all. How should I proceed?
Relevant answer
Answer
Yes, I have dealt with this problem before. Generally speaking, we convert all prices into log form and then conduct unit root and cointegration tests; however, it may not be wrong to use the level form as well. I am sharing two of my papers below, in case they are of any help to you:
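A minimal sketch of the sensitivity check in R with the urca package, assuming P is a matrix of the log prices (domestic and international):

library(urca)
for (K in c(2, 6)) {
  jo <- ca.jo(P, type = "trace", ecdet = "const", K = K)
  print(summary(jo))  # compare trace statistics with the critical values
}
# Also run type = "eigen" (the max-eigenvalue test); conclusions that survive
# both tests and several reasonable lag choices are the ones to trust.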
  • asked a question related to Multiple Time Series Analysis
Question
3 answers
Studying time to seed germination under several temperatures, we may consider a germination box, for example, as an experimental unit. In this individual space, I sowed 100 seeds of one species at the same time. These seeds start imbibition immediately, but the chemical reactions inside each seed depend on its physiological quality. Thus, we have germination events at t1, t2, ..., tn inside this germination box (the experimental unit). In the experiment, we can have j experimental units for each of k treatments. The question is: may the researcher analyze this data set using a repeated-measures routine over the experimental time?
Relevant answer
Answer
Hi José,
As Spyridon said, you cannot consider this repeated measures. If you have only 100 seeds for each species, you should make 4 replicates of 25 seeds each and use the mean; your result will be more robust. Pay attention to the differences between the reps: under ISTA rules, you have to check the difference between your mean germination and the max and min, and if it is too high you cannot, statistically, use the mean and have to begin again. See this link for more details.
Fabienne
  • asked a question related to Multiple Time Series Analysis
Question
4 answers
Does high-frequency data require an appropriate way of treating it?
If yes, what is the appropriate methodology?
Relevant answer
Answer
Have you heard about convergent cross mapping (CCM) [1]? If your data are purely stochastic, it might not give you any result; but some data may appear stochastic while the data-generating process is actually deterministic and dynamic. In that case Granger causality is not reliable, and CCM is a really good solution.
1. Sugihara, G., May, R., Ye, H., Hsieh, C. H., Deyle, E., Fogarty, M., & Munch, S. (2012). Detecting causality in complex ecosystems. Science, 338(6106), 496-500.
  • asked a question related to Multiple Time Series Analysis
Question
4 answers
I'm working with a multivariate time series composed of observations representing driving style, collected every 0.1 s using a sensor-fusion approach (mobile phone). The features are AccX, AccY, AccZ, GyroX, GyroY, GyroZ, and Speed. I'm trying different methods to segment the series into segments representing meaningful driving events (accelerations, braking, steering). My first approach has been linear segmentation of the individual time series, but I would prefer a multivariate approach.
Relevant answer
Answer
I agree with Hugh Kenedy. Try to implement his suggestions.
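Since that suggestion is not quoted in this thread, here is a hedged sketch of one common alternative: per-channel change-point detection with the changepoint package (acc_x is a hypothetical single channel, e.g. longitudinal acceleration):

library(changepoint)
cp <- cpt.meanvar(acc_x, method = "PELT", penalty = "MBIC")
cpts(cp)  # indices where mean/variance shift: candidate event boundaries
plot(cp)  # detected segments overlaid on the signal
# Running this per channel and merging nearby boundaries gives a crude
# multivariate segmentation into acceleration/braking/steering events.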
  • asked a question related to Multiple Time Series Analysis
Question
4 answers
Let's say that in a restaurant a chair is occupied by a customer. He can sit there for as long as he wants, depending on various factors like the ambience of the surroundings, the quality of the food, the friendliness of the staff, etc. The time duration is split into blocks of 15 min, i.e., every 15 min a researcher observes whether he is still sitting there or has left. The customer is assigned a dichotomous value of 0 if he leaves and 1 if he continues to occupy the chair.
Time          Customer Sitting         No. of servings/Food quality
7:00 pm                1                                            5/Good
7:15 pm                1                                            5/Good
7:30 pm                1                                            5/Good
7:45 pm                1                                            5/Good
8:00 pm                1                                            5/Good
8:15 pm                0
8:30 pm                1                                            3/Average
8:45 pm                1                                           3/Average
9:00 pm                0
9:15 pm                1                                         2/Poor
9:30pm                 0
In the above example a customer arrives at the restaurant and sits there at 7:00 pm and remains there till 8:00 pm. After that he leaves and another customer occupies that chair and continues till 9:00 pm. Second customer leaves after that and third one arrives at 9:15pm and so on and so forth. 
In this illustration the occupancy of the chair by a customer is the dependent variable taking values 1 or 0, and food quality/no. of servings would be  independent variables. 
I want to ask whether logit and probit regression can be applied to such a problem. Is it a violation of the independence assumption for the dependent variable that the value following a "0" has to be "1"? Can logit and probit regression be applied with some modifications, and if so, what are they? Can logit/probit regression be applied to time series data like this without any loss of generality?
Thanks in Advance
Naseem
Relevant answer
Answer
It accounts for dependencies through the within-panel correlation structure that you specify. The following correlation structures can be specified in Stata: exchangeable, independent, unstructured, autoregressive of order #, stationary of order #, nonstationary of order #, and a user-defined option.
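An equivalent hedged sketch in R with geepack, assuming long-format data with one row per 15-min observation: sitting (0/1), servings, and a spell id for each customer occupancy (all names hypothetical):

library(geepack)
fit <- geeglm(sitting ~ servings, id = spell, data = df,
              family = binomial("logit"), corstr = "ar1")
summary(fit)  # robust (sandwich) SEs account for within-spell correlation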
  • asked a question related to Multiple Time Series Analysis
Question
3 answers
Please find my dataset and forecast outputs attached.
A) The first sheet contains March 2011 to February 2014 data and forecasts for March 2014 to February 2015 using the ARIMA, Winters, TBATS and BATS methods. It also has the forecast errors obtained by comparing against the actual output.
B) The second sheet has forecasts for June 2015 to February 2016 using the above-mentioned methods.
C) R code.
As can be seen, the TBATS method gave the 2014-15 output with the least error, but there is no trend or seasonality (constant values) in the TBATS output for 2015-16, which is hard to believe.
The BATS method gave the most erroneous output (constant values) for 2014-15, but its forecast for 2015-16 seems reasonable.
I am confused about which method to go for. Should I opt for some other technique considering my data? Or am I missing something?
Relevant answer
Answer
FORECAST HORIZON: You have a data series that runs from March 2011 to March 2014, and the proposed forecast is from March 2014 to February 2015, i.e., 12 months ahead. This is problematic: the forecast horizon is too long relative to the data. With monthly data, projecting 12 months ahead can make the forecast error very large; you should shorten the forecast horizon to a reasonable length.
ARIMA MODEL: Consider the components of the autoregressive integrated moving average (ARIMA) model. AR = autoregressive; I = integration, capturing the exogenous shocks that destroyed the original mean reversion (you must verify the order of integration); MA = moving average, for which you need to verify that the error is N(0,1). ARIMA is the combination AR + I + MA; verify each part during model selection.
REFERENCE: See the Box-Jenkins materials.
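One hedged way to choose among the methods is to hold out the last months and compare out-of-sample accuracy rather than trusting a single fit. A minimal sketch with the forecast package in R, where sales is a hypothetical numeric vector of the monthly data:

library(forecast)
y     <- ts(sales, frequency = 12, start = c(2011, 3))
train <- window(y, end = c(2014, 2))
test  <- window(y, start = c(2014, 3))

f_arima <- forecast(auto.arima(train), h = length(test))
f_tbats <- forecast(tbats(train),      h = length(test))
f_ets   <- forecast(ets(train),        h = length(test))  # Winters-type

accuracy(f_arima, test); accuracy(f_tbats, test); accuracy(f_ets, test)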
  • asked a question related to Multiple Time Series Analysis
Question
6 answers
Hi all,
Is there any multivariate time series classification problem in which some variables are categorical? Most approaches in this domain assume the time series have numerical observations, and I am interested in the case where some observations are categorical.
For example, network flow data consist of packets transferred between IP pairs. Each flow can be labeled by its application, such as BitTorrent, Skype, etc. Each flow is a series of packets for which the size, direction and payload are known. Direction is either upstream or downstream in this particular example; although it can be represented as a binary variable, the nature of the variable is categorical.
Please, let me know if you have such datasets. Thanks in advance.
Relevant answer
Answer
If anybody requires the type of dataset mentioned in this post, you can download it through my website. The network flow dataset in this repository stores information about the packets sent by applications such as Skype, BitTorrent and so on. The aim is to predict the application type based on the set of packets in the flow.
The time series dataset stores information about the direction of each packet at a given time, the time between packets, the size of the packet and the size of the payload. This information was collected at Bogazici University by Subakan et al. (Y. C. Subakan, B. Kurt, A. T. Cemgil, and B. Sankur. Probabilistic sequence clustering with spectral learning. Digital Signal Processing, 29(0):1-19, 2014.).
  • asked a question related to Multiple Time Series Analysis
Question
3 answers
Hello,
I am doing multilevel modelling where my dependent variable is income and my independent variables are age, type of job (categorical, fixed effect) and sex (categorical, fixed effect).
I have been modelling in MLwiN and everything has gone pretty well. I fitted a quadratic function at level 2 and everything was fine.
My confusion arose when I modelled the level 1 variance. I modelled the variance and there is clear heteroscedasticity; both coefficients are significant. But in that process my level 2 variance becomes non-significant and my standard error increases a lot.
My only explanation is that the large majority of the variance is within districts (level 1) rather than between districts (level 2), so there is a lot of confounded variance across levels, but the majority is at level 1.
Any suggestions for this? 
Thank you very much!
Relevant answer
Answer
Another plausible explanation is that you increased the estimate of the total variance, resulting in higher standard errors. I would recommend evaluating both models (with and without the level 2 variance) on fit as well, making use of AIC or BIC.
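A hedged sketch of that comparison in R with lme4 (an alternative to MLwiN; income, age, job, sex and district are placeholders for your variables):

library(lme4)
m1 <- lmer(income ~ age + job + sex + (1 | district), data = df, REML = FALSE)
m2 <- lm(income ~ age + job + sex, data = df)  # drops the level 2 variance
AIC(m1, m2); BIC(m1, m2)
anova(m1, m2)  # likelihood-ratio comparison of the nested models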
  • asked a question related to Multiple Time Series Analysis
Question
2 answers
The main idea is to use a multivariate time series (as observations) to predict a state variable (one dimension).
Please find the attachments.
For example, the time series mm (4 variables and 200 observations) was used to learn V and W of a DLM. I have two questions in this regard:
1) I assumed that the dimensions of the DLM should be as follows, based on the matrix operations:
FF = R^(4×1)
GG = R^(1×1)
V = R^(4×1) (because the dimensions of FF×Θ and V should be the same)
W = R^(1×1)
m0 = R^(1×1)
C0 = R^(1×1)
Therefore, the "V vector.R" code was developed, but an error was displayed:
Error in dlm(FF = matrix(1, N, 1), GG = 1, V = matrix(exp(parm[1:4]), : Incompatible dimensions of matrices
Debug result:
m <- nrow(x$FF)
p <- ncol(x$FF)
if (!is.numeric(x$V))
stop("Component V must be numeric")
if (!(nrow(x$V) == m && ncol(x$V) == m))
stop("Incompatible dimensions of matrices")
Why should V be R^(4×4)?
2) The "V matrix.R" code was developed, but the following error was displayed:
Error in dlm(FF = matrix(1, N, 1), GG = 1, V = matrix(parm[1:16], N, N), :
V is not a valid variance matrix
What is the problem?
Relevant answer
Answer
A good reference for this is West and Harrison, Bayesian forecasting and dynamic models, Springer Verlag, 1997. It appears you are using West and Harrison notation or a variant of it. Chapter 9.2 discusses the “multiple regression dlm”.
I think that subchapter will clarify a few things. DLMs are defined by {F,G,V,W}. F is a 4x1 vector of independent variables. G = I (4x4 identity matrix), the system evolution matrix. V is the observation variance. W is the covariance matrix for the state parameter vector (regression coefficients), 4x4.
The variable predicted is the “observation variable”. West & Harrison do not call it a “state variable”. Rather, W&H refer to the current regression coefficients as the state vector or system vector Θ (subject to change over time).
Again assuming W&H notation, the observation equation is
Y_t = F' Θ_t + v_t,    v_t ~ N(0, V)
(1×1) = (4×1)' (4×1) + (1×1)
and the system equation is
Θ_t = G Θ_{t-1} + w_t,    w_t ~ N(0, W)
(4×1) = (4×4)(4×1) + (4×1)
I would code this up yourself rather than relying on the dlm library; the equations are pretty easy (especially with W&H open next to you).
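If you do stay with the dlm package, a minimal sketch of that regression DLM, with mm as the 200 x 4 regressor matrix from the question and y as the observed univariate series (a hypothetical name):

library(dlm)
build <- function(parm) {
  dlmModReg(mm, addInt = FALSE,
            dV = exp(parm[1]),    # observation variance V (1x1)
            dW = exp(parm[2:5]))  # diagonal of W: variances of the 4 betas
}
mle  <- dlmMLE(y, parm = rep(0, 5), build = build)
mod  <- build(mle$par)
filt <- dlmFilter(y, mod)  # filtered states = time-varying coefficients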
  • asked a question related to Multiple Time Series Analysis
Question
11 answers
I have a multivariate time series database and a label for each subset. I intend to learn the parameters of an HMM (Hidden Markov Model) from the data for classification.
At first I selected the label as the state variable, but this way the classification performance of the HMM is not good.
How should I select the state variables given the database?
Relevant answer
Answer
ISSUE: How variable state should be selected in Hidden Markov Model (HMM)?
HMM & SIMPLE MARKOV: In a Markov process the output depends on the state. The output is visible, and thus observable; the state that produces the output is not. Each state has a probability distribution over all possible outputs, so the sequence of outputs generated by an HMM provides information about the sequence of states. Recall that in a simple Markov model the state is directly visible to observers, and the only parameters are the state transition probabilities. In a Hidden Markov Model the state producing the output is not directly visible, i.e., it is latent; only the output that depends on the state is visible.
Suppose that we have a data sequence u = {u_1, u_2, ..., u_T}. In this case there are 53 time series observations, so the last term is u_53. Every u_t is generated by a hidden state S_t. This underlying (latent) state S_t follows a Markov chain: given the present, the future is independent of the past, i.e., the next state depends only on the most recent state and the distant past is of no consequence. Thus:
P(S_{t+1} | S_t, S_{t-1}, ..., S_0) = P(S_{t+1} | S_t)
From S_t to S_{t+1} there is a transition probability, which may be summarized as
a_{ki} = P(S_{t+1} = i | S_t = k)
where k, i = 1, 2, ..., M and M is the total number of states. The initial state distribution is π_k (reads: pi sub-k). The transition probabilities out of each state sum to 1, and the π_k also sum to 1. The probability of a state sequence is
P(S_1, S_2, ..., S_T) = P(S_1) P(S_2 | S_1) P(S_3 | S_2) ... P(S_T | S_{T-1})
which, written with the transition probabilities, is
P(S_1, S_2, ..., S_T) = π_{S_1} a_{S_1 S_2} a_{S_2 S_3} ... a_{S_{T-1} S_T}
Given the state S_t, the observation u_t is independent of all other observations and states.
HOW DOES IT WORK?
Markov process:    X_0 --A--> X_1 --A--> X_2 --A--> ... --A--> X_{T-1}
                    |B         |B         |B                    |B
Observations:      O_0        O_1        O_2        ...        O_{T-1}
What does this mean? A Markov process in state X_0 produces an observation O_0 via a hidden emission process B. Then, moving from X_0 to X_1, there is a transition process A with its own probabilities. The matrix A = {a_{ij}} is N×N, where
a_{ij} = P(state q_j at t+1 | state q_i at t)
and A is row stochastic. The matrix B = {b_j(k)} is N×M, with
b_j(k) = P(observation k at t | state q_j at t).
Thus an HMM is defined by A, B, π and the dimensions N and M. The formal statement of an HMM is
λ = (A, B, π)
Now consider a generic sequence X = (X_0, X_1, X_2, X_3) with corresponding observations O = (O_0, O_1, O_2, O_3). Then π_{X_0} is the probability of starting in state X_0, b_{X_0}(O_0) is the probability of initially observing O_0, and a_{X_0,X_1} is the probability of the transition from X_0 to X_1. The probability P(X) is given by
P(X) = π_{X_0} b_{X_0}(O_0) a_{X_0,X_1} b_{X_1}(O_1) a_{X_1,X_2} b_{X_2}(O_2) a_{X_2,X_3} b_{X_3}(O_3)
In the present case, find P(53) by extending this generic sequence to O_53. Your matrix A is 53 × 53 and matrix B is 53 × 20. Thus:
P(53) = π_{X_0} b_{X_0}(O_0) a_{X_0,X_1} b_{X_1}(O_1) ... a_{X_52,X_53} b_{X_53}(O_53)
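For the estimation itself, a minimal hedged sketch in R with the depmixS4 package, assuming a data frame df with one numeric observation column y (a multivariate extension passes a list of response formulas; names hypothetical):

library(depmixS4)
set.seed(1)
mod  <- depmix(y ~ 1, data = df, nstates = 3, family = gaussian())
fitm <- fit(mod)         # EM estimation of transition, emission and initial probs
summary(fitm)
post <- posterior(fitm)  # most likely hidden state for each time point
head(post$state)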
  • asked a question related to Multiple Time Series Analysis
Question
6 answers
We have collected daily data from three dorms for two months. We want to compare the control dorm (no treatment) with a dorm with one treatment (water-saving ads) and a dorm with two treatments (water-saving ads and eco-feedback shower heads). What is the best SPSS analysis to conduct and why? Are there any papers we can cite for this methodology?
Relevant answer
Answer
You have three independent groups, so you can use one-way analysis of variance (ANOVA) to compare the group means and test whether the differences between the three groups are significant.
If the calculated F is greater than the tabulated F at alpha = 0.05 (Sig. < 0.05), you can say there is a significant difference between the three groups, and you should apply multiple comparisons such as LSD (least significant difference), Tukey HSD or Scheffé to compare each pair of groups. If the calculated F is smaller than the tabulated F at alpha = 0.05 (Sig. >= 0.05), you can say there is no significant difference between the three groups and stop the analysis there. You can easily use SPSS for this analysis.
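For reference, the same steps in R (the thread asks about SPSS; this simply mirrors them, with hypothetical names usage and dorm in a data frame df):

fit <- aov(usage ~ dorm, data = df)
summary(fit)   # F test: any difference among the three dorms?
TukeyHSD(fit)  # pairwise comparisons if the F test rejects
# Caveat: daily observations within a dorm are serially correlated, so the
# independence assumption behind these p-values is strong.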
If the calculated F is greater than tabulated F at alfa=0.05 (Sig. < 0.05) you can say their is a significant difference between the three groups and you should apply Multiple Comparisons as LSD (Least significant difference) or Tukey HSD or Scheffe to comparing between each two groups. But if the calculated F is smaller than tabulated F at alfa=0.05 (Sig. >= 0.05) you can say their is no significant difference between the three groups and you should end the analyzing. You can use SPSS easily for your analysis.