[Scilab-users] GROCER VAR-HMM Example Output Different from Chapter 23

Brian Bouterse bmbouter at gmail.com
Tue Jun 3 12:21:08 CEST 2014


Hi Eric,

Thanks for the quick and helpful reply! I adjusted the first parameter from
'cte' to 'all' and that produces the correct form of output, and the values
look close to the expected values. This is great!

I've read the help documents several times, but I didn't realize the impact
of adjusting the value of the first parameter. The doc example on the
ms_var help page describes the parameter, but the only example on that page
uses 'cte'. There is no example with 'all' on the ms_var() help page. Also
including some sample output may also be good there. I would like to
contribute an example like this to the docs, manual, or help files. I think
that would have caused me to solve my own problems instead of writing long
e-mails.

I also tried using a larger data file 10,000 observations in a column
format, and then impexc2bd transformed the data file correctly. That is
also solved thanks to your suggestion.

I know the code is available for download, but would you ever consider
putting a copy of the GROCER code in a repository on github.com? If it were
in a place like that, I, and others, would have an easier time of
contributing to the codebase instead of e-mailing patches. What do you
think about an idea like this? Just a friendly suggestion based on what has
worked well for me in the past. I can help put it up there, and configure
it if that is helpful.

Thanks again for making this great software, and helping me use it!

Best,
Brian





On Thu, May 29, 2014 at 1:41 PM, Eric Dubois <grocer.toolbox at gmail.com>
wrote:

> Hello Brian
>
> You did not recover the intial parameters because you did not estimate the
> same model as the one you generated.
>
> This is because you used in ms_var the option 'cte' which means that only
> the constant is allowed to switch: and this is why the AR coeffcients are
> only given for 'all regimes'. I f you rather use the option 'all', then all
> parameters are allowed to switch and the result is now consistent with the
> parameters of the generated model, that is the true values are within the
> confidence interval of the estimated ones. This is explained in the help
> command (see help ms_var): may I ask you why you did not infer it from the
> help files?
>
> As for the big file, I suspect it is truncated into Scilab. Can you try
> with data in column instead of in rows?
>
> Éric.
>
>
> 2014-05-29 13:24 GMT+02:00 Brian Bouterse <bmbouter at gmail.com>:
>
> Hi Eric,
>>
>> Thanks again for the response. What you told me to do worked. If I closed
>> Scilab, and rerun the example from my earlier e-mail, I do not see any
>> warnings, which is good. The output I received is not what I expected in
>> several ways, so I'll ask some more questions here. There are a lot of
>> different questions here, but they all relate to the example below, and the
>> use of ms_var and GROCER. Thanks in advance for any insight into the issues
>> I am experiencing.
>>
>> The example:
>>
>> I have a file arhmm_example_280.dat [0] that was generated and loaded
>> from arhmm_example_280.csv [1] using the following commands:
>>
>>     impexc2bd('arhmm_example_280.csv', ';', 'arhmm_example_280.dat')
>>     load('arhmm_example_280.dat')
>>
>> I then run, what I expect to be a single variable, 2-regime,
>>
>>     nb_states=2
>>     switch_var=2
>>     var_opt=3
>>
>>   r=ms_var('cte',3,'ardata',nb_states,switch_var,var_opt,'prt=initial;final','opt_convg=0')
>>
>> I then receive this output:      http://ur1.ca/hej98
>>
>> 1) In the "coefficients" section of the output, the output doesn't seem
>> to contain enough numbers to describe the expected number of outputs. The
>> model has 2 regimes, and it is autoregressive with order 3, so I expect
>> there to be 3 coefficients for EACH regime (regime 1 and regime 2).
>> Instead, I see a single coefficient for Regime 1 (0.9313736) and a single
>> coefficient for Regime 2 (8.2685973). Where are the three autoregressive
>> parameters for regime 1 and the other three autoregressive parameters for
>> regime 2?
>>
>> 2) There is a section in the output called "All Regimes" which is in the
>> "Coefficients" section contains 3 coefficients. Do these correspond to the
>> autoregressive aspect of the model [0.2258037, 0.4757391, -0.3161730). I
>> expected to see a section like this for Regime 1, and one for Regime 2, but
>> I did not. I'm not sure how these 3 coefficients for "All Regimes" fit in.
>> The Markov Switching Model switches the autoregressive coefficients with
>> each Regime, so I'm not sure how to interpret coefficients that are for
>> "both regimes".
>>
>> 3) I believe each regime will have its error term estimated, and I do see
>> that in the output. For example under Regime 2, the variance of the error
>> term is 2.3164941. This is listed in the section "Variance-covariance
>> matrix of residuals" which is a subsection under "Regime 2". Am I
>> interpreting the output correctly? Where is the mean value for the regime?
>>
>> 4)  My overall goal is to validate that I can hand a list of observations
>> to GROCER+scilab, along with the order of the autoregressive model, and the
>> number of regimes, and have it estimate the Markov hidden state transition
>> probabilities, the estimated regime that is in effect at time t, and
>> coefficients for each regime, and the error term that can be used for
>> prediction of the t+1 observation. I don't know of a "well known" data set
>> that allows for this type of validation, so I wrote a simple AR-HMM
>> generator named koa marhn [2], which outputs AR-HMM data that follows a set
>> of parameters. This is the program that generated the 280 point data set in
>> the arhmm_example_280.csv [1] file. You can see the parameters used to
>> generate this file here:
>>
>>     https://github.com/bmbouter/koa_marhn/blob/master/arhmm_example.py
>>
>> A[i][j] is the probability transition from regime i to regime j.
>> C[x][y] is the autoregressive coefficient for regime x used for
>> generation of the t+1 element from the t, t-1, t-2, ..., t-y summation.
>> Each y is a coefficient for regime x.
>> R[q] contains mean and std_deviation values for regime q
>> pi[m] contains the starting state probabilities that the regime is in
>> state m at time t=0
>>
>> The output from http://ur1.ca/hej98 does not correspond at all with the
>> values of these inputs, which means something unexpected is going on! Do
>> you have insight into why scilab doesn't produce these expected numbers?
>>
>> 5)  One possibility is that maybe it is "working", but there isn't enough
>> data for it to estimate the parameters correctly. I had koa marhn [2]
>> output 10,000 observations, but I ran into trouble transforming it from a
>> csv file to a dat file using impexc2bd. My input file is
>> arhmm_example_10000.csv [3]. I tried to convert to .dat using:
>>
>>     impexc2bd('arhmm_example_10000.csv', ';', 'arhmm_example_10000.dat')
>>
>> I receive this output, and the .dat file in NOT created
>>
>>  !--error 10000
>> dates are entered neither in chronlogical order nor in reverseerse
>> chronological order
>> at line      87 of function read_dates called by :
>> at line     188 of function impexc2bd called by :
>> arhmm_example_10000.dat'
>>
>> How do you transform a large .csv file to .dat? Is there some problem
>> with the data file, the way I am making it?
>>
>> [0]:  https://s3.amazonaws.com/scilab_data_files/arhmm_example_280.dat
>> [1]:  https://s3.amazonaws.com/scilab_data_files/arhmm_example_280.csv
>> [2]:  https://github.com/bmbouter/koa_marhn
>> [3]:  https://s3.amazonaws.com/scilab_data_files/arhmm_example_10000.csv
>> [4]:  https://s3.amazonaws.com/scilab_data_files/arhmm_example_1000.csv
>> [5]:  https://s3.amazonaws.com/scilab_data_files/arhmm_example_1000.dat
>>
>> This e-mail is way to long, but I wanted to fully recap the issues I've
>> considered as I use ms_var(). Thanks for any help you can provide. You can
>> also find me in the #scilab channel on the OFTC IRC servers.
>>
>> Thanks,
>> Brian
>>
>>
>>
>> On Fri, May 23, 2014 at 3:48 PM, Eric Dubois <grocer.toolbox at gmail.com>
>> wrote:
>>
>>> Dear Brian
>>>
>>> 1) For me it works fine; I suspect that you have run ms_var with your
>>> data after having run with the manual example. I expect that if you reopen
>>> Scialb and run your problem, starting from load('arhmm_example.dat'),
>>> it will work. A good advice anyway is to set the bounds before each
>>> estimation or to run:
>>> --> bounds()
>>> if you want to use the greatest available time span with your data
>>> 2) I agree; indeed the example in the manual as well as yours are
>>> univariate, but ms_var also works with multivariate series (I have run some
>>> tests which worked well)
>>> 3) If you want to contribute, do not hesitate to send me code (at
>>> grocer.toolbox at gmail.com or grocer.toolbox at free.fr); add your
>>> copyright; if you can create a help file it would still be better (I have
>>> some tools do help doing that if you want them); and if you can add to the
>>> manual, it would be marvellous!
>>> If you want to imporve the docs you are also welcome; I can sned you the
>>> OpenOffice files you need if you find it suitable
>>>
>>> Éric.
>>>
>>>
>>>
>>> 2014-05-23 14:13 GMT+02:00 Brian Bouterse <bmbouter at gmail.com>:
>>>
>>> Thanks for the reply Eric. It is great to get hints and suggestions from
>>>> the author directly!
>>>>
>>>> I used the commands that you outlined, and they were able to reproduce
>>>> the expected output verbatim, which is great. Thanks for clearing that up
>>>> for me. I've gotten further towards my goal.
>>>>
>>>> I've now got three questions:
>>>>
>>>> 1)  I get an unexpected result when I run ms_var() on my own data. I
>>>> run these commands:
>>>>
>>>> load('arhmm_example.dat')
>>>> nb_states=2
>>>> switch_var=2 // variances are switching
>>>> var_opt=3 // unrestricted var-cov matrix
>>>>
>>>> r=ms_var('cte',3,'ardata',nb_states,switch_var,var_opt,'prt=initial;final','opt_convg=0')
>>>>
>>>> I receive this output:
>>>>
>>>> WARNING: in overlay, series number 2 has been ignored because of a bad
>>>> frequency
>>>>  !--error 10000
>>>> series ends before the end date of the bounds
>>>> at line      39 of function ts2vec0 called by :
>>>> at line     101 of function explone called by :
>>>> at line     253 of function ms_var called by :
>>>>
>>>> r=ms_var('cte',3,'ardata',nb_states,switch_var,var_opt,'prt=initial;final','opt_convg=0')
>>>>
>>>> The arhmm_example.dat file is available here[0], and it was made by
>>>> running.the following command on the original csv[1] file arhmm_example.csv.
>>>>
>>>> impexc2bd('arhmm_example.csv', ';', 'arhmm_example.dat')
>>>>
>>>> I believe I either don't have the dates configured correctly, or it
>>>> requires a specific number of data points to match the frequency value,
>>>> which also may be wrong. Do you have some insight into this error message?
>>>> I've been reading the docs on the ts structure, and I will continue to try
>>>> to solve this roadblock.
>>>>
>>>>
>>>> 2) My goal in doing all this is to analyze Autoregressive Hidden Markov
>>>> Models. As I understand it, the VAR-HMM that ms_var provides is a
>>>> multivariate case of an Autoregressive Hidden Markov Model. The terms
>>>> Markov Switching Model, and Hidden Markov Model refer to the same thing.
>>>> Using a single variable with ms_var() as I show above in the example, will
>>>> simulate an AR-HMM(3). I would like to check if these statements agree with
>>>> your understanding.
>>>>
>>>>
>>>> 3) How could I contribute to the grocer code. At the very least I could
>>>> improve the docs some.
>>>>
>>>>
>>>> [0]:
>>>> https://s3.amazonaws.com/dfsklfdsklfds/fdsjkfsdjkfds/arhmm_example.dat
>>>> [1]:
>>>> https://s3.amazonaws.com/dfsklfdsklfds/fdsjkfsdjkfds/arhmm_example.csv
>>>>
>>>> Thanks,
>>>> Brian
>>>>
>>>>
>>>> On Tue, May 20, 2014 at 3:55 PM, Eric Dubois <grocer.toolbox at gmail.com>
>>>> wrote:
>>>>
>>>>> Hello Brian
>>>>>
>>>>> Sorry for this late answer, but I have been quite busy these days.
>>>>>
>>>>> I did not notice the problem with this version of the MS programs.
>>>>> Indeed I have changed the optimization device of all GROCER programs and I
>>>>> have not adapted the defaults for the MS programs.
>>>>>
>>>>> If you run:
>>>>> --> global GROCERDIR;
>>>>> --> load(GROCERDIR+'\data\us_revu.dat')
>>>>> --> bounds('1967m4','2004m2')
>>>>>
>>>>> --> nb_states=2
>>>>> --> switch_var=2 // variances are switching
>>>>> --> var_opt=3 // unrestricted var-cov matrix
>>>>>
>>>>> -->
>>>>> r=ms_var('cte',3,'100*(log(us_revu)-lagts(2,log(us_revu)))',nb_states,switch_var,var_opt,'prt=initial;final','opt_convg=0')
>>>>> (see chapter 6 of the manual for explanations)
>>>>>
>>>>> Then the results of the ms_var demo is restaured.
>>>>>
>>>>> I will change the default in Grocer next version .
>>>>>
>>>>> Regards.
>>>>>
>>>>> Éric.
>>>>>
>>>>>
>>>>> 2014-05-19 12:56 GMT+02:00 Brian Bouterse <bmbouter at gmail.com>:
>>>>>
>>>>>> Hi Scilab community!
>>>>>>
>>>>>> I'm new to Scilab, and the AR-HMM and VAR-HMM solving capabilities of
>>>>>> GROCER are what interest me.
>>>>>>
>>>>>> I have a question relating to Chapter 23 from the GROCER manual[0].
>>>>>>  This is the univariate MS-AR(3) solved using the function ms_reg_d() on
>>>>>> the us_revu.dat data included with GROCER.  I have made no adjustment from
>>>>>> the example statements in Chapter 23.
>>>>>>
>>>>>> The example output is shown on pages 4 and 5 of the Chapter 23
>>>>>> module.  Compare that against the output I receive.
>>>>>>
>>>>>> http://fpaste.org/102978/14004958/
>>>>>>
>>>>>> Here are my questions:
>>>>>>
>>>>>> 1.  The numerical output is completely different.  I expected it to
>>>>>> be the same since the data is provided by GROCER, and I've done the example
>>>>>> exactly as shown in Chapter 23.  Is there some explanation to why the
>>>>>> solved solution I receive is different than the example output in the
>>>>>> chapter?
>>>>>>
>>>>>> 2.  I see output like %i*8.4469016 which seems like an error because
>>>>>> %i looks like a variable that yet needs to be replaced, and then multiplied
>>>>>> to get to its final value.  Is this some kind of bug or error?
>>>>>>
>>>>>> Thanks for any help the community can provide.  We'll be using this
>>>>>> for a seminar on HMM, AR-HMM, and VAR-HMM at North Carolina State
>>>>>> University.  I'm also a developer, so I really appreciate all the effort
>>>>>> that has been put into scilab and GROCER.
>>>>>>
>>>>>> Thanks,
>>>>>> Brian
>>>>>>
>>>>>>
>>>>>> [0]:  http://dubois.ensae.net/Grocer_manual_v1.6.zip
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Brian Bouterse
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users at lists.scilab.org
>>>>>> http://lists.scilab.org/mailman/listinfo/users
>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users at lists.scilab.org
>>>>> http://lists.scilab.org/mailman/listinfo/users
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Brian Bouterse
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users at lists.scilab.org
>>>> http://lists.scilab.org/mailman/listinfo/users
>>>>
>>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users at lists.scilab.org
>>> http://lists.scilab.org/mailman/listinfo/users
>>>
>>>
>>
>>
>> --
>> Brian Bouterse
>>
>> _______________________________________________
>> users mailing list
>> users at lists.scilab.org
>> http://lists.scilab.org/mailman/listinfo/users
>>
>>
>
> _______________________________________________
> users mailing list
> users at lists.scilab.org
> http://lists.scilab.org/mailman/listinfo/users
>
>


-- 
Brian Bouterse
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.scilab.org/pipermail/users/attachments/20140603/88ce1eb4/attachment.htm>


More information about the users mailing list