[Scilab-users] GROCER VAR-HMM Example Output Different from Chapter 23
Brian Bouterse
bmbouter at gmail.com
Fri Jun 20 12:52:53 CEST 2014
Hi Eric,
I want to reach out and see what your thoughts are on putting a copy of
grocer on github?
There is already a grocer on github that is a different software project so
github.com/grocer/grocer is already taken. Maybe it could be housed at
github.com/<yourusername>/grocer for now.
I could also post it on github ( github.com/bmbouter ), but having a
canonical version from the author would be better. I would be willing to
submit some documentation patches through github.
Thanks again for the great software and the help; it's working for us!
Best,
Brian
On Tue, Jun 3, 2014 at 4:40 PM, Eric Dubois <grocer.toolbox at gmail.com>
wrote:
> Hello Brian.
>
> Thank you for your feedback. I will try to make the distinction between
> 'cte' and 'all' more precise in the manual.
>
> I will also think about your suggestion to put the code on a hub.
>
> Regards.
>
> Éric.
>
>
> 2014-06-03 12:21 GMT+02:00 Brian Bouterse <bmbouter at gmail.com>:
>
> Hi Eric,
>>
>> Thanks for the quick and helpful reply! I adjusted the first parameter
>> from 'cte' to 'all' and that produces the correct form of output, and the
>> values look close to the expected values. This is great!
>>
>> I've read the help documents several times, but I didn't realize the
>> impact of adjusting the value of the first parameter. The doc example on
>> the ms_var help page describes the parameter, but the only example on that
>> page uses 'cte'. There is no example with 'all' on the ms_var() help page.
>> Also including some sample output may also be good there. I would like to
>> contribute an example like this to the docs, manual, or help files. I think
>> that would have caused me to solve my own problems instead of writing long
>> e-mails.
>>
>> I also tried using a larger data file 10,000 observations in a column
>> format, and then impexc2bd transformed the data file correctly. That is
>> also solved thanks to your suggestion.
>>
>> I know the code is available for download, but would you ever consider
>> putting a copy of the GROCER code in a repository on github.com? If it
>> were in a place like that, I, and others, would have an easier time of
>> contributing to the codebase instead of e-mailing patches. What do you
>> think about an idea like this? Just a friendly suggestion based on what has
>> worked well for me in the past. I can help put it up there, and configure
>> it if that is helpful.
>>
>> Thanks again for making this great software, and helping me use it!
>>
>> Best,
>> Brian
>>
>>
>>
>>
>>
>> On Thu, May 29, 2014 at 1:41 PM, Eric Dubois <grocer.toolbox at gmail.com>
>> wrote:
>>
>>> Hello Brian
>>>
>>> You did not recover the intial parameters because you did not estimate
>>> the same model as the one you generated.
>>>
>>> This is because you used in ms_var the option 'cte' which means that
>>> only the constant is allowed to switch: and this is why the AR coeffcients
>>> are only given for 'all regimes'. I f you rather use the option 'all', then
>>> all parameters are allowed to switch and the result is now consistent with
>>> the parameters of the generated model, that is the true values are within
>>> the confidence interval of the estimated ones. This is explained in the
>>> help command (see help ms_var): may I ask you why you did not infer it from
>>> the help files?
>>>
>>> As for the big file, I suspect it is truncated into Scilab. Can you try
>>> with data in column instead of in rows?
>>>
>>> Éric.
>>>
>>>
>>> 2014-05-29 13:24 GMT+02:00 Brian Bouterse <bmbouter at gmail.com>:
>>>
>>> Hi Eric,
>>>>
>>>> Thanks again for the response. What you told me to do worked. If I
>>>> closed Scilab, and rerun the example from my earlier e-mail, I do not see
>>>> any warnings, which is good. The output I received is not what I expected
>>>> in several ways, so I'll ask some more questions here. There are a lot of
>>>> different questions here, but they all relate to the example below, and the
>>>> use of ms_var and GROCER. Thanks in advance for any insight into the issues
>>>> I am experiencing.
>>>>
>>>> The example:
>>>>
>>>> I have a file arhmm_example_280.dat [0] that was generated and loaded
>>>> from arhmm_example_280.csv [1] using the following commands:
>>>>
>>>> impexc2bd('arhmm_example_280.csv', ';', 'arhmm_example_280.dat')
>>>> load('arhmm_example_280.dat')
>>>>
>>>> I then run, what I expect to be a single variable, 2-regime,
>>>>
>>>> nb_states=2
>>>> switch_var=2
>>>> var_opt=3
>>>>
>>>> r=ms_var('cte',3,'ardata',nb_states,switch_var,var_opt,'prt=initial;final','opt_convg=0')
>>>>
>>>> I then receive this output: http://ur1.ca/hej98
>>>>
>>>> 1) In the "coefficients" section of the output, the output doesn't seem
>>>> to contain enough numbers to describe the expected number of outputs. The
>>>> model has 2 regimes, and it is autoregressive with order 3, so I expect
>>>> there to be 3 coefficients for EACH regime (regime 1 and regime 2).
>>>> Instead, I see a single coefficient for Regime 1 (0.9313736) and a single
>>>> coefficient for Regime 2 (8.2685973). Where are the three autoregressive
>>>> parameters for regime 1 and the other three autoregressive parameters for
>>>> regime 2?
>>>>
>>>> 2) There is a section in the output called "All Regimes" which is in
>>>> the "Coefficients" section contains 3 coefficients. Do these correspond to
>>>> the autoregressive aspect of the model [0.2258037, 0.4757391, -0.3161730).
>>>> I expected to see a section like this for Regime 1, and one for Regime 2,
>>>> but I did not. I'm not sure how these 3 coefficients for "All Regimes" fit
>>>> in. The Markov Switching Model switches the autoregressive coefficients
>>>> with each Regime, so I'm not sure how to interpret coefficients that are
>>>> for "both regimes".
>>>>
>>>> 3) I believe each regime will have its error term estimated, and I do
>>>> see that in the output. For example under Regime 2, the variance of the
>>>> error term is 2.3164941. This is listed in the section "Variance-covariance
>>>> matrix of residuals" which is a subsection under "Regime 2". Am I
>>>> interpreting the output correctly? Where is the mean value for the regime?
>>>>
>>>> 4) My overall goal is to validate that I can hand a list of
>>>> observations to GROCER+scilab, along with the order of the autoregressive
>>>> model, and the number of regimes, and have it estimate the Markov hidden
>>>> state transition probabilities, the estimated regime that is in effect at
>>>> time t, and coefficients for each regime, and the error term that can be
>>>> used for prediction of the t+1 observation. I don't know of a "well known"
>>>> data set that allows for this type of validation, so I wrote a simple
>>>> AR-HMM generator named koa marhn [2], which outputs AR-HMM data that
>>>> follows a set of parameters. This is the program that generated the 280
>>>> point data set in the arhmm_example_280.csv [1] file. You can see the
>>>> parameters used to generate this file here:
>>>>
>>>> https://github.com/bmbouter/koa_marhn/blob/master/arhmm_example.py
>>>>
>>>> A[i][j] is the probability transition from regime i to regime j.
>>>> C[x][y] is the autoregressive coefficient for regime x used for
>>>> generation of the t+1 element from the t, t-1, t-2, ..., t-y summation.
>>>> Each y is a coefficient for regime x.
>>>> R[q] contains mean and std_deviation values for regime q
>>>> pi[m] contains the starting state probabilities that the regime is in
>>>> state m at time t=0
>>>>
>>>> The output from http://ur1.ca/hej98 does not correspond at all with
>>>> the values of these inputs, which means something unexpected is going on!
>>>> Do you have insight into why scilab doesn't produce these expected numbers?
>>>>
>>>> 5) One possibility is that maybe it is "working", but there isn't
>>>> enough data for it to estimate the parameters correctly. I had koa marhn
>>>> [2] output 10,000 observations, but I ran into trouble transforming it from
>>>> a csv file to a dat file using impexc2bd. My input file is
>>>> arhmm_example_10000.csv [3]. I tried to convert to .dat using:
>>>>
>>>> impexc2bd('arhmm_example_10000.csv', ';', 'arhmm_example_10000.dat')
>>>>
>>>> I receive this output, and the .dat file in NOT created
>>>>
>>>> !--error 10000
>>>> dates are entered neither in chronlogical order nor in reverseerse
>>>> chronological order
>>>> at line 87 of function read_dates called by :
>>>> at line 188 of function impexc2bd called by :
>>>> arhmm_example_10000.dat'
>>>>
>>>> How do you transform a large .csv file to .dat? Is there some problem
>>>> with the data file, the way I am making it?
>>>>
>>>> [0]: https://s3.amazonaws.com/scilab_data_files/arhmm_example_280.dat
>>>> [1]: https://s3.amazonaws.com/scilab_data_files/arhmm_example_280.csv
>>>> [2]: https://github.com/bmbouter/koa_marhn
>>>> [3]:
>>>> https://s3.amazonaws.com/scilab_data_files/arhmm_example_10000.csv
>>>> [4]: https://s3.amazonaws.com/scilab_data_files/arhmm_example_1000.csv
>>>> [5]: https://s3.amazonaws.com/scilab_data_files/arhmm_example_1000.dat
>>>>
>>>> This e-mail is way to long, but I wanted to fully recap the issues I've
>>>> considered as I use ms_var(). Thanks for any help you can provide. You can
>>>> also find me in the #scilab channel on the OFTC IRC servers.
>>>>
>>>> Thanks,
>>>> Brian
>>>>
>>>>
>>>>
>>>> On Fri, May 23, 2014 at 3:48 PM, Eric Dubois <grocer.toolbox at gmail.com>
>>>> wrote:
>>>>
>>>>> Dear Brian
>>>>>
>>>>> 1) For me it works fine; I suspect that you have run ms_var with your
>>>>> data after having run with the manual example. I expect that if you reopen
>>>>> Scialb and run your problem, starting from load('arhmm_example.dat'),
>>>>> it will work. A good advice anyway is to set the bounds before each
>>>>> estimation or to run:
>>>>> --> bounds()
>>>>> if you want to use the greatest available time span with your data
>>>>> 2) I agree; indeed the example in the manual as well as yours are
>>>>> univariate, but ms_var also works with multivariate series (I have run some
>>>>> tests which worked well)
>>>>> 3) If you want to contribute, do not hesitate to send me code (at
>>>>> grocer.toolbox at gmail.com or grocer.toolbox at free.fr); add your
>>>>> copyright; if you can create a help file it would still be better (I have
>>>>> some tools do help doing that if you want them); and if you can add to the
>>>>> manual, it would be marvellous!
>>>>> If you want to imporve the docs you are also welcome; I can sned you
>>>>> the OpenOffice files you need if you find it suitable
>>>>>
>>>>> Éric.
>>>>>
>>>>>
>>>>>
>>>>> 2014-05-23 14:13 GMT+02:00 Brian Bouterse <bmbouter at gmail.com>:
>>>>>
>>>>> Thanks for the reply Eric. It is great to get hints and suggestions
>>>>>> from the author directly!
>>>>>>
>>>>>> I used the commands that you outlined, and they were able to
>>>>>> reproduce the expected output verbatim, which is great. Thanks for clearing
>>>>>> that up for me. I've gotten further towards my goal.
>>>>>>
>>>>>> I've now got three questions:
>>>>>>
>>>>>> 1) I get an unexpected result when I run ms_var() on my own data. I
>>>>>> run these commands:
>>>>>>
>>>>>> load('arhmm_example.dat')
>>>>>> nb_states=2
>>>>>> switch_var=2 // variances are switching
>>>>>> var_opt=3 // unrestricted var-cov matrix
>>>>>>
>>>>>> r=ms_var('cte',3,'ardata',nb_states,switch_var,var_opt,'prt=initial;final','opt_convg=0')
>>>>>>
>>>>>> I receive this output:
>>>>>>
>>>>>> WARNING: in overlay, series number 2 has been ignored because of a
>>>>>> bad frequency
>>>>>> !--error 10000
>>>>>> series ends before the end date of the bounds
>>>>>> at line 39 of function ts2vec0 called by :
>>>>>> at line 101 of function explone called by :
>>>>>> at line 253 of function ms_var called by :
>>>>>>
>>>>>> r=ms_var('cte',3,'ardata',nb_states,switch_var,var_opt,'prt=initial;final','opt_convg=0')
>>>>>>
>>>>>> The arhmm_example.dat file is available here[0], and it was made by
>>>>>> running.the following command on the original csv[1] file arhmm_example.csv.
>>>>>>
>>>>>> impexc2bd('arhmm_example.csv', ';', 'arhmm_example.dat')
>>>>>>
>>>>>> I believe I either don't have the dates configured correctly, or it
>>>>>> requires a specific number of data points to match the frequency value,
>>>>>> which also may be wrong. Do you have some insight into this error message?
>>>>>> I've been reading the docs on the ts structure, and I will continue to try
>>>>>> to solve this roadblock.
>>>>>>
>>>>>>
>>>>>> 2) My goal in doing all this is to analyze Autoregressive Hidden
>>>>>> Markov Models. As I understand it, the VAR-HMM that ms_var provides is a
>>>>>> multivariate case of an Autoregressive Hidden Markov Model. The terms
>>>>>> Markov Switching Model, and Hidden Markov Model refer to the same thing.
>>>>>> Using a single variable with ms_var() as I show above in the example, will
>>>>>> simulate an AR-HMM(3). I would like to check if these statements agree with
>>>>>> your understanding.
>>>>>>
>>>>>>
>>>>>> 3) How could I contribute to the grocer code. At the very least I
>>>>>> could improve the docs some.
>>>>>>
>>>>>>
>>>>>> [0]:
>>>>>> https://s3.amazonaws.com/dfsklfdsklfds/fdsjkfsdjkfds/arhmm_example.dat
>>>>>> [1]:
>>>>>> https://s3.amazonaws.com/dfsklfdsklfds/fdsjkfsdjkfds/arhmm_example.csv
>>>>>>
>>>>>> Thanks,
>>>>>> Brian
>>>>>>
>>>>>>
>>>>>> On Tue, May 20, 2014 at 3:55 PM, Eric Dubois <
>>>>>> grocer.toolbox at gmail.com> wrote:
>>>>>>
>>>>>>> Hello Brian
>>>>>>>
>>>>>>> Sorry for this late answer, but I have been quite busy these days.
>>>>>>>
>>>>>>> I did not notice the problem with this version of the MS programs.
>>>>>>> Indeed I have changed the optimization device of all GROCER programs and I
>>>>>>> have not adapted the defaults for the MS programs.
>>>>>>>
>>>>>>> If you run:
>>>>>>> --> global GROCERDIR;
>>>>>>> --> load(GROCERDIR+'\data\us_revu.dat')
>>>>>>> --> bounds('1967m4','2004m2')
>>>>>>>
>>>>>>> --> nb_states=2
>>>>>>> --> switch_var=2 // variances are switching
>>>>>>> --> var_opt=3 // unrestricted var-cov matrix
>>>>>>>
>>>>>>> -->
>>>>>>> r=ms_var('cte',3,'100*(log(us_revu)-lagts(2,log(us_revu)))',nb_states,switch_var,var_opt,'prt=initial;final','opt_convg=0')
>>>>>>> (see chapter 6 of the manual for explanations)
>>>>>>>
>>>>>>> Then the results of the ms_var demo is restaured.
>>>>>>>
>>>>>>> I will change the default in Grocer next version .
>>>>>>>
>>>>>>> Regards.
>>>>>>>
>>>>>>> Éric.
>>>>>>>
>>>>>>>
>>>>>>> 2014-05-19 12:56 GMT+02:00 Brian Bouterse <bmbouter at gmail.com>:
>>>>>>>
>>>>>>>> Hi Scilab community!
>>>>>>>>
>>>>>>>> I'm new to Scilab, and the AR-HMM and VAR-HMM solving capabilities
>>>>>>>> of GROCER are what interest me.
>>>>>>>>
>>>>>>>> I have a question relating to Chapter 23 from the GROCER manual[0].
>>>>>>>> This is the univariate MS-AR(3) solved using the function ms_reg_d() on
>>>>>>>> the us_revu.dat data included with GROCER. I have made no adjustment from
>>>>>>>> the example statements in Chapter 23.
>>>>>>>>
>>>>>>>> The example output is shown on pages 4 and 5 of the Chapter 23
>>>>>>>> module. Compare that against the output I receive.
>>>>>>>>
>>>>>>>> http://fpaste.org/102978/14004958/
>>>>>>>>
>>>>>>>> Here are my questions:
>>>>>>>>
>>>>>>>> 1. The numerical output is completely different. I expected it to
>>>>>>>> be the same since the data is provided by GROCER, and I've done the example
>>>>>>>> exactly as shown in Chapter 23. Is there some explanation to why the
>>>>>>>> solved solution I receive is different than the example output in the
>>>>>>>> chapter?
>>>>>>>>
>>>>>>>> 2. I see output like %i*8.4469016 which seems like an error
>>>>>>>> because %i looks like a variable that yet needs to be replaced, and then
>>>>>>>> multiplied to get to its final value. Is this some kind of bug or error?
>>>>>>>>
>>>>>>>> Thanks for any help the community can provide. We'll be using this
>>>>>>>> for a seminar on HMM, AR-HMM, and VAR-HMM at North Carolina State
>>>>>>>> University. I'm also a developer, so I really appreciate all the effort
>>>>>>>> that has been put into scilab and GROCER.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Brian
>>>>>>>>
>>>>>>>>
>>>>>>>> [0]: http://dubois.ensae.net/Grocer_manual_v1.6.zip
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Brian Bouterse
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> users at lists.scilab.org
>>>>>>>> http://lists.scilab.org/mailman/listinfo/users
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> users at lists.scilab.org
>>>>>>> http://lists.scilab.org/mailman/listinfo/users
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Brian Bouterse
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users at lists.scilab.org
>>>>>> http://lists.scilab.org/mailman/listinfo/users
>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users at lists.scilab.org
>>>>> http://lists.scilab.org/mailman/listinfo/users
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Brian Bouterse
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users at lists.scilab.org
>>>> http://lists.scilab.org/mailman/listinfo/users
>>>>
>>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users at lists.scilab.org
>>> http://lists.scilab.org/mailman/listinfo/users
>>>
>>>
>>
>>
>> --
>> Brian Bouterse
>>
>> _______________________________________________
>> users mailing list
>> users at lists.scilab.org
>> http://lists.scilab.org/mailman/listinfo/users
>>
>>
>
> _______________________________________________
> users mailing list
> users at lists.scilab.org
> http://lists.scilab.org/mailman/listinfo/users
>
>
--
Brian Bouterse
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.scilab.org/pipermail/users/attachments/20140620/f96b947c/attachment.htm>
More information about the users
mailing list