<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <br>

    <font face="Courier New">Stéphane,<br>

      <br>

      My first argument in favor of keeping covar and cov as separate

      functions is that often what one needs is the covariance between

      two potentially correlated signals regardless of their individual

      variances, so it seems somewhat inefficient to compute essentially

      three covariances (two of them between two equal signals) when it

      is only one of them what one wants to calculate. <br>

      <br>

      However, the syntax of covar should have an option to process the

      signals directly instead of their statistics (values and joint

      frequencies): covar(x,y)<br>

      <br>

      My second argument seems to be the opposite of my pevious request:

      covar(x,y,fre) is a quite different function since the input

      information is presented in a different way, which is valuable

      when one happens to have the information in such fashion.<br>

      <br>

      Regards,<br>

      <br>

      Federico Miyara<br>

       <br>

    </font><br>

    <div class="moz-cite-prefix">On 19/02/2020 17:14, Stéphane Mottelet

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:a12739b1-a830-ce31-8622-d240136f5f2e@utc.fr">

      <meta http-equiv="content-type" content="text/html; charset=UTF-8">

      <p><tt>Hi all,</tt></p>

      <p><tt>Within the development team we recently had a discussion

          about the improvement of cov() in terms of speed and memory

          requirement and about the opportunity of merging cov() and

          covar() wich are two disctinct macros. Since we did not manage

          to reach a consensus we thought it could be the occasion to

          have the opinion of members of this list which have a

          recognized academical/research knowledge in probability and

          statistics. Here are some elements to start the discussion.

          Let us start with covar() macro and what it actually computes:</tt><tt><br>

        </tt></p>

      <p><tt>* covar()</tt></p>

      <p><tt>Let us start with a definition of covariance in general:</tt><tt><br>

        </tt></p>

      <tt> </tt>

      <p><tt><a class="moz-txt-link-freetext"

href="https://fr.wikipedia.org/wiki/Covariance#D%C3%A9finition_de_la_covariance"

            moz-do-not-send="true">https://fr.wikipedia.org/wiki/Covariance#D%C3%A9finition_de_la_covariance</a></tt></p>

      <tt> </tt>

      <p><tt>and with an example there:</tt><tt><br>

        </tt></p>

      <p><tt><a class="moz-txt-link-freetext"

            href="https://en.wikipedia.org/wiki/Covariance#Example"

            moz-do-not-send="true">https://en.wikipedia.org/wiki/Covariance#Example</a></tt></p>

      <tt> </tt>

      <p><tt>In the two above links scalar/real variables are considered

          and in the second link discrete random variables are

          considered. In the example the covariance is computed knowing

          the possible values and their joint density. You can easily

          check in the source of covar() (type "edit covar") that, after

          normalizing the matrix of joint probabilities (named

          "frequencies" in the source), the macro computes the same

          value, which is confirmed by the result of the following

          statements:</tt></p>

      <p><tt>--> x=[1 2];y=[1 2 3];fre = [1/4 1/4 0;0 1/4

          1/4];covar(x,y,fre)</tt><tt><br>

        </tt><tt> ans  =</tt><tt><br>

        </tt><tt><br>

        </tt><tt>   0.25</tt></p>

      <p><tt>Please note that covar() output is always a scalar. Now let

          us consider cov():</tt><tt><br>

        </tt></p>

      <tt> </tt>

      <p><tt>* cov()</tt></p>

      <p><tt>Here is a definition of the covariance matrix:</tt><tt><br>

        </tt> </p>

      <tt> </tt>

      <p><tt><a class="moz-txt-link-freetext"

            href="https://en.wikipedia.org/wiki/Covariance_matrix"

            moz-do-not-send="true">https://en.wikipedia.org/wiki/Covariance_matrix</a></tt></p>

      <tt> </tt>

      <p><tt>Here we consider vectors of random variables (not scalar

          random variables) and in this case the covariance is a matrix.

          When there is no a priori knowledge on these variables (when

          the joint density is not known, typically), the best you can

          do is, when you have samples of this random vector, is to

          compute an estimation of the covariance matrix, see e.g. he

          following page:</tt></p>

      <p><tt><a class="moz-txt-link-freetext"

            href="https://en.wikipedia.org/wiki/Estimation_of_covariance_matrices"

            moz-do-not-send="true">https://en.wikipedia.org/wiki/Estimation_of_covariance_matrices</a></tt><tt><br>

        </tt></p>

      <p><tt>You can verify in actual code of cov() that this macro

          computes the same estimation (sums are vectorized). </tt><tt><br>

        </tt></p>

      <p><tt>We can summarize these facts this way:</tt><tt><br>

        </tt></p>

      <p><tt>* covar(x,y,fre) computes the scalar covariance of two

          discrete random variables knowing their possible values x(:)

          and y(:) and their joint probability density </tt><tt><br>

        </tt></p>

      <p><tt>* When x is a matrix, cov(x) computes an estimator of the

          covariance matrix of a vector X of size(x,2) random variables

          by using size(x,1) samples of this vector (each x(i,:) is a

          sample). if x and y are vectors of the same size, cov(x,y) is

          computed as cov([x(:) y(:)]).</tt><tt><br>

        </tt></p>

      <p><tt>To me, the main difference is that covar(x,y,fre) does not

          compute an </tt><tt><u>estimator</u></tt><tt> but a </tt><tt><u>exact

            value</u></tt><tt>. Of course, the vectors x and y can be

          the unique value of two random variables, gathered from

          samples (x,y) and "fre" be the empirical frequency of samples

          (x_i,y_j). In this case covar() will compute an estimation.

          For example, consider the two random variables X and Y, where

          X takes values {1,2} with equal probability, and Y=X+U where U

          takes values {0,1} with equal probability. We can use covar()

          to compute the exact covariance of X and Y, but if we only

          have samples, like in the below script, if we want to estimate

          the covariance with the same macro, then unique pairs have to

          be found and occurences counted in order to estimate the

          frequency :</tt></p>

      <p><tt>N=1000;</tt><tt><br>

        </tt><tt>x=ceil(rand(N,1)*2);</tt><tt><br>

        </tt><tt>y=x+floor(rand(N,1)*2);</tt><tt><br>

        </tt><tt><br>

        </tt><tt>[pairs,k]=unique(gsort([x y],'lr','i'),'r');</tt><tt><br>

        </tt><tt>f=diff([k;N+1])/N;</tt><tt><br>

        </tt></p>

      <p><tt>freq=sparse(pairs,f)</tt><tt><br>

        </tt><tt>N/(N-1)*covar(1:2,1:3,freq)</tt><tt><br>

        </tt><tt>cov(x,y)</tt><tt><br>

        </tt></p>

      <p><tt>If you have a look to the results</tt><tt>,<br>

        </tt></p>

      <p><tt>--> freq</tt><tt><br>

        </tt><tt> freq  = </tt><tt><br>

        </tt><tt><br>

        </tt><tt>   0.2526   0.2489   0.    </tt><tt><br>

        </tt><tt>   0.       0.2453   0.2532</tt><tt><br>

        </tt><tt><br>

        </tt><tt>--> N/(N-1)*covar(1:2,1:3,freq)<br>

           ans  =<br>

          <br>

             0.249769<br>

          <br>

          --> cov(x,y)<br>

           ans  =<br>

          <br>

             0.2500182   0.249769 <br>

             0.249769    0.4995447</tt><tt><br>

        </tt></p>

      <p><tt>you can see that <br>

        </tt></p>

      <p><tt>1. we have considered the same random variables as in the

          example </tt><tt><a class="moz-txt-link-freetext"

            href="https://en.wikipedia.org/wiki/Covariance#Example"

            moz-do-not-send="true">https://en.wikipedia.org/wiki/Covariance#Example</a><br>

          2. covar's output (up to the normalization to correct the

          bias) gives the off diagonal term of cov(x,y)<br>

        </tt></p>

      <p><tt>So, yes, off diagonal term of cov(x,y) and covar(x,y,fre)

          (up to unique pairs determination, computation of "fre" and

          bias correction) have the same value, but is it a reason to

          merge the two functions. I think that the answer is NO.<br>

        </tt></p>

      <p><tt>If you agree or disagree, feel free to continue the

          discussion in this thread.</tt></p>

      <p><tt>S.</tt><tt><br>

        </tt></p>

      <pre class="moz-signature" cols="72">-- 

Stéphane Mottelet

Ingénieur de recherche

EA 4297 Transformations Intégrées de la Matière Renouvelable

Département Génie des Procédés Industriels

Sorbonne Universités - Université de Technologie de Compiègne

CS 60319, 60203 Compiègne cedex

Tel : +33(0)344234688

<a class="moz-txt-link-freetext" href="http://www.utc.fr/~mottelet" moz-do-not-send="true">http://www.utc.fr/~mottelet</a>

</pre>

      <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <pre class="moz-quote-pre" wrap="">_______________________________________________

users mailing list

<a class="moz-txt-link-abbreviated" href="mailto:users@lists.scilab.org">users@lists.scilab.org</a>

<a class="moz-txt-link-freetext" href="http://lists.scilab.org/mailman/listinfo/users">http://lists.scilab.org/mailman/listinfo/users</a>

</pre>

    </blockquote>

    <br>

  </body>

</html>