<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">Really, nobody knows?<br>

      <br>

      S.<br>

      <br>

      On 20/02/2018 at 11:57, Stéphane Mottelet wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:e8820c71-751d-77bf-913e-a445d711e3dd@utc.fr">


      <div class="moz-cite-prefix">Hello,<br>

        <br>

        Continuing on this subject, I discovered that the new

        Scilab API allows the input parameters of a function to be modified

        (in-place assignment). For example, I have modified the previous daxpy

        such that the expression<br>

        <br>

        daxpy(2,X,Y)<br>

        <br>

        has no output but effectively performs "Y+=2*X", if such an operator

        existed in Scilab. In this case there is no matrix copy at

        all, hence no memory overhead.<br>
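        <br>

        To make the difference concrete, here is a minimal sketch in plain C (not the Scilab gateway itself, and not the attached daxpy.c). The names axpy_inplace and axpy_copy are hypothetical helpers introduced only for illustration: the first overwrites y, the second allocates a result buffer of the same size, which is the extra memory the in-place variant avoids.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: in-place update y <- y + a*x.
   No buffer is allocated; y is overwritten. */
void axpy_inplace(size_t n, double a, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}

/* Hypothetical copying counterpart: returns a newly allocated array
   holding y + a*x and leaves y untouched. Returns NULL if the
   allocation fails. The caller must free() the result. */
double *axpy_copy(size_t n, double a, const double *x, const double *y)
{
    assert(x != NULL && y != NULL);
    double *out = malloc(n * sizeof *out);
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; ++i) {
        out[i] = y[i] + a * x[i];
    }
    return out;
}
```

        Calling axpy_inplace(n, 2.0, X, Y) corresponds to the "Y+=2*X" reading above, while axpy_copy needs one additional n-element buffer.<br>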

        <br>

        Was it possible to do this with the previous API?<br>

        <br>

        S.<br>

        <br>

        On 19/02/2018 at 19:15, Stéphane Mottelet wrote:<br>

      </div>

      <blockquote type="cite"

        cite="mid:1553ecbc-2120-9236-3e8e-b32eb09811e4@utc.fr">


        <div class="moz-cite-prefix">Hello,<br>

          <br>

          After some tests, for the intended use (multiplying a matrix by a

          scalar), dgemm is not faster than dscal. However, in the C code of

          "iMultiRealScalarByRealMatrix", the part that takes most

          of the CPU time is the call to "dcopy". For example, on my

          machine,  for a 10000x10000 matrix, the call to dcopy takes

          540 milliseconds and the call to dscal 193 milliseconds.

          Continuing my explorations today, I tried to see how Scilab

          expressions such as<br>

          <br>

          Y+2*X<br>

          <br>

          are parsed and executed. For this purpose I have written an

          interface (daxpy.sci and daxpy.c attached) to the BLAS

          function "daxpy", which does "y&lt;-y+a*x", and a script

          comparing the above expression to <br>

          <br>

          daxpy(2,X,Y)<br>

          <br>

          for two 10000x10000 matrices. Here are the results (MacBook

          Air, Core i7 @ 1.7 GHz):<br>

          <br>

           daxpy(2,X,Y)<br>

           (dcopy: 582 ms)<br>

           (daxpy: 211 ms)<br>

          <br>

           elapsed time: 793 ms<br>

          <br>

           Y+2*X<br>

          <br>

           elapsed time: 1574 ms<br>

          <br>

          Given the above values, the explanation is that in "Y+2*X"

          there are *two* copies of a 10000x10000 matrix instead of only

          one in "daxpy(2,X,Y)". In "Y+2*X+3*Z" there will be three

          copies, although there could be only one if daxpy were used

          twice. <br>
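          <br>

          The "only one copy" evaluation of "Y+2*X+3*Z" mentioned above could look like the following sketch. It is plain C with a hand-written loop standing in for BLAS daxpy (unit strides assumed), and combine_one_copy is a hypothetical name. It illustrates the idea only and is not how Scilab evaluates the expression.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* In-place y <- y + a*x, the operation performed by BLAS daxpy
   when both strides are 1. */
void axpy(size_t n, double a, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}

/* Hypothetical helper: R = Y + 2*X + 3*Z using a single n-element copy.
   Y is copied once into the result, then both scaled terms are
   accumulated in place. Returns NULL if the allocation fails.
   The caller must free() the result. */
double *combine_one_copy(size_t n, const double *y,
                         const double *x, const double *z)
{
    assert(y != NULL && x != NULL && z != NULL);
    double *r = malloc(n * sizeof *r);
    if (r == NULL) {
        return NULL;
    }
    memcpy(r, y, n * sizeof *r);  /* the only full-size copy */
    axpy(n, 2.0, x, r);           /* r <- r + 2*x */
    axpy(n, 3.0, z, r);           /* r <- r + 3*z */
    return r;
}
```

          Each additional term costs one more pass over the data but no additional temporary matrix.<br>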

          <br>

          I am not blaming Scilab here, I am just blaming

          "vectorization", which can be inefficient when large objects

          are used. That's why explicit loops can sometimes be faster

          than vectorized operations in Matlab or Julia (which both use

          JIT compilation).<br>

          <br>

          S.<br>

          <br>

          <br>

          On 15/02/2018 at 17:11, Antoine ELIAS wrote:<br>

        </div>

        <blockquote type="cite"

          cite="mid:3daebae7-eba4-a32c-5e1e-c403ece6f5fa@scilab-enterprises.com">Hello

          Stéphane, <br>

          <br>

          Interesting ... <br>

          <br>

          In release, we don't ship the header of BLAS/LAPACK functions.

          <br>

          But you can declare them in your C file as extern (and let

          the linker do its job). <br>

          <br>

          extern int C2F(dgemm) (char *_pstTransA, char *_pstTransB, int

          *_piN, int *_piM, int *_piK, double *_pdblAlpha, double

          *_pdblA, int *_piLdA, <br>

                                 double *_pdblB, int *_piLdB, double

          *_pdblBeta, double *_pdblC, int *_piLdC); <br>

          and <br>

          <br>

          extern int C2F(dscal) (int *_iSize, double *_pdblVal, double

          *_pdblDest, int *_iInc); <br>

          <br>

          Other BLAS/LAPACK prototypes can be found at <a

            class="moz-txt-link-freetext"

href="http://cgit.scilab.org/scilab/tree/scilab/modules/elementary_functions/includes/elem_common.h?h=6.0"

            moz-do-not-send="true">http://cgit.scilab.org/scilab/tree/scilab/modules/elementary_functions/includes/elem_common.h?h=6.0</a><br>
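          <br>

          As a rough sketch of how such an extern declaration is used, the snippet below calls dscal through the Fortran convention, where every argument is passed by pointer. Two things are assumptions made so that it compiles on its own: the C2F macro is defined here as appending an underscore (Scilab provides its own definition), and a simplified stand-in body for dscal is included so that no BLAS library is needed. When linking against Scilab's BLAS, the stand-in should be removed and only the extern declaration kept. The wrapper name scale_matrix_inplace is hypothetical.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: C2F(name) expands to name_ on this toolchain.
   In a real Scilab build the macro comes from Scilab's own headers. */
#define C2F(name) name##_

/* Prototype as quoted above; normally resolved by the linker. */
extern int C2F(dscal)(int *_iSize, double *_pdblVal,
                      double *_pdblDest, int *_iInc);

/* Stand-in so this sketch links without BLAS (remove when linking
   against a real BLAS). Does nothing for non-positive size or stride. */
int C2F(dscal)(int *_iSize, double *_pdblVal,
               double *_pdblDest, int *_iInc)
{
    if (*_iSize <= 0 || *_iInc <= 0) {
        return 0;
    }
    for (int i = 0; i < *_iSize; ++i) {
        _pdblDest[(size_t)i * (size_t)*_iInc] *= *_pdblVal;
    }
    return 0;
}

/* Hypothetical wrapper: scale a rows-by-cols matrix in place.
   Scalars are stored in locals because dscal takes their addresses. */
void scale_matrix_inplace(double alpha, double *a, int rows, int cols)
{
    int n = rows * cols;
    int one = 1;
    assert(a != NULL);
    C2F(dscal)(&n, &alpha, a, &one);
}
```

          For example, scale_matrix_inplace(2.0, A, 2, 2) doubles the four entries of A without any copy.<br>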

          <br>

          Regards, <br>

          Antoine <br>

          On 15/02/2018 at 16:50, Stéphane Mottelet wrote: <br>

          <blockquote type="cite">Hello all, <br>

            <br>

            Following the recent discussion with fujimoto, I discovered

            that Scilab does not seem to fully use SIMD operations in

            BLAS as it should. Besides the bottlenecks of its code,

            there are also many operations of the kind <br>

            <br>

            scalar*matrix <br>

            <br>

            Although this operation is correctly delegated to the DSCAL

            BLAS function (as can be seen in the C function

            iMultiRealScalarByRealMatrix in

            modules/ast/src/c/operations/matrix_multiplication.c): <br>

            <br>

            <blockquote type="cite">int iMultiRealScalarByRealMatrix( <br>

                  double _dblReal1, <br>

                  double *_pdblReal2,    int _iRows2, int _iCols2, <br>

                  double *_pdblRealOut) <br>

              { <br>

                  int iOne    = 1; <br>

                  int iSize2    = _iRows2 * _iCols2; <br>

              <br>

                  C2F(dcopy)(&iSize2, _pdblReal2, &iOne,

              _pdblRealOut, &iOne); <br>

                  C2F(dscal)(&iSize2, &_dblReal1, _pdblRealOut,

              &iOne); <br>

                  return 0; <br>

              } <br>

            </blockquote>

            in the code below the product "A*1" likely uses only one

            processor core, as seen from the CPU usage graph and the

            elapsed time: <br>

            <br>

            A=rand(20000,20000); <br>

            tic; for i=1:10; A*1; end; toc <br>

            <br>

             ans  = <br>

            <br>

               25.596843 <br>

            <br>

            but this second piece of code is more than 8 times faster

            and uses 100% of the CPU: <br>

            <br>

            ONE=ones(20000,1); <br>

            tic; for i=1:10; A*ONE; end; toc <br>

            <br>

             ans  = <br>

            <br>

               2.938314 <br>

            <br>

            with roughly the same number of multiplications. This second

            computation is delegated to DGEMM (C&lt;-alpha*A*B + beta*C,

            here with alpha=1 and beta=0) <br>

            <br>

            <blockquote type="cite">int iMultiRealMatrixByRealMatrix( <br>

                  double *_pdblReal1,    int _iRows1, int _iCols1, <br>

                  double *_pdblReal2,    int _iRows2, int _iCols2, <br>

                  double *_pdblRealOut) <br>

              { <br>

                  double dblOne        = 1; <br>

                  double dblZero        = 0; <br>

              <br>

                  C2F(dgemm)("n", "n", &_iRows1, &_iCols2,

              &_iCols1, &dblOne, <br>

                             _pdblReal1 , &_iRows1 , <br>

                             _pdblReal2, &_iRows2, &dblZero, <br>

                             _pdblRealOut , &_iRows1); <br>

                  return 0; <br>

              } <br>

            </blockquote>

            Maybe my intuition is wrong, but I have the feeling that

            using dgemm with alpha=0 will be faster than dscal. I plan

            to test this with some quick and dirty code linked to

            Scilab, so my question to the devs is: which #includes

            should be added at the top of the C source to be able to call

            dgemm and dscal? <br>
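            <br>

            One way to read the alpha=0 idea: in the update C&lt;-alpha*A*B + beta*C the product term vanishes when alpha is zero, leaving C&lt;-beta*C, so the scalar would be passed as beta. The sketch below is a textbook column-major reference loop, not the optimized BLAS routine, and gemm_ref is a hypothetical name. It only illustrates the arithmetic and says nothing about which call is faster.<br>

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop for C <- alpha*A*B + beta*C in column-major storage.
   A is m-by-k, B is k-by-n, C is m-by-n, no transposition, and the
   leading dimensions equal the row counts. When alpha is zero, A and B
   are never read, so the call reduces to scaling C by beta. */
void gemm_ref(size_t m, size_t n, size_t k,
              double alpha, const double *a, const double *b,
              double beta, double *c)
{
    assert(c != NULL);
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double acc = 0.0;
            if (alpha != 0.0) {
                for (size_t p = 0; p < k; ++p) {
                    acc += a[i + p * m] * b[p + j * k];
                }
            }
            /* beta == 0 overwrites C instead of scaling its old content */
            c[i + j * m] = alpha * acc
                         + (beta == 0.0 ? 0.0 : beta * c[i + j * m]);
        }
    }
}
```

            With alpha=1 and beta=0 this gives the plain product used for A*ONE, and with alpha=0 and beta=s it scales C by s.<br>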

            <br>

            Thanks for your help <br>

            <br>

            S. <br>

            <br>

            <br>

          </blockquote>

          <br>

          _______________________________________________ <br>

          dev mailing list <br>

          <a class="moz-txt-link-abbreviated"

            href="mailto:dev@lists.scilab.org" moz-do-not-send="true">dev@lists.scilab.org</a>

          <br>

          <a class="moz-txt-link-freetext"

            href="http://lists.scilab.org/mailman/listinfo/dev"

            moz-do-not-send="true">http://lists.scilab.org/mailman/listinfo/dev</a>

          <br>

        </blockquote>

        <p><br>

        </p>

        <pre class="moz-signature" cols="72">-- 

Stéphane Mottelet

Ingénieur de recherche

EA 4297 Transformations Intégrées de la Matière Renouvelable

Département Génie des Procédés Industriels

Sorbonne Universités - Université de Technologie de Compiègne

CS 60319, 60203 Compiègne cedex

Tel : +33(0)344234688

<a class="moz-txt-link-freetext" href="http://www.utc.fr/%7Emottelet" moz-do-not-send="true">http://www.utc.fr/~mottelet</a></pre>


      </blockquote>


    </blockquote>


  </body>

</html>