[Scilab-Dev] BLAS use in Scilab

Mon Feb 26 13:48:18 CET 2018

  Hello,

Thank you for your answer. Would it have an interest to implement such  
an operator (and, as a consequence, open a kind of Pandora's box..) ?

S.

Quoting Clément David <Clement.David at esi-group.com>:

> Hello Stéphane,
>
> AFAIK, as in the old interface the new C++ one allow you to get  
> pointers to the raw double* values
> and so allow you to write anything to the inputs. However this kind  
> of behavior is considered a bug
> as from the user point of view (eg. Scilab beginners) each  
> assignation is supposed to have inputs
> (unmodified data) and outputs (computed data).
>
> Note: about the += operator, this is indeed a way to tell the user  
> that the argument is modified in
> place with a specific + operation.
>
> Thanks,
>
> --
> Clément
>
> Le jeudi 22 février 2018 à 14:18 +0100, Stéphane Mottelet a écrit :
>> Really, nobody knows ?
>>
>> S.
>>
>> Le 20/02/2018 à 11:57, Stéphane Mottelet a écrit :
>> Hello,
>>
>> Continuing on this subject, Hello, I discovered that the new Scilab  
>> API allows to modify input
>> parameters of a function (in-place assignment), e.g. I have  
>> modified the previous daxpy such
>> that the expression
>>
>> daxpy(2,X,Y)
>>
>> has no output but formally does "Y+=2*X" if such an operator would  
>> exist in Scilab. In this case
>> there is no matrix copy at all, hence no memory overhead.
>>
>> Was it possible to do this with the previous API ?
>>
>> S.
>>
>> Le 19/02/2018 à 19:15, Stéphane Mottelet a écrit :
>>> Hello,
>>>
>>> After some tests, for the intended use (multiply a matrix by a  
>>> scalar), dgemm is not faster
>>> that dscal, but in the C code of "iMultiRealScalarByRealMatrix",  
>>> the part which takes the most
>>> of the CPU time is the call to "dcopy". For example, on my  
>>> machine,  for a 10000x10000 matrix,
>>> the call to dcopy takes 540 milliseconds and the call to dscal 193  
>>> milliseconds. Continuing my
>>> explorations today, I tried to see how Scilab expressions such as
>>>
>>> Y+2*X
>>>
>>> are parsed and executed. To this purpose I have written an  
>>> interface (daxpy.sci and daxpy.c
>>> attached) to the BLAS function "daxpy" which does "y<-y+a*x" and a  
>>> script comparing the above
>>> expression to
>>>
>>> daxpy(2,X,Y)
>>>
>>> for two 10000x10000 matrices. Here are the results (MacBook air  
>>> core i7 at 1,7GHz):
>>>
>>>   daxpy(2,X,Y)
>>>   (dcopy: 582 ms)
>>>   (daxpy: 211 ms)
>>>
>>>   elapsed time: 793 ms
>>>
>>>   Y+2*X
>>>
>>>   elapsed time: 1574 ms
>>>
>>> Considered the above value, the explanation is that in "Y+2*X"  
>>> there are *two* copies of a
>>> 10000x10000 matrix instead of only one in "daxpy(2,X,Y)". In  
>>> "Y+2*X+3*Z" there will be three
>>> copies, although there could be only one if daxpy was used twice.
>>>
>>> I am not blaming Scilab here, I am just blaming "vectorization",  
>>> which can be inefficient when
>>> large objects           are used. That's why explicits loops can  
>>> sometimes be faster than
>>> vectorized operations in Matlab or Julia (which both use JIT compilation).
>>>
>>> S.
>>>
>>>
>>> Le 15/02/2018 à 17:11, Antoine ELIAS a écrit :
>>> > Hello Stéphane,
>>> >
>>> > Interesting ...
>>> >
>>> > In release, we don't ship the header of BLAS/LAPACK functions.
>>> > But you can define them in your C file as extern. ( and let the  
>>> linker do his job )
>>> >
>>> > extern int C2F(dgemm) (char *_pstTransA, char *_pstTransB, int  
>>> *_piN, int *_piM, int *_piK,
>>> > double *_pdblAlpha, double *_pdblA, int *_piLdA,
>>> >                        double *_pdblB, int *_piLdB, double  
>>> *_pdblBeta, double *_pdblC, int
>>> > *_piLdC);
>>> > and
>>> >
>>> > extern int C2F(dscal) (int *_iSize, double *_pdblVal, double  
>>> *_pdblDest, int *_iInc);
>>> >
>>> > Others BLAS/LAPACK prototypes can be found at  
>>> http://cgit.scilab.org/scilab/tree/scilab/modu
>>> > les/elementary_functions/includes/elem_common.h?h=6.0
>>> >
>>> > Regards,
>>> > Antoine
>>> > Le 15/02/2018 à 16:50, Stéphane Mottelet a écrit :
>>> > > Hello all,
>>> > >
>>> > > Following the recent discussion with fujimoto, I discovered  
>>> that Scilab does not (seem to)
>>> > > fully use SIMD operation in  BLAS as it should. Besides the  
>>> bottlenecks of its code, there
>>> > > are also many operations of the kind
>>> > >
>>> > > scalar*matrix
>>> > >
>>> > > Althoug this operation is correctly delegated to the DSCAL  
>>> BLAS function (can be seen in C
>>> > > function iMultiRealMatrixByRealMatrix in
>>> > > modules/ast/src/c/operations/matrix_multiplication.c) :
>>> > >
>>> > > > int iMultiRealScalarByRealMatrix(
>>> > > >     double _dblReal1,
>>> > > >     double *_pdblReal2,    int _iRows2, int _iCols2,
>>> > > >     double *_pdblRealOut)
>>> > > > {
>>> > > >     int iOne    = 1;
>>> > > >     int iSize2    = _iRows2 * _iCols2;
>>> > > >
>>> > > >     C2F(dcopy)(&iSize2, _pdblReal2, &iOne, _pdblRealOut, &iOne);
>>> > > >     C2F(dscal)(&iSize2, &_dblReal1, _pdblRealOut, &iOne);
>>> > > >     return 0;
>>> > > > }
>>> > >  in the code below the product "A*1" is likely using only one  
>>> processor core, as seen on
>>> > > the cpu usage graph and on the elapsed time,
>>> > >
>>> > > A=rand(20000,20000);
>>> > > tic; for i=1:10; A*1; end; toc
>>> > >
>>> > >  ans  =
>>> > >
>>> > >    25.596843
>>> > >
>>> > > but this second piece of code is more than 8 times faster and  
>>> uses 100% of the cpu,
>>> > >
>>> > > ONE=ones(20000,1);
>>> > > tic; for i=1:10; A*ONE; end; toc
>>> > >
>>> > >  ans  =
>>> > >
>>> > >    2.938314
>>> > >
>>> > > with roughly the same number of multiplications. This second  
>>> computation is delegated to
>>> > > DGEMM (C<-alpha*A*B + beta*C, here with alpha=1 and beta=0)
>>> > >
>>> > > > int iMultiRealMatrixByRealMatrix(
>>> > > >     double *_pdblReal1,    int _iRows1, int _iCols1,
>>> > > >     double *_pdblReal2,    int _iRows2, int _iCols2,
>>> > > >     double *_pdblRealOut)
>>> > > > {
>>> > > >     double dblOne        = 1;
>>> > > >     double dblZero        = 0;
>>> > > >
>>> > > >     C2F(dgemm)("n", "n", &_iRows1, &_iCols2, &_iCols1, &dblOne,
>>> > > >                _pdblReal1 , &_iRows1 ,
>>> > > >                _pdblReal2, &_iRows2, &dblZero,
>>> > > >                _pdblRealOut , &_iRows1);
>>> > > >     return 0;
>>> > > > }
>>> > >  Maybe my intuition is wrong, but I have the feeling that  
>>> using dgemm with alpha=0 will be
>>> > > faster than dscal. I plan to test this by making a quick and  
>>> dirty code linked to Scilab
>>> > > so my question to devs is : which are the #includes to add on  
>>> top of the source (C) to be
>>> > > able to call dgemm and dscal ?
>>> > >
>>> > > Thanks for your help
>>> > >
>>> > > S.
>>> > >
>>> > >
>>> >
>>> > _______________________________________________
>>> > dev mailing list
>>> > dev at lists.scilab.org
>>> > http://lists.scilab.org/mailman/listinfo/dev
>>>
>>>
>>>
>>> _______________________________________________
>>> dev mailing list
>>> dev at lists.scilab.org
>>> http://lists.scilab.org/mailman/listinfo/dev
>>
>> --
>> Stéphane Mottelet
>> Ingénieur de recherche
>> EA 4297 Transformations Intégrées de la Matière Renouvelable
>> Département Génie des Procédés Industriels
>> Sorbonne Universités - Université de Technologie de Compiègne
>> CS 60319, 60203 Compiègne cedex
>> Tel : +33(0)344234688
>> http://www.utc.fr/~mottelet
>>
>> _______________________________________________
>> dev mailing list
>> dev at lists.scilab.org
>> http://lists.scilab.org/mailman/listinfo/dev
>>
>> --
>> Stéphane Mottelet
>> Ingénieur de recherche
>> EA 4297 Transformations Intégrées de la Matière Renouvelable
>> Département Génie des Procédés Industriels
>> Sorbonne Universités - Université de Technologie de Compiègne
>> CS 60319, 60203 Compiègne cedex
>> Tel : +33(0)344234688
>> http://www.utc.fr/~mottelet
>> _______________________________________________
>> dev mailing list
>> dev at lists.scilab.org
>> http://lists.scilab.org/mailman/listinfo/dev
>
> _______________________________________________
> dev mailing list
> dev at lists.scilab.orghttp://lists.scilab.org/mailman/listinfo/dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.scilab.org/pipermail/dev/attachments/20180226/7dec0436/attachment.htm>