[Scilab-Dev] BLAS use in Scilab

Tue Feb 27 14:00:42 CET 2018

Hello Clément,

Concerning my other question about the new Scilab API

> Currently trying to port a Scilab interface to Scilab 6 standards, I 
> did try to find the equivalent of the deprecated Scifunction ? What is 
> the new function and where is documented ?
how does one call a scilab function from within a gateway code, since 
"SciFunction" seems to belong to the past now ?

Thanks in advance,

Stéphane

Le 26/02/2018 à 12:36, Clément David a écrit :
> Hello Stéphane,
>
> AFAIK, as in the old interface the new C++ one allow you to get pointers to the raw double* values
> and so allow you to write anything to the inputs. However this kind of behavior is considered a bug
> as from the user point of view (eg. Scilab beginners) each assignation is supposed to have inputs
> (unmodified data) and outputs (computed data).
>
> Note: about the += operator, this is indeed a way to tell the user that the argument is modified in
> place with a specific + operation.
>
> Thanks,
>
> --
> Clément
>
> Le jeudi 22 février 2018 à 14:18 +0100, Stéphane Mottelet a écrit :
>> Really, nobody knows ?
>>
>> S.
>>
>> Le 20/02/2018 à 11:57, Stéphane Mottelet a écrit :
>>> Hello,
>>>
>>> Continuing on this subject, Hello, I discovered that the new Scilab API allows to modify input
>>> parameters of a function (in-place assignment), e.g. I have modified the previous daxpy such
>>> that the expression
>>>
>>> daxpy(2,X,Y)
>>>
>>> has no output but formally does "Y+=2*X" if such an operator would exist in Scilab. In this case
>>> there is no matrix copy at all, hence no memory overhead.
>>>
>>> Was it possible to do this with the previous API ?
>>>
>>> S.
>>>
>>> Le 19/02/2018 à 19:15, Stéphane Mottelet a écrit :
>>>> Hello,
>>>>
>>>> After some tests, for the intended use (multiply a matrix by a scalar), dgemm is not faster
>>>> that dscal, but in the C code of "iMultiRealScalarByRealMatrix", the part which takes the most
>>>> of the CPU time is the call to "dcopy". For example, on my machine,  for a 10000x10000 matrix,
>>>> the call to dcopy takes 540 milliseconds and the call to dscal 193 milliseconds. Continuing my
>>>> explorations today, I tried to see how Scilab expressions such as
>>>>
>>>> Y+2*X
>>>>
>>>> are parsed and executed. To this purpose I have written an interface (daxpy.sci and daxpy.c
>>>> attached) to the BLAS function "daxpy" which does "y<-y+a*x" and a script comparing the above
>>>> expression to
>>>>
>>>> daxpy(2,X,Y)
>>>>
>>>> for two 10000x10000 matrices. Here are the results (MacBook air core i7 at 1,7GHz):
>>>>
>>>>   daxpy(2,X,Y)
>>>>   (dcopy: 582 ms)
>>>>   (daxpy: 211 ms)
>>>>
>>>>   elapsed time: 793 ms
>>>>
>>>>   Y+2*X
>>>>
>>>>   elapsed time: 1574 ms
>>>>
>>>> Considered the above value, the explanation is that in "Y+2*X" there are *two* copies of a
>>>> 10000x10000 matrix instead of only one in "daxpy(2,X,Y)". In "Y+2*X+3*Z" there will be three
>>>> copies, although there could be only one if daxpy was used twice.
>>>>
>>>> I am not blaming Scilab here, I am just blaming "vectorization", which can be inefficient when
>>>> large objects           are used. That's why explicits loops can sometimes be faster than
>>>> vectorized operations in Matlab or Julia (which both use JIT compilation).
>>>>
>>>> S.
>>>>
>>>>
>>>> Le 15/02/2018 à 17:11, Antoine ELIAS a écrit :
>>>>> Hello Stéphane,
>>>>>
>>>>> Interesting ...
>>>>>
>>>>> In release, we don't ship the header of BLAS/LAPACK functions.
>>>>> But you can define them in your C file as extern. ( and let the linker do his job )
>>>>>
>>>>> extern int C2F(dgemm) (char *_pstTransA, char *_pstTransB, int *_piN, int *_piM, int *_piK,
>>>>> double *_pdblAlpha, double *_pdblA, int *_piLdA,
>>>>>                         double *_pdblB, int *_piLdB, double *_pdblBeta, double *_pdblC, int
>>>>> *_piLdC);
>>>>> and
>>>>>
>>>>> extern int C2F(dscal) (int *_iSize, double *_pdblVal, double *_pdblDest, int *_iInc);
>>>>>
>>>>> Others BLAS/LAPACK prototypes can be found at http://cgit.scilab.org/scilab/tree/scilab/modu
>>>>> les/elementary_functions/includes/elem_common.h?h=6.0
>>>>>
>>>>> Regards,
>>>>> Antoine
>>>>> Le 15/02/2018 à 16:50, Stéphane Mottelet a écrit :
>>>>>> Hello all,
>>>>>>
>>>>>> Following the recent discussion with fujimoto, I discovered that Scilab does not (seem to)
>>>>>> fully use SIMD operation in  BLAS as it should. Besides the bottlenecks of its code, there
>>>>>> are also many operations of the kind
>>>>>>
>>>>>> scalar*matrix
>>>>>>
>>>>>> Althoug this operation is correctly delegated to the DSCAL BLAS function (can be seen in C
>>>>>> function iMultiRealMatrixByRealMatrix in
>>>>>> modules/ast/src/c/operations/matrix_multiplication.c) :
>>>>>>
>>>>>>> int iMultiRealScalarByRealMatrix(
>>>>>>>      double _dblReal1,
>>>>>>>      double *_pdblReal2,    int _iRows2, int _iCols2,
>>>>>>>      double *_pdblRealOut)
>>>>>>> {
>>>>>>>      int iOne    = 1;
>>>>>>>      int iSize2    = _iRows2 * _iCols2;
>>>>>>>
>>>>>>>      C2F(dcopy)(&iSize2, _pdblReal2, &iOne, _pdblRealOut, &iOne);
>>>>>>>      C2F(dscal)(&iSize2, &_dblReal1, _pdblRealOut, &iOne);
>>>>>>>      return 0;
>>>>>>> }
>>>>>>   in the code below the product "A*1" is likely using only one processor core, as seen on
>>>>>> the cpu usage graph and on the elapsed time,
>>>>>>
>>>>>> A=rand(20000,20000);
>>>>>> tic; for i=1:10; A*1; end; toc
>>>>>>
>>>>>>   ans  =
>>>>>>
>>>>>>     25.596843
>>>>>>
>>>>>> but this second piece of code is more than 8 times faster and uses 100% of the cpu,
>>>>>>
>>>>>> ONE=ones(20000,1);
>>>>>> tic; for i=1:10; A*ONE; end; toc
>>>>>>
>>>>>>   ans  =
>>>>>>
>>>>>>     2.938314
>>>>>>
>>>>>> with roughly the same number of multiplications. This second computation is delegated to
>>>>>> DGEMM (C<-alpha*A*B + beta*C, here with alpha=1 and beta=0)
>>>>>>
>>>>>>> int iMultiRealMatrixByRealMatrix(
>>>>>>>      double *_pdblReal1,    int _iRows1, int _iCols1,
>>>>>>>      double *_pdblReal2,    int _iRows2, int _iCols2,
>>>>>>>      double *_pdblRealOut)
>>>>>>> {
>>>>>>>      double dblOne        = 1;
>>>>>>>      double dblZero        = 0;
>>>>>>>
>>>>>>>      C2F(dgemm)("n", "n", &_iRows1, &_iCols2, &_iCols1, &dblOne,
>>>>>>>                 _pdblReal1 , &_iRows1 ,
>>>>>>>                 _pdblReal2, &_iRows2, &dblZero,
>>>>>>>                 _pdblRealOut , &_iRows1);
>>>>>>>      return 0;
>>>>>>> }
>>>>>>   Maybe my intuition is wrong, but I have the feeling that using dgemm with alpha=0 will be
>>>>>> faster than dscal. I plan to test this by making a quick and dirty code linked to Scilab
>>>>>> so my question to devs is : which are the #includes to add on top of the source (C) to be
>>>>>> able to call dgemm and dscal ?
>>>>>>
>>>>>> Thanks for your help
>>>>>>
>>>>>> S.
>>>>>>
>>>>>>
>>>>>   
>>>>> _______________________________________________
>>>>> dev mailing list
>>>>> dev at lists.scilab.org
>>>>> http://lists.scilab.org/mailman/listinfo/dev
>>>>
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> dev at lists.scilab.org
>>>> http://lists.scilab.org/mailman/listinfo/dev
>>> -- 
>>> Stéphane Mottelet
>>> Ingénieur de recherche
>>> EA 4297 Transformations Intégrées de la Matière Renouvelable
>>> Département Génie des Procédés Industriels
>>> Sorbonne Universités - Université de Technologie de Compiègne
>>> CS 60319, 60203 Compiègne cedex
>>> Tel : +33(0)344234688
>>> http://www.utc.fr/~mottelet
>>>
>>>
>>> _______________________________________________
>>> dev mailing list
>>> dev at lists.scilab.org
>>> http://lists.scilab.org/mailman/listinfo/dev
>> -- 
>> Stéphane Mottelet
>> Ingénieur de recherche
>> EA 4297 Transformations Intégrées de la Matière Renouvelable
>> Département Génie des Procédés Industriels
>> Sorbonne Universités - Université de Technologie de Compiègne
>> CS 60319, 60203 Compiègne cedex
>> Tel : +33(0)344234688
>> http://www.utc.fr/~mottelet
>> _______________________________________________
>> dev mailing list
>> dev at lists.scilab.org
>> http://lists.scilab.org/mailman/listinfo/dev
> _______________________________________________
> dev mailing list
> dev at lists.scilab.org
> http://lists.scilab.org/mailman/listinfo/dev

-- 
Stéphane Mottelet
Ingénieur de recherche
EA 4297 Transformations Intégrées de la Matière Renouvelable
Département Génie des Procédés Industriels
Sorbonne Universités - Université de Technologie de Compiègne
CS 60319, 60203 Compiègne cedex
Tel : +33(0)344234688
http://www.utc.fr/~mottelet