<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"

"http://www.w3.org/TR/REC-html40/loose.dtd">

<html>

<head>

<meta http-equiv="content-type" content="text/html; charset=utf-8">

<title></title>

</head>

<body style="font-family:Arial;font-size:14px">

<p>Hello,<br>

<br>

Thank you for your answer. Would it have an interest to implement such an operator (and, as a consequence, open a kind of Pandora's box..) ?<br>

<br>

S.<br>

<br>

Quoting Clément David <<a href="mailto:Clement.David@esi-group.com">Clement.David@esi-group.com</a>>:</p>

<blockquote style="border-left:2px solid blue;margin-left:2px;padding-left:12px;" type="cite">

<p>Hello Stéphane,<br>

<br>

AFAIK, as in the old interface the new C++ one allow you to get pointers to the raw double* values<br>

and so allow you to write anything to the inputs. However this kind of behavior is considered a bug<br>

as from the user point of view (eg. Scilab beginners) each assignation is supposed to have inputs<br>

(unmodified data) and outputs (computed data).<br>

<br>

Note: about the += operator, this is indeed a way to tell the user that the argument is modified in<br>

place with a specific + operation.<br>

<br>

Thanks,<br>

<br>

--<br>

Clément<br>

<br>

Le jeudi 22 février 2018 à 14:18 +0100, Stéphane Mottelet a écrit :</p>

<blockquote style="border-left:2px solid blue;margin-left:2px;padding-left:12px;" type="cite">

<p>Really, nobody knows ?<br>

<br>

S.<br>

<br>

Le 20/02/2018 à 11:57, Stéphane Mottelet a écrit :<br>

Hello,<br>

<br>

Continuing on this subject, Hello, I discovered that the new Scilab API allows to modify input<br>

parameters of a function (in-place assignment), e.g. I have modified the previous daxpy such<br>

that the expression<br>

<br>

daxpy(2,X,Y)<br>

<br>

has no output but formally does "Y+=2*X" if such an operator would exist in Scilab. In this case<br>

there is no matrix copy at all, hence no memory overhead.<br>

<br>

Was it possible to do this with the previous API ?<br>

<br>

S.<br>

<br>

Le 19/02/2018 à 19:15, Stéphane Mottelet a écrit :<br>

> Hello,<br>

><br>

> After some tests, for the intended use (multiply a matrix by a scalar), dgemm is not faster<br>

> that dscal, but in the C code of "iMultiRealScalarByRealMatrix", the part which takes the most<br>

> of the CPU time is the call to "dcopy". For example, on my machine,  for a 10000x10000 matrix,<br>

> the call to dcopy takes 540 milliseconds and the call to dscal 193 milliseconds. Continuing my<br>

> explorations today, I tried to see how Scilab expressions such as<br>

><br>

> Y+2*X<br>

><br>

> are parsed and executed. To this purpose I have written an interface (daxpy.sci and daxpy.c<br>

> attached) to the BLAS function "daxpy" which does "y<-y+a*x" and a script comparing the above<br>

> expression to<br>

><br>

> daxpy(2,X,Y)<br>

><br>

> for two 10000x10000 matrices. Here are the results (MacBook air core <a href="mailto:i7@1">i7@1</a>,7GHz):<br>

><br>

>  daxpy(2,X,Y)<br>

>  (dcopy: 582 ms)<br>

>  (daxpy: 211 ms)<br>

><br>

>  elapsed time: 793 ms<br>

><br>

>  Y+2*X<br>

><br>

>  elapsed time: 1574 ms<br>

><br>

> Considered the above value, the explanation is that in "Y+2*X" there are *two* copies of a<br>

> 10000x10000 matrix instead of only one in "daxpy(2,X,Y)". In "Y+2*X+3*Z" there will be three<br>

> copies, although there could be only one if daxpy was used twice.<br>

><br>

> I am not blaming Scilab here, I am just blaming "vectorization", which can be inefficient when<br>

> large objects           are used. That's why explicits loops can sometimes be faster than<br>

> vectorized operations in Matlab or Julia (which both use JIT compilation).<br>

><br>

> S.<br>

><br>

><br>

> Le 15/02/2018 à 17:11, Antoine ELIAS a écrit :<br>

> > Hello Stéphane,<br>

> ><br>

> > Interesting ...<br>

> ><br>

> > In release, we don't ship the header of BLAS/LAPACK functions.<br>

> > But you can define them in your C file as extern. ( and let the linker do his job )<br>

> ><br>

> > extern int C2F(dgemm) (char *_pstTransA, char *_pstTransB, int *_piN, int *_piM, int *_piK,<br>

> > double *_pdblAlpha, double *_pdblA, int *_piLdA,<br>

> >                        double *_pdblB, int *_piLdB, double *_pdblBeta, double *_pdblC, int<br>

> > *_piLdC);<br>

> > and<br>

> ><br>

> > extern int C2F(dscal) (int *_iSize, double *_pdblVal, double *_pdblDest, int *_iInc);<br>

> ><br>

> > Others BLAS/LAPACK prototypes can be found at <a href="http://cgit.scilab.org/scilab/tree/scilab/modu" target="_blank">http://cgit.scilab.org/scilab/tree/scilab/modu</a><br>

> > les/elementary_functions/includes/elem_common.h?h=6.0<br>

> ><br>

> > Regards,<br>

> > Antoine<br>

> > Le 15/02/2018 à 16:50, Stéphane Mottelet a écrit :<br>

> > > Hello all,<br>

> > ><br>

> > > Following the recent discussion with fujimoto, I discovered that Scilab does not (seem to)<br>

> > > fully use SIMD operation in  BLAS as it should. Besides the bottlenecks of its code, there<br>

> > > are also many operations of the kind<br>

> > ><br>

> > > scalar*matrix<br>

> > ><br>

> > > Althoug this operation is correctly delegated to the DSCAL BLAS function (can be seen in C<br>

> > > function iMultiRealMatrixByRealMatrix in<br>

> > > modules/ast/src/c/operations/matrix_multiplication.c) :<br>

> > ><br>

> > > > int iMultiRealScalarByRealMatrix(<br>

> > > >     double _dblReal1,<br>

> > > >     double *_pdblReal2,    int _iRows2, int _iCols2,<br>

> > > >     double *_pdblRealOut)<br>

> > > > {<br>

> > > >     int iOne    = 1;<br>

> > > >     int iSize2    = _iRows2 * _iCols2;<br>

> > > ><br>

> > > >     C2F(dcopy)(&iSize2, _pdblReal2, &iOne, _pdblRealOut, &iOne);<br>

> > > >     C2F(dscal)(&iSize2, &_dblReal1, _pdblRealOut, &iOne);<br>

> > > >     return 0;<br>

> > > > }<br>

> > >  in the code below the product "A*1" is likely using only one processor core, as seen on<br>

> > > the cpu usage graph and on the elapsed time,<br>

> > ><br>

> > > A=rand(20000,20000);<br>

> > > tic; for i=1:10; A*1; end; toc<br>

> > ><br>

> > >  ans  =<br>

> > ><br>

> > >    25.596843<br>

> > ><br>

> > > but this second piece of code is more than 8 times faster and uses 100% of the cpu,<br>

> > ><br>

> > > ONE=ones(20000,1);<br>

> > > tic; for i=1:10; A*ONE; end; toc<br>

> > ><br>

> > >  ans  =<br>

> > ><br>

> > >    2.938314<br>

> > ><br>

> > > with roughly the same number of multiplications. This second computation is delegated to<br>

> > > DGEMM (C<-alpha*A*B + beta*C, here with alpha=1 and beta=0)<br>

> > ><br>

> > > > int iMultiRealMatrixByRealMatrix(<br>

> > > >     double *_pdblReal1,    int _iRows1, int _iCols1,<br>

> > > >     double *_pdblReal2,    int _iRows2, int _iCols2,<br>

> > > >     double *_pdblRealOut)<br>

> > > > {<br>

> > > >     double dblOne        = 1;<br>

> > > >     double dblZero        = 0;<br>

> > > ><br>

> > > >     C2F(dgemm)("n", "n", &_iRows1, &_iCols2, &_iCols1, &dblOne,<br>

> > > >                _pdblReal1 , &_iRows1 ,<br>

> > > >                _pdblReal2, &_iRows2, &dblZero,<br>

> > > >                _pdblRealOut , &_iRows1);<br>

> > > >     return 0;<br>

> > > > }<br>

> > >  Maybe my intuition is wrong, but I have the feeling that using dgemm with alpha=0 will be<br>

> > > faster than dscal. I plan to test this by making a quick and dirty code linked to Scilab<br>

> > > so my question to devs is : which are the #includes to add on top of the source (C) to be<br>

> > > able to call dgemm and dscal ?<br>

> > ><br>

> > > Thanks for your help<br>

> > ><br>

> > > S.<br>

> > ><br>

> > ><br>

> ><br>

> > _______________________________________________<br>

> > dev mailing list<br>

> > <a href="mailto:dev@lists.scilab.org">dev@lists.scilab.org</a><br>

> > <a href="http://lists.scilab.org/mailman/listinfo/dev" target="_blank">http://lists.scilab.org/mailman/listinfo/dev</a><br>

><br>

><br>

><br>

> _______________________________________________<br>

> dev mailing list<br>

> <a href="mailto:dev@lists.scilab.org">dev@lists.scilab.org</a><br>

> <a href="http://lists.scilab.org/mailman/listinfo/dev" target="_blank">http://lists.scilab.org/mailman/listinfo/dev</a><br>

<br>

--<br>

Stéphane Mottelet<br>

Ingénieur de recherche<br>

EA 4297 Transformations Intégrées de la Matière Renouvelable<br>

Département Génie des Procédés Industriels<br>

Sorbonne Universités - Université de Technologie de Compiègne<br>

CS 60319, 60203 Compiègne cedex<br>

Tel : +33(0)344234688<br>

<a href="http://www.utc.fr/~mottelet" target="_blank">http://www.utc.fr/~mottelet</a><br>

<br>

<br>

_______________________________________________<br>

dev mailing list<br>

<a href="mailto:dev@lists.scilab.org">dev@lists.scilab.org</a><br>

<a href="http://lists.scilab.org/mailman/listinfo/dev" target="_blank">http://lists.scilab.org/mailman/listinfo/dev</a><br>

<br>

--<br>

Stéphane Mottelet<br>

Ingénieur de recherche<br>

EA 4297 Transformations Intégrées de la Matière Renouvelable<br>

Département Génie des Procédés Industriels<br>

Sorbonne Universités - Université de Technologie de Compiègne<br>

CS 60319, 60203 Compiègne cedex<br>

Tel : +33(0)344234688<br>

<a href="http://www.utc.fr/~mottelet" target="_blank">http://www.utc.fr/~mottelet</a><br>

_______________________________________________<br>

dev mailing list<br>

<a href="mailto:dev@lists.scilab.org">dev@lists.scilab.org</a><br>

<a href="http://lists.scilab.org/mailman/listinfo/dev" target="_blank">http://lists.scilab.org/mailman/listinfo/dev</a></p>

</blockquote>

_______________________________________________<br>

dev mailing list<br>

<a href="mailto:dev@lists.scilab">dev@lists.scilab</a>.<a href="orghttp://lists.scilab.org/mailman/listinfo/dev" target="_blank">orghttp://lists.scilab.org/mailman/listinfo/dev</a></blockquote>

<p><br>

<br></p>

</body>

</html>