[Scilab-users] HDF5 save is super slow

amonmayr at laas.fr amonmayr at laas.fr
Tue Oct 16 09:52:11 CEST 2018


On 15/10/2018 at 18:17, Arvid Rosén wrote:
>
> Hi again,
>
> I just filed a bug report here:
>
> http://bugzilla.scilab.org/show_bug.cgi?id=15809
>
> Would it be possible to bring back the old mem-dump approach in Scilab 6?
>
Couldn't you create your own ATOMS package that restores this raw memory
dump for Scilab 6.0?
I understand why we moved away from this model, but it seems to be key
for you.
There is always a trade-off between portability (and robustness) and raw
speed...
>
> I mean, could I write a gateway that just takes a pointer to the first
> byte in memory, figures out the size, and dumps it to disk? Or maybe it
> doesn’t work like that. Writing a JSON exporter for storing filter
> coefficients in a math software package seems a bit ridiculous, but
> hey, if it works it might be worth it in our case.
>
I was also wondering whether this could be done in HDF5, i.e. serialize
your structure and dump it into HDF5?
We use HDF5 with LabVIEW, and for some horrible structures (like arrays of
clusters containing lots of elements of different types), we just turn
them into a byte stream and dump the stream into an HDF5 dataset.
We then retrieve it and rebuild the structure (knowing its shape).
Could this be implemented in Scilab 6?
Would the only missing pieces be the conversion from an arbitrary
variable to a byte stream, and the way back?
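For the filters in this thread, whose shape is known in advance, the byte-stream idea can be tried with the existing h5open/h5write/h5read functions: pack every filter into one flat double vector, write it as a single HDF5 dataset, and rebuild the list after reading it back. This is a minimal sketch under stated assumptions, not a general serializer: every element is assumed to be a single-input, single-output syslin('c', A, B, C, D) with A of size N x N, and the file name and the dataset name "packed" are arbitrary choices. It has not been timed here; the only expectation is that a single dataset avoids the one-group-per-container cost Stéphane pointed to.

```scilab
// Minimal sketch, NOT a general serializer: pack a list of state-space
// filters into ONE flat double vector, store it as a single HDF5 dataset,
// then rebuild the list.
// ASSUMPTION: every element is a SISO syslin('c', A, B, C, D), A is N x N.
N = 4;
n = 1000;
filters = list();
for i = 1:n
    filters($+1) = syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
end

// Serialize: each filter becomes the (N+1)x(N+1) block [A B; C D],
// flattened column-wise and copied into its slot of one long vector.
blk = (N+1)^2;
packed = zeros(blk*n, 1);
for i = 1:n
    G = filters(i);
    M = [G.A G.B; G.C G.D];
    packed((i-1)*blk+1 : i*blk) = M(:);
end

// Write a single dataset (file and dataset names are arbitrary choices).
fname = TMPDIR + "/filters_packed.h5";
h5f = h5open(fname, "w");
h5write(h5f, "packed", packed);
h5close(h5f);

// Read back and rebuild the list from the known block shape.
h5f = h5open(fname, "r");
raw = h5read(h5f, "packed");
h5close(h5f);
raw = raw(:); // a vector keeps its element order even if returned as a row
restored = list();
for i = 1:n
    M = matrix(raw((i-1)*blk+1 : i*blk), N+1, N+1);
    restored($+1) = syslin('c', M(1:N,1:N), M(1:N,N+1), M(N+1,1:N), M(N+1,N+1));
end
```

A truly generic version would also have to record each element's type and dimensions alongside the data, which is exactly the "any variable to byte stream" piece discussed above.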

Antoine
>
> Cheers,
>
> Arvid
>
> *From: *users <users-bounces at lists.scilab.org> on behalf of Clément 
> DAVID <clement.david at scilab-enterprises.com>
> *Reply-To: *Users mailing list for Scilab <users at lists.scilab.org>
> *Date: *Monday, 15 October 2018 at 15:48
> *To: *Users mailing list for Scilab <users at lists.scilab.org>
> *Cc: *Clément David <Clement.David at esi-group.com>
> *Subject: *Re: [Scilab-users] HDF5 save is super slow
>
> Hello all,
>
> Correct, I experienced similar slowness while working with Xcos
> diagrams for Scilab 5. At first we considered HDF5 for storing these
> deeply nested list/mlist data structures; however, after some tests,
> it seemed that XML might be used for tree-like storage and HDF5 (or
> Java type serialization) for big matrices.
>
> AFAIK there is currently no easy way to load/save in a format other
> than HDF5; maybe adding an xmlSave/xmlLoad sci_gateway, letting the
> user select an XML file format for any Scilab structure, might provide
> better performance for your use case. JSON might also be another
> candidate to look at for decent serialization support.
>
> PS: Scilab 5.5.1 load/save is a direct memory dump, so it is really
> the fastest you can get from Scilab; the HDF5 binary format is good
> enough for matrices.
>
> --
>
> Clément
>
> *From:*users <users-bounces at lists.scilab.org> *On Behalf Of *Stéphane 
> Mottelet
> *Sent:* Monday, October 15, 2018 2:36 PM
> *To:* users at lists.scilab.org
> *Subject:* Re: [Scilab-users] HDF5 save is super slow
>
> Hello,
>
> I looked a little at the sources: the obvious bottleneck is the
> nested creation of an HDF5 group each time a container variable
> is encountered.
> For the given example, this is particularly evident. If you replace
> the syslin structure by the corresponding [A,B;C,D] matrix, then save
> is almost ten times faster (0.72 s vs 0.08 s):
>
> N = 4;
> n = 1000;
> filters = list();
> for i=1:n
>   G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
>   filters($+1) = G;
> end
> tic();
> save('filters.dat', 'filters');
> disp(toc());
> --> disp(toc());
>
>    0.724754
>
> N = 4;
> n = 1000;
> filters = list();
> for i=1:n
>   G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
>   filters($+1) = [G.a G.b;G.c G.d];
> end
> tic();
> save('filters.dat', 'filters');
> disp(toc());
> --> disp(toc());
>
>    0.082302
>
> Serializing container objects seems to be the solution, but it goes
> against the portability spirit of HDF5.
>
> S.
>
>
> On 15/10/2018 at 12:22, Antoine Monmayrant wrote:
>
>     On 15/10/2018 at 11:55, Arvid Rosén wrote:
>
>         Hi,
>
>         Thanks for getting back to me!
>
>         Unfortunately, we used Scilab’s pretty cool way of doing
>         object orientation, so we have big nested tlist structures
>         with multiple instances of various lists of filters and other
>         structures, as in my example. Saving those structures in some
>         explicit manual way would be extremely complicated. Or is
>         there some way of writing explicit HDF5 saving/loading schemes
>         using overloading? That would be great! I am sure we could
>         find the main culprits and do something explicit for them, but
>         as they can be located anywhere in a big nested structure, it
>         would be painful to do anything at the top level.
>
>         Another problem, related I guess, is that the new file
>         format uses about 15 times as much disk space as the old
>         format (for a typical ill-behaved nested structure). That adds
>         to the save/load time too, I guess, but is probably not the
>         main cause here.
>
>     Argh, yes, I tested it and with your example I get a file 8.5 times bigger.
>     I think that both increases in time and size are real issues and
>     should be reported as bugs.
>
>     By the way, I rewrote your script to run it under both 6.0 and 5.5:
>
>     /////////////////////////////////
>     N = 4;
>     n = 10000;
>     filters = list();
>
>     for i=1:n
>       G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
>       filters($+1) = G;
>     end
>
>     ver=getversion('scilab');
>
>     if ver(1)<6 then
>         tic();
>         save('filters_old.dat', filters);
>         ts1 = toc();
>     else
>         tic();
>         save('filters_new.dat', 'filters');
>         ts1 = toc();
>     end
>
>     printf("Time for save %.2fs\n", ts1);
>     /////////////////////////////////
>
>     Hope it helps,
>
>     Antoine
>
>
>
>         I think I might have reported this earlier using Bugzilla, but
>         I’m not sure. I’ll check and report it if not.
>
>         Cheers,
>
>         Arvid
>
>         *From: *users <users-bounces at lists.scilab.org> on behalf of
>         "amonmayr at laas.fr" <amonmayr at laas.fr>
>         *Reply-To: *"antoine.monmayrant at laas.fr"
>         <antoine.monmayrant at laas.fr>, Users mailing list for
>         Scilab <users at lists.scilab.org>
>         *Date: *Monday, 15 October 2018 at 11:08
>         *To: *"users at lists.scilab.org" <users at lists.scilab.org>
>         *Subject: *Re: [Scilab-users] HDF5 save is super slow
>
>         Hello,
>
>         I tried your code in 5.5.1 and the latest nightly build of 6.0:
>         I see a slowdown factor of around 175 between the old save in
>         5.5.1 and the new (and only) save in 6.0.
>         It's really related to the data structure, because we use HDF5
>         read/write a lot here and did not experience significant
>         slowdowns using 6.0.
>         I think the overhead might come from the translation of your
>         fairly complex variable (a long list of tlists) into the
>         corresponding HDF5 structure.
>         In the old save, this translation was not necessary.
>         Maybe you could try to save your data in a different way.
>         For example:
>         1) you could save each element of "filters" in a separate file.
>         2) you could bypass save and write your data directly into an
>         HDF5 file using h5open() and h5write(). It means you need to
>         write your own load() for your custom file format, but this
>         way you can try to find the best way to lay out your data in
>         HDF5 format.
>         3) in addition to 2), you could try to save each entry of your
>         "filters" list as one dataset in a given HDF5 file.
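Option 3 might look roughly like the following sketch. Assumptions, not taken from the thread: each element of "filters" is a single-input, single-output syslin('c', A, B, C, D) with A of size N x N, each [A B; C D] block is stored flattened in its own dataset, and the dataset names "filter_1", "filter_2", ... are an arbitrary convention. As noted in 2), the custom loader has to know N to rebuild the filters.

```scilab
// Sketch of option 3 (hypothetical layout): one HDF5 dataset per filter.
// ASSUMPTION: SISO filters built with syslin('c', A, B, C, D), A is N x N.
N = 4;
n = 100;
filters = list();
for i = 1:n
    filters($+1) = syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
end

fname = TMPDIR + "/filters_per_dataset.h5";
h5f = h5open(fname, "w");
for i = 1:n
    G = filters(i);
    M = [G.A G.B; G.C G.D];
    // Store the block flattened so its orientation on disk does not matter.
    h5write(h5f, "filter_" + string(i), M(:));
end
h5close(h5f);

// Custom load(): one h5read per dataset, then rebuild each syslin.
h5f = h5open(fname, "r");
restored = list();
for i = 1:n
    v = h5read(h5f, "filter_" + string(i));
    M = matrix(v(:), N+1, N+1);
    restored($+1) = syslin('c', M(1:N,1:N), M(1:N,N+1), M(N+1,1:N), M(N+1,N+1));
end
h5close(h5f);
```

This has not been benchmarked; each dataset still carries its own HDF5 metadata, so whether it beats save() for 10000 entries would need to be measured.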
>
>         Did you search Bugzilla to see whether this bug was already reported?
>         Could you try to report it?
>
>
>         Antoine
>
>         On 15/10/2018 at 10:11, Arvid Rosén wrote:
>
>             /////////////////////////////////
>             N = 4;
>             n = 10000;
>             filters = list();
>             for i=1:n
>               G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
>               filters($+1) = G;
>             end
>             // Scilab 5.x legacy syntax (variable passed directly):
>             // writes the old binary format
>             tic();
>             save('filters.dat', filters);
>             ts1 = toc();
>             // Variable name passed as a string: writes HDF5 (SOD)
>             tic();
>             save('filters.dat', 'filters');
>             ts2 = toc();
>             printf("old save %.2fs\n", ts1);
>             printf("new save %.2fs\n", ts2);
>             printf("slowdown %.1f\n", ts2/ts1);
>             /////////////////////////////////
>
>         -- 
>
>         +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>           
>
>           Antoine Monmayrant LAAS - CNRS
>
>           7 avenue du Colonel Roche
>
>           BP 54200
>
>           31031 TOULOUSE Cedex 4
>
>           FRANCE
>
>           
>
>           Tel:+33  5 61 33 64 59
>
>           
>
>           email :antoine.monmayrant at laas.fr <mailto:antoine.monmayrant at laas.fr>
>
>           permanent email :antoine.monmayrant at polytechnique.org
>         <mailto:antoine.monmayrant at polytechnique.org>
>
>           
>
>         +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>           
>
>
>
>
>
>     _______________________________________________
>
>     users mailing list
>
>     users at lists.scilab.org <mailto:users at lists.scilab.org>
>
>     https://antispam.utc.fr/proxy/1/c3RlcGhhbmUubW90dGVsZXRAdXRjLmZy/lists.scilab.org/mailman/listinfo/users
>
> -- 
> Stéphane Mottelet
> Ingénieur de recherche
> EA 4297 Transformations Intégrées de la Matière Renouvelable
> Département Génie des Procédés Industriels
> Sorbonne Universités - Université de Technologie de Compiègne
> CS 60319, 60203 Compiègne cedex
> Tel : +33(0)344234688
> http://www.utc.fr/~mottelet <http://www.utc.fr/%7Emottelet>
>


