[Scilab-users] HDF5 save is super slow

Stéphane Mottelet stephane.mottelet at utc.fr
Mon Oct 15 14:36:12 CEST 2018


Hello,

I had a quick look at the sources: the obvious bottleneck is that a
nested hdf5 group is created each time a container variable is
encountered. The given example makes this particularly visible. If you
replace the syslin structure by the corresponding [A,B;C,D] matrix, then
save is about nine times faster:

N = 4;
n = 1000;
filters = list();
for i=1:n
   G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
   filters($+1) = G;
end
tic();
save('filters.dat', 'filters');
disp(toc());
--> disp(toc());

    0.724754

N = 4;
n = 1000;
filters = list();
for i=1:n
   G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
   filters($+1) = [G.a G.b;G.c G.d];
end
tic();
save('filters.dat', 'filters');
disp(toc());
--> disp(toc());

    0.082302
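
If the packed form is what gets saved, the syslin objects have to be
rebuilt after loading. A minimal sketch of that round trip follows,
assuming single-input single-output continuous-time systems as in the
example above. pack_syslin and unpack_syslin are hypothetical helper
names (not Scilab functions), the file name is arbitrary, and the
initial state stored in the syslin object is not preserved.

```scilab
// Hypothetical helpers (not part of Scilab): flatten a SISO syslin
// into its [A B; C D] block and rebuild it afterwards.
function M = pack_syslin(G)
    M = [G.A G.B; G.C G.D];
endfunction

function G = unpack_syslin(M)
    // Assumes one input and one output, so the state dimension
    // is the block size minus one. The time domain is assumed 'c'.
    N = size(M, 1) - 1;
    G = syslin('c', M(1:N, 1:N), M(1:N, $), M($, 1:N), M($, $));
endfunction

N = 4;
n = 1000;
packed = list();
for i = 1:n
    G = syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
    packed($+1) = pack_syslin(G);
end

fname = TMPDIR + '/filters_packed.dat';   // arbitrary file name
save(fname, 'packed');

clear packed;
load(fname);                  // restores the variable "packed"
filters = list();
for i = 1:length(packed)
    filters($+1) = unpack_syslin(packed(i));
end
```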

Serializing container objects seems to be the solution, but it runs
counter to the hdf5 spirit of portability.
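
One possible reading of "serializing", combined with the
h5open()/h5write() route suggested further down in the thread, is to
store every filter as one column of a single numeric matrix, so that the
file holds one dataset instead of thousands of groups. This is only a
sketch under the same SISO assumption; the dataset name "filters" and
the file name are arbitrary choices, and the orientation of the matrix
read back should be checked before relying on it.

```scilab
N = 4;
n = 1000;

// One column per filter: the (N+1)x(N+1) block [A B; C D], flattened.
data = zeros((N+1)^2, n);
for i = 1:n
    G = syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
    M = [G.A G.B; G.C G.D];
    data(:, i) = M(:);
end

fname = TMPDIR + "/filters_flat.h5";      // arbitrary file name

// Write everything as a single dataset.
f = h5open(fname, "w");
h5write(f, "filters", data);
h5close(f);

// Read the dataset back; rebuilding the syslin objects from the
// columns is left to a custom loader.
f = h5open(fname, "r");
back = h5read(f, "/filters");
h5close(f);
```

The price is the one mentioned above: a reader in another tool sees an
anonymous array and has to know the layout to make sense of it.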

S.


On 15/10/2018 at 12:22, Antoine Monmayrant wrote:
> On 15/10/2018 at 11:55, Arvid Rosén wrote:
>>
>> Hi,
>>
>> Thanks for getting back to me!
>>
>> Unfortunately, we used Scilab’s pretty cool way of doing object 
>> orientation, so we have big nested tlist structures with multiple 
>> instances of various lists of filters and other structures, as in my 
>> example. Saving those structures in some explicit manual way would be 
>> extremely complicated. Or is there some way of writing explicit HDF5 
>> saving/loading schemes using overloading? That would be great! I am 
>> sure we could find the main culprits and do something explicit for 
>> them, but as they can be located wherever in a big nested structure, 
>> it would be painful to do anything on the top level.
>>
>> Another, related I guess, problem here is that the new file format 
>> uses about 15 times as much disk space as the old format (for a 
>> typical ill-behaved nested structure). That adds to the save/load 
>> time too I guess, but is probably not the main source here.
>>
> Argh, yes, I tested it and with your example I get a file 8.5 times
> bigger. I think that both the increase in time and the increase in
> size are real issues and should be reported as bugs.
>
> By the way, I rewrote your script to run it under both 6.0 and 5.5:
>
> /////////////////////////////////
> N = 4;
> n = 10000;
> filters = list();
>
> for i=1:n
>   G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
>   filters($+1) = G;
> end
>
> ver=getversion('scilab');
>
> if ver(1)<6 then
>     tic();
>     save('filters_old.dat', filters);
>     ts1 = toc();
> else
>     tic();
>     save('filters_new.dat', 'filters');
>     ts1 = toc();
> end
>
> printf("Time for save %.2fs\n", ts1);
> /////////////////////////////////
>
> Hope it helps,
>
> Antoine
>
>> I think I might have reported this earlier using Bugzilla, but I’m 
>> not sure. I’ll check and report it if not.
>>
>> Cheers,
>>
>> Arvid
>>
>> *From: *users <users-bounces at lists.scilab.org> on behalf of 
>> "amonmayr at laas.fr" <amonmayr at laas.fr>
>> *Reply-To: *"antoine.monmayrant at laas.fr" 
>> <antoine.monmayrant at laas.fr>, Users mailing list for Scilab 
>> <users at lists.scilab.org>
>> *Date: *Monday, 15 October 2018 at 11:08
>> *To: *"users at lists.scilab.org" <users at lists.scilab.org>
>> *Subject: *Re: [Scilab-users] HDF5 save is super slow
>>
>> Hello,
>>
>> I tried your code in 5.5.1 and in the latest nightly build of 6.0: I
>> see a slowdown by a factor of around 175 between the old save in 5.5.1
>> and the new (and only) save in 6.0.
>> It's really related to the data structure, because we use hdf5 
>> read/write a lot here and did not experience significant slowdowns 
>> using 6.0.
>> I think the overhead might come from the translation of your fairly
>> complex variable (a long list of tlists) into the corresponding hdf5
>> structure.
>> In the old save, this translation was not necessary.
>> Maybe you could try to save your data in a different way.
>> For example:
>> 1) you could save each element of "filters" in a separate file.
>> 2) you could bypass save and directly write your data in a hdf5 file 
>> by using h5open(), h5write() directly. It means you need to write 
>> your own load() for your custom file format. But this way, you can 
>> try to find the best way to layout your data in hdf5 format.
>> 3) in addition to 2) you could try to save each entry of your 
>> "filters" array as one dataset in a given hdf5 file.
>>
>> Did you search on bugzilla whether this bug was already submitted?
>> Could you try to report it?
>>
>>
>> Antoine
>>
>> On 15/10/2018 at 10:11, Arvid Rosén wrote:
>>
>>     /////////////////////////////////
>>     N = 4;
>>     n = 10000;
>>     filters = list();
>>
>>     for i=1:n
>>       G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
>>       filters($+1) = G;
>>     end
>>
>>     tic();
>>     save('filters.dat', filters);
>>     ts1 = toc();
>>
>>     tic();
>>     save('filters.dat', 'filters');
>>     ts2 = toc();
>>
>>     printf("old save %.2fs\n", ts1);
>>     printf("new save %.2fs\n", ts2);
>>     printf("slowdown %.1f\n", ts2/ts1);
>>     /////////////////////////////////
>>
>> -- 
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   
>>   Antoine Monmayrant LAAS - CNRS
>>   7 avenue du Colonel Roche
>>   BP 54200
>>   31031 TOULOUSE Cedex 4
>>   FRANCE
>>   
>>   Tel:+33  5 61 33 64 59
>>   
>>   email :antoine.monmayrant at laas.fr
>>   permanent email :antoine.monmayrant at polytechnique.org
>>   
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   
>
>


-- 
Stéphane Mottelet
Ingénieur de recherche
EA 4297 Transformations Intégrées de la Matière Renouvelable
Département Génie des Procédés Industriels
Sorbonne Universités - Université de Technologie de Compiègne
CS 60319, 60203 Compiègne cedex
Tel : +33(0)344234688
http://www.utc.fr/~mottelet
