[Scilab-users] HDF5 save is super slow
Stéphane Mottelet
stephane.mottelet at utc.fr
Mon Oct 15 14:36:12 CEST 2018
Hello,
I had a quick look at the sources: the obvious bottleneck is the
nested creation of an HDF5 group each time a container variable is
encountered. The given example shows this clearly. If you replace the
syslin structure with the corresponding [A,B;C,D] matrix, then save is
almost ten times faster (about 8.8 times in the timings below):
N = 4;
n = 1000;
filters = list();
for i=1:n
G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
filters($+1) = G;
end
tic();
save('filters.dat', 'filters');
disp(toc());
--> disp(toc());
0.724754
N = 4;
n = 1000;
filters = list();
for i=1:n
G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
filters($+1) = [G.a G.b;G.c G.d];
end
tic();
save('filters.dat', 'filters');
disp(toc());
--> disp(toc());
0.082302
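If the matrix form is used as a workaround, the systems have to be rebuilt after loading. Here is a minimal sketch of such a round trip. The helper names lss2mat and mat2lss are hypothetical (they are not part of Scilab), and the sketch assumes the number of states and the time domain are known on the loading side, since the block matrix alone does not carry them. The initial state X0 is not preserved.

```scilab
// Hypothetical helper: flatten a state-space system into one plain matrix.
function M = lss2mat(G)
    [A, B, C, D] = abcd(G);   // extract the four state-space matrices
    M = [A B; C D];
endfunction

// Hypothetical helper: rebuild the system from the block matrix.
// nx is the number of states, dom the time domain ('c', 'd' or a sampling period).
function G = mat2lss(M, nx, dom)
    A = M(1:nx, 1:nx);
    B = M(1:nx, nx+1:$);
    C = M(nx+1:$, 1:nx);
    D = M(nx+1:$, nx+1:$);
    G = syslin(dom, A, B, C, D);
endfunction

// Round trip on one system of the same shape as in the example above.
N = 4;
G = syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
M = lss2mat(G);            // this plain matrix is what would go into the list
G2 = mat2lss(M, N, 'c');   // what would be rebuilt after load()
```

In the loop above, filters($+1) = lss2mat(G); would replace the explicit [G.a G.b;G.c G.d] expression, and mat2lss would be applied to each list entry after loading.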
Serializing container objects seems to be the solution, but that runs
counter to the portability spirit of HDF5.
S.
On 15/10/2018 at 12:22, Antoine Monmayrant wrote:
> On 15/10/2018 at 11:55, Arvid Rosén wrote:
>>
>> Hi,
>>
>> Thanks for getting back to me!
>>
>> Unfortunately, we used Scilab’s pretty cool way of doing object
>> orientation, so we have big nested tlist structures with multiple
>> instances of various lists of filters and other structures, as in my
>> example. Saving those structures in some explicit manual way would be
>> extremely complicated. Or is there some way of writing explicit HDF5
>> saving/loading schemes using overloading? That would be great! I am
>> sure we could find the main culprits and do something explicit for
>> them, but as they can be located anywhere in a big nested structure,
>> it would be painful to handle them from the top level.
>>
>> Another, related I guess, problem here is that the new file format
>> uses about 15 times as much disk space as the old format (for a
>> typical ill-behaved nested structure). That adds to the save/load
>> time too I guess, but is probably not the main source here.
>>
> Argh, yes, I tested it and with your example I get a file 8.5 times
> bigger. I think that both the increase in time and the increase in
> size are real issues and should be reported as bugs.
>
> By the way, I rewrote your script to run it under both 6.0 and 5.5:
>
> /////////////////////////////////
> N = 4;
> n = 10000;
> filters = list();
>
> for i=1:n
> G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
> filters($+1) = G;
> end
>
> ver=getversion('scilab');
>
> if ver(1)<6 then
> tic();
> save('filters_old.dat', filters);
> ts1 = toc();
> else
> tic();
> save('filters_new.dat', 'filters');
> ts1 = toc();
> end
>
> printf("Time for save %.2fs\n", ts1);
> /////////////////////////////////
>
> Hope it helps,
>
> Antoine
>
>> I think I might have reported this earlier using Bugzilla, but I’m
>> not sure. I’ll check and report it if not.
>>
>> Cheers,
>>
>> Arvid
>>
>> *From: *users <users-bounces at lists.scilab.org> on behalf of
>> "amonmayr at laas.fr" <amonmayr at laas.fr>
>> *Reply-To: *"antoine.monmayrant at laas.fr"
>> <antoine.monmayrant at laas.fr>, Users mailing list for Scilab
>> <users at lists.scilab.org>
>> *Date: *Monday, 15 October 2018 at 11:08
>> *To: *"users at lists.scilab.org" <users at lists.scilab.org>
>> *Subject: *Re: [Scilab-users] HDF5 save is super slow
>>
>> Hello,
>>
>> I tried your code in 5.5.1 and the latest nightly build of 6.0: I see
>> a slowdown by a factor of around 175 between the old save in 5.5.1
>> and the new (and only) save in 6.0.
>> It's really related to the data structure, because we use hdf5
>> read/write a lot here and did not experience significant slowdowns
>> using 6.0.
>> I think the overhead might come from the translation of your fairly
>> complex variable (a long list of tlists) into the corresponding hdf5
>> structure.
>> In the old save, this translation was not necessary.
>> Maybe you could try to save your data in a different way.
>> For example:
>> 1) you could save each element of "filters" in a separate file.
>> 2) you could bypass save and directly write your data in a hdf5 file
>> by using h5open(), h5write() directly. It means you need to write
>> your own load() for your custom file format. But this way, you can
>> try to find the best way to layout your data in hdf5 format.
>> 3) in addition to 2) you could try to save each entry of your
>> "filters" array as one dataset in a given hdf5 file.
>>
>> Did you search on bugzilla whether this bug was already submitted?
>> Could you try to report it?
>>
>>
>> Antoine
>>
>> On 15/10/2018 at 10:11, Arvid Rosén wrote:
>>
>> /////////////////////////////////
>> N = 4;
>> n = 10000;
>> filters = list();
>>
>> for i=1:n
>> G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
>> filters($+1) = G;
>> end
>>
>> tic();
>> save('filters.dat', filters);
>> ts1 = toc();
>>
>> tic();
>> save('filters.dat', 'filters');
>> ts2 = toc();
>>
>> printf("old save %.2fs\n", ts1);
>> printf("new save %.2fs\n", ts2);
>> printf("slowdown %.1f\n", ts2/ts1);
>> /////////////////////////////////
>>
>> --
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>> Antoine Monmayrant LAAS - CNRS
>> 7 avenue du Colonel Roche
>> BP 54200
>> 31031 TOULOUSE Cedex 4
>> FRANCE
>>
>> Tel:+33 5 61 33 64 59
>>
>> email :antoine.monmayrant at laas.fr <mailto:antoine.monmayrant at laas.fr>
>> permanent email :antoine.monmayrant at polytechnique.org
>> <mailto:antoine.monmayrant at polytechnique.org>
>>
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
--
Stéphane Mottelet
Ingénieur de recherche
EA 4297 Transformations Intégrées de la Matière Renouvelable
Département Génie des Procédés Industriels
Sorbonne Universités - Université de Technologie de Compiègne
CS 60319, 60203 Compiègne cedex
Tel : +33(0)344234688
http://www.utc.fr/~mottelet