[Scilab-users] HDF5 save is super slow

Arvid Rosén arvid at softube.com
Mon Oct 15 11:55:20 CEST 2018


Hi,

Thanks for getting back to me!

Unfortunately, we used Scilab’s pretty cool way of doing object orientation, so we have big nested tlist structures with multiple instances of various lists of filters and other structures, as in my example. Saving those structures in some explicit manual way would be extremely complicated. Or is there some way of writing explicit HDF5 saving/loading schemes using overloading? That would be great! I am sure we could find the main culprits and do something explicit for them, but as they can be located anywhere in a big nested structure, it would be painful to do anything at the top level.

Another, probably related, problem is that the new file format uses about 15 times as much disk space as the old format (for a typical ill-behaved nested structure). I guess that adds to the save/load time too, but it is probably not the main cause here.

I think I might have reported this earlier using Bugzilla, but I’m not sure. I’ll check and report it if not.

Cheers,
Arvid

From: users <users-bounces at lists.scilab.org> on behalf of "amonmayr at laas.fr" <amonmayr at laas.fr>
Reply-To: "antoine.monmayrant at laas.fr" <antoine.monmayrant at laas.fr>, Users mailing list for Scilab <users at lists.scilab.org>
Date: Monday, 15 October 2018 at 11:08
To: "users at lists.scilab.org" <users at lists.scilab.org>
Subject: Re: [Scilab-users] HDF5 save is super slow

Hello,

I tried your code in 5.5.1 and the latest nightly build of 6.0: I see a slowdown by a factor of around 175 between the old save in 5.5.1 and the new (and only) save in 6.0.
It's really related to the data structure, because we use HDF5 read/write a lot here and did not experience significant slowdowns using 6.0.
I think the overhead might come from the translation of your fairly complex variable (a long list of tlists) into the corresponding HDF5 structure.
In the old save, this translation was not necessary.
Maybe you could try to save your data in a different way.
For example:
1) you could save each element of "filters" in a separate file.
2) you could bypass save and write your data directly to an HDF5 file using h5open() and h5write(). This means you need to write your own load() for your custom file format, but this way you can try to find the best way to lay out your data in HDF5.
3) in addition to 2), you could try to save each entry of your "filters" array as one dataset in a given HDF5 file.
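
As a rough illustration of option 2), here is a minimal sketch that bypasses save() and packs the state-space matrices of all filters into four large datasets, one column per filter. The file name, the dataset names ("/A", "/B", "/C", "/D") and the column-per-filter layout are assumptions made for this example, not something from the thread. It also assumes that every filter has the same order N and is continuous-time, and it does not store the other syslin fields (initial state, time domain). Whether this is actually faster than save() for a given structure would have to be measured.

```scilab
// Sketch only: store n state-space filters as four packed HDF5 datasets.
N = 4;
n = 100;   // smaller than the 10000 of the original test, to keep this quick

filters = list();
for i = 1:n
    filters($+1) = syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
end

// Pack each matrix into one column per filter (hypothetical layout).
A_all = zeros(N*N, n);
B_all = zeros(N, n);
C_all = zeros(N, n);
D_all = zeros(1, n);
for i = 1:n
    G = filters(i);
    A_all(:, i) = G.A(:);
    B_all(:, i) = G.B(:);
    C_all(:, i) = G.C(:);
    D_all(1, i) = G.D;
end

// Write four datasets instead of one nested structure.
path = fullfile(TMPDIR, "filters_packed.h5");   // hypothetical file name
tic();
fid = h5open(path, "w");
h5write(fid, "/A", A_all);
h5write(fid, "/B", B_all);
h5write(fid, "/C", C_all);
h5write(fid, "/D", D_all);
h5close(fid);
printf("packed h5write %.2fs\n", toc());

// The matching custom "load" reads the datasets back; check size(A_back)
// before relying on the layout, then rebuild each filter with
// syslin('c', matrix(A_back(:,i), N, N), ...).
fid = h5open(path, "r");
A_back = h5read(fid, "/A");
h5close(fid);
```

The trade-off is that the nested tlist structure is no longer stored generically: the loader has to know the layout and rebuild the filters itself.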

Did you search on bugzilla whether this bug was already submitted?
Could you try to report it?


Antoine

On 15/10/2018 at 10:11, Arvid Rosén wrote:
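/////////////////////////////////
// Build a long list of random 4th-order continuous-time state-space filters.
N = 4;
n = 10000;

filters = list();

for i=1:n
  G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
  filters($+1) = G;
end

// Old calling form: the variable itself is passed (legacy format in 5.5.1).
tic();
save('filters.dat', filters);
ts1 = toc();

// New calling form: the variable name is passed as a string (HDF5-based format).
tic();
save('filters.dat', 'filters');
ts2 = toc();

printf("old save %.2fs\n", ts1);
printf("new save %.2fs\n", ts2);
printf("slowdown %.1f\n", ts2/ts1);
/////////////////////////////////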



--
+++++++++++++++++++++++++++++++++++++++++++++++++++++++

 Antoine Monmayrant LAAS - CNRS
 7 avenue du Colonel Roche
 BP 54200
 31031 TOULOUSE Cedex 4
 FRANCE

 Tel:+33 5 61 33 64 59

 email : antoine.monmayrant at laas.fr
 permanent email : antoine.monmayrant at polytechnique.org

+++++++++++++++++++++++++++++++++++++++++++++++++++++++

