[Scilab-users] HDF5 save is super slow
Stéphane Mottelet
stephane.mottelet at utc.fr
Mon Oct 15 15:35:04 CEST 2018
On 15/10/2018 at 15:07, Arvid Rosén wrote:
>
> Hi,
>
> Yeah, that makes sense, or at least it is about what I expected. It
> is a pity, though, as handling thousands of filters isn’t necessarily a
> strange thing to do with software like Scilab, and writing a special
> serialization like that would be nothing less than a hack.
>
> Do you think there is a way forward, under the hood, that could make
> saving big, deeply nested list structures more than 10x faster in the future?
>
No. I think HDF5 is not well suited to deeply structured data with
small leaves. Some interesting discussions can be found here:
https://cyrille.rossant.net/should-you-use-hdf5/
https://cyrille.rossant.net/moving-away-hdf5/
If you only need to read/write within your own software, serializing
should not be an issue. In the example you gave, every leaf has the
same structure, so using an array of structs improves performance
a little:
clear
N = 4;
n = 1000;
for i=1:n
G(i).a=rand(N,N);
G(i).b=rand(N,1);
G(i).c=rand(1,N);
G(i).d=rand(1,1);
end
tic();
save('filters.dat', 'G');
disp(toc());
// output: 0.24133
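Taking this one step further, all the filters can be serialized into a single numeric array. The sketch below illustrates the idea under stated assumptions and is not a tested recipe. The packed layout (one (N+1)x(N+1) slice [A B; C D] per filter), the file name and the rebuild step are hypothetical. The sketch also assumes that all filters are continuous-time and have the same order N. Because the saved variable no longer contains any container, save() should not need to create one HDF5 group per filter. The actual speed-up has not been measured here.

```scilab
// Hypothetical packed layout: filter i is stored as the (N+1)x(N+1)
// slice packed(:,:,i) = [A B; C D], so save() writes one numeric
// array instead of one HDF5 group per syslin tlist.
N = 4;
n = 1000;
packed = zeros(N+1, N+1, n);
for i = 1:n
    A = rand(N,N); B = rand(N,1); C = rand(1,N); D = rand(1,1);
    packed(:,:,i) = [A B; C D];
end
tic();
save(TMPDIR + '/filters_packed.dat', 'packed');
disp(toc());

// Custom "load": read the array back and rebuild filter i as a syslin.
// The time domain 'c' is NOT stored in the file; this sketch assumes
// every filter is continuous-time.
clear packed
load(TMPDIR + '/filters_packed.dat', 'packed');
i = 1;
M = packed(:,:,i);
Gi = syslin('c', M(1:N,1:N), M(1:N,N+1), M(N+1,1:N), M(N+1,N+1));
```

The trade-off is that the file is no longer self-describing. The reader must know N and the time domain to rebuild each syslin, which is the portability concern raised in this thread.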
S.
>
> Otherwise, the whole object orientation part of Scilab (tlist and
> mlist etc.) would be hard to use for anything that comes in large
> numbers, which would be a shame, especially as it used to work just
> fine (well, I can see how the old structure wasn’t “just fine” in
> other ways, but still).
>
> Cheers,
>
> Arvid
>
> *From: *users <users-bounces at lists.scilab.org> on behalf of Stéphane
> Mottelet <stephane.mottelet at utc.fr>
> *Organization: *Université de Technologie de Compiègne
> *Reply-To: *Users mailing list for Scilab <users at lists.scilab.org>
> *Date: *Monday, 15 October 2018 at 14:37
> *To: *"users at lists.scilab.org" <users at lists.scilab.org>
> *Subject: *Re: [Scilab-users] HDF5 save is super slow
>
> Hello,
>
> I looked into the sources a little: the obvious bottleneck is the
> nested creation of an HDF5 group each time a container variable is
> encountered.
> This is particularly visible in the given example: if you replace
> the syslin structure by the corresponding [A,B;C,D] matrix, save
> is about ten times faster:
>
> N = 4;
> n = 1000;
> filters = list();
> for i=1:n
> G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
> filters($+1) = G;
> end
> tic();
> save('filters.dat', 'filters');
> disp(toc());
> // output: 0.724754
>
> N = 4;
> n = 1000;
> filters = list();
> for i=1:n
> G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
> filters($+1) = [G.a G.b;G.c G.d];
> end
> tic();
> save('filters.dat', 'filters');
> disp(toc());
> // output: 0.082302
>
> Serializing container objects seems to be the solution, but it runs
> counter to the portability spirit of HDF5.
>
> S.
>
>
> On 15/10/2018 at 12:22, Antoine Monmayrant wrote:
>
> On 15/10/2018 at 11:55, Arvid Rosén wrote:
>
> Hi,
>
> Thanks for getting back to me!
>
> Unfortunately, we used Scilab’s pretty cool way of doing
> object orientation, so we have big nested tlist structures
> with multiple instances of various lists of filters and other
> structures, as in my example. Saving those structures in some
> explicit, manual way would be extremely complicated. Or is
> there some way of writing explicit HDF5 saving/loading schemes
> using overloading? That would be great! I am sure we could
> find the main culprits and handle them explicitly, but
> as they can be located anywhere in a big nested structure, it
> would be painful to do anything at the top level.
>
> Another problem, related I guess, is that the new file
> format uses about 15 times as much disk space as the old
> format (for a typical ill-behaved nested structure). That probably
> adds to the save/load time too, but is likely not the main cause
> here.
>
> Argh, yes, I tested it, and with your example I get a file 8.5 times
> bigger. I think that both the increase in time and the increase in
> size are real issues and should be reported as bugs.
>
> By the way, I rewrote your script to run it under both 6.0 and 5.5:
>
> /////////////////////////////////
> N = 4;
> n = 10000;
> filters = list();
>
> for i=1:n
> G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
> filters($+1) = G;
> end
>
> ver=getversion('scilab');
>
> if ver(1)<6 then
> tic();
> save('filters_old.dat', filters);
> ts1 = toc();
> else
> tic();
> save('filters_new.dat', 'filters');
> ts1 = toc();
> end
>
> printf("Time for save %.2fs\n", ts1);
> /////////////////////////////////
>
> Hope it helps,
>
> Antoine
>
>
> I think I might have reported this earlier using Bugzilla, but
> I’m not sure. I’ll check and report it if not.
>
> Cheers,
>
> Arvid
>
> *From: *users <users-bounces at lists.scilab.org> on behalf of
> "amonmayr at laas.fr" <amonmayr at laas.fr>
> *Reply-To: *"antoine.monmayrant at laas.fr"
> <antoine.monmayrant at laas.fr>, Users mailing list for
> Scilab <users at lists.scilab.org>
> *Date: *Monday, 15 October 2018 at 11:08
> *To: *"users at lists.scilab.org" <users at lists.scilab.org>
> *Subject: *Re: [Scilab-users] HDF5 save is super slow
>
> Hello,
>
> I tried your code in 5.5.1 and in the latest nightly build of 6.0:
> I see a slowdown by a factor of about 175 between the old save in
> 5.5.1 and the new (and only) save in 6.0.
> It's really related to the data structure, because we use HDF5
> read/write a lot here and did not experience significant
> slowdowns with 6.0.
> I think the overhead might come from the translation of your
> fairly complex variable (a long list of tlists) into the
> corresponding HDF5 structure.
> In the old save, this translation was not necessary.
> Maybe you could try to save your data in a different way.
> For example:
> 1) you could save each element of "filters" in a separate file.
> 2) you could bypass save and write your data directly into an
> HDF5 file using h5open() and h5write(). This means you
> need to write your own load() for your custom file format, but
> this way you can look for the best way to lay out your data
> in HDF5 format.
> 3) in addition to 2), you could try to save each entry of your
> "filters" list as one dataset in a single HDF5 file.
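Options 2) and 3) can be sketched with Scilab's low-level HDF5 functions h5open(), h5write(), h5read() and h5close(). This is a minimal sketch, not a tested recipe. The file name, the dataset names (filter_1, filter_2, ...) and the [A B; C D] layout are hypothetical. Two details should be checked in the help pages of the Scilab version in use: whether h5write() accepts the file handle directly (rather than a group), and how matrices are oriented on disk.

```scilab
// Sketch of options 2) and 3): one HDF5 dataset per filter.
// File name and dataset names ("filter_1", ...) are hypothetical.
N = 4;
n = 100;
filters = list();
for i = 1:n
    filters($+1) = syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
end

f = TMPDIR + '/filters_custom.h5';
h = h5open(f, 'w');
for i = 1:n
    [A, B, C, D] = abcd(filters(i));
    // store [A B; C D] as one (N+1)x(N+1) dataset
    h5write(h, msprintf('filter_%d', i), [A B; C D]);
end
h5close(h);

// Custom "load" for one entry: read the dataset back and rebuild the
// syslin. Assumes continuous time, as when the filters were created;
// check that M is not transposed with respect to what was written.
h = h5open(f, 'r');
M = h5read(h, '/filter_1');
h5close(h);
G1 = syslin('c', M(1:N,1:N), M(1:N,N+1), M(N+1,1:N), M(N+1,N+1));
```

One dataset per filter still creates n HDF5 objects, so this layout may stay slow for very large n. Stacking all filters into a single dataset avoids that, at the cost of a less self-describing file.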
>
> Did you search Bugzilla to see whether this bug has already been
> reported? If not, could you report it?
>
>
> Antoine
>
> On 15/10/2018 at 10:11, Arvid Rosén wrote:
>
> /////////////////////////////////
> N = 4;
> n = 10000;
> filters = list();
> for i=1:n
> G=syslin('c', rand(N,N), rand(N,1), rand(1,N), rand(1,1));
> filters($+1) = G;
> end
> tic();
> save('filters.dat', filters);
> ts1 = toc();
> tic();
> save('filters.dat', 'filters');
> ts2 = toc();
> printf("old save %.2fs\n", ts1);
> printf("new save %.2fs\n", ts2);
> printf("slowdown %.1f\n", ts2/ts1);
> /////////////////////////////////
>
> --
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> Antoine Monmayrant LAAS - CNRS
> 7 avenue du Colonel Roche
> BP 54200
> 31031 TOULOUSE Cedex 4
> FRANCE
>
> Tel:+33 5 61 33 64 59
>
> email :antoine.monmayrant at laas.fr
> permanent email :antoine.monmayrant at polytechnique.org
>
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> _______________________________________________
> users mailing list
> users at lists.scilab.org
> https://lists.scilab.org/mailman/listinfo/users
>
> --
> Stéphane Mottelet
> Research engineer
> EA 4297 Transformations Intégrées de la Matière Renouvelable
> Département Génie des Procédés Industriels
> Sorbonne Universités - Université de Technologie de Compiègne
> CS 60319, 60203 Compiègne cedex
> Tel : +33(0)344234688
> http://www.utc.fr/~mottelet