[Scilab-users] HDF5 save is super slow

Clément DAVID clement.david at scilab-enterprises.com
Thu Oct 18 15:15:36 CEST 2018


Hello Stephane,

TL;DR: HDF5 is a cross-platform, cross-language, portable file format that is widely used in scientific software these days. Please use this sane default!

Writing a custom serialization scheme (like the one provided by vec2var / var2vec) might not be complicated to implement; however, the hard part is maintaining and describing a serialization format for long-term use.

In Scilab 5, the “stack” save and load functions were almost trivial because they mapped memory directly to disk; the format was “the stack” itself, so it was known and used everywhere (even for custom string encoding). The vec2var serialization, by contrast, is only used internally (to pass block parameters around), does not follow any specified format, is not validated against any documentation, and is not portable; I can’t promise it will remain stable in the long term. Implementing your own serialization scheme will probably lead your software into trouble. Really, it isn’t easy in the long term! The HDF5 format is documented, its serialized data are browsable (through HDFView), and it spares you from dealing with low-level requirements.
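
As a minimal sketch of what "browsable" means in practice, the snippet below saves a variable and then lists what ended up in the file from within Scilab. It assumes Scilab 6, where save writes an HDF5-based file; the variable name x and the file name demo.sod are made up for illustration, and the layout reported by h5ls may differ between Scilab versions.

```scilab
// Hypothetical data and file name, for illustration only.
x = list(1:3, "text", %t);
f = fullfile(TMPDIR, "demo.sod");

// Save the variable by name (Scilab 6 writes an HDF5-based file).
save(f, "x");

// List the variables stored in the file without loading them.
listvarinfile(f);

// Open the same file with the generic HDF5 functions and list its top level.
h = h5open(f, "r");
h5ls(h)
h5close(h);
```

The same file can also be opened in an external HDF5 viewer, which is not possible with an ad hoc binary encoding.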

To me, the issue is really a performance bug. We might find a way to fix it within Scilab rather than provide a workaround (with custom encodings). The HDF5 library is a big one; maybe, with a better understanding of its internal serialization, we can find a faster execution path for this use case (without changing the file format).

Thanks,

--
Clément

From: users <users-bounces at lists.scilab.org> On Behalf Of Stéphane Mottelet
Sent: Thursday, October 18, 2018 2:39 PM
To: users at lists.scilab.org
Subject: Re: [Scilab-users] HDF5 save is super slow

Hello Clément,

On 18/10/2018 at 14:09, Clément DAVID wrote:
Hello,

My 2 cents: this is probably a poor man’s approach, but Xcos offers the vec2var / var2vec functions, which encode any Scilab datatype passed as an argument into a vector of doubles. The encoding duplicates the data in memory, so there might be some overhead.
Do you think it would be complicated to continuously write the serialized data to disk?

On my machine, I have these timings using the attached script (Antoine’s one edited):
save list of syslins: 1.361704
save list of vec[]: 0.056788
save var2vec(list of syslins): 0.014411

Discarding HDF5 group creation is a huge performance win, but it removes any way to create a clean HDF5 file (e.g. to address subgroups directly).
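
The comparison above can be approximated with a sketch like the following. This is not the attached script: the system sizes, the count n, and the file names are assumptions chosen for illustration, and timings will vary with the machine and the Scilab version. var2vec / vec2var belong to Xcos, so the Xcos libraries may need to be loaded first.

```scilab
// Assumption: Scilab 6 with Xcos available; uncomment if var2vec is not found.
// loadXcosLibs();

// Build a list of small random state-space systems (sizes are arbitrary).
n = 200;
systems = list();
for k = 1:n
    systems($+1) = syslin("c", rand(3,3), rand(3,1), rand(1,3));
end

// Case 1: save the list as is, element by element.
f_list = fullfile(TMPDIR, "syslins.sod");
tic(); save(f_list, "systems"); t_list = toc();

// Case 2: encode everything into one vector of doubles, then save that.
encoded = var2vec(systems);
f_vec = fullfile(TMPDIR, "syslins_vec.sod");
tic(); save(f_vec, "encoded"); t_vec = toc();

mprintf("save list of syslins:          %f s\n", t_list);
mprintf("save var2vec(list of syslins): %f s\n", t_vec);

// Reading back: load the vector, then decode it into the original list.
clear encoded;
load(f_vec);
decoded = vec2var(encoded);
```

Note that the second file contains only an opaque vector of doubles, so the individual systems can no longer be addressed from an HDF5 viewer.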

Thanks,

--
Clément

From: users <users-bounces at lists.scilab.org> On Behalf Of Arvid Rosén
Sent: Tuesday, October 16, 2018 1:01 PM
To: antoine.monmayrant at laas.fr; Users mailing list for Scilab <users at lists.scilab.org>
Subject: Re: [Scilab-users] HDF5 save is super slow

From: users <users-bounces at lists.scilab.org> on behalf of "amonmayr at laas.fr" <amonmayr at laas.fr>
Reply-To: "antoine.monmayrant at laas.fr" <antoine.monmayrant at laas.fr>, Users mailing list for Scilab <users at lists.scilab.org>
Date: Tuesday, 16 October 2018 at 09:53
To: "users at lists.scilab.org" <users at lists.scilab.org>
Subject: Re: [Scilab-users] HDF5 save is super slow

Couldn't you create your own ATOMS package that restores this raw memory dump for Scilab 6.0?
I understand why we moved away from this model, but it seems to be key for you.
There is always a trade-off between portability (and robustness) and raw speed...

Yeah, if that were possible, I would certainly do it. We already have a bunch of C/C++ binaries that we compile and link dynamically, but for that to be easy to implement, I guess the lists and structures would need to be stored linearly in one contiguous chunk of memory. I don’t know if that is the case. Anyone? C++ integrations and gateways are very poorly documented at the moment.
Otherwise, I would need to write a recursive implementation that handles a bunch of different object types. Sounds painful.

Cheers,
Arvid




--

Stéphane Mottelet

Research Engineer

EA 4297 Transformations Intégrées de la Matière Renouvelable

Département Génie des Procédés Industriels

Sorbonne Universités - Université de Technologie de Compiègne

CS 60319, 60203 Compiègne cedex

Tel : +33(0)344234688

http://www.utc.fr/~mottelet