[Scilab-users] [EXT] parsing TSV (or CSV) file with scilab is a nightmare

Antoine Monmayrant antoine.monmayrant at laas.fr
Tue Apr 28 00:54:25 CEST 2020


Hello Adrian,


In essence, your extremely useful solution is similar to what Samuel and 
Jan proposed: grab the whole file once.
I must admit I did not even consider it given the length of the files 
involved and how easily I managed to crash scilab on small files.
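
For the record, here is a minimal sketch of the "grab the whole file once" idea applied to a tab-separated file with a header, blank lines and a footer. It is only an illustration under stated assumptions: the sample file, its path and the variable names (filename, txt, fields, data) are hypothetical, and it assumes every numeric line has the same number of columns.

```scilab
// Hypothetical sample file: header, blank lines, two tab-separated data rows, footer.
filename = fullfile(TMPDIR, "sample_tsv.txt");
tab = ascii(9);
mputl(["Header line"; ""; ""; "1.0" + tab + "2.5"; "3.0" + tab + "4.5"; ""; "Footer"], filename);

txt = mgetl(filename);                 // read every line in one call, no meof loop
data = [];
for i = 1:size(txt, "r")
    fields = tokens(txt(i), tab);      // column vector of fields, [] for a blank line
    if size(fields, "*") >= 2 then     // skip blank and single-field lines
        if and(isnum(fields)) then     // keep only all-numeric lines
            data = [data; strtod(fields)'];   // assumes a constant column count
        end
    end
end
disp(data)
```

Header and footer lines are dropped simply because they are not all-numeric, so the files themselves never need to be edited.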


Thanks,


Antoine

On 27/04/2020 18:58, Adrian Weeks wrote:
>
> Hi Antoine,
>
> I often have to read csv files with odd lines that trip functions like 
> csvRead so I often use the method below.  It may solve your problem.
>
> dataread = mgetl(readfile);                            // Read everything
> a = [];
> b = [];
>
> for i = 1:size(dataread, 'r') do
>     line = dataread(i);
>     if length(line) ~= 0 then                          // Ignore blank lines
>         line = tokens(line, [' ', ',', ascii(9)]);     // Accept spaces, commas or tabs
>         if and(isnum(line)) then                       // If the line is all-numeric
>             line = strtod(line);
>             a = [a; line(1)];
>             b = [b; line(2)];
>         end
>     end
> end
>
> Adrian Weeks
> Development Engineer, Hardware Engineering EMEA
> Office: +44 (0)2920 528500 | Desk: +44 (0)2920 528523 | Fax: +44 
> (0)2920 520178
> aweeks at hidglobal.com <mailto:aweeks at hidglobal.com>
>
> Unit 3, Cae Gwyrdd,
> Green meadow Springs,
> Cardiff, UK,
> CF15 7AB.
> www.hidglobal.com <http://www.hidglobal.com>
>
> *From:*users <users-bounces at lists.scilab.org> *On Behalf Of *Antoine 
> Monmayrant
> *Sent:* 27 April 2020 16:41
> *To:* Users mailing list for Scilab <users at lists.scilab.org>
> *Subject:* [EXT] [Scilab-users] parsing TSV (or CSV) file with scilab 
> is a nightmare
>
> Hi all,
>
> This is both a rant and desperate cry for help.
> I'm trying to parse some TSV data (tab separated data file) with 
> scilab and I cannot find a way to navigate around the minefield of 
> bugs present in meof/mgetl/mgetstr/csvRead.
>
> A bit of context: I need to load into scilab data generated by a 
> closed source software.
> The data is in the form of many TSV files (that I cannot share in 
> full, just some redacted bits) with a header and a footer.
> I don't want to hand modify these files or edit them in any way (I 
> need to keep this as portable as possible, so no sed/awk/grep...)
>
>
>     OPTION 1: csvRead
>
> That's the most intuitive solution, however, because of 
> http://bugzilla.scilab.org/show_bug.cgi?id=16391 
> and the presence of more than 1 empty line in my header/footer, this 
> crashes Scilab.
>
>
>     OPTION 2: hand parsing line by line using mgetl/meof
>
> I tried:
>
> filename="tsv.txt";
> [fd, err] = mopen(filename, 'rt');
> while ~meof(fd) do
>     txtline=mgetl(fd,1);
> end
> mclose(fd)
>
> Sadly, and contrary to what's written in "help mgetl", meof keeps on 
> returning 0 well past the end of the file, and the while loop never ends!
>
>
>     OPTION 3: hand parsing chunk by chunk using mgetstr/meof
>
> "help meof" does not confirm that meof should work with mgetl, but 
> mgetstr is specifically listed.
> I thus tried:
>
> filename="tsv.txt";
> [fd, err] = mopen(filename, 'rt');
> while ~meof(fd) do
>     txtchunk=mgetstr(80,fd);
> end
> mclose(fd)
>
> But thanks to http://bugzilla.scilab.org/show_bug.cgi?id=16419 
> this is also crashing Scilab.
>
>
>     OPTION 4: Can anyone here help me with this?
>
> I am really running out of ideas.
> Did I miss some -hmm- obvious combination of available file parsing 
> scilab functions to achieve my goal?
> I have the feeling that it would have been faster for me to just learn 
> a totally new language that does not suck at parsing files than trying 
> to get it to work with scilab....
>
> Antoine
>
> (depressed)
>
> http://bugzilla.scilab.org/show_bug.cgi?id=16419 
>
>
> _______________________________________________
> users mailing list
> users at lists.scilab.org
> http://lists.scilab.org/mailman/listinfo/users