[Scilab-users] [EXT] parsing TSV (or CSV) file with scilab is a nightmare

Adrian Weeks aweeks at hidglobal.com
Mon Apr 27 18:58:08 CEST 2020


Hi Antoine,

I often have to read CSV files containing odd lines that trip up functions like csvRead, so I use the method below: read the whole file with mgetl, then keep only the lines whose fields are all numeric.  It may solve your problem.

    dataread = mgetl(readfile);                          // Read the whole file as a column of strings
    a = [];
    b = [];
    // …
    for i = 1:size(dataread, 'r') do
        line = dataread(i);
        if length(line) ~= 0 then                        // Ignore blank lines
            line = tokens(line, [' ', ',', ascii(9)]);   // Split on spaces, commas or tabs
            // Keep the line only if it has at least the two fields used below and is all-numeric
            if size(line, '*') >= 2 && and(isnum(line)) then
                line = strtod(line);                     // Convert the strings to doubles
                a = [a; line(1)];
                b = [b; line(2)];
                // …
            end
        end
    end
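
The same idea can be wrapped in a function that returns all numeric rows as one matrix. This is only a minimal sketch, assuming every data row has the same known number of fields; parseNumericRows and ncols are hypothetical names chosen for illustration, not part of Scilab, and the sample text stands in for the output of mgetl(readfile).

```scilab
// Sketch: collect every all-numeric row of delimited text into a matrix.
// parseNumericRows and ncols are hypothetical names, not Scilab built-ins.
function M = parseNumericRows(txt, ncols)
    M = [];
    for i = 1:size(txt, '*')
        if length(txt(i)) > 0 then                          // skip blank lines
            tok = tokens(txt(i), [' ', ',', ascii(9)]);     // split on space, comma or tab
            if size(tok, '*') == ncols then                 // expected number of fields
                if and(isnum(tok)) then                     // every field is numeric
                    M = [M; strtod(tok)'];                  // append as one row
                end
            end
        end
    end
endfunction

// In-memory sample with a header, blank lines and a footer
txt = ["Header line"; ""; "1.5" + ascii(9) + "2"; "x,y"; "3,4"; ""; "Footer"];
M = parseNumericRows(txt, 2);
```

Header, footer and blank lines are skipped because they are either empty, have the wrong number of fields, or contain a non-numeric field. A header line that happens to consist of exactly ncols numbers would still be accepted, so check this against your real files.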


Adrian Weeks
Development Engineer, Hardware Engineering EMEA


From: users <users-bounces at lists.scilab.org> On Behalf Of Antoine Monmayrant
Sent: 27 April 2020 16:41
To: Users mailing list for Scilab <users at lists.scilab.org>
Subject: [EXT] [Scilab-users] parsing TSV (or CSV) file with scilab is a nightmare


Hi all,



This is both a rant and a desperate cry for help.
I'm trying to parse some TSV data (tab separated data file) with scilab and I cannot find a way to navigate around the minefield of bugs present in meof/mgetl/mgetstr/csvRead.

A bit of context: I need to load into scilab data generated by a closed source software.
The data is in the form of many TSV files (that I cannot share in full, just some redacted bits) with a header and a footer.
I don't want to hand modify these files or edit them in any way (I need to keep this as portable as possible, so no sed/awk/grep...)

OPTION 1: csvRead

That's the most intuitive solution. However, because of http://bugzilla.scilab.org/show_bug.cgi?id=16391 and the presence of more than one empty line in my header/footer, this crashes Scilab.

OPTION 2: hand parsing line by line using mgetl/meof

I tried:

filename="tsv.txt";
[fd, err] = mopen(filename, 'rt');
while ~meof(fd) do
    txtline=mgetl(fd,1);
end
mclose(fd)

Sadly, and contrary to what's written in "help mgetl", meof keeps on returning 0 well past the end of the file, and the while loop never ends!

OPTION 3: hand parsing chunk by chunk using mgetstr/meof

"help meof" does not confirm that meof should work with mgetl, but mgetstr is specifically listed.
I thus tried:

filename="tsv.txt";
[fd, err] = mopen(filename, 'rt');
while ~meof(fd) do
    txtchunk=mgetstr(80,fd);
end
mclose(fd)

But thanks to http://bugzilla.scilab.org/show_bug.cgi?id=16419 this also crashes Scilab.



OPTION 4: Can anyone here help me with this?

I am really running out of ideas.
Did I miss some -hmm- obvious combination of available file parsing scilab functions to achieve my goal?
I have the feeling that it would have been faster for me to just learn a totally new language that does not suck at parsing files than trying to get it to work with scilab....



Antoine

(depressed)




