[Scilab-users] Advice needed on file parsing

Alexis Cros Alexis.Cros at promes.cnrs.fr
Mon Jun 19 09:48:27 CEST 2017


Hello,

You can use the csvTextScan function, which takes the CSV separator as a
parameter. You can get what you want by passing ':' and then ',' as the
separator. As was said, within each call all lines must have the same
number of columns.

It could look something like this:

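my_file = mgetl('your_file_path');   // all lines, as a column of strings

// text fields need the 'string' conversion (the default is 'double')
first_eight_lines = csvTextScan(my_file(1:8), ':', '.', 'string');

header = csvTextScan(my_file(9), ',', '.', 'string');

data = csvTextScan(my_file(10:$), ',');   // ',' is the default separator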
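
In case it helps, here is a self-contained variant of the lines above. It is only a sketch: a few inline sample lines (copied from the original post) stand in for the result of mgetl, and it assumes that the text fields should be trimmed with stripblanks and that the padded numeric fields are safest read as strings and then converted with strtod. The names info, header and data are only illustrative.

```scilab
// Sample lines standing in for mgetl('your_file_path') (taken from the post)
my_file = [
    "PCB: 007"
    "ASM: 000"
    "LOT: 00000"
    "FW:  1477971088"
    "CH1:  AMPS   10A"
    "CH2:  VOLT   60V"
    "SMPL: 0064 0125Hz"
    "DESC: 12V CU LOG"
    "UTC TIME SEC  ,CH1 AMPS DC  ,CH2 VOLT DC"
    "1497812372.910, 8.609146E-03, 1.210613E001"
    "1497812373.895, 1.577809E-01, 1.207540E001"
    "1497812374.578, 1.010268E000, 1.193087E001"
];

// 1) first eight lines as key/value pairs of strings, split on ':'
//    (stripblanks removes the padding left around each field)
info = stripblanks(csvTextScan(my_file(1:8), ":", ".", "string"));

// 2) line 9: column titles as a row of strings, split on ','
header = stripblanks(csvTextScan(my_file(9), ",", ".", "string"));

// 3) lines 10 and after: read the fields as strings, then convert to doubles
//    (assumption: going through strtod avoids depending on how the padded
//    fields are handled by the 'double' conversion)
raw  = csvTextScan(my_file(10:$), ",", ".", "string");
data = strtod(stripblanks(raw));
```

With these sample lines, info(1,1) is "PCB", info(1,2) is "007" and header(1) is "UTC TIME SEC"; data is a 3-by-3 matrix of doubles.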

Cheers,
Alexis


On 19/06/2017 at 08:23, paul.carrico at free.fr wrote:
> Hi
>
> I cannot say whether the following is the best way to proceed, but when
> the number of columns differs, I always look at functions such as
> mopen/mgetl/grep/strindex and so on to get the data ... it needs
> a bit of work.
>
> That method works when the file is not huge, because mgetl first
> loads the whole file into memory. For huge files (I mean files
> with millions of lines), I have to adopt another strategy (a bash script
> using tools such as awk, grep, sed and so on) to get a text/matrix
> file in the right format ... nevertheless I do not get strings, so the
> previous method may not work.
>
> Just a feedback
> Paul
> On 2017-06-18 23:10, Richard llom wrote:
>> Hello fellow scilab-users,
>> I'm writing a script to read and process files, which are constructed as
>> follows:
>> <file start>
>> PCB: 007
>> ASM: 000
>> LOT: 00000
>> FW:  1477971088
>> CH1:  AMPS   10A
>> CH2:  VOLT   60V
>> SMPL: 0064 0125Hz
>> DESC: 12V CU LOG
>> UTC TIME SEC  ,CH1 AMPS DC  ,CH2 VOLT DC
>> 1497812372.910, 8.609146E-03, 1.210613E001
>> 1497812373.895, 1.577809E-01, 1.207540E001
>> 1497812374.578, 1.010268E000, 1.193087E001
>> ... [snip]
>> <file end>
>>
>> To process this file further, I need:
>> 1)
>> the first eight lines stored in pairs, e.g.
>> info(1,1) should yield "PCB" and info(1,2) should yield "007" (string 
>> is ok)
>>
>> 2)
>> line #9 (header), should be available as header(1)="UTC TIME SEC", etc...
>>
>> 3)
>> line 10+
>> these should be scanned in as a matrix.
>>
>>
>> I already tried csvread and msscanf (?), however with no luck so far...
>>
>>
>> So if someone could just point me to the appropriate function for each
>> task.
>> Hopefully I can then take it from there.
>> Thanks & cheers
>> richard
>>
>>
>>
>> --
>> View this message in context:
>> http://mailinglists.scilab.org/Advice-needed-on-file-parsing-tp4036587.html
>> Sent from the Scilab users - Mailing Lists Archives mailing list
>> archive at Nabble.com.
>> _______________________________________________
>> users mailing list
>> users at lists.scilab.org
>> http://lists.scilab.org/mailman/listinfo/users
