[Scilab-users] [EXT] parsing TSV (or CSV) file with scilab

Tue Apr 28 11:18:38 CEST 2020

I find it safer to process the data without returning to a disk file. As 
mentioned, I actually prefer to start with mgeti() and read the file as 
binary, since then all byte values are accepted.
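
For illustration, here is a minimal sketch of that binary-read step. It assumes a small file made up for this example (the file name and its content are hypothetical) and that the bytes form ordinary text, so they can be converted with ascii() and split on line feeds:

```scilab
// Hypothetical sample file written to Scilab's temporary directory
fname = fullfile(TMPDIR, "sample.tsv");
mputl(["Header"; "1,5" + ascii(9) + "2,25"; "Footer"], fname);

fd = mopen(fname, "rb");            // open in binary mode
mseek(0, fd, "end");                // jump to the end of the file ...
nbytes = mtell(fd);                 // ... to learn its size in bytes
mseek(0, fd, "set");                // rewind to the beginning
bytes = mgeti(nbytes, "uc", fd);    // read every byte as unsigned char
mclose(fd);

txt = ascii(double(bytes));         // byte codes -> one long string
txt = strsubst(txt, ascii(13), ""); // drop carriage returns (CRLF files)
in_text = strsplit(txt, ascii(10)); // split on line feeds -> column vector
if in_text($) == "" then            // a trailing newline leaves an empty last entry
    in_text($) = [];
end
disp(in_text);
```

The resulting in_text is a column vector of strings, one per line, which is the form used in the loop further down.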

But anyway, once the data is separated into lines, it is relatively simple to 
split each line on the wanted separators and convert it with the wanted 
decimal sign:

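clear dataset;
headerlines=3;
footerlines=2;
n=size(in_text,1); // total number of lines
for k=1:n
     // keep only the lines between the header and footer blocks
     if k>headerlines && k<=n-footerlines then
        // split on tab or semicolon, convert using "," as decimal sign
        datatemp=strtod(strsplit(in_text(k),[ascii(9),";"]),",");
        dataset(k-headerlines,1:length(datatemp))=datatemp;
     end
end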

disp(in_text(1:headerlines));
disp(dataset);
disp(in_text(($-footerlines+1):$));
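
As a small worked example of the same idea, the sketch below runs the split-and-convert step on made-up input (the sample lines are hypothetical). It mixes tab and semicolon separators and includes one short row, to show that Scilab pads missing columns with zeros when the matrix grows:

```scilab
// Hypothetical input: 3 header lines, 3 data lines, 2 footer lines
in_text = ["Title"; "x;y"; "units"; ..
           "1,5;2,25"; ..
           "3" + ascii(9) + "4,5"; ..
           "7"; ..
           "end"; "checksum"];
headerlines = 3;
footerlines = 2;
n = size(in_text, 1);

dataset = [];
for k = headerlines+1 : n-footerlines
    // split on tab or ";" and convert with "," as decimal sign;
    // the transpose turns the column from strsplit into a row
    datatemp = strtod(strsplit(in_text(k), [ascii(9), ";"]), ",")';
    dataset(k-headerlines, 1:length(datatemp)) = datatemp;
end
disp(dataset);
```

If zero padding is not wanted for short rows, one option is to preallocate dataset with %nan once the number of columns is known.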

JÅ

On 2020-04-28 10:14 AM, Rafael Guerra wrote:
>
> Antoine,
>
> One workflow that works fast for me, for large data files, is to load 
> first the whole file with mgetl, then remove all empty lines using 
> isempty in a loop (as shown below), process the header block, isolate 
> the data block and save it to a temporary backup file to disk using 
> mputl, then load very efficiently from disk that backup file using 
> fscanfMat.
>
> tlines=mgetl(fid,-1); // reads lines until end of file into a 1-column text vector
>
> bool=~cellfun(isempty,tlines);
>
> tlines=tlines(bool); // removes empty lines
>
> function out_text=cellfun(fun, in_text)
>
> // Applies function to input text (column strings vector), line by line
>
> n=size(in_text,1);
>
> for i=1:n;
>
> out_text(i)=fun(in_text(i));
>
> end
>
> endfunction
>
> Regards,
>
> Rafael
>
>