[scilab-Users] Searching for a phrase in a text file

Mathieu Dubois mathieu.dubois at limsi.fr
Wed Jul 6 00:51:36 CEST 2011


Hello,

On 06/07/2011 00:07, lukeaarond wrote:
> I have a text document with a lot of information in it, but I just need to
> extract one sequence from it. Right before the sequence is always the phrase
> 'ncbi2na' and ends with an apostrophe, therefore, I want to search the whole
> file for 'ncbi2na' and save the data until the end apostrophe. The text file
> looks like the following:
>
> ....'junk'....
> ....'junk'....
> ....'junk'....
> ....'junk'....
> ....'junk'....
> ....'junk'....
> ncbi2na ATTTGAATGCCAA'H
> ....'junk'....
> ....'junk'....
>
> I am currently doing this by parsing each character until it sees the
> characters n,c,b,i,2,n and a next to each other, as shown below:
>
Scilab has a function, grep, that matches a regular expression against a
vector of strings.
So try loading your data into a vector of strings (for example with mfscanf
or mgetl) and use grep to find the matching line, for instance:
w = grep(data, "/^ncbi2na/", "r"); // w contains the indices of the lines starting with ncbi2na
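
Putting the two steps together, here is a minimal sketch. It assumes the
path is in a variable Filename (as in your code) and that the file is small
enough to read in one go; the error message is only illustrative:

data = mgetl(Filename);             // column vector of strings, one per line
w = grep(data, "/^ncbi2na/", "r");  // indices of the lines starting with ncbi2na
if isempty(w) then
    error("ncbi2na not found in " + Filename);
end

mgetl accepts a file name directly, so mopen/mclose are not needed here.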

For help about regular expressions, see help("regexp") which points to 
the Perl documentation.

Once you have the indices of the lines, it is not hard to extract the
information.
Just remember that strings in Scilab are not vectors of characters, so use
part and length rather than indexing:
s = data(w(1)); // first matching line, a single string
interesting_string = part(s, 9:length(s)-2); // discard the first 8 characters ("ncbi2na ") and the last 2 ('H)
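
If the number of characters after the closing apostrophe is not always the
same, a variant that searches for the apostrophe may be safer. This is only
a sketch: it assumes data and w come from the grep call above, that the
whole sequence sits on the ncbi2na line, and the names apos, k and seq are
mine:

apos = ascii(39);           // the apostrophe character
s = data(w(1));             // first matching line, a single string
k = strindex(s, apos);      // positions of every apostrophe in s
if isempty(k) then
    error("no closing apostrophe on the ncbi2na line");
end
seq = part(s, 9:k(1)-1);    // characters between "ncbi2na " and the first apostrophe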

> fd=mopen(Filename,"r");
> while ~meof(fd)
>      character(i)=mgetstr(1,fd)
>      // then various code to save information after ncbi2na and before the apostrophe
> end
> mclose(fd)


>
> However, since there are about 1000 rows of 'junk' before the sequence, it
> takes a very long time to parse through and find the sequence.
>
> Therefore, I was wondering if there is a quicker way? Perhaps a function or
> method that allows me to "search" through the file until I find 'ncbi2na'
> and start my parsing from there. Thank you in advance.
>


