Search for the two patterns and print everything in between

pirates.genome · January 16, 2012, 2:07pm

Hi all,
I have a file containing this data:

@HWUSI-EAS1727:19:6:1:3674:984:0:1#GTTAATA
NTTGGGTTTTCT
@HWUSI-EAS1727:19:6:1:3674:984:0:1#GTTA...
NTTGGGTTTTCT
@HWUSI-EAS1727:19:6:1:3674:984:0:1#.....CT
NTTGGGTTTTCT

I want to print everything from the # to the end of the line.
Can you please help me do that?
I am trying in Perl. I can match the "#......." pattern, but I don't understand how to print it.

Thanks...

Corona688 · January 16, 2012, 2:24pm

awk -F'#' '{ print "#"$2 }' inputfile > outputfile

If you are not on Linux, try nawk or gawk.
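One thing to watch with the command above: on lines that contain no "#" (the sequence lines in the sample), $2 is empty, so a bare "#" is printed. A minimal sketch of a variant that skips those lines, assuming only the first "#" on a line matters. The temporary file and its contents are made up here to mirror the sample data:

```shell
# Hypothetical sample file mirroring the data in the question.
tmp=$(mktemp)
printf '%s\n' \
  '@HWUSI-EAS1727:19:6:1:3674:984:0:1#GTTAATA' \
  'NTTGGGTTTTCT' \
  '@HWUSI-EAS1727:19:6:1:3674:984:0:1#GTTA...' \
  'NTTGGGTTTTCT' > "$tmp"

# index() returns the position of the first "#", or 0 if there is none.
# The second rule only fires when i is non-zero, so lines without "#"
# are skipped; the others are printed from the "#" to the end of the line.
out=$(awk '{ i = index($0, "#") } i { print substr($0, i) }' "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```

On the sample this should print only #GTTAATA and #GTTA..., with no stray "#" lines.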

jgt · January 16, 2012, 2:56pm

In sh, ksh, or bash:

# Split each line at "#": a gets the text before it, b gets the rest
while IFS="#" read -r a b
do
echo "$b"
done <inputfile
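The loop above prints an empty line for every input line that has no "#". If that is not wanted, here is a rough sketch using POSIX parameter expansion instead of IFS splitting. The temporary file is a stand-in for inputfile, and putting the "#" back in the output is my assumption about what the question asks for:

```shell
# Hypothetical sample file standing in for inputfile.
tmp=$(mktemp)
printf '%s\n' \
  '@HWUSI-EAS1727:19:6:1:3674:984:0:1#GTTAATA' \
  'NTTGGGTTTTCT' \
  '@HWUSI-EAS1727:19:6:1:3674:984:0:1#.....CT' \
  'NTTGGGTTTTCT' > "$tmp"

out=$(
  while IFS= read -r line; do
    case $line in
      # Only handle lines that contain a "#".
      # ${line#*#} strips the shortest prefix ending in "#";
      # the "#" is then printed back in front of the remainder.
      *'#'*) printf '#%s\n' "${line#*#}" ;;
    esac
  done < "$tmp"
)
printf '%s\n' "$out"

rm -f "$tmp"
```

This keeps everything in the shell, which is fine for small files; for large read files awk or sed will likely be much faster.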

Scrutinizer · January 16, 2012, 3:50pm

sed -n 's/.*#//p' infile

GTTAATA
GTTA...
.....CT

Or do you need to combine it with the next line?

sed -n 'N;s/\n//;s/.*#//p' infile

GTTAATANTTGGGTTTTCT
GTTA...NTTGGGTTTTCT
.....CTNTTGGGTTTTCT
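Both sed commands above drop the "#" itself, and because .* is greedy they cut up to the last "#" on a line. If the "#" should be kept, as the question's wording suggests, something along these lines may work. This is only a sketch, and the temporary file is invented to mirror the sample:

```shell
# Hypothetical sample file mirroring the data in the question.
tmp=$(mktemp)
printf '%s\n' \
  '@HWUSI-EAS1727:19:6:1:3674:984:0:1#GTTAATA' \
  'NTTGGGTTTTCT' \
  '@HWUSI-EAS1727:19:6:1:3674:984:0:1#GTTA...' \
  'NTTGGGTTTTCT' > "$tmp"

# [^#]* matches everything before the first "#"; that prefix plus the "#"
# is replaced by a single "#", so the line is kept from the first "#" on.
# -n together with the p flag prints only lines where a substitution happened.
out=$(sed -n 's/^[^#]*#/#/p' "$tmp")
printf '%s\n' "$out"

rm -f "$tmp"
```

Lines without a "#" are not printed at all, so the sequence lines are dropped.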

pirates.genome · January 16, 2012, 9:58pm

In the end I used the code from Scrutinizer.
Thank you very much for the replies.

I didn't know sed could be so useful and easy. Now, along with Perl, I need to learn sed.

durden_tyler · January 17, 2012, 1:34am

Using Perl, you could search for all text from "#" to the end of the line and replace the entire line with the result -

$
$ cat f31
@HWUSI-EAS1727:19:6:1:3674:984:0:1#GTTAATANTTGGGTTTTCT
@HWUSI-EAS1727:19:6:1:3674:984:0:1#GTTANTTGGGTTTTCT
@HWUSI-EAS1727:19:6:1:3674:984:0:1#CTNTTGGGTTTTCT
$
$
$ perl -lne 's/^.*#(.*)$/$1/ and print' f31
GTTAATANTTGGGTTTTCT
GTTANTTGGGTTTTCT
CTNTTGGGTTTTCT
$
$

Or you could remove everything up to and including the "#" character from each line -

$
$ perl -lne 's/^.*#// and print' f31
GTTAATANTTGGGTTTTCT
GTTANTTGGGTTTTCT
CTNTTGGGTTTTCT
$
$

Or you could split each line on "#" as the delimiter, assign the chunks to an array and print just the 2nd element of the array -

$
$ perl -lne '@x = split/#/ and print $x[1]' f31
GTTAATANTTGGGTTTTCT
GTTANTTGGGTTTTCT
CTNTTGGGTTTTCT
$
$
$ # Or more succinctly...
$
$ perl -lne 'print ((split/#/)[1])' f31
GTTAATANTTGGGTTTTCT
GTTANTTGGGTTTTCT
CTNTTGGGTTTTCT
$
$ # Or even more succinctly...
$
$ perl -plne '$_=(split/#/)[1]' f31
GTTAATANTTGGGTTTTCT
GTTANTTGGGTTTTCT
CTNTTGGGTTTTCT
$
$

Or you could even find the index of the "#" character in each line, extract the substring that starts just after that index, and print it, like so -

$
$
$ perl -lne 'print substr($_,index($_,"#")+1)' f31
GTTAATANTTGGGTTTTCT
GTTANTTGGGTTTTCT
CTNTTGGGTTTTCT
$
$

tyler_durden