Read the number in the 4th column and print that many rows

jacobs.smith · August 18, 2014, 9:53am

Hi,

My input file

chr1	3217769	3217789	2952725-5	255	+
chr1	3260455	3260475	2434087-6	255	-

My desired output

chr1	3217769	3217789	2952725-1	255	+
chr1	3217769	3217789	2952725-2	255	+
chr1	3217769	3217789	2952725-3	255	+
chr1	3217769	3217789	2952725-4	255	+
chr1	3217769	3217789	2952725-5	255	+
chr1	3260455	3260475	2434087-1	255	-
chr1	3260455	3260475	2434087-2	255	-
chr1	3260455	3260475	2434087-3	255	-
chr1	3260455	3260475	2434087-4	255	-
chr1	3260455	3260475	2434087-5	255	-
chr1	3260455	3260475	2434087-6	255	-

So basically I want to read the number after the dash in the 4th column.

Then print the entire row that many times, replacing the number after the dash with 1, 2, ... up to that number.
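Here is a minimal sketch of the request. It assumes the input is tab-separated and that the 4th column has the form ID-N, where N is a non-negative integer. It splits $4 on the dash and then rewrites the field once per count. Setting FS and OFS to a tab keeps the original tab layout when awk rebuilds the record. The function name expand_rows is hypothetical.

```shell
# expand_rows is a hypothetical helper name: it reads tab-separated lines
# from the named files (or stdin) and prints each row N times.
expand_rows() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    # Split column 4 (for example 2952725-5) on the dash.
    n = split($4, part, "-")
    if (n < 2) { print; next }     # no dash: pass the line through unchanged
    count = part[n] + 0            # "+ 0" makes it numeric, so 05 becomes 5
    id = part[1]
    for (i = 2; i < n; i++)        # re-join the id if it contains dashes itself
      id = id "-" part[i]
    for (i = 1; i <= count; i++) {
      $4 = id "-" i                # rebuilds $0 joined with OFS (a tab)
      print
    }
  }' "$@"
}
```

Usage: `expand_rows inputfile`. Because the count is read as a whole field instead of a single character, multi-digit values such as 15 or 555 also work.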

Thanks in advance.

pilnet101 · August 18, 2014, 11:33am

Try this:

awk '{v=substr($0,index($0,"-")+1,1); for (x=1; x<=v+0; x++)  {sub("-"v,"-"x);sub("-"x-1,"-"x);print $0}}' inputfile

jacobs.smith · August 18, 2014, 11:37am

Thank you. Worked perfectly.

I just noticed that when there is more than one digit after the dash, the command does not handle it.

For example, the following lines are not handled:

chr1	3217769	3217789	2952725-15 255	+
chr1	3217769	3217789	2952725-555	255	+
chr1	3217769	3217789	2952725-05	255	+

Thanks again.

pilnet101 · August 18, 2014, 12:23pm

Try this one now:

awk 'match($0,/-[0-9]+ */){v=substr($0,RSTART+1,RLENGTH-1)}{for (x=1; x<=(v+0); x++) {sub("-"v,"-"x);sub("-"x-1,"-"x);print}}'
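The command above matches against the whole line ($0). If a line has no match, v keeps the value from the previous line. Below is a sketch that restricts match() to the 4th column and passes non-matching lines through unchanged. It assumes tab-separated input, and expand_rows_match is a hypothetical name. It uses the standard match() built-in, which sets RSTART to the position of the match (0 if there is none).

```shell
# expand_rows_match is a hypothetical helper name.
expand_rows_match() {
  awk 'BEGIN { FS = OFS = "\t" }
  # Lines whose 4th column does not end in -digits are printed unchanged.
  !match($4, /-[0-9]+$/) { print; next }
  {
    prefix = substr($4, 1, RSTART - 1)     # text before the dash, e.g. 2952725
    count  = substr($4, RSTART + 1) + 0    # digits after the dash, as a number
    for (i = 1; i <= count; i++) {
      $4 = prefix "-" i
      print
    }
  }' "$@"
}
```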

RavinderSingh13 · August 19, 2014, 4:45am

Hello,

The following may also help.

awk '{match($4,/-.*/); a=substr($4,RSTART+1,RLENGTH)} {match($4,/.*-/); b=substr($4,RSTART,RLENGTH-1)} {for(i=1;i<=a;i++){$4=b"-"i; print $0}}' filename

Output will be as follows.

chr1 3217769 3217789 2952725-1 255 +
chr1 3217769 3217789 2952725-2 255 +
chr1 3217769 3217789 2952725-3 255 +
chr1 3217769 3217789 2952725-4 255 +
chr1 3217769 3217789 2952725-5 255 +
chr1 3260455 3260475 2434087-1 255 -
chr1 3260455 3260475 2434087-2 255 -
chr1 3260455 3260475 2434087-3 255 -
chr1 3260455 3260475 2434087-4 255 -
chr1 3260455 3260475 2434087-5 255 -
chr1 3260455 3260475 2434087-6 255 -

Thanks,
R. Singh
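The output above uses single spaces instead of the original tabs. This is standard awk behavior: assigning to a field such as $4 rebuilds $0 with the fields joined by OFS, which defaults to one space. The small demonstration below uses made-up sample fields. Adding BEGIN { FS = OFS = "\t" } to the command (or passing -F'\t' -v OFS='\t') should keep the tabs.

```shell
# Sample line with made-up fields, separated by tabs.
line=$(printf 'a\tb\tc\td-2\te')

# Assigning to $4 rebuilds $0 with the default OFS (one space).
spaced=$(printf '%s\n' "$line" | awk '{ $4 = $4; print }')

# Setting FS and OFS to a tab keeps the original separators.
tabbed=$(printf '%s\n' "$line" | awk 'BEGIN { FS = OFS = "\t" } { $4 = $4; print }')

printf '%s\n' "$spaced" "$tabbed"
```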

Scrutinizer · August 19, 2014, 6:32am

Another approach:

awk 'split($2,F," "){p=$0; for(i=1; i<F[1]; i++) {sub(F[1],i,$2); print; $0=p}}1' FS=- OFS=- file
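This one-liner combines several awk idioms. Below is the same program spread out with comments and wrapped in a shell function. The name expand_rows_fs is hypothetical, and the awk program is unchanged apart from whitespace and comments. With FS set to "-", $2 starts with the count. The loop prints rows 1 to N-1, restoring the saved line p after each print. The trailing 1 then prints the untouched line, which already carries N.

```shell
# expand_rows_fs is a hypothetical wrapper around the one-liner above.
expand_rows_fs() {
  awk '
  # With FS="-", the first sample line splits as
  # $1 = "chr1<TAB>3217769<TAB>3217789<TAB>2952725", $2 = "5<TAB>255<TAB>+".
  split($2, F, " ") {          # F[1] is the count; true when $2 has a field
    p = $0                     # save the untouched line
    for (i = 1; i < F[1]; i++) {
      sub(F[1], i, $2)         # the first match is the count at the start of $2
      print                    # $0 was rebuilt with OFS="-"
      $0 = p                   # restore the original line for the next pass
    }
  }
  1                            # always true: print the original line (row N)
  ' FS=- OFS=- "$@"
}
```

As in the original, pass the file name as an argument: `expand_rows_fs file`.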

RavinderSingh13 · August 19, 2014, 6:55am

Posted by Scrutinizer

A small change to the code: as written, it will miss one line, because the condition should be <=, as follows.

awk 'split($2,F," "){p=$0; for(i=1; i<=F[1]; i++) {sub(F[1],i,$2); print; $0=p}}' FS=- OFS=- filename

Output will be as follows.

chr1    3217769 3217789 2952725-1       255     +
chr1    3217769 3217789 2952725-2       255     +
chr1    3217769 3217789 2952725-3       255     +
chr1    3217769 3217789 2952725-4       255     +
chr1    3217769 3217789 2952725-5       255     +
chr1    3260455 3260475 2434087-1       255     -
chr1    3260455 3260475 2434087-2       255     -
chr1    3260455 3260475 2434087-3       255     -
chr1    3260455 3260475 2434087-4       255     -
chr1    3260455 3260475 2434087-5       255     -
chr1    3260455 3260475 2434087-6       255     -

EDITED: Sorry, my mistake. Scrutinizer's code is correct as posted, because the trailing 1 prints the final row.

Thanks,
R. Singh

pilnet101 · August 19, 2014, 9:01am

Hi Scrutinizer,

I much prefer this solution. I am just a little confused about how the 'p' and '$0' assignments make this work. The 'p' variable does not seem to be used at all, yet it seems to allow the sub function to work correctly.

Update: Sorry, I understand now; I must have overlooked it. $0 is reset to 'p' (the saved original line) after each modified line has been printed.