Hi,
My input file
chr1 3217769 3217789 2952725-5 255 +
chr1 3260455 3260475 2434087-6 255 -
My desired output
chr1 3217769 3217789 2952725-1 255 +
chr1 3217769 3217789 2952725-2 255 +
chr1 3217769 3217789 2952725-3 255 +
chr1 3217769 3217789 2952725-4 255 +
chr1 3217769 3217789 2952725-5 255 +
chr1 3260455 3260475 2434087-1 255 -
chr1 3260455 3260475 2434087-2 255 -
chr1 3260455 3260475 2434087-3 255 -
chr1 3260455 3260475 2434087-4 255 -
chr1 3260455 3260475 2434087-5 255 -
chr1 3260455 3260475 2434087-6 255 -
So basically I want to read the number after the dash in the 4th column.
Then print that entire row once for each value from 1 up to that number, with the number after the dash replaced by the current value.
Thanks in advance.
Try this:
awk '{v=substr($0,index($0,"-")+1,1); for (x=1; x<=v+0; x++) {sub("-"v,"-"x);sub("-"x-1,"-"x);print $0}}' inputfile
Thank you. Worked perfectly.
I just noticed that if there is more than one digit after the dash, the command does not handle it.
For example, the following lines are not handled correctly:
chr1 3217769 3217789 2952725-15 255 +
chr1 3217769 3217789 2952725-555 255 +
chr1 3217769 3217789 2952725-05 255 +
Thanks again.
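To illustrate why the first command stops at one digit: its substr call asks for exactly one character after the first dash. The sketch below is only an illustration on one of the sample lines above; the variable names are made up, and it assumes the columns are separated by whitespace.

```shell
# Sample line from the post above (assumed to be whitespace-separated).
line='chr1 3217769 3217789 2952725-15 255 +'

# substr(..., 1) keeps only one character after the first dash.
one_char=$(printf '%s\n' "$line" | awk '{ print substr($0, index($0, "-") + 1, 1) }')

# Splitting the 4th field on "-" keeps the whole suffix instead.
whole=$(printf '%s\n' "$line" | awk '{ split($4, part, "-"); print part[2] }')

echo "one character: $one_char"
echo "whole suffix:  $whole"
```

The first value is what the loop in the original command would count up to, which is why 15 behaves like 1.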
Try this one now:
awk 'match($0,/-[0-9]+/){v=substr($0,RSTART+1,RLENGTH-1); for (x=1; x<=v+0; x++) {sub("-"v,"-"x);sub("-"(x-1),"-"x);print}}' inputfile
Hello,
The following may also help.
awk '{match($4,/-.*/); a=substr($4,RSTART+1)+0; b=substr($4,1,RSTART-1); for(i=1;i<=a;i++){$4=b"-"i; print}}' filename
Output will be as follows.
chr1 3217769 3217789 2952725-1 255 +
chr1 3217769 3217789 2952725-2 255 +
chr1 3217769 3217789 2952725-3 255 +
chr1 3217769 3217789 2952725-4 255 +
chr1 3217769 3217789 2952725-5 255 +
chr1 3260455 3260475 2434087-1 255 -
chr1 3260455 3260475 2434087-2 255 -
chr1 3260455 3260475 2434087-3 255 -
chr1 3260455 3260475 2434087-4 255 -
chr1 3260455 3260475 2434087-5 255 -
chr1 3260455 3260475 2434087-6 255 -
Thanks,
R. Singh
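For readability, the same idea (split the 4th column at the dash, then loop from 1 to the count) can be written out over several lines. This is just a sketch, not a replacement for the one-liners in this thread: expand_rows is a hypothetical helper name, and it assumes whitespace-separated columns with exactly one dash in column 4.

```shell
# expand_rows is a hypothetical helper name, not something from the thread.
expand_rows() {
    awk '{
        # Split the 4th field at the dash: part[1] is the ID, part[2] the count.
        split($4, part, "-")
        n = part[2] + 0          # make the count explicitly numeric, so "05" means 5
        for (i = 1; i <= n; i++) {
            $4 = part[1] "-" i   # assigning to $4 rebuilds $0 with single spaces
            print
        }
    }' "$@"
}

# Usage: expand one of the multi-digit sample lines and show the last row.
printf '%s\n' 'chr1 3217769 3217789 2952725-15 255 +' | expand_rows | tail -n 1
```

Note that assigning to $4 rejoins the fields with the default output separator (a single space), so tab-separated input would come out space-separated unless OFS is set.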
Another approach:
awk 'split($2,F," "){p=$0; for(i=1; i<F[1]; i++) {sub(F[1],i,$2); print; $0=p}}1' FS=- OFS=- file
A small change to that code, since it looks like it will miss one line: the loop condition should be <=, as follows.
awk 'split($2,F," "){p=$0; for(i=1; i<=F[1]; i++) {sub(F[1],i,$2); print; $0=p}}' FS=- OFS=- filename
Output will be as follows.
chr1 3217769 3217789 2952725-1 255 +
chr1 3217769 3217789 2952725-2 255 +
chr1 3217769 3217789 2952725-3 255 +
chr1 3217769 3217789 2952725-4 255 +
chr1 3217769 3217789 2952725-5 255 +
chr1 3260455 3260475 2434087-1 255 -
chr1 3260455 3260475 2434087-2 255 -
chr1 3260455 3260475 2434087-3 255 -
chr1 3260455 3260475 2434087-4 255 -
chr1 3260455 3260475 2434087-5 255 -
chr1 3260455 3260475 2434087-6 255 -
EDITED: Sorry, my mistake. Scrutinizer's code is correct as posted, because the trailing 1
is there to print that last line.
Thanks,
R. Singh
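To make the point about the trailing 1 concrete: in awk, a pattern that is always true and has no action prints the current record. The sketch below runs both variants from this thread on one sample line and compares them. It sets the separators with -F- and -v OFS=- instead of the trailing FS=- OFS=- assignments, purely so that it can read from a pipe; the variable names are made up for the illustration.

```shell
line='chr1 3217769 3217789 2952725-5 255 +'

# Variant 1: loop stops before the count, trailing 1 prints the original line.
with_one=$(printf '%s\n' "$line" |
    awk -F- -v OFS=- 'split($2,F," "){p=$0; for(i=1; i<F[1]; i++) {sub(F[1],i,$2); print; $0=p}}1')

# Variant 2: loop runs up to and including the count, no trailing 1.
without_one=$(printf '%s\n' "$line" |
    awk -F- -v OFS=- 'split($2,F," "){p=$0; for(i=1; i<=F[1]; i++) {sub(F[1],i,$2); print; $0=p}}')

if [ "$with_one" = "$without_one" ]; then
    echo "same output"
else
    echo "different output"
fi
```

For this sample line both variants give the same five rows. They can differ on a count with leading zeros such as 05, where the first variant prints the last row exactly as it appears in the input.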
scrutinizer:
Another approach:
awk 'split($2,F," "){p=$0; for(i=1; i<F[1]; i++) {sub(F[1],i,$2); print; $0=p}}1' FS=- OFS=- file
Hi Scrutinizer,
I much prefer this solution. I am just a little confused as to how the 'p' and '$0' assignments make this work. The 'p' variable does not seem to be used at all, yet it seems to allow the sub function to work correctly.
Update - Sorry, I understand now; I must have overlooked it. $0 is reset to the saved copy in 'p' each time after the line has been modified and printed.
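The save-and-restore step can be seen in isolation with a small sketch. The input 'id-5 rest' is a made-up record, and out is just an illustrative variable name: editing $2 causes awk to rebuild $0 using OFS, and assigning the saved copy back to $0 restores the record and re-splits the fields.

```shell
out=$(printf '%s\n' 'id-5 rest' | awk -F- -v OFS=- '{
    p = $0                 # keep a copy of the untouched record
    sub("5", "1", $2)      # editing $2 rebuilds $0 using OFS
    changed = $0
    $0 = p                 # restoring $0 also re-splits the fields
    print changed " | " $0 " | " $2
}')
echo "$out"
# prints: id-1 rest | id-5 rest | 5 rest
```

Without the $0=p step, the second pass of the loop would look for the original count in a field that no longer contains it.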