I will try to explain directly what the input and output look like and how the output is produced.
Every line in the output comes from columns 2 (78507634), 3 (78534748), 11 (second to last) and 12 (last).
For example, line 1 of the output comes from:
col2 (78507634) + col12 1st value (0) = 78507634
78507634 + col11 1st value (29) = 78507663
For example, line 2 of the output comes from:
col2 (78507634) + col12 2nd value (808) = 78508442
78508442 + col11 2nd value (188) = 78508630
input
c1 78507634 78534748 G_X 0 + 78508536 78534673 0 7 29,188,179,205,175,200,230, 0,808,11625,12514,16510,17993,26884,
output
c1 78507634 78507663 1stE_G_X
c1 78508442 78508630 2ndE_G_X
c1 78519259 78519438 3rdE_G_X
c1 78520148 78520353 4thE_G_X
c1 78524144 78524319 5thE_G_X
c1 78525627 78525827 6thE_G_X
c1 78534518 78534748 7thE_G_X
Column 11's first value is 29 and column 12's first value is 0, or am I missing something?
If you want the output you posted:
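awk '{
  # split() returns 8 for the sample: the trailing comma adds an empty element
  n11 = split($11, t11, ",")
  n12 = split($12, t12, ",")
  for (i = 0; ++i < n11;) {
    s12 = $2 + t12[i]            # start: column 2 + i-th value of column 12
    print $1, s12, s12 + t11[i]  # end: start + i-th value of column 11
  }
}' infile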
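To try the first script on the sample record without creating `infile`, the line can be piped in. This is a sketch assuming a POSIX shell and any POSIX awk; the `[i]` subscripts on `t11` and `t12` are required because `split()` turns both names into arrays. The shell variables `line` and `out` are only for illustration.

```shell
#!/bin/sh
# Sample input record from the thread (fields separated by single spaces)
line='c1 78507634 78534748 G_X 0 + 78508536 78534673 0 7 29,188,179,205,175,200,230, 0,808,11625,12514,16510,17993,26884,'

out=$(printf '%s\n' "$line" | awk '{
    n11 = split($11, t11, ",")     # 8 elements: the last one is empty
    n12 = split($12, t12, ",")
    for (i = 0; ++i < n11;) {      # i = 1 .. n11-1, skipping the empty element
        s12 = $2 + t12[i]          # start = column 2 + i-th value of column 12
        print $1, s12, s12 + t11[i]  # end = start + i-th value of column 11
    }
}')
printf '%s\n' "$out"
```

The loop stops one short of `n11` because the trailing comma in columns 11 and 12 makes `split()` return an empty last element; if your data has no trailing comma, change the test to `++i <= n11`.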
If the description is semi-correct and the output wrong:
awk '{
  n11 = split($11, t11, ",")
  n12 = split($12, t12, ",")
  for (i = 0; ++i < n11;) {
    s11 = $2 + t11[i]            # start: column 2 + i-th value of column 11
    print $1, s11, s11 + t12[i]  # end: start + i-th value of column 12
  }
}' infile
P.S. I didn't include the last column, because I don't know how it is generated.
joeyg
July 21, 2010, 9:21am
Input file, many lines:
x c2 c3 x x x x x x x c11a,c11b,c11c c12a,c12b,c12c
Thus the output simplifies to:
c2+c12a c2+c11a
If so, when is the c3 value used?
I have now modified the instructions.
@rad: yes, you are right; the first mistake you identified is the error (I confused the 11th and 12th columns).
Thank you for correcting me.
@joeyg: Agreed, c3 is never used.
Also, c11 and c12 should have the same number of segments:
c1 c2+c12(1) c2+c12(1)+c11(1)
c1 c2+c12(2) c2+c12(2)+c11(2)
...
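Although c3 is not needed to build the output, it can serve as a sanity check: for the sample record, c2 + c12(7) + c11(7) = 78507634 + 26884 + 230 = 78534748, which is exactly c3. The sketch below is not from the thread; it assumes that every record should end at c3 and that columns 11 and 12 have the same number of values. The variable names are illustrative.

```shell
#!/bin/sh
# Hypothetical consistency check: same segment count in c11 and c12,
# and the last segment must end exactly at c3.
line='c1 78507634 78534748 G_X 0 + 78508536 78534673 0 7 29,188,179,205,175,200,230, 0,808,11625,12514,16510,17993,26884,'

result=$(printf '%s\n' "$line" | awk '{
    n11 = split($11, t11, ",")
    n12 = split($12, t12, ",")
    if (n11 != n12) { print "MISMATCH", NR, "count"; next }
    k = n11
    if (t11[k] == "") k--              # ignore the empty element after a trailing comma
    last_end = $2 + t12[k] + t11[k]    # c2 + c12(k) + c11(k)
    if (last_end == $3) print "OK", NR
    else print "MISMATCH", NR, last_end, $3
}')
printf '%s\n' "$result"
```

Run against a real `infile` (replace the `printf` pipe with `awk '...' infile`), any line reported as `MISMATCH` would point at a record where the two lists do not line up.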
Rad's first script does the job perfectly:
awk '{
  n11 = split($11, t11, ",")
  n12 = split($12, t12, ",")
  for (i = 0; ++i < n11;) {
    s12 = $2 + t12[i]
    print $1, s12, s12 + t11[i]
  }
}' infile
However,
the last column just refers to the 4th value of the input line, with an ordinal in front (1stE_G_X).
I still don't understand: 4th field value is G_X and you want:
1stE_G_X
2ndE_G_X
...
What's the logic behind G_X <-> E_G_X?
It's easy to generate progressive numbers, but not 1st, 2nd, etc. Do you really need them like this?
There is no logic.
Yes, progressive numbers such as 1_G_X, 2_G_X, etc. are fine.
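If the ordinal labels from the first post (1stE_G_X, 2ndE_G_X, ...) were still wanted, awk can build them with a small user-defined function. This is a sketch assuming a POSIX awk; `ordinal` is a hypothetical helper name, and the `E_` infix is copied literally from the desired output.

```shell
#!/bin/sh
line='c1 78507634 78534748 G_X 0 + 78508536 78534673 0 7 29,188,179,205,175,200,230, 0,808,11625,12514,16510,17993,26884,'

out=$(printf '%s\n' "$line" | awk '
# Hypothetical helper: English ordinal of a positive integer (1st, 2nd, 3rd, 11th, 21st)
function ordinal(n,    r100, r10) {
    r100 = n % 100
    r10 = n % 10
    if (r100 >= 11 && r100 <= 13) return n "th"
    if (r10 == 1) return n "st"
    if (r10 == 2) return n "nd"
    if (r10 == 3) return n "rd"
    return n "th"
}
{
    n11 = split($11, t11, ",")
    split($12, t12, ",")
    for (i = 0; ++i < n11;) {
        s12 = $2 + t12[i]
        print $1, s12, s12 + t11[i], ordinal(i) "E_" $4
    }
}')
printf '%s\n' "$out"
```

The 11 to 13 check comes first so that 11th, 12th and 13th are not turned into 11st, 12nd and 13rd.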
One more thing, please: can I get output like this? (A small change: remove the first number in the 2nd column, 78507634, and the last number in the 3rd column, 78534748.)
c1 78507663 78508442 1_G_X
c1 78508630 78519259 2_G_X
c1 78519438 78520148 3_G_X
c1 78520353 78524144 4_G_X
c1 78524319 78525627 5_G_X
c1 78525827 78534518 6_G_X
Thank you, Rad.
Could you elaborate more? I don't understand this statement:
output1
c1 78507634 78507663 1_G_X
c1 78508442 78508630 2_G_X
c1 78519259 78519438 3_G_X
c1 78520148 78520353 4_G_X
c1 78524144 78524319 5_G_X
c1 78525627 78525827 6_G_X
c1 78534518 78534748 7_G_X
First remove the 1st value in the 2nd column and the last value in the 3rd column (marked in red in the original post). After that,
pair each remaining value of the 3rd column with the 2nd-column value of the following line,
i.e. 78507663 78508442, and so on.
output2
c1 78507663 78508442 1_G_X
c1 78508630 78519259 2_G_X
c1 78519438 78520148 3_G_X
c1 78520353 78524144 4_G_X
c1 78524319 78525627 5_G_X
c1 78525827 78534518 6_G_X
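Read this way, output2 is the stretch between consecutive lines of output1, so it can also be derived from output1 by pairing each line's 3rd column with the next line's 2nd column. This post-processing approach is an alternative sketch, not what was posted in the thread; it assumes output1 is sorted by position and that lines sharing the same 1st column and the same name after the counter (G_X here) come from the same input record.

```shell
#!/bin/sh
# output1 exactly as posted in the thread
output1='c1 78507634 78507663 1_G_X
c1 78508442 78508630 2_G_X
c1 78519259 78519438 3_G_X
c1 78520148 78520353 4_G_X
c1 78524144 78524319 5_G_X
c1 78525627 78525827 6_G_X
c1 78534518 78534748 7_G_X'

out=$(printf '%s\n' "$output1" | awk '{
    # Name without the leading counter, e.g. "1_G_X" -> "G_X"
    name = $4
    sub(/^[0-9]+_/, "", name)
    # Only pair lines with the same first column and the same name
    if (NR > 1 && $1 == prev_first && name == prev_name)
        print $1, prev_end, $2, prev_label   # previous end, current start
    prev_first = $1; prev_name = name; prev_end = $3; prev_label = $4
}')
printf '%s\n' "$out"
```

Each output line carries the label of the line it starts from, which reproduces the 1_G_X to 6_G_X numbering above.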
awk '{
  n11 = split($11, t11, ",")
  n12 = split($12, t12, ",")
  for (i = 0; ++i < n11 - 1;) {
    s12 = $2 + t12[i]
    # end of segment i, start of segment i+1, progressive label
    print $1, s12 + t11[i], $2 + t12[i + 1], i "_" $4
  }
}' infile
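A self-contained run of the final script on the sample record, with the `[i]` subscripts in place. The shell variables are illustrative, and `seg_end`/`next_start` are descriptive renames of the same arithmetic, not names from the thread.

```shell
#!/bin/sh
line='c1 78507634 78534748 G_X 0 + 78508536 78534673 0 7 29,188,179,205,175,200,230, 0,808,11625,12514,16510,17993,26884,'

out=$(printf '%s\n' "$line" | awk '{
    n11 = split($11, t11, ",")
    split($12, t12, ",")
    # i = 1 .. n11-2: one line per pair of consecutive segments
    for (i = 0; ++i < n11 - 1;) {
        seg_end    = $2 + t12[i] + t11[i]   # end of segment i
        next_start = $2 + t12[i + 1]        # start of segment i+1
        print $1, seg_end, next_start, i "_" $4
    }
}')
printf '%s\n' "$out"
```

With 7 segments this prints 6 lines, matching output2.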