Numerical calculation in any programming language or awk?

repinementer · July 21, 2010, 5:33am

I will try to explain directly what the input and output look like and how the output is produced.

Every line in the output comes from columns 2 (78507634), 3 (78534748), 11 (last but one) and 12 (last).

For example, line 1 of the output comes from:

col2 (78507634) + col12 1st value (0) = 78507634
78507634 + col11 1st value (29) = 78507663

And line 2 of the output comes from:

col2 (78507634) + col12 2nd value (808) = 78508442
78508442 + col11 2nd value (188) = 78508630

input

c1    78507634    78534748    G_X    0    +    78508536    78534673    0    7    29,188,179,205,175,200,230,    0,808,11625,12514,16510,17993,26884,

output

c1    78507634    78507663    1stE_G_X
c1    78508442    78508630    2ndE_G_X
c1    78519259    78519438    3rdE_G_X
c1    78520148    78520353    4thE_G_X
c1    78524144    78524319    5thE_G_X
c1    78525627    78525827    6thE_G_X
c1    78534518    78534748    7thE_G_X

radoulov · July 21, 2010, 9:11am

The 1st value of col11 is 29 and the 1st value of col12 is 0, or am I missing something?

If you want the output you posted:

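awk '{
  # the trailing comma makes split() return one extra, empty element,
  # so "++i < n11" stops at the last real value
  n11 = split($11, t11, ",")
  n12 = split($12, t12, ",")
  for (i = 0; ++i < n11;) {
    s12 = $2 + t12[i]
    print $1, s12, s12 + t11[i]
    }
  }' infile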

If the description is semi-correct and the output wrong:

awk '{
  n11 = split($11, t11, ",")
  n12 = split($12, t12, ",")
  for (i = 0; ++i < n11;) {
    s11 = $2 + t11[i]
    print $1, s11, s11 + t12[i]
    }
  }' infile

P.S. I didn't include the last column, because I don't know how it is generated.

joeyg · July 21, 2010, 9:21am

The input file has many lines of the form:

x c2 c3 x x x x x x x c11a,c11b,c11c c12a,c12b,c12c

so the output simplifies to:

c2+c12a c2+c11a

If so, then when is the c3 value used?

repinementer · July 21, 2010, 7:35pm

I have now modified the instructions.

@rad: yes, you are right. The first mistake you identified is the error (I confused the 11th and 12th columns).

Thank you for correcting me.

rdcwayx · July 21, 2010, 9:21pm

Agreed, c3 is never used.

And c11 and c12 should have the same number of segments.

c1 c2+c12(1) c2+c12(1)+c11(1)
c1 c2+c12(2) c2+c12(2)+c11(2)
...
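
The pattern above could be sketched in awk roughly like this. It is only a sketch, assuming whitespace-separated fields and the same number of comma-separated values in columns 11 and 12; the function name `segments` is made up for illustration.

```sh
# segments: hypothetical helper; reads rows from stdin or from the files given.
segments() {
  awk '{
    # split() returns the element count; a trailing comma yields one
    # extra empty element, which the check below skips.
    n = split($11, len, ",")
    split($12, off, ",")
    for (i = 1; i <= n; i++) {
      if (len[i] == "") continue
      start = $2 + off[i]               # c2 + c12(i)
      print $1, start, start + len[i]   # c2 + c12(i) + c11(i)
    }
  }' "$@"
}

# Sample row from the first post.
printf 'c1 78507634 78534748 G_X 0 + 78508536 78534673 0 7 29,188,179,205,175,200,230, 0,808,11625,12514,16510,17993,26884,\n' | segments
```

For the sample row this prints seven lines, from `c1 78507634 78507663` to `c1 78534518 78534748`.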

repinementer · July 21, 2010, 11:12pm

Rad's first script does the job perfectly:

awk '{
  n11 = split($11, t11, ",")
  n12 = split($12, t12, ",")
  for (i = 0; ++i < n11;) {
    s12 = $2 + t12[i]
    print $1, s12, s12 + t11[i]
    }
  }' infile

However, the last column is missing. It just combines the output line number with the 4th value of the input (e.g. 1stE_G_X).

radoulov · July 22, 2010, 2:51am

I still don't understand: 4th field value is G_X and you want:

1stE_G_X
2ndE_G_X
...

What's the logic behind G_X <-> E_G_X?
You know, it's easy to generate progressive numbers (1, 2, 3 rather than 1st, 2nd, etc.). Do you really need them like this?
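
For what it is worth, ordinal labels are not out of reach in awk either. A minimal sketch, assuming English suffixes are wanted and keeping the `E_` prefix exactly as it appears in the posted output; the names `ordinals` and `ordinal` are made up for illustration.

```sh
# ordinals: hypothetical helper; expects "number name" pairs on stdin.
ordinals() {
  awk '
    # ordinal(n): returns n followed by its English suffix.
    function ordinal(n,    t) {
      t = n % 100
      if (t >= 11 && t <= 13) return n "th"   # 11th, 12th, 13th
      t = n % 10
      if (t == 1) return n "st"
      if (t == 2) return n "nd"
      if (t == 3) return n "rd"
      return n "th"
    }
    { print ordinal($1) "E_" $2 }
  '
}

printf '1 G_X\n2 G_X\n3 G_X\n4 G_X\n11 G_X\n22 G_X\n' | ordinals
```

This prints 1stE_G_X, 2ndE_G_X, 3rdE_G_X, 4thE_G_X, 11thE_G_X and 22ndE_G_X, one per line.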

repinementer · July 22, 2010, 5:04am

There is no logic.
Yes, progressive numbers would be fine, like 1_G_X, 2_G_X, etc.

One more thing, please: can I get output like this (a small change, after removing the first number in the 2nd column, 78507634, and the last number in the 3rd column, 78534748)?

c1    78507663    78508442    1_G_X
c1    78508630    78519259    2_G_X
c1    78519438    78520148    3_G_X
c1    78520353    78524144    4_G_X
c1    78524319    78525627    5_G_X
c1    78525827    78534518    6_G_X

Thank you Rad

radoulov · July 22, 2010, 5:40am

Could you elaborate more? I don't understand this statement:

repinementer · July 22, 2010, 8:59am

output1

c1    78507634    78507663    1_G_X
c1    78508442    78508630    2_G_X
c1    78519259    78519438    3_G_X
c1    78520148    78520353    4_G_X
c1    78524144    78524319    5_G_X
c1    78525627    78525827    6_G_X
c1    78534518    78534748    7_G_X

First remove the 1st value in the 2nd column (78507634) and the last value in the 3rd column (78534748). After that,
pair each remaining 3rd-column value with the 2nd-column value of the following line,
i.e. 78507663 78508442 and so on.

output2

c1    78507663    78508442    1_G_X
c1    78508630    78519259    2_G_X
c1    78519438    78520148    3_G_X
c1    78520353    78524144    4_G_X
c1    78524319    78525627    5_G_X
c1    78525827    78534518    6_G_X

radoulov · July 22, 2010, 10:01am

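awk '{
  n11 = split($11, t11, ",")
  n12 = split($12, t12, ",")
  # stop one segment early: each line pairs the end of segment i
  # with the start of segment i + 1
  for (i = 0; ++i < n11 - 1;) {
    s12 = $2 + t12[i]
    print $1, s12 + t11[i], $2 + t12[i + 1], i "_" $4
    }
  }' infile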
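
A sketch of the same gap calculation wrapped so it can be tried on the sample row. The function name `gaps`, the tab separator set with `-v OFS='\t'`, and the explicit handling of the trailing comma are additions for illustration, not part of the thread.

```sh
# gaps: hypothetical helper; reads rows from stdin or from the files given.
gaps() {
  awk -v OFS='\t' '{
    n = split($11, len, ",")
    split($12, off, ",")
    if (len[n] == "") n--     # ignore the empty piece after a trailing comma
    for (i = 1; i < n; i++)
      # end of segment i, start of segment i + 1, running label
      print $1, $2 + off[i] + len[i], $2 + off[i + 1], i "_" $4
  }' "$@"
}

# Sample row from the first post.
printf 'c1 78507634 78534748 G_X 0 + 78508536 78534673 0 7 29,188,179,205,175,200,230, 0,808,11625,12514,16510,17993,26884,\n' | gaps
```

With the sample row this gives the six tab-separated lines of output2, from 1_G_X to 6_G_X.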