Request for a script to manipulate data. Please help!

liuzhencc · November 6, 2010, 9:33pm

Dear friends,

I'm struggling to prepare a bunch of Gaussian input files by hand. It is really time-consuming without some automation. I suppose a smart script could do it automatically, but I lack basic scripting knowledge. Please help!

My original input looks like this:

 C              
 H                  1    1.07000000
 H                  1    1.07000000    2  109.47120259
 H                  1    1.07000000    2  109.47120261    3  120.00001480    0
 H                  1    1.07000000    2  109.47123158    4  119.99999268    0

The modified output should be in the following format:

 C              
 H                  1    B1
 H                  1    B2    2  A1
 H                  1    B3    2  A2    3  D1    0
 H                  1    B4    2  A3    4  D2    0
B1 1.07000000
B2 1.07000000
B3 1.07000000
B4 1.07000000
A1 109.47120259
A2 109.47120261
A3 109.47123158
D1 120.00001480
D2 119.99999268

To be clear, the script should do the following:

Get the total number of lines N with "wc -l".
Replace the numbers in the 3rd column with B1~B(N-1), respectively, and append each of B1~B(N-1) to the end of the file, followed by the number it replaced.
Replace the numbers in the 5th column with A1~A(N-2), respectively, and append each of A1~A(N-2) to the end of the file, followed by the number it replaced.
Replace the numbers in the 7th column with D1~D(N-3), respectively, and append each of D1~D(N-3) to the end of the file, followed by the number it replaced.

I hope I have described the script clearly. Please do me this favor. All your help will be greatly appreciated. Thank you in advance!

ZHEN
from Shanghai, China.

rdcwayx · November 7, 2010, 3:15am

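OUT=outfile

# Pass 1: replace the values in columns 3, 5 and 7 with B, A and D labels.
awk '{for (i=3;i<=NF;i=i+2)
       { if (i==3) $i="B"++j
         if (i==5) $i="A"++k
         if (i==7) $i="D"++l
       }
     }1' infile > "$OUT"

# Passes 2-4: append each label followed by the value it replaced.
awk 'NF>1 {print "B" ++i,$3}' infile >> "$OUT"
awk 'NF>3 {print "A" ++i,$5}' infile >> "$OUT"
awk 'NF>5 {print "D" ++i,$7}' infile >> "$OUT"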

$ cat outfile

 C
H 1 B1
H 1 B2 2 A1
H 1 B3 2 A2 3 D1 0
H 1 B4 2 A3 4 D2 0
B1 1.07000000
B2 1.07000000
B3 1.07000000
B4 1.07000000
A1 109.47120259
A2 109.47120261
A3 109.47123158
D1 120.00001480
D2 119.99999268
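
The script above reads infile four times. Since the question mentions a bunch of input files, here is a minimal sketch of a single-pass variant wrapped in a loop. It stores the replaced values in awk arrays and prints them in the END block. The file names (methane.zmat, the *.zmat pattern, the .out suffix) and the function name convert are assumptions made for illustration only; they do not come from the original post.

```shell
# Hypothetical sample input so the sketch runs on its own.
cat > methane.zmat <<'EOF'
 C
 H                  1    1.07000000
 H                  1    1.07000000    2  109.47120259
 H                  1    1.07000000    2  109.47120261    3  120.00001480    0
 H                  1    1.07000000    2  109.47123158    4  119.99999268    0
EOF

# convert FILE: print the Z-matrix with B/A/D labels, then the label/value list.
convert() {
    awk '
        NF > 2 { bv[++b] = $3; $3 = "B" b }   # bond length in column 3
        NF > 4 { av[++a] = $5; $5 = "A" a }   # bond angle in column 5
        NF > 6 { dv[++d] = $7; $7 = "D" d }   # dihedral angle in column 7
        { $1 = $1; print }                    # rebuild the line with single spaces
        END {
            for (i = 1; i <= b; i++) print "B" i, bv[i]
            for (i = 1; i <= a; i++) print "A" i, av[i]
            for (i = 1; i <= d; i++) print "D" i, dv[i]
        }
    ' "$1"
}

# Process every matching file; foo.zmat becomes foo.out.
for f in *.zmat; do
    convert "$f" > "${f%.zmat}.out"
done
```

The values are kept as the original text of each field, so the digits are copied unchanged rather than reformatted as numbers.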

Scrutinizer · November 7, 2010, 4:03am

awk ' NF>2{f=$3;$3="B"++b; B=B (B?RS:x) $3 FS f}
      NF>4{f=$5;$5="A"++a; A=A (A?RS:x) $5 FS f}
      NF>6{f=$7;$7="D"++d; D=D (D?RS:x) $7 FS f}
      $1=$1
      END {print B; print A; print D}' infile

C
H 1 B1
H 1 B2 2 A1
H 1 B3 2 A2 3 D1 0
H 1 B4 2 A3 4 D2 0
B1 1.07000000
B2 1.07000000
B3 1.07000000
B4 1.07000000
A1 109.47120259
A2 109.47120261
A3 109.47123158
D1 120.00001480
D2 119.99999268
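
For readers unfamiliar with the idiom above: B, A and D are plain string variables, not arrays. Each new "label value" pair is appended to the string, with RS (a newline by default) inserted only when the string is already non-empty, and x is simply an unset variable standing in for the empty string. The bare $1=$1 pattern rebuilds the line with single spaces and, because its value is a non-empty string, also triggers the default print. A commented sketch of the same idea follows. The function name label_zmat is hypothetical, and the guards in END are a small addition that avoids blank lines when a molecule has no angles or dihedrals.

```shell
# Sample input from the question.
cat > infile <<'EOF'
 C
 H                  1    1.07000000
 H                  1    1.07000000    2  109.47120259
 H                  1    1.07000000    2  109.47120261    3  120.00001480    0
 H                  1    1.07000000    2  109.47123158    4  119.99999268    0
EOF

# label_zmat FILE (hypothetical name): same technique as the one-liner above.
label_zmat() {
    awk '
        # f saves the old value; the field gets its label; the pair is
        # appended to the accumulator string, newline-separated.
        NF > 2 { f = $3; $3 = "B" ++b; B = B (B ? RS : "") $3 FS f }
        NF > 4 { f = $5; $5 = "A" ++a; A = A (A ? RS : "") $5 FS f }
        NF > 6 { f = $7; $7 = "D" ++d; D = D (D ? RS : "") $7 FS f }
        # Explicit action instead of the bare $1=$1 pattern, so printing
        # does not depend on the truth value of the first field.
        { $1 = $1; print }
        # Print only the accumulators that actually received entries.
        END { if (B) print B; if (A) print A; if (D) print D }
    ' "$1"
}

label_zmat infile > outfile
```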

liuzhencc · November 7, 2010, 4:19am

Thank you very much. Both scripts work well for my case. Many thanks!

rdcwayx · November 7, 2010, 6:06pm

scrutinizer:

Collecting the label/value pairs in a variable and printing it directly in the END block is a technique I learned from your code.