Using awk to extract data in a specific format

Please help me write an awk script. My input is:

001_r.pdb 0.0265185
001_r.pdb 0.0437049
001_r.pdb 0.0240642
001_r.pdb 0.0310264
001_r.pdb 0.0200482
001_r.pdb 0.0146746
001_r.pdb 0.0351344
001_r.pdb 0.0347856
001_r.pdb 0.036119
001_r.pdb 1.49
002_r.pdb 0.0281011
002_r.pdb 0.0319908
002_r.pdb 0.0516021
002_r.pdb 0.0440953
002_r.pdb 0.0357756
002_r.pdb 0.0289215
002_r.pdb 0.0335896
002_r.pdb 0.0503094
002_r.pdb 1.46839
007_r.pdb 0.0582815
007_r.pdb 0.0738922
007_r.pdb 0.0524815
007_r.pdb 0.0436297
007_r.pdb 0.0476785
007_r.pdb 0.0344794
007_r.pdb 0.0715756
007_r.pdb 1.47235
014_r.pdb 0.0238086
014_r.pdb 0.0410284
014_r.pdb 0.03811
014_r.pdb 0.0343461
014_r.pdb 0.0496776
014_r.pdb 0.0308409
014_r.pdb 1.47679
015_r.pdb 0.036504
015_r.pdb 0.039139
015_r.pdb 0.0505177
015_r.pdb 0.0601075
015_r.pdb 0.0290934
015_r.pdb 1.4956
018_r.pdb 0.00923608
018_r.pdb 0.0506758
018_r.pdb 0.0412613
018_r.pdb 0.0443338
018_r.pdb 1.50705
020_r.pdb 0.0447592
020_r.pdb 0.0346336
020_r.pdb 0.0444563
020_r.pdb 1.50034
027_r.pdb 0.0279227
027_r.pdb 0.0331829
027_r.pdb 1.47212
034_r.pdb 0.0468688
034_r.pdb 1.48727
046_r.pdb 1.49224

The output I want:
001_r.pdb 0.0265185 0.0437049 0.0240642 0.0310264 0.0200482 0.0146746 0.0351344 0.0347856 0.036119 1.49
002_r.pdb 0.0281011 0.0319908 0.0516021 0.0440953 0.0357756 0.0289215 0.0335896 0.0503094 1.46839
... and so on.

awk ' { a[$1] = a[$1] == "" ? $2 : a[$1] " " $2 } END { for ( i in a ) { print i " " a[i] } } ' file | sort

It works brilliantly, but could you please explain the logic? I am a beginner in awk.

a[$1] = a[$1] == "" ? $2 : a[$1] " " $2
If the array element indexed by the first field is empty, assign the second field to it. Otherwise, append a space and the second field to whatever is already stored there.

Let's take the first two lines of your input and see how the code works.

a["001_r.pdb"] is empty at first, so
a["001_r.pdb"] = "0.0265185"

Now a["001_r.pdb"] is not empty, so
a["001_r.pdb"] = a["001_r.pdb"] " " "0.0437049"
               = "0.0265185" " " "0.0437049"
               = "0.0265185 0.0437049"

(awk joins strings that are written side by side; it has no + operator for strings.)
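If you want to watch this happen, here is a small sketch that prints the array element after every input line. It only feeds the first two lines of your data through the same assignment; the variable name trace is just something I made up for the example.

```shell
# Feed two sample lines through the same assignment and show the
# stored value after each line. "trace" is a hypothetical variable name.
trace=$(printf '%s\n' '001_r.pdb 0.0265185' '001_r.pdb 0.0437049' |
awk '{
    # same logic as the one-liner: start the string, or append to it
    a[$1] = a[$1] == "" ? $2 : a[$1] " " $2
    print "after line " NR ": a[\"" $1 "\"] = \"" a[$1] "\""
}')
printf '%s\n' "$trace"
```

This should print:
after line 1: a["001_r.pdb"] = "0.0265185"
after line 2: a["001_r.pdb"] = "0.0265185 0.0437049"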

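for ( i in a ) { print i " " a[i] }
Loop through the array and print each index followed by the value stored in the array for that index. awk does not guarantee any particular order for a for-in loop, which is why the output is piped through sort.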
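Since the for-in loop gives no guaranteed order, one possible variation is to remember the keys in the order they first appear, so no sort is needed afterwards. This is only a sketch, not the script posted above: the function name join_by_key and the helper array order are names I picked for illustration, and the sample data is a shortened copy of your input.

```shell
# join_by_key: hypothetical helper that collects all second fields per
# first field and prints them in first-seen order (no sort needed).
join_by_key() {
    awk '
    {
        if ($1 in a) {
            a[$1] = a[$1] " " $2      # key seen before: append the value
        } else {
            order[++n] = $1           # new key: remember its position
            a[$1] = $2                # and start its value string
        }
    }
    END {
        # walk the keys in the order they were first read
        for (k = 1; k <= n; k++) print order[k], a[order[k]]
    }' "$@"
}

# Shortened sample of the input from the question
result=$(join_by_key <<'EOF'
001_r.pdb 0.0265185
001_r.pdb 0.0437049
002_r.pdb 0.0281011
046_r.pdb 1.49224
EOF
)
printf '%s\n' "$result"
```

With a real file you would call it as join_by_key file. The trade-off is one extra array, in exchange for output that follows the input order even when the keys are not in sorted order.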

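#!/bin/bash
# requires bash 4.0 or later (associative arrays)

declare -A dict
# read the file name and the value from each line
while read -r key value
do
 # append the value with a leading space so the values stay separated
 dict[$key]+=" $value"
done < "file"
for i in "${!dict[@]}"
do
  echo "$i${dict[$i]}"
done | sort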

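sort -n urfile | awk '!a[$1]++ {printf "%s%s", (NR>1 ? RS : ""), $1} {printf " %s", $2} END {print ""}'

Note that the sort step also reorders the values within each file name, as the output shows: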

001_r.pdb 0.0146746 0.0200482 0.0240642 0.0265185 0.0310264 0.0347856 0.0351344 0.036119 0.0437049 1.49
002_r.pdb 0.0281011 0.0289215 0.0319908 0.0335896 0.0357756 0.0440953 0.0503094 0.0516021 1.46839
007_r.pdb 0.0344794 0.0436297 0.0476785 0.0524815 0.0582815 0.0715756 0.0738922 1.47235
014_r.pdb 0.0238086 0.0308409 0.0343461 0.03811 0.0410284 0.0496776 1.47679
015_r.pdb 0.0290934 0.036504 0.039139 0.0505177 0.0601075 1.4956
018_r.pdb 0.00923608 0.0412613 0.0443338 0.0506758 1.50705
020_r.pdb 0.0346336 0.0444563 0.0447592 1.50034
027_r.pdb 0.0279227 0.0331829 1.47212
034_r.pdb 0.0468688 1.48727
046_r.pdb 1.49224