sorting/arrangement problem

Hi,
I have the following 'sorting' problem.
Given the input file:

a:b:c:12:x:k
s:m:d:8:z:m
a:b:c:1:x:k
p:q:r:23:y:m
a:b:c:3:x:k
p:q:r:1:y:m

The output I expect is:

a:b:c:1:x:k
p:q:r:1:y:m
s:m:d:8:z:m

In other words, I group together lines that have the same values in columns 1, 2, 3, 5 and 6, and from each group I keep only the line with the smallest value in column 4.
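For anyone who wants to try the replies below, here is a minimal setup sketch. It saves the sample data under the name `inputfile` (an assumption; the replies use both `file1` and `inputfile`). It then prints each line's grouping key, made of columns 1, 2, 3, 5 and 6, next to its column-4 value, so the groups are easy to see.

```shell
# Save the sample data from the question. The name "inputfile" is an
# assumption; rename it to match whichever reply you are testing.
cat > inputfile <<'EOF'
a:b:c:12:x:k
s:m:d:8:z:m
a:b:c:1:x:k
p:q:r:23:y:m
a:b:c:3:x:k
p:q:r:1:y:m
EOF

# Print each line's grouping key (columns 1,2,3,5,6) followed by column 4.
# Lines with the same key form one group; the smallest column 4 wins.
awk -F':' '{ print $1 ":" $2 ":" $3 ":" $5 ":" $6, $4 }' inputfile
```

For the sample, the key a:b:c:x:k appears with the values 12, 1 and 3, so a:b:c:1:x:k is the line kept from that group.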

I am currently handling this with a small Perl script. Is there a way to do it with standard shell utilities alone?

Regards...

Try...

awk 'BEGIN{FS=":"}{c=$4;b=$0;$4="";k=$0}!(k in v)||v[k]>c{v[k]=c;a[k]=b}END{for(i in a)print a[i]}' file1
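Because the one-liner is dense, here is a sketch of the same approach spread over several lines with comments and written to a file (the name `min_rebuild.awk` is hypothetical). It uses `!(k in v)` as the "not seen yet" test and forces a numeric comparison with `$4 + 0`. It also pipes the result through `sort`, because `for (i in a)` returns groups in no particular order. These are my choices, not part of the original post. Like the original, it relies on awk rebuilding `$0` when `$4` is assigned.

```shell
# Sample data from the question, saved as file1.
printf '%s\n' a:b:c:12:x:k s:m:d:8:z:m a:b:c:1:x:k \
              p:q:r:23:y:m a:b:c:3:x:k p:q:r:1:y:m > file1

# min_rebuild.awk (hypothetical name): keep the line with the smallest
# column 4 per group, using the blanked-out record as the group key.
cat > min_rebuild.awk <<'EOF'
BEGIN { FS = ":" }
{
    c = $4 + 0      # column 4 as a number
    b = $0          # the untouched input line
    $4 = ""         # blank column 4; awk rebuilds $0 with OFS
    k = $0          # the rest of the record is the group key
}
!(k in v) || v[k] > c {
    v[k] = c        # smallest column-4 value seen for this key
    a[k] = b        # and the line it came from
}
END { for (i in a) print a[i] }
EOF

awk -f min_rebuild.awk file1 | sort
```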

You are a man of few words! Amazing!
My Unix machine is down, so it will be a while before I can try this out. But now that I have seen the approach, I can tweak it if I run into any errors.

Thanks!

Another way:

sort -t':' -k1,3 -k5,6 -k4,4n s1.txt | \
awk  -F':' '{ line=$0; $4=""; if ($0 != key) print line; key=$0 }' inputfile

Jean-Pierre.
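The sort stage is what makes this approach work. It brings lines with equal columns 1-3 and 5-6 together and, inside each group, orders them numerically by column 4, so the first line of every group is the one to keep. Below is a sketch of that stage alone on the sample data. The file name `inputfile` is assumed, and `LC_ALL=C` is my addition, there only to make the ordering locale-independent.

```shell
printf '%s\n' a:b:c:12:x:k s:m:d:8:z:m a:b:c:1:x:k \
              p:q:r:23:y:m a:b:c:3:x:k p:q:r:1:y:m > inputfile

# -t':'   fields are separated by colons
# -k1,3   first sort key: fields 1 through 3
# -k5,6   second sort key: fields 5 and 6
# -k4,4n  last key: field 4, compared numerically
LC_ALL=C sort -t':' -k1,3 -k5,6 -k4,4n inputfile
```

On the sample this prints the three a:b:c lines with column 4 in the order 1, 3, 12, then the p:q:r lines as 1, 23, then the single s:m:d line. Keeping the first line of each group gives the expected output.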

There is a slight problem I am running into.

Reassigning $4 does not update $0 on my system, so both of the methods above fail. Any cures?
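POSIX awk rebuilds `$0` from the fields, joined with `OFS`, whenever a field is assigned. The symptom described sounds like an older awk that does not do this. Here is a quick check that assumes nothing beyond a POSIX awk. On a conforming implementation the first command prints `a b c  x k` (the colons replaced by the default single-space OFS) and the second prints `a:b:c::x:k`. If the first prints the line unchanged, that awk does not rebuild `$0`.

```shell
# Assigning a field makes a POSIX awk rebuild $0 using OFS
# (a single space by default), so the colons become spaces.
echo 'a:b:c:12:x:k' | awk -F':' '{ $4 = ""; print }'

# With OFS set to ":", the rebuilt record keeps the original separator.
echo 'a:b:c:12:x:k' | awk -F':' -v OFS=':' '{ $4 = ""; print }'
```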

Here is a small correction to my solution, which works fine on my AIX box (did you try nawk or gawk, if they are available on your system?):

sort -t':' -k1,3 -k5,6 -k4,4n inputfile  | \
awk  -F':' '{ line=$0; $4=""; if ($0 != key) print line; key=$0 }'

The following solution doesn't modify any fields, so it should work with every version of awk:

sort -t':' -k1,3 -k5,6 -k4,4n inputfile  | \
awk  -F ':' '{ key=$1 ":" $2 ":" $3 ":" $5 ":" $6; if (key != prv_key) print; prv_key=key }'

Jean-Pierre.
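Here is the same pipeline spread out with comments, as a sketch that runs end to end on the sample data. The file names `inputfile` and `first_per_key.awk` are hypothetical, and `LC_ALL=C` is my addition for a locale-independent sort. No field is assigned, so `$0` is never rebuilt.

```shell
printf '%s\n' a:b:c:12:x:k s:m:d:8:z:m a:b:c:1:x:k \
              p:q:r:23:y:m a:b:c:3:x:k p:q:r:1:y:m > inputfile

# first_per_key.awk (hypothetical name): print only the first line of
# each run of lines sharing columns 1, 2, 3, 5 and 6.
cat > first_per_key.awk <<'EOF'
{
    key = $1 ":" $2 ":" $3 ":" $5 ":" $6   # every column except the 4th
    if (key != prv_key) print               # first line of a new group
    prv_key = key
}
EOF

# sort puts the smallest column 4 first in each group; awk keeps that line.
LC_ALL=C sort -t':' -k1,3 -k5,6 -k4,4n inputfile | awk -F':' -f first_per_key.awk
```

On the sample data this prints exactly the three lines from the question, in the same order.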

Hi,

I made a small change to the code posted by Ygor, and the same change applies to the solution by Aigles too (remember, my problem is that reassigning $4 doesn't modify $0). Here's the changed code:

awk 'BEGIN{FS=":"}{str=""; for(j=1;j<=NF;++j) str=sprintf("%s%s",str,$j)} (v[str]=="")||(v[str]>$4){v[str]=$4;a[str]=$0}END{for(i in a)print a[i]}' file1

Here are the tweaks:
(1) There is a new variable, "str", which holds every column except the 4th.
(2) In the condition, I now test (v[str]=="") instead of (!v[str]), because the latter throws an error on my system.
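One caveat about how str is built here: sprintf("%s%s",str,$j) joins the fields with nothing between them, so two lines whose fields differ only in where the boundaries fall get the same key. The small sketch below uses made-up lines (not from the question) to show this. Putting a separator such as ":" between the fields keeps the keys distinct. The file name `show_keys.awk` is hypothetical.

```shell
# show_keys.awk (hypothetical name): print each line's key built two ways.
cat > show_keys.awk <<'EOF'
{
    plain = ""; joined = ""
    for (j = 1; j <= NF; ++j)
        if (j != 4) {
            plain = plain $j            # no separator between fields
            joined = joined ":" $j      # ":" before every field
        }
    print plain, joined
}
EOF

# Made-up lines: "a","bc" versus "ab","c" in the first two columns.
printf '%s\n' 'a:bc:d:5:e:f' 'ab:c:d:7:e:f' | awk -F':' -f show_keys.awk
```

Both lines print the same plain key, abcdef, while the joined keys :a:bc:d:e:f and :ab:c:d:e:f differ.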

Thanks to everyone for the effort.

Regards, Abhishek

Sorry, I missed the conditional in the for loop. The corrected code is:
awk 'BEGIN{FS=":"}{str=""; for(j=1;j<=NF;++j) if(j!=4) str=sprintf("%s:%s",str,$j)} (v[str]=="")||(v[str]>$4){v[str]=$4;a[str]=$0}END{for(i in a)print a[i]}' file1
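For reference, here is a commented sketch of the same single-pass idea. It assigns no fields, so it does not depend on `$0` being rebuilt. A few parts are assumptions on my part. The file names `file1` and `min_portable.awk` are illustrative. The "not seen yet" test is `(key in best)`: it is POSIX, but I don't know whether the older awk in question accepts it, and `best[key] == ""` as in the post is the fallback. The final `sort` is there only because `for (k in line)` visits groups in an unspecified order.

```shell
printf '%s\n' a:b:c:12:x:k s:m:d:8:z:m a:b:c:1:x:k \
              p:q:r:23:y:m a:b:c:3:x:k p:q:r:1:y:m > file1

# min_portable.awk (hypothetical name)
cat > min_portable.awk <<'EOF'
BEGIN { FS = ":" }
{
    # Group key: every column except the 4th, each preceded by ":".
    key = ""
    for (j = 1; j <= NF; ++j)
        if (j != 4)
            key = key ":" $j
    n = $4 + 0                              # column 4 as a number
    if (!(key in best) || n < best[key]) {
        best[key] = n                       # smallest value so far
        line[key] = $0                      # and its line
    }
}
END { for (k in line) print line[k] }
EOF

awk -f min_portable.awk file1 | sort
```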