This is first post. Kindly excuse me if I do not adhere to any rules and regulations of this forum.
I have a file containing some rows with three columns each per row(separeted by a space). There are certain rows for which first two columns have same value but the value in third column is different. In that case I wish to retain only that row which has the maximum value for third column. Further I want to wirte the rows not required to a file for further processing.
1123022201 9310777627 3976
1127021871 9312262893 3600 -- not required, sent to another file.
1127021871 9312262893 4016 -- to be retained.
1122000518 9350066745 4464
1127455152 7827493958 3600 -- not required, sent to another file.
1127455152 7827493958 5138 -- to be retained.
sort -k 1,2 file | awk '{print $1,$2}' | uniq -d > temp #Identifies the reduntant values and stores it in a temp file
while read i
do
row1=$(grep "$i" file | head -1) #Takes the first row from the redundant rows
row2=$(grep "$i" file | tail -1) #Takes the second row from the redundant rows
diff=$(grep "$i" file | awk '{gsub(/[a-zA-Z: ]+/," ")
m=split($3,a," ");
for (i=1;i<=m;i++)
if (NR==1) b=a; else print a - b }') #Finds the difference between the rows
if [ $diff -lt 0 ];
then
echo "$row2 moved to new file"
echo $row2 >> new_file #With the difference the least row is found and stored in another file
else
echo "$row1 moved to new file"
echo $row1 >> new_file
fi
done < temp
---------- Post updated at 03:08 AM ---------- Previous update was at 03:06 AM ----------
By comparing the above two comments, [pamu and myself], you can see how powerful awk is...
You did not specify what should happen
a) if the third column's values are identical
b) if values can go negative
c) to lines that do not have duplicates.
Thus any proposal might need further refinement. Try:
Thanks for replying so quickly and providing me the desired solution.
I tried the solution provided by RudiC and Sathyaonunix. Both are providing me the solutions. I am looking into the script by Pamu ( I am finding it difficult to understand as my expertise in Unix is not good, so kindly give me some time.)
As regarding query raised by RudiC
(i) if the third column entries are identical, then only one row is to be retained in the original file.
(ii) values cannot be negative.
(iii) the lines without duplicate values to be retained in the orginal file as such.
A[$1,$2]++ # Increments the array index for $1 and $2.
B[$1,$2]=B[$1,$2]?B[$1,$2]<$3?$3:B[$1,$2]:$3 # Here we compare is $3 is greater than previous $3 for $1 and $2 and retain max value and assign it to B[$1,$2]