squeeze duplicates from a table

Alex_P · May 23, 2010, 1:35pm

I have files with an arbitrary number of rows, each row having 2 columns separated by the delimiter "|".
The file contains the following records, for example.

I want to be able to format this table so that, for each value in the first column, it shows only the row with the largest number in the second column.

For example, using the example above, I want the command to return:

Is there any way to return the largest column number ($2) with its corresponding row ($1) using awk?

Appreciate help.

vgersh99 · May 23, 2010, 1:47pm

nawk '
  BEGIN { FS=OFS="|" }
  { if ($2 > a[$1]) a[$1]=$2 }
  END { for(i in a) print i, a[i] }' myFile
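
For anyone who wants to try the associative-array approach above on a small input, here is a minimal sketch. The sample rows are made up in the thread's key|value format, plain awk stands in for nawk, and the `in` test is an added assumption so that a key's first value is stored even when it is zero or negative. The final sort is only there because `for (k in a)` visits keys in no guaranteed order.

```shell
# Hypothetical sample data in the thread's key|value format.
sample='15|73
15|74
16|20
16|21
17|3'

# Keep the largest value seen for each key. The "in" test stores a key's
# first value unconditionally, so zero or negative values are not lost.
result=$(printf '%s\n' "$sample" | awk '
  BEGIN { FS = OFS = "|" }
  !($1 in a) || $2 + 0 > a[$1] { a[$1] = $2 + 0 }
  END { for (k in a) print k, a[k] }' | sort -t'|' -k1,1n)

printf '%s\n' "$result"
```

On this sample the sketch should print one row per key: 15|74, 16|21 and 17|3.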

Scott · May 23, 2010, 1:49pm

sort -t\| -nrsk2 file1 | awk -F\| '!A[$1]++'
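
The one-liner above works in two steps, which can be sketched separately as follows. The sample rows and variable names are hypothetical, and the rows are deliberately unsorted to show that the input order does not matter here.

```shell
# Hypothetical, unsorted sample data in the thread's key|value format.
sample='15|73
16|21
15|74
17|3
16|20'

# Step 1: order the rows by column 2, numerically, largest first.
sorted=$(printf '%s\n' "$sample" | sort -t'|' -k2,2nr)

# Step 2: A[$1]++ evaluates to 0 (false) only the first time a key is seen,
# so !A[$1]++ prints just the first row per key, which is now the largest.
result=$(printf '%s\n' "$sorted" | awk -F'|' '!A[$1]++')

printf '%s\n' "$result"
```

Note that the output comes out ordered by the second column, largest first, rather than by key.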

ygemici · May 24, 2010, 3:40pm

Or a bash script, though it is a bit long:

# cat justdoit
 
#!/bin/bash
 
oldIFS=$IFS
IFS="|"
i=0 ; ix=0 ; in=0
exec <$1
 
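while read val1 val2
  do
    # keys are stored from index 0, values from index 1
    array[i]=$val1
    let i=i+1
    tmpval[i]=$val2
     if [ $i -ne 1 ] ; then
        # input is assumed sorted by key: when the key changes,
        # keep the value from the previous row
        if [ ${array[ix]} -ne ${array[ix+1]} ] ; then
           myval2[in]=${tmpval[i-1]}
             ((++in))
        fi
             ((++ix))
     fi
  done
# value from the last row of the final group
myval2[in]=${tmpval[i]}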
 
IFS=$oldIFS
count=${#array[@]}
in=0  ; inx=0
 
myval[0]=${array[0]}
 
while [ $(( count -=1 )) -gt -1 ]
  do
   same=1
    for val in ${array[@]}
     do
       if [ $val -eq ${array[in]} ] ; then
           ((++same))
       fi
     done
 
  var=ok
 
   if [ $same -gt 2 ] ; then
    for newval in ${myval[@]}
      do
        if [ ${array[in]} -ne $newval ] ; then
          var=notok
        else
          var=ok
        fi
      done
 fi
 
 if [ "$var" == "notok" ] ; then
       myval[inx]=${array[in]}
 fi
 
((++in))
((++inx))
  done
 
inx=0
for val1 in ${myval[@]}
   do
     echo "$val1|${myval2[inx]}"
        ((++inx))
   done

# cat myfile
15|69
15|70
15|71
15|72
15|73
15|74
16|2
16|3
16|4
16|5
16|6
16|7
16|8
16|9
16|10
16|11
16|12
16|13
16|14
16|15
16|16
16|17
16|18
16|19
16|20
16|21
17|2
17|3
19|2
19|3

# ./justdoit myfile
15|74
16|21
17|3
19|3

Alex_P · May 25, 2010, 4:21am

Wow, thanks all for your answers. They were really helpful and allowed me to complete a program whose purpose was to count the number of words in a given text file and classify them according to the number of characters they had. The file with the repeating numbers was a side effect of a counter that I placed in the part of the program that calculated the number of words per number of characters. The only part missing was a way to take the highest value among the lines produced by my code and print it instead of all the lines that led up to it. My knowledge of awk and bash programming is too limited at the moment for me to have found the way by myself, so thanks again for helping me out.

Much appreciated!

cheers to all who helped!
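
The word-counting program itself is not shown in the thread, so the following is only a guess at the kind of tally described: count the words of each length and print the totals as length|count rows. The sample sentence and variable names are hypothetical. Because each total is printed once, in the END block, this version never produces the intermediate rows that needed squeezing.

```shell
# Hypothetical input text; a real run would read a file instead.
text='the quick brown fox jumps over the lazy dog'

# count[n] holds the number of words that are n characters long.
# Totals are printed once at the end, so each length appears on one row.
result=$(printf '%s\n' "$text" | awk '
  { for (i = 1; i <= NF; i++) count[length($i)]++ }
  END { for (len in count) print len "|" count[len] }' | sort -t'|' -k1,1n)

printf '%s\n' "$result"
```

For the sample sentence this should give 3|4, 4|2 and 5|3: four 3-letter words, two 4-letter words and three 5-letter words.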