Find the largest number suffix and count duplicate strings

fongthai · May 17, 2007, 5:54am

Hi,

My input file contains a list of usernames, and a name may have a number as a suffix (if it is duplicated).
Ex:
mary
john2
mike
john3
john5
mary10
alexa

So I want to check, for a specific username (without the number suffix), how many duplicates of that name there are and what the biggest number is.
Ex: with "john" -> output: 3 duplicates, biggest is 5
with "mary" -> output: 2 duplicates, biggest is 10

Can anyone help?

Thanks

anbu23 · May 17, 2007, 6:42am

awk ' { sub("[0-9]+$"," &"); arr[$1] = $2 > arr[$1] ? $2 : arr[$1]; cnt[$1]++; }
> END { for ( nm in cnt ) { if( cnt[nm] > 1 ) print nm , arr[nm] , "Count :" cnt[nm] } } ' filename

fongthai · May 17, 2007, 6:48am

I got this error:

awk: { sub("[0-9]+$"," &"); arr[$1] = $2 > arr[$1] ? $2 : arr[$1]; cnt[$1]++; } > END { for ( nm in cnt ) { if( cnt[nm] > 1 ) print nm , arr[nm] , "Count :" cnt[nm] } }
awk: ^ syntax error

anbu23 · May 17, 2007, 6:52am

Remove the > before END. It was the shell's continuation prompt, not part of the script:

awk ' { sub("[0-9]+$"," &"); arr[$1] = $2 > arr[$1] ? $2 : arr[$1]; cnt[$1]++; }
END { for ( nm in cnt ) { if( cnt[nm] > 1 ) print nm , arr[nm] , "Count :" cnt[nm] } } ' filename

Unbeliever · May 17, 2007, 6:53am

Assuming your data is in a file 'data.txt', this perl will do it

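use strict;
use warnings;

my $name = shift or die "usage: $0 name\n";
my $count = 0;
my $biggest = 0;

open(my $fh, '<', 'data.txt') or die "Cannot open data.txt: $!";

while (my $line = <$fh>)
{
  chomp $line;   # a bare chomp would act on $_, not on $line
  # Anchor the match so that "john" does not also match "johnny"
  if ($line =~ /^\Q$name\E(\d*)$/)
  {
    my $num = $1 eq '' ? 0 : $1;   # a name without a suffix counts as 0
    $count++;
    $biggest = $num > $biggest ? $num : $biggest;
  }
}
close($fh);

print "$name: $count duplicates, biggest is $biggest\n";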

fongthai · May 17, 2007, 9:32pm

Thanks anbu23 and Unbeliever,

Anbu,
I know the basics of awk, but your script is hard for me to understand. Could you please explain it to me, specifically this part:

{ sub("[0-9]+$"," &"); arr[$1] = $2 > arr[$1] ? $2 : arr[$1]; cnt[$1]++; }

ghostdog74 · May 17, 2007, 10:15pm

Output display needs fine tuning.

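awk '
{
   word=$1
   sub(/[a-zA-Z]+/,"",$1)      # strip the letters: $1 is now the number (or empty)
   num=$1
   sub(/[0-9]+/,"",word)       # strip the digits: word is now the bare name
   wordcount[word]++
   # +0 forces a numeric comparison, so that 10 ranks above 9
   if  ( user[word]+0 < num+0 ) {
      user[word]=num
   }
}
END {
    for (usr in user) {
       print usr "->\t" "largest number: " user[usr]
    }
    for (c in wordcount) {
       print c "->\t" wordcount[c],"duplicates"
    }
}' "file"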

output:

 # ./test.sh
mary->  largest number: 10
mike->  largest number:
john->  largest number: 5
alexa-> largest number:
mary->  2 duplicates
mike->  1 duplicates
john->  3 duplicates
alexa-> 1 duplicates

fongthai · May 17, 2007, 11:51pm

Thanks ghostdog74,

Dear all,

I want to display the data only for the name the user asks about.
Ex: User input: "mary"
Then script answer: dup=2, big=10

I saw that Unbeliever's Perl script works that way, but I need it to work in my bash shell.

Please show me how!

fongthai · May 18, 2007, 1:56am

OK, based on the Perl script above, I translated it into a bash script.
Here is the script:

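name=$1
count=0
last=0

while read -r line
do
        # Name part (digits removed) and number part (letters removed)
        u=`echo "$line" | sed 's/[0-9]//g'`
        n=`echo "$line" | sed 's/[a-zA-Z]//g'`
        if [ "$u" = "$name" ]; then
                count=`expr $count + 1`
                # A name without a suffix leaves $n empty, so test for that first
                if [ -n "$n" ] && [ "$n" -gt "$last" ]; then
                        last=$n
                fi
        fi
done < user.txt
echo "$count dups"
echo "$last is biggest"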
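
For comparison, the same lookup can be sketched in pure bash, without starting sed and expr for every line. This is only a sketch under a few assumptions: it needs bash 3.2 or later for the [[ =~ ]] operator, the function name count_dups is made up for illustration, the dup=/big= output format is borrowed from the request earlier in the thread, and the sample data is the list from the first post.

```shell
#!/bin/bash
# Hypothetical helper: count_dups NAME FILE
# Prints "dup=<count>, big=<largest suffix>" for exact matches of NAME.
count_dups() {
    local name=$1 file=$2 line n count=0 big=0
    while IFS= read -r line; do
        # NAME (taken literally because it is quoted), then optional digits
        if [[ $line =~ ^"$name"([0-9]*)$ ]]; then
            count=$((count + 1))
            n=${BASH_REMATCH[1]:-0}       # no suffix counts as 0
            # 10# forces base 10, so a suffix like 08 is not read as octal
            (( 10#$n > big )) && big=$((10#$n))
        fi
    done < "$file"
    echo "dup=$count, big=$big"
}

# Sample data from the first post
sample=$(mktemp)
printf '%s\n' mary john2 mike john3 john5 mary10 alexa > "$sample"

count_dups mary "$sample"    # prints: dup=2, big=10
count_dups john "$sample"    # prints: dup=3, big=5
rm -f "$sample"
```

Because the pattern is anchored at both ends, a query for "john" would not count a name such as "johnny".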

vino · May 18, 2007, 2:11am

Now that you have something running, here's another.

#! /bin/ksh
#

if [ $# -eq 0 ] ; then
    echo "Please enter a name"
    exit 1;
fi;

name=$1
repeat=0
large=0

# Anchor the pattern so that "john" does not also match "johnny"
repeat=$(grep "^${name}[0-9]*\$" t)
set -- $repeat
repeat=$#

for list in "$@"
do
    n=${list#$name}
    if [[ -n $n ]] ; then
        if [[ $n -gt $large ]] ; then
        large=$n
        fi;
    fi;
done

echo "$name : Repeat=$repeat"
echo "$name : Largest=$large"

anbu23 · May 18, 2007, 2:19am

awk -v name="mary" ' 
	$0 ~ name { 
		sub("[0-9]+$"," &"); 
		arr[$1] = $2 > arr[$1] ? $2 : arr[$1]; cnt[$1]++; 
	}
	END { 
		for ( nm in cnt ) { 
			if( cnt[nm] > 1 ) print nm , arr[nm] , "Count :" cnt[nm] 
		} 
	} ' filename
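
Note that $0 ~ name is a regular-expression match anywhere in the line, so name="mary" would also pick up a hypothetical "rosemary7", and the cnt[nm] > 1 test prints nothing for a name that occurs only once. A minimal sketch of an exact-match variant follows; the wrapper name name_stats and the dup=/big= output format are assumptions for illustration, not part of the script above.

```shell
#!/bin/bash
# Hypothetical wrapper: name_stats NAME FILE
name_stats() {
    awk -v name="$1" '
        {
            base = $0
            sub(/[0-9]+$/, "", base)     # name without its numeric suffix
            if (base != name) next       # exact string comparison, no regex
            cnt++
            # Text after the name, forced to a number (0 when there is none)
            num = substr($0, length(base) + 1) + 0
            if (num > big) big = num
        }
        END { printf "dup=%d, big=%d\n", cnt, big }
    ' "$2"
}

# Sample data from the first post, plus one near-miss name
sample=$(mktemp)
printf '%s\n' mary john2 mike john3 john5 mary10 alexa rosemary7 > "$sample"

name_stats mary "$sample"    # prints: dup=2, big=10
rm -f "$sample"
```

Adding 0 to the suffix makes the comparison numeric, so 10 ranks above 9 whichever awk implementation is in use.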

anbu23 · May 18, 2007, 2:31am

sub("[0-9]+$"," &") adds a space before the trailing number in the input line. For the input "mary10" the result is "mary 10", so the line now has two fields: the name and the number.

$2 > arr[$1] ? $2 : arr[$1] checks whether the second field is greater than the number already stored in the array for that name. If it is, that number is stored in the array; otherwise the stored value is kept.

For the line "john2", the second field is 2 and the test is 2 > arr["john"]. This is the first time "john" is used as an index, so arr["john"] is empty; 2 is greater, so 2 is assigned to arr["john"].
For the line "john3", the second field is 3; 3 > 2 is true, so 3 is assigned to arr["john"].

The number of occurrences of each name is kept in the cnt array.
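
The effect of the sub() call can be checked on its own. The sketch below runs only that step on a few names from the sample list and prints the resulting fields; the function name show_split is made up for the illustration.

```shell
#!/bin/bash
# Hypothetical helper: reads names on stdin and shows how sub() splits them
show_split() {
    awk '{
        sub("[0-9]+$", " &")    # "john2" -> "john 2"; "mary" is left unchanged
        printf "name=%s number=%s fields=%d\n", $1, $2, NF
    }'
}

printf '%s\n' john2 john3 mary mary10 | show_split
# prints:
# name=john number=2 fields=2
# name=john number=3 fields=2
# name=mary number= fields=1
# name=mary number=10 fields=2
```

A name without a suffix keeps a single field, so $2 is empty for it and the stored maximum for that name is left as it was.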

fongthai · May 18, 2007, 3:22am

Thanks, anbu23, for the explanation!

Shell_Life · May 18, 2007, 10:40am

Fongthai,
See if the following does what you want:

typeset -i mNbrDups
mNbrDups=`egrep -c "^${1}[0-9]*$" input_file`
if [ ${mNbrDups} -eq 0 ]; then
  mHighest="Not found"
else
  mHighest=`egrep "^${1}[0-9]*$" input_file | sed "s/${1}//" | sort -n | tail -1`
fi
echo "User input: "${1}
echo "dup = "${mNbrDups}" big = "${mHighest}