Need Help in script

moutaz1983 · December 5, 2007, 5:18am

Hi All;

How are you all? It's nice to find this forum, and I hope it will be helpful for me.

I have a file with data like the following:

Dec 5 11:37:00 stringA
Dec 5 11:37:01 stringC
Dec 5 11:37:02 stringA
Dec 5 11:37:03 stringA
Dec 5 11:37:04 stringA
Dec 5 11:37:05 stringF
Dec 5 11:37:06 stringE
Dec 5 11:37:06 stringB
Dec 5 11:37:08 stringB
Dec 5 11:37:09 stringD
Dec 5 11:37:10 stringA
Dec 5 11:37:11 stringC
Dec 5 11:37:11 stringB
Dec 5 11:37:11 stringA
Dec 5 11:37:15 stringA

I want to write a script in perl or bash that finds the top N most frequent strings. Note that the strings are not known in advance, so I can't match on them.

If anyone is willing to help me, I would be grateful.

Thanks in advance

moutaz1983 · December 5, 2007, 6:04am

If anyone is unsure what exactly is required, please ask. I really need this script; it is important.

I have also thought about a solution: put the column of strings in an array, then compare the first element with every element of the array, incrementing a counter on each match.
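The counting idea above does not need a pairwise comparison: sorting the strings puts identical ones next to each other, and `uniq -c` then counts each run. A minimal sketch, assuming the string is always the last whitespace-separated field on the line; the function name `top_n` is made up for illustration:

```shell
# top_n N [file...]: print the N most frequent last fields with their counts.
# Reads standard input when no file is given.
top_n() {
    n=$1
    shift
    awk 'NF { print $NF }' "$@" |     # keep only the last field, skip blank lines
        sort |                        # group identical strings together
        uniq -c |                     # prefix each distinct string with its count
        awk '{ print $2, $1 }' |      # reorder to "string count"
        sort -k2,2nr -k1,1 |          # highest count first, ties by name
        head -n "$n"
}
```

Called as `top_n 3 a` on a file `a` holding the sample data above, this should list stringA 7, stringB 3 and stringC 2.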

fpmurphy · December 5, 2007, 6:20am

Here is a start ...

BEGIN { FS=" " }

{
    freq[$4]++
}

END {
    i = 0
    for (word in freq) {
       if ( freq[word] > i ) {
            i = freq[word]
            maxword = word
       }
    }
    printf "%s  %d\n", maxword, freq[maxword]
}

matrixmadhan · December 5, 2007, 6:21am

Just a sample,

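#! /opt/third-party/bin/perl

use strict;
use warnings;

my ($file, $n) = @ARGV;
open(my $fh, "<", $file) or die "Cannot open $file: $!";

# Count how often the last field of each line occurs
my %fileHash;
while (<$fh>) {
  chomp;
  my @arr = split ' ';
  next unless @arr;    # skip blank lines
  $fileHash{$arr[-1]}++;
}

close($fh);

# Print the N most frequent strings, highest count first (ties by name)
my $cnt = 0;
foreach my $k ( sort { $fileHash{$b} <=> $fileHash{$a} || $a cmp $b } keys %fileHash ) {
  last if $cnt == $n;
  print "key:$k val:$fileHash{$k}\n";
  $cnt++;
}

exit 0;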

Run it as:

perl file.pl <filename> N

filename - name of the input file
N - number of top strings to print

matrixmadhan · December 5, 2007, 6:28am

fpmurphy:

Here is a start ...

BEGIN { FS=" " }

{
   freq[$4]++
}

END {
   i = 0
   for (word in freq) {
      if ( freq[word] > i ) {
         i = freq[word]
         maxword = word
      }
   }
   printf "%s  %d\n", maxword, freq[maxword]
}

But this gives only the single most frequent element, not the top N elements.

Here is the modification I made:

n=3
# count each last field, then sort by count (descending) and keep the first n
awk 'NF { arr[$NF]++ } END { for ( i in arr ) print i, arr[i] }' a | sort -k2,2nr | head -n "$n"
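The max-finding loop from fpmurphy's script can also be repeated N times inside awk itself, deleting each winner before the next pass. This is only a sketch: the wrapper name `top_n_awk` is hypothetical, and it assumes the string is the last field on each line.

```shell
# top_n_awk N [file...]: top N last fields by count, using awk only.
# Reads standard input when no file is given.
top_n_awk() {
    n=$1
    shift
    awk -v n="$n" '
    NF { freq[$NF]++ }                      # count the last field of each non-blank line
    END {
        for (k = 0; k < n; k++) {
            max = 0; maxword = ""; found = 0
            for (word in freq) {
                # take a higher count, or the alphabetically earlier word on a tie
                if (freq[word] > max || (freq[word] == max && word < maxword)) {
                    max = freq[word]; maxword = word; found = 1
                }
            }
            if (!found) break               # fewer than n distinct strings
            printf "%s %d\n", maxword, max
            delete freq[maxword]            # remove the winner before the next pass
        }
    }' "$@"
}
```

Each pass scans every remaining string, so this costs roughly N times the number of distinct strings. That is fine for small N, while sorting the counts once scales better for large N.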

moutaz1983 · December 5, 2007, 7:14am

Thank you all very much. I can't thank you enough.