Hi all,
How are you all? I'm glad to have found this forum and hope it can help me.
I have a file containing the following data:
Dec 5 11:37:00 stringA
Dec 5 11:37:01 stringC
Dec 5 11:37:02 stringA
Dec 5 11:37:03 stringA
Dec 5 11:37:04 stringA
Dec 5 11:37:05 stringF
Dec 5 11:37:06 stringE
Dec 5 11:37:06 stringB
Dec 5 11:37:08 stringB
Dec 5 11:37:09 stringD
Dec 5 11:37:10 stringA
Dec 5 11:37:11 stringC
Dec 5 11:37:11 stringB
Dec 5 11:37:11 stringA
Dec 5 11:37:15 stringA
I want to write a script in Perl or bash that finds the top N most frequently occurring strings. Note that the strings are not known in advance, so I can't match on them.
If anyone is willing to help, I would be grateful.
Thanks in advance.
If the requirement isn't clear, please ask any questions. I really need this script; it is important to me.
One solution I have thought of is to put the column of strings into an array, then compare the first element against every element of the array and increment a counter on each match.
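The counting idea above can be done without comparing every element against every other one: sorting the strings puts identical ones next to each other, and `uniq -c` then counts each run. Here is a minimal sketch, assuming the string is always the last whitespace-separated field of a line. The function name `top_n` and its two arguments (file name, then N) are made up for illustration.

```shell
# top_n FILE N: print the N most frequent values of the last field,
# one "string count" pair per line (hypothetical helper name).
top_n() {
    awk '{ print $NF }' "$1" |     # keep only the last field (the string)
        sort |                     # put identical strings next to each other
        uniq -c |                  # collapse each run into "count string"
        sort -k1,1nr -k2,2 |       # highest count first, ties ordered by name
        head -n "$2" |             # keep the first N lines
        awk '{ print $2, $1 }'     # print as "string count"
}
```

Called as `top_n file 3`, it prints the three most frequent strings with their counts. The tie-break on the name is only there so that strings with equal counts come out in a predictable order.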
Here is a start ...
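BEGIN { FS = " " }    # split fields on whitespace (this is also the awk default)
{
    freq[$4]++        # $4 is the string column; count each distinct value
}
END {
    # scan all the counts and remember the string with the largest one
    i = 0
    for (word in freq) {
        if ( freq[word] > i ) {
            i = freq[word]
            maxword = word
        }
    }
    printf "%s %d\n", maxword, freq[maxword]
}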
Just a sample,
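#! /opt/third-party/bin/perl
use strict;
use warnings;
# Usage: perl file.pl <filename> N
my ($file, $n) = @ARGV;
my %fileHash;
open(my $fh, "<", $file) or die "Cannot open $file: $!";
while (<$fh>) {
    chomp;
    my @arr = split ' ';        # split on any run of whitespace
    next unless @arr;           # skip blank lines
    $fileHash{$arr[-1]}++;      # count the last field of each line
}
close($fh);
# Hash keys come back in no particular order, so sort by count
# (highest first) and break ties by name before taking the first N.
my $cnt = 0;
foreach my $k ( sort { $fileHash{$b} <=> $fileHash{$a} || $a cmp $b } keys %fileHash ) {
    last if $cnt == $n;
    print "key:$k val:$fileHash{$k}\n";
    $cnt++;
}
exit 0;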
run it as,
perl file.pl <filename> N
filename - name of input file
N - number of top strings to print
But the awk script gives only the top 1 element, not the top N.
Here is the modification I made:
n=3
# for (i in arr) visits the keys in no particular order, so sort by count before taking the first n
awk '{ arr[$NF]++ } END { for (i in arr) print i, arr[i] }' a | sort -k2,2nr | head -n "$n"
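Another way to get from "top 1" to "top N" while staying inside awk is to repeat the find-the-maximum loop from the first awk script N times, deleting each winner so that the next pass finds the runner-up. This is only a sketch: the function name `top_n_awk` and its arguments (file name, then N) are made up, and it assumes the string is the last field of each line.

```shell
# top_n_awk FILE N: repeated maximum search inside awk, no external sort
# (hypothetical helper name).
top_n_awk() {
    awk -v n="$2" '
        { freq[$NF]++ }                    # count the last field of every line
        END {
            for (k = 0; k < n; k++) {
                max = 0; maxword = ""
                for (word in freq)
                    # higher count wins; on a tie, the name that sorts first
                    if (freq[word] > max || (freq[word] == max && word < maxword)) {
                        max = freq[word]; maxword = word
                    }
                if (max == 0) break        # fewer than n distinct strings
                print maxword, max
                delete freq[maxword]       # remove the winner before the next pass
            }
        }' "$1"
}
```

Each pass walks over all the remaining distinct strings, so this does more work than a sort when N is large, but for a small N it avoids the extra `sort` and `head` processes.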
Thank you all very much; I can't thank you enough.