Sort the file based on number of occurrences

anurupa777 · August 9, 2013, 3:42am

I have a file (input) that I want to sort by the number of times each pattern occurs in the first column. For example, grapes occurs 4 times in combination with other patterns, so I want it to come first, as shown in the output file. Then apple occurs three times, so it takes second position, and so on.

input

apple	banana
apple	apple
apple	grapes
grapes	banana
grapes	melon
grapes	orange
grapes	cherry
orange	apple
orange	banana
banana	cherry

output

grapes	banana
grapes	melon
grapes	orange
grapes	cherry
apple	banana
apple	apple
apple	grapes
orange	apple
orange	banana
banana	cherry

Klashxx · August 9, 2013, 4:40am

Brute force:

awk '{a[$1]++}END{for (i in a) {print i,a[i]}}' data|sort -k2 -nr|awk '{print $1}'|xargs -i ksh -c 'grep "^{}" data'
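The pipeline above counts the first-column keys, orders the keys by count, and then runs one grep per key. The same idea can be sketched as a single decorate-sort-undecorate pass, which avoids the per-key grep and the prefix matching that "^{}" implies (a key such as apple would also match a hypothetical pineapple-free key like applesauce). This is only a sketch: the function name sort_by_count and the temporary file are assumptions made for illustration, and keys with equal counts are kept in the order they first appear.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): sort_by_count FILE
# Prints the lines of FILE grouped by first column, most frequent key first.
sort_by_count() {
    tab=$(printf '\t')
    # Pass 1 (NR == FNR): count each key and remember where it first appears.
    # Pass 2: prefix every line with count, first-seen line number, own line number.
    # sort: count descending, then first appearance, then original line order.
    # cut: drop the three helper columns again.
    awk 'NR == FNR { if (!($1 in count)) first[$1] = FNR; count[$1]++; next }
         { print count[$1] "\t" first[$1] "\t" FNR "\t" $0 }' "$1" "$1" |
        sort -t "$tab" -k1,1nr -k2,2n -k3,3n |
        cut -f4-
}

# Sample data from the question, written to a temporary file.
data=$(mktemp)
printf '%s\t%s\n' apple banana apple apple apple grapes \
    grapes banana grapes melon grapes orange grapes cherry \
    orange apple orange banana banana cherry > "$data"

sort_by_count "$data"
rm -f "$data"
```

Because the file is read twice, this variant needs a regular file rather than a pipe as input.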

Jotne · August 9, 2013, 5:26am

@Klashxx
Something goes wrong here

cat data
apple   banana
apple   apple
apple   grapes
grapes  banana
grapes  melon
grapes  orange
grapes  cherry
orange  apple
orange  banana
banana  cherry

awk '{a[$1]++}END{for (i in a) {print i,a[i]}}' data|sort -k2 -nr|awk '{print $1}'|xargs -i ksh -c 'grep "^{}" data'
xargs: ksh: No such file or directory

Klashxx · August 9, 2013, 5:49am

@Jotne try to use an absolute path.
Ex:

xargs -i /bin/ksh -c 'grep "^{}" data'

or:

xargs -i /bin/sh -c 'grep "^{}" data'

Tested on Red Hat 6.3

Jotne · August 9, 2013, 6:30am

I did not have ksh on my Ubuntu system; bash and sh work fine.
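Since the shell wrapper is only there to run one lookup per key, the brute-force approach can probably be written without xargs or any particular shell at all. The sketch below uses a plain while-read loop and an exact awk comparison on the first field instead of a grep prefix match. The function names keys_by_count and group_by_keys are hypothetical, and the order of keys that share the same count is left to sort.

```shell
#!/bin/sh
# Hypothetical helper: print the first-column keys of FILE, most frequent first.
keys_by_count() {
    awk '{ count[$1]++ } END { for (k in count) print count[k], k }' "$1" |
        sort -k1,1nr |
        awk '{ print $2 }'
}

# Hypothetical helper: for each key in turn, print the lines whose first
# field equals that key exactly (no prefix matching).
group_by_keys() {
    keys_by_count "$1" | while read -r key; do
        awk -v k="$key" '$1 == k' "$1"
    done
}

# Sample data from the question, written to a temporary file.
data=$(mktemp)
printf '%s\t%s\n' apple banana apple apple apple grapes \
    grapes banana grapes melon grapes orange grapes cherry \
    orange apple orange banana banana cherry > "$data"

group_by_keys "$data"
rm -f "$data"
```

Like the original pipeline, this rereads the file once per distinct key, so it is best suited to small inputs.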

Skrynesaver · August 9, 2013, 6:47am

#!/usr/bin/perl
use strict;
use warnings;

my %fruit_salad;
while(<DATA>){
    chomp;
    my ($main,$side)=split/\s+/,$_;
    push @{$fruit_salad{$main}},$side;
}
for my $main (sort {scalar @{$fruit_salad{$b}} <=> scalar @{$fruit_salad{$a}} } keys %fruit_salad ){
    for my $side (@{$fruit_salad{$main}}){
         print "$main\t$side\n";
    }
}



__DATA__
apple   banana
apple   apple
apple   grapes
grapes  banana
grapes  melon
grapes  orange
grapes  cherry
orange  apple
orange  banana
banana  cherry

seems to work OK

ahamed101 · August 9, 2013, 11:22am

An awk solution

awk '{if(!x[$1])x[$1]=++y;z=x[$1];a[z]++;b[z]=b[z]","$2;m[z]=$1}
END{n=asort(a,d);for(i=n;i>0;i--){for(j in a){if(a[j]==d[i]){split(b[j],c,",");f=2;for(k=a[j];k>0;k--)print m[j],c[f++];a[j]=-1}}}}' input_file

--ahamed

Jotne · August 9, 2013, 1:06pm

Wow, this was... hmmmm.
I tried to rewrite it a bit to make it easier to read.

awk '
	!x[$1] {
		x[$1]=++y;}
	{
	z=x[$1]
	a[z]++
	b[z]=b[z]","$2
	m[z]=$1}
END {
	n=asort(a,d)
	for(i=n;i>0;i--){
		for(j in a){
			if(a[j]==d[i]){
				split(b[j],c,",")
				f=2
				for(k=a[j];k>0;k--)
					print m[j],c[f++]
				a[j]=-1
				}
			}
		}
	}' file
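One caveat: asort() is a GNU awk extension, so the scripts above will not run under mawk or BSD awk. A portable variant might look like the sketch below, which swaps asort() for a simple selection loop in the END block: it keeps each key's lines in input order and repeatedly picks the unprinted key with the highest count. The function name sort_groups_portable and the array names are assumptions made for this example, and ties go to the key that appears first in the input.

```shell
#!/bin/sh
# Hypothetical helper: sort_groups_portable FILE
# Groups lines by first column, most frequent key first, without asort().
sort_groups_portable() {
    awk '
    {
        if (!($1 in count)) order[++nkeys] = $1   # first-seen order of keys
        count[$1]++
        lines[$1] = lines[$1] $0 "\n"             # lines per key, in input order
    }
    END {
        # Selection loop: on each round, print the unprinted key with the
        # highest count; the strict ">" keeps first-seen order for ties.
        for (n = 0; n < nkeys; n++) {
            best = 0
            for (i = 1; i <= nkeys; i++)
                if (!(i in printed) && (best == 0 || count[order[i]] > count[order[best]]))
                    best = i
            printf "%s", lines[order[best]]
            printed[best] = 1
        }
    }' "$1"
}

# Sample data from the question, written to a temporary file.
data=$(mktemp)
printf '%s\t%s\n' apple banana apple apple apple grapes \
    grapes banana grapes melon grapes orange grapes cherry \
    orange apple orange banana banana cherry > "$data"

sort_groups_portable "$data"
rm -f "$data"
```

The selection loop is quadratic in the number of distinct keys, which should be fine for small files; for large inputs an external sort is likely the better choice.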