How to replicate data using uniq or awk

ahjiefreak · August 17, 2008, 8:34pm

Hi,

I have a scenario where there are two classes: apple and orange.

1,2,3,4,5,6,apple
1,1,0,4,2,3,apple
1,3,3,3,3,4,apple
1,1,1,1,1,1,orange
1,2,3,1,1,1,orange

Basically, for apple I have 3 entries in the file, and for orange I have 2. I'm trying to edit the file and find a way to replicate the orange data so that it also has 3 entries.

Desired output:
1,2,3,4,5,6,apple
1,1,0,4,2,3,apple
1,3,3,3,3,4,apple
1,1,1,1,1,1,orange
1,2,3,1,1,1,orange
1,1,1,1,1,1,orange

This would balance the number of lines containing apple and orange.
I have tried using uniq but can't figure out how to go further from there.

Please advise. Thanks.
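
For reference, uniq on its own can only count lines, not replicate them, but the counts show how unbalanced the classes are. A minimal sketch, assuming the class label is always the last comma-separated field; count_classes and data.csv are hypothetical names, not part of the original post:

```shell
# Print how many lines belong to each class (the last comma-separated field).
# count_classes is a hypothetical helper name.
count_classes() {
    # cut out the class label, group identical labels together, then count them
    awk -F, '{ print $NF }' "$1" | sort | uniq -c
}

# Usage (data.csv is a placeholder file name):
#   count_classes data.csv
```

On the sample above this should report 3 for apple and 2 for orange. The exact column spacing of uniq -c varies between systems.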

Annihilannic · August 17, 2008, 8:48pm

How do you decide which "orange" line to duplicate? Is it always the first one?

Will it always be 3 and 2, or do those quantities vary? Is there other data in the file as well, or is that everything in the file?

vidyadhar85 · August 17, 2008, 10:24pm

Do you mean you want to replicate line 4 as line 6?
If so, use:
head -4 filename | tail -1 >> filename
This appends a copy of line 4 to the end of your file, where it becomes line 6.
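
A variant of the same idea, as a minimal sketch: read line N into a variable with sed first and append it afterwards, so the file is never read and written by the same pipeline. append_line is a hypothetical helper name, and the sketch treats an empty or missing line N as an error:

```shell
# Append a copy of line N of a file to the end of that same file.
# append_line is a hypothetical helper name.
append_line() {
    n=$1
    file=$2
    # sed -n "Np" prints only line N; capture it before touching the file
    line=$(sed -n "${n}p" "$file") || return 1
    # refuse to append when line N is missing or empty
    [ -n "$line" ] || return 1
    printf '%s\n' "$line" >> "$file"
}

# Usage (filename is a placeholder):
#   append_line 4 filename
```

This only covers the fixed case of copying one known line. It does not work out how many lines are needed to balance the classes.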

ahjiefreak · August 18, 2008, 12:07am

Hi,

How do you decide which "orange" line to duplicate? Is it always the first one?
> It always starts from the first one.
E.g. if the data is:

1,2,3,4,5,6,apple
1,1,0,4,2,3,apple
1,3,3,3,3,4,apple
1,1,0,4,2,3,apple
1,3,3,3,3,4,apple
1,1,1,1,1,1,orange
1,2,3,1,1,1,orange

So the orange data set is repeated, starting from the first occurrence of orange, until orange has the same number of lines as apple:
1,2,3,4,5,6,apple
1,1,0,4,2,3,apple
1,3,3,3,3,4,apple
1,1,0,4,2,3,apple
1,3,3,3,3,4,apple
1,1,1,1,1,1,orange
1,2,3,1,1,1,orange
1,1,1,1,1,1,orange
1,2,3,1,1,1,orange
1,1,1,1,1,1,orange

Will it always be 3 and 2, or do those quantities vary? Is there other data in the file as well, or is that everything in the file?
> The number can be 0, 1, ... 100, any integer, but no negative numbers.
The real data in the file contains more than 6 comma-separated numbers per line; there can be up to hundreds. But I think it can be handled the same way as this small example?

Thanks.

Annihilannic · August 18, 2008, 8:24pm

Try this:

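awk -F, '
        $NF == "apple" { applecount++ }
        $NF == "orange" { orangedata[++orangecount] = $0 }
        1 # print every input line unchanged
        END {
                # nothing to repeat if the file has no orange lines
                if (orangecount == 0) exit
                # pad orange up to the apple count, cycling from the first orange line
                for (i = orangecount; i < applecount; i++) {
                        print orangedata[(i % orangecount) + 1]
                }
        }
' filename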
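
The same approach can be generalized so that the class names are not hard-coded. The following is a rough sketch, not a tested production script: it assumes the class label is always the last comma-separated field, pads every smaller class up to the size of the largest one, and appends the extra copies at the end of the output. balance_classes is a hypothetical function name.

```shell
# Pad every class up to the size of the largest class by repeating its
# lines in order, starting again from the first one.
# balance_classes is a hypothetical helper name.
balance_classes() {
    awk -F, '
        !NF { print; next }                 # pass blank lines through uncounted
        {
            class = $NF
            # remember the order in which classes first appear
            if (!(class in count)) order[++nclasses] = class
            line[class, ++count[class]] = $0
            if (count[class] > max) max = count[class]
            print                           # original line goes out unchanged
        }
        END {
            for (c = 1; c <= nclasses; c++) {
                class = order[c]
                n = count[class]
                # cycle through the stored lines until the class has max lines
                for (i = n; i < max; i++)
                    print line[class, (i % n) + 1]
            }
        }
    ' "$1"
}

# Usage (filename is a placeholder):
#   balance_classes filename > balanced
```

For the five-apple, two-orange example this should append three orange lines (first, second, first), which matches the desired output posted above.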

summer_cherry · August 19, 2008, 1:25am

Try the Perl script below:

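use strict;
use warnings;

# Pad a list of lines up to $num entries by cycling through it from the start.
sub RepeatArray {
	my ($ref, $num) = @_;
	my @arr = @$ref;
	my $len = @arr;
	for (my $i = $len; $i < $num; $i++) {
		$arr[$i] = $arr[$i % $len];
	}
	return \@arr;
}

my $file = shift or die "usage: $0 file\n";
open(my $fh, '<', $file) or die "cannot open $file: $!\n";
my (%lines, @order);
while (my $line = <$fh>) {
	chomp $line;
	next if $line eq '';
	# the class label is the last comma-separated field
	my $class = (split /,/, $line)[-1];
	# remember the order in which classes first appear
	push @order, $class unless exists $lines{$class};
	push @{ $lines{$class} }, $line;
}
close($fh);

# The largest class decides how many lines every class should have.
my $max = 0;
for my $class (@order) {
	my $n = @{ $lines{$class} };
	$max = $n if $n > $max;
}

# Print each class in order of first appearance, padded to $max lines.
for my $class (@order) {
	print "$_\n" for @{ RepeatArray($lines{$class}, $max) };
}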