Help with uniq or awk??

shinoman28 · August 20, 2009, 3:38am

Hi, my dilemma is this:
For example, I have a file, fruit.txt, which contains:

Apple 6
Apple_new 7
old_orange 9
orange 10

Is there any way to get the following output using a shell script?

Apple 13
Orange 19

If the names were exact duplicates it would be easy, but I'm lost as to what I can do in this scenario.
Any help is greatly appreciated.

Franklin52 · August 20, 2009, 3:45am

Is this a homework question?

What is your "real world problem"?

shinoman28 · August 20, 2009, 4:06am

It's related to work. I have a system-generated file that contains entries similar to the ones above, and I need to summarize it into something like the output I described.

My algorithm so far: grab the first entry and store it in a file. Grab the second entry and search the file to see whether it already exists. If it does, take the number and add it to that entry's total. If it doesn't, store the entry in the file. And so on for every line.
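The lookup-and-accumulate loop described above maps directly onto an awk associative array, which does the "search the file" step in memory instead of in a temporary file. A minimal sketch, assuming whitespace-separated `name count` lines and exact name matches only (so it does not yet handle the inconsistent names); `sum_by_name` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: total the counts per exact name.
# sum[name] is created the first time a name is seen and incremented afterwards,
# which replaces the "search the file, then add or store" steps.
sum_by_name() {
    awk 'NF >= 2 {
        if (!($1 in sum)) order[++n] = $1   # remember first-seen order
        sum[$1] += $2                        # create or increment the total
    }
    END { for (i = 1; i <= n; i++) print order[i], sum[order[i]] }' "$@"
}
```

Usage: `sum_by_name fruit.txt`, or pipe the data into it. Groups are printed in the order their names first appear.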

The issue is that the entries are not identical, and the naming pattern is not consistent.
Any ideas?
Cheers.

Franklin52 · August 20, 2009, 5:06am

It's going to be difficult to find a reliable solution when your data is not consistent...

Regards

shinoman28 · August 20, 2009, 9:06pm

Hmm...
OK, I went through the generated file again, and there seems to be a pattern: the first 60% of each name is the same, e.g.

Apple_new_101 15
Apple_newMandarin 6
OrangeMango_new 6
OrangeMango_old 5

Algorithm:
1. Grab the first 60% of the name and store it in temp_name.
2. Create a file to hold the filtered list, e.g. filtered.txt.
3. Search filtered.txt to see whether temp_name already exists in it.
4. If it does, grab the number and add it to the existing total for that name.
5. If it doesn't, store the name and number, and proceed to the next entry.
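The five steps above can be sketched in awk, keeping the running totals in arrays rather than in filtered.txt. This is a rough sketch under assumptions the post does not state: "60%" is taken as 60% of each name's own length, rounded down, and because the names differ in length (Apple_new_101 gives the prefix Apple_n, while Apple_newMandarin gives Apple_newM), two prefixes are treated as the same group when one starts with the other. `group_by_prefix` and `pct` are hypothetical names, and the first prefix seen becomes the group label.

```shell
#!/bin/sh
# Sketch of the 60%-prefix grouping (assumption: 60% of the length of each
# name, rounded down). Totals live in awk arrays instead of filtered.txt.
group_by_prefix() {
    awk -v pct=60 '
    NF < 2 { next }                          # skip blank or malformed lines
    {
        len = int(length($1) * pct / 100)    # prefix length for this name
        if (len < 1) len = 1
        p = substr($1, 1, len)               # step 1: temp_name
        hit = 0
        for (i = 1; i <= n; i++) {           # step 3: already in the list?
            # same group when one prefix starts with the other
            if (index(key[i], p) == 1 || index(p, key[i]) == 1) { hit = i; break }
        }
        if (hit) total[hit] += $NF           # step 4: add to the existing total
        else { key[++n] = p; total[n] = $NF }    # step 5: store a new entry
    }
    END { for (i = 1; i <= n; i++) print key[i], total[i] }
    ' "$@"
}
```

On the four sample lines above this prints `Apple_n 21` and `OrangeMan 11`. The group labels depend on input order, and unrelated names that share a short prefix can be merged, so the resulting groups are worth checking by eye.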

Does this sound feasible? It's just that I'm not fluent with shell scripts.
Thanks

---------- Post updated at 10:36 AM ---------- Previous update was at 09:15 AM ----------

Actually, that's fine; leave it for now.
I think the easier approach is to get the data properly grouped first and work from there.
Thanks for the help.

danmero · August 20, 2009, 9:57pm

Something like this?

awk -F'[_| ]' 'NF{a[$1]+=$NF;next}END{for(i in a)print i,a[i]}' file

summer_cherry · August 21, 2009, 1:31am

I assume your post is just sample data, so it really depends on your criteria for grouping entries together. If the only difference is an 'old' or 'new' marker, then the Perl script below may help:

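use strict;
use warnings;

my %hash;
while (<DATA>) {
	chomp;
	my @tmp = split;
	next unless @tmp >= 2;          # skip blank or malformed lines
	$tmp[0] =~ s/_?(old|new)_?//;   # drop an "old"/"new" marker and the underscores around it
	$hash{$tmp[0]} += $tmp[1];      # add the count to that name's total
}
foreach my $key (sort keys %hash) {
	print $key, " ", $hash{$key}, "\n";
}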
__DATA__
Apple 6
Apple_new 7
old_orange 9
orange 10
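Because the original question asked for a shell script, the same old/new rule can also be written in awk. This is a rough translation under the same assumption as the Perl version (the only noise in a name is an "old" or "new" marker); `sum_without_markers` is a hypothetical helper name, and groups are printed in the order they first appear.

```shell
#!/bin/sh
# Sketch: remove the first "old"/"new" marker (plus the underscores next to it)
# from each name, then total the counts per cleaned-up name.
sum_without_markers() {
    awk 'NF >= 2 {
        name = $1
        sub(/_?(old|new)_?/, "", name)       # Apple_new -> Apple, old_orange -> orange
        if (!(name in sum)) order[++n] = name
        sum[name] += $2
    }
    END { for (i = 1; i <= n; i++) print order[i], sum[order[i]] }' "$@"
}
```

On the fruit.txt sample this prints `Apple 13` and `orange 19`. Like the Perl regex, it also matches "old" or "new" inside longer words (e.g. "Golden" would become "Gen"), so anchoring the pattern to underscores may be safer on real data.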