Remove duplicate lines based on field and sort

cokedude · March 17, 2012, 10:21am

I have a CSV file from which I would like to remove duplicate lines based on field 1, and then sort the result. I don't care about any of the other fields, but I still want to keep their data intact. I was thinking I could do something like this, but I have no idea how to print the full line with it. Please show any method you can think of, but awk would be my preferred tool if possible.

cut -f 1 -d , sorting.csv | sort | uniq

55,I,like,cookies,2,8,9
44,I,like,cookies,2,8,9
88,I,like,cookies,5,7,8
88,I,like,cookies,2,8,9
99,I,like,cookies,5,7,8
99,I,like,cookies,2,8,9
77,I,like,cookies,5,7,8
77,I,like,cookies,2,8,9
66,I,like,cookies,5,7,8
66,I,like,cookies,2,8,9
55,I,like,cookies,5,7,8
44,I,like,cookies,5,7,8

bartus11 · March 17, 2012, 10:26am

Post sample input and desired output.

balajesuri · March 17, 2012, 10:57am

sort -t, -nuk1 sorting.csv

adirajup · March 17, 2012, 3:36pm

Hi Balaji,

Please explain what sort -t and -nuk1 do.

Regards,
adirajup

ahamed101 · March 17, 2012, 3:48pm

man sort

So basically, it sorts numerically (-n) on a key starting at the first field (-k1), with fields separated by a comma (-t,), and outputs only unique results (-u).
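To see those flags in action, here is a small sketch run against the sample rows from the first post. The scratch directory is my own addition so the example does not touch real files. It also uses -k1,1n rather than -k1, on the assumption that the key should stop at the end of field 1; a bare -k1 runs from field 1 to the end of the line. Which of several duplicate rows survives -u is not something I would rely on across sort implementations.

```shell
#!/bin/sh
# Scratch directory for the example (an assumption for illustration;
# the thread works on sorting.csv in the current directory).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sorting.csv" <<'EOF'
55,I,like,cookies,2,8,9
44,I,like,cookies,2,8,9
88,I,like,cookies,5,7,8
88,I,like,cookies,2,8,9
99,I,like,cookies,5,7,8
99,I,like,cookies,2,8,9
77,I,like,cookies,5,7,8
77,I,like,cookies,2,8,9
66,I,like,cookies,5,7,8
66,I,like,cookies,2,8,9
55,I,like,cookies,5,7,8
44,I,like,cookies,5,7,8
EOF

# -t,     fields are separated by commas
# -k1,1n  the sort key is field 1 only, compared numerically
# -u      print one line per distinct key
dedup_sorted=$(sort -t, -k1,1n -u "$tmpdir/sorting.csv")
printf '%s\n' "$dedup_sorted"

rm -rf "$tmpdir"
```

The result should have one row for each of the six distinct first-field values, in ascending numeric order.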

--ahamed

pravin27 · March 17, 2012, 11:23pm

using Perl

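#!/usr/bin/perl

use strict;
use warnings;

my %seen;    # first-field values that have already been printed
while (<DATA>) {
    chomp;
    my @flds = split /,/;
    # print the line only the first time its first field appears
    print $_, "\n" if !$seen{ $flds[0] }++;
}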

__DATA__
55,I,like,cookies,2,8,9
44,I,like,cookies,2,8,9
88,I,like,cookies,5,7,8
88,I,like,cookies,2,8,9
99,I,like,cookies,5,7,8
99,I,like,cookies,2,8,9
77,I,like,cookies,5,7,8
77,I,like,cookies,2,8,9
66,I,like,cookies,5,7,8
66,I,like,cookies,2,8,9
55,I,like,cookies,5,7,8
44,I,like,cookies,5,7,8

cokedude · March 18, 2012, 2:27am

Sorry about that. I don't ask these types of questions very often. Here is my desired output:

44,I,like,cookies,2,8,9
55,I,like,cookies,2,8,9
66,I,like,cookies,5,7,8
77,I,like,cookies,5,7,8
88,I,like,cookies,5,7,8
99,I,like,cookies,5,7,8

Works perfectly.

pravin27:

using Perl

Does Perl have a sorting function? I have never used Perl before. Is there a way to use this on a file? I have several huge files I need to do this on. I was just trying to keep my example simple when I showed my data above.

Does anyone know a way to do this with awk?

Scrutinizer · March 18, 2012, 3:21am

You can do it in awk, but then you would still need to sort:

awk -F, '!A[$1]++' infile | sort -nt,
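Since the question mentions several files, here is a sketch of how that pipeline could be wrapped in a shell function. The function name dedup_by_first_field and the file names a.csv and b.csv are made up for illustration. The awk part keeps the first line it sees for each value of field 1; limiting the sort key to field 1 with -k1,1n is my adjustment, not part of the command above.

```shell
#!/bin/sh
# dedup_by_first_field FILE...
# Hypothetical helper: keep the first line seen for each value of field 1
# across all files given, then sort the survivors numerically on field 1.
dedup_by_first_field() {
    # seen[$1]++ is 0 (false) the first time a key appears, so !seen[$1]++
    # is true only for the first line with that key; awk prints such lines.
    awk -F, '!seen[$1]++' "$@" | sort -t, -k1,1n
}

# Small made-up inputs in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '55,I,like,cookies,2,8,9' \
              '44,I,like,cookies,2,8,9' \
              '55,I,like,cookies,5,7,8' > "$tmpdir/a.csv"
printf '%s\n' '88,I,like,cookies,5,7,8' \
              '44,I,like,cookies,5,7,8' > "$tmpdir/b.csv"

result=$(dedup_by_first_field "$tmpdir/a.csv" "$tmpdir/b.csv")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Passing several files at once removes duplicates across all of them; to treat each file separately, call the function once per file. Note that awk holds one array entry per distinct first-field value in memory, which may matter for very large inputs with many distinct keys.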

adirajup · March 18, 2012, 3:43pm

So nice of you, thanks Ahamed.