Removing duplicate records in a file based on a single column

Hi,

I want to remove duplicate records based on column 1, including the first occurrence. For example:

Input file (filer.txt):
-------------

1,3000,5000
1,4000,6000
2,4000,600
2,5000,700
3,60000,4000
4,7000,7777
5,999,8888

Expected output:
----------------

3,60000,4000
4,7000,7777
5,999,8888

Is it possible to achieve this with an awk command?

I tried the awk command below. It works, but I don't want to give the file name (filer.txt) twice in the command; I am only allowed to give the file name once.

awk -F"," 'NR == FNR { cnt[$1]++ } NR != FNR { if (cnt[$1] == 1) print $0 }' filer.txt filer.txt

Please suggest how I can achieve this.

Thanks in advance

Use the unique option of the sort command.
Sort the file with the unique option, then diff the original against the sorted output, and use that diff to remove the duplicated records from the sorted output.
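
A rough sketch of that idea (a sketch only; it assumes GNU sort/diff/grep, uses the made-up intermediate files first.txt and dupkeys.txt, and sorts the output by column 1):

sort -t, -k1,1 -u filer.txt > first.txt       # keep one record per column-1 value
sort -t, -k1,1 filer.txt | diff - first.txt |
  sed -n 's/^< \([^,]*\),.*/^\1,/p' | sort -u > dupkeys.txt   # keys seen more than once, turned into ^key, patterns
grep -v -f dupkeys.txt first.txt              # drop every record whose key repeated

Here diff only ever reports deleted lines, because the sort -u output is a subset of the fully sorted file, and those deleted lines are exactly the extra records of the duplicated keys.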

Thanks for the reply, jgt, but I am only allowed to use an awk or sed command. Can someone suggest how exactly I can code it in a single command line?

Who makes up these rules, and why????

Got a solution using a single-line command. Thanks, problem resolved.
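
For anyone finding this later, one common single-pass awk variant (not necessarily the command that was used here) buffers the whole file and decides what to print at the end:

awk -F',' '{ cnt[$1]++; key[NR] = $1; line[NR] = $0 }
           END { for (i = 1; i <= NR; i++) if (cnt[key[i]] == 1) print line[i] }' filer.txt

It reads the file only once, at the cost of holding every line in memory, which is fine for small files.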

Hi,

One solution using 'sed' (note that it relies on the records already being grouped by the first field):

$ cat infile
1,3000,5000
1,4000,6000
2,4000,600
2,5000,700
3,60000,4000
4,7000,7777
5,999,8888
$ sed -ne '$! { /\n/! N; } ; :a ; $! { /^\([0-9]*\),.*\n\1[^\n]\+$/ { N; ba; }; } ; s/^\([0-9]*\),.*\n\1// ; tb ; P ; D ; :b ; D' infile
3,60000,4000
4,7000,7777
5,999,8888
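
In case the one-liner is hard to read, here is the same logic as a commented script file (a sketch only; it relies on GNU sed, assumes the records are already grouped by the first field, and nodups.sed is just a made-up name):

$ cat nodups.sed
# if this is not the last line, make sure the pattern space holds two lines
$! { /\n/! N; }
:a
# keep appending lines while the newest one starts with the same first field as the first line in the group
$! { /^\([0-9]*\),.*\n\1[^\n]\+$/ { N; ba; }; }
# if the first field repeated, delete the group up to the key of its last member
s/^\([0-9]*\),.*\n\1//
# on success, branch to :b, where D discards what is left of that member and restarts
tb
# no repeat: print the single record
P
# then drop it and restart on whatever follows it in the pattern space
D
:b
D
$ sed -nf nodups.sed infile

This should give the same output as the one-liner above.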

Regards,
Birei