Remove duplicate records

I want to remove records based on duplicates. If two or more records share the same combination of key fields, all of them should be removed; such records should not appear in the output even once.

file abc.txt


The key fields are the 1st, 2nd and 4th.

Should return only


Can you give me the command for this?

Please clarify on your 3rd record/line.

Corrected the fields

One approach in awk...

awk -F";" '{ _s=$1" "$2" "$4; A[_s]++; B[_s]=$0 } END { for (i in A) { if (A[i]==1) print B[i] } }' file

This approach may yield false matches if the fields in question can contain a space and are not required to be of equal length. From the sample data, we see that the 4th field varies in length. Perhaps a space awaits as well.

False match example:

1;2 ;3;4 --> _s="1 2  4"
1;2;3; 4 --> _s="1 2  4"

Just in case, best to use the same delimiter as was used to split the input:
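A minimal sketch of that change, assuming ";" is the only field separator. The sample lines, the temporary file and the function name unique_by_key are hypothetical and are not the poster's data:

```shell
# Hypothetical sample: lines 1 and 2 collide under a space-joined key,
# lines 3 and 4 are genuine duplicates on fields 1, 2 and 4.
sample=$(mktemp)
printf '%s\n' '1;2 ;3;4' '1;2;3; 4' 'a;b;c;d' 'a;b;x;d' > "$sample"

# Build the key with ";" (the input delimiter), which cannot occur inside a field.
unique_by_key() {
    awk -F';' '
        { k = $1 ";" $2 ";" $4; count[k]++; line[k] = $0 }
        END { for (k in count) if (count[k] == 1) print line[k] }
    ' "$1"
}

unique_by_key "$sample"
```

With this key, the first two sample lines are kept as distinct records and only the a;b;...;d pair is dropped. The output order of a for (k in count) loop is unspecified in awk.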


Alternatively, you can set SUBSEP (which determines what AWK will use as an internal separator for "multidimensional" array subscripts) to ";" which allows you to safely use A[$1,$2,$4].
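The SUBSEP variant could look roughly like this. It is a sketch only: the sample lines, the temporary file and the function name unique_by_subsep are hypothetical.

```shell
# Hypothetical sample: the first two lines differ only in where the space sits.
subsep_sample=$(mktemp)
printf '%s\n' '1;2 ;3;4' '1;2;3; 4' 'a;b;c;d' 'a;b;x;d' > "$subsep_sample"

unique_by_subsep() {
    awk -F';' '
        BEGIN { SUBSEP = ";" }                       # join subscripts with ";"
        { count[$1,$2,$4]++; line[$1,$2,$4] = $0 }   # key is $1 SUBSEP $2 SUBSEP $4
        END { for (k in count) if (count[k] == 1) print line[k] }
    ' "$1"
}

unique_by_subsep "$subsep_sample"
```

Inside the END loop, k holds the already joined subscript string, so line[k] finds the same element that line[$1,$2,$4] stored.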


cut -d";" -f1,2,4 <scottn.txt | sort | uniq -u | gawk 'BEGIN{FS=OFS=";"} {print $1,$2,".*",$3}' >match.txt
grep -f match.txt <scottn.txt

Interesting approach. Here's a different take on it:

sort -t\; -k1,1 -k2,2 -k4,4 scottn.txt | sed 's/[^;]*;/[^;]*;/3' | uniq -u > match.txt
grep -f match.txt scottn.txt


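These grep -f approaches fail for fields containing $, ( or ) characters, presumably because grep treats each line of match.txt as a regular expression rather than as literal text.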
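One way to sidestep regular expressions entirely is to compare the key fields as plain strings in two passes over the file. The following is only a sketch: the sample lines, the temporary file and the function name unique_in_order are hypothetical.

```shell
# Hypothetical sample containing $ ( ) in the key fields.
meta_sample=$(mktemp)
printf '%s\n' 'a$1;(b);x;d' 'a$1;(b);y;d' 'p;q;r;s(1)' 'c;d;e;$5' > "$meta_sample"

# Read the file twice: pass 1 counts each key, pass 2 prints the lines
# whose key was seen exactly once. Keys are compared as strings, not patterns.
unique_in_order() {
    awk -F';' '
        NR == FNR { count[$1 FS $2 FS $4]++; next }
        count[$1 FS $2 FS $4] == 1
    ' "$1" "$1"
}

unique_in_order "$meta_sample"
# prints:
# p;q;r;s(1)
# c;d;e;$5
```

Unlike the END-loop version, this keeps the surviving lines in their original input order, at the cost of reading the file twice.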