zooby
July 27, 2010, 10:24am
1
Can anyone help me remove duplicate records from 2 separate files in UNIX?
Please find the sample records for both files below.
cat Monday.dat
3FAHP0JA1AR319226MOHMED ATEK 966504453742 SAU2010DE
3LNHL2GC6AR636361HEA DEUK CHOI 821057314531 KOR2010LE
3MEHM0JG7AR652083MUTLAB NAL-NAFISAH 966552299383 966552299383 SAU2010MI
NM0KS9BN1AT035143JOSE AHERNANDEZ-RODRIGUEZ 7878545055 PRI2010C6
NM0KS9BN9AT030157JOSE AHERNANDEZ-RODRIGUEZ 7878545055 PRI2009C6
cat Tuesday.dat
1FAHP25106G169212O-GYEONG GWON 821191370489 KOR2006FH
3LNHL2GC6AR636361HEA DEUK CHOI 821057314531 KOR2010LE
3MEHM0JG7AR652083MUTLAB NAL-NAFISAH 966552299383 966552299383 SAU2010MI
1FAHP25196G136869SEONGYUL KIM 82117722451 KOR2006FH
1FAHP25W58G107612HYUNGKYOO PARK 82623642043 KOR2008FH
I need to compare Monday.dat and Tuesday.dat, delete the duplicate records which exist in both files, and get the desired output like below:
1FAHP25106G169212O-GYEONG GWON 821191370489 KOR2006FH
1FAHP25196G136869SEONGYUL KIM 82117722451 KOR2006FH
1FAHP25W58G107612HYUNGKYOO PARK 82623642043 KOR2008FH
zaxxon
July 27, 2010, 10:37am
2
$> grep -vf Monday.dat Tuesday.dat
1FAHP25106G169212O-GYEONG GWON 821191370489 KOR2006FH
1FAHP25196G136869SEONGYUL KIM 82117722451 KOR2006FH
1FAHP25W58G107612HYUNGKYOO PARK 82623642043 KOR2008FH
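One caveat with this approach: plain `grep -f` treats every line of Monday.dat as a regular expression and matches it anywhere in a line, so a record that is merely a prefix of another record, or that contains characters such as `.` or `*`, can remove more than intended. Adding `-F` (fixed strings) and `-x` (whole-line match) should avoid that. A minimal sketch, using made-up two-column records in a temporary directory rather than the real data:

```shell
# Hypothetical sample files; the records are invented for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' 'AAA 1' 'BBB 2' > "$tmpdir/Monday.dat"
printf '%s\n' 'AAA 1' 'AAA 10' 'CCC 3' > "$tmpdir/Tuesday.dat"

# -f FILE: read patterns from FILE, -F: fixed strings (no regex),
# -x: a pattern must match the whole line, -v: print non-matching lines.
only_tuesday=$(grep -vxFf "$tmpdir/Monday.dat" "$tmpdir/Tuesday.dat")
printf '%s\n' "$only_tuesday"

rm -r "$tmpdir"
```

Here `AAA 10` is kept; without `-x` the Monday record `AAA 1` would match it as a substring and drop it.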
aigles
July 27, 2010, 10:55am
3
I think there are lines missing from your output.
A possible solution:
$ head Monday.txt Tuesday.txt
==> Monday.txt <==
3FAHP0JA1AR319226MOHMED ATEK 966504453742 SAU2010DE
3LNHL2GC6AR636361HEA DEUK CHOI 821057314531 KOR2010LE
3MEHM0JG7AR652083MUTLAB NAL-NAFISAH 966552299383 966552299383 SAU2010MI
NM0KS9BN1AT035143JOSE AHERNANDEZ-RODRIGUEZ 7878545055 PRI2010C6
NM0KS9BN9AT030157JOSE AHERNANDEZ-RODRIGUEZ 7878545055 PRI2009C6
==> Tuesday.txt <==
1FAHP25106G169212O-GYEONG GWON 821191370489 KOR2006FH
3LNHL2GC6AR636361HEA DEUK CHOI 821057314531 KOR2010LE
3MEHM0JG7AR652083MUTLAB NAL-NAFISAH 966552299383 966552299383 SAU2010MI
1FAHP25196G136869SEONGYUL KIM 82117722451 KOR2006FH
1FAHP25W58G107612HYUNGKYOO PARK 82623642043 KOR2008FH
$ sort Monday.txt > Monday.tmp
$ sort Tuesday.txt > Tuesday.tmp
$ head Monday.tmp Tuesday.tmp
==> Monday.tmp <==
3FAHP0JA1AR319226MOHMED ATEK 966504453742 SAU2010DE
3LNHL2GC6AR636361HEA DEUK CHOI 821057314531 KOR2010LE
3MEHM0JG7AR652083MUTLAB NAL-NAFISAH 966552299383 966552299383 SAU2010MI
NM0KS9BN1AT035143JOSE AHERNANDEZ-RODRIGUEZ 7878545055 PRI2010C6
NM0KS9BN9AT030157JOSE AHERNANDEZ-RODRIGUEZ 7878545055 PRI2009C6
==> Tuesday.tmp <==
1FAHP25106G169212O-GYEONG GWON 821191370489 KOR2006FH
1FAHP25196G136869SEONGYUL KIM 82117722451 KOR2006FH
1FAHP25W58G107612HYUNGKYOO PARK 82623642043 KOR2008FH
3LNHL2GC6AR636361HEA DEUK CHOI 821057314531 KOR2010LE
3MEHM0JG7AR652083MUTLAB NAL-NAFISAH 966552299383 966552299383 SAU2010MI
$ join -v1 -v2 Monday.tmp Tuesday.tmp
1FAHP25106G169212O-GYEONG GWON 821191370489 KOR2006FH
1FAHP25196G136869SEONGYUL KIM 82117722451 KOR2006FH
1FAHP25W58G107612HYUNGKYOO PARK 82623642043 KOR2008FH
3FAHP0JA1AR319226MOHMED ATEK 966504453742 SAU2010DE
NM0KS9BN1AT035143JOSE AHERNANDEZ-RODRIGUEZ 7878545055 PRI2010C6
NM0KS9BN9AT030157JOSE AHERNANDEZ-RODRIGUEZ 7878545055 PRI2009C6
$ rm Monday.tmp Tuesday.tmp
$
Jean-Pierre.
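A note on the `join` approach above: by default `join` pairs lines on the first whitespace-separated field only and rebuilds each output line from its fields, so runs of spaces in the original records may be collapsed. If whole records should be compared and printed unchanged, `comm` on the sorted files is one alternative. A minimal sketch, assuming invented sample records and a temporary directory:

```shell
# Hypothetical sample files; note the double spaces inside each record.
tmpdir=$(mktemp -d)
printf '%s\n' 'B2  two' 'A1  one' > "$tmpdir/Monday.dat"
printf '%s\n' 'C3  three' 'B2  two' > "$tmpdir/Tuesday.dat"

# comm expects both inputs sorted in the same collation order.
LC_ALL=C sort "$tmpdir/Monday.dat" > "$tmpdir/Monday.tmp"
LC_ALL=C sort "$tmpdir/Tuesday.dat" > "$tmpdir/Tuesday.tmp"

# -1 suppresses lines only in the first file, -3 suppresses common lines,
# so -13 leaves the records that appear only in Tuesday.
only_tuesday=$(LC_ALL=C comm -13 "$tmpdir/Monday.tmp" "$tmpdir/Tuesday.tmp")
# -23 leaves the records that appear only in Monday.
only_monday=$(LC_ALL=C comm -23 "$tmpdir/Monday.tmp" "$tmpdir/Tuesday.tmp")

printf 'Tuesday only: %s\n' "$only_tuesday"
printf 'Monday only: %s\n' "$only_monday"

rm -r "$tmpdir"
```

The records come out byte for byte as they were in the input, including the internal spacing.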
kurumi
July 27, 2010, 11:10am
4
cat Monday.txt Tuesday.txt | sort | uniq -u
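Note that this pipeline prints every record that occurs exactly once across both files, so it returns the Monday-only records as well as the Tuesday-only ones. If only the Tuesday-only records are wanted, one known trick is to feed Monday to `sort` twice so that none of its records can be unique. A minimal sketch with invented sample records; it assumes Tuesday has no repeated records of its own, since `uniq -u` would drop those too:

```shell
# Hypothetical sample files in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'A1 one' 'B2 two' > "$tmpdir/Monday.txt"
printf '%s\n' 'B2 two' 'C3 three' > "$tmpdir/Tuesday.txt"

# Records unique across both files (Monday-only and Tuesday-only together).
either_only=$(cat "$tmpdir/Monday.txt" "$tmpdir/Tuesday.txt" | sort | uniq -u)

# Listing Monday twice means every Monday record occurs at least twice,
# so uniq -u can only print records that come from Tuesday alone.
only_tuesday=$(sort "$tmpdir/Monday.txt" "$tmpdir/Monday.txt" "$tmpdir/Tuesday.txt" | uniq -u)

printf '%s\n' "$either_only"
printf '%s\n' "$only_tuesday"

rm -r "$tmpdir"
```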
zooby
July 27, 2010, 11:44am
5
Since both files have a huge number of records, it is difficult for me to confirm that I got the exact result. The following commands ended with different counts. I need to delete the duplicate records which exist in both files.
grep -vf Monday.dat Tuesday.dat
grep -vf Tuesday.dat Monday.dat
I tried the join command, but it produced the result in a different file format. Thanks.
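On the differing counts: the two `grep` commands answer different questions (records only in Tuesday versus records only in Monday), so their counts would only agree by coincidence. For output that keeps the original record format and order, an `awk` lookup on the whole line is one option. A minimal sketch with invented sample records; it assumes the first file named is not empty, because the `NR==FNR` idiom relies on that:

```shell
# Hypothetical sample files in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'A1 one' 'B2 two' 'D4 four' > "$tmpdir/Monday.dat"
printf '%s\n' 'C3 three' 'B2 two' > "$tmpdir/Tuesday.dat"

# While reading the first file (NR==FNR) remember each whole line;
# while reading the second file print only lines not remembered.
only_tuesday=$(awk 'NR==FNR { seen[$0]; next } !($0 in seen)' \
    "$tmpdir/Monday.dat" "$tmpdir/Tuesday.dat")
only_monday=$(awk 'NR==FNR { seen[$0]; next } !($0 in seen)' \
    "$tmpdir/Tuesday.dat" "$tmpdir/Monday.dat")

printf 'Tuesday only:\n%s\n' "$only_tuesday"
printf 'Monday only:\n%s\n' "$only_monday"

rm -r "$tmpdir"
```

In this sample there is one Tuesday-only record and two Monday-only records, which shows why the two directions can legitimately give different counts.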