I want to remove all the duplicates from a file. I don't want to keep even a single entry of a duplicated key.
For the input data:
12345|12|34
12345|13|23
3456|12|90
15670|12|13
12345|10|14
3456|12|13
I need the data below in one file:
15670|12|13
and the data below in another file:
12345|12|34
12345|13|23
12345|10|14
3456|12|90
3456|12|13
I am identifying duplicates based on the first field alone.
If I use sort -t"|" -u -k 1,1 it gives:
12345|10|14
15670|12|13
3456|12|13
But I don't want even that single entry for the duplicated keys.
Please help me.
Also, if I want to sort on the 10th field, should I use sort -k10 or sort -k 10,10?
What's the difference between those?
Thanks
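On the sort question: with -k10 the key starts at field 10 and runs to the end of the line, while -k 10,10 restricts the key to field 10 only. A minimal sketch of the difference, using a made-up three-line file (not from this thread) and field 2 as a stand-in for field 10 to keep the lines short:

```shell
# Made-up sample; field 2 plays the role of "field 10".
printf '%s\n' 'a|1|z' 'b|1|y' 'c|0|x' > sample

# -k2: the key runs from field 2 to the end of the line,
# so the keys compared are "1|z", "1|y" and "0|x".
LC_ALL=C sort -t'|' -k2 sample > by_k2

# -k2,2: the key is field 2 only ("1", "1", "0");
# lines with equal keys are then ordered by comparing the whole line.
LC_ALL=C sort -t'|' -k2,2 sample > by_k2_2

cat by_k2      # c|0|x  b|1|y  a|1|z
cat by_k2_2    # c|0|x  a|1|z  b|1|y
```

So when only that one field should decide the order, -k 10,10 is the form to use.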
Try:
awk -F"|" '{a[$1]++;b[$1]=b[$1]?b[$1]"\n"$0:$0}END{for(i in a){if(a[i]==1){print b[i]>"file1"}else{print b[i]>"file2"}}}' input
It will create two files: file1 and file2.
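If holding the whole file in memory is a concern, or the original line order should be kept, a two-pass variant is another option. This is only a sketch under the assumptions of the sample above (key in field 1, "|" as separator); the array name count is arbitrary, and the file is read twice by naming it twice on the command line:

```shell
# Recreate the sample input from the first post.
printf '%s\n' '12345|12|34' '12345|13|23' '3456|12|90' \
              '15670|12|13' '12345|10|14' '3456|12|13' > input

# Pass 1 (NR == FNR is true only for the first file argument):
#   count how often each first field occurs.
# Pass 2: keys seen exactly once go to file1, all other lines to file2.
awk -F'|' '
    NR == FNR      { count[$1]++; next }
    count[$1] == 1 { print > "file1"; next }
                   { print > "file2" }
' input input

cat file1   # the single line with key 15670
cat file2   # the five lines with keys 12345 and 3456, in input order
```

Unlike the for (i in a) loop, whose iteration order is unspecified, this writes the lines in the order they appear in the input. On SunOS the old /usr/bin/awk may reject this syntax too; nawk or /usr/xpg4/bin/awk is the usual substitute there.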
But it's giving: illegal statement near line 1, syntax error at line 1.
I am checking on SunOS.
Yes, with nawk it's working. But I want to make the 10th field the key field, so what do I need to change in that script?
Shall I replace $1 with $10?
Thanks
I have changed it like this:
awk -F"|" '{a[$10]++;b[$10]=b[$10]?b[$10]"\n"$0:$0}END{for(i in a){if(a[i]==1){print b[i]>"file1"}else{print b[i]>"file2"}}}' input
But it's not giving the correct result.
Is there anything else I need to change?
Thanks
In file1 I am getting the unique records.
But in file2 I am getting all the records.
Is there anything else I need to change in the code below to make the 10th field the key?
awk -F"|" '{a[$10]++;b[$10]=b[$10]?b[$10]"\n"$0:$0}END{for(i in a){if(a[i]==1){print b[i]>"file1"}else{print b[i]>"file2"}}}' input
I have tried $(10) too.
Please help me. Thanks
Can you post sample of your real data?
The data looks like this:
12116| |12116 |C |M | |8913 |189 |111189 |12119249 |8000 |E|029|W Clock| ger |0|E 12th Street | | |FL |60 |U |111189 |
12116| |12116 |k |Dsd |Y |10 |124 |224 |19621192 |850 |E|D007| |SMr |0|. J- 12 | |Wrs |FL |3331 |US |111224 |
I need to find the duplicates based on the 10th field.
Is there anything I need to change in the code below for that?
awk -F"|" '{a[$10]++;b[$10]=b[$10]?b[$10]"\n"$0:$0}END{for(i in a){if(a[i]==1){print b[i]>"file1"}else{print b[i]>"file2"}}}' input
I've checked that code with the following data:
12116| |12116 |C |M | |8913 |189 |111189 |12119249 |8000 |E|029|W Clock| ger |0|E 12th Street | | |FL |60 |U |111189 |
22116| |12116 |C |M | |8913 |189 |111189 |12119249 |8000 |E|029|W Clock| ger |0|E 12th Street | | |FL |60 |U |111189 |
12116| |12116 |k |Dsd |Y |10 |124 |224 |19621192 |850 |E|D007| |SMr |0|. J- 12 | |Wrs |FL |3331 |US |111224 |
and got the following result:
solaris% cat file1
12116| |12116 |k |Dsd |Y |10 |124 |224 |19621192 |850 |E|D007| |SMr |0|. J- 12 | |Wrs |FL |3331 |US |111224 |
solaris% cat file2
12116| |12116 |C |M | |8913 |189 |111189 |12119249 |8000 |E|029|W Clock| ger |0|E 12th Street | | |FL |60 |U |111189 |
22116| |12116 |C |M | |8913 |189 |111189 |12119249 |8000 |E|029|W Clock| ger |0|E 12th Street | | |FL |60 |U |111189 |
So it is working as expected for this sample... Can you post sample data that gives incorrect results?
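In the meantime, one way to see what awk actually treats as the key is to print each distinct 10th field in brackets together with its count. The sketch below uses a made-up four-line file (hypothetical data, not the real records) in which one key lacks the trailing space; whether anything like that is happening in the real file is only a guess:

```shell
# Hypothetical sample: 11 "|"-separated fields, key in field 10.
# The last line has no trailing space after the key.
printf '%s\n' 'r1|a|b|c|d|e|f|g|h|12119249 |x' \
              'r2|a|b|c|d|e|f|g|h|12119249 |x' \
              'r3|a|b|c|d|e|f|g|h|19621192 |x' \
              'r4|a|b|c|d|e|f|g|h|19621192|x' > sample10

# Count each distinct value of field 10; the brackets make
# leading or trailing blanks in the key visible.
awk -F'|' '{ n[$10]++ } END { for (k in n) print n[k] " [" k "]" }' sample10 |
    LC_ALL=C sort > key_counts

cat key_counts
```

Here "19621192 " and "19621192" are counted as two different keys, each occurring once. Running the same awk command on the real input should show which keys are counted more than once.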