pamu
July 11, 2012, 7:35am
1
These are numeric IDs:
222932017099186177
222932014385467392
222932017371820032
222932017409556480
I have a text file with 300 million lines like the ones shown above, and I want to find the duplicates in it. Please suggest the quickest way.
I expect sort | uniq -d to take a long time, and it may run out of memory.
Thanks...
methyl
July 11, 2012, 8:06am
2
Is the data in any particular order? Did it come from a database?
Do you have a database engine? Some processes are just not suitable for Shell tools.
pamu
July 11, 2012, 8:37am
3
Yes, the data is in the same order as I have provided. I am not getting the data from a database; it exists only as a text file.
clx
July 11, 2012, 8:43am
4
You might get enough information from this thread.
Of course, that applies only if you have no other option left (such as a database, as methyl suggested).
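If a database really is not available, the sort | uniq -d idea from the first post can probably still be made to work. Here is a minimal sketch, assuming GNU sort, which spills sorted runs to temporary files and merges them rather than holding the whole input in memory. The function name find_dups, the scratch directory and the 1G buffer size are illustrative assumptions, not something from this thread, so tune them to your machine.

```shell
# Hypothetical helper: print each ID that appears more than once in a file.
# Assumes GNU sort for the -S (buffer size) and -T (temp directory) options.
find_dups() {
    input=$1                       # text file with one numeric ID per line
    tmpdir=${2:-${TMPDIR:-/tmp}}   # scratch space; should have room for about the input size
    bufsize=${3:-1G}               # upper bound on sort's in-memory buffer
    (
        # Byte-wise comparison is much cheaper than locale-aware collation,
        # and it is enough here because we only need equal IDs to be adjacent.
        export LC_ALL=C
        # sort writes sorted runs to $tmpdir and merges them;
        # uniq -d then prints one copy of every repeated line.
        sort -S "$bufsize" -T "$tmpdir" "$input" | uniq -d
    )
}
```

Usage would look something like find_dups ids.txt /path/to/scratch > duplicates.txt, where both file names are placeholders. The output is one line per duplicated ID, in byte order rather than numeric order.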