Find numeric duplicates in 300 million lines

pamu · July 11, 2012, 7:35am

These are numeric IDs:

   222932017099186177      
   222932014385467392      
   222932017371820032      
   222932017409556480

I have a text file with 300 million lines like the ones shown above. I want to find the duplicates in this file. Please suggest a quick way to do it.
I am concerned that sort | uniq -d will take a long time and may run out of memory.

Thanks...

methyl · July 11, 2012, 8:06am

Is the data in any particular order? Did it come from a database?

Do you have a database engine? Some processes are just not suitable for Shell tools.

pamu · July 11, 2012, 8:37am

Yes, the data is in the same order as I have provided. I am not getting the data from a database; it is in a text file only.

clx · July 11, 2012, 8:43am

You might get enough information from this thread.

Of course, that is if you have no other option left (within a database, as methyl suggested).
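
For the shell-only route, here is a minimal sketch of the sort | uniq -d approach from the first post, assuming GNU sort is available. GNU sort writes sorted chunks to temporary files and merges them, so its memory use is bounded by the -S buffer size rather than by the size of the input. The function name find_dups and the SORT_MEM and SORT_TMP variables are hypothetical names chosen for illustration. Stripping whitespace is also an assumption, based on the padding visible in the sample lines.

```shell
#!/bin/sh
# find_dups FILE: print each ID that occurs more than once in FILE.
# Hypothetical helper; SORT_MEM and SORT_TMP are illustrative knobs, not standard variables.
find_dups() {
    # 1. Remove spaces, tabs and carriage returns so padded copies of an ID compare equal.
    # 2. Drop lines that are empty after stripping.
    # 3. Sort byte-wise (LC_ALL=C avoids locale-aware comparison, which is slower).
    #    -S caps the in-memory buffer, -T names the directory for spill files (GNU sort).
    # 4. uniq -d prints one copy of every line that appears at least twice.
    tr -d ' \t\r' < "$1" |
        grep -v '^$' |
        LC_ALL=C sort -S "${SORT_MEM:-1G}" -T "${SORT_TMP:-/tmp}" |
        uniq -d
}
```

Usage would look like `SORT_MEM=4G SORT_TMP=/big/disk find_dups ids.txt > duplicates.txt`, where the paths are placeholders. A byte-wise sort is enough here because duplicates are identical strings, so the numeric order of the IDs does not matter. The temporary directory needs free space roughly equal to the size of the input file.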