Tanu
July 28, 2017, 11:04am
1
Hi All,
This query is regarding performance improvement of a command.
I have a list of IDs in a file (say file1 with single ID column) and file2 has the data rows.
I need to take the IDs from file1, search for them in file2, and write the matching rows from file2 to file3.
For this scenario I have been using the command below (with the variable quoted to avoid word splitting):
for ID in `cat file1`; do grep "$ID" file2; done > file3
The command I am using is super slow. Can someone please let me know how I can improve the performance here?
Thanks in advance,
Tanu
From man grep:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file
contains zero patterns, and therefore matches nothing.
So
grep -f file1 file2 > file3
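For reference, here is that suggestion run on a pair of tiny sample files (the /tmp paths and the ID values are just stand-ins for the thread's file1 and file2):

```shell
# Build small stand-ins for file1 (one ID per line) and file2 (data rows)
printf 'ID001\nID003\n' > /tmp/file1
printf 'ID001,alpha\nID002,beta\nID003,gamma\n' > /tmp/file2

# One grep invocation reads all patterns from file1 and scans file2 once,
# instead of re-scanning file2 once per ID as the shell loop does
grep -f /tmp/file1 /tmp/file2 > /tmp/file3

cat /tmp/file3
# ID001,alpha
# ID003,gamma
```

The win is that file2 is read a single time, rather than once per line of file1.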
Tanu
July 28, 2017, 11:41am
3
Thank you for your reply, @Corona688. But the grep command you mentioned is also running very slowly for me. I am working with files that have millions of records.
Please let me know if there is any other way to improve the search.
How big is file1? If it's larger than available memory, it will of course be slow.
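A quick way to gauge that (file name and contents here are illustrative) is to count the patterns and check the size on disk, since grep has to hold every pattern from file1 in memory:

```shell
# Stand-in for the thread's pattern file
printf 'ID001\nID002\n' > /tmp/patterns

# Number of patterns grep must load, and the file's size on disk
wc -l < /tmp/patterns
du -h /tmp/patterns
```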
Tanu
July 28, 2017, 12:23pm
5
Hi Corona688,
grep -F -f file1 file2 > file3
The above command (with -F added) worked really fast for me. I was able to search a file with 1.5 million records in 5-6 seconds.
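The -F flag tells grep to treat each pattern as a fixed string rather than a regular expression, which both speeds up matching and avoids surprises when IDs contain regex metacharacters. A small illustration (file names and IDs are made up for the example):

```shell
# An ID containing a dot, which is a regex wildcard
printf 'ID.7\n' > /tmp/ids
printf 'ID.7,keep\nIDX7,drop\n' > /tmp/data

# Without -F the pattern 'ID.7' would also match 'IDX7';
# with -F it matches only the literal string 'ID.7'
grep -F -f /tmp/ids /tmp/data
# ID.7,keep
```

If the IDs might appear as substrings of longer fields, adding -w (match whole words) is worth considering as a further refinement.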
I really appreciate your response.
Thanks,
Tanu