Tanu
July 28, 2017, 11:04am
1
Hi All,
This query is regarding performance improvement of a command.
I have a list of IDs in a file (say file1 with single ID column) and file2 has the data rows.
I need to take the IDs from file1, search for them in file2, and write the matching rows from file2 to file3.
For this scenario I have been using the command below (with the variable quoted to avoid word splitting):
for ID in `cat file1`; do grep "$ID" file2; done > file3
The command I am using is super slow. Can someone please let me know how I can improve the performance here?
Thanks in advance,
Tanu
From man grep:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file
contains zero patterns, and therefore matches nothing.
So
grep -f file1 file2 > file3
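For reference, here is that suggestion run on a pair of tiny sample files (the /tmp paths and the ID values are just stand-ins for the thread's file1 and file2):

```shell
# Build small stand-ins for file1 (one ID per line) and file2 (data rows)
printf 'ID001\nID003\n' > /tmp/file1
printf 'ID001,alpha\nID002,beta\nID003,gamma\n' > /tmp/file2

# One grep invocation reads all patterns from file1 and scans file2 once,
# instead of re-scanning file2 once per ID as the shell loop does
grep -f /tmp/file1 /tmp/file2 > /tmp/file3

cat /tmp/file3
# ID001,alpha
# ID003,gamma
```

The win is that file2 is read a single time, rather than once per line of file1.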
Tanu
July 28, 2017, 11:41am
3
Thank you for your reply, @Corona688. But the grep command you mentioned is also running very slowly for me. I am working with files that have millions of records.
Please let me know if there is any other way to improve the search.
How big is file1? If it's larger than available memory, it will of course be slow.
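A quick way to gauge that (file name and contents here are illustrative) is to count the patterns and check the size on disk, since grep has to hold every pattern from file1 in memory:

```shell
# Stand-in for the thread's pattern file
printf 'ID001\nID002\n' > /tmp/patterns

# Number of patterns grep must load, and the file's size on disk
wc -l < /tmp/patterns
du -h /tmp/patterns
```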
Tanu
July 28, 2017, 12:23pm
5
Hi Corona688,
grep -F -f file1 file2 > file3
The above command (with -F added) worked really fast for me. I was able to search a file with 1.5 million records in 5-6 seconds.
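The -F flag tells grep to treat each pattern as a fixed string rather than a regular expression, which both speeds up matching and avoids surprises when IDs contain regex metacharacters. A small illustration (file names and IDs are made up for the example):

```shell
# An ID containing a dot, which is a regex wildcard
printf 'ID.7\n' > /tmp/ids
printf 'ID.7,keep\nIDX7,drop\n' > /tmp/data

# Without -F the pattern 'ID.7' would also match 'IDX7';
# with -F it matches only the literal string 'ID.7'
grep -F -f /tmp/ids /tmp/data
# ID.7,keep
```

If the IDs might appear as substrings of longer fields, adding -w (match whole words) is worth considering as a further refinement.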
I really appreciate your response.
Thanks,
Tanu