I have two files: one (say, details.txt) contains the details of employees, and the other (say, emp.txt) lists some selected employee names. I am extracting the matching employee details from details.txt using emp.txt, with the following code:
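while read -r line
do
    # Plain assignment; the extra `echo` subshell is not needed
    emp_name=$line
    # Quote the name so spaces or wildcard characters are not expanded by the shell
    grep -e "$emp_name" details.txt >> output.txt
done < emp.txt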
The code above works and I get the expected result. But it takes far too long when the files are huge: I don't have an exact time, but it ran for more than 6 hours before I cancelled the script. For example, my details.txt is around 2.5GB with about 7.5 lakh (750,000) records, and emp.txt has 55K employee names. Can you please suggest another option or command that handles such huge files better? Thanks.
Don't use a loop to get this done: you're processing the whole 2.5GB details.txt file once for each name in emp.txt. So if you had 2 names in emp.txt, you'd be reading 5GB of details.txt; 10 names = 25GB. With 55K names that works out to roughly 137,500GB. It doesn't scale well that way.
Try this:
grep -F -f emp.txt details.txt
Then you are only processing details.txt once (and emp.txt once, however big it is).
Using -F might also save some time, since the names are then matched as fixed strings rather than as regular expressions. If you don't have the '-F' option, look for 'fgrep'.
But since you're on HP-UX, the standard 'grep' should have the -F option available.
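If you want to try the single-pass command on something small first, here is a minimal sketch. The directory name grep_demo and the sample records are made up for illustration; only the grep line itself is the actual suggestion.

```shell
# Scratch directory for the demo (the name is arbitrary).
mkdir -p grep_demo && cd grep_demo || exit 1

# Tiny stand-ins for the real files.
printf '%s\n' 'John' 'Kevin' > emp.txt
printf '%s\n' \
  'HDR|Prakash D' \
  'DTL|Prakash|EMP0000010|Sr Associate|FL' \
  'HDR|Kevin T' \
  'DTL|Kevin|EMP0000004|Analyst|IL' > details.txt

# -F: treat every pattern as a fixed string, not a regular expression.
# -f emp.txt: read the patterns from emp.txt, one per line.
# details.txt is scanned once, and each line is checked against all names.
grep -F -f emp.txt details.txt > output.txt
cat output.txt
```

With these sample files only the two Kevin lines should come out, because John has no record and Prakash is not in emp.txt. Note that the output follows the order of details.txt, not the order of names in emp.txt as the loop did.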
Thank you all for your quick responses! Thanks a lot, rwuertn; the '-F' option is working and I am able to extract the required data in much less time.
For reference, the files look like this:
emp.txt
------------
John
Kevin
Prakash
Susan
Ken
details.txt
-------------
HDR|Prakash D
DTL|Prakash|EMP0000010|Sr Associate|FL
HDR|Kevin T
DTL|Kevin|EMP0000004|Analyst|IL
HDR|John M
DTL|John|EMP0000184|Manager|CA
No, I do not have any further questions right now. I only mentioned the file details because someone else was asking about the file structure.
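One thing to keep in mind with the layout shown above: a fixed-string grep matches a name anywhere in the line, so a short name would also pick up longer names that contain it. If that ever matters, a possible alternative is to compare the name field exactly with awk. This is only a sketch under assumptions: it assumes that only the DTL records are wanted and that the name is always the second '|'-separated field, and the "Johnson" record and the directory name awk_demo are invented for the demo.

```shell
# Scratch directory for the demo (the name is arbitrary).
mkdir -p awk_demo && cd awk_demo || exit 1

printf '%s\n' 'John' 'Kevin' > emp.txt
# The Johnson line is hypothetical, added to show the difference from grep -F.
printf '%s\n' \
  'HDR|John M' \
  'DTL|John|EMP0000184|Manager|CA' \
  'DTL|Johnson|EMP9999999|Clerk|NY' \
  'HDR|Kevin T' \
  'DTL|Kevin|EMP0000004|Analyst|IL' > details.txt

# First file (NR == FNR): remember every name from emp.txt in an array.
# Second file: print DTL records whose second field is exactly one of those names.
awk -F'|' '
    NR == FNR { names[$0] = 1; next }
    $1 == "DTL" && ($2 in names)
' emp.txt details.txt > output.txt
cat output.txt
```

Like grep -f, this still reads details.txt only once. The HDR lines are dropped here; if they are needed as well, the condition would have to be extended.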
Thanks, rwuerth, for your suggestion. It is working fine. I will let you know about the time saving in a couple of days, as full-volume testing is still pending.
This command is a huge time saver. It takes less than 5 minutes to extract the details of around 60K employees from a detail file of around 2.8GB.
Thanks again for your suggestion.