The script below is used to search for numeric data in around 400 files in a folder, and I have 300 such folders. I need help improving the script's performance.
The script searches 20 such folders (300 files in each folder) simultaneously, which pushes CPU utilization up to 90%. What changes will improve its performance?
find . -type f -name "*.txt" -print | xargs cat | egrep 99023 >> myresult.text &
Not sure if it will make much of a difference, but something like this:
find . -type f -name '*.txt' -exec grep -F 99023 {} + > myresult.text
You can use grep -Fh if you want to omit the file names (grep prefixes each match with its file name whenever it is given more than one file, as -exec ... + does)...
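For example, a minimal sketch of that variant (the same command as above, just with -h added so only the matching lines land in the output file):

# -F: fixed-string match (faster than a regex for a literal number)
# -h: suppress the "filename:" prefix on each matching line
find . -type f -name '*.txt' -exec grep -Fh 99023 {} + > myresult.text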
---
When you say it increases CPU utilization to 90%, why would that be bad performance? I can imagine that if it takes too long you would want to speed it up, but the mere fact that it temporarily uses a lot of CPU is not necessarily bad.
I used egrep directly in the folder instead of find and cat, but the performance issue is still there. Is there any other command that would give better performance? Please advise.
@Scrutinizer
Thanks for your input, I will try it out!
Take a leaf out of Google's book: index your data.
They do billions of searches on 30 trillion web pages every month.
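As a rough illustration of the indexing idea (a minimal sketch, assuming the data changes far less often than you query it, that the tokens you search for are plain numbers, and that file names contain no colons; number.idx is a made-up file name, and xargs -r is GNU xargs):

# Build the index once; rebuild it only when the data changes.
# grep -o prints each matched number on its own line, -H prefixes
# the file name, so number.idx holds lines like "path/file.txt:99023".
find . -type f -name '*.txt' -exec grep -oHE '[0-9]+' {} + | sort -u > number.idx

# Query the small index instead of rescanning every file: list the
# files that contain 99023, then grep only those files for the
# matching lines (-r stops xargs running grep when nothing matched).
grep ':99023$' number.idx | cut -d: -f1 | sort -u | xargs -r grep -Fh 99023 > myresult.text

Whether this pays off depends on how often the data changes relative to how often you search it: the index build still reads every file once, but each lookup afterwards scans one index file instead of 400 data files per folder.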