Help with a fast way of finding line numbers for a regex

JoeColeEPL9 · September 5, 2011, 5:22am

Hello,

I am trying to find the line numbers where a regex matches and write them to a file with the command below:

 awk '/'$pat'/ {print NR}' $fileName >> temp.txt

where $pat is the regex
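A minimal sketch of the same command with the pattern handed to awk through -v instead of being spliced into the program text between the quotes. It still reads every line, so it should not be expected to be faster. It only keeps the command from breaking when $pat contains spaces or a /. The function name match_lines is hypothetical. On Solaris this assumes nawk or /usr/xpg4/bin/awk, since the legacy /usr/bin/awk may not accept -v.

```shell
# Hypothetical helper: print the number of every line in FILE (arg 2) that
# matches the awk extended regex PATTERN (arg 1), one number per line.
# Passing the pattern with -v keeps spaces, quotes and "/" in it from
# breaking the awk program. Caveat: awk processes backslash escapes in -v
# values, so a backslash in PATTERN may need to be doubled.
match_lines() {
    awk -v pat="$1" '$0 ~ pat { print NR }' "$2"
}
```

Usage: match_lines "$pat" "$fileName" >> temp.txt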

But this command takes a lot of time on big files, larger than 5,000,000 KB (roughly 5 GB).

Could this be made faster with some alternative command?

Note: I am writing a script to extract a section of a file based on a regex match, for example n lines above and n lines below the match. The regex can match more than once.
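Once the line number of a match is known (for example one of the numbers collected in temp.txt), the section around it can be cut out with a second awk pass that stops reading as soon as it is past the wanted lines, so a match near the top of a huge file does not cost a full scan. A sketch only; print_around and its argument order (line, n, file) are hypothetical.

```shell
# Hypothetical helper: print lines LINE-N .. LINE+N of FILE, where LINE is
# arg 1, N is arg 2 and FILE is arg 3. Line numbers below 1 simply never
# occur, so a window at the top of the file is clamped automatically.
print_around() {
    awk -v line="$1" -v n="$2" '
        # Past the end of the window: stop reading the rest of the file.
        NR > line + n { exit }
        # Inside the window: a pattern with no action prints the line.
        NR >= line - n
    ' "$3"
}
```

Usage: print_around 120 3 "$fileName" prints lines 117 to 123.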

jayan_jay · September 5, 2011, 5:25am

Try fgrep:

$ fgrep -n "$pat" infile

itkamaraj · September 5, 2011, 5:28am

 
grep -n "$pat" "$fileName" | cut -d: -f1 > temp.txt

pludi · September 5, 2011, 5:29am

Of course it's slow: what you're doing is akin to repairing a watch with a hammer. It's possible, but frustrating. In this case, the regex as you're applying it is unanchored, so each line has to be scanned character by character until a match is found or the line ends.

So the first question is: is it really a regex, or is it a fixed string?
And the second: can the regex be anchored in some way, e.g. to the start of the line, or as the only word on the line, or something else that minimizes the search cost?
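If the answer to the first question is "fixed string", awk's index() can replace the regex match: it looks for the string literally, so characters like . or * in it lose their special meaning. Whether it is actually faster than ~ depends on the awk implementation. For the second question, a start-of-line anchor for a fixed string can be written as a prefix comparison with substr(), which only looks at the first length(s) characters of a line. Both function names below are hypothetical.

```shell
# Hypothetical helper: line numbers of lines in FILE (arg 2) that contain
# STRING (arg 1) anywhere. index() returns the 1-based position of STRING
# in the line, or 0 if it is absent, so no regex syntax is interpreted.
fixed_match_lines() {
    awk -v s="$1" 'index($0, s) { print NR }' "$2"
}

# Hypothetical anchored variant: only lines that START with STRING.
# Comparing a prefix should be cheaper than searching the whole line.
fixed_prefix_lines() {
    awk -v s="$1" 'substr($0, 1, length(s)) == s { print NR }' "$2"
}
```

For example, with the string a.c, fixed_match_lines matches "xa.c" but not "abc", whereas the regex a.c would match both.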

JoeColeEPL9 · September 5, 2011, 5:41am

Hello pludi,

The regex is user input and can be any string; we just need the line numbers where it matches anywhere on a line.

Jayan,

fgrep -n "$pat" infile is not working.

---------- Post updated at 04:41 AM ---------- Previous update was at 04:37 AM ----------

Maybe this clarifies it more:

I am trying to write a script that prints n lines above and n lines below a matched pattern, where n and the pattern are both user input.
The pattern may occur any number of times in the file, so the script lists every occurrence separately and asks the user which occurrence they want the surrounding lines for.
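Because only one chosen occurrence is needed at a time, the extraction step could be done in a single awk pass: keep the last n lines in a small ring buffer, and when the k-th match arrives, print the buffer, the match and the next n lines, then exit without reading the rest of the file. Listing all occurrences for the user still needs one full scan; only this step can stop early. A sketch under those assumptions; context_of_match and its argument order (pattern, k, n, file) are hypothetical, and on Solaris it assumes nawk or /usr/xpg4/bin/awk.

```shell
# Hypothetical helper: print N lines before and after the K-th line of FILE
# that matches the awk regex PATTERN.
# Arguments: PATTERN K N FILE. Exit status is 1 if there is no K-th match.
context_of_match() {
    awk -v pat="$1" -v k="$2" -v n="$3" '
        # After the wanted match: print trailing context, then stop reading.
        found {
            print
            if (++after >= n) exit
            next
        }
        # The K-th matching line (hits only counts lines that match).
        $0 ~ pat && ++hits == k {
            start = (NR > n) ? NR - n : 1
            # Leading context: line i is stored in buf[i % n].
            for (i = start; i < NR; i++)
                print buf[i % n]
            print
            found = 1
            if (n == 0) exit
            next
        }
        # Any other line: remember it in the ring buffer of the last N lines.
        n > 0 { buf[NR % n] = $0 }
        # exit in a main rule still runs END, so report a missing match here.
        END { if (!found) exit 1 }
    ' "$4"
}
```

Usage: context_of_match "$pat" 3 5 "$fileName" prints 5 lines above and below the third match. The buffer holds at most n lines, so memory use does not grow with the file size.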

jayan_jay · September 5, 2011, 5:43am

$ fgrep -n "$pat" "$fileName" | cut -d: -f1 > temp.txt

yazu · September 5, 2011, 5:48am

With GNU grep you can use the -B (lines before) and -A (lines after) context options.
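For reference, a sketch of how those options combine. It assumes GNU grep is the grep found on PATH; the traditional Solaris /usr/bin/grep does not support -A or -B. grep_context is a hypothetical name, and -F (fixed string) fits the plain-word input described later in the thread. With -n, GNU grep prefixes matching lines with "N:" and context lines with "N-", and separates non-adjacent groups with a "--" line.

```shell
# Hypothetical helper (GNU grep only): print every line of FILE (arg 3) that
# contains the fixed string STRING (arg 1), with N (arg 2) lines of context
# before (-B) and after (-A), each prefixed by its line number (-n).
grep_context() {
    grep -n -F -B "$2" -A "$2" -e "$1" "$3"
}
```

Usage: grep_context "$pat" 5 "$fileName"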

pludi · September 5, 2011, 6:06am

Let me check if I get this right: the user input will probably be a regular string, without any funky regex stuff in it (e.g. foo.+ *b[aA]r\t(baz|BAZ))?

JoeColeEPL9 · September 5, 2011, 6:16am

I am using Solaris.

fgrep takes almost the same time as awk.

---------- Post updated at 05:16 AM ---------- Previous update was at 05:15 AM ----------

Yes, the user input is a regular string: simple words or numbers, no special characters.