Regex: print matched line and exact pattern match

stresing · July 14, 2014, 5:53am

Hi experts,

I have a file with regexes which is used for automatic searches on several files (40+ GB).

To do some postprocessing with the grep result I need the matching line as well as the match itself.

I know that the latter can be achieved with grep's -o option. But I'm not aware of a combined option, i.e. one that prints the match, then a space, then the matched line, or something like that.

Currently I'm doing this:

zgrep -E -i -f ~/some_patterns.txt <file-list> > result.txt

Based on result.txt some postprocessing takes place. Maybe the extraction of the exact match should be part of the postprocessing?

Alternatively I could switch to an awk construction as well if this would meet my needs better.

TIA & best regards

Stephan

bakunin · July 14, 2014, 6:05am

With files that big, performance considerations are perhaps vital, so take the following with a grain of salt. I have no file that big to test it on.

Instead of using "-f <file>" to read the pattern file automatically, you could read it in a shell loop and feed one regexp after the other to "zgrep", like this (just a skeleton):

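#! /bin/sh

fOut="/path/to/result.txt"
fPatterns="$HOME/some_patterns.txt"     # the pattern file from the original command

cat /dev/null > "$fOut"                 # start with an empty result file
# -r keeps backslashes in the regexps intact, IFS= keeps surrounding blanks
while IFS= read -r LINE ; do
     printf '%s\n' "---------- $LINE" >> "$fOut"
     zgrep -iE "$LINE" <file-list> >> "$fOut"
done < "$fPatterns"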

Depending on your regexps you may or may not have to escape some characters so that they are not interpreted by the shell.
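
One concrete case where a regexp can get altered on its way to zgrep is the "read" builtin itself: without -r it treats a backslash as an escape character and drops it. The following minimal sketch shows the difference; the helper names read_plain and read_raw are made up for this illustration and the sample pattern is arbitrary.

```shell
# Hypothetical demo helpers: read one line from stdin and print it back.
read_plain() { read LINE; printf '%s\n' "$LINE"; }        # no -r: backslashes are consumed
read_raw()   { IFS= read -r LINE; printf '%s\n' "$LINE"; } # -r: line is taken literally

printf '%s\n' 'foo\.bar' | read_plain   # the regexp loses its backslash
printf '%s\n' 'foo\.bar' | read_raw     # the regexp arrives unchanged
```

Inside double quotes ("$LINE") the shell does not reinterpret the contents of the variable, so once the line has been read literally it should reach zgrep as written.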

I hope this helps.

bakunin

stresing · July 14, 2014, 11:05am

Thanks for the hint.

I think I will move that step to the postprocessing step since the result file of the main grep process will be much smaller than the original files.
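
A postprocessing step of that kind could be sketched with awk roughly as follows. This is only an illustration under several assumptions: the function name print_match_and_line is hypothetical, the pattern file is non-empty, the regexps are plain EREs that survive lowercasing (a class such as [A-Z] or an escape such as \S would not), and only the first pattern that matches a line is reported.

```shell
#!/bin/sh
# Hypothetical helper: print "<match><TAB><line>" for every line of a grep
# result that matches one of the patterns (first matching pattern wins).
# Usage: print_match_and_line PATTERN_FILE RESULT_FILE
print_match_and_line() {
    awk '
        # First file: collect the patterns, lowercased to mimic grep -i.
        NR == FNR { if ($0 != "") pat[++n] = tolower($0); next }
        # Second file: test each line against each pattern in file order.
        {
            lower = tolower($0)
            for (i = 1; i <= n; i++) {
                if (match(lower, pat[i])) {
                    # RSTART and RLENGTH refer to the lowercased copy;
                    # the same offsets are applied to the original line.
                    printf "%s\t%s\n", substr($0, RSTART, RLENGTH), $0
                    break
                }
            }
        }
    ' "$1" "$2"
}
```

It would be called as, for example, print_match_and_line ~/some_patterns.txt result.txt. Note that when zgrep searches several files it prefixes each output line with the file name, so a pattern could also match inside that prefix; stripping the prefix first may be necessary.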