Hi experts,
I have a file with regexes which is used for automatic searches on several files (40+ GB).
To do some postprocessing with the grep result I need the matching line as well as the match itself.
I know that the latter can be achieved with grep's -o option, but I'm not aware of a combined option, i.e. one that prints the match, then a space, then the matched line, or something like this.
Currently I'm doing this:
zgrep -E -i -f ~/some_patterns.txt <file-list> > result.txt
Based on result.txt some postprocessing takes place. Maybe the extraction of the exact match should be part of the postprocessing?
Alternatively I could switch to an awk construction as well if this would meet my needs better.
TIA & best regards
Stephan
With files that big, performance considerations are probably vital, so take the following with a grain of salt. I have no file that big to test it on.
Instead of using "-f <file>" to read the pattern file automatically, you could read it in a shell loop and feed one regexp after the other to "zgrep", like this (just a skeleton). Note that this makes one full pass over the files per regexp:
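#! /bin/sh
fOut="/path/to/result.txt"
# start with an empty result file
cat /dev/null > "$fOut"
# read one regexp per line; -r keeps backslashes intact
while IFS= read -r LINE ; do
     # printf instead of echo, so backslashes in the regexp are printed as-is
     printf '%s %s\n' '----------' "$LINE" >> "$fOut"
     # -e protects regexps that start with a dash
     zgrep -iE -e "$LINE" <file-list> >> "$fOut"
done < ~/some_patterns.txt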
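Depending on your regexps you may have to take care of backslashes: a plain "read" strips them, whereas "read -r" passes them through unchanged. The double quotes around "$LINE" already keep the shell from interpreting the other special characters.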
I hope this helps.
bakunin
Thanks for the hint.
I think I will move the extraction of the exact match to the postprocessing step, since the result file of the main grep run will be much smaller than the original files.
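For reference, a minimal sketch of what such a postprocessing step could look like. It assumes that result.txt holds the matching lines, that the same pattern file is reused, and that the local grep supports -o (GNU and BSD grep do). The function name match_and_line is made up for illustration. It runs "grep -n -o" over the result file and lets awk put each match in front of its full line, separated by a tab:

```shell
#!/bin/sh
# Sketch only: print "<match><TAB><full line>" for every match in a file.
# match_and_line is a hypothetical helper name, not part of grep.
match_and_line() {
    patterns=$1   # file with one regexp per line
    input=$2      # e.g. result.txt from the main zgrep run
    # grep -n -o prints "lineno:match", one output line per match,
    # in ascending line-number order.
    grep -n -o -i -E -f "$patterns" "$input" |
    awk -v input="$input" '
        {
            # split "lineno:match" at the first colon only
            i = index($0, ":")
            num = substr($0, 1, i - 1) + 0
            # read forward in the input file until we reach that line
            while (cur < num) {
                if ((getline line < input) <= 0) break
                cur++
            }
            print substr($0, i + 1) "\t" line
        }'
}
```

It would be called as: match_and_line ~/some_patterns.txt result.txt > matches.txt. One caveat: if zgrep searched several files, every line in result.txt starts with a "filename:" prefix, so a regexp could also match inside that prefix.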