Hi.
I often use cgrep for complex matching and manipulation. It extends some of the features of GNU grep and is comparable in speed. The heart of the following script is the cgrep call; the surrounding code displays the environment under which it was run and compares the results:
#!/usr/bin/env bash
# @(#) s1 Demonstrate matching on successive lines, cgrep.
# See: http://sourceforge.net/projects/cgrep/
# Section 1, setup, pre-solution, $Revision: 1.25 $.
# Infrastructure details, environment, debug commands for forum posts.
# Uncomment export command to run script as external user.
# export PATH="/usr/local/bin:/usr/bin:/bin" HOME=""
set +o nounset
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
edges() { local _f _n _l;: ${1?"edges: need file"}; _f=$1;_l=$(wc -l < "$_f");
head -n ${_n:=3} "$_f" ; pe "--- ( $_l lines total )" ; tail -n $_n "$_f" ; }
db() { : ; }  # No-op stub; the definition below overrides it -- comment it out to silence debug output.
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
C=$HOME/bin/context && [ -f "$C" ] && "$C" cgrep
set -o nounset
pe
FILE=${1-data1}
# Display sample of data file, with edges or head & tail as a last resort.
db " Section 1: display of input data and expected output."
pe " || start sample [ specimen first:middle:last ] $FILE"
specimen $FILE expected-output.txt 2>/dev/null \
|| { pe "(head/tail)"; head -n 5 $FILE; pe " ||"; tail -n 5 $FILE; }
pe " || end"
# Section 2, solution.
pl " Results:"
db " Section 2: solution."
cgrep -a 'REQUEST.*\n.*RESPONSE' $FILE |
tee f1
# Section 3, post-solution, check results, clean-up, etc.
v1=$(wc -l <expected-output.txt)
v2=$(wc -l < f1)
pl " Comparison of $v2 created lines with $v1 lines of desired results:"
db " Section 3: validate generated calculations with desired results."
pl " Comparison with desired results:"
if [ ! -f expected-output.txt -o ! -s expected-output.txt ]
then
pe " Comparison file \"expected-output.txt\" zero-length or missing."
exit
fi
if cmp expected-output.txt f1
then
pe " Succeeded -- files have same content."
else
pe " Failed -- files not identical -- detailed comparison follows."
if diff -b expected-output.txt f1
then
pe " Succeeded by ignoring whitespace differences."
fi
fi
exit 0
producing:
% ./s1
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution : Debian GNU/Linux 5.0.8 (lenny)
bash GNU bash 3.2.39
cgrep ATT cgrep 8.15
db, Section 1: display of input data and expected output.
|| start sample [ specimen first:middle:last ] data1
Whole: 5:0:5 of 6 lines in file "data1"
20120530025502914 | REQUEST | whatever
20120530025502968 | RESPONSE | whatever
20120530025502985 | RESPONSE | whatever
20120530025502996 | REQUEST | whatever
20120530025503013 | REQUEST | whatever
20120530025503045 | RESPONSE | whatever
Whole: 5:0:5 of 4 lines in file "expected-output.txt"
20120530025502914 | REQUEST | whatever
20120530025502968 | RESPONSE | whatever
20120530025503013 | REQUEST | whatever
20120530025503045 | RESPONSE | whatever
|| end
-----
Results:
db, Section 2: solution.
20120530025502914 | REQUEST | whatever
20120530025502968 | RESPONSE | whatever
20120530025503013 | REQUEST | whatever
20120530025503045 | RESPONSE | whatever
-----
Comparison of 4 created lines with 4 lines of desired results:
db, Section 3: validate generated calculations with desired results.
-----
Comparison with desired results:
Succeeded -- files have same content.
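As an aside, for systems without cgrep, GNU grep can approximate the two-line match -- a sketch only, assuming a grep built with PCRE support (the -P option), which some builds lack:

```shell
# Sketch: approximating cgrep -a 'REQUEST.*\n.*RESPONSE' with GNU grep.
# Recreate the six-line sample shown above:
cat > data1 <<'EOF'
20120530025502914 | REQUEST | whatever
20120530025502968 | RESPONSE | whatever
20120530025502985 | RESPONSE | whatever
20120530025502996 | REQUEST | whatever
20120530025503013 | REQUEST | whatever
20120530025503045 | RESPONSE | whatever
EOF
# -z reads the file as one NUL-delimited record, so the pattern may cross
# a newline; (?m) anchors ^ and $ at line boundaries so whole lines match;
# -o prints each match NUL-terminated, and tr restores the newlines.
grep -Pzo '(?m)^.*REQUEST.*\n.*RESPONSE.*$' data1 |
tr '\0' '\n'
```

On this sample it prints the same four lines as the cgrep run; -P is a GNU extension, so check your grep before relying on it.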
I like awk for its flexibility (and especially for its readability compared to sed on complicated jobs), but I don't like one-off (nonce) scripts, and my measurements indicate that awk uses about 5 times as much CPU time and 5 times as much wall-clock time as most members of the grep family for similar tasks (cgrep, however, does use more system time, about twice as much).
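For comparison, the same two-line pairing can be written as a short awk one-liner. This is a sketch of the logic, not the method used above: remember the previous line, and print the pair when it held REQUEST and the current line holds RESPONSE.

```shell
# Sketch: the same two-line pairing in awk (shown for comparison only).
# Recreate the six-line sample shown above:
cat > data1 <<'EOF'
20120530025502914 | REQUEST | whatever
20120530025502968 | RESPONSE | whatever
20120530025502985 | RESPONSE | whatever
20120530025502996 | REQUEST | whatever
20120530025503013 | REQUEST | whatever
20120530025503045 | RESPONSE | whatever
EOF
# When the previous line held REQUEST and the current line holds RESPONSE,
# print the pair.  Because "prev" is updated on every line, consecutive
# REQUEST lines (lines 4 and 5 above) are handled correctly.
awk 'prev ~ /REQUEST/ && /RESPONSE/ { print prev; print } { prev = $0 }' data1
```

This reproduces the four lines of expected-output.txt for the sample data; a naive sed N loop would miss the pair at lines 5 and 6, because N consumes the second REQUEST line before it can be tested.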
See the SourceForge link above for the compilable source if cgrep is not available in a repository for your system.
Best wishes ... cheers, drl