I want to find each line that contains "new = 0", take field $4 of that line (e.g. [Snapshot1690384351]), and then search backward from it for the nearest preceding line that has the same field $4 and also contains "last fetch".
Grep or Awk preferred.
Here is what the data looks like:
2013-12-12 12:10:30,117 TRACE [Snapshot1690384351] [com.xxx.xxx] last fetch: Thu Dec 12 11:46:36 CST 2013, Files in root:
...
...
...
2013-12-12 12:12:17,698 DEBUG [Snapshot1690384351] [com.xxx.xxx] /tmp/snapshottmp6158934693853684856.txt diag stats, total = 238439, new = 0
#!/bin/bash
#
#
# check command-line for file
if [ $# -ne 1 ]
then
echo "Usage: ${0##*/} <file>"
exit 1
fi
# store the filename
f=$1
# get current working directory and
# change to it
cwd=$(pwd)
cd $cwd
# retrieve the 4th field of all the lines
# with the 'new = 0' pattern and store in
# an array
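declare -a PATTERN
# disable globbing while splitting the awk output; otherwise a
# field such as [Snapshot1690384351] is treated as a glob pattern
set -f
PATTERN=( $(awk '/new = 0/{print $4}' "$f") )
set +f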
# bail if no matches found
if [ ${#PATTERN[@]} -eq 0 ]
then
echo "No matches found in $f."
exit 1
fi
# iterate the array and find the matching lines
# based on the stored pattern
for p in "${PATTERN[@]}"
do
# escape [, ], / and & in the pattern so it can be
# embedded in an awk /regex/
p=$(printf '%s\n' "$p" | sed 's#\([]/&[]\)#\\\1#g')
# now build the search pattern
srchPattern="$p.*last fetch"
# search for the pattern in the file
# with last fetch in the line
awk "/$srchPattern/{print}" "$f"
done
# done
exit 0
getLast.sh /tmp/file.txt
2013-12-12 12:10:30,117 TRACE [Snapshot1690384351] [com.xxx.xxx] last fetch: Thu Dec 12 11:46:36 CST 2013, Files in root:
If last fetch/new = 0 pairs never overlap, the problem reduces to a trivial, sequential, single-pass solution: store each last fetch line when it is encountered and, whenever the current line matches new = 0, compare the key of the stored line with the key of the current line. Unfortunately, since the OP provides only two lines of sample data, it is unknown whether this is a valid assumption.
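Under that non-overlap assumption, the single pass might look like the following awk sketch. The function name last_fetch_nonoverlap is hypothetical, and the key is assumed to always be field $4, as in the sample data. Because only one line is stored at a time, interleaved snapshots would be missed.

```shell
# Hypothetical helper. ASSUMES last fetch / new = 0 pairs never overlap,
# so only the most recent "last fetch" line needs to be remembered.
last_fetch_nonoverlap() {
    awk '
        # remember the latest "last fetch" line and its key (field 4)
        /last fetch/ { stored = $0; key = $4; next }
        # on a "new = 0" line, print the stored line if the keys agree
        /new = 0/ && key != "" && $4 == key { print stored }
    ' "$1"
}
```

Usage: last_fetch_nonoverlap /tmp/file.txt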
I disagree. One of the few things we know about the data is that a matching pair is ordered. Your code does not guarantee that a last fetch precedes its new = 0 counterpart.
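One way to respect that ordering while also tolerating overlapping pairs (a sketch, not a tested answer for the OP's real data) is to remember the most recent last fetch line per key in an awk associative array and print it when a new = 0 line with the same key appears. The array is filled only from lines already read, so a printed line always precedes its new = 0 counterpart. last_fetch_by_key is a hypothetical name.

```shell
# Hypothetical helper: single pass, keyed on field 4.
last_fetch_by_key() {
    awk '
        # keep the most recent "last fetch" line for each key
        /last fetch/ { last[$4] = $0; next }
        # a "new = 0" line prints the remembered line for its own key;
        # keys with no earlier "last fetch" line print nothing
        /new = 0/ && ($4 in last) { print last[$4] }
    ' "$1"
}
```

Note that a key whose last fetch line only appears after its new = 0 line produces no output, which is the ordering the comment asks for.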
Why change to the current directory? It's already the current directory. Am I missing something?
There is actually a bug in that cd. If the current directory is "foo bar" and a directory named "foo" exists, cd $cwd will change to an unintended directory, because the variable expansion isn't double-quoted.
If there are P patterns, P+1 passes will be necessary. Hopefully it's a small dataset with few patterns. Your approach could be reimplemented to always finish in 2 passes.
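A minimal two-pass sketch passes the same file to a single awk invocation twice; NR == FNR holds only while the first copy is being read. It keeps the answer's semantics (every last fetch line whose $4 belongs to some new = 0 line, regardless of order) but prints each such line once, in file order, and needs no regex escaping. last_fetch_two_pass is a hypothetical name.

```shell
# Hypothetical helper: exactly two passes over the same file.
last_fetch_two_pass() {
    awk '
        # pass 1: collect the field-4 keys of all "new = 0" lines
        NR == FNR { if (/new = 0/) want[$4]; next }
        # pass 2: print "last fetch" lines whose key was collected
        /last fetch/ && ($4 in want)
    ' "$1" "$1"
}
```

The NR == FNR idiom misbehaves when the first file is empty, which is harmless here because both arguments are the same file.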
Escaping characters for a regular expression engine is a bug-prone procedure that yields brittle solutions.
A case in point: sed 's#\([]/&[]\)#\\\1#g' adds a backslash before each ampersand. Since the ampersand is not an AWK ERE metacharacter, \& is an undefined escape sequence. An AWK implementation is allowed to silently ignore the backslash, abort, or do whatever else it wants.
I prefer to avoid this type of manual escaping as much as possible.
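To avoid building a regex from the key at all, the key can be handed to awk through the environment and compared as a plain string. This sketch uses ENVIRON rather than -v because -v assignments undergo escape-sequence processing. last_fetch_for_key is a hypothetical name; it could stand in for the sed and srchPattern lines inside the original loop.

```shell
# Hypothetical helper: print "last fetch" lines whose field 4 equals
# the key in $2, using string comparison instead of a regex.
last_fetch_for_key() {
    # KEY is exported only to this awk process; ENVIRON does not
    # interpret backslashes, so the key is used verbatim
    KEY=$2 awk 'index($0, "last fetch") && $4 == ENVIRON["KEY"]' "$1"
}
```

For example, the loop body becomes: last_fetch_for_key "$f" "$p"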