Search: find current line, then search back

JimBurns · December 13, 2013, 9:52am

Hello.

I want to find a line that has "new = 0" in it, then search backwards, based on field $4 ([Snapshot1690384351]) of that line, for the first line that has the same field $4 and "last fetch".

Grep or Awk preferred.

Here is what the data looks like:

2013-12-12 12:10:30,117 TRACE [Snapshot1690384351] [com.xxx.xxx] last fetch: Thu Dec 12 11:46:36 CST 2013, Files in root:
...
...
...
2013-12-12 12:12:17,698 DEBUG [Snapshot1690384351] [com.xxx.xxx] /tmp/snapshottmp6158934693853684856.txt diag stats, total = 238439, new = 0

Thanks in advance.

Akshay_Hegde · December 13, 2013, 10:44am

Please show more data. What output do you expect?

Yoda · December 13, 2013, 11:00am

Use tac to print the file in reverse and search using awk:

tac file | awk -F'[][]' '/new = 0/{w[$2];next} ($2 in w)&&/last fetch/{print;delete w[$2]}'

in2nix4life · December 13, 2013, 11:42am

Not a one-liner, but it does the job:

#!/bin/bash
#
#

# check command-line for file
if [ $# -ne 1 ]
then
    echo "Usage: ${0##*/} <file>"
    exit 1
fi

# store the filename
f=$1

# get current working directory and
# change to it
cwd=$(pwd)
cd $cwd

# retrieve the 4th field of all the lines
# with the 'new = 0' pattern and store in
# an array
declare -a PATTERN
PATTERN=( $(awk '/new = 0/{print $4}' $f) )

# bail if no matches found
if [ ${#PATTERN[@]} -eq 0 ]
then
    echo "No matches found in $f."
    exit 1
fi

# iterate the array and find the matching lines
# based on the stored pattern
for p in ${PATTERN[@]}
do
    # let's escape the brackets in the pattern
    p=$(echo $p | sed 's#\([]/&[]\)#\\\1#g')

    # now build the search pattern
    srchPattern="$p.*last fetch"

    # search for the pattern in the file
    # with last fetch in the line
    awk "/$srchPattern/{print}" $f
done

# done
exit 0

getLast.sh /tmp/file.txt
2013-12-12 12:10:30,117 TRACE [Snapshot1690384351] [com.xxx.xxx] last fetch: Thu Dec 12 11:46:36 CST 2013, Files in root:

alister · December 13, 2013, 12:49pm

If "last fetch"/"new = 0" pairs never overlap, the problem reduces to a trivial, sequential, single-pass solution: store each "last fetch" line when it is encountered and, when the current line matches "new = 0", compare its key with the key of the stored line. Unfortunately, since the OP provides nothing but a scant two lines of sample data, it is unknown whether this is a valid assumption.
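
A minimal sketch of that single-pass idea, assuming the key is the whitespace-delimited field $4 and that pairs never overlap. The sample file, the middle log line, and the names log, key, line, and result are made up for illustration; they are not from the OP's data.

```shell
# Hypothetical sample data modeled on the two lines in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
2013-12-12 12:10:30,117 TRACE [Snapshot1690384351] [com.xxx.xxx] last fetch: Thu Dec 12 11:46:36 CST 2013, Files in root:
2013-12-12 12:11:00,000 TRACE [Snapshot1690384351] [com.xxx.xxx] some other line
2013-12-12 12:12:17,698 DEBUG [Snapshot1690384351] [com.xxx.xxx] /tmp/snapshottmp6158934693853684856.txt diag stats, total = 238439, new = 0
EOF

# Remember the most recent "last fetch" line and its key ($4).
# On a "new = 0" line, print the remembered line if the keys agree.
result=$(awk '
    /last fetch/ { key = $4; line = $0; next }
    /new = 0/ && line != "" && $4 == key { print line }
' "$log")

printf '%s\n' "$result"
rm -f "$log"
```

If pairs can overlap, a single remembered line is not enough; the sketch would then need one stored line per key.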

Regards,
Alister

RudiC · December 13, 2013, 2:21pm

With

grep -E "new = 0|last fetch" file | sort -k4,4
2013-12-12 12:10:30,117 TRACE [Snapshot1690384351] [com.xxx.xxx] last fetch: Thu Dec 12 11:46:36 CST 2013, Files in root:
2013-12-12 12:12:17,698 DEBUG [Snapshot1690384351] [com.xxx.xxx] /tmp/snapshottmp6158934693853684856.txt diag stats, total = 238439, new = 0
2013-12-12 12:10:30,117 TRACE [Snapshot1690384352] [com.xxx.xxx] last fetch: Thu Dec 12 11:46:36 CST 2013, Files in root:
2013-12-12 12:12:17,698 DEBUG [Snapshot1690384352] [com.xxx.xxx] /tmp/snapshottmp6158934693853684856.txt diag stats, total = 238439, new = 0

you'd always have the corresponding lines in pairs.

alister · December 13, 2013, 3:18pm

I disagree. One of the few things we know about the data is that a matching pair is ordered. Your code does not guarantee that a last fetch precedes its new = 0 counterpart.

Regarding the cd $cwd in in2nix4life's script: why change to the current directory? It's already the current directory. Am I missing something?

There is actually a bug in that cd. If the current directory is foo bar, and if there exists a directory named foo, cd $cwd will change to an unintended directory, because the variable expansion isn't double-quoted.
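
A small sketch of the underlying word splitting, assuming the default IFS. The path and the count_args helper are hypothetical and exist only to make the splitting visible without actually changing directories.

```shell
# Hypothetical directory name containing a space.
cwd='/tmp/foo bar'

# Helper (not from the thread) that reports how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args $cwd)     # word splitting yields two arguments
quoted=$(count_args "$cwd")     # quoting keeps the path as one argument

echo "unquoted: $unquoted, quoted: $quoted"
```

What cd does with the extra argument depends on the shell, so the safe form is cd "$cwd".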

If there are P patterns, P+1 passes will be necessary. Hopefully it's a small dataset with few patterns. Your approach could be reimplemented to always finish in 2 passes.
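
One possible two-pass sketch, assuming the key is the whitespace-delimited field $4: awk reads the same file twice, collecting keys on the first pass and printing matches on the second. Like the script under discussion, it prints every "last fetch" line for a collected key, not only the nearest preceding one. The sample data (including the "new = 7" line) and the names log, want, and result are illustrative.

```shell
# Hypothetical sample data: two snapshots, only the first ends with "new = 0".
log=$(mktemp)
cat > "$log" <<'EOF'
2013-12-12 12:10:30,117 TRACE [Snapshot1690384351] [com.xxx.xxx] last fetch: Thu Dec 12 11:46:36 CST 2013, Files in root:
2013-12-12 12:10:30,117 TRACE [Snapshot1690384352] [com.xxx.xxx] last fetch: Thu Dec 12 11:46:36 CST 2013, Files in root:
2013-12-12 12:12:17,698 DEBUG [Snapshot1690384351] [com.xxx.xxx] /tmp/snapshottmp6158934693853684856.txt diag stats, total = 238439, new = 0
2013-12-12 12:12:17,698 DEBUG [Snapshot1690384352] [com.xxx.xxx] /tmp/snapshottmp6158934693853684856.txt diag stats, total = 238439, new = 7
EOF

# Pass 1 (NR == FNR): collect every key ($4) seen on a "new = 0" line.
# Pass 2: print "last fetch" lines whose key was collected.
result=$(awk '
    NR == FNR { if (/new = 0/) want[$4] = 1; next }
    /last fetch/ && ($4 in want)
' "$log" "$log")

printf '%s\n' "$result"
rm -f "$log"
```

The number of passes stays at two no matter how many distinct keys the file contains.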

Escaping characters for a regular expression engine is a bug-prone procedure that yields brittle solutions.

A case in point: sed 's#\([]/&[]\)#\\\1#g' will add a backslash before each ampersand. Since the ampersand is not an AWK ERE metacharacter, \& is an undefined sequence. An AWK implementation is allowed to silently ignore the backslash, or to abort, or to do ... whatever it wants.

I prefer to avoid this type of manual escaping as much as possible.
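
One way to sidestep the escaping, sketched under the assumption that the key is exactly field $4: pass it to awk as a variable and compare it as a string, so the brackets never reach the regular expression engine. The sample file and the names log, key, k, and result are illustrative.

```shell
# Hypothetical sample data modeled on the two lines in the question.
log=$(mktemp)
cat > "$log" <<'EOF'
2013-12-12 12:10:30,117 TRACE [Snapshot1690384351] [com.xxx.xxx] last fetch: Thu Dec 12 11:46:36 CST 2013, Files in root:
2013-12-12 12:12:17,698 DEBUG [Snapshot1690384351] [com.xxx.xxx] /tmp/snapshottmp6158934693853684856.txt diag stats, total = 238439, new = 0
EOF

# Key copied verbatim, brackets included; no escaping is applied.
key='[Snapshot1690384351]'

# $4 == k is a string comparison, so the brackets are ordinary characters.
result=$(awk -v k="$key" '$4 == k && /last fetch/' "$log")

printf '%s\n' "$result"
rm -f "$log"
```

One caveat: awk -v still interprets backslash escapes in the assigned value, so a key containing backslashes would need different handling.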

Regards,
Alister

in2nix4life · December 14, 2013, 10:19am

Thanks for the critiques, Alister, but as you mentioned in your posting, there was very little information to work with to present an exact solution.

Anything posted would have been more or less a best guess until the poster provided more detailed information. But I digress...

Good eye on the sed blurb. The ampersand was a typo. I was working on something work-related at the same time. It happens. :rolleyes:

Keep up the good work...

in2nix4life