Print text between delimiters IF it contains a certain term...

So I'm racking my brain over the right way to solve a problem that, once fixed, will solve every problem in my life. It's very easy (for you guys and gals), I'm sure, but I can't seem to wrap my mind around the right approach. I really want to use bash for this, but I can't grasp how to do it. I have a pretty good handle on how to do it with a long regex in perl, but I'd rather not call that from bash, and I really don't want to add another perl script into the mix. Here is a sample of the data I'm working with:

-------------------------------
15:00:01.194213 IP 6.14.8.18.80 > 21.1.12.11.3311: tcp 183
-------------------------------
15:00:01.201435 IP 21.2.17.16.4918 > 6.15.2.18.80: tcp 0
-------------------------------
15:00:01.235586 IP 16.23.16.12.80 > 21.1.18.11.1519: tcp 141
Content-Type: application/ocsp-response
Content-Length: 3686
Connection: Keep-Alive
-------------------------------
15:00:01.235839 IP 16.25.36.42.80 > 21.1.18.11.119: tcp 1380
-------------------------------
15:00:01.235840 IP 6.235.16.12.80 > 21.1.18.121.1519: tcp 80
-------------------------------

The general idea is that I want to "search" text like this for certain terms, such as "Content-Length:", and receive the following as the result:

-------------------------------
15:00:01.235586 IP 16.23.16.12.80 > 21.1.18.11.1519: tcp 141
Content-Type: application/ocsp-response
Content-Length: 3686
Connection: Keep-Alive

So the result is the packet that matches the search text, along with all the other information that falls within that packet's delimiters. For small cases like this I could just do a grep -B 4 or something similar, but not only does that not work consistently, it also won't capture the end of that "section" before the next delimiter. I will graciously accept any solution, but I'd prefer it to be in bash for my own curiosity and ease of implementation.
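Since a bash solution was asked for, here is a minimal pure-bash sketch (not from the thread) of the block logic: collect lines until a delimiter, then print the collected block if it contains the term. It assumes delimiter lines consist only of dashes, treats the term as a fixed string rather than a regex, and does not echo the delimiter lines themselves. The function name print_matching_blocks is hypothetical. A while-read loop is much slower than awk on large captures, so this is mostly useful for seeing the logic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every dash-delimited block that contains
# a fixed string. Assumes delimiter lines are made only of dashes.
print_matching_blocks() {
    local term=$1 file=$2 line block=''
    # IFS= and -r keep leading spaces and backslashes intact.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ ^-+$ ]]; then
            # A delimiter closes the current block: print it if it matches.
            if [[ -n $block && $block == *"$term"* ]]; then
                printf '%s' "$block"
            fi
            block=''
        else
            block+=$line$'\n'
        fi
    done < "$file"
    # The last block may not be followed by a delimiter.
    if [[ -n $block && $block == *"$term"* ]]; then
        printf '%s' "$block"
    fi
}

# Usage: print_matching_blocks 'Content-Length:' capture.txt
```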

one way:

nawk '{gsub("^--*$", "")}1' myFile  | nawk '/Content-Type/' RS=""
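The pipeline works in two steps: the first nawk blanks out every line made only of dashes, and the second runs with RS="" (paragraph mode), so each blank-line-separated chunk becomes one record and a bare pattern prints the whole record. Below is a sketch of the same idea wrapped in a reusable function. The name grep_packets is hypothetical, the blanking step uses sed instead of a first awk, and the pattern is passed with -v (which expands backslash escapes in the value) rather than pasted into the program. Paragraph mode is POSIX, so this should not need gawk, but a blank line inside a packet would split it into two records.

```shell
#!/bin/sh
# Hypothetical wrapper around the two-step nawk approach.
# $1 = awk (ERE) pattern to search for, $2 = input file.
grep_packets() {
    # Step 1: turn every delimiter line (only dashes) into an empty line.
    # Step 2: RS= (empty) puts awk in paragraph mode, so each
    #         blank-line-separated packet is a single record.
    sed 's/^--*$//' "$2" | awk -v pat="$1" '$0 ~ pat' RS=
}

# Usage: grep_packets 'Content-Length' myFile
```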

You're the man! It works great for this scenario. I'm going to test it on a larger scale, but it definitely looks like it fits my needs. I don't have nawk, but awk works as well.

Alternatively:

awk '/Content-Length/{print}' RS='----*' inputfile

Not all awks support RS as a regex; most treat RS as a single character only.

It should be noted that a multi-character RS is supported only by a limited number of awk implementations (GNU awk and maybe TAWK).

Thanks, radoulov and vgersh99, for the information. If not all awks support a regex in RS, could we try the following?

awk '/Content-Length/{sub(/--*/,"");print}' RS='---' inputfile

It depends: consider that --- is treated as a single dash, -.
It's not regex vs string, it's multi-character/regex vs. single character.


If I'm not wrong, can we say that all awks support a multi-character RS?

Consider the following:

% awk --version | head -1;printf '%s\n' a--b-c | awk 'END { print NR } 1'  RS=--
GNU Awk 4.0.0
a
b-c

2
$ uname -sr; printf '%s\n' a--b-c | nawk 'END { print NR } 1' RS=--
SunOS 5.8
a

b
c

4
$ uname -sr; printf '%s\n' a--b-c | /usr/xpg4/bin/awk 'END { print NR } 1' RS=--
SunOS 5.8
a

b
c

4
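The transcripts above suggest a quick way for a script to tell which behaviour the awk on PATH has: feed it a--b-c with RS=-- and count records. In the runs shown, GNU awk and mawk report 2 (the two-character separator is honoured), while the Solaris nawk and xpg4 awk report 4 (only "-" is used). A minimal sketch; the function names are hypothetical, and assuming other implementations fall into one of these two groups is an assumption based only on these results.

```shell
#!/bin/sh
# Hypothetical probe based on the a--b-c experiment above.
# Prints the number of records the current awk sees with RS=--.
rs_record_count() {
    printf '%s\n' a--b-c | awk 'END { print NR }' RS=--
}

# Succeeds (exit 0) when RS=-- was treated as a two-character separator,
# as GNU awk and mawk did above; fails when only "-" was used.
awk_has_multichar_rs() {
    [ "$(rs_record_count)" -eq 2 ]
}

if awk_has_multichar_rs; then
    echo "multi-character RS supported"
else
    echo "RS uses a single character only"
fi
```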

No!

From man nawk on SunOS:

mawk seems to support regex as RS too:

% uname -sr; printf '%s\n' a--b-c | mawk 'END { print NR } 1' RS=--
Linux 2.6.38-11-generic
a
b-c

2
% uname -sr; printf '%s\n' a--b-c | mawk 'END { print NR } 1' RS=-+
Linux 2.6.38-11-generic
a
b
c

3

From man mawk on Ubuntu, the documentation states:

mawk splits files into records by the same algorithm, but with the slight difference that RS is really a terminator instead of a separator. (ORS is really a terminator too).

E.g., if FS = ":+" and $0 = "a::b:" , then NF = 3 and $1 = "a", $2 = "b" and $3 = "", but if "a::b:" is the contents of an input file and RS = ":+", then there are two records "a" and "b".
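The separator versus terminator distinction in that man page example can be checked directly. A small sketch with hypothetical function names: splitting "a::b:" with FS = ":+" yields an empty third field, while reading it as input with RS = ":+" should yield only two records in awks that accept a regex RS (gawk, mawk). Awks that use only the first character of RS may report a different count, so treat the second result as implementation-dependent.

```shell
#!/bin/sh
# FS is a separator: the trailing ":" produces an empty last field.
fields_in() {
    printf '%s\n' "$1" | awk -F':+' '{ print NF }'
}

# RS (in awks that take a regex RS) acts as a terminator: the trailing
# ":" just ends record "b" and does not create an empty third record.
records_in() {
    printf '%s' "$1" | awk 'BEGIN { RS = ":+" } END { print NR }'
}

fields_in 'a::b:'    # prints 3
records_in 'a::b:'   # expected 2 with gawk or mawk; other awks may differ
```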

Thanks, radoulov, for your kind explanation.

You're very welcome!

I'm pleased with how my tool works, but there are more features I'd like to add, namely case insensitivity. Essentially, the user searches for a term they would like to see in a large list of data similar to the one below.

cat textfile.txt

gives

-------------------------------------
jibberish
-------------------------------------
Accept: */*
Referer: http://www.google.com/
Accept-Language: en-us
User-Agent: Mozilla/4.0
Accept-Encoding: gzip, deflate
Host: www.google.com
Connection: Keep-Alive
-------------------------------------
jibberish
-------------------------------------

Currently I read in a user-provided variable and use it to search:

echo "Enter search term:"
# -r keeps any backslashes the user types
read -r searchstring
# escape ( ) and / so the term can be embedded in the awk regex literal
searchstring2=$(printf '%s\n' "$searchstring" | sed -e 's/(/\\(/g' -e 's/)/\\)/g' -e 's/\//\\\//g')
awk '/'"$searchstring2"'/{print}' RS='------*' textfile.txt
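The sed escaping only handles (, ) and /, so a term containing other regex characters such as . * [ or a backslash can still match unexpectedly or break the awk program. One way around this, sketched below, is to skip regexes altogether: hand the term to awk through the environment (ENVIRON avoids the backslash processing that -v applies) and test with index(), which looks for a plain substring. To stay portable it collects the blocks itself instead of relying on a regex RS. The function name find_blocks is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print dash-delimited blocks containing a literal string.
# $1 = search term (used as-is, no regex escaping needed), $2 = input file.
find_blocks() {
    term=$1 awk '
        # Print the buffered block if it contains the term as a substring.
        function flush() {
            if (buf != "" && index(buf, ENVIRON["term"]) > 0) printf "%s", buf
            buf = ""
        }
        /^-+$/ { flush(); next }   # a delimiter line ends the current block
        { buf = buf $0 "\n" }      # any other line is added to the block
        END { flush() }            # the last block may lack a closing delimiter
    ' "$2"
}

# Usage:
# printf 'Enter search term: '; read -r searchstring
# find_blocks "$searchstring" textfile.txt
```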

A search for "Referer: http://www.google.com/" returns exactly what I want:

Accept: */*
Referer: http://www.google.com/
Accept-Language: en-us
User-Agent: Mozilla/4.0
Accept-Encoding: gzip, deflate
Host: www.google.com
Connection: Keep-Alive

However, I would also like to be able to search for something like this:

Referer: http://Www.GooGle.com/

and get the same results:

Accept: */*
Referer: http://www.google.com/
Accept-Language: en-us
User-Agent: Mozilla/4.0
Accept-Encoding: gzip, deflate
Host: www.google.com
Connection: Keep-Alive

Any ideas? I've looked into IGNORECASE, but it doesn't seem right for what I want.

---------- Post updated at 10:57 AM ---------- Previous update was at 09:02 AM ----------

Never mind, I ended up using IGNORECASE:

awk -v IGNORECASE=1 '/'"$searchstring2"'/{print}' RS='------*' textfile.txt
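IGNORECASE is a GNU awk extension; in other awks it is just an ordinary variable and the match stays case-sensitive. If the script might run elsewhere, one portable alternative is to lowercase both the record and the term with tolower() before comparing. A sketch, assuming a fixed-string search and reusing the paragraph-mode trick from earlier in the thread; find_blocks_nocase is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical case-insensitive, fixed-string block search for any POSIX awk.
# $1 = search term, $2 = input file.
find_blocks_nocase() {
    # Blank out delimiter lines, then let paragraph mode (RS=) make each
    # packet one record; compare lowercased record and term as substrings.
    sed 's/^--*$//' "$2" |
        term=$1 awk 'index(tolower($0), tolower(ENVIRON["term"])) > 0' RS=
}

# Usage: find_blocks_nocase 'Referer: http://Www.GooGle.com/' textfile.txt
```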