Capture string contained on a line?

mrm5102 · June 28, 2013, 11:42am

Hello All,

I'm working on a script that runs the wget command on a list of IP addresses in order to capture the data at each address's index.html. That part works fine for getting the HTML code at each address, but the data I'm trying to pull out is on a line containing a BUNCH of code for an HTML table.

The line with the table code contains about 2000 characters, give or take a few, and I'm trying to pull out the serial number in that table. Each of the serial numbers is located/formatted like so:

<TABLE BORDER="0"........more table code.......more...more...more....<B> Serial Number</B></TD><td width=20></TD><TD><B>FCH*********</B>....more...code.....

So if you look near the end of the line with the TABLE code, you'll see a string beginning with "FCH". All of the serial numbers begin with "FCH", so I'm trying to capture from "FCH" until the end of that string, i.e. something like "from 'FCH' till the first occurrence of '<' or '</B>'".

I figure this is possible using grep, sed, or awk, but I wasn't sure which one would be best to use in this situation.

Could someone tell me the best way to capture a string starting from one substring (FCH) up to another substring (</B>)?

Any thoughts or suggestions would be greatly appreciated!

Thanks in Advance,
Matt
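For the "from FCH till the first <" extraction described in the question, sed or grep can also do the job on a single line. A minimal sketch, assuming the serial never contains a "<"; the sample line and the serial FCH1234ABCD are made up for illustration:

```shell
# Made-up line in the same shape as the one in the question; FCH1234ABCD is not a real serial.
line='<TABLE BORDER="0"><TR><TD><B> Serial Number</B></TD><td width=20></TD><TD><B>FCH1234ABCD</B></TD></TR></TABLE>'

# sed (POSIX): capture "FCH" plus every following character that is not "<".
# -n together with the trailing p prints only lines where the substitution matched.
printf '%s\n' "$line" | sed -n 's/.*\(FCH[^<]*\).*/\1/p'

# grep -o (GNU and BSD grep, not POSIX): print only the part of the line that matched.
printf '%s\n' "$line" | grep -o 'FCH[^<]*'
```

Both commands print FCH1234ABCD here. If a line ever held several "FCH" strings, the greedy leading `.*` would make the sed version keep only the last one, while grep -o prints each match on its own line.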

Corona688 · June 28, 2013, 11:45am

awk has the useful property of letting you use whatever you want as the "line feed" (the record separator, RS), not necessarily \n.

awk -v RS="<" -F">" '/FCH/ { print $2 }' filename
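Here is that one-liner run on a stand-in for the table line from the question (the sample text and FCH1234ABCD are hypothetical), with comments showing how the two separators carve it up:

```shell
# Hypothetical sample in the same shape as the line in the question.
sample='<TD><B> Serial Number</B></TD><td width=20></TD><TD><B>FCH1234ABCD</B></TD>'

# RS="<" makes the text between one "<" and the next a record, e.g. "B>FCH1234ABCD".
# -F">" then splits that record into $1="B" and $2="FCH1234ABCD".
# Only the record matching /FCH/ is printed, so the output is FCH1234ABCD.
printf '%s\n' "$sample" | awk -v RS="<" -F">" '/FCH/ { print $2 }'
```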

mrm5102 · June 28, 2013, 12:14pm

Hey Corona688, thanks for the quick reply!

Outstanding...!! Works like a charm.

So how exactly is that doing it? I know 'RS' is the record separator and '-F' sets the field separator...

Is it basically saying: split the input into records on "<", split each record into fields on ">", then for any record containing "FCH", print the second field?

Thanks Again,
Matt

Corona688 · June 28, 2013, 12:18pm

That's exactly what it's doing, yes. For every 'line' (record), check if FCH is in it, and if so, print its second field.

Given this text:

<html><body><h1>HI!</h1></body></html>

it will interpret it like this, one record per line, with a record's fields separated by spaces:

html
body
h1      HI!
/h1
/body
/html
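To see that split for yourself, a small sketch that prints each record's tag ($1) and any text after it ($2). It uses printf '%s' so the input has no trailing newline: with RS="<" a newline is just an ordinary character and would otherwise end up inside the last record. The empty record before the first "<" has no fields, so the NF pattern skips it.

```shell
# Print "tag=<$1>" for every non-empty record, plus "text=<$2>" when there is text after the tag.
printf '%s' '<html><body><h1>HI!</h1></body></html>' |
  awk -v RS="<" -F">" 'NF { out = "tag=" $1; if ($2 != "") out = out "  text=" $2; print out }'

# Output:
# tag=html
# tag=body
# tag=h1  text=HI!
# tag=/h1
# tag=/body
# tag=/html
```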

Not a bad way to make a rough-cut XML/HTML/whatever parser, though far from compliant, and some awks have an annoying 2000-byte "line" size limit.
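If your awk does have a per-line size limit, one workaround (a different technique from the one used in this thread) is to let tr break the input at every "<" first, so each tag lands on its own short line and awk can keep its default newline record separator. A sketch with a made-up sample line; in the real script the input would be the wget output instead:

```shell
# tr rewrites every "<" as a newline; awk then only needs to split each short line on ">".
# The sample text and FCH1234ABCD are hypothetical.
printf '%s\n' '<TD><B> Serial Number</B></TD><td width=20></TD><TD><B>FCH1234ABCD</B></TD>' |
  tr '<' '\n' |
  awk -F'>' '/FCH/ { print $2 }'
```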

mrm5102 · June 28, 2013, 1:18pm

Hey Corona, OK, cool, that makes sense...

Yeah, this is just a temporary script that runs wget against some Cisco IP phones, given a list of IP addresses, to get their serial numbers.

I already finished up the script and it worked like a charm!! Thanks again!

Just in case anyone is interested, here's the script.
*The basic idea is to collect serial numbers from some Cisco IP phones when you have a list of the phones' IP addresses...

#!/bin/bash

# Pass the file containing the IP addresses (one per line) to this script as the only CLI arg:
IP_AddressFile="$1"

# Read the file one line at a time. IFS= and -r keep each line exactly as written;
# the "|| [ -n ... ]" part also handles a last line that has no trailing newline:
while IFS= read -r address || [ -n "$address" ]
do
    echo "$address"

    # Run wget, sending the page to stdout and discarding wget's stderr, keep only the
    # line containing "Serial Number", then print the tag text that contains "FCH":
    tmp_SN=$(wget -O - "$address" 2>/dev/null | grep -i 'Serial Number' | awk -v RS="<" -F ">" '/FCH/ {print $2}')

    # Print the serial number, followed by a blank line between phones:
    echo "$tmp_SN"
    echo
done < "$IP_AddressFile"
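To check the extraction step without a phone on the network, the grep + awk pipeline from the script could be wrapped in a function and fed a saved or hand-made page. A sketch only: extract_serial is a hypothetical name, and the HTML fragment and FCH1234ABCD are invented.

```shell
#!/bin/bash

# Hypothetical helper: reads an HTML page on stdin and prints the serial number,
# using the same grep + awk pipeline as the script above.
extract_serial() {
    grep -i 'Serial Number' | awk -v RS="<" -F ">" '/FCH/ {print $2}'
}

# Offline check against a made-up page (printf prints each argument on its own line):
printf '%s\n' '<html><body>' \
    '<TABLE BORDER="0"><TR><TD><B> Serial Number</B></TD><td width=20></TD><TD><B>FCH1234ABCD</B></TD></TR></TABLE>' \
    '</body></html>' | extract_serial

# Against a live phone it would be used as:
#   wget -O - "$address" 2>/dev/null | extract_serial
```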

Thanks again, Corona, for the help and the explanation!

Thanks,
Matt