Using AWK BEGIN to extract file header info into variables

Hi Folks,

I've searched for this for quite a while, but can't find any solution - hope someone can help.

I have various files with standard headers, e.g.:

<HEADER>
IP: 1.2.3.4
Username: Joe
Time: 12:00:00
Date: 23/05/2010
</HEADER>

This
is
a
test
and this part can be any size
<END>

Now, I want to process and transpose this into:

IP=1.2.3.4 User=Joe Time=12:00:00 Line=This
IP=1.2.3.4 User=Joe Time=12:00:00 Line=is
IP=1.2.3.4 User=Joe Time=12:00:00 Line=a
IP=1.2.3.4 User=Joe Time=12:00:00 Line=test
IP=1.2.3.4 User=Joe Time=12:00:00 Line=and this part can be any size
 

I thought AWK would be the way to go because of its BEGIN block. However, I can't find any information on whether it can:

  • Read variables in from the file in the BEGIN section
  • Read variables using, say, a regex or fixed positions (e.g. $my_ip=someregexfunction("IP\: (\d+.\d+.\d+.\d+)"))

Can anyone advise if this is possible? Or if not, is there another tool I can try?

Thanks!

Damian
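To answer both questions directly: yes, awk can do this. An unredirected `getline` inside `BEGIN` reads from the main input (the first file on the command line, or stdin), and the main loop then carries on from the first line that `BEGIN` did not consume. For regex extraction, `match()` sets `RSTART` and `RLENGTH`, which `substr()` can use to cut out the matched text. Two caveats: awk regexes are POSIX EREs, so there is no `\d` (use `[0-9]`), and standard `match()` has no capture groups (gawk's three-argument form is an extension). A minimal sketch, assuming exactly the header layout shown above; the function name `read_header_demo` is hypothetical:

```shell
# Hypothetical helper: read the <HEADER> block inside BEGIN, then
# transpose the body lines in the main loop.
read_header_demo() {
    awk '
    BEGIN {
        # Plain getline (no "< file") reads from the main input and
        # advances it, so these lines are never seen by the main loop.
        while ((getline line) > 0) {
            if (line ~ /^<\/HEADER>/) break
            # match() sets RSTART/RLENGTH; "IP: " is 4 chars long,
            # so the address starts at position 5.
            if (match(line, /^IP: [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/))
                ip = substr(line, 5, RLENGTH - 4)
            else if (line ~ /^Username: /)
                user = substr(line, 11)     # fixed position after "Username: "
            else if (line ~ /^Time: /)
                time = substr(line, 7)      # fixed position after "Time: "
        }
    }
    # Main loop: only lines after </HEADER>. NF == 0 skips blank lines.
    NF && !/^<END>/ { print "IP=" ip " User=" user " Time=" time " Line=" $0 }
    ' "$@"
}
```

Usage: `read_header_demo f2`. Reading the header inside `BEGIN` only works if it sits at the very top of the first input file; with several files, a per-line state machine like the ones in the replies is more robust.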

Here's a Perl solution -

$ cat f2
<HEADER>
IP: 1.2.3.4
Username: Joe
Time: 12:00:00
Date: 23/05/2010
</HEADER>

This
is
a
test
and this part can be any size
<END>
$ perl -ne 'chomp;
  if (/<HEADER>/) {$in=1}
  elsif ($in and /^IP: (.*)$/) {$ip=$1}
  elsif ($in and /^Username: (.*)$/) {$user=$1}
  elsif ($in and /^Time: (.*)$/) {$time=$1}
  elsif (/<\/HEADER>/) {$in=0}
  elsif (!$in and !/^\s*$/ and !/<END>/) {print "IP=$ip User=$user Time=$time Line=$_\n"}' f2
IP=1.2.3.4 User=Joe Time=12:00:00 Line=This
IP=1.2.3.4 User=Joe Time=12:00:00 Line=is
IP=1.2.3.4 User=Joe Time=12:00:00 Line=a
IP=1.2.3.4 User=Joe Time=12:00:00 Line=test
IP=1.2.3.4 User=Joe Time=12:00:00 Line=and this part can be any size

tyler_durden

Thanks! This is on a locked-down system, so I don't know whether Perl is available, or which version. I considered awk because I know it's installed.

I actually got it started using something like:

BEGIN {
 getline line1
 sub("IP: ","",line1)
 getline line2
 sub(......
 }

{ printf ("ip=%s Username=%s %s",line1,line2, $0) }

BUT - the output seems corrupted: the leading variables are sometimes missing, and sometimes overlaid on top of the other characters. Really weird. I wonder if using variables together with $0 like this isn't supported...

This is on CentOS right now, but I have to do it on various commercial *nix systems.

Damian
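Two things in that snippet would explain the odd output. The printf format has no `\n`, so all records run together on one line. And text overlaid on earlier characters is the classic symptom of DOS-style CRLF line endings: each value then ends in an invisible carriage return (`\r`), which moves the cursor back to the start of the line so the following text overwrites it. Whether these files really are CRLF is an assumption; `od -c file | head` would show `\r` before each `\n` if so. A minimal sketch of the same getline-in-BEGIN idea with both points handled (the function names `transpose_crlf_safe` and `chomp_cr` are hypothetical):

```shell
# Hypothetical helper: getline-in-BEGIN approach, tolerant of CRLF input.
transpose_crlf_safe() {
    awk '
    # Remove one trailing carriage return from a string.
    function chomp_cr(s) { sub(/\r$/, "", s); return s }
    BEGIN {
        # Consume the header before the main loop starts.
        while ((getline line) > 0) {
            line = chomp_cr(line)
            if (line == "</HEADER>") break
            # sub() returns 1 when it removed the prefix, so it doubles as a test.
            if (sub(/^IP: /, "", line))            ip = line
            else if (sub(/^Username: /, "", line)) user = line
            else if (sub(/^Time: /, "", line))     time = line
        }
    }
    {
        sub(/\r$/, "")                      # strip the CR from $0 as well
        if ($0 == "" || $0 == "<END>") next
        # The trailing \n keeps each record on its own line.
        printf "IP=%s User=%s Time=%s Line=%s\n", ip, user, time, $0
    }
    ' "$@"
}
```

The same function also works unchanged on files with plain LF endings, since the `\r` removal is then a no-op.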

I'm pretty sure it can be done with awk, though I don't know how.
Here is my admittedly "amateur" approach with standard tools; maybe it can help you translate it to awk?

$ string=$(grep -E '^IP|^User|^Time' infile | sed 's/Username/User/;s/: /=/' | tr "\n" " ")
$ sed -n '/<\/HEADER>/,/<END>/p' infile | sed 's/<\/HEADER>//;s/<END>//;/^$/d;s/^/Line=/' | \
> while read line; do echo "$string$line"; done
IP=1.2.3.4 User=Joe Time=12:00:00 Line=This
IP=1.2.3.4 User=Joe Time=12:00:00 Line=is
IP=1.2.3.4 User=Joe Time=12:00:00 Line=a
IP=1.2.3.4 User=Joe Time=12:00:00 Line=test
IP=1.2.3.4 User=Joe Time=12:00:00 Line=and this part can be any size
$ 
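One caveat with the `while read line` loop: plain `read` treats backslashes as escape characters, and with the default `IFS` it strips leading and trailing blanks, so body lines containing either would be altered. `echo` can also mangle backslashes in some shells. A sketch of the same pipeline using `IFS= read -r` and `printf` instead, wrapped in a hypothetical function `transpose_sh` that takes the file name as its argument:

```shell
# Hypothetical wrapper around the grep/sed pipeline above.
transpose_sh() {
    infile=$1
    # "IP=... User=... Time=... " built from the header lines, as before.
    string=$(grep -E '^IP|^User|^Time' "$infile" | sed 's/Username/User/;s/: /=/' | tr '\n' ' ')
    # Body lines between </HEADER> and <END>, blanks dropped, "Line=" prefixed.
    sed -n '/<\/HEADER>/,/<END>/p' "$infile" |
    sed 's/<\/HEADER>//;s/<END>//;/^$/d;s/^/Line=/' |
    while IFS= read -r line; do
        # IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
        printf '%s%s\n' "$string" "$line"
    done
}
```

Like the original, the `grep` step assumes no body line starts with `IP`, `User` or `Time`.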

Well, you could apply the same logic in awk:

$ cat f2
<HEADER>
IP: 1.2.3.4
Username: Joe
Time: 12:00:00
Date: 23/05/2010
</HEADER>

This
is
a
test
and this part can be any size
<END>
$ awk '{
  if (/<HEADER>/) {x=1}
  else if (x==1 && /^IP/) {sub("IP: ","",$0); ip=$0}
  else if (x==1 && /^Username/) {sub("Username: ","",$0); user=$0}
  else if (x==1 && /^Time/) {sub("Time: ","",$0); time=$0}
  else if (/<\/HEADER>/) {x=0}
  else if (x==0 && !/<END>/ && !/^ *$/) {print "IP="ip" User="user" Time="time" Line="$0}
}' f2
IP=1.2.3.4 User=Joe Time=12:00:00 Line=This
IP=1.2.3.4 User=Joe Time=12:00:00 Line=is
IP=1.2.3.4 User=Joe Time=12:00:00 Line=a
IP=1.2.3.4 User=Joe Time=12:00:00 Line=test
IP=1.2.3.4 User=Joe Time=12:00:00 Line=and this part can be any size

I know that looks kludgy, and I do hope the excellent awk scripters on this forum will come up with a more polished and elegant script.

tyler_durden
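Since a more polished variant was asked for above, here is one more sketch: instead of one variable per field, it stores every `Key: value` header line in an array, splitting at the first `: `, and clears the array at each `<HEADER>`. That way several files, or several header/body blocks in one stream, can go through a single awk run, and values that themselves contain `: ` survive intact. The function name `transpose_generic` is hypothetical, and `split("", hdr)` is used because it is a portable way to empty an array.

```shell
# Hypothetical helper: generic header capture into an array.
transpose_generic() {
    awk '
    /^<HEADER>/   { in_hdr = 1; split("", hdr); next }  # new block: forget old values
    /^<\/HEADER>/ { in_hdr = 0; next }
    in_hdr {
        # Store "Key: value" as hdr[Key] = value, splitting at the first ": ".
        if ((i = index($0, ": ")) > 0)
            hdr[substr($0, 1, i - 1)] = substr($0, i + 2)
        next
    }
    /^<END>/ || NF == 0 { next }                        # skip marker and blank lines
    { print "IP=" hdr["IP"] " User=" hdr["Username"] " Time=" hdr["Time"] " Line=" $0 }
    ' "$@"
}
```

Usage: `transpose_generic file1 file2 ...`. Adding another field to the output is then a one-word change in the `print` statement rather than a new `if` branch.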

Awk:

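awk -F ": " '
# Inside the header block: remember the values, print nothing.
/<HEADER>/,/<\/HEADER>/ {
    if ($0 ~ /^IP:/)       ip = $2
    if ($0 ~ /^Username:/) use = $2
    if ($0 ~ /^Time:/)     time = $2
    next
}
# Body lines: skip the <END> marker and empty lines.
!/<END>/ && !/^$/ { print "IP=" ip " User=" use " Time=" time " Line=" $0 }' filename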

cheers,
Devaraj Takhellambam

A belated but sincere thanks to you guys; I was on 'radio silence' for a couple of weeks, but I'll give these a go.

Damian