Tons
November 21, 2012, 3:30am
Hi All,
I have a problem to resolve. For the following XML file, I need to parse the values based on the tag name, and I would prefer to do this with awk. So far I have used sed to strip the tags (s/<SeqNo>//).
New tags can be introduced, so the parsing needs to be based on the tag name. Any awk command suggestions?
<Target>
<SeqNo>43156489079</SeqNo>
<AuthenticationToken><![CDATA[nY+sHZ2PrBmdj6wVnY]]></AuthenticationToken>
<redcode>SKNEQGGEVHW</redcode>
<GenError>Upload-Success</GenError>
</Target>
<Target>
<SeqNo>43156489079</SeqNo>
<AuthenticationToken><![CDATA[nY+sHZ2PrBmdj6wVnY]]></AuthenticationToken>
<redcode>SKNEQGGEVHW</redcode>
<GenError>Upload-Success</GenError>
</Target>
What's the expected output?
And please wrap your code and data samples with code tags to preserve formatting.
Something like this?
$ nawk -F"[<>]" -v pat="SeqNo" '$0~pat{print $3}' a.txt
43156489079
43156489079
$ nawk -F"[<>]" -v pat="redcode" '$0~pat{print $3}' a.txt
SKNEQGGEVHW
SKNEQGGEVHW
$ nawk -F"[<>]" -v pat="AuthenticationToken" '$0~pat{print $4}' a.txt
![CDATA[nY+sHZ2PrBmdj6wVnY]]
![CDATA[nY+sHZ2PrBmdj6wVnY]]
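The `$0~pat` test above matches the tag name anywhere on the line, so pat="SeqNo" would also hit a hypothetical <SeqNoOld> tag. The AuthenticationToken value also needs a different field ($4) and keeps its CDATA wrapper. Here is a small sketch that matches the exact open and close tags with index() and strips a CDATA wrapper if one is present. `tag_values` is a made-up name, and the sketch assumes that each element sits on one line, as in the sample. It is not a real XML parser.

```shell
#!/bin/sh
# tag_values TAG [FILE...]: print the text of every <TAG>...</TAG> found on a line.
# Assumes one element per line (as in the sample); not a real XML parser.
tag_values() {
    tag=$1
    shift
    awk -v tag="$tag" '
    {
        otag = "<" tag ">"          # exact opening tag
        ctag = "</" tag ">"         # exact closing tag
        s = index($0, otag)
        e = index($0, ctag)
        if (s && e > s) {
            v = substr($0, s + length(otag), e - s - length(otag))
            # Strip a <![CDATA[ ... ]]> wrapper if present
            if (substr(v, 1, 9) == "<![CDATA[" && substr(v, length(v) - 2) == "]]>")
                v = substr(v, 10, length(v) - 12)
            print v
        }
    }' "$@"
}
```

For example, `tag_values AuthenticationToken a.txt` prints nY+sHZ2PrBmdj6wVnY once per Target. Using index() instead of a regex also means that tag names are matched literally.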
Jotne
November 21, 2012, 3:41am
Not quite sure how you want your output. Like this?
awk -F"[<>]" '{print $5,$9,$13}' RS="</Target>\n" file
43156489079 SKNEQGGEVHW Upload-Success
43156489079 SKNEQGGEVHW Upload-Success
jotne:
awk -F"[<>]" '{print $5,$9,$13}' RS="</Target>\n" file
A regexp RS will not work with all awk implementations.
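Here is a sketch of the same one-line-per-Target output that sticks to POSIX awk. It reads the file line by line with the default RS and emits a row whenever it sees </Target>, so no regexp RS is needed. It also avoids counting fields. By my count, the CDATA line in the sample as posted adds extra empty fields. With RS="</Target>\n", the three values would therefore land in $5, $15 and $19 rather than $5, $9 and $13.

The function name `flatten_targets` is made up for illustration. The sketch assumes one element per line and skips the CDATA-wrapped token, as the output above does.

```shell
#!/bin/sh
# flatten_targets [FILE...]: one output line per <Target> block, values in document order.
# POSIX awk only: default RS (newline), so no regexp RS is required.
# Assumes one element per line; CDATA values are skipped. A sketch, not an XML parser.
flatten_targets() {
    awk '
    /<Target>/   { row = ""; next }      # start of a block: reset the row
    /<\/Target>/ { print row; next }     # end of a block: emit the row
    {
        if ($0 ~ /CDATA/) next           # skip the CDATA-wrapped token
        # keep the text between the first ">" and the following "<"
        s = index($0, ">")
        if (!s) next
        rest = substr($0, s + 1)
        e = index(rest, "<")
        if (!e) next
        v = substr(rest, 1, e - 1)
        row = (row == "") ? v : (row " " v)
    }' "$@"
}
```

On the sample file, `flatten_targets file` should give the same two lines as shown above.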
Tons
November 21, 2012, 3:01pm
Hi, I want to parse this file and write it out in a delimited format.
Source file:
<Target>
<SeqNo>43156489079</SeqNo>
<AuthenticationToken><![CDATA[nY+sHZ2PrBmdj6wVnY]]></AuthenticationToken>
<RedCode>SKNEQGGEVHW</RedCode>
<IncentiveGenError>Upload-Success</IncentiveGenError>
</Target>
<Target>
<SeqNo>43156489070</SeqNo>
<AuthenticationToken><![CDATA[nY+sHZ2PrBmdj6wVnY]]></AuthenticationToken>
<RedCode>SKNEQGGEVHW</RedCode>
<IncentiveGenError>Upload-Success</IncentiveGenError>
</Target>
Expected output:
43156489079 SKNEQGGEVHW Upload-Success
43156489070 SKNEQGGEVHW Upload-Success
The order of the tags can change, and new tags can be introduced, so I want to parse this based on the tag name.
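Because the order can change and new tags can appear, one approach is to key each value by its tag name and print a fixed list of columns when </Target> is reached. Here is a minimal sketch, assuming one element per line as in the sample. `flatten_by_tag` and its comma-separated column argument are a hypothetical interface. Tags not listed are ignored, and CDATA values are not handled.

```shell
#!/bin/sh
# flatten_by_tag COLS [FILE...]: COLS is a comma-separated list of tag names.
# Prints one comma-delimited line per <Target>, columns in the order given by COLS.
# Assumes <tag>value</tag> on a single line; CDATA values are not extracted.
flatten_by_tag() {
    cols=$1
    shift
    awk -v cols="$cols" '
    BEGIN { FS = "[<>]"; n = split(cols, col, ",") }
    /<\/Target>/ {
        out = ""
        for (i = 1; i <= n; i++)
            out = out (i > 1 ? "," : "") val[col[i]]   # missing tags give empty columns
        print out
        for (k in val) delete val[k]                  # clear values for the next block
        next
    }
    # <tag>value</tag> splits into "", tag, value, /tag, ""
    NF >= 5 && $2 != "" && $4 == ("/" $2) { val[$2] = $3 }
    ' "$@"
}
```

For example, `flatten_by_tag SeqNo,RedCode,IncentiveGenError newack.xml` should print 43156489079,SKNEQGGEVHW,Upload-Success and 43156489070,SKNEQGGEVHW,Upload-Success. A new tag only needs to be added to the column list.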
---------- Post updated at 03:01 PM ---------- Previous update was at 01:12 PM ----------
Thanks for your input. I used the following script:
nawk 'BEGIN{FS="[<|>]"}
/<SeqNo>/{SeqNo=$3}
/<RedCode>/{Redcd=$3}
{printf(" %s,%s\n",SeqNo,Redcd)}' newack.xml
The only problem I found is that it duplicates the results. Any idea why?
Thanks,
Tons
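The duplicates come from the last rule. `{printf(...)}` has no pattern, so awk runs it for every input line and reprints the current, possibly stale, SeqNo/Redcd values after each one.

The sketch below is the same script, but it prints only when the closing </Target> tag is read and then clears the values. A few other changes:

- It uses `awk` instead of `nawk` (keep nawk on Solaris).
- It drops the `|` from the bracket expression, since inside [...] that is just a literal character.
- It omits the leading space in the output format.
- It wraps the script in a function named seq_redcode purely for illustration.

```shell
#!/bin/sh
# seq_redcode [FILE...]: print "SeqNo,RedCode" once per <Target> block.
seq_redcode() {
    awk 'BEGIN { FS = "[<>]" }
    /<SeqNo>/    { SeqNo = $3 }
    /<RedCode>/  { Redcd = $3 }
    /<\/Target>/ {
        printf("%s,%s\n", SeqNo, Redcd)   # only at the end of each block
        SeqNo = Redcd = ""                # do not carry values into the next block
    }' "$@"
}
```

Run as `seq_redcode newack.xml`. It should print one line per Target instead of one line per input line.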
birei
November 21, 2012, 4:15pm
Not awk, but here is one solution using the XML::Twig parser in Perl:
$ cat xmlfile
<root>
<Target>
<SeqNo>43156489079</SeqNo>
<AuthenticationToken><![CDATA[nY+sHZ2PrBmdj6wVnY]]></AuthenticationToken>
<RedCode>SKNEQGGEVHW</RedCode>
<IncentiveGenError>Upload-Success</IncentiveGenError>
</Target>
<Target>
<SeqNo>43156489070</SeqNo>
<AuthenticationToken><![CDATA[nY+sHZ2PrBmdj6wVnY]]></AuthenticationToken>
<RedCode>SKNEQGGEVHW</RedCode>
<IncentiveGenError>Upload-Success</IncentiveGenError>
</Target>
</root>
$ cat script.pl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
{
    my $twig = XML::Twig->new(
        twig_handlers => {
            'Target' => sub {
                printf qq|%s\n|,
                    join q| |,
                    map  { $_->trimmed_text }
                    grep { ! $_->is_cdata && $_->is_text }
                    $_->descendants
            }
        },
    )->parsefile( shift );
}
$ perl-5.14.2 script.pl xmlfile
43156489079 SKNEQGGEVHW Upload-Success
43156489070 SKNEQGGEVHW Upload-Success
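The xmlfile used here adds a <root> element around the two <Target> blocks. A well-formed XML document must have exactly one top-level element, so a real parser such as XML::Twig would reject the original fragment.

Here is a small sketch for wrapping a fragment before parsing. `wrap_root` is a hypothetical helper. It assumes the fragment has no <?xml ...?> declaration, because that would have to stay on the very first line.

```shell
#!/bin/sh
# wrap_root IN OUT: wrap an XML fragment with several top-level elements
# in a single <root> element so that an XML parser accepts it.
# The name "root" follows the example above; any element name would do.
wrap_root() {
    { echo '<root>'; cat "$1"; echo '</root>'; } > "$2"
}
```

Usage: `wrap_root a.txt xmlfile && perl script.pl xmlfile`.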
Parsing XML is not trivial.
However, because of frequent requests for XML-to-flatfile conversion, I have a script that works in some common situations.
$ cat xmlh.awk
BEGIN { RS="<"; FS=">";
    # Uncomment to make windows-readable text files
    # ORS="\r\n";
    # Change this to alter how many close-tags in a row are needed
    # before a row of data is printed.
    if(!DEP) DEP=1
    SEP="\t"
}
# Skip weird XML specification lines or blank records
/^\?/ || /^$/ { next }
# Handle close tags
/^[/]/ {
    N=D; while((N>0) && ("/"STACK[N] != $1)) N--;
    if("/"STACK[N] == $1) D=(N-1);
    POP++;
    if(POP == DEP)
    {
        if(!HEADER++)
        {
            split(ARG[1], Z, SUBSEP);
            printf("%s %s", Z[2], Z[3]);
            for(N=2; N<=ARG_; N++)
            {
                split(ARG[N], Z, SUBSEP);
                printf("%s%s %s", SEP, Z[2], Z[3]);
            }
            printf("\n");
        }
        printf("%s", DATA[ARG[1]]);
        for(N=2; N<=ARG_; N++)
            printf("%s%s", SEP, DATA[ARG[N]]);
        printf("\n");
    }
    next
}
# Handle open tags
{
    gsub(/^[ \r\n\t]*/, "", $2); # Whitespace isn't data
    gsub(/[ \r\n\t]*$/, "", $2);
    sub(/\/$/, "", $(NF-1));
    # Reset parameters
    POP=0;
    M=split($1, A, " ");
    STACK[++D]=A[1];
    if((!MAX) || (D>MAX)) MAX=D; # Save max depth
    # Handle parameters
    Q=split(A[2], B, " ");
    for(N=1; N<=Q; N++)
    {
        split(B[N], C, "=");
        gsub(/['"]/,"", C[2]);
        I=D SUBSEP STACK[D] SUBSEP C[1];
        if(!SEEN[I]++)      # Register each column key once, in first-seen order
            ARG[++ARG_]=I;
        DATA[I]=C[2];
    }
    if($2)
    {
        I=D SUBSEP STACK[D] SUBSEP "CDATA";
        if(!SEEN[I]++)
            ARG[++ARG_]=I;
        DATA[I]=$2;
    }
}
$ awk -f xmlh.awk DEP=2 data3.xml
SeqNo CDATA redcode CDATA GenError CDATA
43156489079 SKNEQGGEVHW Upload-Success
43156489079 SKNEQGGEVHW Upload-Success
$
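Output is tab-separated. DEP is how many close-tags in a row it looks for before printing a row of data. In this sample, </GenError> is immediately followed by </Target>, which makes two close-tags in a row, so DEP=2 prints one row per Target.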
Tons
November 21, 2012, 4:25pm
Thanks! I am looking for something in awk.
I think we crossposted. Does my solution above work for you? It's a generic xml-to-flatfile converter in awk which groups columns by itself.
It has some limitations. Spaces inside tag values are a problem. But it works for the data you gave as shown above.