AWK to Parse XML messages

Hello Guys,

Please help with an AWK problem. I have an XML file which contains a list of messages for subjects.

Example of the messages:

[date+time], message=[DATA= “<?xml version=”1.0?”><data changeMsg><Subject=”English” Status=”P”>
[date+time], message=[DATA= “<?xml version=”1.0?”><data changeMsg><Subject=”Science” Status=”F”>
[date+time], message=[DATA= “<?xml version=”1.0?”><data changeMsg><Subject=”Science” Status=”F”>
[date+time], message=[DATA= “<?xml version=”1.0?”><data changeMsg><Subject=”Science” Status=”NA”>
[date+time], message=[DATA= “<?xml version=”1.0?”><data changeMsg><Subject=”English” Status=”P”>
[date+time], message=[DATA= “<?xml version=”1.0?”><data changeMsg><Subject=”English” Status=”F”>
[date+time], message=[DATA= “<?xml version=”1.0?”><data changeMsg><Subject=”Maths” Status=”P”>
[date+time], message=[DATA= “<?xml version=”1.0?”><data changeMsg><Subject=”Science” Status=”P”>

I want to use AWK to parse these XML messages, but I am new to AWK and to programming.

What I want is to get output of these messages to look like this.

Output

Subject|Status|Count
  English|P|2
  Science|F|2
  Science|NA|1
  English|F|1
  Maths|P|1
  Science|P|1

Please Help.

Thanks all for any help.

Is that what the data really looks like, or have you prettied it up for posting? That makes a big difference to awk.

No, this is what the data really looks like.

"smart quotes" and all? It looks like it's been through MS Word...

Of course it has been through MS Word. This is only a sample of the messages, and I have also removed the date and the time.
So yes, this has been edited in MS Word.

Please post a representative sample of your data. Include timestamps. Don't mangle it in Word.

Never use Word for data. Imagine typing up a shell script in Word, and having all your quotes turned into "smart quotes" and all your backticks being forced into grammatical correctness -- these things do matter. I was able to instantly tell what you'd done by how it'd been scrambled, but can only guess what it looked like before.

Computers are fussy. Anything we write to fit this sample won't work for you due to differences in the number of fields and handling of "smart" quotes.

I can see what MS Word did with the double quotes. I have changed them now and added the date and time stamps.

[08-11-2011 13:40:12], message=[DATA= "<?xml version="1.0?"><data changeMsg><Subject="English" Status="P">
[08-11-2011 13:40:12], message=[DATA= "<?xml version="1.0?"><data changeMsg><Subject="Science" Status="F">
[08-11-2011 13:40:12], message=[DATA= "<?xml version="1.0?"><data changeMsg><Subject="Science" Status="F">
[08-11-2011 13:40:12], message=[DATA= "<?xml version="1.0?"><data changeMsg><Subject="Science" Status="NA">
[08-11-2011 13:40:12], message=[DATA= "<?xml version=”1.0?"><data changeMsg><Subject="English" Status="P">
[08-11-2011 13:40:12], message=[DATA= "<?xml version=”1.0?”><data changeMsg><Subject="English" Status="F">
[08-11-2011 13:40:12], message=[DATA= "<?xml version=”1.0?”><data changeMsg><Subject="Maths"   Status="P">
[08-11-2011 13:40:12], message=[DATA= "<?xml version=”1.0?"><data changeMsg><Subject="Science" Status="P">

I have hundreds of lines of messages which I will not be able to post, but this is what all the messages look like.

Is there a way to get the output that I am after? Is it possible?

nawk -F'"' '{a[$(NF-3) OFS $(NF-1)]++}END{for(i in a) print i,a[i]}' OFS='|' myXMLfile
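
For anyone new to awk, here is the same idea spelled out as a longer sketch. Splitting on the double quote with -F'"' means the subject is always the fourth field from the end and the status the second field from the end, whatever comes earlier on the line. Two things are my own additions and not part of the one-liner above: the header row, and an `order` array that keeps the keys in first-seen order, since `for (i in a)` visits keys in an unspecified order. The function name `count_subject_status` is made up for illustration, and the sample lines are taken from the post. I've used plain `awk` here; on Solaris you would probably substitute `nawk`.

```shell
# Hypothetical helper name; wraps the awk program so it can be reused.
count_subject_status() {
    awk -F'"' -v OFS='|' '
        # A usable line ends in Subject="X" Status="Y">, which yields at
        # least five quote-separated fields. Skip anything shorter.
        NF < 5 { next }
        {
            key = $(NF-3) OFS $(NF-1)                # e.g. English|P
            if (!(key in count)) order[++n] = key    # remember first-seen order
            count[key]++
        }
        END {
            print "Subject", "Status", "Count"
            for (i = 1; i <= n; i++) print order[i], count[order[i]]
        }
    ' "$@"
}

# With no file argument, awk reads standard input.
count_subject_status <<'EOF'
[08-11-2011 13:40:12], message=[DATA= "<?xml version="1.0?"><data changeMsg><Subject="English" Status="P">
[08-11-2011 13:40:12], message=[DATA= "<?xml version="1.0?"><data changeMsg><Subject="Science" Status="F">
[08-11-2011 13:40:12], message=[DATA= "<?xml version="1.0?"><data changeMsg><Subject="Science" Status="F">
[08-11-2011 13:40:12], message=[DATA= "<?xml version="1.0?"><data changeMsg><Subject="English" Status="P">
EOF
```

To run it against the real file, call `count_subject_status myXMLfile`.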