Hi! I'm new here and don't know much about shell scripting. I'd like help creating a shell script that parses the value of the status in an XML file. Please see the sample XML file below. Can you please help me create a simple script to get the value of status? It would be even better if I could get the value of each parameter from the XML file. I really need it ASAP. Hope someone can help me. Thanks!
Thanks for the reply. I tried sed but I got the error sed: command garbled: s/\(<status>\)\(.*\)\(</status>\)/\2/. I just need to get the values of every status in the XML file, like <status>201</status>, because it generates different values. The XML file is huge; the file size is approx. 5 MB. Please see the sample portion of the XML file below, which is just repeated with different values. I'm really having a problem getting the status values because it's a huge file and the format is not organized: sometimes a single line has several occurrences of status. I can't change the format since it's a CDR. It would be better if I could also get the values of other parameters like appid, threadid, date, chdate, etc. I hope you were able to understand me. Thanks again!
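A note on the sed error quoted above: the "/" inside </status> is read as the end of the s/.../ pattern, which is why sed reports the command as garbled. Using a different delimiter such as "|" avoids that. The greedy .* would also swallow everything between the first and last status on a line, so the sketch below first breaks the input at every "<". This is only a minimal sketch; the function name extract_status is made up for illustration, and it assumes the values themselves never contain "<".

```shell
#!/bin/sh
# extract_status (hypothetical name): print every <status> value found on
# standard input, one per line.
extract_status() {
    # Break the stream at each '<' so every tag starts its own line,
    # then keep only the text that follows "status>".
    # '|' is used as the sed delimiter because the data contains '/'.
    tr '<' '\n' | sed -n 's|^status>||p'
}

# Two records on a single line, as can happen in the real file.
printf '%s\n' '<cdr><status>201</status></cdr><cdr><status>29</status></cdr>' | extract_status
```

On the real file it would be called as: extract_status < a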
Hi, thanks for taking the time to reply to my post. I've tried your suggestion but I get this error. Is there another way? Also, if you have time, can you please explain the command? Thanks a lot! Sorry, I really don't know shell scripting. :) Have a nice day!
Thanks! I tried the solution that you mentioned but I got the following error. Maybe I can't use it. Do you know any other solution? Thanks again for the help!
Yes, it does not exist. Do you know another way I can do it without using xsltproc? I'm not familiar with it. I really need to know how I can parse strings from an XML file which is around 7 MB in size. I need to get the values between <date> </date>, <time> </time> and <status> </status>. Thanks in advance!
>cat a
<?xml version="1.0"?><message><cdr version="1.0"><appid>testbed</appid><threadid>6</threadid><origin>node1</origin><date>20071009</date><time>12:45:36</time><chdate>20071009</chdate><chtime>12:45:43</chtime><status>201</status><type>103</type><calling>644</calling><cparty>xxxxxxx</cparty><accnum>xxxxxx</accnum><debirate1>0.0</debirate1><cos>-1</cos><strtbal>0.0</strtbal><freesms>0</freesms><tuc>0</tuc><fandftype></fandftype></cdr></message>
It's working! You're really great! I've been looking for a long time for a script that can do this, and it works with the solution you've provided. Thanks so much! If it's not too much, can you please help me modify the script that you provided so that the output looks like the one below? Thanks in advance!
expected output:
date time chdate chtime status calling cparty
20071009 12:45:36 20071009 12:45:43 201 644 xxxxxxx
20071010 03:09:13 20071010 03:10:07 29 644 xxxxxxx
Thanks for taking the time to help me with my problem. I tried the solution that you've provided but the result is different. Can we have just one heading, like the expected output below? Also, could you explain what the script does? Thanks a lot! I really appreciate all your help!
expected output:
date time chdate chtime status calling cparty
20071009 12:45:36 20071009 12:45:43 201 644 xxxxxxx
20071010 03:09:13 20071010 03:10:07 29 644 xxxxxxx
output of the script you've provided:
date time chdate chtime status calling cparty date time chdate chtime status calling cparty 20071009 12:45:36 20071009 12:45:43 201 644 xxxxxxx 20071010 03:09:13 20071010 03:10:07 29 644 xxxxxxx
For the sample that you provided it's working. But when I use the actual input, which is a file of more than 5 MB, the script's output is different, as I mentioned in my previous post. I have attached only a portion of the file, since at more than 5 MB I can't send all of it. Thanks again!
Before explaining the script: it was written on the fly, so it's definitely not the most optimized one.
open(FILE, "<", "a");
Open the file named "a" for reading; the code is as simple as it looks.
while(<FILE>) {
chomp;
my @arr = split(/></);
Split the input record on the delimiter '><' and store the pieces in the array '@arr'.
foreach (@arr) {
if( />/ && /</ ) {
Iterate through the array and process an element only when it contains both '>' and '<', because those are the only elements that carry data we are interested in.
The header has to be printed only for the first line, not for subsequent XML records. Capture the parts of each element by 'grouping' in the regular expression and refer to the captured groups as '\1' and '\2'.
Append the header and the data to separate variables.
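For anyone who would rather stay in the shell, the same idea (print the header once, then one row per record) can be sketched with tr and awk instead of Perl. This is not the Perl script explained above, just a rough equivalent under some assumptions: every record ends with </cdr>, values never contain '<' or '>', and the awk in use accepts -v (on older Solaris systems that may mean nawk). The function name cdr_table and the fields variable are made up for illustration.

```shell
#!/bin/sh
# cdr_table (hypothetical name): read CDR XML on standard input and print
# one header line followed by one row per <cdr> record.
cdr_table() {
    # Start a new line at every '<' so each line reads "tag>value".
    tr '<' '\n' | awk -F'>' -v fields="date time chdate chtime status calling cparty" '
        BEGIN { n = split(fields, want, " "); print fields }   # header, once
        $1 == "/cdr" {                                         # record is complete
            row = val[want[1]]
            for (i = 2; i <= n; i++) row = row " " val[want[i]]
            print row
            for (k in val) delete val[k]                       # reset for next record
            next
        }
        NF >= 2 && $1 !~ /^\// { val[$1] = $2 }                # remember tag -> value
    '
}

# Trimmed-down sample record, shortened from the one shown in the thread.
printf '%s\n' '<message><cdr version="1.0"><date>20071009</date><time>12:45:36</time><chdate>20071009</chdate><chtime>12:45:43</chtime><status>201</status><calling>644</calling><cparty>xxxxxxx</cparty></cdr></message>' | cdr_table
```

On the real file it would be called as: cdr_table < a. Because rows are emitted at each </cdr> rather than at each newline, it should not matter whether the file holds one record per line or several.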