Parse a string in XML file using shell script

ayhanne · November 15, 2007, 9:33am

Hi Matrixmadhan,

Thanks for explaining the scripts. I don't know why, but when I use the actual input file the output is not organized. What I want is one heading line followed by the values. When I tried the script, the headings were repeated several times with the values under them. Please see the details below. I hope you can help me organize the output.

expected output:
date time chdate chtime status calling cparty
20071009 12:45:36 20071009 12:45:43 201 644 xxxxxxx
20071010 03:09:13 20071010 03:10:07 29 644 xxxxxxx

output from the script when the actual file (more than 5 MB) is used:
date time chdate chtime status calling cparty date time chdate chtime status calling cparty date time chdate chtime status calling cparty date time chdate chtime status calling cparty 20071009 12:45:36 20071009 12:45:43 201 644 xxxxxxx 20071010 03:09:13 20071010 03:10:07 29 644 xxxxxxx 20071010 03:09:13 20071010 03:10:07 29 644 xxxxxxx 20071009 12:45:36 20071009 12:45:43 201 644 xxxxxxx

matrixmadhan · November 15, 2007, 10:12am

Here is the updated code.

The reason it didn't work the first time is that the input format was different: there were no 'line feeds' in the sample file provided.

Now it should work!

Let us know how it proceeds!

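#! /opt/third-party/bin/perl

my $c = 0;      # number of "xml version" declarations seen so far
my $data = "";  # values collected from the first record

open(FILE, "<", "sample.txt") or die "Unable to open file sample.txt <$!> \n";

while(<FILE>) {
  chomp;
  # split the line into elements at every "><" boundary
  my @arr = split(/></);
  foreach (@arr) {
    if( /xml version/ ) {
      $c++;
      print "\n";
    }
    if( />/ && /</ ) {
      if( $c == 1 ) {
        # first record: turn "tag>value</tag" into "tag|value"
        s/(.*)>(.*)<.*$/$1|$2/;
        my($tmp1, $tmp2) = split(/\|/);
        $data .= (" " . $tmp2);
        printf "%s ", $tmp1;
      }
      else {
        # later records: keep only the value
        s/(.*)>(.*)<.*$/$2/;
        printf "%s ", $_;
      }
    }
  }
  print "\n";
  print "$data\n" if( /xml version/i );
}

close(FILE);

exit 0;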

ayhanne · November 15, 2007, 10:27am

Hi Matrixmadhan,

You're so brilliant! I wish I could be like you when it comes to writing scripts (wink* wink*). Sad to say, I'm not gifted. One more thing: what if the values of calling or other headers are more than 3 characters? Let's say 12? How can I modify the script to align the values and headings, so that each value is in the same column as its heading? I hope my question is clear. Thank you so much for helping me and taking the time to create this script!

matrixmadhan · November 15, 2007, 1:10pm

This is a bit confusing, but I believe you are asking about aligning the column headers and the data consistently.

That should be quite easy to modify in the script. Try it out, and if you have any difficulty with it, please post a sample of the output you wish to have.
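
As a rough sketch of the alignment idea (not the script from this thread): padding every field to a fixed width keeps headings and values in the same columns. The width of 12 comes from the question above, and the helper name print_row is made up for illustration. It is written as a shell function here; perl's printf accepts the same %-12s format.

```shell
# print_row: print every argument left-justified in a 12-character column
print_row() {
  for field in "$@"; do
    printf '%-12s' "$field"
  done
  printf '\n'
}

# heading line and one data line taken from the expected output above
print_row date time chdate chtime status calling cparty
print_row 20071009 12:45:36 20071009 12:45:43 201 644 xxxxxxx
```

A value longer than 12 characters would still push its row out of line, so the width should be at least as long as the longest expected value.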

ayhanne · November 19, 2007, 8:59am

Hi Matrixmadhan,

Thanks a lot! It's ok now!

ayhanne · November 19, 2007, 12:03pm

Hi Matrixmadhan,

Another question: what if the file that I use comes from the output of a command? Instead of hard-coding a in open(FILE, "<", "a");, I need the file name a to be the output of a command. How can I modify the script to do that? Thank you so much!

sed -n 1p /home/user/scripts/file.txt
sample.txt

matrixmadhan · November 19, 2007, 12:43pm

you could change it as follows,

# the "-|" mode runs the command and lets the script read its output through PIPE
my $command = "sed -n 1p file.txt";
open(PIPE, "-|", $command) or die "Unable to run command <$!> \n";

ayhanne · November 25, 2007, 11:29am

Hi Matrixmadhan,

It's me again! How can I write the result of this script to a file? I want to run the script several times and append each result to the same file. Thanks in advance!

matrixmadhan · November 26, 2007, 12:15am

It's me again!

It's me again too !

there are two ways,

1) perl scriptname >> append_file

2) within the script

open(APP_FILE, ">>", $append_file) or die "Unable to open file $append_file <$!> \n";
print APP_FILE "$contents";

close(APP_FILE);
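
To illustrate the first way: >> appends to the file on every run, whereas > would overwrite it. In this sketch run_once is a hypothetical stand-in for perl scriptname, and a temporary file stands in for append_file.

```shell
# run_once is a hypothetical stand-in for `perl scriptname`
run_once() {
  echo "20071009 12:45:36 20071009 12:45:43 201 644 xxxxxxx"
}

append_file=$(mktemp)

run_once >> "$append_file"   # first run: one line in the file
run_once >> "$append_file"   # second run: appended, not overwritten

wc -l < "$append_file"       # the file now holds two lines
```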

ayhanne · November 26, 2007, 11:20am

Hi Matrixmadhan,

I tried your solution, but what it prints is a copy of the script and not the result of the script that you created before. Instead of seeing the output of the script on the server, I want to append the result to a file; let's say the output of the script is saved in outputfile.txt. How can I include that in the script that you created before? Thanks a lot for answering all my queries!

#! /opt/third-party/bin/perl

my $c = 0;

open(FILE, "<", "sample.txt");

while(<FILE>) {
  chomp;
  my @arr = split(/></);
  foreach (@arr) {
    if( /xml version/ ) {
      $c++;
      print "\n";
    }
    if( />/ && /</ ) {
      if( $c == 1 ) {
        s/(.*)>(.*)<.*$/\1|\2/;
        my($tmp1, $tmp2) = split(/\|/);
        $data .= (" " . $tmp2);
        printf "%s ", $tmp1;
      }
      else {
        s/(.*)>(.*)<.*$/\2/;
        printf "%s ", $_;
      }
    }
  }
  print "\n";
  print "$data\n" if( /xml version/i );
}

close(FILE);

exit 0

matrixmadhan · November 26, 2007, 12:55pm

#! /opt/third-party/bin/perl

my $c = 0;
my $data = "";

# append the results to outputfile.txt instead of printing them to STDOUT
open(OUTPUT, ">>", "outputfile.txt") or die "Unable to open file outputfile.txt <$!> \n";

open(FILE, "<", "sample.txt") or die "Unable to open file sample.txt <$!> \n";

while(<FILE>) {
  chomp;
  my @arr = split(/></);
  foreach (@arr) {
    if( /xml version/ ) {
      $c++;
      print OUTPUT "\n";
    }
    if( />/ && /</ ) {
      if( $c == 1 ) {
        s/(.*)>(.*)<.*$/$1|$2/;
        my($tmp1, $tmp2) = split(/\|/);
        $data .= (" " . $tmp2);
        printf OUTPUT "%s ", $tmp1;
      }
      else {
        s/(.*)>(.*)<.*$/$2/;
        printf OUTPUT "%s ", $_;
      }
    }
  }
  print OUTPUT "\n";
  print OUTPUT "$data\n" if( /xml version/i );
}

close(FILE);
close(OUTPUT);

exit 0;

The changes are: open the output file, then print to that file handle instead of STDOUT.

The above code should work.

Please do use CODE tags !

ayhanne · November 26, 2007, 1:25pm

Hi Matrixmadhan,

You're really great! Sorry I always disturb you. I really don't know how I can finish this script. It's complicated for me because I'm just starting to learn perl and I'm not really good at creating scripts. Thanks so much! I hope I can still ask for help next time.

matrixmadhan · November 27, 2007, 12:26am

Sure !

So what is the complication in the script?

We will try to help you.

ayhanne · November 27, 2007, 6:14am

Hi Matrixmadhan,

My problem with the script is that I want it to run only if there's a new CDR to process, which is the CDR before CDR*.tmp. The *.tmp file is the CDR that is currently being written, which is why I'll only process the last completed CDR (the last CDR without .tmp). The CDRs listed below are the input files for the script that you created. Please also see the whole script below. Thanks again for all the help!

-rw-r----- 1 root transfer 5243297 Oct 27 10:39 CDR3310010.4
-rw-r----- 1 root transfer 5243090 Oct 27 10:47 CDR3310010.5
-rw-r----- 1 root transfer 5242988 Oct 27 10:54 CDR3310010.6
-rw-r----- 1 root transfer 5243269 Oct 27 11:02 CDR3310010.7
drwxrwx--- 2 root stats 24576 Oct 27 11:05 stats
-rw-r----- 1 root transfer 5243317 Oct 27 11:09 CDR3310011
-rw-r----- 1 root transfer 5242906 Oct 27 11:16 CDR3310011.1
-rw-r----- 1 root transfer 5243095 Oct 27 11:23 CDR3310011.2
-rw-r----- 1 root transfer 5243178 Oct 27 11:30 CDR3310011.3
-rw-r----- 1 root transfer 5242963 Oct 27 11:38 CDR3310011.4
-rw-r----- 1 root transfer 5243133 Oct 27 11:45 CDR3310011.5
-rw-r----- 1 root transfer 5243044 Oct 27 11:52 CDR3310011.6
-rw-r----- 1 root transfer 5243054 Oct 27 11:59 CDR3310011.7
-rw-r----- 1 root transfer 272109 Oct 27 12:00 CDR3310011.tmp

#! /usr/local/bin/perl

my $c = 0;

open(OUTPUT, ">>", "output.txt");
chomp(@inputfayl = `cat cdr.txt`);
while (<@inputfayl>) {
  if (/CDR/) {
    $inputfayl=$_;
  }
}
open inputfayl or die "Cannot open file for read :$!";

while(<inputfayl>) {
  chomp;
  my @arr = split(/></);
  foreach (@arr) {
    if( /xml version/ ) {
      $c++;
      print OUTPUT "\n";
    }
    if( />/ && /</ ) {
      if( $c == 1 ) {
        s/(.*)>(.*)<.*$/\1|\2/;
        my($tmp1, $tmp2) = split(/\|/);
        $data .= (" " . $tmp2);
        printf OUTPUT "%s ", $tmp1;
      }
      else {
        s/(.*)>(.*)<.*$/\2/;
        printf OUTPUT "%s ", $_;
      }
    }
  }
  print OUTPUT "\n";
  print OUTPUT "$data\n" if( /xml version/i );
}

close inputfayl;

exit 0

matrixmadhan · November 27, 2007, 7:12am

First, please use the CODE tags; they are provided to make the code much easier to read.

I understood your requirement only partially; please help me understand the rest.

I assume the list of files that is provided is the output of

cat cdr.txt

chomp(@inputfayl = `cat cdr.txt`);
while (<@inputfayl>) {
  if (/CDR/) {
    $inputfayl=$_;
  }
}

From the above, only the last file name that contains CDR is selected.

Now, which file needs to be processed, and which file shouldn't be processed?

ayhanne · November 27, 2007, 8:37am

Hi Matrixmadhan,

Sorry, I don't know how to use the code tags; I'm new here. I have another script, tail.sh, which produces cdr.txt. It gets the last CDR completed before CDR*.tmp, which is the CDR currently being written. I don't know if creating a separate script, tail.sh, to get the last complete CDR was the right thing to do. CDR*.tmp should not be processed by the script since it's not yet complete. I hope you understand my explanation. Every time there's a new complete CDR, the script you created should process it. Thanks again!

cat tail.sh
ls -ltr | grep CDR | tail -2 | nawk '{print $9}' > cdrfile.txt
sed -n 1p cdrfile.txt > cdr.txt

if I run tail.sh, the output of cdr.txt:
CDR3310011.7 ##### this is the CDR that the script should process, the last completed CDR. Then if there's a new CDR completed, the script should process it again and append to a file.

cdrs:
-rw-r----- 1 root transfer 5243297 Oct 27 10:39 CDR3310010.4
-rw-r----- 1 root transfer 5243090 Oct 27 10:47 CDR3310010.5
-rw-r----- 1 root transfer 5242988 Oct 27 10:54 CDR3310010.6
-rw-r----- 1 root transfer 5243269 Oct 27 11:02 CDR3310010.7
drwxrwx--- 2 root stats 24576 Oct 27 11:05 stats
-rw-r----- 1 root transfer 5243317 Oct 27 11:09 CDR3310011
-rw-r----- 1 root transfer 5242906 Oct 27 11:16 CDR3310011.1
-rw-r----- 1 root transfer 5243095 Oct 27 11:23 CDR3310011.2
-rw-r----- 1 root transfer 5243178 Oct 27 11:30 CDR3310011.3
-rw-r----- 1 root transfer 5242963 Oct 27 11:38 CDR3310011.4
-rw-r----- 1 root transfer 5243133 Oct 27 11:45 CDR3310011.5
-rw-r----- 1 root transfer 5243044 Oct 27 11:52 CDR3310011.6
-rw-r----- 1 root transfer 5243054 Oct 27 11:59 CDR3310011.7
-rw-r----- 1 root transfer 272109 Oct 27 12:00 CDR3310011.tmp

matrixmadhan · November 28, 2007, 1:24pm

You can find the icon to enable code tags on the panel above the message box.

From the above, you are interested in the last-but-one file. The same can be achieved with the following:

ls -lrt | awk '/CDR/ { before = curr; curr = $9 }END{ print before }'

With the above, you can check whether the file that is currently available needs to be processed or not.

With tail.sh or the command given above, check whether the retrieved filename contains the term "tmp". If it does, that is not the file to be processed; if not, pass that filename to the script for processing.

Hope this makes it clear !
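
The check described above could be sketched roughly like this. The function names pick_cdr and should_process are made up for illustration, and two lines from the listing in this thread stand in for the real ls -lrt output.

```shell
# pick_cdr: read `ls -lrt` output on stdin, print the last-but-one CDR name
# (the awk program is the one given above)
pick_cdr() {
  awk '/CDR/ { before = curr; curr = $9 } END { print before }'
}

# should_process: succeed only for a non-empty name that does not contain "tmp"
should_process() {
  case "$1" in
    ""|*tmp*) return 1 ;;
    *)        return 0 ;;
  esac
}

# two lines copied from the listing in this thread, standing in for `ls -lrt`
listing='-rw-r----- 1 root transfer 5243054 Oct 27 11:59 CDR3310011.7
-rw-r----- 1 root transfer 272109 Oct 27 12:00 CDR3310011.tmp'

file=$(printf '%s\n' "$listing" | pick_cdr)

if should_process "$file"; then
  echo "process $file"      # this is where the perl script would be given "$file"
else
  echo "skip $file"
fi
```

With the sample listing this prints "process CDR3310011.7". Against a real directory, the input would come from ls -lrt | pick_cdr instead of the listing variable.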

ayhanne · November 29, 2007, 9:56am

Hi Matrixmadhan,

I've tried to use the command that you provided, since it's better to have just one script than to call on the result of tail.sh, but I get the errors below. I tried changing parts of the syntax of the unix command that you recommended, but the result is still the same; when I just use cat it works. Also, how can the script determine whether there's a new complete CDR? Thanks for being so patient with me. I really don't know much about creating scripts, sorry! Thanks a lot!

root :ALCP2  # ./cdrs.pl
awk: syntax error near line 1
awk: illegal statement near line 1
Cannot open file for read :No such file or directory at ./cdrs.pl line 13.

line 13 is:

open inputfayl or die "Cannot open file for read :$!";

script:

open(OUTPUT, ">>", "output.txt");
chomp(@inputfayl = `ls -lrt | awk '/SMSCDR/ { before = curr; curr = $9 }END{ print before }'`);
while (<@inputfayl>) {
  if (/CDR/) {
    $inputfayl=$_;
  }
}
open inputfayl or die "Cannot open file for read :$!";

while(<inputfayl>) {
  chomp;
  my @arr = split(/></);
  foreach (@arr) {
    if( /xml version/ ) {
      $c++;
      print OUTPUT "\n";
    }
    if( />/ && /</ ) {
      if( $c == 1 ) {
        s/(.*)>(.*)<.*$/\1|\2/;
        my($tmp1, $tmp2) = split(/\|/);
        $data .= (" " . $tmp2);
        printf OUTPUT "%s ", $tmp1;
      }
      else {
        s/(.*)>(.*)<.*$/\2/;
        printf OUTPUT "%s ", $_;
      }
    }
  }
  print OUTPUT "\n";
  print OUTPUT "$data\n" if( /xml version/i );
}

close inputfayl;

exit 0

ayhanne · November 29, 2007, 11:07am

Hi Matrixmadhan,

The command that you gave works now; I just put a \ before the $9. So I will not use the tail.sh script that I created before. What I need to do now is run the script whenever there's a new completed CDR, which is the last CDR before CDR*.tmp. For the example below:

The current CDR being dumped by the system is CDR3310011.tmp. After a few minutes (no specific time), CDR3310011.tmp will be completed and become CDR3310011.8, and another *.tmp will be created for the current CDR. The script that you created should process CDR3310011.8, and likewise every time there's a new completed CDR. Thanks a lot! I hope you can help me with it. I'm also trying to work out how I can do that.

-rw-r----- 1 root transfer 5243269 Oct 27 11:02 CDR3310010.7
drwxrwx--- 2 root stats 24576 Oct 27 11:05 stats
-rw-r----- 1 root transfer 5243317 Oct 27 11:09 CDR3310011
-rw-r----- 1 root transfer 5242906 Oct 27 11:16 CDR3310011.1
-rw-r----- 1 root transfer 5243095 Oct 27 11:23 CDR3310011.2
-rw-r----- 1 root transfer 5243178 Oct 27 11:30 CDR3310011.3
-rw-r----- 1 root transfer 5242963 Oct 27 11:38 CDR3310011.4
-rw-r----- 1 root transfer 5243133 Oct 27 11:45 CDR3310011.5
-rw-r----- 1 root transfer 5243044 Oct 27 11:52 CDR3310011.6
-rw-r----- 1 root transfer 5243054 Oct 27 11:59 CDR3310011.7
-rw-r----- 1 root transfer 272109 Oct 27 12:00 CDR3310011.tmp
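
A note on why the backslash before $9 was needed: perl's backticks interpolate variables the way a double-quoted string does, so an unescaped $9 is replaced by perl (with an empty value here) before awk ever runs, and awk is left with an incomplete statement. The shell's double quotes behave the same way, so the effect can be sketched in the shell alone. The sample line is taken from the listing above.

```shell
line='-rw-r----- 1 root transfer 5243054 Oct 27 11:59 CDR3310011.7'

# Single quotes: the shell leaves $9 alone, so awk sees it as "field 9".
printf '%s\n' "$line" | awk '{ print $9 }'

# Double quotes: $9 has to be written \$9, otherwise the shell substitutes
# its own ninth positional parameter before awk runs.
printf '%s\n' "$line" | awk "{ print \$9 }"
```

Both commands print the file name, CDR3310011.7.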

matrixmadhan · November 29, 2007, 1:05pm

So, the design goes like this: the script should process the last-but-one file, which is not a tmp file (the one that ends with tmp).

Once that has been processed, another tmp file would be created; the script should poll for it and, once it is complete, start processing it.

Please correct me if I am wrong!

Something like this might help, but it is written to run forever.

open(OUTPUT, ">>", "output.txt") or die "Unable to open file output.txt <$!> \n";

my $last = "";   #name of the file processed most recently

while (1) {

  #Make it an ever-running process

  #\$9 is escaped so that perl passes a literal $9 through to awk
  chomp($file = `ls -lrt | awk '/SMSCDR/ { before = curr; curr = \$9 }END{ print before }'`);

  #select the file which is last but one

  #if it's the 'tmp' file, the dumper process is still active; if it's the file processed last time, nothing new has completed
  #in both cases wait and poll again (60 seconds is an arbitrary choice), else continue processing
  if ( $file eq "" || $file =~ /tmp/ || $file eq $last ) {
    sleep 60;
    next;
  }

  open (FILE, "<", $file) or die  "Unable to open file $file <$!> \n";

  #open the file and start processing

  while(<FILE>) {
    chomp;
    my @arr = split(/></);
    foreach (@arr) {
      if( /xml version/ ) {
        $c++;
        print OUTPUT "\n";
      }
      if( />/ && /</ ) {
        if( $c == 1 ) {
          s/(.*)>(.*)<.*$/$1|$2/;
          my($tmp1, $tmp2) = split(/\|/);
          $data .= (" " . $tmp2);
          printf OUTPUT "%s ", $tmp1;
        }
        else {
          s/(.*)>(.*)<.*$/$2/;
          printf OUTPUT "%s ", $_;
        }
      }
    }
    print OUTPUT "\n";
    print OUTPUT "$data\n" if( /xml version/i );
  }

  close (FILE);
  $last = $file;

  #continue with while loop; by this time the dumper process might have completed and the next file might be ready for processing
}

close(OUTPUT);

exit 0;
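
A variation on the same idea, sketched as a single polling step that could be run periodically (for example from cron) instead of looping forever. It remembers the last processed name in a state file so that each completed CDR is handled only once. The function name process_if_new, the state file .last_cdr and the temporary directory are all assumptions made for this sketch; the file names mimic the ones in this thread.

```shell
# process_if_new DIR STATE
# Print the newest completed CDR in DIR (*.tmp is skipped) if it differs from
# the name stored in STATE, then store it. Return 1 if there is nothing new.
process_if_new() {
  cdr_dir=$1
  state_file=$2
  # ls -rt lists oldest first, so the last matching name is the newest one
  newest=$(ls -rt "$cdr_dir" | awk '/CDR/ && !/\.tmp$/ { name = $0 } END { print name }')
  [ -n "$newest" ] || return 1
  last=$(cat "$state_file" 2>/dev/null || true)
  [ "$newest" != "$last" ] || return 1
  echo "$newest"
  echo "$newest" > "$state_file"
}

# demo directory with made-up files; timestamps follow the listing in this thread
dir=$(mktemp -d)
state="$dir/.last_cdr"
touch -t 200710271152 "$dir/CDR3310011.6"
touch -t 200710271159 "$dir/CDR3310011.7"
touch -t 200710271200 "$dir/CDR3310011.tmp"

process_if_new "$dir" "$state"                        # prints CDR3310011.7
process_if_new "$dir" "$state" || echo "nothing new"  # prints nothing new
```

In real use, the printed name would be handed to the perl script, and the step would be repeated at whatever interval suits the rate at which CDRs are completed.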