Want to skip lines in an XML file using awk

Hi all,
I am trying to split an XML file using awk. The issue is that I want to skip three lines from the XML file, the first two and the last one, based on a pattern. Could someone please help? I am new to awk and struggling.

<?xml version="1.0"?>
<notification>
.....
.....
.....
.....
.....
</notification>

awk '!/^(<\?xml version="1\.0"\?>|<\/?notification>)$/' file
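To see what that one-liner does, here is a small sketch. The file name sample.xml and its contents are made up for illustration; only the awk pattern comes from the post above.

```shell
# Build a throwaway sample file (hypothetical content).
tmp=$(mktemp -d)
cat > "$tmp/sample.xml" <<'EOF'
<?xml version="1.0"?>
<notification>
<alarmNew systemDN="GGSN-Test1">
<alarmId>12428</alarmId>
</alarmNew>
</notification>
EOF

# A pattern with no action prints every record it matches. The leading !
# negates the match, so the three wrapper lines are dropped and the rest
# is printed unchanged.
filtered=$(awk '!/^(<\?xml version="1\.0"\?>|<\/?notification>)$/' "$tmp/sample.xml")
printf '%s\n' "$filtered"
rm -rf "$tmp"
```

Because the pattern is anchored with ^ and $, a line only counts as a wrapper line if it matches exactly, with nothing before or after it.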

Removed the code due to space constraints...

What's your OS? exilir's solution works fine for me on AIX using your example input.

Without making a suggestion to change the program (which I feel you should ;)), try:

#!/bin/awk -f
BEGIN {
   AlarmNbeg  = "alarmNew";
   AlarmClbeg = "alarmCleared";
   AlarmChbeg = "alarmChanged";
   AckStatbeg = "ackStateChanged";
     TotBlocks = 0;
}
!/^(<\?xml version="1\.0"\?>|<\/?notification>)$/{
   if((length($0) > 1)&&(NR > 2)) {

      if((substr($0, 2, length(AlarmNbeg)) ==  AlarmNbeg)|| (substr($0, 2, length(AlarmClbeg)) ==  AlarmClbeg) || (substr($0, 2, length(AlarmChbeg)) ==  AlarmChbeg) || (substr($0, 2, length(AckStatbeg)) ==  AckStatbeg)) {
         printf "<?xml version="1.0"?>\n";
          printf $0;
          printf "\n";
      }

else {
       printf $0;
      printf "\n";
     }
   }
}
END {
}

Removed the code due to space constraints...

And what does your awk script look like after incorporating the changes I suggested?

Removed the old code...

Try this

#!/bin/awk -f

BEGIN {

   AlarmNbeg  = "alarmNew";
   AlarmClbeg = "alarmCleared";
   AlarmChbeg = "alarmChanged";
   AckStatbeg = "ackStateChanged";
   TotBlocks = 0;
}

!/^(<\?xml version="1\.0"\?>|<\/?notification>)$/ {

   if((length($0) > 1)&&(NR > 2)) {
 
      if((substr($0, 2, length(AlarmNbeg)) ==  AlarmNbeg)|| (substr($0, 2, length(AlarmClbeg)) ==  AlarmClbeg) || (substr($0, 2, length(AlarmChbeg)) ==  AlarmChbeg) || (substr($0, 2, length(AckStatbeg)) ==  AckStatbeg)) {
         printf "<?xml version="1.0"?>\n";
          printf $0;
          printf "\n";
      }

else {
       printf $0;
      printf "\n";
     }
   }
}

END {

}

I guess the above two scripts are the same... and I am getting the wrong output, with everything displayed twice. You can see the output I posted on page 1. This is breaking my head. Please help.

The change suggested by delugeag will do the job. I didn't pay much attention to it as I had, in my code, put that on the same line. But, you didn't.

Don't fret; pay attention to the highlighted part: the opening brace goes on the same line as the pattern.

No, it's not the same: there is a bold { on the line with the new test.
You have to know this:

#!/bin/awk -f
/test/
{ 
code
}

is not the same as:

#!/bin/awk -f
/test/ { 
code
}

The first case prints the line when /test/ matches and executes the code for EVERY line.
The second case executes the code only IF /test/ matches.

A little more explanation:

When you put a pattern on a line by itself without the opening brace for the corresponding action, awk assumes that that's the end of your pattern-action block. So, if the pattern occurs in the input record, that record is printed (the default action). If not, the record is not printed. But, the following action (which was supposed to be for this pattern) gets executed unconditionally. Hence, the double records.
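The difference described above can be reproduced with a tiny experiment. This is only a sketch: the two input lines and the "action:" prefix are made up for illustration.

```shell
# Two hypothetical input lines; only the first one matches /test/.
input='test line
other line'

# Brace on the next line: awk sees two separate rules.
#   /test/         -> pattern with the default action (print the record)
#   { print ... }  -> action with no pattern (runs for every record)
split_out=$(printf '%s\n' "$input" | awk '/test/
{ print "action: " $0 }')

# Brace on the same line: one rule, the action runs only on matching records.
joined_out=$(printf '%s\n' "$input" | awk '/test/ { print "action: " $0 }')

printf 'brace on next line:\n%s\n\n' "$split_out"
printf 'brace on same line:\n%s\n' "$joined_out"
```

With the brace on the next line, the matching record shows up twice (once from the default print, once from the action) and the non-matching record still goes through the action. That is the "everything displayed twice" symptom.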

Oh god, even a "{" makes this much difference in awk. You guys are geniuses. I got the output, with a slight issue. I kept three XML files in a folder and ran the command as "awk -f script.awk /folderx/*.xml". Below is the output I got. You can see that for the second and third XML files the lines "<?xml version="1.0"?>" and "<notification>" were not removed, but the </notification> tag was removed in all three files :). How do I get rid of those two lines?

<?xml version=1?>
<alarmNew systemDN="GGSN-Test1">
<eventTime>2006-04-18T16:04:35</eventTime>
<specificProblem>30425</specificProblem>
<alarmText>DATABASE NOT AVAILABLE</alarmText>
<perceivedSeverity>critical</perceivedSeverity>
<additionalText1>      Restart Database engine</additionalText1>
<eventType>processingError</eventType>
<alarmId>12428</alarmId>
</alarmNew>

<?xml version="1.0"?>
<notification>
<?xml version=1?>
<alarmNew systemDN="GGSN-testmy11111111">
<eventTime>2006-04-18T16:04:35</eventTime>
<specificProblem>30425</specificProblem>
<alarmText>DATABASE NOT AVAILABLE</alarmText>
<perceivedSeverity>critical</perceivedSeverity>
<additionalText1>      Restart Database engine</additionalText1>
<eventType>processingError</eventType>
<alarmId>12428</alarmId>
</alarmNew>

<?xml version="1.0"?>
<notification>
<?xml version=1?>
<alarmNew systemDN="GGSN-testmy2222222">
<eventTime>2006-04-18T16:04:35</eventTime>
<specificProblem>30425</specificProblem>
<alarmText>DATABASE NOT AVAILABLE</alarmText>
<perceivedSeverity>critical</perceivedSeverity>
<additionalText1>      Restart Database engine</additionalText1>
<eventType>processingError</eventType>
<alarmId>12428</alarmId>
</alarmNew>

---------- Post updated at 07:40 PM ---------- Previous update was at 06:42 PM ----------

Hi people, is anyone there? I am waiting for your reply.

---------- Post updated at 07:44 PM ---------- Previous update was at 07:40 PM ----------

Also, in my code I used

printf "<?xml version="1.0"?>\n";

to print

<?xml version="1.0"?>

in the XML, but what I am getting is

<?xml version=1?>

Could someone please help me with this as well?
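On the version=1 output: the inner double quotes end the awk string early, so awk sees a string, then the number 1.0, then another string, and concatenates them. A whole number is converted to text without its decimal part, which is why 1.0 comes out as 1. Escaping the inner quotes with a backslash keeps everything in one string. A minimal sketch follows; the script name quotes.awk is made up for the demo.

```shell
tmp=$(mktemp -d)
cat > "$tmp/quotes.awk" <<'EOF'
BEGIN {
    # Unescaped inner quotes: awk reads three pieces glued together,
    # a string, the number 1.0, and another string.
    # The number 1.0 becomes the text 1 when it is concatenated.
    print "<?xml version="1.0"?>"

    # Escaped inner quotes stay inside a single string.
    print "<?xml version=\"1.0\"?>"
}
EOF
quote_out=$(awk -f "$tmp/quotes.awk")
printf '%s\n' "$quote_out"
rm -rf "$tmp"
```

The first line printed is the broken form and the second is the intended declaration.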

Hi people, I used your script and the first file comes out without any issue, but for the second XML file I still get the first two lines. You can see it in the output I pasted above: the <?xml version="1.0"?> line and the <notification> tag still appear for the second and third files.
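One possible explanation, assuming the script still contains the NR > 2 test: NR keeps counting across all input files, so that test only hides the first two lines of the first file. FNR restarts at 1 for every file. The sketch below assumes the two header lines are always the first two lines of each file; the file names and contents are made up.

```shell
tmp=$(mktemp -d)
for n in 1 2; do
    printf '%s\n' '<?xml version="1.0"?>' '<notification>' \
        "<alarmId>$n</alarmId>" '</notification>' > "$tmp/file$n.xml"
done

# FNR is the line number within the current file; NR is the running total.
# Skipping on FNR drops the first two lines of every file, not just the first.
fnr_out=$(awk '
FNR <= 2 { next }
!/^<\/notification>/ { print }
' "$tmp/file1.xml" "$tmp/file2.xml")
printf '%s\n' "$fnr_out"
rm -rf "$tmp"
```

If the header lines still slip past the anchored pattern once NR > 2 is gone, trailing spaces or carriage returns at the end of those lines would be worth checking, since the pattern requires an exact match up to the end of the line.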

I have multiple XML files. Each XML file has multiple blocks. What I have to do is split out all the blocks in each XML file and append them to a main XML file. While splitting each XML file I have to eliminate three lines. Sadly, I was unable to do that. Could any awk expert here please help me?
Below are two sample XML files.

file1.xml
------------
<?xml version="1.0"?>
<notification>
<alarmNew systemDN="GGSN-testmy11111111">
<alarmId>12528</alarmId>
</alarmNew>

<alarmNew systemDN="GGSN-testmy11111111">
<alarmId>12222</alarmId>
</alarmNew>
</notification>

file2.xml
-----------
<?xml version="1.0"?>
<notification>
<alarmNew systemDN="GGSN-testmy11111111">
<alarmId>12528</alarmId>
</alarmNew>
</notification>

Below is the code I am using, which I got from the experts here:

#!/bin/awk -f
BEGIN {
   AlarmNbeg  = "alarmNew";
   AlarmClbeg = "alarmCleared";
   AlarmChbeg = "alarmChanged";
   AckStatbeg = "ackStateChanged";
   TotBlocks = 0;
}
!/^(<\?xml version="1\.0"\?>|<\/?notification>)$/ {
   if((length($0) > 1)) {

      if((substr($0, 2, length(AlarmNbeg)) ==  AlarmNbeg)|| (substr($0, 2, length(AlarmClbeg)) ==  AlarmClbeg) || (substr($0, 2, length(AlarmChbeg)) ==  AlarmChbeg) || (substr($0, 2, length(AckStatbeg)) ==  AckStatbeg)) {
         print "<?xml version=\"1.0\"?>\n";
          printf $0;
          printf "\n";
      }
else {
       printf $0;
      printf "\n";
     }
   }
}
END {
}

The output I am getting is:

<?xml version="1.0"?>
<notification>
<?xml version="1.0"?>
<alarmNew systemDN="GGSN-testmy11111111">
<alarmId>12528</alarmId>
</alarmNew>

<?xml version="1.0"?>
<alarmNew systemDN="GGSN-testmy11111111">
<alarmId>12222</alarmId>
</alarmNew>

<?xml version="1.0"?>
<notification>
<?xml version="1.0"?>
<alarmNew systemDN="GGSN-testmy11111111">
<alarmId>12528</alarmId>
</alarmNew>

Expected output is:

<?xml version="1.0"?>
<alarmNew systemDN="GGSN-testmy11111111">
<alarmId>12528</alarmId>
</alarmNew>

<?xml version="1.0"?>
<alarmNew systemDN="GGSN-testmy11111111">
<alarmId>12222</alarmId>
</alarmNew>

<?xml version="1.0"?>
<alarmNew systemDN="GGSN-testmy11111111">
<alarmId>12528</alarmId>
</alarmNew>

Could someone please help?

$ cat xmlcat.awk

BEGIN {
        RS=""           # Split records on blank lines
        FS=OFS="\n"     # Each field is a different line
        ORS="\n\n"      # Output blank lines, too
}

# Add missing XML lines
!/<[?]xml/{     $0="<?xml version=\"1.0\"?>\n" $0;      }

{       # Loop over lines
        for(N=2; N<=NF; N++)
        if(!(($N ~ /<alarmId>/)||($N ~ /<[/]?alarmNew/))) $N="";

        gsub(/\n+/, "\n");
        sub(/\n$/, "");
} 1

$ awk -f xmlcat.awk data1 data2

<?xml version="1.0"?>
<alarmNew systemDN="GGSN-testmy11111111">
<alarmId>12528</alarmId>
</alarmNew>

<?xml version="1.0"?>
<alarmNew systemDN="GGSN-testmy11111111">
<alarmId>12222</alarmId>
</alarmNew>

<?xml version="1.0"?>
<alarmNew systemDN="GGSN-testmy11111111">
<alarmId>12528</alarmId>
</alarmNew>

$

Use nawk on Solaris.

Isn't this the same issue as your other thread?

Hi Corona688,
Thanks for this, I will try it in my environment. I have a small doubt: can I run the above command as "nawk-f xmlcat.awk *.xml"? I don't know how many XML files will arrive in that folder. So if I give the file name as foldername/*.xml, will this work for all XML files in that folder?

Hi CarloM,
Yes, you are right, but I got no response there :(

Yes, though you need a space between nawk and -f of course. * should work for all XML files in a folder. Expanding * into multiple filenames is a property of the shell, not of awk, so it should work for most things really.

The only gotcha is that if you have thousands and thousands of them, * will run out of room to cram arguments, since you can only put so many in one line.
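If the folder ever does hold too many files for one command line, one common workaround is to let find hand the names to awk in batches. This is a sketch only: the folder name folderx, the file contents, and the inline awk program (a stand-in for -f xmlcat.awk) are assumptions for the demo.

```shell
tmp=$(mktemp -d)
mkdir "$tmp/folderx"
for n in 1 2 3; do
    printf '<alarmId>%s</alarmId>\n' "$n" > "$tmp/folderx/file$n.xml"
done

# find passes the file names to awk in batches that fit the system limit,
# so the shell never has to expand one huge *.xml list itself.
# The inline awk program stands in for "-f xmlcat.awk".
batch_out=$(find "$tmp/folderx" -type f -name '*.xml' \
    -exec awk '{ print FILENAME ": " $0 }' {} +)
printf '%s\n' "$batch_out"
rm -rf "$tmp"
```

Two things to keep in mind: find may start awk more than once, so BEGIN and END blocks run once per batch and NR restarts with each batch, and the order in which find lists the files is not guaranteed.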
