Check null value in xml

Neethu · March 4, 2013, 4:14am

Hi,

I have a log file that contains some XML tags. I need to check whether the value of a particular XML field is null and, if it is null, add the current time as the value of that XML field.

I tried the code below to check whether the count is 0. But even if the XML field is null, it shows the count as 1.

if [ `cat Log.out | grep "<timestamp1>.*</timestamp1>" | wc -l` -gt 0 ]
then
echo timestamp is not empty
fi

RudiC · March 4, 2013, 4:22am

What do you mean by "null"? Value "0" (zero) or empty field?
That .* means 0 or more occurrences of any char, so the line count will not equal 0 if "<timestamp1> . . . </timestamp1>" occurs anywhere in your file, empty or not.
Try <timestamp1>.+</timestamp1> (with grep -E, as + is not special in basic regular expressions) or <timestamp1>..*</timestamp1> to discriminate empty fields.

And then: UUOC and UUOB (useless use of cat, useless use of backticks).

grep -Eq "<timestamp1>.+</timestamp1>" Log.out && echo "found" || echo "empty/non existent"

will do.
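The difference between the two patterns can be checked on a pair of throwaway files. This is only a minimal sketch: the file names empty.log and filled.log are made up for the demonstration, and grep -c is used to get the count directly.

```shell
# Hypothetical sample files, created only for this demonstration.
tmp=$(mktemp -d)
printf '<timestamp1></timestamp1>\n' > "$tmp/empty.log"
printf '<timestamp1>20130304</timestamp1>\n' > "$tmp/filled.log"

# ".*" matches zero or more characters, so the empty tag still counts as a hit.
loose_empty=$(grep -c '<timestamp1>.*</timestamp1>' "$tmp/empty.log" || true)

# "..*" requires at least one character between the tags.
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
strict_empty=$(grep -c '<timestamp1>..*</timestamp1>' "$tmp/empty.log" || true)
strict_filled=$(grep -c '<timestamp1>..*</timestamp1>' "$tmp/filled.log" || true)

echo "loose/empty=$loose_empty strict/empty=$strict_empty strict/filled=$strict_filled"
rm -r "$tmp"
```

The loose pattern reports a hit for the empty tag, which is why the original test always saw a count of 1.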

Neethu · March 4, 2013, 4:37am

Hi RudiC

Thanks for the quick reply.

I meant the value is empty.

I tried with

<timestamp1>..*</timestamp1>

and I am now able to get the right count when the field has a null value.

But if the log file has an empty value for timestamp1, then I have to pull the entire XML and insert the current time into that empty XML field.

Sample xml:
<submit>
<ID>16</ID>
<Reference/>
<timestamp1></timestamp1>
....
.....
</submit>

Could you please help me do this?

RudiC · March 4, 2013, 4:56am

Following these forums for nearly half a year, you should be able to at least try a solution yourself. What would you come up with in the first place? We are happy to help should you get stuck...

Neethu · March 4, 2013, 5:47am

Hi RudiC,

I tried the following code

awk '/<submit>/,/<\/submit>/' log.out > xml
if [ -s xml ]
then
grep "<timestamp1>..*</timestamp1>" xml > tvalue
else
echo xml is empty
fi

This does not pull any value if the timestamp is empty in the XML file. I want to check the value of timestamp1 in the XML file and, if it is empty, insert the current time. But with this code no XML tag is pulled at all.

RudiC · March 4, 2013, 6:29am

Fine so far.

What ideas do you have to insert the timestamp? Searching these forums might give you a head start...

Yoda · March 4, 2013, 9:38am

I noticed that your code checks whether the file is non-empty (-s); if that is what you want, it is OK.

But if you are looking for a way to check whether the tag value is empty and insert the current timestamp into it, then try something like:

awk -F'[<>]' ' BEGIN {
                cmd = "date +%Y%m%d%H%M%S"
} /<timestamp1>/ {
        if ( $3 == "" )
        {
                cmd | getline dt
                close(cmd)
                $0 = "<timestamp1>" dt "</timestamp1>"
        }
} 1 ' xml

Neethu · March 4, 2013, 9:59am

Hi Bipin,

Thanks for the reply.

I am pulling my XML from logs. First I want to check whether my XML is empty or not. If it is not empty, then I have to check the tag value of timestamp1. If that is empty, then I have to insert the current timestamp.

Could you please explain the code to me?

Yoda · March 4, 2013, 11:08am

Sure, here is the explanation of code:

awk -F'[<>]' '                                                  # Set < > as field separators.
BEGIN {                                                         # BEGIN block.
                cmd = "date +%Y%m%d%H%M%S"                      # Define cmd = "date +%Y%m%d%H%M%S"
} /<timestamp1>/ {                                              # Search for pattern: <timestamp1>
        if ( $3 == "" )                                         # If pattern found, check if 3rd field (the tag value) is empty
        {
                cmd | getline dt                                # Run cmd and read output in variable: dt
                close(cmd)                                      # Close cmd
                $0 = "<timestamp1>" dt "</timestamp1>"          # Set current record = <timestamp1> dt (current timestamp) </timestamp1>
        }
} 1 ' xml                                                       # 1 == true, so print current record.

I hope this helps.
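To see why the tag value ends up in the 3rd field, it can help to print every field awk produces with that separator. A small sketch, where show_fields is just a helper name invented for this illustration:

```shell
# Print each field awk sees when "<" and ">" are the field separators.
show_fields() {
    printf '%s\n' "$1" |
        awk -F'[<>]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

show_fields '<timestamp1></timestamp1>'
show_fields '<timestamp1>20130304</timestamp1>'
```

The text before the first "<" is field 1 (empty here), the tag name is field 2, and whatever sits between ">" and the next "<" is field 3. A space between the tags would make field 3 a single space rather than an empty string, so the $3 == "" test would not fire in that case.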

RudiC · March 5, 2013, 1:01am

The above works well if timestamp1 is the only tag on the line, and if there's no gap (space) between the two tags. For a slightly more generic case, try

awk ' BEGIN {"date +%Y%m%d%H%M%S"|getline TS}
     {sub(/<timestamp1><\/timestamp1>/, "<timestamp1>"TS"</timestamp1>")}
     1
    ' FS="<|>" file
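A quick way to try the substitution approach on lines that carry other tags or stray blanks is sketched below. The helper name fill_empty_ts is made up here, the timestamp is passed in with -v so the result is reproducible, and the pattern is widened to allow spaces or tabs between the tags, which is an assumption about what should count as "empty".

```shell
# fill_empty_ts is a hypothetical helper: it reads lines on stdin and fills
# every empty <timestamp1> tag with the value given as its first argument.
fill_empty_ts() {
    awk -v TS="$1" '
        # Blanks or tabs between the tags are treated as an empty value.
        { gsub(/<timestamp1>[ \t]*<\/timestamp1>/, "<timestamp1>" TS "</timestamp1>") }
        1   # print every line, changed or not
    '
}

printf '%s\n' '<ID>16</ID><timestamp1> </timestamp1>' '<timestamp1>20130101</timestamp1>' |
    fill_empty_ts "$(date +%Y%m%d%H%M%S)"
```

Because the match is done on the tag pair itself rather than on a field position, other tags on the same line are left alone, and a tag that already holds a value is not touched.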

Neethu · March 5, 2013, 4:07am

Thanks Bipin and RudiC... Both codes worked as expected.

Thanks a lot Bipin for explaining the code to me.

awk '/<submit>/,/<\/submit>/' log.out > xml

This pulls the entire XML block from log.out, and from that I check for an empty value of timestamp1. But I want to pull the entire XML from /<submit>/,/<\/submit>/ only if the tag value of timestamp1 is empty. Could you please help me?

RudiC · March 5, 2013, 5:29am

The easiest way would be to scrap the result file if the condition is not met:

awk     ' BEGIN {"date +%Y%m%d%H%M%S"|getline TS; n=1}
         /<timestamp1>/ {n=sub(/<timestamp1> *<\/timestamp1>/, "<timestamp1>"TS"</timestamp1>")}
         n
         !n {exit 1}
        ' FS="<|>" file && echo good || echo bad

If bad, scrap result file.

Neethu · March 5, 2013, 6:58am

Hi RudiC,

The above code pulls the XML from <submit> up to <timestamp1> even if timestamp1 is not empty.

I want to pull the entire XML only if the tag value of timestamp1 is empty.

Is it possible to add a condition to the line below, so that it pulls the entire XML block only if the tag value of timestamp1 is empty?

awk '/<submit>/,/<\/submit>/' log.out > xml

RudiC · March 6, 2013, 4:54pm

Not sure if I got all requirements correctly, but you could give this a shot:

awk     ' BEGIN         {"date +%Y%m%d%H%M%S"|getline TS; n=1}          # prepare TS variable for timestamp
         /<submit>/,                                                    # discard everything outside "submit" tags
         /<\/submit>/   {if ($0 ~ /<timestamp1>/)                       # if "timestamp1" tag found within "submit" tags
                           {n = sub(/<timestamp1> *<\/timestamp1>/,     # replace zero+ spaces (= empty tag value)
                                    "<timestamp1>"TS"</timestamp1>")}   # with TS contents; n = 0 if not empty
                         if (!n) {exit 1}                               # if n = 0, i.e. non-empty tag value, quit with error
                         print
                        }
        '  file > resultfile || rm resultfile                           # if error, i.e. non-empty tag value, remove output

$ cat resultfile
<submit>
<ID>16</ID>
<Reference/>
<timestamp1>20130306225424</timestamp1>
....
.....
</submit>

If resultfile survives, it will have the previously empty tag filled with the current timestamp. Resultfile will not survive timestamps with data in them.

Neethu · March 7, 2013, 2:18am

Hi RudiC,

Thanks a lot for your reply.

The above code works fine if all the XML blocks in the log have an empty timestamp1, or if all of them have a value in timestamp1. If the log has an empty timestamp1 in one XML block and a value for timestamp1 in another, then it does not work.

But I thought of splitting the XML at the html tag; the block with the empty timestamp1 will have a higher wc than the other, so I will eliminate the XML with a value in timestamp1 that way. Thanks a lot for your help.

RudiC · March 7, 2013, 3:06am

This is the first time in the entire thread that you are talking of more than one log file snippet to be extracted. There's a rule in IT: garbage in, garbage out, so...
Anyhow, I'm happy I could help and you found a solution.
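
For a log that mixes blocks with empty and non-empty timestamp1 values, one possible approach is to buffer each block and decide whether to print it once its closing tag is reached. The following is only a sketch under assumptions not confirmed in this thread: every block starts on a line containing <submit>, ends on a line containing </submit>, and the name extract_empty_ts is made up.

```shell
# extract_empty_ts (hypothetical name) buffers each <submit> block and prints
# it, with the timestamp filled in, only if that block had an empty <timestamp1>.
extract_empty_ts() {
    awk -v TS="$(date +%Y%m%d%H%M%S)" '
        /<submit>/ { inblock = 1; buf = ""; empty = 0 }    # a new block starts here
        inblock {
            # sub() returns 1 if it replaced an empty tag on this line
            if (sub(/<timestamp1>[ \t]*<\/timestamp1>/, "<timestamp1>" TS "</timestamp1>"))
                empty = 1
            buf = buf $0 "\n"                              # collect the block line by line
        }
        /<\/submit>/ {
            if (inblock && empty) printf "%s", buf         # keep only blocks that needed the fix
            inblock = 0
        }
    ' "$@"
}

# Two blocks in one stream: only the first has an empty timestamp1.
printf '%s\n' 'some log noise' \
    '<submit>' '<ID>16</ID>' '<timestamp1></timestamp1>' '</submit>' \
    '<submit>' '<ID>17</ID>' '<timestamp1>20130101000000</timestamp1>' '</submit>' |
    extract_empty_ts
```

Called as extract_empty_ts log.out > xml, it would leave xml empty if no block needed a timestamp, so the existing [ -s xml ] test still applies.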

Neethu · March 7, 2013, 4:34am

Hi RudiC,

I mentioned at the beginning of the thread that I have many XML blocks in one log.

Thanks RudiC for the help.