XML parsing

I have an XML file whose format looks like this:

<SESSIONCOMPONENT REFOBJECTNAME ="pre_session_command" REUSABLE ="NO" TYPE ="Pre-session command">
                <TASK DESCRIPTION ="" NAME ="pre_session_command" REUSABLE ="NO" TYPE ="Command" VERSIONNUMBER ="1">
                    <ATTRIBUTE NAME ="Recovery Strategy" VALUE ="Fail task and continue workflow"/>
                    <VALUEPAIR EXECORDER ="1" NAME ="Command1" REVERSEASSIGNMENT ="NO" VALUE ="NO"/>
                </TASK>
     </SESSIONCOMPONENT>

            <SESSIONCOMPONENT REFOBJECTNAME ="post_session_success_command" REUSABLE ="NO" TYPE ="Post-session success command">
                <TASK DESCRIPTION ="" NAME ="post_session_success_command" REUSABLE ="NO" TYPE ="Command" VERSIONNUMBER ="1">
                    <ATTRIBUTE NAME ="Fail task if any command fails" VALUE ="NO"/>
                    <ATTRIBUTE NAME ="Recovery Strategy" VALUE ="Fail task and continue workflow"/>
                    <VALUEPAIR EXECORDER ="1" NAME ="Command1" REVERSEASSIGNMENT ="NO" VALUE ="NO"/>
                </TASK>

The task is to replace the contents of the VALUE attribute on the VALUEPAIR lines with the following strings, which I will generate on the fly:

STR1="/scripts/waitForDummyFilePDS.sh -f$Paramdummyfile -H$ParamIP -e$Paramemail -j$Paramprefilename"
STR2="scripts/waitForDummyFilePDS.sh -f$Paramdummyfile -H$ParamIP -e$Paramemail -j$Parampostfilename"

The output should look like this:

<SESSIONCOMPONENT REFOBJECTNAME ="pre_session_command" REUSABLE ="NO" TYPE ="Pre-session command">
                <TASK DESCRIPTION ="" NAME ="pre_session_command" REUSABLE ="NO" TYPE ="Command" VERSIONNUMBER ="1">
                    <ATTRIBUTE NAME ="Recovery Strategy" VALUE ="Fail task and continue workflow"/>
                    <VALUEPAIR EXECORDER ="1" NAME ="Command1" REVERSEASSIGNMENT ="NO" VALUE ="/scripts/waitForDummyFilePDS.sh -f$Paramdummyfile -H$ParamIP -e$Paramemail -j$Paramprefilename"/>
                </TASK>
     </SESSIONCOMPONENT>

            <SESSIONCOMPONENT REFOBJECTNAME ="post_session_success_command" REUSABLE ="NO" TYPE ="Post-session success command">
                <TASK DESCRIPTION ="" NAME ="post_session_success_command" REUSABLE ="NO" TYPE ="Command" VERSIONNUMBER ="1">
                    <ATTRIBUTE NAME ="Fail task if any command fails" VALUE ="NO"/>
                    <ATTRIBUTE NAME ="Recovery Strategy" VALUE ="Fail task and continue workflow"/>
                    <VALUEPAIR EXECORDER ="1" NAME ="Command1" REVERSEASSIGNMENT ="NO" VALUE ="scripts/waitForDummyFilePDS.sh -f$Paramdummyfile -H$ParamIP -e$Paramemail -j$Parampostfilename"/>
                </TASK>

I'm trying to use awk but can't work out the correct use of gsub here. Any help will be appreciated.

 awk -F "[><]" '/VALUEPAIR EXECORDER/{gsub(/VALUE*/,VALUE ="/ac ")}1'  wf_DB2zOS_DB2PAAA_TCHK_ACCT_MASTER.xml 

In your awk gsub attempt, the quoting of the replacement needs to be corrected:

awk -F "[><]" '/VALUEPAIR EXECORDER/{gsub(/VALUE*/,"VALUE =\"/ac \"")}1'  file
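One more thing to watch is the pattern itself: /VALUE*/ means "VALU followed by zero or more E", so it also matches the start of VALUEPAIR and leaves the old quoted value in place. Here is a minimal sketch that matches the whole attribute instead, assuming the attribute is always written as VALUE ="..." with no double quote inside the value. The sample line and the /ac replacement are only illustrative.

```shell
line='<VALUEPAIR EXECORDER ="1" NAME ="Command1" REVERSEASSIGNMENT ="NO" VALUE ="NO"/>'

# Match the leading space, the attribute name, the equals sign and the whole
# quoted value, so VALUEPAIR and REVERSEASSIGNMENT ="NO" are left untouched.
result=$(printf '%s\n' "$line" |
  awk '/VALUEPAIR EXECORDER/ { sub(/ VALUE ="[^"]*"/, " VALUE =\"/ac \"") } 1')

printf '%s\n' "$result"
```

The leading space in the pattern is what keeps it from matching inside another attribute name that merely ends in VALUE.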

Thanks Rudi. However, I'm unable to use a variable as the replacement inside gsub:

 awk -F "[><]" '/VALUEPAIR EXECORDER/{gsub(/VALUE =\"*\"/,VALUE = $STR1)}1'  wf_DB2zOS_DB2PAAA_TCHK_ACCT_MASTER.xml 

thanks

Hello r_t_1601,

I'm not sure how you want to substitute the string VALUE with multiple strings, since it is present in many places. The following is an example that uses one variable and substitutes it after the VALUE string.

awk -v STR1='/scripts/waitForDummyFilePDS.sh -f$Paramdummyfile -H$ParamIP -e$Paramemail -j$Paramprefilename'  -v STR2='scripts/waitForDummyFilePDS.sh -f$Paramdummyfile -H$ParamIP -e$Paramemail -j$Parampostfilename' '/VALUEPAIR EXECORDER/{sub("VALUE =","&"STR1)} 1'   Input_file

The above is only an example of how to use variables in awk. If you have more requirements, kindly state them clearly.
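To spell out the mechanism: inside a single-quoted awk program the shell never expands $STR1, and awk itself reads $STR1 as a field reference, which is why the value has to be handed over with -v. This is a small sketch under some assumptions. The awk variable name str and the shortened sample line are made up for illustration, and the replacement text is assumed to contain no backslash or & (both are special to -v or to sub).

```shell
STR1='/scripts/waitForDummyFilePDS.sh -f$Paramdummyfile'
line='<VALUEPAIR EXECORDER ="1" NAME ="Command1" VALUE ="NO"/>'

# -v copies the shell variable into an awk variable (named str here);
# inside the awk program it is referenced without a $ sign.
result=$(printf '%s\n' "$line" |
  awk -v str="$STR1" '{ sub(/ VALUE ="[^"]*"/, " VALUE =\"" str "\"") } 1')

printf '%s\n' "$result"
```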

Thanks,
R. Singh


Try

awk -v RPST="$STR1|$STR2" 'BEGIN {n = split (RPST, RP, "|")} /VALUEPAIR EXECORDER/ {T=$NF; sub (/""/, "\"" RP[++CNT] "\"", T); sub ($NF, T)} 1'   file

but make sure the strings STR1 and STR2 have been assigned using single (NOT double) quotes so that the $ signs are preserved.
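The difference between the two quoting styles can be seen in a short sketch. The script name wait.sh and the value /tmp/x are placeholders chosen for this example only.

```shell
Paramdummyfile=/tmp/x

# Double quotes: the shell expands $Paramdummyfile before the assignment.
dq="wait.sh -f$Paramdummyfile"
# Single quotes: the text, including the $ sign, is kept literally.
sq='wait.sh -f$Paramdummyfile'

printf '%s\n' "$dq" "$sq"
```

Only the single-quoted form keeps the $Param... placeholders that are supposed to end up in the XML.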


Hi Ravinder, thanks for the response.

STR1="/scripts/waitForDummyFilePDS.sh -f$Paramdummyfile -H$ParamIP -e$Paramemail -j$Paramprefilename"
STR2="scripts/waitForDummyFilePDS.sh -f$Paramdummyfile -H$ParamIP -e$Paramemail -j$Parampostfilename"

STR1 should be substituted where SESSIONCOMPONENT REFOBJECTNAME ="pre_session_command"

<SESSIONCOMPONENT REFOBJECTNAME ="pre_session_command" REUSABLE ="NO" TYPE ="Pre-session command">
                <TASK DESCRIPTION ="" NAME ="pre_session_command" REUSABLE ="NO" TYPE ="Command" VERSIONNUMBER ="1">
                    <ATTRIBUTE NAME ="Recovery Strategy" VALUE ="Fail task and continue workflow"/>
                    <VALUEPAIR EXECORDER ="1" NAME ="Command1" REVERSEASSIGNMENT ="NO" VALUE ="STR1"/>
                </TASK>
     </SESSIONCOMPONENT>

and STR2 should substitute where SESSIONCOMPONENT REFOBJECTNAME ="post_session_success_command"

<SESSIONCOMPONENT REFOBJECTNAME ="post_session_success_command" REUSABLE ="NO" TYPE ="Post-session success command">
                <TASK DESCRIPTION ="" NAME ="post_session_success_command" REUSABLE ="NO" TYPE ="Command" VERSIONNUMBER ="1">
                    <ATTRIBUTE NAME ="Fail task if any command fails" VALUE ="NO"/>
                    <ATTRIBUTE NAME ="Recovery Strategy" VALUE ="Fail task and continue workflow"/>
                    <VALUEPAIR EXECORDER ="1" NAME ="Command1" REVERSEASSIGNMENT ="NO" VALUE ="STR2"/>
                </TASK>
     </SESSIONCOMPONENT>

Just to clarify: the initial value in VALUE would be "NO" rather than empty.

Hello r_t_1601,

Could you please try the following and let me know if it helps?

awk -v STR1='/scripts/waitForDummyFilePDS.sh -f$Paramdummyfile -H$ParamIP -e$Paramemail -j$Paramprefilename'  -v STR2='scripts/waitForDummyFilePDS.sh -f$Paramdummyfile -H$ParamIP -e$Paramemail -j$Parampostfilename'  '
# Remember that we are inside the pre-session block.
/pre_session_command/{
  val=1
}
val==1 && /VALUEPAIR EXECORDER/{
  # Replace the whole quoted value, whatever it currently holds.
  sub(/VALUE ="[^"]*"/,"VALUE =\"" STR1 "\"")
  val=""
}
# Remember that we are inside the post-session block.
/post_session_success_command/{
  val=2
}
val==2 && /VALUEPAIR EXECORDER/{
  sub(/VALUE ="[^"]*"/,"VALUE =\"" STR2 "\"")
  val=""
}
# Print every line, changed or not.
1
'  Input_file

Output will be as follows.

<SESSIONCOMPONENT REFOBJECTNAME ="pre_session_command" REUSABLE ="NO" TYPE ="Pre-session command">
                <TASK DESCRIPTION ="" NAME ="pre_session_command" REUSABLE ="NO" TYPE ="Command" VERSIONNUMBER ="1">
                    <ATTRIBUTE NAME ="Recovery Strategy" VALUE ="Fail task and continue workflow"/>
                    <VALUEPAIR EXECORDER ="1" NAME ="Command1" REVERSEASSIGNMENT ="NO" VALUE ="/scripts/waitForDummyFilePDS.sh -f$Paramdummyfile -H$ParamIP -e$Paramemail -j$Paramprefilename"/>
                </TASK>
     </SESSIONCOMPONENT>
             <SESSIONCOMPONENT REFOBJECTNAME ="post_session_success_command" REUSABLE ="NO" TYPE ="Post-session success command">
                <TASK DESCRIPTION ="" NAME ="post_session_success_command" REUSABLE ="NO" TYPE ="Command" VERSIONNUMBER ="1">
                    <ATTRIBUTE NAME ="Fail task if any command fails" VALUE ="NO"/>
                    <ATTRIBUTE NAME ="Recovery Strategy" VALUE ="Fail task and continue workflow"/>
                    <VALUEPAIR EXECORDER ="1" NAME ="Command1" REVERSEASSIGNMENT ="NO" VALUE ="scripts/waitForDummyFilePDS.sh -f$Paramdummyfile -H$ParamIP -e$Paramemail -j$Parampostfilename"/>
                </TASK>
 

Let me know if you have any questions about it.

Thanks,
R. Singh


You keep adding new requirements with every post. It would really have been nice to have a complete specification up front.

If I understand correctly, the SESSIONCOMPONENT tag's REFOBJECTNAME parameter can have two values (pre_session_command or post_session_success_command). And, if one of those values is found AND a VALUE parameter in a VALUEPAIR tag has the value NO before a /SESSIONCOMPONENT tag is found, then the string NO should be replaced by the contents of the variable STR1 on pre-session entries and by the contents of the variable STR2 on post-session entries. (Note that post #6 in this thread says that the NO should be replaced by one of the literal strings STR1 or STR2 rather than the contents of variables with those names???) Is this correct?

If the value of the VALUE parameter is something other than NO, should any substitution be made?

Are there other values for the REFOBJECTNAME parameter? If so, should these entries be deleted from the output? Copied to the output unchanged? Have some other value assigned to the corresponding VALUE parameter? Something else???

Apologies for the changes made. Don, you have stated almost everything correctly.

Note that post #6 in this thread says that the NO should be replaced by one of the literal strings STR1 or STR2 rather than the contents of variables with those names??? Is this correct?

The contents of STR1 and STR2 should be used, not the literal strings.

  1. In case any value is found in VALUE, for example

VALUE="NO"

or a value like VALUE="/scripts/abc.sh -H$PmRep -N$PmLog", then replace it with

the contents of STR1 where SESSIONCOMPONENT REFOBJECTNAME ="pre_session_command"
and
the contents of STR2 where SESSIONCOMPONENT REFOBJECTNAME ="post_session_success_command"

  2. In case no value is found, for example

VALUE=""

then substitute in the same way:

the contents of STR1 where SESSIONCOMPONENT REFOBJECTNAME ="pre_session_command"
and
the contents of STR2 where SESSIONCOMPONENT REFOBJECTNAME ="post_session_success_command"

There would be no other values for REFOBJECTNAME.

Thanks in advance
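Putting the clarified rules together, one possible approach is sketched below; it is not a tested solution for the real workflow file. It assumes each tag sits on its own line as in the samples, that attribute values never contain a double quote, and that STR1 and STR2 contain no backslash or & (both are special to awk -v or to sub). The awk variable names pre, post, repl and inblock and the shortened sample input are made up for illustration.

```shell
STR1='/scripts/waitForDummyFilePDS.sh -f$Paramdummyfile -H$ParamIP -e$Paramemail -j$Paramprefilename'
STR2='scripts/waitForDummyFilePDS.sh -f$Paramdummyfile -H$ParamIP -e$Paramemail -j$Parampostfilename'

# Shortened sample input: one empty VALUE and one VALUE holding an old command.
xml='<SESSIONCOMPONENT REFOBJECTNAME ="pre_session_command" REUSABLE ="NO" TYPE ="Pre-session command">
<ATTRIBUTE NAME ="Recovery Strategy" VALUE ="Fail task and continue workflow"/>
<VALUEPAIR EXECORDER ="1" NAME ="Command1" REVERSEASSIGNMENT ="NO" VALUE =""/>
</SESSIONCOMPONENT>
<SESSIONCOMPONENT REFOBJECTNAME ="post_session_success_command" REUSABLE ="NO" TYPE ="Post-session success command">
<VALUEPAIR EXECORDER ="1" NAME ="Command1" REVERSEASSIGNMENT ="NO" VALUE ="/scripts/abc.sh -H$PmRep -N$PmLog"/>
</SESSIONCOMPONENT>'

result=$(printf '%s\n' "$xml" | awk -v pre="$STR1" -v post="$STR2" '
  # A new SESSIONCOMPONENT starts: pick the replacement for this block, if any.
  /<SESSIONCOMPONENT / {
    repl = ""
    if (/REFOBJECTNAME ="pre_session_command"/) repl = pre
    else if (/REFOBJECTNAME ="post_session_success_command"/) repl = post
    inblock = (repl != "")
  }
  # Inside a matching block, overwrite whatever the VALUEPAIR VALUE holds.
  inblock && /<VALUEPAIR / {
    sub(/ VALUE ="[^"]*"/, " VALUE =\"" repl "\"")
  }
  # Leaving the block: stop substituting until the next matching block.
  /<\/SESSIONCOMPONENT>/ { inblock = 0 }
  1
')

printf '%s\n' "$result"
```

Because the pattern accepts any quoted content, including none, the empty, "NO" and existing-command cases are all handled by the same substitution. The ATTRIBUTE lines are skipped because only VALUEPAIR lines are edited.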