Instead of using a variable outside sed, I was trying to get the "id" within the same sed command and append it in the line "Description", but so far - no luck!
Standard disclaimer: to "understand" XML, a program (or programming language) needs to work context-sensitively. For this you need a (recursive) parser. Because regexp machines (like sed or awk) aren't parsers, whatever you create with them will always retain some uncertainty: in other words, it will always be possible to trick them into doing something they shouldn't by crafting the input accordingly.
Having said this: there is nothing wrong with a "best-effort" solution as long as you are aware that it is exactly this.
Your sed script was already quite close, here is how it goes:
First, you need to set rules for what happens with which type of line:
1) In a line of the form <OfferDefinition Id=...> we need to extract the Id value and store it somewhere.
2) In a line of the form </OfferDefinition> the block within which the ID makes sense ends and we have to drop the stored value there.
3) In a line of the form <Description>....</Description> we need to insert the stored value if there is one.
Notice that I assume the lines to be "well-behaved". This tag:
<Description>
....
</Description>
would be perfectly valid XML but would confuse the regexp as it is, because sed never sees the opening and the closing tag on the same line. You would have to work on this if you want to cover that too. Likewise for some other quirks: this is what I was talking about above.
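To illustrate that caveat, here is a small check of how often the rule-3 regexp matches a one-line tag versus the same tag spread over three lines. The sample texts and the helper name count_matches are made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical sample data: the same tag once on one line, once spread over three.
single='<Description>text</Description>'
multi=$(printf '<Description>\ntext\n</Description>\n')

# count_matches is a helper invented for this sketch: it counts the input
# lines that the rule-3 address regexp selects.
count_matches() {
    printf '%s\n' "$1" | sed -n '/<Description>.*<\/Description>/p' | wc -l | tr -d ' '
}

single_hits=$(count_matches "$single")
multi_hits=$(count_matches "$multi")
echo "single-line: $single_hits, multi-line: $multi_hits"
```

This should print "single-line: 1, multi-line: 0": sed works line by line, so a tag split across lines never reaches rule 3 at all.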
Now let us implement the three rules. Notice that the explanations are NOT part of the script. Also notice (the last line of rule 3) that G joins pattern space and hold space with a line break, which we have to remove. This is one of the trickier things when you work with multiline patterns:
sed '/<OfferDefinition Id=.*>/ { # rule 1-lines
p # print, so that the unaltered line is in the output
s/.*Id="// # remove everything up to Id="
s/">.*// # remove the trailing part, isolating the value
h # move that to the hold space
d # and delete from pattern space
}
/<\/OfferDefinition>/ { # rule 2-lines
p # print unaltered line
d # delete pattern space
x # exchange hold/pattern (= clear hold)
d # and delete pattern again
}
/<Description>.*<\/Description>/ { # rule 3-lines
s/[ ]*$// # clear trailing whitespace
G # append hold space content to pattern space
s/\(<Description>\)\(.*\)\(<\/Description>\)\(.*\)/\1\4_\2\3/
# rearrange contents:
# from: <Des>content</Desc>val
# to: <Des>val_content</Desc>
s/\n// # remove extra line breaks
}' /path/to/input
Actually this was not the case in my test and the script worked as it was shown here. For reference, I used Linux (Kernel 4.10.42) and GNU-sed 4.2.2, shell is Kornshell 93 u+.
man sed
...
d
Delete pattern space. Start next cycle.
You must test with an input file that has a further <Description>xyz</Description> after (outside) the <OfferDefinition Id=...> ... </OfferDefinition> block.
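As the man page excerpt says, d starts the next cycle at once, so in rule 2 as posted the x and the second d are never reached and the hold space keeps the old Id. Below is a minimal sketch of one possible fix, not the only way to do it. It assumes a sed that accepts comment lines inside the script (GNU sed does) and uses a made-up sample file: the Id 1234 and the description texts are placeholders, not taken from the original data. Rule 2 empties the pattern space and swaps it into the hold space before the d, and rule 3 only inserts a value when the hold space is not empty.

```shell
#!/bin/sh
# Hypothetical sample input: one block plus a <Description> outside of it.
input=$(mktemp)
cat > "$input" <<'EOF'
<OfferDefinition Id="1234">
  <Description>first offer</Description>
</OfferDefinition>
<Description>xyz</Description>
EOF

result=$(sed '
# rule 1: print the line, isolate the Id value, store it in the hold space
/<OfferDefinition Id="/ {
    p
    s/.*Id="//
    s/".*//
    h
    d
}
# rule 2: print the line, then swap an empty pattern space into the hold
# space; the single d comes last because it ends the cycle
/<\/OfferDefinition>/ {
    p
    s/.*//
    x
    d
}
# rule 3: append the hold space; rearrange only if something was stored
/<Description>.*<\/Description>/ {
    s/[ ]*$//
    G
    /\n$/!s/\(<Description>\)\(.*\)\(<\/Description>\)\n\(.*\)/\1\4_\2\3/
    s/\n$//
}
' "$input")
rm -f "$input"

printf '%s\n' "$result"
```

With this input the Description inside the block should become <Description>1234_first offer</Description>, while the one after </OfferDefinition> stays <Description>xyz</Description>. With the script as posted, that last line would pick up the stale Id.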
While the other codes using awk were giving slight errors (while printing the 'id' in 'description', they were replacing the existing text) which could be tweaked, your code worked perfectly. Many Thanks!
---------- Post updated at 05:17 PM ---------- Previous update was at 05:03 PM ----------
Thanks Rudi, you are always helpful. Worked with a little bit of tweak in separators:
awk scripts consist of pattern { action } pairs. If the pattern evaluates to TRUE, the respective action is executed. 1 is always TRUE, and for a missing action the default, print, is taken.
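A tiny illustration of the pattern/action behaviour described above, using made-up input (the words alpha, beta and gamma are placeholders): the first pair has an explicit action, the second is the bare pattern 1 with no action.

```shell
#!/bin/sh
# Hypothetical three-line input fed through two pattern/action pairs.
out=$(printf 'alpha\nbeta\ngamma\n' | awk '
    # pattern /beta/ with an explicit action: wrap the matching line in brackets
    /beta/ { $0 = "[" $0 "]" }
    # pattern 1 is always TRUE and has no action, so the default print applies
    1
')
printf '%s\n' "$out"
```

This should print alpha, [beta] and gamma on three lines: every line is printed by the trailing 1, and only the line matching /beta/ is modified first.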