(n)awk: print regex search output lines in one line

Hello.
I have been looking high and low for a solution to this. It seems there should be a simple answer, but alas.

I have a big XML file, and I need to extract certain information from specific items. The information I need can be found between a specific set of tags; let's call them <tag></tag>. A regex lookup yields 4 lines of text per element, which I need joined into 1 line, and this has to happen for every snippet between those tags (about 700 or so <tag></tag> elements in the XML file).

So, if I do this:

awk -F"tag" '/ref1/;/ref2/;/ref3/;/ref4/ {print}' file.xml

the output I get is:

<ref1="this is ref1"/>
<re2="this is ref2" />
<ref3="this is ref3" />
<ref4="this is ref4 />
<ref1="this is ref1"/>
<re2="this is ref2" />
<ref3="this is ref3" />
<ref4="this is ref4 />
. . .

The output I need:

<ref1="this is ref1"/><re2="this is ref2" /><ref3="this is ref3" /><ref4="this is ref4 />
<ref1="this is ref1"/><re2="this is ref2" /><ref3="this is ref3" /><ref4="this is ref4 />
. . . 

Note: I am in a very secure environment and cannot install anything, so I only have the regular awk/nawk/gawk and sed that come with Solaris 11.

Any help would be greatly appreciated.

Assuming the tags are in order as shown in your sample, the following should work:

awk '/ref1/{r1=$0};/ref2/{r2=$0};/ref3/{r3=$0};/ref4/{print r1 r2 r3 $0}' file.xml

Note, however, that your code (and this trivial modification of it) does not match only tag names: each pattern matches anywhere on the line, including text that is not part of a tag name. In the case of ref2, the pattern matches only the attribute text, since the tag name in your sample is re2 rather than ref2.
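If that loose matching is a concern, here is a minimal sketch that anchors each pattern on the element name at the start of the line. It assumes every element sits on its own line and that the real names are ref1 through ref4 (your sample shows re2, so adjust the patterns to whatever the file actually contains). The file sample.xml and the <other .../> element are made up for illustration; the looser /ref3/ pattern would also match that <other .../> line.

```shell
# Hypothetical test file; element names are assumptions taken from the question.
cat > sample.xml <<'EOF'
<tag>
<ref1="this is ref1"/>
<ref2="this is ref2" />
<ref3="this is ref3" />
<other note="mentions ref3 in text only"/>
<ref4="this is ref4" />
</tag>
EOF

# Buffer lines whose element name is ref1..ref3, then print the buffer
# plus the ref4 line as a single output line and start over.
result=$(awk '
  /^[ \t]*<ref[1-3][^0-9A-Za-z]/ { buf = buf $0; next }
  /^[ \t]*<ref4[^0-9A-Za-z]/     { print buf $0; buf = "" }
' sample.xml)

printf '%s\n' "$result"
```

Because the lines are collected in a buffer instead of in fixed variables, this version does not depend on ref1 to ref3 appearing in a particular order, only on ref4 coming last in each group.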

Of course, you could also use:

grep 'ref[1-4]' file.xml | paste -d '\0\0\0\n' - - - -

or:

awk '/ref[1-4]/' file.xml | paste -d '\0\0\0\n' - - - -
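To make it clearer what paste is doing there: each - operand reads the next line from standard input, so four of them consume four lines per output line, and \0 in the delimiter list stands for an empty string. The short sketch below uses eight stand-in lines in place of the grep or awk output. It also tries a single-element list, which should behave the same, assuming your paste restarts the delimiter list on every output line as POSIX describes.

```shell
# Stand-in input: eight lines that should end up as two joined lines.
joined=$(printf '%s\n' a1 a2 a3 a4 b1 b2 b3 b4 | paste -d '\0\0\0\n' - - - -)

# Same input with a one-element delimiter list (empty string between columns).
joined_short=$(printf '%s\n' a1 a2 a3 a4 b1 b2 b3 b4 | paste -d '\0' - - - -)

printf '%s\n' "$joined"
```

If the number of matching lines is not a multiple of four, the last output line will simply be shorter, so it is worth checking the line count before relying on the result.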

Piping to paste worked like a charm.

Thank you.