Combine two lists from multiple grep commands

I'm working with a file with an XML structure. I'd like to parse it down to just the bits I want. Here is an example of the file:

<message id="96352877" method="status">
      <date rfc="Sat, 12 Mar 2011 16:13:15 -0600" unix="1299967995" />
      <services>
        <service id="facebook" name="Facebook" />
        <service id="twitter" name="Twitter" />
        <service id="myspace" name="MySpace" />
        <service id="identi.ca" name="Identi.ca" />
        <service id="friendfeed" name="FriendFeed" />
        <service id="gtalk" name="GTalk Status" />
        <service id="yahoo" name="Yahoo Profiles" />
        <service id="buzz" name="Google Buzz" />
      </services>
      <content>
        <body>SSBqdXN0IGRpc21hbnRsZWQgbXkgUnViaWsncyBDdWJlLi4uIFRoZXJlJ3MgTm8gbWFnaWMgaW4gdGhlcmUgISEh</body>
      </content>
      <location />
      <mood />
      <tags />
      <from>API</from>
    </message>
    <message id="96345969" method="status">
      <date rfc="Sat, 12 Mar 2011 15:18:17 -0600" unix="1299964697" />
      <services>
        <service id="facebook" name="Facebook" />
        <service id="twitter" name="Twitter" />
        <service id="myspace" name="MySpace" />
        <service id="identi.ca" name="Identi.ca" />
        <service id="friendfeed" name="FriendFeed" />
        <service id="gtalk" name="GTalk Status" />
        <service id="yahoo" name="Yahoo Profiles" />
        <service id="buzz" name="Google Buzz" />
      </services>
      <content>
        <body>VGhlIEVsZXBoYW50cyBvZiBQb3puYW4gYnkgT3Jzb24gU2NvdHQgQ2FyZCBBIFN1cmVhbCBzaG9ydCBzdG9yeSBodHRwOi8vcGluZy5mbS8yRVpuVw==</body>
      </content>
      <location />
      <mood />
      <tags />
      <from>API</from>
    </message>

I'd like to take the unix timestamp from each <date> line and the contents of each <body> element and put them in a list, i.e.
$date $body
so the output would look like this:

1299967995  SSBqdXN0IGRpc21hbnRsZWQgbXkgUnViaWsncyBDdWJlLi4uIFRoZXJlJ3MgTm8gbWFnaWMgaW4gdGhlcmUgISEh
1299964697  VGhlIEVsZXBoYW50cyBvZiBQb3puYW4gYnkgT3Jzb24gU2NvdHQgQ2FyZCBBIFN1cmVhbCBzaG9ydCBzdG9yeSBodHRwOi8vcGluZy5mbS8yRVpuVw==

I have some code that will get each of these and list them separately, but I'm having a hard time combining the two lists.

date=$(grep "unix"| sed 's/.*\(unix="\)\(.*\)\(".*\)/\2/')
body=$(grep "<body>"| sed -e 's/<body>//g' | sed -e 's/<\/body>//g')
echo "$date $body" 

Every way I've tried to make the list has failed so far.

Thank you for the help. I know this should be simple, yet it eludes me so far.
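
The echo above prints every date and then every body, because each variable holds a whole multi-line list rather than one value. If you want to keep the grep/sed approach, one option is to write the two lists to files and join them line by line with paste. This is only a sketch: it assumes every message has exactly one unix= line and one <body> line, in that order, and the sample file, the shortened bodies, and the variable names are made up for illustration.

```shell
# Hypothetical sample file standing in for the real XML export.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<message id="1" method="status">
  <date rfc="Sat, 12 Mar 2011 16:13:15 -0600" unix="1299967995" />
  <content>
    <body>Zmlyc3Q=</body>
  </content>
</message>
<message id="2" method="status">
  <date rfc="Sat, 12 Mar 2011 15:18:17 -0600" unix="1299964697" />
  <content>
    <body>c2Vjb25k</body>
  </content>
</message>
EOF

dates=$(mktemp)
bodies=$(mktemp)

# One timestamp per line: keep only what sits between unix=" and the next quote.
grep 'unix=' "$xml" | sed 's/.*unix="\([^"]*\)".*/\1/' > "$dates"

# One body per line: strip the tags along with anything around them.
grep '<body>' "$xml" | sed -e 's/.*<body>//' -e 's/<\/body>.*//' > "$bodies"

# paste joins line N of the first file to line N of the second file.
combined=$(paste -d ' ' "$dates" "$bodies")
echo "$combined"

rm -f "$xml" "$dates" "$bodies"
```

If a message could be missing its body, the two lists would drift out of step, so a single pass over the file is the safer choice in that case.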

Try this,

perl -nle 'printf "$1\t" if /unix="(.+?)"/>$/;print $1 if m/<body>(.*)<\/body>/;'  input.xmls

try:

awk  '/unix=/{v=gensub("unix=|\"","","g",$(NF-1))}/<body>/{print v,gensub("</?body>","","g",$1)}' file

Thank you both. They both work. I can understand how the perl script works pretty easily.

How does the awk line break down?
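
Roughly, the one-liner is two pattern/action rules: the first remembers the timestamp when it sees a unix= line, the second prints that timestamp together with the body when it sees a <body> line. Here is the same idea spread over several lines with comments. Note that gensub is a gawk extension, so this sketch swaps in the POSIX gsub function instead; the sample file and the names b and result are made up for illustration.

```shell
# Hypothetical sample file standing in for the real XML export.
xml=$(mktemp)
cat > "$xml" <<'EOF'
<message id="1" method="status">
  <date rfc="Sat, 12 Mar 2011 16:13:15 -0600" unix="1299967995" />
  <content>
    <body>Zmlyc3Q=</body>
  </content>
</message>
<message id="2" method="status">
  <date rfc="Sat, 12 Mar 2011 15:18:17 -0600" unix="1299964697" />
  <content>
    <body>c2Vjb25k</body>
  </content>
</message>
EOF

result=$(awk '
  # Rule 1: runs on every line that contains unix=
  /unix=/ {
    v = $(NF-1)               # second-to-last field, e.g. unix="1299967995"
    gsub(/unix=|"/, "", v)    # drop the attribute name and both quotes
  }
  # Rule 2: runs on every line that contains <body>
  /<body>/ {
    b = $1                    # first field; leading blanks are skipped by awk
    gsub(/<\/?body>/, "", b)  # drop the opening and the closing tag
    print v, b                # remembered timestamp, a space, then the body
  }
' "$xml")
echo "$result"

rm -f "$xml"
```

Because v is only overwritten when a new unix= line comes along, each body is paired with the most recent date seen before it.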

Hi Pravin,

perl -nle 'printf "$1\t" if /unix="(.+?)"/>$/;print $1 if m/<body>(.*)<\/body>/;'  input.xmls

I have two queries.

  1. Is there any special reason why we are using non-greedy matching, i.e. (.+?)
  2. In the first if statement, the delimiters are not provided properly. I think it's treating ">" as a redirection operator.

Thanks in advance