Select a portion of a file into a CSV

How do I convert a file like this:

<LDATE>10-12-07</LDATE><LTIME>13:47:48.553</LTIME><LTEXT>name:anju;city:blore;ph:123</LTEXT>
<LDATE>10-12-07</LDATE><LTIME>13:47:48.553</LTIME><LTEXT>name:anju;city:blore;ph:123</LTEXT>
<LDATE>10-12-07</LDATE><LTIME>13:47:48.553</LTIME><LTEXT>name:anju;city:blore;ph:123</LTEXT>
<LDATE>10-12-07</LDATE><LTIME>13:47:48.553</LTIME><LTEXT>name:anju;city:blore;ph:123</LTEXT>

into this:
10-12-07,13:47:48.553,b'lore
10-12-07,13:47:48.553,b'lore
10-12-07,13:47:48.553,b'lore

Basically, I want to pick out some portions of each line and merge them into a new comma-separated file.
Please help, this is urgent.

anju

Using Perl:

perl -pi -e 's#<(.*?)>(.*?)</\1>#$2,#g; s/,$//' tagged_file

Could you please explain what you have done? I am new to Perl and don't know much about it.

The poster wanted only the city portion of the third "field" to be output. Yogesh Sawant's Perl solution outputs the whole third field.

Here is a solution based on sed.

sed -e 's/></>,</g' -e 's/<[^<>]*>//g' -e 's/\(^.*,.*,\)\(.*city:\)\(.*\)\(;.*$\)/\1\3/g' tagged_file

which outputs

10-12-07,13:47:48.553,blore
10-12-07,13:47:48.553,blore
10-12-07,13:47:48.553,blore
10-12-07,13:47:48.553,blore

It works by:

  1. Insert a comma wherever one tag ends and the next begins (each "><" becomes ">,<")
  2. Remove the XML tags
  3. Keep only the city value from the third field
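To watch the three steps happen, the sed expressions from the command above can be run one at a time on a single sample line. The variable names `line`, `step1`, `step2` and `step3` are just for illustration.

```shell
#!/bin/sh
line='<LDATE>10-12-07</LDATE><LTIME>13:47:48.553</LTIME><LTEXT>name:anju;city:blore;ph:123</LTEXT>'

# Step 1: put a comma between adjacent tags (every "><" becomes ">,<")
step1=$(printf '%s\n' "$line" | sed -e 's/></>,</g')
printf '%s\n' "$step1"

# Step 2: delete every tag, i.e. every <...> that has no angle brackets inside
step2=$(printf '%s\n' "$step1" | sed -e 's/<[^<>]*>//g')
printf '%s\n' "$step2"

# Step 3: keep the first two fields, then only the text between "city:" and the next ";"
step3=$(printf '%s\n' "$step2" | sed -e 's/\(^.*,.*,\)\(.*city:\)\(.*\)\(;.*$\)/\1\3/g')
printf '%s\n' "$step3"
```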

Using GNU awk (the three-argument form of match() used here is a gawk extension):

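awk '{
    # match() with a third argument (gawk only) stores the text captured
    # by (.*) in element [1] of the named array
    match($0, /<LDATE>(.*)<\/LDATE>/, mldate)
    match($0, /<LTIME>(.*)<\/LTIME>/, mltime)
    match($0, /<LTEXT>(.*)<\/LTEXT>/, mltxt)
    # cut "name:anju;city:blore;ph:123" at every ; and : so piece 4 is the city
    split(mltxt[1], mltext, "[;:]")
    print mldate[1] "," mltime[1] "," mltext[4]
}' "file"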
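The index 4 in `mltext[4]` comes from how split() cuts the LTEXT value at every `;` and `:`. The sketch below prints each piece with its number. `show_pieces` is a hypothetical helper; it only uses split() with a regular-expression separator, which is standard, so it should run in any POSIX awk.

```shell
#!/bin/sh
# show_pieces is a hypothetical helper: it prints "index: piece" for every
# piece that split() produces when the string is cut at ; and : characters.
show_pieces() {
    awk -v s="$1" 'BEGIN {
        n = split(s, piece, "[;:]")
        for (i = 1; i <= n; i++)
            print i ": " piece[i]
    }'
}

show_pieces 'name:anju;city:blore;ph:123'
```

This prints six pieces (`name`, `anju`, `city`, `blore`, `ph`, `123`), and piece 4 is the city.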

Another one with gawk:

awk 'BEGIN {FS="[<>]"} {
  split($11, s, "[;:]")
  print $3 "," $7 "," s[4]
}' file

Regards
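The field numbers `$3`, `$7` and `$11` in the second gawk script follow from splitting the line on every `<` and `>`. The sketch below (hypothetical helper `show_fields`) prints each field with its number; because the line starts with `<`, field 1 is empty, and the line splits into 13 fields in total.

```shell
#!/bin/sh
# show_fields is a hypothetical helper: split each input line on < and >
# (FS="[<>]", as in the second gawk script) and print "number: field".
show_fields() {
    awk 'BEGIN { FS = "[<>]" } {
        for (i = 1; i <= NF; i++)
            print i ": " $i
    }'
}

printf '%s\n' '<LDATE>10-12-07</LDATE><LTIME>13:47:48.553</LTIME><LTEXT>name:anju;city:blore;ph:123</LTEXT>' | show_fields
```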

When I execute this code, I get these errors:
awk: syntax error near line 2
awk: illegal statement near line 2
awk: syntax error near line 3
awk: illegal statement near line 3
awk: syntax error near line 4
awk: illegal statement near line 4

That is because you don't have GNU awk.
This version works with the "broken" awk on Solaris:

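#!/bin/sh
awk '{
       # LDATE: the text between <LDATE> and </LDATE>
       a=index($0,"<LDATE>")
       b=index($0,"</LDATE>")
       ldate=substr($0, a+length("<LDATE>"), b-a-length("<LDATE>"))
       # LTIME: the text between <LTIME> and </LTIME>
       c=index($0,"<LTIME>")
       d=index($0,"</LTIME>")
       ltime=substr($0, c+length("<LTIME>"), d-c-length("<LTIME>"))
       # LTEXT: keep only what follows "city:", up to the next ";"
       e=index($0,"<LTEXT>")
       f=index($0,"</LTEXT>")
       ltext=substr($0, e+length("<LTEXT>"), f-e-length("<LTEXT>"))
       g=index(ltext,"city:")
       city=substr(ltext, g+length("city:"))
       h=index(city,";")
       if (h > 0) city=substr(city, 1, h-1)
       print ldate "," ltime "," city
}' file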
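The index()/substr() arithmetic in the Solaris script above can be factored into one reusable function. This is a sketch, not the poster's code: `extract_csv` and `between` are hypothetical names, and user-defined functions need a POSIX awk (for example nawk or gawk); the old Solaris /usr/bin/awk may not accept them.

```shell
#!/bin/sh
# extract_csv and between are hypothetical names. Reads the files given as
# arguments (or stdin) and prints LDATE,LTIME,city for every line.
extract_csv() {
    awk '
    # Return the text between the first "starttag" in s and the next
    # "endtag" after it, or "" if either marker is missing.
    function between(s, starttag, endtag,    i, rest, j) {
        i = index(s, starttag)
        if (i == 0) return ""
        rest = substr(s, i + length(starttag))
        j = index(rest, endtag)
        if (j == 0) return ""
        return substr(rest, 1, j - 1)
    }
    {
        ldate = between($0, "<LDATE>", "</LDATE>")
        ltime = between($0, "<LTIME>", "</LTIME>")
        # append ";" so the city is found even when it is the last item
        city = between(between($0, "<LTEXT>", "</LTEXT>") ";", "city:", ";")
        print ldate "," ltime "," city
    }' "$@"
}

printf '%s\n' '<LDATE>10-12-07</LDATE><LTIME>13:47:48.553</LTIME><LTEXT>name:anju;city:blore;ph:123</LTEXT>' | extract_csv
```

To process a whole file, call it as `extract_csv file`. A line without `city:` gets an empty third field instead of garbage.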


sed 's/<LDATE>//;s/<\/LDATE>/,/;s/<LTIME>//;s/<\/LTIME>/,/;s/<LTEXT>//;s/<\/LTEXT>/,/;s/name:.*city://;s/;ph.*$//' filename
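The syntax errors reported earlier in the thread come from the three-argument form of match() in the first gawk script, which other awks do not support. One way to probe an awk for it is a one-line test program. `supports_match3` is a hypothetical helper, and the list of awk names it tries is only an example.

```shell
#!/bin/sh
# supports_match3 is a hypothetical helper. It runs a one-line program that
# uses the gawk-only three-argument match(); an awk without that extension
# should fail to parse it, so a non-zero exit status means "not supported".
supports_match3() {
    echo abc | "$1" '{ if (match($0, /b/, m)) print m[0] }' >/dev/null 2>&1
}

for prog in gawk awk nawk; do
    if command -v "$prog" >/dev/null 2>&1; then
        if supports_match3 "$prog"; then
            echo "$prog: three-argument match() works"
        else
            echo "$prog: three-argument match() not available"
        fi
    fi
done
```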