anju
January 15, 2008, 2:11am
1
How can I convert a file like this:
<LDATE>10-12-07</LDATE><LTIME>13:47:48.553</LTIME><LTEXT>name:anju;city:blore;ph:123</LTEXT>
<LDATE>10-12-07</LDATE><LTIME>13:47:48.553</LTIME><LTEXT>name:anju;city:blore;ph:123</LTEXT>
<LDATE>10-12-07</LDATE><LTIME>13:47:48.553</LTIME><LTEXT>name:anju;city:blore;ph:123</LTEXT>
<LDATE>10-12-07</LDATE><LTIME>13:47:48.553</LTIME><LTEXT>name:anju;city:blore;ph:123</LTEXT>
into this:
10-12-07,13:47:48.553,b'lore
10-12-07,13:47:48.553,b'lore
10-12-07,13:47:48.553,b'lore
Basically, I want to extract some portions of each line and merge them into a new comma-separated file.
Please help, this is urgent.
anju
Using Perl (note that -i edits tagged_file in place):
perl -pi -e 's#<(.*?)>(.*?)</\1>#$2,#g' tagged_file
perl -pi -e 's/,$//' tagged_file
anju
January 15, 2008, 5:34am
Could you please explain what you have done? I am new to Perl and don't have much idea about this.
The poster wanted only the city portion of the third "field" in the output, but Yogesh Sawant's Perl solution outputs the whole third field.
Here is a solution based on sed.
sed -e 's/></>,</g' -e 's/<[^<>]*>//g' -e 's/^\(.*,.*,\)\(.*city:\)\(.*\)\(;.*\)$/\1\3/' tagged_file
which outputs
10-12-07,13:47:48.553,blore
10-12-07,13:47:48.553,blore
10-12-07,13:47:48.553,blore
10-12-07,13:47:48.553,blore
It works in three steps:
1. Place a comma between each adjacent pair of right and left angle brackets ("><").
2. Remove the XML tags.
3. Keep the first two fields and reduce the third field to the city value.
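The three sed steps listed above can be run one at a time to inspect the intermediate results. This is a sketch: the function names step1_commas, step2_strip_tags and step3_city are hypothetical, and step 3 places ^ and $ outside the \( \) groups, because POSIX only guarantees that they act as anchors there.

```shell
#!/bin/sh
# Sample input line from the question.
line='<LDATE>10-12-07</LDATE><LTIME>13:47:48.553</LTIME><LTEXT>name:anju;city:blore;ph:123</LTEXT>'

# Step 1: insert a comma wherever one tag ends and the next begins ("><").
step1_commas() { sed 's/></>,</g'; }
# Step 2: delete every remaining <...> tag.
step2_strip_tags() { sed 's/<[^<>]*>//g'; }
# Step 3: keep fields 1-2 plus the text after "city:" up to the last ";"
# on the line (in this sample, the ";" just before "ph:").
step3_city() { sed 's/^\(.*,.*,\)\(.*city:\)\(.*\)\(;.*\)$/\1\3/'; }

printf '%s\n' "$line" | step1_commas
printf '%s\n' "$line" | step1_commas | step2_strip_tags
printf '%s\n' "$line" | step1_commas | step2_strip_tags | step3_city
```

The last command prints 10-12-07,13:47:48.553,blore.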
GNU awk
awk '{
match($0,/<LDATE>(.*)<\/LDATE>/,mldate)
match($0,/<LTIME>(.*)<\/LTIME>/,mltime)
match($0,/<LTEXT>(.*)<\/LTEXT>/,mltxt)
split(mltxt[1], mltext,"[;:]")
print mldate[1]","mltime[1]","mltext[4]
}' "file"
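The three-argument match() used above is a gawk extension. The following sketch does the same extraction in a POSIX awk such as nawk or mawk, assuming each tag appears once per line and no value contains "<". The standard two-argument match() sets RSTART and RLENGTH, and substr() trims off the tags. The names to_csv and field are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints date,time,city for each tagged input line.
to_csv() {
  awk '
  # field(s, tag): text between <tag> and </tag> in s, or "" if not found.
  # match() sets RSTART (start of the match) and RLENGTH (its length); the
  # closing tag is one character longer than the opening tag (the "/").
  function field(s, tag,    otag) {
    otag = "<" tag ">"
    if (!match(s, otag "[^<]*</" tag ">"))
      return ""
    return substr(s, RSTART + length(otag), RLENGTH - 2 * length(otag) - 1)
  }
  {
    # "name:anju;city:blore;ph:123" splits into name, anju, city, blore, ...
    split(field($0, "LTEXT"), t, "[;:]")
    print field($0, "LDATE") "," field($0, "LTIME") "," t[4]
  }'
}

printf '%s\n' '<LDATE>10-12-07</LDATE><LTIME>13:47:48.553</LTIME><LTEXT>name:anju;city:blore;ph:123</LTEXT>' | to_csv
# prints: 10-12-07,13:47:48.553,blore
```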
Another one with gawk:
awk 'BEGIN {FS="[<>]"} {
split($11, s, "[;:]")
print $3 "," $7 "," s[4]
}' file
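The second gawk script uses $3, $7 and $11. Printing every field that FS="[<>]" produces shows why those three are picked. This sketch relies only on a regular-expression FS, so it is expected to run in other POSIX awks as well as gawk.

```shell
#!/bin/sh
line='<LDATE>10-12-07</LDATE><LTIME>13:47:48.553</LTIME><LTEXT>name:anju;city:blore;ph:123</LTEXT>'
# Split on "<" or ">" and print every field with its number. Empty fields
# appear where ">" is followed directly by "<" and at both ends of the line,
# which is why the values land in fields 3, 7 and 11.
printf '%s\n' "$line" | awk 'BEGIN { FS = "[<>]" } { for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'
```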
Regards
anju
January 16, 2008, 1:04am
ghostdog74:
GNU awk
awk '{
match($0,/<LDATE>(.*)<\/LDATE>/,mldate)
match($0,/<LTIME>(.*)<\/LTIME>/,mltime)
match($0,/<LTEXT>(.*)<\/LTEXT>/,mltxt)
split(mltxt[1], mltext,"[;:]")
print mldate[1]","mltime[1]","mltext[4]
}' "file"
When I execute this code, I get these errors:
awk: syntax error near line 2
awk: illegal statement near line 2
awk: syntax error near line 3
awk: illegal statement near line 3
awk: syntax error near line 4
awk: illegal statement near line 4
That's because you don't have GNU awk: the three-argument form of match() used above is a gawk extension.
This version works in Solaris's "broken" awk:
#!/bin/sh
awk '{
a=index($0,"<LDATE>" )
b=index($0,"</LDATE>")
# value length = closing-tag position - opening-tag position - opening-tag length
ldate=substr($0, a+length("<LDATE>"), b-a-length("<LDATE>"))
c=index($0,"<LTIME>")
d=index($0,"</LTIME>")
ltime=substr($0, c+length("<LTIME>"), d-c-length("<LTIME>"))
print ldate "," ltime
## do LTEXT yourself.
}' file
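The script above leaves LTEXT as an exercise. The following sketch finishes it in the same index()/substr() style and prints comma-separated output. For each tag, the substring length is the closing-tag position minus the opening-tag position minus the length of the opening tag. The function name to_csv_index is hypothetical. The code has not been checked against Solaris /usr/bin/awk specifically; if that awk rejects it, nawk may work.

```shell
#!/bin/sh
# Hypothetical helper: index()/substr() extraction of LDATE, LTIME and the
# city value inside LTEXT. Not verified against Solaris /usr/bin/awk.
to_csv_index() {
  awk '{
  a=index($0,"<LDATE>"); b=index($0,"</LDATE>")
  ldate=substr($0, a+length("<LDATE>"), b-a-length("<LDATE>"))
  c=index($0,"<LTIME>"); d=index($0,"</LTIME>")
  ltime=substr($0, c+length("<LTIME>"), d-c-length("<LTIME>"))
  e=index($0,"<LTEXT>"); f=index($0,"</LTEXT>")
  ltext=substr($0, e+length("<LTEXT>"), f-e-length("<LTEXT>"))
  # ltext is now name:anju;city:blore;ph:123. Keep the text after "city:"
  # up to (but not including) the next ";", or "" if there is no "city:".
  city=""
  g=index(ltext,"city:")
  if (g > 0) {
    city=substr(ltext, g+length("city:"))
    h=index(city,";")
    if (h > 0) city=substr(city, 1, h-1)
  }
  print ldate "," ltime "," city
  }'
}

printf '%s\n' '<LDATE>10-12-07</LDATE><LTIME>13:47:48.553</LTIME><LTEXT>name:anju;city:blore;ph:123</LTEXT>' | to_csv_index
# prints: 10-12-07,13:47:48.553,blore
```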
sed 's/<LDATE>//;s/<\/LDATE>/,/;s/<LTIME>//;s/<\/LTIME>/,/;s/<LTEXT>//;s/<\/LTEXT>/,/;s/name:.*city://;s/;ph.*$//' filename
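The original request was for a new comma-separated file rather than in-place edits. The following sketch runs the sed one-liner above over a small four-line sample and redirects the result to a separate file. The names tagged_file and out.csv are hypothetical, and a temporary directory from mktemp -d keeps the example from touching real files.

```shell
#!/bin/sh
# Hypothetical file names: tagged_file (input) and out.csv (output),
# both inside a fresh temporary directory.
tmp=$(mktemp -d)
line='<LDATE>10-12-07</LDATE><LTIME>13:47:48.553</LTIME><LTEXT>name:anju;city:blore;ph:123</LTEXT>'
# Build a four-line sample input like the one in the question.
for i in 1 2 3 4; do printf '%s\n' "$line"; done > "$tmp/tagged_file"

# Remove the opening tags, turn the closing tags into commas, then keep only
# the city value (the final s/;ph.*$// also drops the trailing comma).
# Output goes to a new file, so the input is not modified.
sed 's/<LDATE>//;s/<\/LDATE>/,/;s/<LTIME>//;s/<\/LTIME>/,/;s/<LTEXT>//;s/<\/LTEXT>/,/;s/name:.*city://;s/;ph.*$//' \
    "$tmp/tagged_file" > "$tmp/out.csv"

cat "$tmp/out.csv"
```

To produce the requested CSV, point the same sed command at the real input and redirect to a new file.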