How to get XML tags out of a text file

Sorry to trouble you guys again, but I've run into this problem.
My text file contains this:
2006-01-12 01:12:08,290 [ExecuteThread: '1' for queue: 'default'] INFO - The XML message **************<PM_ARRIVAL xmlns:xsi=
"http://www.w3.org/2001/XMLSchemainstance"><system_c>GMS</system_c><trans_c>ARLC</trans_c></<PM_ARRIVAL>
2006-01-12 01:12:08,303 [ExecuteThread: '1' for queue: 'default'] INFO - Root Node is [PM_ARRIVAL]
2006-01-12 01:12:10,009 [ExecuteThread: '2' for queue: 'default'] INFO - message received...
2006-01-12 01:12:10,009 [ExecuteThread: '2' for queue: 'default'] INFO - The XML message **************<berth_allocation xmln
s:xsi="http://www.w3.org/2001/XMLSchemainstance"><system_c>BPMS</system_c><trans_c>BPMSMessage</trans_c><trans_dt>2006-01-12T01:12:09.601+08:00</trans_dt><message><record xmlns:xsi="http://www.w3.org/2001/XMLSchemainstance"><</berth_allocation>
2006-01-12 01:12:10,015 [ExecuteThread: '2' for queue: 'default'] INFO - Root Node is [berth_allocation]
2006-01-12 01:12:10,021 [ExecuteThread: '2' for queue: 'default'] INFO - XML messages retrieved:<record xmlns:xsi="http://www
.w3.org/2001/XMLSchemainstance"><func_c>U</func_c><vv_c>20744</vv_c><vessel_m>BUNGATERATAIDUA</vessel_m><abbr_vessel_m>BTERATAI2</abbr_vessel_m><voyage_out_n>4101</voyage_out_n><abbr_voyage_out_n>4101</abbr_voyage_out_n></record>
2006-01-12 02:07:23,179 [ExecuteThread: '1' for queue: 'default'] INFO - message received...
2006-01-12 02:07:23,179 [ExecuteThread: '1' for queue: 'default'] INFO - The XML message **************<PM_EXIT xmlns:xsi="ht
tp://www.w3.org/2001/XMLSchemainstance"><system_c>GMS</system_c><trans_c>EXLC</trans_c><trans_dt>200601120206</trans_dt><user_id_m>PD$CYM</user_id_m></PM_EXIT>
2006-01-12 02:07:23,185 [ExecuteThread: '1' for queue: 'default'] INFO - Root Node is [PM_EXIT]
2006-01-12 02:08:19,633 [ExecuteThread: '2' for queue: 'default'] INFO - message received...

I only want to get the PM_EXIT, PM_ARRIVAL and record elements, from each opening tag through its closing tag.
I can get the record element out with this sed command:
sed -n -e '/<record /{N;s_.*\(<record .*<\/record>\).*_\1_p;}' file.record.txt
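For reference, here is the record command taken apart. This is a sketch run on a made-up, shortened two-line sample (not the real file), and it assumes a sed such as GNU sed, in which `.` also matches the newline that N inserts into the pattern space.

```shell
#!/bin/sh
# Hypothetical sample: one <record> element that wraps onto a second line,
# shortened from the log above.
#
# -n             : do not print lines automatically
# /<record /{    : only on a line containing "<record " ...
# N              : ... append the next line (after a newline) to the pattern space
# s_..._\1_p     : "_" is the delimiter of the s command; replace the whole
#                  pattern space with group \1, i.e. "<record " through
#                  "</record>", and print the result (p)
printf '%s\n' \
  'INFO - XML messages retrieved:<record xmlns:xsi="http://www' \
  '.w3.org/2001/XMLSchemainstance"><vv_c>20744</vv_c></record>' |
  sed -n -e '/<record /{N;s_.*\(<record .*<\/record>\).*_\1_p;}'
```

Because the newline added by N is inside the matched group, the extracted element is still printed on two lines.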
However, when I use the same command for PM_ARRIVAL and PM_EXIT, I get:
sed: command garbled: /<PM_EXIT /{N;s_.*\(<PM_EXIT .*<\/PM_EXIT>\).*_\1_p;}

Any ideas?

Try...

awk -v X='(PM_ARRIVAL|record|PM_EXIT)' 'match($0,"<" X ".*" X ">"){print substr($0,RSTART,RLENGTH)}' file1

Hi, I got this error when I tried this code:
awk: syntax error near line 1
awk: bailing out near line 1

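Perhaps use nawk or gawk? The old awk on some systems (for example /usr/bin/awk on Solaris) does not support the -v option or the match() function.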

Thanks for your great help. Could you explain how the code works, so that I can use it to extract elements from other XML files?

Details of the awk match function can be found in the awk manual: Built-in Functions for String Manipulation
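Here is the same awk program spread out with comments, as a sketch. The sample lines are made-up, shortened versions of the log above, and if your awk gives the syntax error shown earlier, run it with nawk or gawk instead.

```shell
#!/bin/sh
# X is an alternation of the element names to extract; adding another name
# (for example berth_allocation) is how this could be reused for other tags.
X='(PM_ARRIVAL|record|PM_EXIT)'

# Hypothetical, shortened sample lines based on the log above.
printf '%s\n' \
  'INFO - XML messages retrieved:<record><vv_c>20744</vv_c></record>' \
  'INFO - The XML message <PM_EXIT><trans_c>EXLC</trans_c></PM_EXIT>' \
  'INFO - Root Node is [PM_EXIT]' |
  awk -v X="$X" '
  {
      # "<" X ".*" X ">" concatenates strings into the dynamic regex
      #   <(PM_ARRIVAL|record|PM_EXIT).*(PM_ARRIVAL|record|PM_EXIT)>
      # match() returns the start of the leftmost, longest match (0 if none)
      # and sets RSTART and RLENGTH to its position and length.
      if (match($0, "<" X ".*" X ">"))
          # print only the matched part of the line
          print substr($0, RSTART, RLENGTH)
  }'
```

The first two lines print `<record><vv_c>20744</vv_c></record>` and `<PM_EXIT><trans_c>EXLC</trans_c></PM_EXIT>`; the last line has no match and prints nothing. Note that awk works one line at a time, so an element split across two lines will not be matched by this program.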

Thanks a lot for the link! :)

As for the original question, you have to escape the underscore, as in \_, because _ is also the delimiter of your s command (s_..._..._). Thus:

sed -n -e '/<PM_EXIT /{N;s_.*\(<PM\_EXIT .*<\/PM\_EXIT>\).*_\1_p;}' file.record.txt
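Another way around the problem is to pick an s delimiter that does not occur in the pattern, such as |. Then PM_EXIT needs no escaping, and neither does the / in </PM_EXIT>. A minimal sketch on a made-up, shortened two-line sample, again assuming a sed (such as GNU sed) whose `.` matches the newline added by N:

```shell
#!/bin/sh
# Same extraction, but "|" is the s delimiter, so "_" and "/" are ordinary
# characters in the regex. The sample lines are hypothetical.
printf '%s\n' \
  'INFO - The XML message <PM_EXIT xmlns:xsi="ht' \
  'tp://www.w3.org/2001/XMLSchemainstance"><trans_c>EXLC</trans_c></PM_EXIT>' |
  sed -n -e '/<PM_EXIT /{N;s|.*\(<PM_EXIT .*</PM_EXIT>\).*|\1|p;}'
```

The same change works for PM_ARRIVAL by swapping the element name.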

Cheers,

Keith

Thanks a lot, kduffin :)