Extracting a string between two tags in an XML file

Padmanabhan · June 1, 2013, 6:31am

I need to extract the string between two tags.
The input file is:

<PersonInfoShipTo AddressID="446311709" AddressLine1="" AddressLine2="" AddressLine3="" AddressLine4="" AddressLine5="" AddressLine6="" AlternateEmailID="" Beeper="" City="" Company="" Country="" DayFaxNo="" DayPhone="" Department="" EMailID="" EveningFaxNo="" EveningPhone="" FirstName="la" IsAddressVerified="Y" JobTitle="" LastName="la" MiddleName="" MobilePhone="" OtherPhone="" PersonID="" PersonInfoKey="201204240041014009667499" State="" Suffix="" Title="" ZipCode=""/>

I need to extract everything between <PersonInfoShipTo and /> and put it in another file.
I tried the following code:

awk '/PersonInfoshipTo /, ///' input2.xml | sed '$d'
awk '/PersonInfoshipTo /{s=x}{s=s$0"\n"}/Line13/{p=1}/Canceled/ && p{print s;exit}' file
sed -e 's/PersonInfoshipTo \(.*\)>/\1/'

Please help with your ideas
Thanks:)

Scrutinizer · June 1, 2013, 6:35am

What should your output look like?

Padmanabhan · June 1, 2013, 6:40am

This is the output I need:

AddressID="446311709" AddressLine1="" AddressLine2="" AddressLine3="" AddressLine4="" AddressLine5="" AddressLine6="" AlternateEmailID="" Beeper="" City="" Company="" Country="" DayFaxNo="" DayPhone="" Department="" EMailID="" EveningFaxNo="" EveningPhone="" FirstName="la" IsAddressVerified="Y" JobTitle="" LastName="la" MiddleName="" MobilePhone="" OtherPhone="" PersonID="" PersonInfoKey="201204240041014009667499" State="" Suffix="" Title="" ZipCode=""

Scrutinizer · June 1, 2013, 9:15am

Hi, see if this works:

awk 'sub("^" s FS,x) && sub(/\/>\n/,x)' s=PersonInfoShipTo RS=\< file > newfile
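
To unpack what that one-liner does, here is a commented sketch of the same idea. Setting RS to "<" makes every record start right after an opening angle bracket, so the element can span several lines. The file name sample.xml, its shortened attribute list, and the variable name tag are assumptions made for illustration only.

```shell
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample: the element is deliberately split over two lines.
cat > sample.xml <<'EOF'
<Order>
<PersonInfoShipTo AddressID="446311709"
    FirstName="la" LastName="la"/>
</Order>
EOF

awk -v tag=PersonInfoShipTo 'BEGIN { RS = "<" }   # one record per "<"
{
    # Strip the leading tag name plus one space, then the trailing "/>"
    # and its newline; print only when both substitutions succeeded.
    if (sub("^" tag " ", "") && sub(/\/>\n/, ""))
        print
}' sample.xml > newfile

cat newfile
```

With this sample, newfile should hold the two attribute lines without the tag name and without the closing "/>". Like the original, it only matches when "/>" is immediately followed by a newline.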

If it is always on one line, you could try:

sed -n 's|^<PersonInfoShipTo \(.*\)/>|\1|p' file > newfile

Otherwise you could try using an XML parser...
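
For the one-line case, a minimal sketch of a slightly more tolerant sed variant, assuming the element may be indented or followed by trailing whitespace. The file sample.xml and its shortened attribute list are made up for the example.

```shell
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample with an indented element.
cat > sample.xml <<'EOF'
<Order>
  <PersonInfoShipTo AddressID="446311709" FirstName="la" LastName="la" ZipCode=""/>
</Order>
EOF

# Allow whitespace before the tag and after the closing "/>";
# -n plus the p flag prints only the lines where the substitution happened.
sed -n 's|^[[:space:]]*<PersonInfoShipTo \(.*\)/>[[:space:]]*$|\1|p' sample.xml > newfile

cat newfile
```

This still assumes one element per line; if several elements share a line, the greedy .* would capture too much.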

Padmanabhan · June 1, 2013, 9:52pm

Hi, it works perfectly. Thanks a lot!
I have one more requirement: I have modified the extracted data in a file, and I need to insert it back exactly where I took it from,
i.e. between <PersonInfoShipTo and />.
Thanks for your help.

MadeInGermany · June 2, 2013, 9:11am

This replaces the whole matching line(s), including the <PersonInfoShipTo and /> delimiters, with the contents of newfile:

sed -e '/^<PersonInfoShipTo .*\/>/ {r newfile' -e 'd;}' file
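
If newfile holds only the attribute string (as produced by the extraction step), the tag name and the closing "/>" have to be put back around it. A possible sketch with awk, assuming the element sits on one line, newfile is a single non-empty line, and the file names sample.xml and updated.xml are placeholders:

```shell
# Work in a scratch directory so nothing existing is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical input file with a shortened attribute list.
cat > sample.xml <<'EOF'
<Order>
<PersonInfoShipTo AddressID="446311709" FirstName="la" LastName="la"/>
</Order>
EOF

# Hypothetical edited attribute string, as it would sit in newfile.
printf '%s\n' 'AddressID="446311709" FirstName="Lara" LastName="la"' > newfile

awk 'NR == FNR { attrs = $0; next }    # first file: remember the attributes
     /^<PersonInfoShipTo .*\/>/ {      # second file: rebuild the element
         print "<PersonInfoShipTo " attrs "/>"
         next
     }
     { print }                         # every other line passes through
' newfile sample.xml > updated.xml

cat updated.xml
```

The result goes to a separate file rather than being edited in place, so the original can be compared before it is overwritten.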