I have data coming in the format below for each record:
<?xml version="1.0" encoding="UTF-8" standalone="no"?><test_sox xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><testdetials>....</test_sox>
<?xml version="1.0" encoding="UTF-8" standalone="no"?><test_sox xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><testdetials>....</test_sox>
I need to remove the XML prolog below from the 2nd record through the last record:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
The output should look like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?><test_sox xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><testdetials>....</test_sox>
<test_sox xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><testdetials>....</test_sox>
I might have more than 100 records, and this should apply to every record starting from the 2nd.
Hello dsravanam,
Could you please use code tags instead of quotes for commands, inputs, and code in your posts, as per forum rules? Because code tags were not used, your Input_file did not display properly, so let's say you have the following Input_file:
cat Input_file
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<test_sox xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><testdetials>....</test_sox>
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<test_sox xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><testdetials>....</test_sox>
The following code may help:
awk 'NR==1{print;next} !/<\?xml version=\"1\.0\" encoding=\"UTF-8\" standalone=\"no\"\?>/{print}' Input_file
Output will be as follows.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<test_sox xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><testdetials>....</test_sox>
<test_sox xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><testdetials>....</test_sox>
On a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk. Also, if you have a different requirement, please show us sample input with the expected output, along with your OS details, so we can help you.
Thanks,
R. Singh
RudiC
February 10, 2016, 10:32am
3
How about
awk 'NR>1 {sub (/^<[^>]*>/, "")} 1' file
<?xml version="1.0" encoding="UTF-8" standalone="no"?><test_sox xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><testdetials>....</test_sox>
<test_sox xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><testdetials>....</test_sox>
---------- Post updated at 16:32 ---------- Previous update was at 16:29 ----------
You may want to refine the regex for the sub function to be more specific if need be.
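One possible refinement, as a sketch only: match the literal <?xml ... ?> declaration instead of any leading tag, so records that do not start with a prolog are left untouched. The wrapper name strip_prolog is made up for illustration and is not part of the thread.

```shell
# Hypothetical helper: remove a leading XML declaration from every record
# except the first. Reads the named file(s), or stdin if none are given.
strip_prolog() {
    awk 'NR > 1 { sub(/^<\?xml[^>]*\?>/, "") } 1' "$@"
}
```

It would be called as strip_prolog file, the same way as the one-liner above.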
Thanks RudiC. If I have to replace the string below at the end of each record except the last record, what should I use?
</test_sox>
RudiC
February 10, 2016, 12:56pm
5
Are you sure? That would result in invalid XML, as the <test_sox ...> elements would not all be closed.
RudiC,
I am sorry for the confusion. Actually, from the 2nd record onwards I also need to remove the string below:
<test_sox xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<?xml version="1.0" encoding="UTF-8" standalone="no"?><test_sox xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><testdetials>....</test_sox>
<test_sox xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><testdetials>....</test_sox>
RudiC
February 10, 2016, 1:34pm
7
Confusion grows. What should the result look like?
Input:
<?xml version="1.0" encoding="UTF-8" standalone="no"?><test_sox xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><testdetials>....</test_sox>
<?xml version="1.0" encoding="UTF-8" standalone="no"?><test_sox xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><testdetials>....</test_sox>
Output:
<?xml version="1.0" encoding="UTF-8" standalone="no"?><test_sox xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><testdetials>....</test_sox>
<testdetials>....</test_sox>
RudiC
February 10, 2016, 2:26pm
9
Correct me if I'm wrong - doesn't that have one closing </test_sox> too many?
---------- Post updated at 20:26 ---------- Previous update was at 19:38 ----------
How about
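awk '
NR > 1  {sub (/^<[^>]*>/, "")           # remove the XML declaration
         sub (/^<[^>]*>/, "")           # remove the opening <test_sox ...> tag
        }
        {sub (/<\/test_sox>.?$/, "")    # remove the closing tag from every record
        }
1
END     {print "</test_sox>"            # close the root element once
        }
' file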
<?xml version="1.0" encoding="UTF-8" standalone="no"?><test_sox xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><testdetials>....
<testdetials>....
</test_sox>
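If the generic /^<[^>]*>/ pattern feels too loose, a variant along these lines might be safer. It is only a sketch: the function name merge_records and the root parameter are invented here, and it assumes one record per line and a root element name without regex metacharacters.

```shell
# Hypothetical helper: merge one-document-per-line records into a single
# document. Usage: merge_records ROOT [file...]; ROOT is the root element name.
merge_records() {
    root=$1
    shift
    awk -v root="$root" '
        NR > 1 {
            sub(/^<\?xml[^>]*\?>/, "")            # drop the repeated declaration
            sub("^<" root "([ \t][^>]*)?>", "")   # drop the repeated root start tag
        }
        {
            sub("</" root ">[ \t\r]*$", "")       # drop the per-record end tag
            print
        }
        END { if (NR) print "</" root ">" }       # close the root element once
    ' "$@"
}
```

For the sample in this thread it would be run as merge_records test_sox file. Naming the tags explicitly means a record that lacks the prolog or the root start tag passes through without losing its first element.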