How can I break a text file into parts that occur between a specific pattern

abhinav192 · November 11, 2009, 7:04am

How can I break a text file into parts that occur between a specific pattern?

I have a text file containing several XML documents, each of which starts with the tag "<?xml version="1.0" encoding="utf-8"?>". I have to break the whole file into several XML files by looking for the above pattern.

All the tags occurring between two occurrences of the pattern should form a separate XML file.

Please help.

Thanks in advance

panyam · November 11, 2009, 7:29am

Please post the input data and the output you expect.

abhinav192 · November 11, 2009, 7:39am

Input is >>
<fileName>Name1</fileName><?xml version="1.0" encoding="utf-8"?><Name> Abhinav</Name>
<Age>12</Age>
<fileName>Name2</fileName><?xml version="1.0" encoding="utf-8"?><Name> Abhinav</Name>
<Age>15></Age>

Desired output will be >>

Two XML files named Name1 and Name2, as given in the input, each holding the contents that follow its name in the input:

1st file
<?xml version="1.0" encoding="utf-8"?><Name> Abhinav</Name>
<Age>12</Age>

2nd file
<?xml version="1.0" encoding="utf-8"?><Name> Abhinav</Name>
<Age>15></Age>

Actually, we can search for the fileName tag.

I am new to UNIX, so I am stuck.

panyam · November 11, 2009, 7:56am

Something like this:

awk '/<fileName>/ {co++; print >> ("File_" co); next} {print >> ("File_" co)}' File_name.txt

abhinav192 · November 11, 2009, 8:13am

In your reply, is File_name.txt the input file or the output?

---------- Post updated at 08:13 AM ---------- Previous update was at 08:00 AM ----------

Hey panyam, that was quite useful.
Thanks.

But I need the names given in the fileName tag to be used as the names of the files that are created.

For example: Name1.xml and Name2.xml

Franklin52 · November 11, 2009, 9:40am

Try this:

awk -F"[<>]" '/fileName/{f=$3".xml"}{print > f}' file

abhinav192 · November 11, 2009, 9:44am

Thanks, Franklin.

Great help.

Being new, I don't understand the script you wrote.

Can you please explain it a bit, or send a mail? My mail id is

Franklin52 · November 11, 2009, 9:51am

awk -F"[<>]" '/fileName/{f=$3".xml"}{print > f}' file

Explanation:

awk -F"[<>]"                    # set the field separators to "<" and ">"
'/fileName/{f=$3".xml"}         # if the current line contains fileName, set the output filename to field 3 plus ".xml"
{print > f}                     # print the current line to that file
' file
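
To see why field 3 holds the name, here is a small sketch that prints the first four fields of one sample line. The line is modelled on the input posted above (assuming the closing tag is </fileName>), and the variable names line and fields are only for illustration.

```shell
# One sample line, modelled on the input posted earlier in the thread.
line='<fileName>Name1</fileName><?xml version="1.0" encoding="utf-8"?><Name> Abhinav</Name>'

# Split on every "<" or ">" and show the first four fields.
fields=$(printf '%s\n' "$line" |
  awk -F'[<>]' '{ for (i = 1; i <= 4; i++) printf "field %d = [%s]\n", i, $i }')

printf '%s\n' "$fields"
# field 1 = []
# field 2 = [fileName]
# field 3 = [Name1]
# field 4 = [/fileName]
```

Field 1 is empty because the line begins with a separator, so the name between the tags ends up in field 3.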

Have a read of one of the awk tutorials here:

http://www.unix.com/answers-frequently-asked-questions/13774-unix-tutorials-programming-tutorials-shell-scripting-tutorials.html

abhinav192 · November 12, 2009, 5:00am

The job has been done, but the only thing left is that the XML that is getting generated also contains the fileName tag.

Can we generate the XML files starting from <?xml version="1.0" encoding="utf-8"?>

rather than from the <fileName>Name1</fileName> tag?

This is required because the generated XML is getting corrupted by that fileName tag.

Franklin52 · November 12, 2009, 5:19am

Something like this?

awk -F"[<>]" '/fileName/{f=$3".xml";sub(".*fileName>","")}{print > f}' file
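
For anyone who wants to try it safely first, here is a rough end-to-end sketch that runs the same idea in a scratch directory. The sample data is modelled on the input posted above (assuming the closing tag is </fileName>); input.txt and workdir are made-up names. The close(f) call and the f guard are my own additions, not part of the command above: close(f) avoids running out of open files when there are many outputs, and the guard skips any lines that come before the first fileName tag.

```shell
# Work in a scratch directory so the generated .xml files stay out of the way.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input modelled on the one posted earlier in the thread.
cat > input.txt <<'EOF'
<fileName>Name1</fileName><?xml version="1.0" encoding="utf-8"?><Name> Abhinav</Name>
<Age>12</Age>
<fileName>Name2</fileName><?xml version="1.0" encoding="utf-8"?><Name> Abhinav</Name>
<Age>15></Age>
EOF

# On a fileName line: close the previous output file, take the new name from
# field 3, then strip everything up to and including the closing fileName tag.
# Every line seen after the first fileName tag is written to the current file.
awk -F'[<>]' '
/fileName/ { if (f) close(f); f = $3 ".xml"; sub(/.*fileName>/, "") }
f          { print > f }
' input.txt

cat Name1.xml
# <?xml version="1.0" encoding="utf-8"?><Name> Abhinav</Name>
# <Age>12</Age>
```

Note that if the same name appears twice in the input, the close(f) variant overwrites the earlier file when it is reopened, so drop that call if names can repeat.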