Sed: Splitting a large file into smaller files based on recursive regular expression match

I will simplify the explanation a bit. I need to parse through an 87m file.

I have a single text file in this form:

<NAME>house........
SOMETEXT
SOMETEXT 
SOMETEXT
.
.
.
.
</script>
MORETEXT
MORETEXT
.
.
.
.

<NAME>car........
SOMETEXT
SOMETEXT 
SOMETEXT
.
.
.
.
</script>
MORETEXT
MORETEXT
.
.
.
.
<NAME>boat........
SOMETEXT
SOMETEXT 
SOMETEXT
.
.
.
.
</script>
MORETEXT
MORETEXT
.
.
.
.

I want to extract <NAME>, </script>, and all lines between the two, and place them into their respective files,

ending up with

file1.txt

<NAME>house........
SOMETEXT
SOMETEXT 
SOMETEXT
.
.
.
.
</script>

file2.txt

<NAME>car........
SOMETEXT
SOMETEXT 
SOMETEXT
.
.
.
.
</script>

file3.txt

<NAME>boat........
SOMETEXT
SOMETEXT 
SOMETEXT
.
.
.
.
</script>

I have searched sed one-liners, used the search feature here, and looked in my O'Reilly sed/awk pocket guide, but nothing really provides a solution.

Thanks in advance. SORRY FOR THE REEDIT !!!

Your written specification does not fit your sample output file: you ask for the tokens "and all lines between the two", but you have MORETEXT etc. in your file1.txt etc., which is outside the two.
If the sample is what you want, that would be easy, if not in sed, then certainly in awk:

$ awk '/<NAME>/ {FN="File"++i".txt"}; {print >FN}' file

You are absolutely correct. Dang cut and paste makes ya lazy .. lemme edit and fix.

Reedited - fine! Do you think you can find the right answer with the starting point given above?

Depending on how many <NAME> lines there are in the input file, you might have to close the output files when you're done writing to them:

{if(FN)close(FN);FN="File"++i".txt"}

Hey RudiC .... I haven't tried this yet. They just reimaged my laptop with Win 7 and my access to everything is hosed. Been working on that ... as soon as I'm back up, I'll give this a try ...

Meanwhile Thanks !!

A modified version

awk '/<NAME>/{if(FN)close(FN);FN="File"++i".txt";p=1}p{print >FN}/script/{p=0}' file

--ahamed
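
For anyone who wants to see the pieces spelled out, here is a commented sketch of the modified version above, run against a tiny made-up sample. One change is my own assumption and not from the posts: the end pattern is tightened to the closing tag, since /script/ also matches an opening <script> line and would switch printing off early if the block contains one. The name sample.txt and its contents are hypothetical.

```shell
# Build a small sample input (made-up content, same shape as the question).
cat > sample.txt <<'EOF'
<NAME>house
<script>
SOMETEXT
</script>
MORETEXT

<NAME>car
SOMETEXT
</script>
MORETEXT
EOF

awk '
  # A <NAME> line starts a new record: close the previous file, open the next.
  /<NAME>/ {
      if (FN) close(FN)
      FN = "File" (++i) ".txt"
      p = 1
  }
  # While inside a record, copy the line to the current output file.
  p { print > FN }
  # Only the closing tag ends the record; an opening <script> line does not.
  /<\/script>/ { p = 0 }
' sample.txt
```

The print rule sits before the end-tag rule on purpose, so the </script> line itself is still written before printing is switched off. With this sample, File1.txt and File2.txt each end at their </script> line and neither contains MORETEXT.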