Split a file into many files based on a condition

Hi Everyone,

file.txt

+++
a
b
c

+++
d


+++
asdf fefe
fff

I would like to have this output:
file1.txt

+++
a
b
c

file2.txt

+++
d

file3.txt

+++
asdf fefe
fff

Is there a simple way to do that, e.g. with awk or perl?

Thanks

Hi,

Do you want to split the records into separate files based on '+++'?
Could you explain your issue in more detail?

Thanks, yes, I want to split wherever there is a '+++' line. (I made a mistake in file3.txt earlier; the expected output is corrected now.)

Try this:

awk '/^\+\+\+/{c++} {print > ("file" c ".txt")}' file.txt
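A minimal sketch of the same split that closes each output file when the next record begins, which keeps long inputs under the open-file limit that some awk implementations have. It also skips any lines before the first +++, instead of letting an empty counter turn the name into file.txt, the input file itself. It assumes separator lines are exactly +++ and writes the question's sample data into a throwaway directory made with mktemp -d.

```shell
#!/bin/sh
# Work in a scratch directory so the sample files do not clobber anything.
cd "$(mktemp -d)" || exit 1

# Sample input from the question: every record starts with a "+++" line.
printf '+++\na\nb\nc\n\n+++\nd\n\n\n+++\nasdf fefe\nfff\n' > file.txt

awk '
/^\+\+\+$/ {                      # a new record starts
    if (out != "") close(out)     # finish the previous file first
    out = "file" (++c) ".txt"     # file1.txt, file2.txt, ...
}
out != "" { print > out }         # lines before the first "+++" are skipped
' file.txt
```

Like the original command, this keeps the empty lines at the end of each record (file1.txt ends with one).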
$ cat buf
+++
a
b
c

+++
d


+++
asdf fefe
fff
$ awk '/^\+/ { f=++i".txt" } { print >> f }' buf
$ ls [1-3].txt
1.txt   2.txt   3.txt
$ cat -n [1-3].txt
     1  +++
     2  a
     3  b
     4  c
     5
     1  +++
     2  d
     3
     4
     1  +++
     2  asdf fefe
     3  fff

:b: Works perfectly. If I also want to remove the empty lines at the beginning and end of each file [1-3].txt (but not the empty lines in the middle), what should I do? Please advise. Thanks

To remove all empty lines:

$ awk '/^\+/ { f=++i".txt" } !/^$/{ print >> f }' buf

Thanks, but I only want to remove the empty lines at the beginning and end of each file, not the ones in the middle.
For example, for 1.txt your output is

+++
a
b
c

but if there are a few empty lines between b and c, those middle empty lines must not be removed.
Thanks

Try this:

awk '/^\+/{c++; s=""}
!NF && c {s = s ORS; next}
{print s $0 > ("file" c ".txt"); s=""}' file.txt
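The idea behind this command is to hold empty lines back until a later non-empty line in the same record shows they belong in the middle; empty lines still pending when the next +++ line (or the end of input) arrives are never printed. A spelled-out sketch that appends exactly one ORS per held-back line, assuming separator lines are exactly +++ and using a made-up sample with two empty lines between b and c:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Made-up input: record 1 has two empty lines between b and c (kept)
# and one empty line at its end (dropped).
printf '+++\na\nb\n\n\nc\n\n+++\nd\n\n\n+++\nasdf fefe\nfff\n' > file.txt

awk '
/^\+\+\+$/ {                      # new record: switch output file
    if (out != "") close(out)
    out = "file" (++c) ".txt"
    pending = ""                  # forget empty lines that ended the previous record
}
!NF {                             # empty or blank line: hold it back for now
    if (out != "") pending = pending ORS
    next
}
out != "" {                       # non-empty line: emit held-back empty lines first
    print pending $0 > out
    pending = ""
}
' file.txt
```

Because the test is !NF, lines containing only spaces or tabs count as empty too.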

Using only bash:

while IFS= read -r L
do	[ -z "$L" ] && continue		# skips every empty line, middle ones included
	[ "$L" = "+++" ] && ((i++))
	printf '%s\n' "$L" >> "file$i.txt"
done < file.txt
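The loop above drops every empty line, including the ones in the middle of a record. A POSIX sh sketch that instead counts empty lines and writes them out only when another non-empty line follows in the same record, so leading and trailing ones disappear. It assumes separator lines are exactly +++, treats only truly empty lines as empty, and uses made-up sample data:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '+++\na\nb\n\n\nc\n\n+++\nd\n\n\n+++\nasdf fefe\nfff\n' > file.txt

i=0         # record number
pending=0   # empty lines seen since the last non-empty line
out=        # current output file (empty until the first "+++")
# The "|| [ -n ... ]" part also keeps a final line that lacks a trailing newline.
while IFS= read -r L || [ -n "$L" ]; do
    if [ "$L" = "+++" ]; then
        i=$((i + 1))
        out="file$i.txt"
        : > "$out"      # start with an empty file, even if the script is rerun
        pending=0       # empty lines at the end of the previous record are dropped
    elif [ -z "$L" ]; then
        pending=$((pending + 1))
        continue
    fi
    [ -n "$out" ] || continue       # ignore lines before the first "+++"
    while [ "$pending" -gt 0 ]; do  # these empty lines are in the middle: keep them
        printf '\n' >> "$out"
        pending=$((pending - 1))
    done
    printf '%s\n' "$L" >> "$out"
done < file.txt
```

Truncating each file with `: >` when its +++ line is read means running the script twice gives the same result, whereas appending with >> alone would keep adding to old output.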

:b: Thanks Franklin52, and Frans too.
I learnt a lot about awk from you.

Thanks

---------- Post updated at 06:52 AM ---------- Previous update was at 06:45 AM ----------

:confused: I have another question; can this be done at the same time?
Right now, the file names are "file1.txt", "file2.txt" and "file3.txt".
Instead of fixed names like [1-3].txt, can I have file names like:
file1.txt to a.txt
file2.txt to d
file3.txt to asdf fefe

That is, each file is named after its 2nd line. Please advise.

Thanks

Try:

awk '/^\+/{p=$0;getline;f=$0;$0=p RS $0;s=""}
!NF && f {s = s ORS; next}
{print s $0 > f; s=""}' file.txt

Modified to match your needs:

while IFS= read -r L
do
    [ -z "$L" ] && continue
    [ "$L" = "+++" ] && { IFS= read -r L; F="$L.txt"; echo "+++" > "$F"; }
    printf '%s\n' "$L" >> "$F"
done < file.txt

Thanks Franklin52 and Frans, you guys are :b::b::b:

---------- Post updated at 08:01 AM ---------- Previous update was at 07:50 AM ----------

Hi Frank, which part of your awk takes the 2nd line's value as the file name? What if I want to use the 3rd line or another line instead (assuming that line has a value)?
Thanks

I tried many ways to modify your awk, but failed to get another line's value as the file name. :frowning: Please advise.

That happens on this line:

/^\+/{p=$0;getline;f=$0;$0=p RS $0;s=""}

Explanation:

/^\+/ 		-> When the line starts with a '+' then
p=$0		->   Store the line in p
getline		->   get the next line (with the new file name) and proceed with the next command
f=$0		->   Store the filename in variable f
$0=p RS $0	->   Combine the previous line (+++) with the current line, so both are printed together
s=""		->   Empty the buffer of held-back empty lines
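To see what this first rule does to a single record, here is a small demo on made-up input. The check on getline's return value is an addition that is not in the original command; it simply stops if there is no line after +++.

```shell
#!/bin/sh
# Made-up one-record input; only the first rule of the answer is reproduced.
result=$(printf '+++\na\nb\n' | awk '
/^\+/ {
    p = $0                        # remember the "+++" line
    if ((getline) <= 0) exit      # load the next line into $0; stop at EOF or error
    f = $0                        # that line is the file name
    $0 = p RS $0                  # rebuild $0 as "+++", newline, name line
    print "name=" f
}
{ print }                         # prints the rebuilt record, then the other lines
')
printf '%s\n' "$result"
```

The output is name=a followed by +++, a and b: the name line is still written to the file because $0 was rebuilt before the print rule runs.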

Hope this helps.

Thanks Frank,
I modified it as below, but it failed; actually I tried lots of combinations.

awk '/^\+/{p=$0;getline;{if($NR==2) {f=$0}};$0=p RS $0;s=""} !NF && f {s=s?s ORS s:ORS;next} {print s $0 > f; s=""}' tmp

Please tell me where it goes wrong. Thanks

Can you explain what you're trying to achieve?

Hi Frank,
let's say file.txt is:

+++
a1
a2

a4

+++
b1
b2
b3

The output:
a2.txt

+++
a1
a2

a4

b2.txt

+++
b1
b2
b3

Using the awk below, which I modified from yours, I cannot get that output:

awk '/^\+/{p=$0;getline;{if($NR==3) {f=$0}};$0=p RS $0;s=""} !NF && f {s=s?s ORS s:ORS;next} {print s $0 > f; s=""}' file.txt

Please advise.
Thanks

I still don't understand what you're trying to achieve.

Did you get errors or a wrong output with my last command?

Hi Frank,

Your awk works perfectly, but the new output files are a1.txt and b1.txt.
What if I want a2.txt and b2.txt instead, i.e. the name should come not from the line right after "+++" but from the line after that?
Sorry again if I did not explain it well. Thanks
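Franklin52's command always takes the line right after +++ because its single getline reads exactly one line. In the modified attempts, $NR is the field whose number equals NR, not the NR-th line of the record, so that comparison cannot select a line. One option is to buffer a whole record and write it only when the next +++ (or the end of input) arrives. A minimal sketch, where the -v n=... parameter and the "record" fallback name are hypothetical, counting the +++ line as line 1; with n=3 it should reproduce the a2.txt/b2.txt output from the example:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
printf '+++\na1\na2\n\na4\n\n+++\nb1\nb2\nb3\n' > file.txt

# n is a hypothetical parameter: the line of each record that names its file,
# counting the "+++" line as line 1 (n=2 gives a1.txt, n=3 gives a2.txt).
awk -v n=3 '
function flush(    i, name) {
    if (cnt == 0) return
    name = (n <= cnt) ? line[n] : ""        # read the name before trimming
    if (name == "") name = "record" rec     # hypothetical fallback name
    while (cnt > 0 && line[cnt] ~ /^[ \t]*$/) cnt--   # drop trailing empty lines
    name = name ".txt"
    for (i = 1; i <= cnt; i++) print line[i] > name
    close(name)
    cnt = 0
}
/^\+\+\+$/ { flush(); rec++ }      # a new record starts: write out the old one
rec        { line[++cnt] = $0 }    # buffer every line of the current record
END        { flush() }
' file.txt
```

With -v n=2 the same script should give a1.txt and b1.txt. Records whose name line is identical overwrite each other's file, since each record's file is reopened with > after close().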