Split a file into many files on a condition

jimmy_y · April 26, 2010, 5:38am

Hi Everyone,

file.txt

+++
a
b
c

+++
d


+++
asdf fefe
fff

Would like to have the output:
file1.txt

+++
a
b
c

file2.txt

+++
d

file3.txt

+++
asdf fefe
fff

Is there a simple way to do that, for example with awk or perl?

Thanks

pravin27 · April 26, 2010, 5:49am

Hi,

Do you want to split the records into separate files based on '+++'?
Could you explain your issue in more detail?

jimmy_y · April 26, 2010, 5:51am

Thanks. Yes, I want to split wherever there is a '+++'. (I made a mistake in file3.txt just now; the output above is already corrected.)

Franklin52 · April 26, 2010, 6:16am

Try this:

awk '/^\+\+\+/{c++} {print > ("file" c ".txt")}' file.txt

agn · April 26, 2010, 6:18am

$ cat buf
+++
a
b
c

+++
d


+++
asdf fefe
fff
$ awk '/^\+/ { f=++i".txt" } { print >> f }' buf
$ ls [1-3].txt
1.txt   2.txt   3.txt
$ cat -n [1-3].txt
     1  +++
     2  a
     3  b
     4  c
     5
     1  +++
     2  d
     3
     4
     1  +++
     2  asdf fefe
     3  fff

jimmy_y · April 26, 2010, 6:21am

Works perfectly. If I also want to remove the empty lines at the beginning and end of each file[1-3].txt (but not the empty lines in the middle), please advise. Thanks

agn · April 26, 2010, 6:27am

To remove empty lines:

$ awk '/^\+/ { f=++i".txt" } !/^$/{ print >> f }' buf

jimmy_y · April 26, 2010, 6:30am

Thanks, but this does not do quite what I need: I only want to remove the empty lines at the beginning and end of each file, not the ones in the middle.
Taking 1.txt as an example, your output is

+++
a
b
c

but if there are a few empty lines between b and c, those middle empty lines should not be removed.
Thanks

Franklin52 · April 26, 2010, 7:06am

Try this:

awk '/^\+/{c++; s=""}
!NF && c {s=s ORS;next}
{print s $0 > ("file" c ".txt"); s=""}' file.txt
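A minimal sketch of the hold-back idea on sample data, in case it helps to see it run. Empty lines are collected in s and only written once more text follows, so empty lines at the end of a block are dropped while those in the middle survive. The scratch directory, the extra empty line between b and c, and the `c` guard on the last rule (which skips anything before the first '+++') are assumptions added for this sketch.

```shell
# Work in a scratch directory so nothing in the current one is overwritten.
cd "$(mktemp -d)" || exit 1

# Sample input: an empty line between "b" and "c", and empty lines ending two blocks.
printf '%s\n' '+++' 'a' 'b' '' 'c' '' '+++' 'd' '' '' '+++' 'asdf fefe' 'fff' > file.txt

awk '
/^\+/    { c++; s = "" }                               # new block: next file, forget held empty lines
!NF && c { s = s ORS; next }                           # empty line: hold it back for now
c        { print s $0 > ("file" c ".txt"); s = "" }    # text line: write held empty lines first
' file.txt

cat file1.txt    # the empty line between b and c is kept, the one after c is gone
```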

frans · April 26, 2010, 7:44am

Bash only (note that this drops every empty line, including those in the middle):

while IFS= read -r L
do	[ -z "$L" ] && continue
	[ "$L" = "+++" ] && ((i++))
	echo "$L" >> "file$i.txt"
done < file.txt

jimmy_y · April 26, 2010, 7:52am

Thanks Franklin52, and Frans too.
I learnt a lot about awk from you.

Thanks

---------- Post updated at 06:52 AM ---------- Previous update was at 06:45 AM ----------

I have another question, to see whether this can be done in the same step.
Right now, the file names are "file1.txt", "file2.txt", and "file3.txt".
Instead of fixed file names like [1-3].txt, can I have file names like this:
file1.txt to a.txt
file2.txt to d
file3.txt to asdf fefe

That is, each file is named after the value of its 2nd row. Please advise.

Thanks

Franklin52 · April 26, 2010, 8:00am

Try:

awk '/^\+/{p=$0;getline;f=$0;$0=p RS $0;s=""}
!NF && f {s=s ORS;next}
{print s $0 > f; s=""}' file.txt

frans · April 26, 2010, 8:01am

Modified to match your needs:

while IFS= read -r L
do
    [ -z "$L" ] && continue
    [ "$L" = "+++" ] && { IFS= read -r L; F="$L.txt"; echo "+++" > "$F"; }
    echo "$L" >> "$F"
done < file.txt

jimmy_y · April 26, 2010, 9:01am

Thanks Franklin52 and Frans.

---------- Post updated at 08:01 AM ---------- Previous update was at 07:50 AM ----------

Hi Frank, which part of your awk makes the 2nd row's value the file name? What if it should be the 3rd row, or some other row (assuming the row has a value)?
Thanks

I tried many ways to modify your awk, but failed to get another row's value as the file name. Please advise.

Franklin52 · April 26, 2010, 9:52am

That happens on this line:

/^\+/{p=$0;getline;f=$0;$0=p RS $0;s=""}

Explanation:

/^\+/ 		-> When the line starts with a '+' then
p=$0		->   Store the line in p
getline		->   get the next line (with the new file name) and proceed with the next command
f=$0		->   Store the filename in variable f
$0=p RS $0	->   Combine previous line (+++) with the current line

Hope this helps.
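The explanation above can be written out as a commented script and run on the sample from the first post. This is only a sketch: the scratch directory and the `f` guard on the last rule are additions, and the return value of getline is not checked, so a '+++' on the very last input line is not handled.

```shell
# Work in a scratch directory so the generated files are easy to inspect.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '+++' 'a' 'b' 'c' '' '+++' 'd' '' '' '+++' 'asdf fefe' 'fff' > file.txt

awk '
/^\+/ {
    p = $0          # remember the "+++" separator line
    getline         # read the next input line into $0 (NR advances as well)
    f = $0          # that line becomes the output file name
    $0 = p RS $0    # rebuild the record: separator, newline, name line
    s = ""          # forget empty lines held from the previous block
}
!NF && f { s = s ORS; next }      # empty line: hold it until more text follows
f        { print s $0 > f; s = "" }   # write held empty lines plus the record to f
' file.txt

ls    # lists "a", "asdf fefe", "d" and the input file.txt
```

Because f is taken from the line read by getline, the file name is always the line directly after the separator.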

jimmy_y · April 26, 2010, 10:16am

Thanks Frank,
I modified it as below but it failed; I actually tried lots of combinations.

awk '/^\+/{p=$0;getline;{if($NR==2) {f=$0}};$0=p RS $0;s=""} !NF && f {s=s?s ORS s:ORS;next} {print s $0 > f; s=""}' tmp

Please show me where it goes wrong. Thanks
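One likely reason the attempt above misbehaves: in awk, NR is the number of the current input line, while $NR is the field whose index equals NR. So `$NR==2` compares a field's content with 2 rather than testing the line number. The three-line input below is made up purely to show the difference.

```shell
# NR is the line number; $NR is "field number NR" of that line.
out=$(printf '%s\n' 'x1 y1 z1' 'x2 y2 z2' 'x3 y3 z3' | awk '{ print NR, $NR }')
printf '%s\n' "$out"
# 1 x1
# 2 y2
# 3 z3
```

Note also that NR counts lines across the whole input, not within one '+++' block, so even `NR==2` would be true only once per file.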

Franklin52 · April 26, 2010, 12:41pm

Can you explain what you're trying to achieve?

jimmy_y · April 26, 2010, 12:53pm

Hi Frank,
let's say
file.txt

+++
a1
a2

a4

+++
b1
b2
b3

The output:
a2.txt

+++
a1
a2

a4

b2.txt

+++
b1
b2
b3

I used the awk below, modified from yours, but it fails to do that:

awk '/^\+/{p=$0;getline;{if($NR==3) {f=$0}};$0=p RS $0;s=""} !NF && f {s=s?s ORS s:ORS;next} {print s $0 > f; s=""}' file.txt

Please advise.
Thanks

Franklin52 · April 26, 2010, 1:20pm

I still don't understand what you're trying to achieve..

Did you get errors or a wrong output with my last command?

jimmy_y · April 26, 2010, 1:53pm

Hi Frank,

Your awk works perfectly, but the new output files are a1.txt and b1.txt.
I want a2.txt and b2.txt instead, meaning the file name does not come from the line right after the "+++" line.
Sorry again if I did not explain it well. Thanks
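One possible way to get there, sketched under assumptions rather than taken from the replies above: buffer each block in an array and write it out only when the next '+++' (or the end of input) is reached, naming the file after the block's n-th line. Here the '+++' line counts as line 1, so n=3 picks a2 and b2. The function name write_block, the variable names, and the ".txt" suffix are choices made for this sketch; blocks that are shorter than n lines or whose n-th line is empty are silently skipped.

```shell
# Work in a scratch directory; the input mirrors the a1/a2/a4 example above.
cd "$(mktemp -d)" || exit 1
printf '%s\n' '+++' 'a1' 'a2' '' 'a4' '' '+++' 'b1' 'b2' 'b3' '' > file.txt

awk -v n=3 '
# Write the buffered block to a file named after its n-th line.
function write_block(   i, f) {
    while (cnt > 0 && line[cnt] == "") cnt--    # drop empty lines at the end
    if (cnt >= n && line[n] != "") {
        f = line[n] ".txt"
        for (i = 1; i <= cnt; i++) print line[i] > f
        close(f)
    }
    cnt = 0
}
/^\+/   { write_block(); inblock = 1 }   # separator: finish the previous block
inblock { line[++cnt] = $0 }             # buffer every line of the current block
END     { write_block() }                # do not forget the last block
' file.txt

cat a2.txt
```

Passing -v n=2 instead names each file after the line directly following '+++'. If two blocks share the same n-th line, the later one overwrites the earlier file.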