Split one file to many based on pattern

deal1dealer · November 13, 2014, 3:54pm

Hello All,

I have records in a file in a pattern A,B,B,B,B,K,A,B,B,K

Is there a command or some simple logic I can use to split the records into multiple files based on the A record? I want the output as

File1: A,B,B,B,B,K
File2: A,B,B,K

Corona688 · November 13, 2014, 4:01pm

So you want the first six records in one file, and the last four records in another?

awk -F, -v OFS="," '{ print $1,$2,$3,$4,$5,$6 > "file1" ; print $7, $8, $9, $10 > "file2" }' inputfile

This will work for the data you've given but might not for your real data if it's much more complicated. A representative sample (obscured where necessary) would be appreciated.

deal1dealer · November 13, 2014, 4:05pm

corona688:

So you want the first six records in one file, and the last four records in another?
awk -F, -v OFS="," '{ print $1,$2,$3,$4,$5,$6 > "file1" ; print $7, $8, $9, $10 > "file2" }' inputfile

Thank you very much for the reply. I should not be counting the number of lines; I have to pick lines based on the A record.

Whenever a line starts with 'A', I need to pick all the lines up to (but not including) the next 'A' line and write them to a file.

Corona688 · November 13, 2014, 4:15pm

This is not apparent from the input and output you have shown. It shows a single line being divided into two different files.

Show the input you have, and show the output you want.

deal1dealer · November 13, 2014, 4:23pm

I am sorry for any confusion, this is my first time posting here. Below is what the main file looks like:

A200198565634
B769348348547
B837563487567
K656895565906
A387562985749
B893745647875
B394857348957
K734564735644
A893745634785
B938457348953
K783456347856
A890345765875
B378945634789
B934785643534
K378945634764

Desired Output:

File1:

A200198565634
B769348348547
B837563487567
K656895565906

File2:

A387562985749
B893745647875
B394857348957
K734564735644

File3:

A893745634785
B938457348953
K783456347856

File:

A890345765875
B378945634789
B934785643534
K378945634764

Corona688 · November 13, 2014, 4:38pm

Why 'file' and not 'file4', or does that matter?

awk '/^A/ { if(F) close(F); F=sprintf("%s%d", NAME, ++N) } { print > F }' NAME="File" inputfile

Use nawk on Solaris.

Corona688 · November 13, 2014, 4:42pm

You could also generate File0001, File0002, and so on, which sort more nicely in a directory listing, by putting "%s%04d" in sprintf instead.
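
As a minimal sketch of that zero-padded variant, the block below recreates the sample input from this thread in a scratch directory and runs the same awk command with "%s%04d". The scratch directory and the name inputfile are assumptions for illustration only.

```shell
# Work in a throwaway directory (hypothetical location) so nothing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input copied from the thread.
cat > inputfile <<'EOF'
A200198565634
B769348348547
B837563487567
K656895565906
A387562985749
B893745647875
B394857348957
K734564735644
A893745634785
B938457348953
K783456347856
A890345765875
B378945634789
B934785643534
K378945634764
EOF

# At every line starting with "A": close the previous output file and
# build the next name. %04d zero-pads the counter to four digits.
# Every line (including the "A" line itself) is printed to the current file.
awk '/^A/ { if (F) close(F); F = sprintf("%s%04d", NAME, ++N) } { print > F }' NAME="File" inputfile

ls
```

With the sample data this should leave File0001 through File0004 next to inputfile, with the third file holding three lines and the others four.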

RudiC · November 14, 2014, 3:02am

Just in case the input file does not start with an "A" record, try

 awk '/^A/ || NR==1 { if(F) close(F); F=sprintf("%s%d", NAME, ++N) } { print > F }' NAME="File" inputfile
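
To see what the NR==1 guard does, here is a small sketch with a made-up leading line (HEADER20141113 is a hypothetical record, not from the original data). Lines before the first "A" record end up in a file of their own, and the first "A" group then starts the next file.

```shell
# Throwaway directory (hypothetical) for the demonstration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened sample input with an invented non-"A" first line.
cat > inputfile <<'EOF'
HEADER20141113
A200198565634
B769348348547
K656895565906
A387562985749
B893745647875
K734564735644
EOF

# NR==1 forces a file name to exist before the first print,
# even when the first line does not start with "A".
awk '/^A/ || NR==1 { if (F) close(F); F = sprintf("%s%d", NAME, ++N) } { print > F }' NAME="File" inputfile

# Show each output file with its contents.
for f in File*; do
    echo "== $f"
    cat "$f"
done
```

Here File1 holds only the leading line, while File2 and File3 hold the two "A" groups. Without the NR==1 test, F would still be empty when the first line is printed, which awk implementations generally reject as an invalid output file name.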

drl · November 14, 2014, 11:35am

Hi.

Utility csplit was designed for these kinds of tasks, so:

csplit -k -z data1 '/^A/' '{*}'

produces byte counts and creates files:

56
56
42
56

xx00  xx01  xx02  xx03

Sample, contents of xx01:

A387562985749
B893745647875
B394857348957
K734564735644

See man csplit for details.

Best wishes ... cheers, drl
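
If the default xx00-style names are not wanted, csplit can also be given a prefix and a suffix width. The following is a sketch only, assuming GNU csplit (the -z option and the '{*}' repeat count used above are GNU extensions); the prefix File and the scratch directory are illustrative choices. Note that csplit starts counting at zero, so the pieces come out as File0000 onward rather than File1 onward.

```shell
# Throwaway directory (hypothetical) so existing files are not touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input copied from the thread, saved under the name used above.
cat > data1 <<'EOF'
A200198565634
B769348348547
B837563487567
K656895565906
A387562985749
B893745647875
B394857348957
K734564735644
A893745634785
B938457348953
K783456347856
A890345765875
B378945634789
B934785643534
K378945634764
EOF

# -f File : use "File" as the output prefix instead of "xx"
# -n 4    : use four digits in the numeric suffix
# -s      : do not print the byte counts
# -z      : drop the empty piece that precedes the first "A" line
# -k      : keep the pieces already written if an error occurs
csplit -s -k -z -f File -n 4 data1 '/^A/' '{*}'

ls
```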

deal1dealer · November 14, 2014, 2:57pm

Thank you, experts, that was a great help for a novice like me. Both csplit and nawk work great.