How to split a file based on pattern line number?

bhaskar_v · February 19, 2014, 7:04am

Hi,

I have a requirement like the one below:

M <form_name>  sdasadasdMklkM
D   ......
D   .....
M  form_name>  sdasadasdMklkM
D   ......
D   .....
D   ......
D   .....
M  form_name>  sdasadasdMklkM
D   ......
M  form_name>  sdasadasdMklkM

I want to split the file by line number, based on whether the first character of each row is M.
For example, if a file has 500 records, of which 100 are M records and 400 are D records,
I want everything up to the 50th M record placed into one file,
and the remaining records placed into another file.
The order of each M record and the D records that follow it must not change
(the first 50 M records and the D records following them should go into one file).

Thanks for your help in advance.

Vijai

Franklin52 · February 19, 2014, 7:26am

What have you tried?

bhaskar_v · February 19, 2014, 7:36am

Hi Frank,

Thanks for responding.

I have tried the code below:

awk 's=index($0,"M") { print "line=" NR, "start position=" s}'  <file name >

awk 's=index($0,"M") { print "line=" NR}'  <file name >

to find the line numbers of the records that start with M.
But the commands above report every row that contains an M anywhere: if M is at the 23rd position of a row, that row number is shown too.
Below is the result:

line=3490 start position=811
line=3491 start position=69

Vijai
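The behaviour described above comes from `index()`: it returns the position of the first occurrence anywhere in the line (0 if there is none), and any non-zero result counts as true, so every line containing an M is reported. A minimal sketch on made-up lines, contrasting it with a test that only accepts an M in the first column:

```shell
# Work in a scratch directory; the sample lines are made up for illustration.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'M <form_name>  sdasadasdMklkM' 'D   detail with an M inside' 'D   plain' > sample.txt

# index() finds the first "M" anywhere in the line, so lines 1 and 2 are
# both reported (line 2 because of the M at position 20).
awk 's = index($0, "M") { print "line=" NR, "start position=" s }' sample.txt

# Requiring the match to be at position 1 reports only the M record: line 1.
awk 'index($0, "M") == 1 { print "line=" NR }' sample.txt

# The same test written as an anchored regular expression.
awk '/^M/ { print "line=" NR }' sample.txt
```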

chacko193 · February 19, 2014, 7:50am

What is your expected output?

bhaskar_v · February 19, 2014, 7:58am

My expected output:

If the file has 1 lakh (100,000) records,
of which 12k are M records and 88k are D records (M = master, D = detail),
I want to split the file so that everything up to the 5000th M record, along with its D records, goes into one file; of the remaining 7k M records, everything up to the next 5000th M record, with its D records, goes into a second file; and all the rest goes into a third file.
My file looks like this:

M <form_name>  sdasadasdMklkM
D   ......
D   .....
M  form_name>  sdasadasdMklkM
D   ......
D   .....
D   ......
D   .....
M  form_name>  sdasadasdMklkM
D   ......
M  form_name>  sdasadasdMklkM

Thanks,
Vijai

bhaskar_v · February 19, 2014, 8:03am

Hi

Just put the lines of the first 50 M records (with their D lines) in one file, the lines of the next 50 M records in a second file, the next 50 M records in a third file, and so on until the end of the file.
The number 50 is not fixed: it depends on the size of the file and will change.

Vijai
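Because the chunk size changes with the file, it may help to count the M records first and see how many output files a given chunk size produces, which is the ceiling of (number of M records / n). A minimal sketch; the sample data and `n=3` are hypothetical. With the figures from the earlier post (12k M records, n=5000) the same arithmetic gives 3 files holding 5000, 5000 and 2000 M records.

```shell
# Scratch directory with a made-up file: 7 M records, each followed by some D lines.
cd "$(mktemp -d)" || exit 1
printf '%s\n' M1 D M2 D D M3 M4 D M5 M6 D M7 D > sample.txt

n=3   # hypothetical chunk size; in the thread it is 50 or 5000
awk -v n="$n" '
BEGIN { m = 0 }
/^M/  { m++ }                              # count master records only
END   { files = int((m + n - 1) / n)       # ceiling of m / n
        print m " M records -> " files " file(s) with n=" n }
' sample.txt
```

For the sample above this prints `7 M records -> 3 file(s) with n=3`.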

RudiC · February 19, 2014, 8:05am

Try

awk     'BEGIN          {FNM="SPLIT_M";FND="SPLIT_D"}
         /^M /          {CNT++}
         CNT < 50       {print; next}
         /^M/           {print > FNM}
         /^D/           {print > FND}
        ' file

bhaskar_v · February 19, 2014, 8:17am

Hi RudiC,

The above just splits the M records into one file and the D records into another, which is not what I expected.

I have all the M and D records together,
for example:

M <form_name>  sdasadasdMklkM
D   ......
D   .....
M  form_name>  sdasadasdMklkM
D   ......
D   .....
D   ......
D   .....       -------------------- up to this record, everything should go into one file, and all remaining records into another file
M  form_name>  sdasadasdMklkM
D   ......
M  form_name>  sdasadasdMklkM

Regards
Vijai

---------- Post updated at 06:47 PM ---------- Previous update was at 06:40 PM ----------

My basic idea: if I get the line number of the 50th row that has M in the first position, then I can move the records up to (line number - 1) into one file and all the remaining records into another file.

Vijai
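That idea works in two passes: one awk call to find the line number, then `head` and `tail` to cut the file there. One detail matters: to keep the 50th M record's D lines in the first file, the line to look for is where the 51st M record starts (the (n+1)th), and everything before it goes into the first file. A minimal sketch, assuming a made-up sample file and a chunk size of `n=2`; the names `infile`, `part1` and `part2` are hypothetical:

```shell
# Scratch directory with a made-up input: M rows are masters, D rows are details.
cd "$(mktemp -d)" || exit 1
printf '%s\n' M1 D D M2 D M3 D D M4 > infile

n=2
# Line number where the (n+1)th M row starts; empty if there are n or fewer.
cut=$(awk -v n="$n" '/^M/ && ++c == n + 1 { print NR; exit }' infile)

if [ -n "$cut" ]; then
    head -n "$((cut - 1))" infile > part1   # first n M records with their D lines
    tail -n "+$cut" infile > part2          # everything from the (n+1)th M row on
else
    cp infile part1                         # n or fewer M records: nothing to split
fi
```

Here the third M row is on line 6, so `part1` gets lines 1 to 5 and `part2` gets lines 6 to 9.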

Franklin52 · February 19, 2014, 8:32am

Something like this?

awk '/^M/{i++}i<=50{print > "fileA"; next}1' file > fileB
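Since the threshold changes from file to file, it can be passed in with `-v` instead of being hard-coded. A minimal sketch of the same one-liner on a made-up sample, with a hypothetical `n=2`:

```shell
# Scratch directory with a made-up input: M rows are masters, D rows are details.
cd "$(mktemp -d)" || exit 1
printf '%s\n' M1 D D M2 D M3 D D M4 > file

# i counts the M rows seen so far. While i <= n, lines go to fileA and "next"
# skips the rest of the program; from the (n+1)th M row on, the bare "1" prints
# each line to standard output, which the shell redirects into fileB.
awk -v n=2 '/^M/ { i++ } i <= n { print > "fileA"; next } 1' file > fileB
```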

chacko193 · February 19, 2014, 10:50am

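Try:

awk -v n=50 '
BEGIN { cnt = 0; i = 0 }
/^M/  { if (cnt == n) { close("file" i); cnt = 0; i++ }   # n M records written: switch to next file
        cnt++ }
{ print > ("file" i) }                                    # every line (M or D) goes to the current file
' infile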
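To check the chunking on something small, here is a minimal sketch of the chunked split run with a hypothetical chunk size of `n=2` on a made-up file. A new output file is started only when the (n+1)th M row arrives, so each file holds exactly n M records plus their D lines. It also builds zero-padded names with `sprintf` (`part_00`, `part_01`, ...; a hypothetical naming choice) so that `ls` lists the chunks in order once there are ten or more:

```shell
# Scratch directory with a made-up input: M rows are masters, D rows are details.
cd "$(mktemp -d)" || exit 1
printf '%s\n' M1 D D M2 D M3 D D M4 D M5 > infile

awk -v n=2 '
BEGIN { out = sprintf("part_%02d", 0) }               # first chunk
/^M/  { if (cnt == n) {                               # n M records already written
            close(out); cnt = 0
            out = sprintf("part_%02d", ++i)           # start the next chunk
        }
        cnt++ }
{ print > out }                                       # M and D lines keep file order
' infile
```

This produces `part_00` (M1, M2 and their D lines), `part_01` (M3, M4 and their D lines) and `part_02` (M5). With `%02d` the names stay aligned up to 100 chunks; use a wider field for more.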

---------- Post updated at 10:50 AM ---------- Previous update was at 08:36 AM ----------

Hey Franklin52, can you please tell me what that "1" is doing over there?
I have always wondered what that does.
Thanks.

Franklin52 · February 20, 2014, 2:36am

An awk statement has the form:

condition {action}

The condition controls whether the action runs: the action is executed only when the condition is true.
If the condition is true (any non-zero number, such as 1, is true) and the action is omitted entirely, braces included, awk prints the current record by default.
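A few one-liners make the rule concrete (the sample file is made up): a pattern with no action prints the lines for which it is true, and `1` is simply a pattern that is always true.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' M1 D1 D2 > demo.txt

# These three programs are equivalent: each prints every line unchanged.
awk '1' demo.txt
awk '{ print }' demo.txt
awk '1 { print $0 }' demo.txt

# Any expression can be the condition: non-zero means print, zero means skip.
awk 'NR % 2' demo.txt   # odd-numbered lines only: M1 and D2
awk '0' demo.txt        # prints nothing
```

In Franklin52's command, lines already handled by the `next` never reach the `1`, so only the lines from the 51st M row onwards are printed to standard output.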