File Splitter output filename

santosh2k2 · October 10, 2012, 6:41am

Issue: I can split the source file into multiple files of 10 rows each, but I cannot get the required output file names. Please advise.

Details:
input = A.txt having 44 rows

required output = A_001.txt, A_002.txt and so on. Can the awk command below be modified to give the required result?

current status = The command below gives me the output file names A_1, A_2

awk 'NR%"'"10"'"==1{x="'"${SrcFileName}_"'"++i;}{print > x}' $SrcFileName.txt

pamu · October 10, 2012, 6:51am

Why don't you just add 00 there?

awk 'NR%"'"10"'"==1{x="'"${SrcFileName}_00"'"++i;}{print > x}' $SrcFileName.txt

elixir_sinari · October 10, 2012, 6:53am

Hint (or is it not? :)):

awk 'NR%10 == 1{f="x_" sprintf("%03d",++i)}{print > f}' file
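
For anyone who wants to try the hint end to end, here is a minimal sketch. It assumes a made-up 44-row input named A.txt and a scratch directory from mktemp (both are illustration only, not part of the original post). The base name is passed with -v, which avoids splicing shell quotes into the awk program.

```shell
#!/bin/sh
# Work in a scratch directory so the demo does not clutter the current one.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical input: 44 numbered rows standing in for A.txt.
awk 'BEGIN { for (r = 1; r <= 44; r++) print "row " r }' > A.txt

# Rows 1, 11, 21, ... each open the next file; %03d pads the counter to 3 digits.
awk -v base="A" '
NR % 10 == 1 {
    if (out != "") close(out)
    out = sprintf("%s_%03d.txt", base, ++i)
}
{ print > out }
' A.txt

ls A_*.txt
```

This should leave A_001.txt through A_005.txt in the scratch directory, the last one holding the remaining 4 rows.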

santosh2k2 · October 10, 2012, 7:09am

Thanks elixir, the hint worked. Below is my working command.

awk 'NR%"'"10"'"==1{x="'"${SrcFileName}_"'" sprintf("%03d",++i) ".txt"}{print > x}' $SrcFileName.txt

@pamu: A hardcoded 00 will add extra zeros if the file is split into more than 9 parts.

Thanks all for the help.
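
The difference between the two suggestions only shows up from the tenth file onward. A small sketch, not tied to any input file (the function name name_demo is made up for illustration), prints both naming schemes for counters 9 and 10:

```shell
name_demo() {
    awk 'BEGIN {
        for (i = 9; i <= 10; i++) {
            hardcoded = "A_00" i              # literal "00" glued in front of the counter
            padded    = sprintf("A_%03d", i)  # counter padded to a fixed width of 3
            print hardcoded, padded
        }
    }'
}

name_demo
# prints:
# A_009 A_009
# A_0010 A_010
```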

santosh2k2 · October 11, 2012, 11:19am

My requirement has been extended: every file should start with a 101-type record. The record count should be less than 10. Within any section, there will be no more than 7 records of type 104.

The command below splits the file into groups of 10 records, but it cannot make a 101 record the first record of each split file. Can someone please extend it?

awk 'NR%"'"10"'"==1{x="'"${SrcFileName}_"'" sprintf("%03d",++i) ".txt"}{print > x}' $SrcFileName.txt

Sample source file, i.e. A.txt:

101|M|28854| 
104|28854| I|
101|M|30854| MER
104|30854| S|
104|30854| C|
104|30854| I|
101|M|30855| SG
104|30855| I|
104|30855| S|
104|30855| C|
104|30855| S|
101|M|30856| 
104|30856| I|
104|30856| S|
104|30856| S|
104|30856| S|
104|30856| C|
104|30856| S|
101|M|30857| 
104|30857| I|
104|30857| S|
104|30857| S|
104|30857| S|
104|30857| C|
104|30857| S|

pamu · October 11, 2012, 12:37pm

Your requirement is not clear.

Please give sample input with the desired output.

1) If the first/next 10 lines contain more than 7 "104" records, what should happen?
2) If the second file does not start with 101, where should the 101 record come from?

santosh2k2 · October 11, 2012, 1:37pm

The output files required for my example are shown below. The requirements are:

Records in each file should not number more than 20.
Each file should start with a 101 record. This ensures that every 101 record and its associated 104 records are in the same file. In the example below, including the next 101 set would push the count beyond 20, so the first file is cut at 18 records. The remaining records are pushed to the next file, and so on.

A_001.txt
101|M|28854| 
104|28854| I|
101|M|30854| MER
104|30854| S|
104|30854| C|
104|30854| I|
101|M|30855| SG
104|30855| I|
104|30855| S|
104|30855| C|
104|30855| S|
101|M|30856| 
104|30856| I|
104|30856| S|
104|30856| S|
104|30856| S|
104|30856| C|
104|30856| S|
 
A_002.txt
101|M|30857| 
104|30857| I|
104|30857| S|
104|30857| S|
104|30857| S|
104|30857| C|
104|30857| S|
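
One way to read the rule as stated at this point (at most 20 records per file, a customer set never split) is to buffer one set at a time and open a new file whenever the buffered set would not fit. The sketch below is only an illustration under assumptions: it rebuilds the shape of A.txt with simplified child flags, uses a scratch directory, and the helper name emit_set is made up.

```shell
#!/bin/sh
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Rebuild the shape of A.txt: five sets of 2, 4, 5, 7 and 7 records.
# (Child flags are simplified to "S"; only the record types matter here.)
awk 'BEGIN {
    n = split("28854:1 30854:3 30855:4 30856:6 30857:6", spec, " ")
    for (k = 1; k <= n; k++) {
        split(spec[k], f, ":")
        print "101|M|" f[1] "|"
        for (c = 1; c <= f[2]; c++) print "104|" f[1] "| S|"
    }
}' > A.txt

awk -v base="A" -v max=20 '
function emit_set() {
    if (setlen == 0) return
    # Open the next file when none is open yet or this set would not fit.
    if (out == "" || count + setlen > max) {
        if (out != "") close(out)
        out = sprintf("%s_%03d.txt", base, ++part)
        count = 0
    }
    printf "%s", set > out
    count += setlen
    set = ""; setlen = 0
}
/^101/ { emit_set() }                  # a header ends the previous set
{ set = set $0 "\n"; setlen++ }        # buffer the current set
END { emit_set() }                     # write the last buffered set
' A.txt

wc -l A_0*.txt
```

With this input the first file gets 18 records and the second gets 7, matching the layout above. A single set larger than the limit would still be written whole, in a file of its own.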

pamu · October 11, 2012, 1:56pm

try something like this...

awk '{a++;if($0 ~ /^101/){if(s){ 
if(a>=20){a=0;x++;fn="file__"x;print s > fn;s=$0" "a}else{print s > fn;s=$0" "a}}
else{s=$0" "a;x++;fn="file__"x;}}
else{s=s"\n"$0" "a;}}END{print s > fn}' file

santosh2k2 · October 12, 2012, 7:34am

@pamu, I was not able to get the right results with that code. Any other ideas?

pamu · October 12, 2012, 7:42am

okies try this..

awk '{a++;if($0 ~ /^101/){if(s){ 
if(a>=20){a=0;x++;fn="file__"x;print s > fn;s=$0}else{print s > fn;s=$0}}
else{s=$0;x++;fn="file__"x;}}
else{s=s"\n"$0;}}END{if(a>=20){x++;fn="file__"x};print s > fn}' file

santosh2k2 · October 12, 2012, 9:50am

@pamu, it is still not giving correct results in some scenarios. I will restate the requirements, as I am now told that it is OK to include the next set of 104 data even if the record count goes beyond 20.

Requirement:

The source file is a set of customer data. A customer set has a 101 header record followed by its child records, which are 104 records. The 101 record is always the first record in a set.
A target file should include all records of a customer set and start with a 101 record.
A target file can contain many customer sets.
The number of records in each target file must either equal the splitCount variable or exceed it only as far as needed to complete the last customer set.

Let's take splitCount=10 as an example. The code below just splits the file into sets of 10 records and assigns the correct name to each output file. Can someone please extend this logic to meet the target file requirements?

awk 'NR%"'"${splitCount}"'"==1{x="'"${SrcFileName}_"'" sprintf("%04d",++i) ".txt"}{print > x}' $SrcFileName.txt
 
Variables assigned to run command
SrcFileName=SS
splitCount=10
 
Source file = SS.txt
101|M|28854| 
104|28854| I|
101|M|30854| MER
104|30854| S|
104|30854| C|
104|30854| I|
101|M|30855| SG
104|30855| I|
104|30855| S|
104|30855| C|
104|30855| S|
101|M|30856| 
104|30856| I|
104|30856| S|
104|30856| S|
104|30856| S|
104|30856| C|
104|30856| S|
101|M|30857| 
104|30857| I|
104|30857| S|
104|30857| S|
104|30857| S|
104|30857| C|
104|30857| S|
101|M|30858| 
104|30858| I|
104|30858| S|
 
Target Files
SS_0001.txt = has 11 records, as we cannot move the pending 30855 records to the next file
101|M|28854| 
104|28854| I|
101|M|30854| MER
104|30854| S|
104|30854| C|
104|30854| I|
101|M|30855| SG
104|30855| I|
104|30855| S|
104|30855| C|
104|30855| S|
 
SS_0002.txt = has more than 10 records, as we cannot move the pending 30857 records to the next file
101|M|30856| 
104|30856| I|
104|30856| S|
104|30856| S|
104|30856| S|
104|30856| C|
104|30856| S|
101|M|30857| 
104|30857| I|
104|30857| S|
104|30857| S|
104|30857| S|
104|30857| C|
104|30857| S|
 
SS_0003.txt
101|M|30858| 
104|30858| I|
104|30858| S|
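
Read literally, the restated rule means a file is closed at the first 101 header seen after it already holds at least splitCount records. A minimal sketch of that reading follows. It assumes the input starts with a 101 record, rebuilds the shape of SS.txt with simplified child flags, and works in a scratch directory; none of this comes from the replies in the thread.

```shell
#!/bin/sh
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Rebuild the shape of SS.txt: six sets of 2, 4, 5, 7, 7 and 3 records.
# (Child flags are simplified to "S".)
awk 'BEGIN {
    n = split("28854:1 30854:3 30855:4 30856:6 30857:6 30858:2", spec, " ")
    for (k = 1; k <= n; k++) {
        split(spec[k], f, ":")
        print "101|M|" f[1] "|"
        for (c = 1; c <= f[2]; c++) print "104|" f[1] "| S|"
    }
}' > SS.txt

SrcFileName=SS
splitCount=10

awk -v base="$SrcFileName" -v limit="$splitCount" '
# Start a new file on the first line, or on a 101 header once the current
# file already holds at least "limit" records.
NR == 1 || (/^101/ && count >= limit) {
    if (out != "") close(out)
    out = sprintf("%s_%04d.txt", base, ++part)
    count = 0
}
{ print > out; count++ }
' "$SrcFileName.txt"

wc -l "$SrcFileName"_*.txt
```

For this input the three files hold 11, 14 and 3 records, the same counts as the target files listed above.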

pamu · October 12, 2012, 10:10am

Your requirement is changing with every post..

See below.
The number that a is compared with is yours to choose.
If you use 20 or 10, that number is treated as the maximum record count for a file, although the a=10 run below shows one file reaching 12 records.

awk '{a++;if($0 ~ /^101/){if(s){ 
if(a>=20){a=0;x++;fn="file__"x;print s > fn;s=$0}else{print s > fn;s=$0}}
else{s=$0;x++;fn="file__"x;}}
else{s=s"\n"$0;}}END{if(a>=20){x++;fn="file__"x};print s > fn}' file

I have tested for a=10 and a=20.

for a=20

$ ls file__*
file__1  file__2
$ wc -l file__1
18 file__1
$ wc -l file__2
10 file__2

a=10

$ wc -l file_*
  6 file__1
 12 file__2
 10 file__3
 28 total

Please let me know if you still have any doubts:)
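
A quick way to check any split result against the "every file starts with 101" rule is a second awk pass over the output files. This is a sketch only; the function name check_split is made up for illustration.

```shell
check_split() {   # usage: check_split FILE...
    awk '
    # FNR restarts at 1 for every input file, so this tests each first record.
    FNR == 1 && !/^101/ {
        printf "%s does not start with a 101 record\n", FILENAME
        bad = 1
    }
    { lines[FILENAME]++ }
    END {
        # Report the record count per file (awk does not guarantee the order).
        for (name in lines) printf "%s: %d records\n", name, lines[name]
        exit (bad ? 1 : 0)
    }
    ' "$@"
}
```

After a run, check_split file__* lists the record count of each piece and returns a non-zero status if any piece begins with something other than a 101 record.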

santosh2k2 · October 12, 2012, 12:47pm

Thanks @pamu... The code works fine as it is.

I am trying to assign variables and still can't get it right. Could you please help?

Can you use variables for the count and the output file name,
in the case below for 20 and file__?

awk '{a++;if($0 ~ /^101/){if(s){ 
if(a>=20){a=0;x++;fn="file__"x;print s > fn;s=$0}else{print s > fn;s=$0}}
else{s=$0;x++;fn="file__"x;}}
else{s=s"\n"$0;}}END{if(a>=20){x++;fn="file__"x};print s > fn}' file

pamu · October 12, 2012, 12:53pm

try this...

awk -v CN="20" -v File_name="file__" '{a++;if($0 ~ /^101/){if(s){ 
if(a>=CN){a=0;x++;fn=File_name""x;print s > fn;s=$0}else{print s > fn;s=$0}}
else{s=$0;x++;fn=File_name""x;}}
else{s=s"\n"$0;}}END{if(a>=CN){x++;fn=File_name""x};print s > fn}' file

santosh2k2 · October 12, 2012, 1:12pm

@pamu, I got the error below:

awk: 0602-533 Cannot find or open file {a++;if($0 ~ /^101/){if(s){
if(a>=CN){a=0;x++;fn=File_name""x;print s > fn;s=$0}else{print s > fn;s=$0}}
else{s=$0;x++;fn=File_name""x;}}
else{s=s"\n"$0;}}END{if(a>=CN){x++;fn=File_name""x};print s > fn}.
 The source line number is 1.

pamu · October 12, 2012, 1:18pm

corrected in previous post. Please check.

santosh2k2 · October 12, 2012, 1:55pm

Thanks @pamu, it's working now. I need some more refinements that were there earlier, but I am unable to put them into the new code.

1> If the source file is A.txt, I will receive $1 as A, and the required target file names are A_0001.txt, A_0002.txt and so on.
In the old code this was achieved using sprintf, i.e.

 
awk 'NR%"'"${splitCount}"'"==1{x="'"${SrcFileName}_"'" sprintf("%04d",++i) ".txt"}{print > x}' $SrcFileName.txt

2> CN needs to be assigned from $2, i.e. the SplitCount value

The shell script is called as
SplitFile.sh A 10
 
The code of the shell script is as follows. Here I wanted to use the variables $1 and $2
SrcFileName=$1
SplitCount=$2
awk -v CN="10" -v File_name="file__" '{a++;if($0 ~ /^101/){if(s){ if(a>=CN){a=0;x++;fn=File_name""x;print s > fn;s=$0}else{print s > fn;s=$0}} else{s=$0;x++;fn=File_name""x;}} else{s=s"\n"$0;}}END{if(a>=CN){x++;fn=File_name""x};print s > fn}' ${SrcFileName}.txt

pamu · October 12, 2012, 1:59pm

SrcFileName=$1
SplitCount=$2


awk -v CN="$SplitCount" -v File_name="$SrcFileName" '{a++;if($0 ~ /^101/){if(s){ 
if(a>=CN){a=0;x++;fn=File_name""x;print s > fn;s=$0}else{print s > fn;s=$0}}
else{s=$0;x++;fn=File_name""x;}}
else{s=s"\n"$0;}}END{if(a>=CN){x++;fn=File_name""x};print s > fn}' file

Hope this helps you:)

santosh2k2 · October 12, 2012, 2:08pm

Thanks @pamu, the new code works and solves issue #2 from my earlier post. Could you please have a look at issue #1 from that post as well?

pamu · October 12, 2012, 9:25pm

awk -v CN="$SplitCount" -v File_name="$SrcFileName" '{a++;if($0 ~ /^101/){if(s){ 
if(a>=CN){a=0;x++;fn=File_name"_"sprintf("%04d",x)".txt";print s > fn;s=$0}else{print s > fn;s=$0}}
else{s=$0;x++;fn=File_name"_"sprintf("%04d",x)".txt"}}
else{s=s"\n"$0;}}END{if(a>=CN){x++;fn=File_name"_"sprintf("%04d",x)".txt"};print s > fn}' file

OR using function..

awk -v CN="$SplitCount" -v File_name="$SrcFileName" '
function file_namec(){
    fn=File_name"_"sprintf("%04d",++x)".txt";
}
{a++;if($0 ~ /^101/){if(s){ 
if(a>=CN){a=0; file_namec();print s > fn;s=$0}else{print s > fn;s=$0}}
else{s=$0; file_namec()}}
else{s=s"\n"$0;}}END{if(a>=CN){file_namec()};print s > fn}' file
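
For reference, a complete SplitFile.sh could look roughly like the sketch below. It is not pamu's code: it keeps the CN and File_name variable names from the thread, but applies the simpler "start a new file at a 101 header once CN records are in the current file" reading of the final requirement, so its record counts can differ from those of the commands above. The argument checks and the split_file function name are additions for illustration, and the input is assumed to start with a 101 record.

```shell
#!/bin/sh
# SplitFile.sh sketch. Usage: SplitFile.sh A 10

split_file() {
    SrcFileName=$1
    SplitCount=$2

    # Refuse to run without a readable input file and a positive whole number.
    if [ ! -r "$SrcFileName.txt" ]; then
        echo "cannot read $SrcFileName.txt" >&2
        return 1
    fi
    case $SplitCount in
        ''|*[!0-9]*) echo "SplitCount must be a positive integer" >&2; return 1 ;;
    esac
    if [ "$SplitCount" -le 0 ]; then
        echo "SplitCount must be a positive integer" >&2
        return 1
    fi

    awk -v CN="$SplitCount" -v File_name="$SrcFileName" '
    # New file on the first line, or on a 101 header once CN records are written.
    NR == 1 || (/^101/ && n >= CN) {
        if (fn != "") close(fn)
        fn = sprintf("%s_%04d.txt", File_name, ++x)
        n = 0
    }
    { print > fn; n++ }
    ' "$SrcFileName.txt"
}

# Run only when arguments are given, so the file can also be sourced.
[ $# -eq 0 ] || split_file "$@"
```

Calling close(fn) before switching files keeps only one output file open at a time, which matters on awk implementations with a low limit on open files.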