File Splitter output filename

Issue: I am able to split the source file into multiple files of 10 rows each, but I am unable to get the required output file names. Please advise.

Details:
input = A.txt having 44 rows

required output = A_001.txt, A_002.txt and so on. Can the awk command below be modified to give the required result?

current status = The command below gives me output file names A_1, A_2

awk 'NR%"'"10"'"==1{x="'"${SrcFileName}_"'"++i;}{print > x}' $SrcFileName.txt

Why don't you just add 00 there?

awk 'NR%"'"10"'"==1{x="'"${SrcFileName}_00"'"++i;}{print > x}' $SrcFileName.txt

Hint (or is it not? :)):

awk 'NR%10 == 1{f="x_" sprintf("%03d",++i)}{print > f}' file

Thanks elixir, the hint worked. Below is my working command.

awk 'NR%"'"10"'"==1{x="'"${SrcFileName}_"'" sprintf("%03d",++i) ".txt"}{print > x}' $SrcFileName.txt
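
The quote splicing in the command above can probably be avoided by passing the shell values with awk's -v option. A minimal sketch, assuming a synthetic 44-row A.txt created in a scratch directory; the awk variable names base, n and out are illustrative and not from the thread:

```shell
#!/bin/sh
# Demo setup (hypothetical): build a 44-row A.txt in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
SrcFileName=A
awk 'BEGIN { for (r = 1; r <= 44; r++) print "row " r }' > "${SrcFileName}.txt"

# Pass shell values with -v instead of splicing quotes into the program text.
awk -v base="$SrcFileName" -v n=10 '
    NR % n == 1 {                        # first row of every block of n rows
        if (out != "") close(out)        # release the previous output file
        out = sprintf("%s_%03d.txt", base, ++i)
    }
    { print > out }
' "${SrcFileName}.txt"

ls A_*.txt
```

With 44 rows and n=10 this should leave A_001.txt through A_005.txt, the last one holding the remaining 4 rows.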

@pamu: A hardcoded 00 will add extra zeros if the file is split into more than 9 parts.
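
A small illustration of that difference, comparing a hardcoded 00 prefix with sprintf("%03d") for parts 9 and 10. The A_ prefix is only an example name:

```shell
# Print both naming schemes side by side for part numbers 9 and 10.
demo=$(awk 'BEGIN {
    for (i = 9; i <= 10; i++)
        printf "hardcoded: A_00%d  sprintf: A_%03d\n", i, i
}')
printf '%s\n' "$demo"
# hardcoded: A_009  sprintf: A_009
# hardcoded: A_0010  sprintf: A_010
```

The hardcoded variant grows to four digits at part 10, while the sprintf variant stays at a fixed width of three.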

Thanks all for the help.

My requirement has been extended: each file should always start with a 101-type record. The record count should be less than 10. For any section, there will be no more than 7 records of type 104.

The command below splits the file into sets of 10 records, but it cannot make a 101 record the first record of each split file. Can someone please extend the command below?

awk 'NR%"'"10"'"==1{x="'"${SrcFileName}_"'" sprintf("%03d",++i) ".txt"}{print > x}' $SrcFileName.txt

The sample source file, A.txt, is:

101|M|28854| 
104|28854| I|
101|M|30854| MER
104|30854| S|
104|30854| C|
104|30854| I|
101|M|30855| SG
104|30855| I|
104|30855| S|
104|30855| C|
104|30855| S|
101|M|30856| 
104|30856| I|
104|30856| S|
104|30856| S|
104|30856| S|
104|30856| C|
104|30856| S|
101|M|30857| 
104|30857| I|
104|30857| S|
104|30857| S|
104|30857| S|
104|30857| C|
104|30857| S|

Your requirement is not clear.

Please give sample input with the desired output.

1) If the first/next 10 lines contain more than 7 "104" records, what should happen?
2) If the second file does not start with 101, from which record should we get the 101?

The output files required for my example are shown below. The requirements are:

  • Each file should contain no more than 20 records.
  • Each file should start with a 101 record. This ensures that all associated 101 and 104 records are in the same file. Hence, in the example below, since the count including the next 101 set would go beyond 20, the first file is cut at 18 records. The rest of the records are pushed to the next file, and so on.
A_001.txt
101|M|28854| 
104|28854| I|
101|M|30854| MER
104|30854| S|
104|30854| C|
104|30854| I|
101|M|30855| SG
104|30855| I|
104|30855| S|
104|30855| C|
104|30855| S|
101|M|30856| 
104|30856| I|
104|30856| S|
104|30856| S|
104|30856| S|
104|30856| C|
104|30856| S|
 
A_002.txt
101|M|30857| 
104|30857| I|
104|30857| S|
104|30857| S|
104|30857| S|
104|30857| C|
104|30857| S|

try something like this...

awk '{a++;if($0 ~ /^101/){if(s){ 
if(a>=20){a=0;x++;fn="file__"x;print s > fn;s=$0" "a}else{print s > fn;s=$0" "a}}
else{s=$0" "a;x++;fn="file__"x;}}
else{s=s"\n"$0" "a;}}END{print s > fn}' file

@pamu, I was not able to get the right results with that code. Any other ideas?

OK, try this:

awk '{a++;if($0 ~ /^101/){if(s){ 
if(a>=20){a=0;x++;fn="file__"x;print s > fn;s=$0}else{print s > fn;s=$0}}
else{s=$0;x++;fn="file__"x;}}
else{s=s"\n"$0;}}END{if(a>=20){x++;fn="file__"x};print s > fn}' file

@pamu, it is still not giving correct results in some scenarios. I will restate the requirements, as I have now been told that it is OK to include the next set of 104 data even if the record count goes beyond 20.

Requirement:

  • The source file is a set of customer data. A customer set has a 101 header record and its child records as 104 records. The 101 record will always be the first record in a set.
  • A target file should include all records of a customer set and start with a 101 record.
  • A target file can contain many customer sets.
  • The number of records in each target file needs to be either equal to the splitCount variable or just above it, so that the last customer set is included completely.

Let's take the example of splitCount=10. The code below just splits the file into sets of 10 records and assigns the correct name to each output file. Can someone please extend this logic to include the target file requirements?

awk 'NR%"'"${splitCount}"'"==1{x="'"${SrcFileName}_"'" sprintf("%04d",++i) ".txt"}{print > x}' $SrcFileName.txt
 
Variables assigned to run command
SrcFileName=SS
splitCount=10
 
Source file = SS.txt
101|M|28854| 
104|28854| I|
101|M|30854| MER
104|30854| S|
104|30854| C|
104|30854| I|
101|M|30855| SG
104|30855| I|
104|30855| S|
104|30855| C|
104|30855| S|
101|M|30856| 
104|30856| I|
104|30856| S|
104|30856| S|
104|30856| S|
104|30856| C|
104|30856| S|
101|M|30857| 
104|30857| I|
104|30857| S|
104|30857| S|
104|30857| S|
104|30857| C|
104|30857| S|
101|M|30858| 
104|30858| I|
104|30858| S|
 
Target Files
SS_0001.txt = has 11 records, as we cannot move the pending 30855 records to the next file
101|M|28854| 
104|28854| I|
101|M|30854| MER
104|30854| S|
104|30854| C|
104|30854| I|
101|M|30855| SG
104|30855| I|
104|30855| S|
104|30855| C|
104|30855| S|
 
SS_0002.txt = has more than 10 records, as we cannot move the pending 30857 records to the next file
101|M|30856| 
104|30856| I|
104|30856| S|
104|30856| S|
104|30856| S|
104|30856| C|
104|30856| S|
101|M|30857| 
104|30857| I|
104|30857| S|
104|30857| S|
104|30857| S|
104|30857| C|
104|30857| S|
 
SS_0003.txt
101|M|30858| 
104|30858| I|
104|30858| S|
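
One way the restated rule could be sketched: switch to a new target only at a 101 header, and only once the current target already holds at least splitCount records. This is an illustration under assumptions, not a tested production script. The input is a synthetic stand-in with the same set sizes as the sample (2, 4, 5, 7, 7 and 3 records), and the awk names limit, base, part, count and out are made up for the sketch:

```shell
#!/bin/sh
workdir=$(mktemp -d) && cd "$workdir" || exit 1
SrcFileName=SS
splitCount=10

# Synthetic stand-in for SS.txt: six customer sets of 2, 4, 5, 7, 7 and 3 records.
awk 'BEGIN {
    n = split("1 3 4 6 6 2", kids, " ")      # number of 104 children per set
    for (c = 1; c <= n; c++) {
        id = 28853 + c
        print "101|M|" id "|"
        for (k = 1; k <= kids[c]; k++) print "104|" id "|S|"
    }
}' > "${SrcFileName}.txt"

awk -v limit="$splitCount" -v base="$SrcFileName" '
    # Open a new target on the first record, or at a 101 header once the
    # current target already holds at least "limit" records.
    NR == 1 || (/^101/ && count >= limit) {
        if (out != "") close(out)
        out = sprintf("%s_%04d.txt", base, ++part)
        count = 0
    }
    { print > out; count++ }
' "${SrcFileName}.txt"

wc -l SS_0*.txt
```

On this input the sketch should produce SS_0001.txt, SS_0002.txt and SS_0003.txt with 11, 14 and 3 records, the same counts as the target files listed above.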

Your requirement is changing with every post..

see below
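The value compared with a is the record limit; you can set it to whatever you want.
With 20 or 10, a new file is started at the first 101 record seen once the running count has reached that value, so a file can end up slightly over the limit.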

awk '{a++;if($0 ~ /^101/){if(s){ 
if(a>=20){a=0;x++;fn="file__"x;print s > fn;s=$0}else{print s > fn;s=$0}}
else{s=$0;x++;fn="file__"x;}}
else{s=s"\n"$0;}}END{if(a>=20){x++;fn="file__"x};print s > fn}' file

I have tested for a=10 and a=20.

for a=20

$ ls file__*
file__1  file__2
$ wc -l file__1
18 file__1
$ wc -l file__2
10 file__2

a=10

$ wc -l file_*
  6 file__1
 12 file__2
 10 file__3
 28 total

Please let me know if you still have any doubts:)

Thanks @pamu... The code works fine as it is.

I am trying to assign variables and still can't get it right. Could you please help?

Can you assign variables for the count and the output file name?
In the case below, that means for 20 and file__.

awk '{a++;if($0 ~ /^101/){if(s){ 
if(a>=20){a=0;x++;fn="file__"x;print s > fn;s=$0}else{print s > fn;s=$0}}
else{s=$0;x++;fn="file__"x;}}
else{s=s"\n"$0;}}END{if(a>=20){x++;fn="file__"x};print s > fn}' file

try this...

awk -v CN="20" -v File_name="file__" '{a++;if($0 ~ /^101/){if(s){ 
if(a>=CN){a=0;x++;fn=File_name""x;print s > fn;s=$0}else{print s > fn;s=$0}}
else{s=$0;x++;fn=File_name""x;}}
else{s=s"\n"$0;}}END{if(a>=CN){x++;fn=File_name""x};print s > fn}' file

@pamu, I got the error below:

awk: 0602-533 Cannot find or open file {a++;if($0 ~ /^101/){if(s){
if(a>=CN){a=0;x++;fn=File_name""x;print s > fn;s=$0}else{print s > fn;s=$0}}
else{s=$0;x++;fn=File_name""x;}}
else{s=s"\n"$0;}}END{if(a>=CN){x++;fn=File_name""x};print s > fn}.
 The source line number is 1.

Corrected in my previous post. Please check.

Thanks @pamu, it's working now. I need some more refinements that were there earlier, but I am unable to put them into the new code.

1> If the source file is A.txt, I will receive $1 as A, and the required target file names are A_0001.txt, A_0002.txt and so on.
In the old code this was achieved using sprintf, i.e.

 
awk 'NR%"'"${splitCount}"'"==1{x="'"${SrcFileName}_"'" sprintf("%04d",++i) ".txt"}{print > x}' $SrcFileName.txt 

2> I need to assign $2, i.e. the SplitCount value, to CN.

The shell script is called as:
SplitFile.sh A 10
 
The code of the shell script is below. Here I want to use the variables $1 and $2:
SrcFileName=$1
SplitCount=$2
awk -v CN="10" -v File_name="file__" '{a++;if($0 ~ /^101/){if(s){ if(a>=CN){a=0;x++;fn=File_name""x;print s > fn;s=$0}else{print s > fn;s=$0}} else{s=$0;x++;fn=File_name""x;}} else{s=s"\n"$0;}}END{if(a>=CN){x++;fn=File_name""x};print s > fn}' ${SrcFileName}.txt
 
SrcFileName=$1
SplitCount=$2


awk -v CN="$SplitCount" -v File_name="$SrcFileName" '{a++;if($0 ~ /^101/){if(s){ 
if(a>=CN){a=0;x++;fn=File_name""x;print s > fn;s=$0}else{print s > fn;s=$0}}
else{s=$0;x++;fn=File_name""x;}}
else{s=s"\n"$0;}}END{if(a>=CN){x++;fn=File_name""x};print s > fn}' file

Hope this helps you:)


Thanks @pamu, the new code works and solves issue #2 from my earlier post. Could you please have a look at issue #1 from the earlier post?

awk -v CN="$SplitCount" -v File_name="$SrcFileName" '{a++;if($0 ~ /^101/){if(s){ 
if(a>=CN){a=0;x++;fn=File_name"_"sprintf("%04d",x)".txt";print s > fn;s=$0}else{print s > fn;s=$0}}
else{s=$0;x++;fn=File_name"_"sprintf("%04d",x)".txt"}}
else{s=s"\n"$0;}}END{if(a>=CN){x++;fn=File_name"_"sprintf("%04d",x)".txt"};print s > fn}' file

Or, using a function:

awk -v CN="$SplitCount" -v File_name="$SrcFileName" '
function file_namec(){
    fn=File_name"_"sprintf("%04d",++x)".txt";
}
{a++;if($0 ~ /^101/){if(s){ 
if(a>=CN){a=0; file_namec();print s > fn;s=$0}else{print s > fn;s=$0}}
else{s=$0; file_namec()}}
else{s=s"\n"$0;}}END{if(a>=CN){file_namec()};print s > fn}' file
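
Some awk implementations limit how many output files can be open at the same time, so for inputs that split into many parts it may help to close each finished target before opening the next. A hedged sketch of the function variant above with close() added and the two identical print branches merged; the splitting logic is otherwise unchanged. The input is synthetic (40 customer sets of two records each), and the values A and 10 stand in for $1 and $2:

```shell
#!/bin/sh
workdir=$(mktemp -d) && cd "$workdir" || exit 1
SrcFileName=A        # stands in for $1
SplitCount=10        # stands in for $2

# Synthetic input: 40 customer sets, each one 101 header plus one 104 child.
awk 'BEGIN {
    for (c = 1; c <= 40; c++) { print "101|M|" c "|"; print "104|" c "|S|" }
}' > "${SrcFileName}.txt"

awk -v CN="$SplitCount" -v File_name="$SrcFileName" '
function file_namec() {
    if (fn != "") close(fn)              # release the finished target first
    fn = File_name "_" sprintf("%04d", ++x) ".txt"
}
{
    a++
    if ($0 ~ /^101/) {
        if (s) {
            if (a >= CN) { a = 0; file_namec() }
            print s > fn                 # flush the previous customer set
            s = $0
        } else { s = $0; file_namec() }  # very first header
    } else {
        s = s "\n" $0                    # buffer child records with their header
    }
}
END { if (a >= CN) file_namec(); print s > fn }
' "${SrcFileName}.txt"

ls A_0*.txt
```

Because each target name is used only once, closing a file and later opening a different one with > does not truncate anything already written.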