My requirement has been extended: each file should always start with a 101-type record, and the record count should be less than 10. Within any section there will never be more than seven 104-type records.
The command below splits the file into chunks of 10 records, but it cannot make 101 the first record of each split file. Can someone please extend it?
1) What should happen if the first/next 10 lines contain more than seven "104" records?
2) If the second file does not start with 101, which record should the 101 be taken from?
The output files required for my example are shown below. The requirements are:
No file should contain more than 20 records.
Each file should start with a 101 record. This ensures that every 101 and its associated 104 records end up in the same file. Hence, in the example below, the first file is cut at 18 records because including the next 101 set would take the count beyond 20. The remaining records are pushed to the next file, and so on.
@pamu, it is still not giving correct results in some scenarios. I will restate the requirements, as I have now been told that it is OK to include the next set of 104 data even if the record count goes beyond 20.
Requirement:
The source file is a set of customer data. A customer set consists of a 101 header record followed by its child 104 records. The 101 record is always the first record in a set.
Each target file should contain complete customer sets and start with a 101 record.
A target file can contain many customer sets.
The number of records in each target file must either equal the splitCount variable or exceed it only by as much as is needed to complete the last customer set.
Let's take splitCount=10 as an example. The code below just splits the file into sets of 10 records and assigns the correct name to each output file. Can someone please extend this logic to meet the target file requirements?
awk -v n="$splitCount" -v p="${SrcFileName}_" 'NR%n==1{x=p sprintf("%04d",++i) ".txt"}{print > x}' "${SrcFileName}.txt"
Variables assigned to run command
SrcFileName=SS
splitCount=10
Source file = SS.txt
101|M|28854|
104|28854| I|
101|M|30854| MER
104|30854| S|
104|30854| C|
104|30854| I|
101|M|30855| SG
104|30855| I|
104|30855| S|
104|30855| C|
104|30855| S|
101|M|30856|
104|30856| I|
104|30856| S|
104|30856| S|
104|30856| S|
104|30856| C|
104|30856| S|
101|M|30857|
104|30857| I|
104|30857| S|
104|30857| S|
104|30857| S|
104|30857| C|
104|30857| S|
101|M|30858|
104|30858| I|
104|30858| S|
Target Files
SS_0001.txt has 11 records, as the remaining 30855 records cannot be moved to the next file
101|M|28854|
104|28854| I|
101|M|30854| MER
104|30854| S|
104|30854| C|
104|30854| I|
101|M|30855| SG
104|30855| I|
104|30855| S|
104|30855| C|
104|30855| S|
SS_0002.txt has more than 10 records, as the remaining 30857 records cannot be moved to the next file
101|M|30856|
104|30856| I|
104|30856| S|
104|30856| S|
104|30856| S|
104|30856| C|
104|30856| S|
101|M|30857|
104|30857| I|
104|30857| S|
104|30857| S|
104|30857| S|
104|30857| C|
104|30857| S|
SS_0003.txt
101|M|30858|
104|30858| I|
104|30858| S|
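One way to read the requirement is: start a new target file only when a 101 record arrives and the current file already holds at least splitCount records. Below is a minimal sketch under that assumption. The function name `split_sets` and the awk variables `limit`, `prefix`, `out` and `n` are made up for illustration, and it has not been tried on AIX awk.

```shell
# Sketch: split a file into targets that always begin with a 101 record.
# Assumes fields are separated by "|" and that a customer set is a 101
# record followed by its 104 records. split_sets is a hypothetical name.
split_sets() {
    SrcFileName=$1
    splitCount=$2
    awk -F'|' -v limit="$splitCount" -v prefix="${SrcFileName}_" '
        # Open a new target on the first record, or on a 101 record once
        # the current target already holds at least "limit" records.
        NR == 1 || ($1 == "101" && n >= limit) {
            if (out != "") close(out)
            out = prefix sprintf("%04d", ++i) ".txt"
            n = 0
        }
        # Every record goes to the currently open target.
        { print > out; n++ }
    ' "${SrcFileName}.txt"
}
```

Calling `split_sets SS 10` on the sample SS.txt should give SS_0001.txt with 11 records, SS_0002.txt with 14 and SS_0003.txt with 3. Because the cut is only ever made at a 101 record, a customer set is never divided between two files.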
awk: 0602-533 Cannot find or open file {a++;if($0 ~ /^101/){if(s){
if(a>=CN){a=0;x++;fn=File_name""x;print s > fn;s=$0}else{print s > fn;s=$0}}
else{s=$0;x++;fn=File_name""x;}}
else{s=s"\n"$0;}}END{if(a>=CN){x++;fn=File_name""x};print s > fn}.
The source line number is 1.
Thanks @pamu, it is working now. I need a few refinements that were there in the earlier code, but I am unable to add them to the new code.
1> If the source file is A.txt, I will receive $1 as A, and the target file names required are A_0001.txt, A_0002.txt and so on.
In the old code this was achieved using sprintf, i.e.
2> CN needs to be assigned from $2, i.e. the SplitCount value.
The shell script is called as
SplitFile.sh A 10
The shell script is as follows. Here I want to use the variables $1 and $2:
SrcFileName=$1
SplitCount=$2
awk -v CN="10" -v File_name="file__" '{a++;if($0 ~ /^101/){if(s){ if(a>=CN){a=0;x++;fn=File_name""x;print s > fn;s=$0}else{print s > fn;s=$0}} else{s=$0;x++;fn=File_name""x;}} else{s=s"\n"$0;}}END{if(a>=CN){x++;fn=File_name""x};print s > fn}' ${SrcFileName}.txt
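The two refinements could be sketched as below: both shell arguments are handed to awk with `-v`, and the target name is built with sprintf as in the earlier one-liner. This is only a sketch and not the one-liner quoted above. It uses a simpler rule (cut only at a 101 record, once SplitCount records have been written to the current file), and `split_file` is a hypothetical helper name. `CN` and `File_name` are kept from the thread.

```shell
#!/bin/sh
# SplitFile.sh (sketch): usage  SplitFile.sh A 10
# Reads A.txt and writes A_0001.txt, A_0002.txt, ...
# split_file is a hypothetical helper name; CN and File_name mirror the
# awk variable names used earlier in the thread.
split_file() {
    SrcFileName=$1      # file name without the .txt extension
    SplitCount=$2       # minimum number of records per target file
    awk -v CN="$SplitCount" -v File_name="${SrcFileName}_" '
        # Cut only at a 101 record, and only once CN records are written.
        NR == 1 || (/^101/ && a >= CN) {
            if (fn != "") close(fn)
            fn = File_name sprintf("%04d", ++x) ".txt"
            a = 0
        }
        { print > fn; a++ }
    ' "${SrcFileName}.txt"
}

# Run only when both arguments are supplied.
if [ "$#" -eq 2 ]; then
    split_file "$1" "$2"
fi
```

Passing the values with `-v` keeps the shell quoting out of the awk program, so the program text can stay in one pair of single quotes.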