Hello, I have a large file (2 GB) that I would like to split based on a pattern and on size.
I've used the following command to split the file (the token is "HELLO"):
awk '/HELLO/{i++}{print > "file"i}' input.txt
The output is similar to the following (I included the file size in KB):
10 file1
10 file2
20 file3
18 file4
1 file5
1 file6
5 file7
I'd like to merge (cat) consecutive files so that two or more adjacent files get combined whenever their total size stays within a limit. With a 20 KB limit, my desired output would be:
20 file1
20 file2
20 file3
5 file4
In my desired output, files 1-2 got merged, file 3 stayed the same, files 4-6 got merged, and file 7 stayed the same because it's the remainder.
I was thinking of running my awk command first and then using a for loop to merge the files. My only issue is that there are so many files that a sort by file name would give file1, file10, file100, file2, file20, etc., and I don't want to merge file1 and file101 together.
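One way to sidestep the sorting problem might be to walk the files with a numeric counter instead of sorting their names. Below is a rough sketch of that idea, assuming the chunks are named file1, file2, ... with no gaps (which is what the awk command above produces when the first line contains the token). The function name merge_chunks and the "merged" output prefix are placeholders made up for illustration, and the limit is taken in bytes.

```shell
#!/bin/sh
# Sketch: merge file1, file2, ... in numeric order into merged1, merged2, ...
# A merged file only exceeds the limit when a single chunk is already too big.
# merge_chunks and the "merged" prefix are illustrative names, not existing tools.
merge_chunks() {
    limit=$1    # size limit in bytes, e.g. 20480 for 20 KB
    n=1         # index of the next input chunk
    out=1       # index of the current merged file
    total=0     # bytes written to the current merged file so far

    # Counting upward avoids the file1, file10, file100 ordering of a name sort.
    while [ -f "file$n" ]; do
        # The outer $(( )) strips the padding some wc implementations print.
        size=$(( $(wc -c < "file$n") ))

        # Start a new merged file if this chunk would push us past the limit.
        if [ "$total" -gt 0 ] && [ $((total + size)) -gt "$limit" ]; then
            out=$((out + 1))
            total=0
        fi

        # Truncate a leftover output file from an earlier run before first use.
        if [ "$total" -eq 0 ]; then
            : > "merged$out"
        fi

        cat "file$n" >> "merged$out"
        total=$((total + size))
        n=$((n + 1))
    done
}
```

Calling merge_chunks 20480 in the directory holding the chunks would apply a 20 KB limit. The chunks are only read, so the original file1, file2, ... stay in place and can be removed afterwards.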