I have a file that I want to split in two, preferably using the Bourne shell (sh). The file is a configuration file for several elements, so it consists of a repeated configuration pattern like this:
#fruit apple
#color green
#surface smooth
size 7cm

#fruit grape
#color green
#surface smooth
size 2cm
I want to split the file into 2 pieces that are as equal as possible, but a split has to happen at the start of an element (a line starting with a #fruit entry). If the configuration file has an odd number of entries, one of the files should get one more item; otherwise the 2 resulting files should contain the same number of items.
The tags like "#fruit" are unique, so they can be used with e.g. "grep" combined with "wc -l" to find the number of items and the element at which to split.
It's not different, but since I received no answer to that query I decided to describe the problem in a different way, as maybe it was difficult to understand what I meant. I have a feeling that this should be an easy task to solve, but I am stuck. I have determined the element at which I should split by grep'ing for "#fruit" to find the number of elements, then using "expr" and "/" to get the closest integer value of the element number where I should split. From there I am unsure about the rest. I have a feeling that awk is the way to go, but I am not sure how. Another option is to find the line number of the start of the element where I should cut.
Grep for #fruit, then get a count with wc -l. Your source is structured with 5 lines for each entry, so divide the number of #fruit entries found by 2, then multiply that by 5 using bc. Round the half up when the count is odd; otherwise split -l will produce a third, short file. You can then use the split -l command with that result to make your two files. I would add a check to make sure none of the lines go missing.
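Something along these lines might do it. This is only a sketch: it assumes every entry is exactly 5 lines (four content lines plus one blank line), and the names config.txt and the part. prefix are made up for the example. It uses grep -c and expr rather than wc -l and bc, which should give the same numbers.

```sh
#!/bin/sh
# Sketch of the grep + split -l approach.
# ASSUMPTION: each entry is exactly 5 lines (4 content lines + 1 blank line).
# config.txt and the "part." prefix are hypothetical names.

# Build a small sample file with 3 entries (15 lines).
cat > config.txt <<'EOF'
#fruit apple
#color green
#surface smooth
size 7cm

#fruit grape
#color green
#surface smooth
size 2cm

#fruit plum
#color purple
#surface smooth
size 4cm

EOF

# Count the entries; each one starts with a "#fruit" line.
total=$(grep -c '^#fruit' config.txt)

# Half of the entries, rounded up so an odd count never yields a third file.
half=$(expr \( "$total" + 1 \) / 2)

# Number of lines that belong in the first piece.
lines=$(expr "$half" \* 5)

# split names its output part.aa, part.ab, ...
split -l "$lines" config.txt part.

# Sanity check: the pieces glued back together must equal the original.
if cat part.aa part.ab | cmp -s - config.txt; then
    echo "split OK: no lines lost"
else
    echo "split FAILED: pieces differ from original" >&2
fi
```

With 3 entries the first piece gets 2 of them and the second gets 1. If the entries are not all the same length, the multiply-by-5 step no longer lands on an entry boundary, and a record-based awk approach is probably safer.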
An awk script may be useful. There is a special variable "RS", the Record Separator, that may be set to read "paragraphs", i.e. groups of lines separated by an empty line:
RS = ""
That would allow you to treat your file as essentially just a number of such records.
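To see paragraph mode in action, here is a small sketch. It assumes the entries really are separated by blank lines; sample.txt is a made-up file name.

```sh
#!/bin/sh
# Sketch: awk paragraph mode (RS = "") treats each blank-line-separated
# block as one record.
# ASSUMPTION: entries are separated by empty lines. sample.txt is hypothetical.

cat > sample.txt <<'EOF'
#fruit apple
#color green
#surface smooth
size 7cm

#fruit grape
#color green
#surface smooth
size 2cm
EOF

# Print the record number and the first two fields of each record.
# In paragraph mode a newline also separates fields, so $1 is "#fruit".
awk 'BEGIN { RS = "" } { print NR ": " $1 " " $2 }' sample.txt

# Count the records; this should match the number of "#fruit" lines.
count=$(awk 'BEGIN { RS = "" } END { print NR }' sample.txt)
echo "records: $count"
```

Each record here is a whole entry, so NR counts entries rather than lines.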
With your calculated knowledge of where you want the split to be, the "pattern" part of an awk statement:
pattern { action }
should allow you to complete the solution with the use of another builtin variable, "NR", the number of the current record. This is because the pattern part may be a logical expression, such as:
NR <= 5 { some-action-for-this-case }
The action might be something as simple as print ...

cheers, drl
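Putting the pieces of this reply together, one possible shape for the whole thing is sketched below. It assumes blank-line-separated entries, and the file names fruits.txt, first.txt and second.txt are invented for the example. The split point is passed into awk with -v, and ORS is set so the blank line between entries is written back out.

```sh
#!/bin/sh
# Sketch: split a file of blank-line-separated entries into two halves with awk.
# ASSUMPTION: entries are separated by empty lines.
# fruits.txt, first.txt and second.txt are hypothetical names.

cat > fruits.txt <<'EOF'
#fruit apple
#color green
#surface smooth
size 7cm

#fruit grape
#color green
#surface smooth
size 2cm

#fruit plum
#color purple
#surface smooth
size 4cm

EOF

# Number of entries, and the split point rounded up for odd counts.
total=$(grep -c '^#fruit' fruits.txt)
half=$(expr \( "$total" + 1 \) / 2)

# Records 1..half go to first.txt, the rest to second.txt.
# ORS = "\n\n" puts a blank line back after every record.
awk -v half="$half" '
BEGIN      { RS = ""; ORS = "\n\n" }
NR <= half { print > "first.txt"; next }
           { print > "second.txt" }
' fruits.txt

echo "first.txt:  $(grep -c '^#fruit' first.txt) entries"
echo "second.txt: $(grep -c '^#fruit' second.txt) entries"
```

Because awk works on whole records here, the entries do not need to have the same number of lines. Note that second.txt is only created if at least one record goes past the split point.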