Hi.
I think segment 41-50 is supposed to be skipped: 1-20, 21-30, 31-40, 51-60, 61-100
Our shop needed something like this long ago, so we created a split utility, r3split, to handle these situations. When passing a file of lines f1 ... f100 through this command, we get:
Results, lines in receiving files:
Expecting first lines, 1(20), 21(10), 31(10), 51(10), 61-100
Note - accepting partial completion on r05, exit.
Edges: 1:0:0 of 20 lines in file "r01"
f1
---
Edges: 1:0:0 of 10 lines in file "r02"
f21
---
Edges: 1:0:0 of 10 lines in file "r03"
f31
---
Edges: 1:0:0 of 10 lines in file "r04"
f51
---
Edges: 1:0:0 of 40 lines in file "r05"
f61
---
And this is the command:
r3split -c "20,10+2,-10,10," -p "r" $FILE
translating to: write to files beginning with "r": 20 lines, 10 lines repeated twice, skip 10 lines, 10 lines, and everything else to the last receiving file.
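Since r3split itself is not released, here is a minimal Python sketch of how the integer part of that chunk spec could be interpreted. It is not the r3split source; the function names parse_chunks and split_lines are made up for illustration, and it assumes the spec is applied once from left to right, that "N+K" means K consecutive chunks of N lines, and that an incomplete chunk is dropped unless the spec ends with a comma. Ratios and percentages are not handled.

```python
def parse_chunks(spec):
    """Expand a spec such as "20,10+2,-10,10," into (action, count) steps.

    Hypothetical helper: only integer chunks are understood here.
    """
    take_rest = spec.endswith(",")          # trailing comma: keep the remainder
    steps = []
    for item in spec.strip(",").split(","):
        body, _, repeat = item.partition("+")
        action = "skip" if body.startswith("-") else "write"
        count = max(int(body.lstrip("-")), 1)   # help page: a count of zero is treated as 1
        steps += [(action, count)] * int(repeat or 1)
    return steps, take_rest


def split_lines(lines, spec):
    """Return one list of lines per receiving file."""
    steps, take_rest = parse_chunks(spec)
    chunks, pos = [], 0
    for action, count in steps:
        if action == "skip":
            pos = min(pos + count, len(lines))
            continue
        if pos + count > len(lines):
            break                           # incomplete chunk, handled below
        chunks.append(lines[pos:pos + count])
        pos += count
    if take_rest and pos < len(lines):
        chunks.append(lines[pos:])          # everything else goes to a final file
    return chunks


if __name__ == "__main__":
    lines = [f"f{i}" for i in range(1, 101)]
    for number, chunk in enumerate(split_lines(lines, "20,10+2,-10,10,"), start=1):
        print(f"r{number:02d}: {len(chunk)} lines, first is {chunk[0]}")
```

On the f1 ... f100 file this gives five chunks starting at f1, f21, f31, f51 and f61, with 20, 10, 10, 10 and 40 lines, which matches the listing above.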
Sadly we have not yet released any of our library of utilities.
The reason it is called r3split is for the 3 functions it performs. If anyone would like to use it as inspiration, here is the help page:
r3split - Split text file into Ratio, iRregular, or Random chunks.
usage: r3split [options] -- [files]
options:
Options which allow arguments require either a space or an "=" as
a separator, e.g. "-r 3" or "-r=3".
--help | -h
print this message and quit.
--chunks | -c [-]c1,[-]c2 ... cn
Transfer ci lines to next output file. If preceded by "-" skip
ci lines. For convenience, the ci sequence may be the first
parameter (or second after the --debug option).
The chunks may be integers, ratios ( 0 < ci < 1 ), or percentages
(similar to ratios but 0 < ci < 100, followed by "%"). Ratios
and percentages will require a preliminary (fast) pass over the
file to find the length in lines (see "--estimate"). A count of
zero will be treated as 1.
If the symbol "+" occurs after the chunk, an integer repeat
count may be specified.
Usually the chunks are transferred in complete blocks, and
remaining lines less than a chunk are ignored. However if a
final comma, ",", appears, then the incomplete chunk will be
transferred to the final file. (Essentially a large number is
created for the final chunk.)
Ex: -c=1,1 --chunks 20%,.3,100 --chunks=-4,13,.75
-c 3,3+ -c3,3+,
--random | -r n
Transfer to n files, each file containing a random number of
lines. The total lines will equal the number of lines in the
input file. The NUMBER of lines is random here, the sequence
is as it appears in the input.
--debug | -d
Print debugging output. Must be first option.
--suffix | -s format
Suffix format, the default is "%02d". To specify a ".txt" for
each file use -s="%02d.txt". The format is used to create the
sequence of output filenames.
--prefix | -p string
Prefix, default is "xx", like many other split utilities.
--estimate | -e
Allow the file length to be estimated, generally useful only
for "long" files, perhaps 1 GB or more. (This is accomplished
by external routine "esmele" which might not be available on
all systems.)
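For anyone who wants to try the --random idea, one way to get n files with random line counts while keeping the input order is to pick n-1 random cut points. The sketch below is only an illustration under my own assumptions, not how r3split does it: the name random_split is made up, and it assumes every output file must receive at least one line.

```python
import random


def random_split(lines, n, rng=random):
    """Split lines into n consecutive, non-empty parts of random length.

    Hypothetical helper: rng may be the random module or a random.Random
    instance (useful for repeatable runs).
    """
    if not 1 <= n <= len(lines):
        raise ValueError("need 1 <= n <= number of lines")
    # n-1 distinct cut positions strictly inside the file, in ascending order
    cuts = sorted(rng.sample(range(1, len(lines)), n - 1))
    bounds = [0] + cuts + [len(lines)]
    return [lines[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    lines = [f"f{i}" for i in range(1, 101)]
    for number, part in enumerate(random_split(lines, 4), start=1):
        print(f"xx{number:02d}: {len(part)} lines, first is {part[0]}")
```

The line counts change from run to run, but they always add up to the length of the input and the lines stay in their original sequence.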
cheers, drl