Split a txt file on the basis of line number

abhaydas · April 8, 2019, 10:31am

I have to split a file containing 100 lines into 5 files, say lines 1-20, 21-30, 31-40, 51-60, and 61-100.

Here is how I can do it for 2 files, but how do I handle more than 2 files?

awk 'NR < 21{ print >> "a"; next } {print >> "b" }' $input_file

Please advise.

Thanks

vgersh99 · April 8, 2019, 11:01am

How about this, for 20 lines/records per file (FNR-1 keeps line 20 in the first file):

awk -v div=20 '{print > ("file_" int((FNR-1)/div))}' myFile

or split into 5 files (lines are dealt out in turn, not in contiguous blocks):

awk -v div=5 '{print > ("file_" FNR%div)}' myFile
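
To see how the modulo variant distributes lines, here is a small sketch on a 10-line input. The names rr_input and the out_ prefix are made up for the demonstration; run it in a scratch directory.

```shell
# Demo input: the numbers 1..10, one per line (hypothetical file name).
seq 1 10 > rr_input

# Same idea as the FNR%div one-liner: line N goes to file number N mod 5.
awk -v div=5 '{print > ("out_" FNR%div)}' rr_input

# out_1 receives lines 1 and 6, out_0 receives lines 5 and 10.
paste -s -d, out_1
paste -s -d, out_0
```

So each file ends up with every fifth line rather than a consecutive range.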

Scrutinizer · April 8, 2019, 11:29am

Try:

split -l20 inputfile outputfile.
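
As a quick check of what this produces, here is a small sketch on a generated 100-line file (the seq input is only for demonstration; run it in a scratch directory). With the default two-letter suffixes the pieces should be named outputfile.aa through outputfile.ae. Note that split -l gives equal-sized pieces only, so on its own it does not cover uneven ranges.

```shell
# Demo input: 100 numbered lines (made up for illustration).
seq 1 100 > inputfile

# 20 lines per piece; the trailing dot is part of the output prefix.
split -l 20 inputfile outputfile.

# List the pieces with their line counts.
wc -l outputfile.*
```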

RavinderSingh13 · April 8, 2019, 11:44am

Hello abhaydas,

I believe the numbers of lines you need in the output files are NOT equal. So here is an approach where you list how many lines each output file should get. Taking your own example, you need the first 20 lines in file1, the next 10 lines in file2, the next 10 lines in file3, the next 10 lines in file4, and the next 40 lines in file5. If that is the case, could you please try the following (I have NOT tested it, I am cooking right now, but I am pretty sure it should WORK).

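awk '
BEGIN{
  num=split("20,10,10,10,40",array,",")   # lines wanted in each output file
  count=1
  file="file"count
  limit=array[count]                      # last line number of the current file
}
{
  print $0 > file
}
FNR==limit && count<num{                  # current file is full, move to the next one
  close(file)
  count++
  file="file"count
  limit+=array[count]
}'  Input_file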

NOTE: The split("20,10,10,10,40",array,",") part specifies the number of lines for each output file, so please make sure you provide the right values there.

NOTE2: I am also closing each output file with close(file) once it is complete, to avoid errors from having too many files open at once.

Thanks,
R. Singh

drl · April 9, 2019, 9:02am

Hi.

I think segment 41-50 is supposed to be skipped: 1-20 ,21-30 ,31-40 ,51-60 ,61-100
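
If that reading is right, a plain awk sketch could take the wanted ranges explicitly and drop any line that falls outside them. This is only an illustration: the ranges variable, the part output prefix, and the ranges_input demo file are names made up here, and the ranges are assumed to be sorted and non-overlapping.

```shell
# Demo input: 100 lines f1 ... f100 (hypothetical file name).
seq 1 100 | sed 's/^/f/' > ranges_input

awk -v ranges="1-20,21-30,31-40,51-60,61-100" '
BEGIN {
  n = split(ranges, r, ",")
  for (i = 1; i <= n; i++) {
    split(r[i], edge, "-")
    first[i] = edge[1] + 0     # force numeric comparison
    last[i]  = edge[2] + 0
  }
  k = 1                        # index of the range being filled
}
# Past the end of the current range: finish its file and move on.
k <= n && FNR > last[k] { close("part" k); k++ }
# Inside the current range: write the line. Lines in a gap are dropped.
k <= n && FNR >= first[k] { print > ("part" k) }
' ranges_input

wc -l part1 part2 part3 part4 part5
```

With the ranges from the original post, lines 41-50 match no range and are not written anywhere.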

Our shop needed something like this long ago, so we created a split utility, r3split, to handle these situations. When passing a file of lines f1 ... f100 through this command, we get:

 Results, lines in receiving files:
 Expecting first lines, 1(20), 21(10), 31(10), 51(10), 61-100
 Note - accepting partial completion on r05, exit.
Edges: 1:0:0 of 20 lines in file "r01"
f1
   ---

Edges: 1:0:0 of 10 lines in file "r02"
f21
   ---

Edges: 1:0:0 of 10 lines in file "r03"
f31
   ---

Edges: 1:0:0 of 10 lines in file "r04"
f51
   ---

Edges: 1:0:0 of 40 lines in file "r05"
f61
   ---

And this is the command:

r3split -c "20,10+2,-10,10," -p "r" $FILE

translating to: write to files beginning with "r", 20 lines, 10 lines repeated twice, skip 10 lines, 10 lines, everything else to last receiving file.

Sadly we have not yet released any of our library of utilities.
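
In the meantime, the integer-chunk part of that command can be roughly approximated in awk. This is a sketch written from the description in this post, not the r3split code: it assumes "+" is followed by an explicit repeat count, it handles only integer chunks, and unlike the real utility it does not discard a trailing partial chunk. The spec and prefix variables and the chunk_input file are names invented for the example.

```shell
# Demo input: 100 lines f1 ... f100 (hypothetical file name).
seq 1 100 | sed 's/^/f/' > chunk_input

awk -v spec="20,10+2,-10,10," -v prefix="r" '
BEGIN {
  rest = (spec ~ /,$/)              # trailing comma: remainder goes to a last file
  sub(/,$/, "", spec)
  n = split(spec, item, ",")
  m = 0
  for (i = 1; i <= n; i++) {
    reps = 1
    if (split(item[i], part, "[+]") == 2) reps = part[2] + 0
    v = part[1] + 0
    if (v == 0) v = 1               # a count of zero is treated as 1
    for (j = 1; j <= reps; j++) size[++m] = v
  }
  if (rest) size[++m] = 1e15        # effectively "everything else"
  k = 0; left = 0; nfile = 0; out = ""
}
{
  while (left == 0) {               # current chunk is done, start the next one
    if (out != "") close(out)
    if (++k > m) exit
    left = (size[k] < 0) ? -size[k] : size[k]
    out = ""
    if (size[k] > 0) out = sprintf("%s%02d", prefix, ++nfile)
  }
  if (out != "") print > out        # negative chunks are skipped, not written
  left--
}' chunk_input

wc -l r01 r02 r03 r04 r05
```

On the 100-line input this gives the same first lines and line counts as the r01 ... r05 listing above.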

The reason it is called r3split is for the 3 functions it performs. If anyone would like to use it as inspiration, here is the help page:

r3split - Split text file into Ratio, iRregular, or Random chunks.

usage: r3split [options] -- [files]

options:

Options which allow arguments require either a space or an "=" as
a separator, e.g. "-r 3" or "-r=3".

--help | -h
  print this message and quit.

--chunks | -c [-]c1,[-]c2 ... cn
  Transfer ci lines to next output file.  If preceded by "-" skip
  ci lines.  For convenience, the ci sequence may be the first
  parameter (or second after the --debug option).

  The chunks may be integers, ratios ( 0 < ci < 1 ), or percentages
  (similar to ratios but 0 < ci < 100, followed by "%").  Ratios
  and percentages will require a preliminary (fast) pass over the
  file to find the length in lines (see "--estimate").  A count of
  zero will be treated as 1.

  If the symbol "+" occurs after the chunk, an integer repeat
  count may be specified.

  Usually the chunks are transferred in complete blocks, and
  remaining lines less than a chunk are ignored.  However if a
  final comma, ",", appears, then the incomplete chunk will be
  transferred to the final file. (Essentially a large number is
  created for the final chunk.)

  Ex: -c=1,1    --chunks 20%,.3,100     --chunks=-4,13,.75
  -c 3,3+       -c3,3+,

--random | -r n
  Transfer to n files, each file containing a random number of
  lines.  The total lines will equal the number of lines in the
  input file.  The NUMBER of lines is random here, the sequence
  is as it appears in the input.

--debug | -d
  Print debugging output.  Must be first option.
  
--suffix | -s format
  Suffix format, the default is "%02d".  To specify a ".txt" for
  each file use -s="%02d.txt".  The format is used to create the
  sequence of output filenames.

--prefix | -p string
  Prefix, default is "xx", like many other split utilities.

--estimate | -e
  Allow the file length to be estimated, generally useful only
  for "long" files, perhaps 1 GB or more.  (This is accomplished
  by external routine "esmele" which might not be available on
  all systems.)

cheers, drl