Splitting data file

Hello,
I'm trying to split a file by lines. I know I can use the split command to do this, but the problem is that the first line of each file created needs to be a header. I could use split, create another file containing the header, and then append each split file to it, but I do not like this approach. So I'm hoping someone can enlighten me.

thanks,

There's probably an easier way to do it, but I'd do something like this with Perl...

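#!/usr/bin/perl
use strict;
use warnings;

# Just modify the following two variables
my $NUM_LINES     = 10;
my $HEADER_STRING = "This is my header";

my $count    = 0;
my $file_num = 0;
my $fh;

while (<>) {
  if ( $count == 0 ) {
     # start a new output file (file0, file1, ...) and write the header first;
     # '>' truncates, so re-running does not append to old output
     my $filename = "file" . $file_num++;
     open( $fh, '>', $filename ) or die "Cannot open $filename: $!";
     print $fh "$HEADER_STRING\n";
  }
  # every input line is written, including the one that starts a new file
  print $fh $_;
  if ( ++$count == $NUM_LINES ) {
     # this file is full; the next line opens a new one
     close( $fh );
     $count = 0;
  }
}
close( $fh ) if $count;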

Then make it executable and call it from the directory it lives in with
$ ./my_perl_script.pl my_large_file

This will split your large file into as many files as required, each holding 10 data lines (i.e. 11 lines including the header). The last file may hold fewer.

Just modify the $NUM_LINES and $HEADER_STRING variables to suit your needs.

Cheers
ZB

Assuming that the header is the first line of the input file...

awk -v rows=3 -v filespec=outfile%03d '
    NR == 1 { 
        header = $0
        next
    }
    (row++ % rows) == 0 { 
        close(filename)
        filename = sprintf(filespec, ++filecount)
        print header > filename
    }
    {print $0 > filename}
' infile

Tested...

$ head infile outfile???
==> infile <==
h1
d1
d2
d3
d4
d5
d6
d7

==> outfile001 <==
h1
d1
d2
d3

==> outfile002 <==
h1
d4
d5
d6

==> outfile003 <==
h1
d7
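
If the header is not already the first line of the input, the same idea should work with the header passed in as a variable. This is only a minimal sketch: the header text "h1", the file name data.txt, the part%03d naming and the scratch directory are all assumptions made for illustration.

```shell
# Work in a scratch directory so nothing existing is overwritten.
tmp=$(mktemp -d)

# Hypothetical sample input: seven data lines, no header line.
printf 'd1\nd2\nd3\nd4\nd5\nd6\nd7\n' > "$tmp/data.txt"

awk -v rows=3 -v header="h1" -v filespec="$tmp/part%03d" '
    # Every "rows" lines, close the current file and start the next one.
    (NR - 1) % rows == 0 {
        if (filename != "") close(filename)
        filename = sprintf(filespec, ++filecount)
        print header > filename
    }
    # Every input line is a data line here, so write all of them.
    { print > filename }
' "$tmp/data.txt"

head "$tmp"/part???
```

With rows=3 and seven data lines this should produce three files, the last one holding the header plus a single data line.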

Just a suggestion, but this one is an easy one...
awk '{if (FNR==1) {print FILENAME; print $0} else print $0}' file1 file2 file3 ... > inputfile
Not very fancy, but it prints the name of each file as a header before that file's contents, then the name of the next file followed by its contents, and so on.
You can also use the -v option to set a variable for the header instead of using the built-in FILENAME variable.
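
For example, the -v variant might look like the sketch below. The header text, the sample files and the name "combined" are made up for illustration, and everything is written to a scratch directory.

```shell
# Scratch directory with two small hypothetical input files.
tmp=$(mktemp -d)
printf 'a1\na2\n' > "$tmp/file1"
printf 'b1\n' > "$tmp/file2"

# FNR resets to 1 at the start of each input file, so the header
# is printed once before the contents of every file.
awk -v header="my header" 'FNR == 1 { print header } { print }' \
    "$tmp/file1" "$tmp/file2" > "$tmp/combined"

cat "$tmp/combined"
```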
moxxx68

This will concatenate the files, not split them as per the original poster's request.

Also, your redirection to "inputfile" is misleading - the ">" redirection operator redirects STDOUT, therefore a more apt name for the resulting file would be "outputfile" :wink:

Cheers
ZB