Splitting a large file into small files

Hi,

I need to split a large file into small files based on a string.

At different places in the large file I have the string ^Job.
I need to split the file into separate files, each running from a ^Job to the last character before the next ^Job.
Also, all the small files should be named automatically.

Please suggest.

Thanks,
Chandra

man csplit

I have gone through the csplit man page, but I am not sure how to use it for my requirement.
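
If ^Job always starts a line, a csplit call along the following lines may already be enough. This is only a sketch: it assumes GNU csplit (the -z option and the '{*}' repeat count are GNU extensions), and the file name orders.txt and the prefix po.part are made up for the example.

```shell
#!/bin/sh
# Sketch: split at every line that starts with the literal text "^Job".
# Assumes GNU csplit; orders.txt and po.part are hypothetical names.

workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Sample input: two purchase orders, each introduced by a ^Job line.
printf '%s\n' '^Job' 'PO No1:' 'details one' \
              '^Job' 'PO No2:' 'details two' > orders.txt

# -s  do not print the size of each piece
# -z  drop empty pieces (here: the empty piece before the first ^Job)
# -f  prefix of the generated names; a two-digit counter is appended
# In the pattern, the first ^ anchors at the start of the line and
# \^ matches a literal caret. '{*}' repeats the split until end of input.
csplit -s -z -f po.part orders.txt '/^\^Job/' '{*}'

ls po.part*
```

Each piece keeps its ^Job line as the first line, and the pieces are numbered from 00 upwards.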

You have omitted some necessary information:

  1. Can the string appear only at the beginning of a line, or in the middle of lines too?

  2. Is the split-string to appear in the output too, or is it to be stripped?

  3. Does the whole line containing the string go to the new part of the file, or is the file to be split exactly at the search string? That is, in the following example:

This is the first part
still first part ^Job This is the new part

does "still first part" go to the first or to the second part? Can this case even occur, given your answer to question 1?

Assuming the split-string can appear anywhere in a line, lines are to be split exactly where it occurs, and the split-string itself is stripped from the output, one solution is:

#!/bin/ksh

typeset srcfile="file"
typeset -i cnt=1
typeset line=""

exec 3>"${srcfile}.part${cnt}"              # define our output file
while IFS= read -r line ; do                # IFS= and -r keep blanks and backslashes
     while [[ $line = *"^Job"* ]] ; do      # line still contains a splitter
          print -r -u3 - "${line%%^Job*}"   # put the part before the first
                                            # splitter to the old output
          exec 3>&-                         # close output
          (( cnt += 1 ))
          exec 3>"${srcfile}.part${cnt}"    # open the next part
          line="${line#*^Job}"              # keep what follows that splitter
     done
     print -r -u3 - "$line"                 # a regular line, or the rest after
                                            # the last splitter on the line
done < "$srcfile"
exec 3>&-                                   # close last output file

exit 0

I use the "exec"s because, IMHO, keeping the current output file open on file descriptor 3 and reopening it for each new part is easier than the hassle of redirecting every single print.

To make the script more general I'd prefer putting the split-string into a variable and setting it from a command-line option. This is left as an exercise to the reader.

bakunin
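
One possible take on that exercise, as a sketch rather than a finished tool: the function name split_file and the option letter -s are made up for this example and do not come from the script above. It is written in plain POSIX sh (printf instead of print) so that it also runs outside ksh, and it keeps the same behaviour: the split-string is stripped and may occur anywhere in a line.

```shell
#!/bin/sh
# Sketch only: split_file and -s are hypothetical names.
# usage: split_file [-s split-string] srcfile

split_file() {
    sep='^Job'                          # default split-string
    OPTIND=1                            # allow the function to be called again
    while getopts 's:' opt; do
        case $opt in
            s) sep=$OPTARG ;;
            *) echo 'usage: split_file [-s split-string] srcfile' >&2
               return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    if [ $# -ne 1 ] || [ -z "$sep" ]; then   # an empty split-string would loop forever
        echo 'usage: split_file [-s split-string] srcfile' >&2
        return 2
    fi
    src=$1
    cnt=1
    exec 3>"$src.part$cnt"              # first output file
    while IFS= read -r line; do
        while :; do
            case $line in
                *"$sep"*) ;;            # line still contains the split-string
                *) break ;;
            esac
            head=${line%%"$sep"*}       # text before the first split-string
            printf '%s\n' "$head" >&3
            exec 3>&-                   # close the current part
            cnt=$((cnt + 1))
            exec 3>"$src.part$cnt"      # open the next part
            line=${line#*"$sep"}        # continue with the text after it
        done
        printf '%s\n' "$line" >&3
    done < "$src"
    exec 3>&-                           # close the last part
}

# Example run with a made-up file and a made-up split-string.
workdir=$(mktemp -d) || exit 1
printf '%s\n' 'first part' 'still first ###second part' 'more second' \
    > "$workdir/sample"
split_file -s '###' "$workdir/sample"
ls "$workdir"
```

Quoting "$sep" inside the pattern makes the shell treat the split-string literally, so characters such as * or ? in it are not interpreted as wildcards.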

Hi bakunin,
Thanks for your reply.
I will try the solution provided by you.
To give more detail: the file I am going to split contains multiple Purchase Orders, and after running the script I should have 'n' files, one for each PO.
It looks like this:
^Job
PO No1:
<Po Details>
^Job
PO No1:
<Po Details>
......
I think the logic you provided should fit my requirement.
Thanks for your help.
Chandra
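
For the layout shown above, where ^Job always sits at the start of a line and should stay in the output, an awk one-pass split might be a shorter alternative. This is a sketch under those assumptions and not part of the thread's script; the file name pofile is invented for the example.

```shell
#!/bin/sh
# Sketch: one output file per purchase order, each starting with its ^Job line.
# pofile is a hypothetical input name.

workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

printf '%s\n' '^Job' 'PO No1:' '<Po Details>' \
              '^Job' 'PO No1:' '<Po Details>' > pofile

awk '
    /^\^Job/ {                            # a new purchase order starts here
        if (out != "") close(out)         # do not keep old parts open
        out = sprintf("%s.part%d", FILENAME, ++n)
    }
    out != "" { print > out }             # lines before the first ^Job are skipped
' pofile

ls pofile.part*
```

Unlike the ksh script, this keeps the ^Job line in each part and only splits at lines that begin with ^Job.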