I need to split a large file into small files based on a string.
At different places in the large file I have the string ^Job.
I need to split the file into different files starting from ^Job to the last character before the next ^Job.
Also all the small files should be automatically named.
1. Can the string appear only at the beginning of lines, or in the middle of lines too?
2. Is the split-string to appear in the output too, or is it to be stripped?
3. Is the whole line containing the string considered to go to the new part of the file, or is the file to be split exactly at the search string? That is, in the following example:
This is the first part
still first part ^Job This is the new part
is "still first part" considered to belong to the first or to the second part? Is this case even possible, given the answer to question 1?
Assuming the split-string can appear anywhere in the file and lines are to be split exactly where the split-string occurs, the solution is:
#!/bin/ksh
typeset srcfile="file"
typeset -i cnt=1
typeset line=""
exec 3>"${srcfile}.part${cnt}"               # define our output file
while IFS= read -r line ; do                 # IFS= and -r keep blanks and backslashes intact
     # loop, because a line may contain the splitter more than once
     while [[ $line = *"^Job"* ]] ; do
          print -r -u3 -- "${line%%^Job*}"   # put the part of the line before
                                             # the first splitter to the old output
          exec 3>&-                          # close output
          (( cnt += 1 ))
          exec 3>"${srcfile}.part${cnt}"     # open the next part
          line="${line#*^Job}"               # go on with the part of the line after
                                             # this occurrence of the splitter
     done
     print -r -u3 -- "$line"                 # regular line or rest of a split line, just print
done < "$srcfile"
exec 3>&-                                    # close last output file
exit 0
I use the "exec"s because, IMHO, they make writing to a series of output files easier than the hassle with per-command redirections.
To make the script more general I'd prefer putting the split-string into a variable and feeding it in via a command-line option. This is left as an exercise to the reader.
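For readers who want a starting point for that exercise, here is a minimal sketch, not a definitive solution. The function name split_on_marker and the option letter -s are made up for illustration; nothing in the thread prescribes them. It sticks to plain POSIX shell constructs (printf instead of print, getopts for the option) so that it should run under ksh as well as other sh-compatible shells. The splitter is quoted inside the expansions so that it is matched literally rather than as a pattern.

```shell
#!/bin/sh
# Sketch only: "split_on_marker" and its "-s" option are hypothetical names.
# Usage: split_on_marker [-s splitter] file
# The body runs in a subshell ( ... ) so fd 3 and the variables stay private.
split_on_marker() (
    splitter='^Job'                          # default: the string from this thread
    OPTIND=1
    while getopts "s:" opt ; do
        case "$opt" in
            s) splitter=$OPTARG ;;
            *) echo "usage: split_on_marker [-s splitter] file" >&2 ; return 2 ;;
        esac
    done
    shift $(( OPTIND - 1 ))
    srcfile=$1

    # an empty splitter would match everywhere and never terminate
    [ -n "$splitter" ] || { echo "splitter must not be empty" >&2 ; return 2 ; }
    [ -r "$srcfile" ]  || { echo "cannot read: $srcfile" >&2 ; return 1 ; }

    cnt=1
    exec 3>"${srcfile}.part${cnt}"           # first output file
    while IFS= read -r line || [ -n "$line" ] ; do
        while : ; do
            case "$line" in
                *"$splitter"*)
                    head=${line%%"$splitter"*}       # text before the first splitter
                    printf '%s\n' "$head" >&3        # ... goes to the old part
                    exec 3>&-                        # close it
                    cnt=$(( cnt + 1 ))
                    exec 3>"${srcfile}.part${cnt}"   # open the next part
                    line=${line#*"$splitter"}        # keep working on the rest
                    ;;
                *)  break ;;
            esac
        done
        printf '%s\n' "$line" >&3            # no splitter left in this line
    done < "$srcfile"
    exec 3>&-                                # close last output file
)
```

Called as, say, split_on_marker -s '^Job' file, this writes file.part1, file.part2 and so on. As in the script above, the splitter itself is stripped, and a splitter that sits alone on a line leaves an empty line at the end of the old part and at the start of the new one.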
Hi bakunin,
Thanks for your reply.
I will try the solution provided by you.
To provide more details: the file I am going to split contains multiple Purchase Orders, and after executing the script I should have 'n' files, one for each PO.
It looks like this:
^Job
PO No1:
<Po Details>
^Job
PO No1:
<Po Details>
......
I think the logic you provided should fit my requirement.
Thanks for your help.
Chandra