KSH script to split a text file

ask.chowhan · December 22, 2011, 8:27pm

I have a problem that I would like to solve using the power of UNIX and the inspired minds around the world. Here is the problem.

I have a text file with data like the following:

1X.....................1234567890123456789T1234598765XT1 (header)
1Z01............(sub HEADER)
P100001............
Q1........
R1.................
P100002.........
Q1..................
R1......................
1Z02............
P100001..........
Q1.............
R1...........
S1.....
P100002.........
.
.
.
.
.
.
P110000..............
.
.
.
.
1Z502
.
.
.
.
P110003.......
Q1.....
R1...
S1....
4C.........(TRAILER)

So, between the header and the trailer, there can be any number of records.

I need to split the file every 10000 P1 records. Each new file needs its own header, and that header must carry the number of 1Z and P1 records in the file. In the first line above, those are the 12345 (1Z) and 98765 (P1) near the end.

The other requirement is that the 19-character field in the header (1234567890123456789 above) must be unique in each file.

How can I write a script for this in ksh?

Thank you for your help and support.

The output files should look like this:

File 1:

1X.....................1234567890123456789T0050510000XT1 (header)
1Z01............(sub HEADER)
P100001............
Q1........
R1.................
P100002.........
Q1..................
R1......................
1Z02............
P100001..........
Q1.............
R1...........
P100002.........
.
.
.
.
.
.
1Z505.............
P110000..............
Q1......
R1.......
4C............(Trailer static)

File 2:

1X.....................1234567890123456790T0000200003XT1 (header)
1Z01............(sub HEADER)
P100001............
Q1........
R1.................
P100002.........
Q1..................
R1......................
1Z02............
P100001..........
Q1.............
R1...........
4C.........(TRAILER static)

Chowhan

aigles · December 23, 2011, 9:25am

Try and adapt the following script :

#!/usr/bin/ksh

InFile=./chowhan.dat        # Input file
OutSpec='./chowhan_%d.dat'  # Outfiles specification, %d for file number
Pmax=5                      # Split every Pmax P1 record

awk -v PMAX="${Pmax}" -v OUT="${OutSpec}" '
    #----------------------
    # Functions
    #-----------------------
    
    function newOutFile() {
        ++FilesCount;
        Files[FilesCount      ] = sprintf(OUT ".tmp", FilesCount);
        Files[FilesCount, "1Z"] = 0;
        Files[FilesCount, "P1"] = 0;
        return FilesCount;
    }
    function writeSubHeader() {
        if (! (FileIndx in Files)) newOutFile();
        FileName = Files[FileIndx];
        Files[FileIndx, "1Z"  ]++;
        print SubHeader > FileName;
    }

    #----------------------
    # Actions
    #----------------------

    /^1X/ { 
        Header   = $0;
        FileIndx = 1;
        next;
    }
    /^1Z/ { 
        SubHeader = $0;
        FileIndx  = 1;
        Pcount    = 0;
        writeSubHeader();
        next 
    }
    /^P1/ {
        if (++Pcount > PMAX) {
            Pcount = 1;             # This P1 is the first record of the next file
            ++FileIndx;
            writeSubHeader();
        }
        FileName = Files[FileIndx];
        Files[FileIndx, "P1"]++;
        print $0 > FileName;
        next;
    }
    /^4C/   {
        Trailer = $0;
        exit;
    }
    {
        print $0 > FileName;
        next;
    }
    
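    END {
        for (i=1; i<=FilesCount; i++) {
            FileName = Files[i];        # Temporary file holding the body of output file i
            print Trailer > FileName;
            close(FileName);
            ResultFileName = sprintf(OUT, i);
            # Rebuild the header: prefix up to the "T", then the 1Z and P1 counts, then the last 3 characters
            printf("%s%0.5d%0.5d%s\n", substr(Header,1,length(Header)-13),
                                     Files[i,"1Z"],
                                     Files[i,"P1"],
                                     substr(Header, length(Header)-2,3)) > ResultFileName;
            close(ResultFileName);
            # Append the body to the new header, then remove the temporary file
            system(sprintf("/usr/bin/cat %s >>%s && /usr/bin/rm %s", FileName, ResultFileName, FileName));
        }
    }
' "${InFile}"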
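
The substr() offsets used in the END block are easier to follow on a concrete header. The sketch below is not part of the script above: split_header is a hypothetical helper, and the layout (a 5-digit 1Z count and a 5-digit P1 count followed by the 3 characters XT1) is assumed from the sample data.

```shell
# Hypothetical helper: cut a 1X header into the four pieces used by the END block.
# Assumed layout (from the samples): <prefix ending in T><5-digit 1Z><5-digit P1><3-char tail>
split_header() {
    awk -v h="$1" 'BEGIN {
        n = length(h)
        printf "%s|%s|%s|%s\n",
            substr(h, 1, n - 13),   # prefix, up to and including the T
            substr(h, n - 12, 5),   # number of 1Z records
            substr(h, n - 7, 5),    # number of P1 records
            substr(h, n - 2, 3)     # tail, copied unchanged
    }'
}

split_header '1X.....................1234567890123456789T0000300013XT1'
```

For the sample header this prints the prefix, 00003, 00013 and XT1 separated by "|". Only the two middle pieces are replaced when the new header is written.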

Input file chowhan.dat :

1X.....................1234567890123456789T0000300013XT1
1Z01............
P101001............
Q1........
R1.................
P101002.........
Q1..................
R1......................
1Z02............
P102001..........
Q1.............
R1...........
S1.....
P102002..........
Q1.............
R1...........
S1.....
P102003..........
Q1.............
R1...........
S1.....
P102004..........
Q1.............
R1...........
S1.....
P102005..........
Q1.............
R1...........
S1.....
1Z03
P103001..........
Q1.............
R1...........
S1.....
P103002..........
Q1.............
R1...........
S1.....
P103003..........
Q1.............
R1...........
S1.....
P103004..........
Q1.............
R1...........
S1.....
P103005..........
Q1.............
R1...........
S1.....
P103006..........
Q1.............
R1...........
S1.....
4C.........

Output file chowhan_1.dat :

1X.....................1234567890123456789T0000300012XT1
1Z01............
P101001............
Q1........
R1.................
P101002.........
Q1..................
R1......................
1Z02............
P102001..........
Q1.............
R1...........
S1.....
P102002..........
Q1.............
R1...........
S1.....
P102003..........
Q1.............
R1...........
S1.....
P102004..........
Q1.............
R1...........
S1.....
P102005..........
Q1.............
R1...........
S1.....
1Z03
P103001..........
Q1.............
R1...........
S1.....
P103002..........
Q1.............
R1...........
S1.....
P103003..........
Q1.............
R1...........
S1.....
P103004..........
Q1.............
R1...........
S1.....
P103005..........
Q1.............
R1...........
S1.....
4C.........

Output file chowhan_2.dat :

1X.....................1234567890123456789T0000100001XT1
1Z03
P103006..........
Q1.............
R1...........
S1.....
4C.........

Jean-Pierre.

ask.chowhan · December 23, 2011, 12:27pm

Thank you, Jean, for your quick reply.

Your help on this issue is very useful. Let me restate the requirement so that it is easier to understand and to write the program as needed.

As you assumed, the input file looks like the sample below (the file is fixed length).

All the P1 values are incremental.

1X.....................1234567890123456111T0000300013XT1
1Z01............no of P1's as 2 (sub header 1)
P100001............ (it is 5 digit value for subheader and not 01001)
Q1........
R1.................
P100002.........
Q1..................
R1......................
1Z02............no of P1's as 5 (sub header 2)
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
P100004..........
Q1.............
R1...........
S1.....
P100005..........
Q1.............
R1...........
S1.....
1Z03............no of P1's as 6(sub header 3)
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
P100004..........
Q1.............
R1...........
S1.....
P100005..........
Q1.............
R1...........
S1.....
P100006..........
Q1.............
R1...........
S1.....
4C.........

--------------Outputs should look like this-----------

If we split into a separate file for every 5 P1 records, the output should look like this:

File 1:

1X.....................1234567890123456111T0000200005XT1
1Z01............ (sub header 1)
P100001........no of P1's as 2(it is 5 digit value for subheader and not 01001)
Q1........
R1.................
P100002.........
Q1..................
R1......................
1Z02............no of P1's as 3(sub header 2)
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
4C.........

File 2:

1X.....................1234567890123456112T0000200005XT1
1Z01(2)............no of P1's as 2(sub header 1(2) recounting again)
P100001(4).......... four should be recounted again
Q1.............
R1...........
S1.....
P100002(5)..........
Q1.............
R1...........
S1.....
1Z02(3)............no of P1's as 3(sub header 3)
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
4C.........

File 3:

1X.....................1234567890123456113T0000100003XT1
1Z01(3)............(sub header 1(3))
P100001(4)..........
Q1.............
R1...........
S1.....
P100002(5)..........
Q1.............
R1...........
S1.....
P100003(6)..........
Q1.............
R1...........
S1.....
4C.........

I think the above example gives a good idea of what the output files should look like.

1) For every 5 P1 records, there should be a new file with its own header and trailer.
2) In each new file, the 1Z numbering starts again at 1. The same rule applies even when a single 1Z has more than 5 P1 records.
3) The header should show how many 1Z and P1 records the file contains. Its last digit also needs to be incremented by one (or something else added to the 19 digits) to make it unique for each output file.

I really appreciate your help and support.

Happy New Year in advance, and Merry Christmas.

Chowhan

aigles · January 2, 2012, 9:47am

Happy New Year !

Try and adapt the new version of the script (chowhan.ksh):

#!/usr/bin/ksh

InFile=./chowhan.dat        # Input file
OutSpec='./chowhan_%d.dat'  # Outfiles specification, %d for file number
P1max=5                     # Split every P1max P1 records

rm -f ./chowhan_*.dat* >/dev/null 2>&1

/usr/xpg4/bin/awk -v P1MAX="${P1max}" -v OUT="${OutSpec}" '

    #----------------------
    # Functions
    #-----------------------

    function closeOutFile() {
        if (FileIndx > 0) close(FileName);
    }

    function openNewOutFile() {
        FileName = sprintf(OUT ".tmp", ++FileIndx);
        Files[FileIndx      ] = FileName;
        Files[FileIndx, "1Z"] = 0;
        Files[FileIndx, "P1"] = 0;
        SubHeaderCount        = 0;
        P1Zcount              = 0;
        P1count               = 0;
    }

    function switchOutFile() {
        closeOutFile();
        openNewOutFile();
    }

    function writeSubHeader() {
        if (! FileIndx) openNewOutFile();
        Files[FileIndx, "1Z"]++;
        printf "1Z%0.2d%s\n", ++SubHeaderCount, SubHeader > FileName;
    }

    function writeP1() {
        if (P1count >= P1MAX) {
            switchOutFile();
            writeSubHeader();
        }
        P1Zcount++;
        P1count++;
        Files[FileIndx, "P1"]++;
        printf "P1%0.5d%s\n", P1Zcount, substr($0,8) > FileName;
    }

    #----------------------
    # Actions
    #----------------------

    /^1X/ {
        Header   = $0;
        HeaderHead = substr($0, 1, length($0)-15);
        HeaderMid  = substr($0, length($0)-13, 1);
        HeaderTail = substr($0, length($0)-2);
        FileIndx = 0;
        next;
    }

    /^1Z/ {
        SubHeader = substr($0, 5);
        P1Zcount   = 0;
        writeSubHeader();
        next;
    }

    /^P1/ {
        writeP1();
        next;
    }
    /^4C/   {
        Trailer = $0;
        exit;
    }
    {
        print $0 > FileName;
        next;
    }

    END {
        closeOutFile();
        # One character per output file replaces the last digit of the header ID (at most 36 files)
        HeaderId = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        for (i=1; i<=FileIndx; i++) {
            FileName = Files[i];        # Temporary file holding the body of output file i
            print Trailer >> FileName;
            close(FileName);
            ResultFileName = sprintf(OUT, i);
            printf("%s%1.1s%s%0.5d%0.5d%s\n",   HeaderHead,
                                                substr(HeaderId, i, 1) ,
                                                HeaderMid,
                                                Files[i,"1Z"],
                                                Files[i,"P1"],
                                                HeaderTail ) > ResultFileName;
            close(ResultFileName);
            # Append the body to the new header, then remove the temporary file
            system(sprintf("/usr/bin/cat %s >>%s && /usr/bin/rm %s", FileName, ResultFileName, FileName));
        }
    }
' "${InFile}"
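
The script makes each header unique by replacing the last ID digit with a single character taken from HeaderId, so only a few dozen distinct values are available. If more output files are expected, one alternative is to add the file number to the whole 19-digit ID. A 19-digit value does not fit exactly in awk's floating-point numbers, so the sketch below adds digit by digit on the string. It is a different technique from the one in the script, and bump_id is a hypothetical name.

```shell
# Hypothetical helper: add a small non-negative offset to a long decimal ID
# held as a string, so no precision is lost on 19-digit values.
#   $1 = ID (digits only), $2 = offset (for example: file number - 1)
bump_id() {
    awk -v id="$1" -v off="$2" 'BEGIN {
        carry = off
        out = ""
        # Walk the digits from right to left, propagating the carry
        for (i = length(id); i >= 1; i--) {
            d = substr(id, i, 1) + carry
            out = (d % 10) out
            carry = int(d / 10)
        }
        # A carry past the leftmost digit is dropped so the field width stays fixed
        print out
    }'
}

bump_id 1234567890123456111 2
```

The call above prints 1234567890123456113. Inside the awk script the same loop could stand in for the HeaderId lookup, with the ID taken from the 19 characters in front of the T; that wiring is left out here.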

Input file (chowhan.dat) :

1X.....................1234567890123456111T0000300013XT1
1Z01............
P100001............
Q1........
R1.................
P100002.........
Q1..................
R1......................
1Z02............
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
P100004..........
Q1.............
R1...........
S1.....
P100005..........
Q1.............
R1...........
S1.....
1Z03............
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
P100004..........
Q1.............
R1...........
S1.....
P100005..........
Q1.............
R1...........
S1.....
P100006..........
Q1.............
R1...........
S1.....
4C.........

Running the script :

$ ls chowhan*
chowhan.dat       chowhan.ksh
$ ./chowhan.ksh
$ ls chowhan*
chowhan_1.dat     chowhan_2.dat     chowhan_3.dat     chowhan.dat       chowhan.ksh
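
To check a run, the counts stored in each new header can be compared with the records that actually follow it. This is only a sketch: check_counts is a hypothetical helper that is not part of chowhan.ksh, and the header offsets are assumed from the sample data (10 count digits followed by XT1).

```shell
# Hypothetical helper: compare the 1Z/P1 counts in the 1X header of a file
# with the number of 1Z and P1 records found in that file.
# Usage: for f in ./chowhan_[0-9]*.dat; do check_counts "$f"; done
check_counts() {
    awk -v name="$1" '
        /^1X/ {
            n  = length($0)
            hz = substr($0, n - 12, 5) + 0   # 1Z count stored in the header
            hp = substr($0, n - 7, 5) + 0    # P1 count stored in the header
            next
        }
        /^1Z/ { z++ }
        /^P1/ { p++ }
        END {
            ok = (hz == z && hp == p)
            printf "%s: header 1Z=%d P1=%d, found 1Z=%d P1=%d -> %s\n",
                   name, hz, hp, z, p, (ok ? "OK" : "MISMATCH")
            exit (ok ? 0 : 1)
        }
    ' "$1"
}
```

The function returns a non-zero status on a mismatch, so it can also be used in a test such as: check_counts ./chowhan_1.dat || echo "bad header".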

Output file 1 (chowhan_1.dat) :

1X.....................1234567890123456110T0000200005XT1
1Z01............
P100001............
Q1........
R1.................
P100002.........
Q1..................
R1......................
1Z02............
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
4C.........

Output file 2 (chowhan_2.dat) :

1X.....................1234567890123456111T0000200005XT1
1Z01............
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
1Z02............
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
4C.........

Output file 3 (chowhan_3.dat) :

1X.....................1234567890123456112T0000100003XT1
1Z01............
P100001..........
Q1.............
R1...........
S1.....
P100002..........
Q1.............
R1...........
S1.....
P100003..........
Q1.............
R1...........
S1.....
4C.........

ask.chowhan · January 9, 2012, 5:09pm

Thank you