Conditional Split

Greetings,

I need help splitting files efficiently while meeting the requirements below. I am on AIX.

Split condition

Split the file based on the record type and the position of the data pattern that appears in that record type.

Both the record type and the position of the data pattern should be parameters to the script.

The header and trailer records in the split files may or may not be needed, and I want to control them by passing a parameter to the script.

Split file name should be: InputFileName_SplitValue_CountOfSplitRecords

Example

Input File Name : File123

H00000000000000000000000000000000123567890
D00000000000000000000000000000ABC123567890
D00000000000000000000000000000ABC123567890
D00000000000000000000000000000XYZ123567890
D00000000000000000000000000000XYZ123567890
D00000000000000000000000000000XYZ123567890
D00000000000000000000000000000PQR123567890
D00000000000000000000000000000PQR123567890
D00000000000000000000000000000PQR123567890
D00000000000000000000000000000PQR123567890
T00000000000000000000000000000000123567890

File123_ABC_2

H00000000000000000000000000000000123567890
D00000000000000000000000000000ABC123567890
D00000000000000000000000000000ABC123567890
T00000000000000000000000000000000123567890

File123_XYZ_3

H00000000000000000000000000000000123567890
D00000000000000000000000000000XYZ123567890
D00000000000000000000000000000XYZ123567890
D00000000000000000000000000000XYZ123567890
T00000000000000000000000000000000123567890

File123_PQR_4

H00000000000000000000000000000000123567890
D00000000000000000000000000000PQR123567890
D00000000000000000000000000000PQR123567890
D00000000000000000000000000000PQR123567890
D00000000000000000000000000000PQR123567890
T00000000000000000000000000000000123567890

What would the parameters that you mentioned be?

RudiC

Script should have the below parameters/variables

Input File Name
Record Type
Position
Header
Trailer

OK - I can't see how you call the script. Try

awk '
NR == 1         {HD = $0
                 next
                }
                {SFX = substr ($0, 31, 3)
                 FN  = FILENAME "_" SFX
                 if (SFX == "000") next
                }

!(FN in CNT)    {print HD > FN
                }
                {print > FN
                 CNT[FN]++
                }
END             {for (FN in CNT)        {print > FN
                                         close (FN)
                                         print "mv " FN " " FN "_" CNT[FN]
                                        }
                }
' file | sh

and adapt to your needs.
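Before adapting the position, it can help to check which values actually sit there. A small sketch; the helper name preview_keys and its argument order are made up for illustration and are not part of the script above:

```shell
# preview_keys is a hypothetical helper: list each split value with its record count
# usage: preview_keys FILE POS LEN
preview_keys() {
    awk -v POS="$2" -v LEN="$3" '
    { N[substr($0, POS, LEN)]++ }          # tally every line by the characters at POS
    END { for (K in N) print K, N[K] }     # order of the output lines is not guaranteed
    ' "$1"
}
```

Run as preview_keys File123 31 3. Header and trailer lines are tallied as well, so their value shows up in the list next to the real split values.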

RudiC - Thanks

Can we make the position a variable as well, as in substr ($0, 31, 3)?

I would also greatly appreciate it if you could explain how the script works.

Thanks in Advance

Had you given the relevant info earlier, when asked for it, the script would already have had the necessary constructs. Try

awk -v POS=31 -v LEN=3 '                                # run awk, pass parameters for suffix position and length
NR == 1         {HD = $0                                # save header to be printed in all output files
                 next                                   # no further action needed on this line
                }

                {SFX = substr ($0, POS, LEN)            # extract suffix from POSition, LENgth characters
                 if (SFX == "000") next                 # no further action on trailer line
                 FN  = FILENAME "_" SFX                 # compose output file name
                }

!(FN in CNT)    {print HD > FN                          # print header for new files (FN not yet registered)
                }

                {print > FN                             # print to resp. output file
                 CNT[FN]++                              # count hits for file name AND register it  
                }

END             {for (FN in CNT)        {print > FN     # print trailer record to every single out file
                                         close (FN)     # close file (could be dropped)
                                         print "mv " FN " " FN "_" CNT[FN]
                                                        # output rename commands to stdout for execution by sh
                                        }
                }
' file | sh                                             # pipe stdout to shell for rename operations
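The record type and the header/trailer switches from the original requirements are not covered above. A minimal sketch of how they might be added, assuming the record type is the leading character(s) of each line, the first line is the header and the last line is the trailer. The function name split_by_key, its argument order and the Y/N flags are made up for illustration:

```shell
# Sketch only: split_by_key and its arguments are hypothetical, not from the thread.
# usage: split_by_key FILE RECTYPE POS LEN HEADER(Y/N) TRAILER(Y/N)
split_by_key() {
    awk -v TYP="$2" -v POS="$3" -v LEN="$4" -v HDR="$5" -v TRL="$6" '
    NR == 1 { HD = $0; next }                    # first line is assumed to be the header
            { LAST = $0 }                        # keep the latest line; the final one is taken as trailer
    substr($0, 1, length(TYP)) != TYP { next }   # skip lines that are not the requested record type
            { SFX = substr($0, POS, LEN)         # split value at POSition, LENgth characters
              FN  = FILENAME "_" SFX             # compose output file name
              if (!(FN in CNT) && HDR == "Y") print HD > FN   # header once per new file, if wanted
              print > FN
              CNT[FN]++                          # count records per output file
            }
    END     { for (FN in CNT) {
                  if (TRL == "Y") print LAST > FN             # trailer, if wanted
                  close(FN)
                  printf "mv \"%s\" \"%s_%d\"\n", FN, FN, CNT[FN]   # rename command for sh
              }
            }
    ' "$1" | sh
}
```

Called as split_by_key File123 D 31 3 Y Y, this is meant to produce File123_ABC_2, File123_XYZ_3 and File123_PQR_4 for the sample above; pass N for either flag to leave out the header or the trailer. All output files stay open until END, so an input with very many distinct split values could hit the open-file limit of your awk.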

Thanks RudiC

Have you considered using csplit? It might help and be neater.

Robin

Well, I checked csplit, but my version had neither a size nor a chunk count option - you need one of the two, plus the regex for the empty line, to accomplish what is requested.

I'd misread the requirements. Sorry. :o