I have a requirement where I need to split a file based on the occurrence of a character at a fixed position. Details:
The file will contain more than 1 lakh (100,000) records.
Each line has a fixed length of 987 characters.
At position 28 of each line, either 'C' or 'D' will be present.
I need to split the file after each line that has 'D' at that position, so each 'D' line is the last line of an output file.
Also, the names of the split files should share a common part, something like <Original File Name>_aa, <Original File Name>_ab, <Original File Name>_ac, and so on.
Please find below an example of the file:
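Because the whole approach depends on the flag sitting exactly at position 28 of a 987-character record, it may be worth validating the input before splitting it. A minimal sketch, assuming a POSIX awk; `check_layout` is a hypothetical helper name, not something from this thread:

```shell
# Hypothetical helper: report every line of FILE that is not exactly 987
# characters long or whose 28th character is not 'C' or 'D'.
# Exit status is 1 if any such line is found, 0 otherwise.
check_layout() {
    awk '
    length($0) != 987 {
        printf "line %d: length %d, expected 987\n", NR, length($0)
        bad = 1
    }
    substr($0, 28, 1) != "C" && substr($0, 28, 1) != "D" {
        printf "line %d: column 28 is \"%s\", expected C or D\n", NR, substr($0, 28, 1)
        bad = 1
    }
    END { exit bad }
    ' "$1"
}
```

For example, `check_layout bigfile.txt` prints nothing and exits 0 when every record matches the described layout.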
666617000338 INR C 1800.0
655517000338 INR C 1000.0
644417000338 INR C 1800.0
655517000338 INR C 1500.0
666617000338 INR C 1200.0
699917000338 INR C 1100.0
688817000338 INR C 1500.0
644417000338 INR D 10000.0
655517000338 INR C 1800.0
677717000338 INR C 1800.0
699917000338 INR C 1800.0
622217000338 INR D 3600.0
So the split files should look like this:
First file:
666617000338 INR C 1800.0
655517000338 INR C 1000.0
644417000338 INR C 1800.0
655517000338 INR C 1500.0
666617000338 INR C 1200.0
699917000338 INR C 1100.0
688817000338 INR C 1500.0
644417000338 INR D 10000.0
and the second file should look like this:
655517000338 INR C 1800.0
677717000338 INR C 1800.0
699917000338 INR C 1800.0
622217000338 INR D 3600.0
Hi Neelkanth,
I have added CODE tags to your original post in this thread. The fact that you omitted CODE tags explains why the responders to this thread saw the C and D in column 18 instead of 28. There is no indication in your posting that any line has any trailing spaces (or other data) following the last digit shown on each line. I hope that the video clip included in the infraction notice you received recently will help you understand how to use CODE tags so confusion like we've seen in this thread will not be a problem in future threads that you start.
There may still be a couple of problems here. The standards don't clearly specify the precedence for the command:
print > FILENAME "_" EXT
so it can be evaluated as:
(print > FILENAME) "_" EXT
(as it is on Mac OS X) or as:
print > (FILENAME "_" EXT)
(as I believe it is on some other systems). To be sure you get what was intended, add the parentheses as shown in the last form above.
Since there is no indication of the number of expected output files (other than that it could be inferred to be somewhere between 27 and 676, since the suffix string is two lowercase alphabetic characters), awk can run out of file descriptors if each file is not closed once it is no longer used for output.
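A minimal sketch of the parenthesized form, wrapped in a hypothetical shell function `copy_with_suffix` (not from this thread): it copies a file to `<file>_<EXT>`, with the suffix passed in through `-v`.

```shell
# Hypothetical helper: copy FILE to FILE_EXT with awk output redirection.
# The parentheses make the whole concatenation FILENAME "_" EXT the target
# file name, whichever way a given awk resolves the precedence of ">".
copy_with_suffix() {
    awk -v EXT="$2" '{ print > (FILENAME "_" EXT) }' "$1"
}
```

So `copy_with_suffix data.txt aa` should create `data.txt_aa` on any awk, rather than writing to `data.txt` and discarding `"_" EXT`.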
So for the real data (instead of the tiny sample file), the following might work better:
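Here is a sketch of that kind of script, assuming a POSIX awk, an ASCII-based codeset, and the column-28 'C'/'D' layout from the original post. It is not necessarily the exact script from this reply, and the function name `split_on_d` is hypothetical. Each output name is built once into a variable, which sidesteps the redirection precedence question, and the file is closed as soon as its 'D' line has been written, so at most one output file is open at a time.

```shell
# Hypothetical helper: split FILE into FILE_aa, FILE_ab, ... where each output
# file ends with a line whose 28th character is 'D'.
# Assumes a POSIX awk and an ASCII-based codeset ("a" is code 97).
split_on_d() {
    awk '
    out == "" {
        # Start a new chunk: 0 -> aa, 1 -> ab, ..., 26 -> ba, ..., 675 -> zz.
        if (n >= 676) {
            print "split_on_d: more than 676 output files needed" | "cat 1>&2"
            exit 1
        }
        out = FILENAME "_" sprintf("%c%c", 97 + int(n / 26), 97 + n % 26)
    }
    {
        print > out        # out is already one string, so no precedence issue
    }
    substr($0, 28, 1) == "D" {
        close(out)         # release the descriptor before opening the next file
        out = ""
        n++
    }
    ' "$1"
}
```

Usage: `split_on_d bigfile.txt` writes `bigfile.txt_aa`, `bigfile.txt_ab`, and so on next to the input. If the last record is not a 'D' line, the trailing records still go into a final file.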
As always, if you want to try this on a Solaris/SunOS system, use /usr/xpg4/bin/awk , /usr/xpg6/bin/awk , or nawk instead of /usr/bin/awk or /bin/awk .
Note also that this script won't work as specified on a system that uses EBCDIC or some other non-ASCII based codeset where 97 is not the encoding for "a" or the lowercase alphabetic characters are not all in consecutive numeric sequence.
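One way around that restriction is to look up the letters by position in a literal alphabet string rather than computing them from character codes. The sketch below shows this as a hypothetical `suffix_for` shell function that maps a 0-based chunk number to its two-letter suffix:

```shell
# Hypothetical helper: print the two-letter suffix for 0-based chunk number N
# (0 -> aa, 1 -> ab, 25 -> az, 26 -> ba, ..., 675 -> zz). Letters are taken
# from a literal string, so this does not depend on "a" being code 97 or on
# the lowercase letters being contiguous (they are not in EBCDIC).
suffix_for() {
    awk -v n="$1" 'BEGIN {
        letters = "abcdefghijklmnopqrstuvwxyz"
        print substr(letters, int(n / 26) + 1, 1) substr(letters, n % 26 + 1, 1)
    }'
}
```

The same two `substr()` calls could stand in for a `%c`-based `sprintf()` inside a splitting script.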