Split by Position

I am on AIX.

I need to use AWK to split the source file based on a character at a certain position.

Records with a value of 'R' at position 75 should go in one output file, and the rest should go in another file.

The output files need specific names:

Source FileName : abc_xyz_pqr_a_1_yymmdd
Output FileName1 : abc_reject_pqr_a_1_yymmdd
Output FileName2 : abc_xyz_pqr_a_1_yymmdd (same as the source file name)
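If the reject file name is meant to be derived from the source name rather than hard-coded (the thread never fully settles this), POSIX shell parameter expansion, which ksh supports, can replace the second underscore-separated field with "reject". This is only a sketch: reject_name is a hypothetical helper, and it assumes a bare file name (no directory part) containing at least two underscores.

```shell
# reject_name: print the reject-file name for a source file name by
# replacing its second underscore-separated field with "reject".
# Assumes a bare name like abc_xyz_pqr_a_1_yymmdd (no directory part).
reject_name() {
    src=$1
    first=${src%%_*}    # everything before the first "_"  -> abc
    rest=${src#*_}      # drop the first field             -> xyz_pqr_a_1_...
    rest=${rest#*_}     # drop the second field            -> pqr_a_1_...
    printf '%s_reject_%s\n' "$first" "$rest"
}

reject_name abc_xyz_pqr_a_1_180619    # prints abc_reject_pqr_a_1_180619
```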

Source File

0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
A00000000000000000000000000000000000000000                            X 10R PQR                                         
A00000000000000000000000000000000000000000                            X 10R ZXC                                         
A00000000000000000000000000000000000000000                            X 10P HLI                                         
A00000000000000000000000000000000000000000                            X 10P BLC                                         
A00000000000000000000000000000000000000000                            X 10P PLB                                         
Z00000000000000000100000000000000000000000000000000000000000000000000

Output File1

0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
A00000000000000000000000000000000000000000                            X 10R PQR                                         
A00000000000000000000000000000000000000000                            X 10R ZXC                                         

Output File2

A00000000000000000000000000000000000000000                            X 10P HLI                                         
A00000000000000000000000000000000000000000                            X 10P BLC                                         
A00000000000000000000000000000000000000000                            X 10P PLB                                         
Z00000000000000000100000000000000000000000000000000000000000000000000

This creates two output files; you can use the mv command to rename them. I'm using short names.

awk ' substr($0, 75, 1)=="R" { print $0 > "has_r.txt"; next}
        {print $0} ' infilename  > has_no_r.txt
        # verify that it worked the way you want first,
        # then rename the two files with the mv command
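Before renaming anything, one way to do that check is to compare line counts. This is a sketch assuming the short file names used above; check_split is a hypothetical helper. cut -c75 extracts column 75 of every line (an empty line for records shorter than 75 characters), and the spaces some wc implementations print are stripped before comparing.

```shell
# check_split: verify that the number of lines with "R" in column 75 matches
# the size of the R file, and that no line was lost across the two outputs.
check_split() {
    infile=$1 rfile=$2 otherfile=$3
    expected_r=$(cut -c75 "$infile" | grep -c '^R$')
    got_r=$(wc -l < "$rfile" | tr -d ' ')
    total_in=$(wc -l < "$infile" | tr -d ' ')
    total_out=$(cat "$rfile" "$otherfile" | wc -l | tr -d ' ')
    echo "R lines: expected $expected_r, got $got_r"
    echo "total lines: input $total_in, output $total_out"
    [ "$expected_r" -eq "$got_r" ] && [ "$total_in" -eq "$total_out" ]
}
```

Usage: check_split infilename has_r.txt has_no_r.txt && echo OK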

Edit: code corrected. Thanks for the correction.


@ jim mcnamara

There is a syntax error in the command; can you check whether it is missing a ' ?

Try

awk '
substr($0, 75, 1)=="R"  {print $0 > "has_r.txt"
                         next
                        }
1
   ' infilename > has_no_r.txt
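In this version the lone 1 is a pattern that is always true with no action attached, and awk's default action is to print the record, so every line not consumed by the next in the first rule goes to standard output. A small demonstration, with the file output removed so the effect is visible on the terminal (demo_one_pattern is a hypothetical name):

```shell
# Lines with "R" in column 75 are skipped by the first rule (next stops
# processing that line); the bare 1 then prints every other line.
demo_one_pattern() {
    {
        printf '%-74sR keep-out\n' A1
        printf '%-74sP keep-in\n' A2
        printf 'Z trailer\n'
    } | awk 'substr($0, 75, 1) == "R" { next } 1'
}

demo_one_pattern
```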

What about the header and trailer lines? Your output files seem inconsistent here.


Hello techedipro,

Could you please try the following and let me know if this helps you.

awk '{output_file=tolower(substr($0,75,1))=="r"?"abc_reject_pqr_a_1_yymmdd":"temp_file";print > output_file} ' Input_file && mv temp_file Input_file

Thanks,
R. Singh

Try also

awk '
BEGIN   {OFN[0] = "temp_file"
         OFN[1] = "abc_reject_pqr_a_1_yymmdd"
        }

        {print  > OFN[substr($0,75,1)=="R"]
        }
' file

@ RudiC.

How can I pass the source file as a parameter to the script?

Also, can you please explain the code?

The header and trailer need to be copied over to the file that has "R" at position 75.

Thanks in advance

The "file" represents the input file; replace it with your file name.
We have a boolean expression here that tests the character at position 75. Its result is 0 for FALSE and 1 for TRUE, which in turn is used as the index into the output file name array OFN. Please note that not everyone adores this construct.
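A quick way to see the construct at work: a comparison in awk yields the number 1 or 0, and that number can be used directly as an array subscript. The array contents here are made up purely for illustration, and bool_index_demo is a hypothetical name.

```shell
# A comparison evaluates to 1 (true) or 0 (false), so its result can index
# an array directly, as OFN[substr($0,75,1)=="R"] does.
bool_index_demo() {
    awk 'BEGIN {
        name[0] = "no R at 75"
        name[1] = "R at 75"
        t = ("R" == "R")
        f = ("P" == "R")
        print t, f                  # 1 0
        print name["R" == "R"]      # R at 75
        print name["P" == "R"]      # no R at 75
    }'
}

bool_index_demo
```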
Header and trailer ONLY to the reject file?

@ RudiC

I want to run the AWK command from a script, passing the input file name to the script as a parameter, because each run will have a different input file.

The header and trailer should be in both files.

How can I fit the above requirements into a script?

Thanks in advance

With all of the help you've gotten from us answering questions very similar to this in the last four years, why don't you try to modify the code RudiC suggested to meet your added requirements on your own?

If you can't get it to work, show us what you have tried, show us the output you are getting from your code, and explain what still isn't working.


@Don

Here is the working script I have. Can you please improve it and make it more efficient?

#!/usr/bin/ksh

# Get header Record

grep '^0' Source.txt |head -1 > Output.txt

# Get Trailer Record

grep '^Z' Source.txt |tail -1 > Temp.txt

# Get the matching records

cat Source.txt | awk '{{DCL=substr($0,75,2)} {print DCL,$0}}' > Tempfile1.txt

cat Tempfile1.txt | awk '{{DCLM=substr($0,0,1)} if (DCLM ~/^[D]/){print $0}}' > Tempfile2.txt

cat Tempfile2.txt | cut -c 4- >> Output.txt

# append trailer record

cat Temp.txt >> Output.txt

I'm not sure I can follow. And it doesn't seem that you want or need my awk proposal. Please answer Don Cragun's questions in their entirety.

It may be a small win, but try to let the tools read the file themselves rather than using cat | some-command, so you would have:

:
:
awk '{{DCL=substr($0,75,2)} {print DCL,$0}}' Source.txt > Tempfile1.txt

awk '{{DCLM=substr($0,0,1)} if (DCLM ~/^[D]/){print $0}}' Tempfile1.txt > Tempfile2.txt

cut -c 4- Tempfile2.txt >> Output.txt
:
:

However, you can improve this further by combining the commands if the Tempfile*.txt files are not needed afterwards. Crudely, you could:

:
:
awk '{{DCL=substr($0,75,2)} {print DCL,$0}}' Source.txt     | \
  awk '{{DCLM=substr($0,0,1)} if (DCLM ~/^[D]/){print $0}}' | \
  cut -c 4- >> Output.txt
:
:

This will save the disk IO, but obviously at a memory cost. Perhaps you can find a way to combine this into a single awk statement. Have a go, show us your attempts and we can work on a more elegant and efficient solution.

Kind regards,
Robin
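Robin's suggestion of a single awk statement could look something like the sketch below. It is not a tested drop-in replacement: it assumes, as the grep commands imply, that the header is the first line starting with 0 and comes before the data, and that there is a single trailer line starting with Z. It reads column 75 with substr($0, 75, 1) directly instead of prefixing and cutting. awk string positions start at 1, and a start position of 0, as in substr($0,0,1), is not handled the same way by every awk. single_pass is a hypothetical name.

```shell
# single_pass: one awk run standing in for the grep/head, grep/tail and
# awk | awk | cut steps: first header line (starts with "0"), every line
# with "D" in column 75, then the trailer line (starts with "Z").
single_pass() {
    awk '
    /^0/ && !seen_header     { print; seen_header = 1; next }   # header
    /^Z/                     { trailer = $0; have_trailer = 1; next }
    substr($0, 75, 1) == "D" { print }                          # matching records
    END { if (have_trailer) print trailer }                     # trailer last
    ' "$1"
}
```

Usage: single_pass Source.txt > Output.txt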

rbatte1 has already given you code that makes your code above more efficient. But, despite you saying that your script is working, it does not:

  • take a parameter naming your input file,
  • take input filenames in the format specified in post #1,
  • produce output filenames in the format specified in post #1, or
  • produce output in any of your output files that adheres to the requirements you specified in post #1 or to the requirements you specified in post #7.

Please reconsider what I said in post #10, clearly explain your new requirements (if what you requested in post #1 as modified by post #7 is not what you are now trying to do), and then answer all of the questions I raised there.

@Dan,

The requirements have not changed; the script that I coded was impromptu and may not satisfy them.

I have made a minor update to the code that RudiC gave in post #6 to pass the filename to the script as a parameter; see the code below.

I am still working on the header and trailer part and will update the post once I make further progress.

usr/bin/ksh

File1=$1

awk '
BEGIN   {OFN[0] ="temp_file"
         OFN[1] ="abc_reject_pqr_a_1_yymmdd"
        }

        {print  > OFN[substr($0,75,1)=="R"]
        }
' $1

Hi techedipro,
I'm not sure who @Dan is, but assuming you were talking to me...

I'm glad that you're making some progress. But:

  • I assume the first line of your script was intended to be #!/usr/bin/ksh instead of usr/bin/ksh.
  • Why do you bother setting a variable named File1 if your script never uses that variable after it has been set?
  • I guess I didn't understand your requirements. I thought that you meant that your input file names would follow the pattern abc_xyz_pqr_a_1_YYMMDD, where YYMMDD would be replaced by a two-digit year, a two-digit month, and a two-digit day representing a date (such as abc_xyz_pqr_a_1_180619), and that for input filenames in that format you wanted the reject file name to replace the _xyz_ with _reject_. If the reject output file is ALWAYS going to be named literally abc_reject_pqr_a_1_yymmdd and the non-reject output file is ALWAYS going to be named literally temp_file, why would you want an input file with a name that might be different from abc_xyz_pqr_a_1_yymmdd?

I'm looking forward to seeing how you're going to make sure that the header and trailer lines will be copied to both of your output files. (The header should be really easy. The trailer is slightly more complicated if you want to do it in your awk script instead of using tail as an extra processing step to grab it from your input file.)
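For the trailer, one way to stay inside awk is to hold each line back until the next one has been read: when END runs, the line still being held is the trailer. A minimal sketch, assuming the header is always line 1 and the trailer is always the last line; split_by_pos75 is a hypothetical function, and the two output names are passed in with -v rather than derived here.

```shell
# split_by_pos75 INFILE REJECT_FILE KEEP_FILE
# Header (line 1) and trailer (last line) go to BOTH output files; body lines
# with "R" in column 75 go to the reject file, all other body lines to the
# keep file.
split_by_pos75() {
    awk -v rej="$2" -v keep="$3" '
    NR == 1 { print > rej; print > keep; next }   # header to both files
    NR > 2  { route(prev) }                        # flush the held body line
    NR > 1  { prev = $0 }                          # hold line: may be trailer
    END {
        if (NR > 1) { print prev > rej; print prev > keep }   # trailer to both
    }
    function route(line) {
        if (substr(line, 75, 1) == "R") { print line > rej } else { print line > keep }
    }
    ' "$1"
}
```

In a script, the input file would come from "$1", for example split_by_pos75 "$1" reject_file temp_file && mv temp_file "$1", where reject_file is a placeholder for whatever reject name the script builds.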