Verifying Record Length

SoloXX · May 11, 2013, 1:10pm

Hi all,

We are going through a total migration from an AIX-based server framework to Linux-based servers. When I test the *.sh and *.awk scripts in a lower environment, the job abends at the same step every time: verifying the record length of the first row of the source file.

I know this source file is good because I pulled it from the production environment, where it had already been processed successfully. So I'm thinking the problem is in the existing script (a header-checking process) that was pulled from the existing AIX production server. There are two result options for this script: Good or Problem File.

The error always comes back that there is a "Problem with the source file sent". I'm not a scripter as it's not part of my duties, but the task has fallen onto my team. Wondering if the script below is compatible with Linux server syntax?? ANY insight would be greatly appreciated.

BEGIN {
good_count = 0
bad_count = 0
line_count = 0
}
NR == 1 {
rec_count = substr($0, 15, 8)
date_str = substr($0, 32, 10) 
 
print "Date in Header Record : " date_str
}
NR > 1 {
line_count++
 
if (length($0) == 710)
{
print $0 > GoodFile
good_count++
}
else
{
print $0 > BadFile
bad_count++
}
}
END {
if ( line_count == rec_count && bad_count==0 )
{
print "********************************************"
print "Count in header : " rec_count 
print "Count in file received : " line_count 
print " "
print "Number of Good Records : " good_count
print "Number of Bad Records : " bad_count 
}
else
{
print "********************************************"
print " PROBLEM WITH FILE SENT FROM SOURCE " 
print " "
print "Count in header : " rec_count 
print "Count in file received : " line_count 
print " "
print "Number of Good Records : " good_count
print "Number of Bad Records : " bad_count
exit 99
}
}

zozoo · May 11, 2013, 1:18pm

Hi,

If you can post a sample of your input file, it would be helpful for debugging.

SoloXX · May 11, 2013, 1:48pm

Hi zozoo,

I tried pasting the header and 1st row of data but the format didn't show up right, so I'm attaching it. Although the actual 1st record row is displayed on multiple lines in the attachment, it IS on one line in the source file.

I also want to show the logfile for the result of 'verifying record length':

Date in Header Record : 04/22/2013
********************************************
PROBLEM WITH FILE SENT FROM SOURCE
Count in header : 400133
Count in file received : 400133
Number of Good Records : 400133
Number of Bad Records : 0

Don_Cragun · May 11, 2013, 1:51pm

In the future, please post your source code as a single block inside CODE tags instead of tagging each line of your source code!

There isn't anything immediately obvious to me that is wrong with this awk script file.

I'm assuming that you at least realize that this is an awk script file, not a shell script file. Please show us the exact command line that is being used to invoke awk using this script file and show us how the input file is being passed to awk. Is input coming from a tape drive with variable-length records (which the script is then verifying each contain 710 characters, not counting the trailing newline character)?

What Linux system are you using?
What shell are you using on Linux?
What version of AIX are you using on your AIX server?
What shell is being used on your AIX server?
Is the input data file encoded in EBCDIC?

SoloXX · May 11, 2013, 2:50pm

Hi Don,

Yes, sorry...yes I know it's an awk script based on the extension of the script.

==============
The process that invokes the awk script:

Check for source file arrival
Header date verified on incoming file from internal SQL query to database lookup table
Translate unwanted Characters to spaces
Move file from source directory to processing directory

After the above completes, the Verify Record Length AWK script automatically begins.

Verifies total records and checks record lengths
===============

What Linux system are you using? Red Hat Enterprise Linux Server release 5.8 (Tikanga)
What shell are you using on Linux? Korn
What version of AIX are you using on your AIX server? 5.3.0.0
What shell is being used on your AIX server? Korn
Is the input data file encoded in EBCDIC? Do not know.

MadeInGermany · May 11, 2013, 2:52pm

Should be all in one line:

if ( line_count == rec_count && bad_count==0 ) {

Ditto:

if (length($0) == 710) {

SoloXX · May 11, 2013, 3:10pm

Hello,

So you're saying to place that one character symbol on the line you showed in your post?

Forgive me, but I am not involved in any scripting in my job, so I am a bit ignorant. However, after googling to try to figure this out, I am a little more familiar now than I was before.

Question: regarding the "($0)"

if (length($0) == 710)

Is this checking ONLY the length of the FIRST record? Not entirely sure what that variable is representing.

MadeInGermany · May 11, 2013, 3:40pm

Yes, I always put the opening { on the if line.
On the other hand, a correct awk should also handle two lines ... But all the other code looks okay.

$0 is the current line.
NR is the line number.
BTW I also put this in one line:

} else {
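
To make the roles of NR and $0 concrete, here is a minimal sketch on made-up three-line input (the sample text is hypothetical, not taken from the real source file). The NR > 1 pattern skips the header line, and length($0) is evaluated once for every remaining line, not only for the first record.

```shell
# Hypothetical sample: one header line followed by two data lines.
sample=$(printf 'HEADER\nabc\nde\n')

# For every line after the first, print its line number and its length.
demo_output=$(printf '%s\n' "$sample" | awk 'NR > 1 { print NR ": " length($0) }')

printf '%s\n' "$demo_output"
# prints:
# 2: 3
# 3: 2
```

The header (line 1) never reaches the length test, which is the same structure the posted script uses for its 710-character check.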

Don_Cragun · May 11, 2013, 4:11pm

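I repeat (from my 1st posting in this thread):

Please show us the exact command line that is being used to invoke awk using this script file and show us how the input file is being passed to awk.

I understand the English description of the processing that is going on. To help you with your problem, we need to see the exact Korn shell commands that are being used to run your awk script. Saying "After the above completes, the Verify Record Length AWK script automatically begins" doesn't give us any way to evaluate what might be going wrong.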

SoloXX · May 11, 2013, 4:21pm

Sorry Don, but our Batch document doesn't provide us that much detail yet as it is being put together at the moment.

On another note, is there a command I can use in Unix that can tell me the first line's record length if I navigate to the source directory?

I tried wc -c <insert filename> but it gives me the TOTAL character counts.

MadeInGermany · May 11, 2013, 5:23pm

Length of first line:

awk 'NR==1 {print length($0)}'

If nothing else is to be processed, immediate exit is faster

awk '{print length($0); exit}'
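
Both one-liners read standard input unless a file name is given after the program. A small sketch with a throwaway file (created here with mktemp; the contents are made up for illustration) shows the usage. It also shows that if a file happens to have DOS-style line endings, the trailing carriage return is counted by length($0) unless it is removed first.

```shell
# Throwaway sample file with DOS-style (CR LF) line endings; contents are hypothetical.
tmpfile=$(mktemp)
printf 'abcde\r\nfgh\r\n' > "$tmpfile"

# Length of the first line exactly as awk sees it (the CR is part of the line).
len_raw=$(awk 'NR == 1 { print length($0); exit }' "$tmpfile")

# Same check after stripping one trailing carriage return.
len_stripped=$(awk 'NR == 1 { sub(/\r$/, ""); print length($0); exit }' "$tmpfile")

echo "raw=$len_raw stripped=$len_stripped"
# prints: raw=6 stripped=5

rm -f "$tmpfile"
```

Replace "$tmpfile" with the name of the real source file to check its first line.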

---------- Post updated at 04:23 PM ---------- Previous update was at 04:15 PM ----------

Don suspects you are running your script with the wrong interpreter. I can hardly believe this, since you provided a logfile. But to be sure, I agree you should show how you run it.

Don_Cragun · May 12, 2013, 12:06am

Hi SoloXX,
When I last posted on this thread, I hadn't seen the sample input that you posted in message #3 in this thread 1 or 2 minutes before I hit the Submit Reply button. Now that we know that you have no idea what is going on, let me make a few comments based on what you have shown us. First, using CODE tags instead of ICODE tags and reformatting your awk script, it looks like this:

BEGIN { good_count = 0
        bad_count = 0
        line_count = 0
}
NR == 1 {
        rec_count = substr($0, 15, 8)
        date_str = substr($0, 32, 10)

        print "Date in Header Record : " date_str
}
NR > 1 {line_count++
        if (length($0) == 710) {
                print $0 > GoodFile
                good_count++
        } else {
                print $0 > BadFile
                bad_count++
        }
}
END {   if(line_count == rec_count && bad_count==0) {
                print "********************************************"
                print "Count in header : " rec_count
                print "Count in file received : " line_count
                print " "
                print "Number of Good Records : " good_count
                print "Number of Bad Records : " bad_count
        } else {
                print "********************************************"
                print " PROBLEM WITH FILE SENT FROM SOURCE "
                print " "
                print "Count in header : " rec_count
                print "Count in file received : " line_count
                print " "
                print "Number of Good Records : " good_count
                print "Number of Bad Records : " bad_count
                exit 99
        }
}

You keep asking how to find out how many characters are in the first line of your input file, but (as shown by the NR > 1 pattern guarding the length test) this awk script doesn't care what the length is for the first line. It is checking that every line AFTER the first line contains 710 characters, not counting the terminating <newline> character. It assumes that the first line in the file contains at least 41 characters (enough to cover substr($0, 32, 10)), but does not verify that that is true.

If you don't know how this awk script is being run, how do you know it isn't working correctly???

The file you uploaded ( First row of src file.txt ) happens to have 712 characters in the 1st line (counting the trailing <carriage-return> and <newline> characters), but the second line is incomplete. Since the file you uploaded contained 1422 bytes (and only contained one <newline> character as the 712th character in the file), we have no idea how long the 2nd line in the file is. Since awk is only specified to work on text files, the behavior of this script is unspecified when given your uploaded file as input. On OS X, it yields:

Date in Header Record : 04/22/2013
********************************************
 PROBLEM WITH FILE SENT FROM SOURCE 
 
Count in header : 400133  
Count in file received : 1
 
Number of Good Records : 1
Number of Bad Records : 0

but if you add a <carriage-return> and a <newline> to the end of your uploaded file (matching what is at the end of the 1st line in the file), it yields:

Date in Header Record : 04/22/2013
********************************************
 PROBLEM WITH FILE SENT FROM SOURCE 
 
Count in header : 400133  
Count in file received : 1
 
Number of Good Records : 0
Number of Bad Records : 1

As I said, all of this is based on lots of conjecture, but if you won't show us how the script is used, how the input file is massaged before feeding it to the script, and any other pertinent details that have been hidden by not showing us how this awk script is invoked, we can't make any concrete suggestions on what, if anything, might be wrong. Maybe the <carriage-return> characters in your input file are discarded before the input is fed into this awk script??? Maybe the only <carriage-return> in the input file is in the first line (where it will be ignored)???
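
One thing that can be ruled out without knowing how the script is invoked is the header comparison itself. The value returned by substr() is a string, so in a POSIX-style awk the test line_count == rec_count compares "400133" against the 8-character header field, padding included, as strings. The sketch below uses a made-up header layout (count left-justified in columns 15 to 22) and two made-up data lines with CR LF endings; it is an assumption-laden illustration, not a diagnosis, and nothing in this thread establishes how the AIX awk treated the same comparison.

```shell
# Hypothetical sample: header with the record count padded to 8 columns
# starting at column 15, then two data lines. All lines end in CR LF.
sample=$(mktemp)
printf '%-14s%-8s\r\n' 'HDR' '2' > "$sample"
printf 'aaa\r\nbbb\r\n' >> "$sample"

# Comparison as written in the posted script: rec_count stays a padded string.
string_cmp=$(awk '
NR == 1 { rec_count = substr($0, 15, 8) }
NR > 1  { line_count++ }
END     { print (line_count == rec_count ? "match" : "mismatch") }
' "$sample")

# Variant: strip a trailing CR from every line and force rec_count to a number.
numeric_cmp=$(awk '
        { sub(/\r$/, "") }
NR == 1 { rec_count = substr($0, 15, 8) + 0 }
NR > 1  { line_count++ }
END     { print (line_count == rec_count ? "match" : "mismatch") }
' "$sample")

echo "string comparison:  $string_cmp"
echo "numeric comparison: $numeric_cmp"
# prints:
# string comparison:  mismatch
# numeric comparison: match

rm -f "$sample"
```

In the real script the equivalent change would be rec_count = substr($0, 15, 8) + 0 in the NR == 1 block. Whether that is the actual cause here is not confirmed by anything posted so far.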