The script should validate that exactly one header and one trailer exist. If there are more, it should raise an exception.
The script should also verify that the total number of detail lines equals the count in the trailer record (Record_count).
I would really appreciate it if someone could provide me with the script.
Why don't you break the problem down into several parts?
1) Check if the header exists. Use grep & head
2) Check if the trailer record exists. Use grep & tail
3) Check if there are multiple header or trailer records. Use grep & wc
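The three steps above could be sketched roughly like this. It is only a sketch: it assumes header lines start with HDR and trailer lines start with TRL (the prefixes used elsewhere in this thread), and the function name check_header_trailer is made up for illustration. grep -c is used in place of grep piped into wc -l; the two give the same count.

```shell
# Hypothetical helper (the name is not from the original post).
# Returns 0 only if the file starts with a header, ends with a trailer,
# and contains exactly one of each.
check_header_trailer() {
    file=$1

    # 1) the first line must be a header
    head -n 1 "$file" | grep -q '^HDR' ||
        { echo "missing header: $file" >&2; return 1; }

    # 2) the last line must be a trailer
    tail -n 1 "$file" | grep -q '^TRL' ||
        { echo "missing trailer: $file" >&2; return 1; }

    # 3) there must be exactly one header and one trailer in the whole file
    h=$(grep -c '^HDR' "$file")
    t=$(grep -c '^TRL' "$file")
    if [ "$h" -ne 1 ] || [ "$t" -ne 1 ]; then
        echo "expected 1 header and 1 trailer, found $h and $t: $file" >&2
        return 1
    fi
    return 0
}
```

This covers only the header/trailer checks, not the record count, and each grep re-reads the file. For large inputs a single awk pass is the cheaper route.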
The short awk check verifies both that there is exactly one header and one trailer, and that the record count in the trailer matches the records observed. If either test fails, the exit code is non-zero to indicate an error. If you need more precision, knowing exactly why there is an error, a longer programme can be used:
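awk -F "|" '
    /^HDR/ { h++; next; }    # count header lines
    /^TRL/ { t++; next; }    # count trailer lines
    END {
        ec = 1;              # assume failure until every check passes
        if( h != 1 || t != 1 )
            printf( "header/trailer count error: %d headers; %d trailers\n", h, t ) >"/dev/fd/2";
        else
            # $NF is the last field of the last line read, which must be the trailer
            if( $NF != NR - 2 )
                printf( "bad record count: %d(t) != %d(rec)\n", $NF, NR - 2 ) >"/dev/fd/2";
            else
                ec = 0;
        exit( ec );
    }' input-file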
I am not familiar with what an 'ETL program' is, so I don't know if you can invoke awk or not.
If ETL is anything like any of the standard *NIX shells, you can probably do something like this:
if ! awk -F "|" ' /^HDR/ { h++; next; } /^TRL/ {t++; next} END { exit( (h != 1) || t != 1 || $NF != NR - 2 ); }' "$file_name"
then
echo "file did not pass verification test: $file_name"
exit 1
fi
# put rest of your processing on success here.
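As a variation, here is a slightly more defensive sketch of the same single-pass idea. It is not the script above: it remembers the trailer's count at the moment the trailer line is read and counts detail lines explicitly, so it does not depend on the trailer being the very last line. The wrapper name verify_file and the sample data are made up for illustration, and the layout (pipe-delimited, count in the last field of the TRL line) is assumed from the awk commands in this thread.

```shell
# Hypothetical wrapper; verify_file is not part of the original script.
verify_file() {
    awk -F "|" '
        /^HDR/ { h++; next }
        /^TRL/ { t++; expected = $NF; next }   # remember the count the trailer claims
        { detail++ }                            # anything else is a detail line
        END {
            if (h != 1 || t != 1) {
                printf("header/trailer count error: %d headers; %d trailers\n", h, t) > "/dev/stderr"
                exit 1
            }
            if (expected + 0 != detail + 0) {
                printf("bad record count: %d(t) != %d(rec)\n", expected, detail) > "/dev/stderr"
                exit 1
            }
            exit 0
        }' "$1"
}

# Throwaway sample: one header, three detail lines, trailer claiming 3.
sample=$(mktemp)
printf 'HDR|20240101\nA|1\nB|2\nC|3\nTRL|3\n' > "$sample"
verify_file "$sample" && echo "file passed verification: $sample"
rm -f "$sample"
```

The file is still read only once, so this should scale to large inputs the same way the versions above do.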
@chedlee88-1 -- for small files, reading each three times might not have a noticeable impact, but if the input being verified is large, reading each three times might be so inefficient as not to be practical. It makes more sense to read the file once.
This programme does the same thing: it returns a non-zero exit code on failure and writes the failure reason to standard error.
@agama, thanks for the script, I will test it tomorrow. By the way, my file is indeed quite large, so something efficient is a must. ETL is an integration/middleware tool that transfers data from a source to a target.