Now the check is to see whether every "A" record has corresponding B and C records. The records from an A up to the next C are treated as a block, so each block should contain exactly one A record, exactly one C record and at least one B record. The file can contain any number of blocks.
Are you on Solaris? In that case use nawk or /usr/xpg4/bin/awk instead of awk.
The gsub deletes all instances of AB*C from the string s. If the syntax is correct then the end result should be an empty string...
A correction is needed as there should be at least one B between A and C:
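A minimal sketch of the corrected check, for illustration only: the wrapper name check_blocks is made up here and reads the records from standard input, while the regular expression AB+C is the one used in the longer script further down the thread.

```shell
# Sketch: concatenate the first field of every record, then delete every
# complete block (one A, one or more B, one C) from the resulting string.
# check_blocks is a hypothetical helper name; it reads records from stdin.
check_blocks() {
  awk -F'|' '
    { s = s $1 }                 # build a string of labels, e.g. "ABBCABC"
    END {
      gsub(/AB+C/, "", s)        # remove every well-formed block
      if (s == "") print "Syntax OK"
      else         print "Syntax Error"
    }'
}

printf 'A|x\nB|y\nB|z\nC|w\n' | check_blocks    # well-formed block
printf 'A|x\nC|w\n' | check_blocks              # block without a B record
```

With AB*C the second input would have been accepted, which is why the + is needed.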
Thanks Scrutinizer, it worked. I have further checks to do on the same file. I thought I could extend your approach and complete the script myself, but it looks like that is not the case.
Within each block (A, one or more B, C) I need to check the record layout (number of fields/delimiters) and check that the block is balanced (debits/credits against the C record).
Hi, you could extend the awk script, e.g. like so:
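awk -F'|' '
  { s = s $1 }                        # collect the record labels in s
  $1=="C" && $2!=$3 {                 # C record: fields 2 and 3 must agree
    print "Balance error at line " NR
  }
  ($1=="A" && NF!=4) || ($1=="B" && NF!=6) || ($1=="C" && NF!=3) {
    print "Wrong number of fields at line " NR
  }
  END {
    print s                           # show the label string (debugging aid)
    gsub(/AB+C/, "", s)               # delete every complete block
    if (s == "") print "Syntax OK"
    else print "Syntax Error"
  }
' infile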
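If "balanced" is meant as "the B amounts in a block must add up to the totals on its C record", comparing two fields of the C record is not enough on its own. Here is a rough sketch of that variant. The field positions are pure assumptions, since the thread does not give the layout: debit in field 5 and credit in field 6 of a B record, total debits in field 2 and total credits in field 3 of a C record. The name check_balance is also made up.

```shell
# Sketch under assumed field positions (not confirmed by the thread):
#   B record: field 5 = debit amount, field 6 = credit amount
#   C record: field 2 = total debits, field 3 = total credits
# check_balance is a hypothetical helper name; it reads records from stdin.
check_balance() {
  awk -F'|' '
    $1 == "A" { debit = 0; credit = 0 }        # new block: reset running sums
    $1 == "B" { debit += $5; credit += $6 }    # add up the assumed amount columns
    $1 == "C" && (debit != $2 || credit != $3) {
      print "Block totals do not match C record at line " NR
    }'
}

# Balanced block: prints nothing
printf 'A|1|2|3\nB|1|2|3|10|0\nB|1|2|3|0|10\nC|10|10\n' | check_balance
# Credits sum to 0 but the C record claims 5: reports line 3
printf 'A|1|2|3\nB|1|2|3|10|0\nC|10|5\n' | check_balance
```

Amounts with decimals may need rounding before the comparison, as awk does its arithmetic in floating point.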
Or perhaps a shell script is easier to work with and modify to suit your needs, e.g.:
syntax_error()
{
  echo "syntax error in record $rec ending at line $line"
  exit 1
}

line=0 rec=0 check=
while IFS="|" read -r label x; do
  line=$((line+1))
  check="$check$label"                  # Append label to check variable
  case $label in
    A) rec=$((rec+1)) ;;
    C) case $check in
         A*BC)
           check="${check%BC}"          # Cut off trailing B and C
           check="${check#A}"           # Then cut off leading A
           while [ "$check" != "${check%B}" ]; do   # Then cut off rest of trailing B's
             check="${check%B}"
           done
           [ "$check" = "" ] || syntax_error   # Something is wrong if check is not empty
           ;;
         *) syntax_error ;;             # Something is wrong if pattern not A*BC
       esac ;;
  esac
done < infile
[ "$check" = "" ] || syntax_error       # Leftover labels: the last block never reached its C record
echo syntax is OK