Script to perform record format checks

gsjdrr · June 7, 2010, 10:01am

Hi All,

I have a requirement to perform the following checks.

Input file is a "|" delimited file and looks like this.

A|c1|c2|c3|....
B|G1|G2|G3....
C|H1|H2|H3...
A|c4|c5|c6|....
B|G4|G5|G6....
C|H4|H5|H6...

The check is whether every "A" record has corresponding B and C records. The records from an A through the next C are considered a block, so each block should contain exactly one A record, exactly one C record, and at least one B record. There can be any number of blocks in the file.

Thanks in advance.

Scrutinizer · June 7, 2010, 10:56am

Try:

awk -F'|' '{s=s$1} END{print s;gsub(/AB*C/,"",s);if(s=="")print "Syntax OK"; else print "Syntax Error"}' infile

mr_harish80 · June 7, 2010, 11:01am

Can you please explain the gsub part?

durden_tyler · June 7, 2010, 11:03am

One way to do it with Perl:

$
$
$ cat -n f1
     1  A|c1|c2|c3|....
     2  B|G1|G2|G3....
     3  C|H1|H2|H3...
     4  A|c4|c5|c6|....
     5  B|G4|G5|G6....
     6  C|H4|H5|H6...
     7  A|c7|c8|c9|....
     8  C|H7|H8|H9...
     9  A|c0|c1|c2|....
    10  B|G7|G8|G9....
    11  B|G0|G1|G2....
    12  B|G3|G4|G5....
    13  C|H0|H1|H2...
$
$
$ ##
$ perl -ne 'push @x, $_;
          if (/^A\|/) {$in = 1; $lnum=$.}
          elsif ($in and /^B\|/) {$count++}
          elsif ($in and /^C\|/) {
            printf("The block at line %4d is %10s =>\n", $lnum, ($count == 0 ? "invalid" : "valid"));
            foreach (@x) {print "\t",$_}
            $count=0; $in=0; @x=();
          }
         ' f1
The block at line    1 is      valid =>
        A|c1|c2|c3|....
        B|G1|G2|G3....
        C|H1|H2|H3...
The block at line    4 is      valid =>
        A|c4|c5|c6|....
        B|G4|G5|G6....
        C|H4|H5|H6...
The block at line    7 is    invalid =>
        A|c7|c8|c9|....
        C|H7|H8|H9...
The block at line    9 is      valid =>
        A|c0|c1|c2|....
        B|G7|G8|G9....
        B|G0|G1|G2....
        B|G3|G4|G5....
        C|H0|H1|H2...
$
$

tyler_durden

gsjdrr · June 7, 2010, 11:17am

Thanks for your quick response.

I tried your code, but it fails with a syntax error:

awk -F'|' '{s=s$1} END{print s;gsub(/AB*C/,"",s);if(s=="")print "Syntax OK"; else print "Syntax Error"}' samplefile.dat
awk: syntax error near line 1
awk: illegal statement near line 1

---------- Post updated at 10:17 AM ---------- Previous update was at 10:11 AM ----------

Thanks, durden_tyler, for the Perl code. Unfortunately, I am not allowed to use Perl.

Scrutinizer · June 7, 2010, 1:43pm

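Are you on Solaris? In that case use nawk or /usr/xpg4/bin/awk instead of awk.
The script appends the first field of every record to the string s, so a file with the records A, B, C, A, B, C gives ABCABC. The gsub then deletes every instance of AB*C (an A, any number of B's, then a C) from s. If the syntax is correct, the end result should be an empty string.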

A correction is needed, though: there should be at least one B between A and C, so the pattern has to be AB+C rather than AB*C:

awk -F'|' '{s=s$1} END{print s;gsub(/AB+C/,"",s);if(s=="")print "Syntax OK"; else print "Syntax Error"}' infile
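
To see what the gsub leaves behind, here is a small sketch that feeds a few label strings straight into the same substitution. The function name check_labels and the sample strings are made up for illustration, and on Solaris you would again use nawk instead of awk.

```shell
# check_labels is a hypothetical helper: it applies the AB+C substitution
# to a string of record labels and reports whether anything is left over.
check_labels() {
  printf '%s\n' "$1" | awk '{
    s = $0
    gsub(/AB+C/, "", s)          # delete every complete A, B..., C block
    if (s == "") print "Syntax OK"
    else print "Syntax Error (left over: " s ")"
  }'
}

check_labels ABCABBBC   # two complete blocks, nothing is left over
check_labels ABCACABC   # the AC block has no B, so "AC" is left over
check_labels ABCAB      # the last block is never closed, so "AB" is left over
```

Whatever survives the substitution is exactly the part of the file that does not form a complete block, which is why the empty-string test works.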

gsjdrr · June 7, 2010, 3:23pm

Thanks, Scrutinizer, it worked. I have further checks to do on the same file. I thought I could extend your approach and complete the script myself, but it looks like that is not the case.

In each block (ABC), I need to check the record layout (number of fields/delimiters) and check that the block is balanced (the debits and credits of the B records against the C record):

A|c1|c2|c3|
B|G1|G2|G3|1000|line
B|G1|G2|G3|3000|line2
B|G1|G2|G3|-4000|line3
C|4000|-4000

Scrutinizer · June 7, 2010, 4:29pm

Hi, you could extend the awk script, e.g. like so:

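awk -F'|' '{ s=s$1 }                             # collect the record labels
           # Totals on the C record must cancel out (assumed from the sample C|4000|-4000)
           $1=="C"&&$2+$3!=0 {
             print "Balance error at line " NR
           }
           # Field counts assumed from the sample block; a trailing "|" counts as an empty field
           ($1=="A"&&NF!=5)||($1=="B"&&NF!=6)||($1=="C"&&NF!=3) {
             print "Wrong number of fields at line " NR
           }
           END {
             print s; gsub(/AB+C/,"",s)          # delete every complete block
             if(s=="")print "Syntax OK"
             else print "Syntax Error"
           } ' infile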

Or perhaps a shell script is easier to work with and modify to suit your needs, e.g.:

syntax_error()
{
  echo "syntax error in record $rec ending at line $line"
  exit 1
}
line=0 rec=0 check=
while IFS="|" read -r label x; do
  line=$((line+1))
  check="$check$label"                                  # Append label to check variable
  case $label in
    A) rec=$((rec+1));;
    C) case $check in
         A*BC)
           check="${check%BC}"                          # Cut off trailing B and C
           check="${check#A}"                           # Then cut off leading A
           while [ "$check" != "${check%B}" ]; do       # Then cut off rest of trailing B's
             check="${check%B}"
           done
           [ "$check" = "" ] || syntax_error            # Something is wrong if check is not empty
           ;;
         *)    syntax_error                             # Something is wrong if pattern not A*BC
       esac ;;
  esac
done < infile
[ "$check" = "" ] || syntax_error                       # Leftover labels: last block never reached its C
echo syntax is OK
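
For the balance part, if the C record is supposed to carry the debit and credit totals of the B records in its block (that is only my reading of the sample block, so treat it as an assumption), something along these lines might work. The function name check_balance is made up, and the field positions (amount in field 5 of B, totals in fields 2 and 3 of C) are taken from the sample.

```shell
# check_balance is a hypothetical helper; the field positions are assumptions
# read off the sample block (B amount in field 5, C totals in fields 2 and 3).
check_balance() {
  awk -F'|' '
    $1 == "A" { debit = 0; credit = 0 }      # new block: reset the running totals
    $1 == "B" {                              # add the signed amount to the matching total
      if ($5 + 0 >= 0) debit += $5
      else credit += $5
    }
    $1 == "C" {                              # compare the totals with the C record
      if (debit != $2 + 0 || credit != $3 + 0) {
        print "Balance error at line " NR
        bad = 1
      }
    }
    END { if (!bad) print "Balance OK" }
  ' "$1"
}

# Try it on the sample block from the question.
tmp=$(mktemp)
printf '%s\n' 'A|c1|c2|c3|' \
              'B|G1|G2|G3|1000|line' \
              'B|G1|G2|G3|3000|line2' \
              'B|G1|G2|G3|-4000|line3' \
              'C|4000|-4000' > "$tmp"
check_balance "$tmp"
rm -f "$tmp"
```

This only looks at the amounts; it assumes the A/B/C structure has already passed the syntax check, so run that first.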