Hi,
I'm using awk inside a bash script to validate a data file against a configuration (schema) file.
The validation checks each field's data type, field length, and null values.
Records that fail validation are written to a bad file and valid records to a good file. Up to this point everything works fine.
Now I want to append an error code to each record in the bad file, along with the name of the field that failed.
Below are the configuration file, the data file, the code I tried, the output I get, and the output I expect:
Configuration file (configfile.txt):
id,Integer(3),NOT NULL
name,String(20)
state,String(5),NOT NULL
phone_no,Integer(4)
gender,Char(1)
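Each schema line packs the column name, the type with its maximum width, and an optional NOT NULL flag into comma-separated fields. The script below decodes the type column by turning the parentheses into dashes and splitting on them. A minimal sketch of that step on its own, wrapped in a hypothetical helper `parse_schema_line` added only for illustration, prints the name, type, width, and a 1/0 NOT NULL flag:

```bash
#!/bin/bash
# Hypothetical helper: decode one schema line such as "id,Integer(3),NOT NULL".
# Prints: name type width notnull   (notnull is 1 for NOT NULL, otherwise 0)
parse_schema_line() {
  printf '%s\n' "$1" | awk -F "," '{
    gsub(/[()]/, "-", $2)    # "Integer(3)" -> "Integer-3-"
    split($2, a, "-")        # a[1] = "Integer", a[2] = "3"
    print $1, a[1], a[2], (($3 == "NOT NULL") ? 1 : 0)
  }'
}

parse_schema_line 'id,Integer(3),NOT NULL'   # id Integer 3 1
parse_schema_line 'name,String(20)'          # name String 20 0
```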
Data file (datafile2.txt):
201,John,MI,4589,M
202,Lilly,FL,589,F
20w,Taylor,,5888,M
210,8888,OK,456
215,Madav,,4454,M
2165,ram,MI,4589,M
21734,Leena,,589,F
218,Rohan,CA,2212,M
Script/Code:
#!/bin/bash
awk -F "," -v DT="$(date +%m%d%Y%H%M)" 'BEGIN {
GOOD = "good_" DT; #Adding timestamp into a GOOD file
BAD = "bad_" DT; #Adding timestamp into a BAD file
putB = "hadoop fs -put /home/user/data/" BAD " /user/user/bad/"}
NR == FNR{
gsub("[)(]", "-", $2);
split($2, a, "-");
hh[NR] = $1; d[NR] = a[1]; l[NR] = a[2]; n[NR] = ($3 == "NOT NULL") ? 1 : 0; next}
{
for(i = 1; i <= NF; i++)
{
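# d, l and n are arrays with one entry per config line (column), so index them with i
if(((d[i] == "Integer" && (($i + 0) == $i || $i == "")) || (d[i] == "String" && ($i + 0) != $i) || (d[i] == "Char" && ($i + 0) != $i)) && (length($i) <= l[i]) && (length($i) >= n[i]))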
{f = 1} else {f = 0};
if(f == 0) {print $0 > BAD; b++; next}
}
print $0 > GOOD; g++
}
END {
print "Count of Bad Records : " b;
#system(putB);
}' configfile.txt datafile2.txt
Output I get (contents of the bad file, without error codes):
20w,Taylor,,5888,M
210,8888,OK,456
215,Madav,,4454,M
2165,ram,MI,4589,M
21734,Leena,,589,F
Expected output, with the error type and the name of the failing field appended to each record:
20w,Taylor,,5888,M,datatypeerror|id
210,8888,OK,456,datatypeerror|name
215,Madav,,4454,M,nullerror|state
2165,ram,MI,4589,M,columnwidthError|id
21734,Leena,,589,F,columnwidthError|id,nullerror|state
In the expected result above, each bad record ends with the error type and the name of the field that caused it, and a record with several errors lists all of them, separated by commas. Can this be achieved?
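A minimal sketch of one way to do this, assuming the schema and data formats shown above and the error codes from the expected output. Instead of jumping to `next` at the first failing field, the loop collects every failure for the record into a string of `errortype|fieldname` codes and appends it when the record is written to the bad file. The wrapper function `validate` (arguments: config file, data file, file-name suffix) is hypothetical and exists only so the sketch can run on its own. The type test also swaps the `($i + 0) == $i` comparisons for regular expressions, so it no longer depends on how a particular awk compares numeric strings: an Integer must be all digits (optional leading minus), and a String or Char must not be.

```bash
#!/bin/bash
# Hypothetical wrapper: validate CONFIG DATA SUFFIX
# Writes good_SUFFIX and bad_SUFFIX in the current directory; each bad record
# gets ",errortype|fieldname" appended once per failing check.
validate() {
  awk -F "," -v DT="$3" '
  BEGIN { GOOD = "good_" DT; BAD = "bad_" DT }
  NR == FNR {                              # schema file: one line per column
    gsub(/[()]/, "-", $2); split($2, a, "-")
    name[FNR] = $1; type[FNR] = a[1]; width[FNR] = a[2] + 0
    notnull[FNR] = ($3 == "NOT NULL")
    next
  }
  {                                        # data file: check every field
    err = ""
    for (i = 1; i <= NF; i++) {
      v = $i
      if (type[i] == "Integer") {
        bad_type = (v != "" && v !~ /^-?[0-9]+$/)   # must be all digits
      } else {
        bad_type = (v ~ /^-?[0-9]+$/)               # String/Char must not be a number
      }
      if (bad_type)              err = err ",datatypeerror|" name[i]
      if (length(v) > width[i])  err = err ",columnwidthError|" name[i]
      if (notnull[i] && v == "") err = err ",nullerror|" name[i]
    }
    if (err == "") { print > GOOD }
    else           { print $0 err > BAD; b++ }
  }
  END { print "Count of Bad Records : " b + 0 }
  ' "$1" "$2"
}
```

Called as `validate configfile.txt datafile2.txt "$(date +%m%d%Y%H%M)"`, it should write the same five bad records as above, each with its codes, and print `Count of Bad Records : 5`. One difference from the expected output: `20w,Taylor,,5888,M` also has an empty `state`, so because every check runs, this version reports `datatypeerror|id,nullerror|state` for it. The `hadoop fs -put` step can be run on the resulting bad file afterwards.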
Thanks,
Shree