Greetings Experts,
I have an excel file and I am unable to read it directly into awk (contains , " etc); So, I cleansed and copied the data into notepad.
I need to generate a script that generates the SQL.
Requirement:
- Filter and select only the data that has the "mapping" as "direct"
- Add COALESCE and NVL (with sample value of 11111) for all columns according to datatype. I will change it appropriately wherever applicable
- Whenever the source and target table names are same the queries need to be concatenated with "," so that I validate all relative columns at one pass for each combination of source and target.
- I will add the required join conditions based on the source and target table names manually;
From the intended output, t1c4 column query is separated as the source table is different; t2c2 is separated as the target table is different;
Input File:
SRC_TABLE SRC_COL SRC_DATATYPE MAPPING TGT_TABLE TGT_COL TGT_DATATYPE
SRC_TBL1 S1C1 DATE DIRECT TGT_TBL1 T1C1 CHAR(6)YYYYMM
SRC_TBL1 S1C2 VARCHAR DIRECT TGT_TBL1 T1C2 INTEGER
SRC_TBL1 S1C3 DATE OTHER TGT_TBL1 T1C3 INTEGER
SRC_TBL2 S2C4 INTEGER DIRECT TGT_TBL1 T1C4 INTEGER
SRC_TBL2 S2C2 INTEGER DIRECT TGT_TBL2 T2C2 INTEGER
Intended Output:
SUM(CASE WHEN COALESCE(TO_CHAR(SRC_TBL1.S1C1,'YYYYMMDD'), '111101' = COALESCE(TGT_TBL1.T1C1,'111101') THEN 0 ELSE 1 END) COL_T1C1,
SUM(CASE WHEN COALESCE(SRC_TBL1.S1C2,'11111')=COALESCE(TGT_TBL1.T1C2,'11111') THEN 0 ELSE 1 END) COL_T1C2
SUM(CASE WHEN NVL(SRC_TBL2.S2C4,11111)=NULL(TGT_TBL1.T1C4,11111) THEN 0 ELSE 1 END) COL_T1C4
SUM(CASE WHEN NVL(SRC_TBL2.S2C2,11111)=NVL(TGT_TBL2.T2C2,11111) THEN 0 ELSE 1 END) COL_T2C1
What I have done:
awk -F " " ' {
if((tolower($4) ~ /direct/) && (toupper($3) ~ /DATE/) && (toupper($7) ~ /CHAR\(6/)) {
str="SUM(CASE WHEN COALESCE(TO_CHAR("$1"."$2",\47YYYYMM\47), \47777707\47)=COALESCE("$5"."$6 ",\47777707\47) THEN 0 ELSE 1 END) COL_"$6
a[$1 OFS $5]=(!a[$1 OFS $5])?a[$1 OFS $5]","str:str
if((tolower($4) ~ /direct/) && (toupper($3) ~ /CHAR/)) && (toupper($7) ~ /CHAR\(6/) {
str="SUM(CASE WHEN COALESCE("$1"."$2",\47777707\47)=COALESCE("$5"."$6 ",\47777707\47) THEN 0 ELSE 1 END) COL_"$6
a[$1 OFS $5]=(!a[$1 OFS $5])?a[$1 OFS $5]","str:str
#.......similarly handling other data types
}
}
END {
for (i in a)
{
print "source and target table names are " i
print "--------------"
print "excerpt of the query is" a
print "%%%%%%%%%%%%%%%%%%%%"
}
}' < input_file.txt > output_file.txt
Excuse any syntax issues as I was not able to copy/paste the code;
My output doesn't seem to concatenate when there are 2 or more than eligible records for the same source and target table name; It has only one target_column no matter how many eligible records are there; For eg: the first source and target table names gives this result
My output through code:
SUM(CASE WHEN COALESCE(SRC_TBL1.S1C2,'11111')=COALESCE(TGT_TBL1.T1C2,'11111') THEN 0 ELSE 1 END) COL_T1C2
I am not able to figure it out as I am learning awk recently through the forums posts. Also, I am not able to have single quote (') as part of the query directly using ' (tried with backslash,'',''','''') still not working; after searching, used \47 and it breaks when used with numeric chars as \47777777\47
Can you please tell me how to have ' with out using awk variables at the command line or BEGIN and with using awk variables;
Many thanks for your time..