awk concatenation issue - SQL generation

Greetings Experts,

I have an excel file and I am unable to read it directly into awk (contains , " etc); So, I cleansed and copied the data into notepad.

I need to generate a script that generates the SQL.

Requirement:

  1. Filter and select only the data that has the "mapping" as "direct"
  2. Add COALESCE and NVL (with sample value of 11111) for all columns according to datatype. I will change it appropriately wherever applicable
  3. Whenever the source and target table names are same the queries need to be concatenated with "," so that I validate all relative columns at one pass for each combination of source and target.
  4. I will add the required join conditions based on the source and target table names manually;

From the intended output, t1c4 column query is separated as the source table is different; t2c2 is separated as the target table is different;

Input File:

SRC_TABLE  SRC_COL  SRC_DATATYPE  MAPPING  TGT_TABLE  TGT_COL  TGT_DATATYPE
SRC_TBL1    S1C1        DATE        DIRECT     TGT_TBL1   T1C1  CHAR(6)YYYYMM
SRC_TBL1    S1C2        VARCHAR     DIRECT     TGT_TBL1   T1C2  INTEGER
SRC_TBL1    S1C3        DATE        OTHER      TGT_TBL1   T1C3  INTEGER
SRC_TBL2    S2C4        INTEGER     DIRECT     TGT_TBL1   T1C4  INTEGER
SRC_TBL2    S2C2        INTEGER     DIRECT     TGT_TBL2   T2C2  INTEGER

Intended Output:

SUM(CASE WHEN COALESCE(TO_CHAR(SRC_TBL1.S1C1,'YYYYMMDD'), '111101' = COALESCE(TGT_TBL1.T1C1,'111101') THEN 0 ELSE 1 END) COL_T1C1,
SUM(CASE WHEN COALESCE(SRC_TBL1.S1C2,'11111')=COALESCE(TGT_TBL1.T1C2,'11111') THEN 0 ELSE 1 END) COL_T1C2
SUM(CASE WHEN NVL(SRC_TBL2.S2C4,11111)=NULL(TGT_TBL1.T1C4,11111) THEN 0 ELSE 1 END) COL_T1C4
SUM(CASE WHEN NVL(SRC_TBL2.S2C2,11111)=NVL(TGT_TBL2.T2C2,11111) THEN 0 ELSE 1 END) COL_T2C1

What I have done:

awk -F " " ' {
if((tolower($4) ~ /direct/) && (toupper($3) ~ /DATE/) && (toupper($7) ~ /CHAR\(6/)) {
str="SUM(CASE WHEN COALESCE(TO_CHAR("$1"."$2",\47YYYYMM\47), \47777707\47)=COALESCE("$5"."$6 ",\47777707\47) THEN 0 ELSE 1 END) COL_"$6
a[$1 OFS $5]=(!a[$1 OFS $5])?a[$1 OFS $5]","str:str

if((tolower($4) ~ /direct/) && (toupper($3) ~ /CHAR/)) && (toupper($7) ~ /CHAR\(6/) {
str="SUM(CASE WHEN COALESCE("$1"."$2",\47777707\47)=COALESCE("$5"."$6 ",\47777707\47) THEN 0 ELSE 1 END) COL_"$6
a[$1 OFS $5]=(!a[$1 OFS $5])?a[$1 OFS $5]","str:str
#.......similarly handling other data types

}
}
END {
for (i in a)
{
print "source and target table names are " i
print "--------------"
print "excerpt of the query is" a
print "%%%%%%%%%%%%%%%%%%%%"
}
}' < input_file.txt > output_file.txt

Excuse any syntax issues as I was not able to copy/paste the code;
My output doesn't seem to concatenate when there are 2 or more than eligible records for the same source and target table name; It has only one target_column no matter how many eligible records are there; For eg: the first source and target table names gives this result
My output through code:

SUM(CASE WHEN COALESCE(SRC_TBL1.S1C2,'11111')=COALESCE(TGT_TBL1.T1C2,'11111') THEN 0 ELSE 1 END) COL_T1C2

I am not able to figure it out as I am learning awk recently through the forums posts. Also, I am not able to have single quote (') as part of the query directly using ' (tried with backslash,'',''','''') still not working; after searching, used \47 and it breaks when used with numeric chars as \47777777\47
Can you please tell me how to have ' with out using awk variables at the command line or BEGIN and with using awk variables;
Many thanks for your time..

Hmmm - feeling a bit lost here. Nevertheless, some hints:

  • for the single quote, either use \047 or terminate the string and immediately reopen it: "...\47" "777707\47"
  • as the input consistently seems to be upper case, you might be able to drop the to_lower(...) and to_upper(...) conversions.
  • it would concatenate if it got there, but it can't as the respective branch is chosen max once. There IS just nothing to concatenate.

This may point you in a direction:

awk '
$4 == "DIRECT"  {IX = $1 OFS $5
                 if ($3 $7 ~ /DATE.*CHAR\(6/)
                        str="SUM(CASE WHEN COALESCE(TO_CHAR("$1"."$2",\47""YYYYMM\47), \47""777707\47)=COALESCE("$5"."$6 ",\47""777707\47) THEN 0 ELSE 1 END) COL_"$6

                 if ($3 $7 ~ /CHAR.*INTEGER/)
                        str="SUM(CASE WHEN COALESCE("$1"."$2",\47""777707\47)=COALESCE("$5"."$6 ",\47""777707\47) THEN 0 ELSE 1 END) COL_"$6

                 a[IX] = (a[IX])?a[IX] "," str:str
                }

END     {for (i in a)   {print "source and target table names are " i
                         print "--------------"
                         print "excerpt of the query is " a
                         print "%%%%%%%%%%%%%%%%%%%%"
                        }
        }
' < file
1 Like

Thank you RudiC; that worked

Glad it helped - but it was not meant to work but to indicate how you (c|sh)ould continue your development efforts. Anyhow: welcome!