How to create SQL statement out of data using shell script?

Table TAB1 contains following example data (its a tree sitting in table data format & its driven based CHILD & PARENT column pick the RULE condition to generate the below SQL:

CHILD	PARENT	SS	MID	MNM	VNM	RULE	FLG
1	?	S1	?	?	V1	rule004	I         
2	1	S1	?	?	V1	0	Z         
3	1	S1	?	?	V1	1	Z         
4	2	S1	?	?	V2	rule001	N         
5	4	S1	?	?	V2	-999999999 to 20000 OR Missing	Z         
6	4	S1	?	?	V2	20000.000001 to 999999999	Z         
10	5	S1	?	?	V3	rule002	N         
11	10	S1	?	?	V3	360 to 700 OR Missing	Z         
12	10	S1	?	?	V3	700.000001 to 850	Z         
13	6	S1	?	?	V3	rule002	N         
14	13	S1	?	?	V3	360 to 700 OR Missing	Z         
15	13	S1	?	?	V3	700.000001 to 850	Z         
16	3	S1	60054	M5	?	?	Y         
17	11	S1	60050	M1	?	?	Y         
18	12	S1	60051	M2	?	?	Y         
19	14	S1	60052	M3	?	?	Y         
20	15	S1	60053	M4	?	?	Y         
99999999	99999999	S1	60050	M1	xxxxxxxx	?	D         

the output SQL looks like:

SELECT ROW_KEY_COL,
CASE  
WHEN V1  = 1 THEN 'M5'  
WHEN V1  = 0 AND (V2 BETWEEN -999999999 AND 20000   OR V2 IS NULL ) AND (V3 BETWEEN 360 AND 700   OR V3 IS NULL ) THEN 'M1'  
WHEN V1  = 0 AND (V2 BETWEEN -999999999 AND 20000   OR V2 IS NULL )  AND V3 BETWEEN 700.000001 AND 850 THEN 'M2'  
WHEN V1  = 0 AND V2 BETWEEN 20000.000001 AND 999999999 AND (V3 BETWEEN 360 AND 700   OR V3 IS NULL ) THEN 'M3'  
WHEN V1  = 0 AND V2 BETWEEN 20000.000001 AND 999999999 AND V3 BETWEEN 700.000001 AND 850 THEN 'M4' 
ELSE 'M1' END
FROM TAB1;

(OR)

SELECT ROW_KEY_COL,
CASE  
WHEN V1  = 0 AND (V2 BETWEEN -999999999 AND 20000   OR V2 IS NULL ) AND (V3 BETWEEN 360 AND 700   OR V3 IS NULL ) THEN 'M1'  
WHEN V1  = 0 AND (V2 BETWEEN -999999999 AND 20000   OR V2 IS NULL )  AND V3 BETWEEN 700.000001 AND 850 THEN 'M2'  
WHEN V1  = 0 AND V2 BETWEEN 20000.000001 AND 999999999 AND (V3 BETWEEN 360 AND 700   OR V3 IS NULL ) THEN 'M3'  
WHEN V1  = 0 AND V2 BETWEEN 20000.000001 AND 999999999 AND V3 BETWEEN 700.000001 AND 850 THEN 'M4' 
WHEN V1  = 1 THEN 'M5'  
ELSE 'M1' END
FROM TAB1;

its kind of based on 1st rule it goes to 2nd rule then 3 rule & each rule contains 2 or more rule branch & it goes to until # of sub tree levels & it generate kind of above SQL query..

any help/suggestion will help, thanks

---------- Post updated at 03:00 AM ---------- Previous update was at 02:54 AM ----------

another example data & output SQL looks like below:

CHILD	PARENT	SS	MID	MNM	VNM	RULE	FLG
1	?	S2	?	?	V1	rule002	I         
2	1	S2	?	?	V1	N	Z         
3	1	S2	?	?	V1	Y	Z         
4	2	S2	?	?	V2	rule003	N         
5	4	S2	?	?	V2	'AIRLINE','HOTEL'	Z         
6	4	S2	?	?	V2	CORPORATE	Z         
7	5	S2	12331	M1	?	?	Y         
8	6	S2	12332	M2	?	?	Y         
9	3	S2	12332	M3	?	?	Y         
99999999	99999999	S2	12333	M4	XXXXXXXX	?	D         

output SQL needs following format is:

SELECT ROW_KEY_COL,
CASE  
WHEN V1  IN ('airline','hotel') AND V2  = 'N' THEN 'M1'  
WHEN V1  = 'corporate' AND V2  = 'N' THEN 'M2'  
WHEN V2  = 'Y' THEN 'M3' 
ELSE 'M4' END
FROM TAB1 

How about this using awk, note the inputfile is a tab separated text file matching you posted input.

awk -F'\t' '
BEGIN { printf "SELECT ROW_KEY_COL,\nCASE\n" }
$8 ~ "[ZN]" {
    if($8=="Z") RL[$1]=$7
    VAR[$1]=$6
    NXT[$1]=$2
    next
}
$8 == "D" { printf "ELSE %c%s%c END\n",39,$5,39}
$8=="Y" {
  wexp=sprintf("THEN %c%s%c\n",39,$5,39)
  rule=$2
  ND=" "
  while(rule in NXT) {
     num=split(RL[rule],rf," ")

     # Parse rule text - this needs some enhancement
     if (num==1) {
        if (index(RL[rule], ",")) wexp=sprintf("%s IN (%s)%s",VAR[rule],RL[rule],ND) wexp
        else
           if (RL[rule]+0 > 0 || RL[rule]+0 == RL[rule]) wexp=sprintf("%s = %s%s",VAR[rule],RL[rule],ND) wexp
           else wexp=sprintf("%s = %c%s%c%s",VAR[rule],39,RL[rule],39,ND) wexp
     }
     if (num==3) wexp=sprintf("%s BETWEEN %s AND %s%s",VAR[rule],rf[1],rf[3],ND) wexp
     if (num==5) wexp=sprintf("(%s BETWEEN %s AND %s OR %s IS NULL)%s",VAR[rule],rf[1],rf[3],VAR[rule],ND) wexp
     # parse done

     rule=NXT[rule]
     ND = " AND "
  }
  printf "WHEN %s", wexp
}
END { printf "FROM TAB1;\n" }' infile

Also note: The parsing of rules is pretty basic and only caters for you posted examples. I expect you will need something more complex to replace these if (num ==x) tests

1 Like

Thanks Chubler, you certainly spent much time in this.
I saw some trailing spaces in the given input files ... for robustness you should add a line

{sub("[ \r]+$","")}

just after the line

BEGIN { ... }
1 Like

WOW!!!!!!!!!!
This is awesome code, I was writing this code connecting with database to do with 500+ lines & still I am working on how to handle All Other (if RULE contains 'All Other' then i need to put not in ( value for that tree branch ) scenario.

But this code is tiny & doing super BIG magic init..

Thank you Chubler_XL :b::b:!!!!! & MadeInGermany!!!!
:b::b::b:

Nice Post!!!!!!!!!!

This is grand code, I completely agree with @Chubler_XL

SHELL SCRIPT
-------------

#!/bin/sh
awk -F'\t' '
BEGIN { printf "SELECT ROW_KEY_COL,\nCASE\n" }
{sub("[ \r]+$","")}
$8 ~ "[ZN]" {
    if($8=="Z") {
        RL[$1]=$7
        if ($7 != "All Other")
                { if ( $7 ~ /[0-9]+$/ ) Val = $7;
                  else Val=sprintf("%c%s%c",39,$7,39);
                  if( AOTH[$6 $2] == "") AOTH[$6 $2]= Val;
                  else AOTH[$6 $2]= AOTH[$6 $2] "," Val;}
        }
    VAR[$1]=$6
    NXT[$1]=$2
    #print $6 "==" AOTH[$6] "\n"
    next
}
$8 == "D" { printf "ELSE %c%s%c END\n",39,$5,39}
$8=="Y" {
  wexp=sprintf("THEN %c%s%c\n",39,$5,39)
  rule=$2
  ND=" "
  while(rule in NXT) {
     num=split(RL[rule],rf," ")
     #print RL[rule] " = " num;
     if (num==1) {
        if (index(RL[rule], ",")) wexp=sprintf("%s IN (%s)%s",VAR[rule],RL[rule],ND) wexp
        else
           if (RL[rule]+0 > 0 || RL[rule]+0 == RL[rule]) wexp=sprintf("%s = %s%s",VAR[rule],RL[rule],ND) wexp
           else wexp=sprintf("%s = %c%s%c%s",VAR[rule],39,RL[rule],39,ND) wexp
     }
     if (num==3) wexp=sprintf("%s BETWEEN %s AND %s%s",VAR[rule],rf[1],rf[3],ND) wexp
     if (num==5) wexp=sprintf("(%s BETWEEN %s AND %s OR %s IS NULL)%s",VAR[rule],rf[1],rf[3],VAR[rule],ND) wexp
     if (RL[rule]=="All Other") wexp=sprintf("(%s NOT IN (%s) )%s",VAR[rule],AOTH[VAR[rule] NXT[rule]],ND) wexp
     # parse done

     rule=NXT[rule]
     ND = " AND "
  }
  printf "WHEN %s", wexp
}
END { printf "FROM TAB1;\n" }' infile

INPUT DATA
----------

CHILD 	PARENT 	SS 	MID 	MNM 	VNM 	RULE 	FLG 
1		S1			V1	rule001	I
2	1	S1			V1	high_bureau	Z
3	1	S1			V1	All Other	Z
4	2	S1			V2	rule002	N
5	4	S1			V2	Charge V2	Z
6	4	S1			V2	All Other	Z
7	3	S1			V2	rule002	N
8	7	S1			V2	Charge V2	Z
9	7	S1			V2	All Other	Z
10	5	S1			V3	rule003	N
11	10	S1			V3	1 to 1	Z
12	10	S1			V3	2 to 2	Z
13	10	S1			V3	3 to 3	Z
14	10	S1			V3	4 to 4	Z
15	10	S1			V3	5 to 5	Z
16	10	S1			V3	6 to 6	Z
17	10	S1			V3	7 to 7	Z
18	6	S1			V3	rule003	N
19	18	S1			V3	1 to 1	Z
20	18	S1			V3	2 to 2	Z
21	18	S1			V3	3 to 3	Z
22	18	S1			V3	4 to 4	Z
23	18	S1			V3	5 to 5	Z
24	18	S1			V3	6 to 6	Z
25	18	S1			V3	7 to 7	Z
26	8	S1			V3	rule004	N
27	26	S1			V3	1 to 1	Z
28	26	S1			V3	2 to 2	Z
29	26	S1			V3	3 to 4	Z
30	26	S1			V3	5 to 7	Z
31	9	S1			V3	rule004	N
32	31	S1			V3	1 to 1	Z
33	31	S1			V3	2 to 2	Z
34	31	S1			V3	3 to 4	Z
35	31	S1			V3	5 to 7	Z
36	12	S1	65264	SS519			Y
37	20	S1	65264	SS519			Y
38	28	S1	65264	SS519			Y
39	33	S1	65264	SS519			Y
40	11	S1	65244	SS501			Y
41	13	S1	65245	SS502			Y
42	14	S1	65246	SS503			Y
43	15	S1	65247	SS504			Y
44	16	S1	65248	SS505			Y
45	17	S1	65249	SS506			Y
46	19	S1	65251	SS507			Y
47	21	S1	65252	SS508			Y
48	22	S1	65253	SS509			Y
49	23	S1	65254	SS510			Y
50	24	S1	65255	SS511			Y
51	25	S1	65256	SS512			Y
52	27	S1	65258	SS513			Y
53	29	S1	65259	SS514			Y
54	30	S1	65260	SS515			Y
55	32	S1	65261	SS516			Y
56	34	S1	65262	SS517			Y
57	35	S1	65263	SS518			Y
99999999	99999999	S1	65264	SS519	XXXXXXXX		D

SHELL OUTPUT (shell script is producing below output)
----------------------------------------------------

SELECT ROW_KEY_COL,
CASE 
WHEN V1 = 'high_bureau' AND V3 BETWEEN 1 AND 1 THEN 'SS501'
WHEN V1 = 'high_bureau' AND V3 BETWEEN 2 AND 2 THEN 'SS519'
WHEN V1 = 'high_bureau' AND V3 BETWEEN 3 AND 3 THEN 'SS502'
WHEN V1 = 'high_bureau' AND V3 BETWEEN 4 AND 4 THEN 'SS503'
WHEN V1 = 'high_bureau' AND V3 BETWEEN 5 AND 5 THEN 'SS504'
WHEN V1 = 'high_bureau' AND V3 BETWEEN 6 AND 6 THEN 'SS505'
WHEN V1 = 'high_bureau' AND V3 BETWEEN 7 AND 7 THEN 'SS506'
WHEN V1 = 'high_bureau' AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 1 AND 1 THEN 'SS507'
WHEN V1 = 'high_bureau' AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 2 AND 2 THEN 'SS519'
WHEN V1 = 'high_bureau' AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 3 AND 3 THEN 'SS508'
WHEN V1 = 'high_bureau' AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 4 AND 4 THEN 'SS509'
WHEN V1 = 'high_bureau' AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 5 AND 5 THEN 'SS510'
WHEN V1 = 'high_bureau' AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 6 AND 6 THEN 'SS511'
WHEN V1 = 'high_bureau' AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 7 AND 7 THEN 'SS512'

WHEN (V1 NOT IN ('high_bureau') ) AND V3 BETWEEN 1 AND 1 THEN 'SS513'
WHEN (V1 NOT IN ('high_bureau') ) AND V3 BETWEEN 2 AND 2 THEN 'SS519'
WHEN (V1 NOT IN ('high_bureau') ) AND V3 BETWEEN 3 AND 4 THEN 'SS514'
WHEN (V1 NOT IN ('high_bureau') ) AND V3 BETWEEN 5 AND 7 THEN 'SS515'
WHEN (V1 NOT IN ('high_bureau') ) AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 1 AND 1 THEN 'SS516'
WHEN (V1 NOT IN ('high_bureau') ) AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 2 AND 2 THEN 'SS519'
WHEN (V1 NOT IN ('high_bureau') ) AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 3 AND 4 THEN 'SS517'
WHEN (V1 NOT IN ('high_bureau') ) AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 5 AND 7 THEN 'SS518'
ELSE 'SS519' END
FROM TAB1;

EXPECTED OUPUT
-----------------

SELECT ROW_KEY_COL,
CASE 
WHEN V1 = 'high_bureau' AND V2 = 'Charge V2' AND V3 BETWEEN 1 AND 1 THEN 'SS501'
WHEN V1 = 'high_bureau' AND V2 = 'Charge V2' AND V3 BETWEEN 2 AND 2 THEN 'SS519'
WHEN V1 = 'high_bureau' AND V2 = 'Charge V2' AND V3 BETWEEN 3 AND 3 THEN 'SS502'
WHEN V1 = 'high_bureau' AND V2 = 'Charge V2' AND V3 BETWEEN 4 AND 4 THEN 'SS503'
WHEN V1 = 'high_bureau' AND V2 = 'Charge V2' AND V3 BETWEEN 5 AND 5 THEN 'SS504'
WHEN V1 = 'high_bureau' AND V2 = 'Charge V2' AND V3 BETWEEN 6 AND 6 THEN 'SS505'
WHEN V1 = 'high_bureau' AND V2 = 'Charge V2' AND V3 BETWEEN 7 AND 7 THEN 'SS506'
WHEN V1 = 'high_bureau' AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 1 AND 1 THEN 'SS507'
WHEN V1 = 'high_bureau' AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 2 AND 2 THEN 'SS519'
WHEN V1 = 'high_bureau' AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 3 AND 3 THEN 'SS508'
WHEN V1 = 'high_bureau' AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 4 AND 4 THEN 'SS509'
WHEN V1 = 'high_bureau' AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 5 AND 5 THEN 'SS510'
WHEN V1 = 'high_bureau' AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 6 AND 6 THEN 'SS511'
WHEN V1 = 'high_bureau' AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 7 AND 7 THEN 'SS512'

WHEN (V1 NOT IN ('high_bureau') ) AND V2 = 'Charge V2' AND V3 BETWEEN 1 AND 1 THEN 'SS513'
WHEN (V1 NOT IN ('high_bureau') ) AND V2 = 'Charge V2' AND V3 BETWEEN 2 AND 2 THEN 'SS519'
WHEN (V1 NOT IN ('high_bureau') ) AND V2 = 'Charge V2' AND V3 BETWEEN 3 AND 4 THEN 'SS514'
WHEN (V1 NOT IN ('high_bureau') ) AND V2 = 'Charge V2' AND V3 BETWEEN 5 AND 7 THEN 'SS515'
WHEN (V1 NOT IN ('high_bureau') ) AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 1 AND 1 THEN 'SS516'
WHEN (V1 NOT IN ('high_bureau') ) AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 2 AND 2 THEN 'SS519'
WHEN (V1 NOT IN ('high_bureau') ) AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 3 AND 4 THEN 'SS517'
WHEN (V1 NOT IN ('high_bureau') ) AND (V2 NOT IN ('Charge V2') ) AND V3 BETWEEN 5 AND 7 THEN 'SS518'
ELSE 'SS519' END
FROM TAB1;

The issue is:-

If the RULE column value contains NON-NUMERIC value and if it contains SPACE between word then its causing that issue.. (Charge V2 --> if change to --> Charge_v2 then its working fine)

Please help me to fix.., thanks!

Again this is still a pretty basic rule parser but should cover the supplied examples:

     if (tolower(RL[rule])=="all other") wexp=sprintf("(%s NOT IN (%s) )%s",VAR[rule],AOTH[VAR[rule] NXT[rule]],ND) wexp
     else if (rf[2] == "to" && num==3) wexp=sprintf("%s BETWEEN %s AND %s%s",VAR[rule],rf[1],rf[3],ND) wexp
     else if (tolower(rf[2]" "rf[4]" "rf[5]) == "to or missing") wexp=sprintf("(%s BETWEEN %s AND %s OR %s IS NULL)%s",VAR[rule],rf[1],rf[3],VAR[rule],ND) wexp
     else if (length(RL[rule]) > 0) {
        if (index(RL[rule], ",")) wexp=sprintf("%s IN (%s)%s",VAR[rule],RL[rule],ND) wexp
        else
           if (RL[rule]+0 > 0 || RL[rule]+0 == RL[rule]) wexp=sprintf("%s = %s%s",VAR[rule],RL[rule],ND) wexp
           else wexp=sprintf("%s = %c%s%c%s",VAR[rule],39,RL[rule],39,ND) wexp
     }