Replace sql with dynamic values

Hi Guys,

I am using a function to replace the values dynamically to frame sql query by reading a file. My file will have column names like

file.txt
col_1
col_2

expected output:

select id,col_1,col_2 from (
select a.id,
a.col_1,
rank() over (ORDER BY cast(a.col_1 AS double) DESC) AS col_1_rank,
a.col_2,
rank() over (ORDER BY cast(a.col_2 AS double) DESC) AS col_2_rank
from table_name a )t
join 
( 
SELECT count(case when col_1='.' then null else 1 end) AS col_1_count,
count(case when col_2='.' then null else 1 end) AS col_2_count				
FROM   table_name
) f
where t.col_1_rank < 0.0001*f.col_1_count or t.col_2_rank < 0.0001*f.col_2_count;

where ever i am using col_1 and col_2 that should be read from the file and get replaced , there is a possibility that more columns will be added in file.txt but the query structure remains the same

sql_query {
while read -r col do
printf 'select id,%s,%s from (
select a.id,
a.%s,
rank() over (ORDER BY cast(a.%s AS double) DESC) AS %s_rank,
a.%s,
rank() over (ORDER BY cast(a.%s AS double) DESC) AS %s_rank
from table_name a )t
join 
( 
SELECT count(case when %s='.' then null else 1 end) AS %s_count,
count(case when %s='.' then null else 1 end) AS %s_count				
FROM   table_name
) f
where t.%s_rank < 0.0001*f.%s_count or t.%s_rank < 0.0001*f.%s_count;'
done < $1 # this will be file.txt as parameter

}

And the question is?
What if the number of columns is NOT 2, i.e. is 1 or 4? And, isn't there a table_name missing?

Yes the number of columns in the file can increase but the structure of the query remains the same and it gets added for e.g. if col_3 is there in file.txt. Then below will be the output

select id,col_1,col_2,col_3 from (
select a.id,
a.col_1,
a.col_2,
a.col_3,
rank() over (ORDER BY cast(a.col_1 AS double) DESC) AS col_1_rank,
rank() over (ORDER BY cast(a.col_2 AS double) DESC) AS col_2_rank,
rank() over (ORDER BY cast(a.col_3 AS double) DESC) AS col_3_rank,
from table_name a )t
join 
( 
SELECT count(case when col_1='.' then null else 1 end) AS col_1_count,
count(case when col_2='.' then null else 1 end) AS col_2_count,
count(case when col_3='.' then null else 1 end) AS col_3_count				
FROM   table_name
) f
where t.col_1_rank < 0.0001*f.col_1_count or t.col_2_rank < 0.0001*f.col_2_count or t.col_3_rank < 0.0001*f.col_3_count ;

So why don't you enhance your function to handle additional column entries?

Post your attempts to resolve this problem and let us know where you are stuck.

Thanks Yoda,

I am struck in my function where the query formation gets duplicated. So struck over there.

Try

awk '
        {T[NR] = $1
        }

END     {printf "select id"
         for (i=1; i<=NR; i++) printf ",%s", T
         print " from (" ORS "select a.id,"
         for (i=1; i<=NR; i++)  printf "a.%s,%srank() over (ORDER BY cast(a.%s AS double) DESC) AS %s_rank%s%s", T, ORS, T, T, (i<NR)?",":"", ORS
         print "from table_name a )t" ORS "join" ORS "(" ORS "SELECT"
         for (i=1; i<=NR; i++) printf "count(case when %s=%s.%s then null else 1 end) AS %s_count%s%s", T, SQ, SQ, T, (i<NR)?",":"", ORS
         print "FROM table_name" ORS ") f"
         printf "where"
         for (i=1; i<=NR; i++) printf " t.%s_rank < 0.0001*f.%s_count%s", T, T, (i==NR)?";" ORS:" or" 
        }
' SQ="'" file

with file containing any number of columns.

1 Like