Parse through ~21,000 Database DDL statements -- Fastest way to perform search, replace and insert

Hello All:

We are looking to search through 2000 files with around 21,000 statements where we have to search, replace and insert a pattern based on the following:

1) Parse each file and find the CREATE MULTISET TABLE or CREATE SET TABLE statements. They always end with ON COMMIT PRESERVE ROWS;

CREATE MULTISET VOLATILE TABLE vt_test
, NO FALLBACK, NO JOURNAL, NO LOG AS
(
                select col1,max(col2) as col2 
                from vt_test
                where col3 in (1,2,3)
)
WITH DATA 
PRIMARY INDEX (col1) 
ON COMMIT PRESERVE ROWS;

2) Replace WITH DATA with WITH NO DATA. If the statement already says WITH NO DATA, leave it unchanged.

CREATE MULTISET VOLATILE TABLE vt_test
, NO FALLBACK, NO JOURNAL, NO LOG AS
(
                select col1,max(col2) as col2 
                from vt_test
                where col3 in (1,2,3)
)
WITH NO DATA 
PRIMARY INDEX (col1) 
ON COMMIT PRESERVE ROWS;

3) Add an INSERT statement right after it, using the same table name: take the SELECT part from above and end it with a semicolon. The challenge is that the code is not formatted consistently. A statement can be on a single line, and keywords can be in any mix of upper and lower case, so matching has to be case-insensitive. The final code should look like this:

CREATE MULTISET VOLATILE TABLE vt_test
, NO FALLBACK, NO JOURNAL, NO LOG AS
(
                select col1,max(col2) as col2 
                from vt_test
                where col3 in (1,2,3)
)
WITH NO DATA 
PRIMARY INDEX (col1) 
ON COMMIT PRESERVE ROWS;

INSERT INTO vt_test
select col1,max(col2) as col2 
from vt_test
where col3 in (1,2,3);

4) Write the result to a new file in another directory.

I am comfortable doing the pattern replacement with awk, but not with the additional INSERT step. Please see my attempt below and let me know if you can help.

#!/usr/bin/ksh
#|------------------------------------------------------------------|
#|  Split the CREATE TABLE AS into DDL and DML Step
#|------------------------------------------------------------------|

usage ()
{
     echo " Usage: $0 <SRC_DIR> <TGT_DIR>"
}

if [ $# -lt 2 ]; then
        usage
        exit 1
fi

SRC_DIR=$1
TGT_DIR=$2


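# Read every .prc file in SRC_DIR and write a file of the same name to TGT_DIR.
for i in "$SRC_DIR"/*.prc
do
awk 'BEGIN{IGNORECASE=1} {gsub(/WITH DATA/,"WITH NO DATA");print}' "$i" > "$TGT_DIR/${i##*/}"
done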
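
IGNORECASE is a gawk extension; in other awks it is just an ordinary variable and the match stays case-sensitive. A minimal portable sketch, assuming WITH and DATA sit on the same line and are separated only by spaces or tabs (the helper name with_no_data is made up for illustration):

```shell
# with_no_data: hypothetical helper; reads SQL on stdin or from files.
with_no_data() {
    awk '
    {
        line = $0
        out = ""
        # Search a lowercased copy, but cut the pieces from the original line.
        while (match(tolower(line), /with[ \t]+data/)) {
            out = out substr(line, 1, RSTART - 1) "WITH NO DATA"
            line = substr(line, RSTART + RLENGTH)
        }
        print out line
    }' "$@"
}
```

A line that already says WITH NO DATA does not match the pattern, so it passes through unchanged.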

You may want to use this as a starting point:

awk     '       {TMP = $5
                 sub ("WITH DATA", "WITH NO DATA");
                }
         NF>1   {print
                 print "\nINSERT INTO " TMP "\nselect col1,max(col2) as col2\nfrom " TMP "\nwhere col3 in (1,2,3)"
                }
         END    {printf "\n"}
        ' RS=";" ORS=";" file
CREATE MULTISET VOLATILE TABLE vt_test
, NO FALLBACK, NO JOURNAL, NO LOG AS
(
                select col1,max(col2) as col2 
                from vt_test
                where col3 in (1,2,3)
)
WITH NO DATA 
PRIMARY INDEX (col1) 
ON COMMIT PRESERVE ROWS;
INSERT INTO vt_test
select col1,max(col2) as col2
from vt_test
where col3 in (1,2,3);

Thanks Rudi, much appreciated. I tested this. The challenge is that each statement is different, so the SELECT cannot be hard-coded. In addition, this snippet touches other parts of the code.

Before:

BEGIN

   DECLARE v_1 INTEGER;
   DECLARE v_2 BIGINT;
   DECLARE v_3 VARCHAR(16);

CREATE MULTISET VOLATILE TABLE vt_test2
(
      col1 VARCHAR(16) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
      col2 BIGINT NOT NULL,
      col3 BYTEINT NOT NULL,
      col4 BYTEINT NOT NULL
)
PRIMARY INDEX ( col1 ) ON COMMIT PRESERVE ROWS;

After:

BEGIN

   DECLARE v_1 INTEGER;
INSERT INTO Date
select
from Date );
   DECLARE v_2 BIGINT;
INSERT INTO
select
from  );
   DECLARE v_3 VARCHAR(16);
INSERT INTO
select
from  );

CREATE MULTISET VOLATILE TABLE vt_test2
(
      col1 VARCHAR(16) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
      col2 BIGINT NOT NULL,
      col3 BYTEINT NOT NULL,
      col4 BYTEINT NOT NULL
)
PRIMARY INDEX ( col1 ) ON COMMIT PRESERVE ROWS;

INSERT INTO vt_test2
select
from vt_test2 );

The code should not touch these at all...and that is the biggest challenge.
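
One way to check that nothing else gets touched, before rewriting 2000 files, is a dry run that only lists the statements a rewrite would act on. A rough sketch, assuming statements are separated by semicolons and that only CREATE ... TABLE statements with an AS ( query ) and WITH DATA qualify; the name list_ctas is hypothetical:

```shell
# list_ctas: hypothetical dry-run helper; prints one table name for each
# CREATE ... TABLE ... AS ( ... ) WITH DATA statement it finds.
list_ctas() {
    awk '
    BEGIN { RS = ";" }                      # one record per statement
    {
        low = tolower($0)
        # Must be CREATE [MULTISET|SET] [VOLATILE] TABLE <name> ...
        if (!match(low, /create[ \t\n]+((multiset|set)[ \t\n]+)?(volatile[ \t\n]+)?table[ \t\n]+[^ \t\n,(]+/))
            next
        n = split(substr($0, RSTART, RLENGTH), word, /[ \t\n]+/)
        rest = substr(low, RSTART + RLENGTH)
        # ... with an AS ( query that still says WITH DATA.
        if (rest ~ /[ \t\n]as[ \t\n]*\(/ && rest ~ /with[ \t\n]+data/)
            print word[n]                   # last word of the match is the table name
    }' "$@"
}
```

DECLARE lines and CREATE TABLE statements with a plain column list produce no output, so anything listed here is a candidate and everything else should be left alone.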

BEGIN {
	IGNORECASE = 1
}
/^\(/,/^\)/ {	# find the table name and fill the array
	if ( $1 !~ /\(|\)/ ) {
		if ( match($0, /(from [a-z_]+)/) ) {
			table = substr($0, RSTART+5, RLENGTH-5)
		}
		a[$0]
	}
}

{
	if ( sub("WITH DATA", "WITH NO DATA") ) { rtn = 1 }
} 1	# print every line, changed or not

END {
	if ( rtn > 0 ) {
		print "INSERT INTO " table
		for ( i in a ) {
			printf "%s", i	# never use data as the format string
		}
		printf "; \n"
	}
}

Save as job.awk and run in scripts as awk -f job.awk yourinputfile

Hope that helps
Regards
Peasant.

Thanks Peasant. Tried the option. Take for example the following file which has these contents.

   CREATE MULTISET VOLATILE TABLE vt_test2
   , NO FALLBACK, NO JOURNAL, NO LOG AS
   (
      SELECT a.col1                   ,
             a.col2,
             a.col3,
             a.col4,
             a.col5                ,
             1 AS rule
      FROM   table1 a
      INNER JOIN vt_test1 b
             ON     a.col1 = b.col1
      WHERE  a.col1     = 1
      AND    a.col2    =
             (SELECT MAX(c.col3) AS max_col3
             FROM    talble1 c
             WHERE   b.col2=c.col2
             )
   )
   WITH DATA
   PRIMARY INDEX (col1)
   ON COMMIT PRESERVE ROWS;

CREATE MULTISET VOLATILE TABLE vt_test
, NO FALLBACK, NO JOURNAL, NO LOG AS
(
                select col1,max(col2) as col2
                from vt_test
                where col3 in (1,2,3)
)
WITH DATA
PRIMARY INDEX (col1)
ON COMMIT PRESERVE ROWS;

Interestingly, the program changed WITH DATA to WITH NO DATA in the first statement but did not add an INSERT statement for it. For the second statement it added the INSERT, but the lines of the SELECT came out in a different order and on a single line:

   CREATE MULTISET VOLATILE TABLE vt_test2
   , NO FALLBACK, NO JOURNAL, NO LOG AS
   (
      SELECT a.col1                   ,
             a.col2,
             a.col3,
             a.col4,
             a.col5                ,
             1 AS rule
      FROM   table1 a
      INNER JOIN vt_test1 b
             ON     a.col1 = b.col1
      WHERE  a.col1     = 1
      AND    a.col2    =
             (SELECT MAX(c.col3) AS max_col3
             FROM    talble1 c
             WHERE   b.col2=c.col2
             )
   )
   WITH NO DATA
   PRIMARY INDEX (col1)
   ON COMMIT PRESERVE ROWS;

CREATE MULTISET VOLATILE TABLE vt_test
, NO FALLBACK, NO JOURNAL, NO LOG AS
(
                select col1,max(col2) as col2
                from vt_test
                where col3 in (1,2,3)
)
WITH NO DATA
PRIMARY INDEX (col1)
ON COMMIT PRESERVE ROWS;
INSERT INTO vt_test
                where col3 in (1,2,3)                from vt_test                select col1,max(col2) as col2 ;

Why didn't you (and why don't you now) post a meaningful, comprehensive example of your file? No surprise the proposals fail. On the sample you supplied, the proposal works satisfactorily.
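
Both observations fit how job.awk works: /^\(/ only matches a parenthesis in the first column, so the indented first statement is never collected; the END block runs once per file, so at most one INSERT is written; and for (i in a) visits array keys in an unspecified order. Here is a minimal sketch that works one statement at a time instead. It assumes statements are separated by semicolons, that no semicolon occurs inside a string literal or comment, and that the query follows AS in parentheses. The name split_ctas is made up, and passing statements that already say WITH NO DATA through without an INSERT is only one reading of step 2.

```shell
# split_ctas: hypothetical filter; reads SQL on stdin or from files and
# writes the rewritten SQL to stdout.
split_ctas() {
    awk '
    # Return stmt unchanged unless it is CREATE ... TABLE name ... AS ( query )
    # followed by WITH DATA; in that case return the DDL plus a matching INSERT.
    function rewrite(stmt,    low, word, n, p, i, c, depth, query, tail) {
        low = tolower(stmt)
        if (!match(low, /create[ \t\n]+((multiset|set)[ \t\n]+)?(volatile[ \t\n]+)?table[ \t\n]+[^ \t\n,(]+/))
            return stmt
        n = split(substr(stmt, RSTART, RLENGTH), word, /[ \t\n]+/)  # word[n]: table name
        p = RSTART + RLENGTH
        if (!match(substr(low, p), /[ \t\n]as[ \t\n]*\(/))
            return stmt                        # column-list DDL: leave alone
        p = p + RSTART + RLENGTH - 1           # first character after the "("
        depth = 1
        for (i = p; i <= length(stmt) && depth > 0; i++) {
            c = substr(stmt, i, 1)
            if (c == "(") depth++
            else if (c == ")") depth--
        }
        if (depth > 0)
            return stmt                        # unbalanced: leave alone
        query = substr(stmt, p, i - 1 - p)     # text between the outer parentheses
        gsub(/^[ \t\n]+|[ \t\n]+$/, "", query) # trim the ends, keep inner layout
        tail = substr(stmt, i)
        if (!match(tolower(tail), /with[ \t\n]+data/))
            return stmt                        # e.g. already WITH NO DATA
        tail = substr(tail, 1, RSTART - 1) "WITH NO DATA" substr(tail, RSTART + RLENGTH)
        return substr(stmt, 1, i - 1) tail ";\n\nINSERT INTO " word[n] "\n" query
    }
    BEGIN { RS = ";" }                         # one record per statement
    $0 !~ /[^ \t\n]/ { printf "%s", $0; next } # blank record after the last ";"
    { printf "%s;", rewrite($0) }
    ' "$@"
}
```

It could be called per file, for example split_ctas "$SRC_DIR/file.prc" > "$TGT_DIR/file.prc". Counting parentheses rather than matching lines is what lets it cope with nested subqueries and with statements written on a single line; the inner lines of the query keep their original indentation.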