extract DDL - output every match to separate file

Hi,
i want to extract the 'CREATE INDEX' or 'CREATE UNIQUE INDEX' statements from a ddl file and output each match to a separate file. i was looking around the net but couldn�t find anything.

a possible sed-script could be:

sed -n '/CREATE[A-Za-z ]*INDEX*/,/COMMIT/p' filename.ddl

but i couldn�t find out how to output each match to a separate file. every CREATE-statement should go to its own file, while the filename is the name of the index.

any idea? is sed the right choice for this?

thank you

can you provide some sample statements?

--ahamed

Looking for a program to extract SQL statements is like looking for a programming language to print the letter 'A'. You're probably not going to find someone who had the exact same problem as you and solved it the exact same way...

nawk or gawk would be better.

$ echo "CREATE INDEX" | gawk '/CREATE([\t ]+UNIQUE)?[\t ]+INDEX/ { print $0 > NR ".txt" ; close(NR ".txt"); }'
$ cat 1.txt
CREATE INDEX
$

example:

-- TABLE001
CREATE UNIQUE INDEX ATABL001
    ON TABLE001
     (COLUMN1 ASC,
      COLUMN2 ASC,
      COLUMN3 ASC,
      COLUMN4 ASC);
COMMIT;

CREATE INDEX BTABL001
    ON TABLE001
     (COLUMN3 ASC,
      COLUMN4 ASC);
COMMIT;

CREATE INDEX CTABL001
    ON TABLE001
     (COLUMN3 ASC);
COMMIT;

-- TABLE002
CREATE UNIQUE INDEX ATABL002
    ON TABLE002
     (COLUMN1 ASC,
      COLUMN2 ASC,);
COMMIT;
...

every single create index statement should go in its separate file e.g. index ATABL001 to ATABL001.ddl, BTABL001 to BTABL001.ddl, ... i mean the whole statement to the semicolon or the commit...

awk '/^--/{next}/CREATE UNIQUE INDEX|CREATE INDEX/{file=$NF}{print>file".ddl"}' input.ddl

Assuming this file has only index create statements!

--ahamed

Then:

gawk 'BEGIN { FILE=1 }
/CREATE([\t ]+UNIQUE)?[\t ]+INDEX/ { if(W) close(FILE ".txt");  FILE++; W=0 }
{ print $0 > FILE ".txt" ; W=1 }' < infile

this one worked for me! great! thank you!

couldn�t try the other one because i currntly don�t have gawk installed.

$
$ # list all files in current directory
$ ls
ddl_stmts.sql
$
$ # display the contents of "ddl_stmts.sql"
$ cat -n ddl_stmts.sql
     1  -- TABLE001
     2  CREATE UNIQUE INDEX ATABL001
     3      ON TABLE001
     4       (COLUMN1 ASC,
     5        COLUMN2 ASC,
     6        COLUMN3 ASC,
     7        COLUMN4 ASC);
     8  COMMIT;
     9
    10  CREATE INDEX BTABL001
    11      ON TABLE001
    12       (COLUMN3 ASC,
    13        COLUMN4 ASC);
    14  COMMIT;
    15
    16  CREATE INDEX CTAB001
    17      ON TABLE001
    18       (COLUMN3 ASC);
    19  COMMIT;
    20
    21  -- TABLE002
    22  CREATE UNIQUE INDEX ATABL002
    23      ON TABLE002
    24       (COLUMN1 ASC,
    25        COLUMN2 ASC,);
    26  COMMIT;
    27
$
$ # run the Perl one-liner
$
$ ##
$ perl -lne 'if (/^create.*?index (.*?)$/i) {open (FH, ">", "$1.sql"); print FH $_; $in=1}
             elsif (/;\s*$/ && $in) {print FH $_; close (FH); $in=0}
             elsif ($in) {print FH $_}
            ' ddl_stmts.sql
$
$
$ # list all files in current directory again
$ ls -1
ATABL001.sql
ATABL002.sql
BTABL001.sql
CTAB001.sql
ddl_stmts.sql
$
$ # display the contents of the newly created SQL files
$
$ cat -n ATABL001.sql
     1  CREATE UNIQUE INDEX ATABL001
     2      ON TABLE001
     3       (COLUMN1 ASC,
     4        COLUMN2 ASC,
     5        COLUMN3 ASC,
     6        COLUMN4 ASC);
$
$ cat -n ATABL002.sql
     1  CREATE UNIQUE INDEX ATABL002
     2      ON TABLE002
     3       (COLUMN1 ASC,
     4        COLUMN2 ASC,);
$
$ cat -n BTABL001.sql
     1  CREATE INDEX BTABL001
     2      ON TABLE001
     3       (COLUMN3 ASC,
     4        COLUMN4 ASC);
$
$ cat -n CTAB001.sql
     1  CREATE INDEX CTAB001
     2      ON TABLE001
     3       (COLUMN3 ASC);
$
$

tyler_durden

Did you try nawk like I suggested?

The difference is that gawk/nawk have close. Open too many files at once in awk and you can hit its limit -- and that can be pretty small on some systems!

I guess, awk also has close(). I never thought about that.

--ahamed

Not all awk, especially not on "true" UNIXes where shell commands usually don't have any non-POSIX options or features. They'd rather add a 'nawk' than risk breaking vanilla awk's backwards compatibility in someone's 30-year-old shell script. In Solaris, nawk is available under the silly path of /usr/xpg4/bin/awk...

Linux/GNU usually has nothing but gawk. awk is just a symlink. Things like this can occasionally make testing UNIX scripts on Linux difficult -- features that seem so obvious they should be everywhere turn out to be missing...

there are other statements in the file too e.g. CREATE TABLE. how would the above script look like?