Extract a block of text

kiki_riki_miki · March 7, 2013, 5:18am

Hello all,

I am working on a script which should parse a large file called input.txt which contains table definitions, index definitions and comments like these ones:

------------------------------------------------
-- DDL Statements for table "CMWSYS"."CMWD_TEC_SUIVI_TRT"
------------------------------------------------
 

CREATE TABLE "CMWSYS"."CMWD_TEC_SUIVI_TRT"  (
          "ORDER_ID" VARCHAR(20) , 
          "PERIODE_DATE" VARCHAR(50) , 
          "NOM_JOB" VARCHAR(100) , 
          "D_START" TIMESTAMP , 
          "D_END" TIMESTAMP , 
          "T_ELAPSED" TIME , 
          "STATUS" VARCHAR(50) )   
         IN "CMW_DA16DN01" INDEX IN "CMW_IX16DN01" ;

------------------------------------------------
-- DDL Statements for table "CMWSYS"."CMWD_REF_MARCHE_AFFECTATION"
------------------------------------------------
 

CREATE TABLE "CMWSYS"."CMWD_REF_MARCHE_AFFECTATION"  (
          "CD_CSM" SMALLINT NOT NULL , 
          "CD_SSE_3EME_CAR" CHAR(1) NOT NULL , 
          "CD_TYP_PERS" CHAR(3) NOT NULL , 
          "DT_VAL_DEB" DATE , 
          "CLE_SS_COMPRT" SMALLINT NOT NULL , 
          "DT_VAL_FIN" DATE , 
          "TOP_COURANT" CHAR(1) , 
          "TOP_ANNUL" CHAR(1) , 
          "DT_MAJ_ETL" TIMESTAMP )   
         IN "DIM_DA16DN01" INDEX IN "DIM_IX16DN01" ; 

COMMENT ON COLUMN "CMWSYS"."CMWD_REF_ZONE_MONETAIRE"."DT_MAJ_ETL" IS 'Date ETL chargement du flux';

COMMENT ON COLUMN "CMWSYS"."CMWD_REF_ZONE_MONETAIRE"."TOP_ANNUL" IS '1 annul�, 0 sinon';

The script should extract each table definition in one separate file (normal txt file) called dbname_tablename.txt. The "dbname" is the first word in the quotation marks after CREATE TABLE statement and the "tablename" is the string characters included in quotation marks after the '.' sign (for this case CMWSYS_CMWD_TEC_SUIVI_TRT or CMWSYS_CMWD_REF_MARCHE_AFFECTATION). All the others definitions (indexes) and the comments should be ignored..

Do you have any idea how I can manage this task? I still have no result with my humble script I am working at.

Thanks

ctsgnb · March 7, 2013, 5:27am

If you don't provide more clue about the expected format's result, i am afraid people won't be able to help you much.

kiki_riki_miki · March 7, 2013, 6:50am

I edited the description and I guess it should be better now. Thank you, ctsgnb for this observation.

ctsgnb · March 7, 2013, 8:16am

Depending on your Database, some specific tool should exist to perform such kind of task. I think you should consider this approach rather than "reinventing the wheel"

kiki_riki_miki · March 7, 2013, 10:46am

I do not intend to reinvent the wheel, but to find a useful tool to make this conversion. I found some tools, but all of them are not free of use, unless you pay for the license.

Corona688 · March 7, 2013, 11:03am

Could you show a larger sample? Telling where the definitions begin and end can be awkward unless they all begin the same way, with a comment...

You can use awk for this though, I'm pretty sure. It has a "block" mode, where it can read entire sections delimited by blank lines.

kiki_riki_miki · March 7, 2013, 3:11pm

Attached you will find the input.txt file containing all the table definitions.

Corona688 · March 7, 2013, 3:20pm

Okay, so each table is prefixed by a 'Statements for table' comment. This should put them in numbered files.

nawk '/Statements for table/ { ++FILE }; { print > sprintf("file%04d", FILE) }' FILE=0 RS="" ORS="\n\n" FS="\n" inputfile

kiki_riki_miki · March 8, 2013, 4:19am

There is an equivalent command using awk instead of nawk? I am receiving the following error when I launch the script:

tddemo2:~/test/scripts # ./test.ksh input.txt
./test.ksh[84]: nawk: not found [No such file or directory]

I am working on the following version of Linux:

tddemo2:~/test/scripts # uname -a
Linux 2.6.16.54-0.2.12-smp #1 SMP x86_64 GNU/Linux

busyboy · March 8, 2013, 4:54am

Hope this is what you are looking for..

awk '/^CREATE TABLE/,/^[ \t]*IN/ { print }' input.txt

kiki_riki_miki · March 8, 2013, 9:34am

Yes busyboy, you are great - this is what I was searching for. Now all I have to do is to split the output file into small ones - one for each table. Thank you!

Corona688 · March 8, 2013, 10:24am

If you are on Linux, awk is nawk.

If you're on Solaris or AIX or some other kinds of UNIX, 'awk' is 'old awk', kept for compatibility reasons, and 'nawk' is what you really want.

busyboy · March 13, 2013, 4:38am

sorry for late response.

hope this may help.

awk  '/^CREATE TABLE/,/^[ \t]*IN/ { if($0 ~ /^CREATE TABLE/)  { TBDB=$0; gsub(/CREATE TABLE|\"|\(|[ ]/,"",TBDB);   }; print > "TABLE."TBDB".txt" }' input.txt