Intelligent Script to Insert Records in Database Table

chetanojha · January 5, 2015, 6:56am

Hello All Members,

I am new to this forum and to the shell scripting. I want to write a shell script to do the following:

Scenario: I have a pipe delimited .txt file with multiple fields in it. The very first row of the file contains the name of the column which resembles the name of the column in the corresponding database table. I am providing the example of the file below:

�Name�|�Dept�|�Empno�|�Salary�|�DOB�|�DOJ�
�Alexander�|�10�|�200�|�$1000�|�25/05/1977�|�01/01/2015�
�Graham�|�11�|�300�|�$1500�|�05/09/1970�|�19/02/2011�
�Bell� |�12�|�400�|�$2000�|�31/01/1965�|�01/10/1999�

Table Structure:

Create table EMPLOYEE (
Name VARCHAR2(255),
Dept  NUMBER,
Empno NUMBER,
Salary NUMBER,
DOB DATE,
DOJ DATE);

Requirement: I want to write a script which can read the first row of the text file and create a dynamic insert statement based on the columns available in the file. The reason for this requirement is because:

I do not know whether same number of columns will arrive in file every time or not. File may contains all the columns or very few of them.
The order of the column is also not fixed. They could be in random order e.g. File can arrive with �Dept�|�Empno�|�DOB�|�DOJ�|�Name�|�Salary�

Currently I am using external table but I need to change the table definition every time files arrive in different column order. I cannot force the format of the file on the supplier of this file. So I have to work with this.

PsuedoCode: With my limited knowledge, I think below would be the pseudo code.

Read the first line of the file.
Read the columns available in the first line.
Create insert statement using the columns available in the first line.
And put the values to the above insert statement from the second row onwards.

I know this will not be a simple script (atleast for me). Hence I am posting this in forum. I would appreciate if any of the senior unix member can help me out with this.

Thanks.

RudiC · January 5, 2015, 7:14am

Did you consider the threads proposed at the bottom of this page?

Some questions reg. your requirement:

How does the script know the table name?
Is it always the same?
Will the table be deleted somewhere else?
How does the script know the data type of the columns?
Will the date format always be the same?

chetanojha · January 5, 2015, 7:28am

Dear RudiC,

Please see my reply below in blue. I hope I am making sense.

rudic:

Did you consider the threads proposed at the bottom of this page?
Yes, I have seen the links below. Most of them proposing to use the sqlldr which is something I cannot use in this case. I am currently using External tables (via OWB). But there is a restriction for that too. When file arrives with random order of the column name, the implicit sqlldr of the external table starts throwing error.

Some questions reg. your requirement:

How does the script know the table name?
Yes, I can arrange for script to know the name of the table based on the filename. I can pass name of the table as input parameter to script which can them embedded into the dynamic sql created by the script.

Is it always the same?
Yes, table name will always be the same. And so will be the column name. Ofcourse tablename and columns will be different from source file to source file but I think if we can achieve the basic functionality - that will not matter anymore (?)

Will the table be deleted somewhere else?
The nature and source of the data is such that I do not have any visibility of it. I do not know from where it is coming or how the files is created. Otherwise , I would have asked source side to provide file in a fixed format which can be uploaded into the database table (via sqldr or external table) straight way.

How does the script know the data type of the columns?
Script doesn't need to know the datatype of the column. I can arrange all columns to be of VARCHAR2 type. So that records are loaded irrespective of the data. This can further interpreted or casted into their respective datatypes.

Will the date format always be the same?
Yes. Date format will be same for all the records and for all the files.

pravin27 · January 5, 2015, 9:04am

Could this help you ?

#!/bin/sh

_fileName=$1
_tableName=$(basename ${_fileName})
_SQLFile=${_tableName}".sql"

tr "|" "," < ${_fileName} | awk -F"," -v tableNM=$_tableName -v sqlfn=${_SQLFile} 'BEGIN{sqlInsert="INSERT INTO table (columns) values (cValues)"}
                                  { if( NR==1)
                                    {
                                      gsub("\"","",$0)
                                      clName=$0
                                      next
                                    }
                                    gsub("table",tableNM,sqlInsert)
                                    gsub("columns",clName,sqlInsert)
                                    gsub("cValues",$0,sqlInsert)
                                    print  sqlInsert";\n" > sqlfn;
                                  }'



sqlplus -s user/passwrd <<ENDOFSQL
whenever sqlerror exit sql.sqlcode;
@${_SQLFile}
select sysdate from dual;
exit;
ENDOFSQL

RetCode=$?

#Check the return code from SQL Plus
if [ $RetCode != 0 ]
then
    echo "********************"
    echo "ERROR: The SQL Plus Command Failed. RetCode: $RetCode"
else
    echo "********************"
    echo "SQL Plus Successfully Ran. RetCode: $RetCode"
fi

Invocation

/path/toscript/InsertSql.sh /file/path/employee

chetanojha · January 5, 2015, 9:56am

Thanks RudiC,

I will run this in my environment and revert back to you. The table will already exist in the database so I do not need to add table creation/drop part of the code.

I am trying to understand how this will cater to the changing order of the columns (and column names)? I needed this to read the first row of the file to establish the number (and list) of columns arrive in the file. This list will then be added to the into the INSERT statement followed by the values.

pravin27:

Could this help you ?

#!/bin/sh

_fileName=$1
_tableName=$(basename ${_fileName})
_SQLFile=${_tableName}".sql"

tr "|" "," < ${_fileName} | awk -F"," -v tableNM=$_tableName -v sqlfn=${_SQLFile} 'BEGIN{sqlInsert="INSERT INTO table (columns) values (cValues)"}
   { if( NR==1)
   {
   gsub("\"","",$0)
   clName=$0
   next
   }
   gsub("table",tableNM,sqlInsert)
   gsub("columns",clName,sqlInsert)
   gsub("cValues",$0,sqlInsert)
   print  sqlInsert";\n" > sqlfn;
   }'



sqlplus -s user/passwrd <<ENDOFSQL
whenever sqlerror exit sql.sqlcode;
@${_SQLFile}
select sysdate from dual;
exit;
ENDOFSQL

RetCode=$?

#Check the return code from SQL Plus
if [ $RetCode != 0 ]
then
   echo "********************"
   echo "ERROR: The SQL Plus Command Failed. RetCode: $RetCode"
else
   echo "********************"
   echo "SQL Plus Successfully Ran. RetCode: $RetCode"
fi

Invocation

/path/toscript/InsertSql.sh /file/path/employee

---------- Post updated at 09:56 AM ---------- Previous update was at 09:54 AM ----------

I think i have lost the second post of yours which used awk command. I put that code in action but was expecting output as below:

insert into EMPLOYEE (Name, Deptno,Empno, Salary, DOB, DOJ)  values("Alexander","10","200","$1000","25/05/1977","01/01/2015"); 
insert into EMPLOYEE (Name, Deptno,Empno, Salary, DOB, DOJ) values("Graham","11","300","$1500","05/09/1970","19/02/2011"); 
insert into EMPLOYEE (Name, Deptno,Empno, Salary, DOB, DOJ) values("Bell" ,"12","400","$2000","31/01/1965","01/10/1999");

RudiC · January 5, 2015, 10:23am

For that, try

awk     '       {gsub (/\|/,",")
                }
         NR==1  {gsub (/"/,"")
                 TMP = "insert into EMPLOYEE (" $0 ") values ("
                 next
                }
                {print TMP, $0, ");"
                }
        ' FS="\|" file
insert into EMPLOYEE (Name,Dept,Empno,Salary,DOB,DOJ) values ( "Alexander","10","200","$1000","25/05/1977","01/01/2015" );
insert into EMPLOYEE (Name,Dept,Empno,Salary,DOB,DOJ) values ( "Graham","11","300","$1500","05/09/1970","19/02/2011" );
insert into EMPLOYEE (Name,Dept,Empno,Salary,DOB,DOJ) values ( "Bell" ,"12","400","$2000","31/01/1965","01/10/1999" );

To allow for your word processor's non-ASCII double quotes, you may want to add gsub (/�|�/, "\"") upfront.

I had to remove my former post as I missed the requirement by far. pravin27's suggestion will work for you after a little polishing.

chetanojha · January 5, 2015, 12:19pm

Hi Pravin,

I ran the script in my environment. The output I am getting from it is shown below.

1.I ran the script and it is populating correct insert statement in the file. But also giving below error after execution.

./InsertSQL.sh : line 30: [:0:unary operator expected

2.Besides, I am also getting error that it is not able to find the employee.sql file.
But when I looked into the directory, I can see a file {_SQLFile}

I think the script wants to create an employee.sql file and then run this file against sqlplus command which will eventually fire the insert statements created in the file.

INSERT INTO TABLE (NAME,DEPT,EMPNO,SALARY,DOB,DOJ) values ("Alexander","10","1000","10000","25/05/1977","01/01/2015");

---------- Post updated at 12:19 PM ---------- Previous update was at 11:58 AM ----------

I ran Pravin's code. After correcting many typos, which I introduced, I manage to run the code. And getting below two errors:

INSERT INTO EMPLOYEE (NAME, DEPT,EMPNO,SALARY,DOB,DOJ) VALUES ("Alexander","10","1000","10000","25/05/1977","01/01/2015");
ORA-00984: Column not allowed here.

I ran the command in sqlplus and realised the values part of the INSERT statement is carrying double quotes around VARCHAR columns. These should be single quotes. So Ideally the Insert Statement should look like below. I am amazed with help from your guys, I am getting there very fast. I think this is one of the last few hurdles and the script will be ready to roll.

INSERT INTO EMPLOYEE (NAME, DEPT,EMPNO,SALARY,DOB,DOJ) VALUES ('Alexander','10','1000','10000','25/05/1977','01/01/2015');
ORA-00984: Column not allowed here.

RudiC · January 5, 2015, 12:31pm

Try gsub (/�|�/, "\'") in the first line of the awk part.

chetanojha · January 5, 2015, 12:40pm

You mean as shown below:

tr "|" "," < ${_fileName} | awk -F"," -v tableNM=$_tableName -v sqlfn=${_SQLFile} 'BEGIN{sqlInsert="INSERT INTO table (columns) values (cValues)"}                                   { if( NR==1)                                     { 
                                      gsub (/"|"/,"\'")
                                      gsub("\"","",$0)                                       clName=$0                                       next                                     }                                     gsub("table",tableNM,sqlInsert)                                     gsub("columns",clName,sqlInsert)                                     gsub("cValues",$0,sqlInsert)                                     print  sqlInsert";\n" > sqlfn;                                   }'

OR

awk     '      {gsub (/"|"/,"\'") 
               }
               {gsub (/\|/,",")                 }          NR==1  {gsub (/"/,"")                  TMP = "insert into EMPLOYEE (" $0 ") values ("                  next                 }                 {print TMP, $0, ");"                 }         ' FS="\|" file

RudiC · January 5, 2015, 12:54pm

Sorry, can't read your post. Did you try the Go advanced and there Preview Post ?

However, put it into the general action before the NR==1 pattern, like

awk     '       {gsub (/\|/,",")
                 gsub (/�|�/, "\"")
                }
         NR==1  {gsub (/\'/,"")
                 TMP = "insert into EMPLOYEE (" $0 ") values ("
                 next
                }
                {print TMP, $0, ");"
                }
        ' FS="\|" file

, but then take care to remove the "\'" in the NR==1 action! And, make sure you know what you're working on: these � � three byte chars, or the ASCII 0x22 double quotes

Akshay_Hegde · January 5, 2015, 1:19pm

This might help you

akshay@Aix:/tmp$ cat infile
�Name�|�Dept�|�Empno�|�Salary�|�DOB�|�DOJ�
�Alexander�|�10�|�200�|�$1000�|�25/05/1977�|�01/01/2015�
�Graham�|�11�|�300�|�$1500�|�05/09/1970�|�19/02/2011�
�Bell� |�12�|�400�|�$2000�|�31/01/1965�|�01/10/1999�

akshay@Aix:/tmp$ cat import.awk
function clean(x){ gsub(/[��\047\042]/,x); gsub(/\|/,",") }
FNR==1{ clean(); h = $0 ; next}
{
	clean("\047");
	print "insert into",table,"("h")","values","("$0");"
}

akshay@Aix:/tmp$ awk -vtable="Employee" -f import.awk infile >test.sql

akshay@Aix:/tmp$ cat test.sql 
insert into Employee (Name,Dept,Empno,Salary,DOB,DOJ) values ('Alexander','10','200','$1000','25/05/1977','01/01/2015');
insert into Employee (Name,Dept,Empno,Salary,DOB,DOJ) values ('Graham','11','300','$1500','05/09/1970','19/02/2011');
insert into Employee (Name,Dept,Empno,Salary,DOB,DOJ) values ('Bell' ,'12','400','$2000','31/01/1965','01/10/1999');

---------- Post updated at 12:49 AM ---------- Previous update was at 12:38 AM ----------

OR try this

akshay@Aix:/tmp$ cat import_2.awk
function clean(x){ gsub(/[��\047\042]/,x); gsub(/\|/,",") }
FNR==1{ clean("`"); print "INSERT INTO","`"table"`","("$0")","VALUES"; next}
{
	if(s)print s",";
	clean("\047");
	s = "("$0")"
}
END{
	print s";"
}

akshay@Aix:/tmp$ awk -vtable="Employee" -f import_2.awk infile
INSERT INTO `Employee` (`Name`,`Dept`,`Empno`,`Salary`,`DOB`,`DOJ`) VALUES
('Alexander','10','200','$1000','25/05/1977','01/01/2015'),
('Graham','11','300','$1500','05/09/1970','19/02/2011'),
('Bell' ,'12','400','$2000','31/01/1965','01/10/1999');

RudiC · January 5, 2015, 1:22pm

This is what I get from above:

insert into Employee (Name,Dept,Empno,Salary,DOB,DOJ) values ('''Bell''' ,'''12''','''400''','''$2000''','''31/01/1965''','''01/10/1999''');

Don't forget, those �� are three byte chars (at least in my locale)!

pravin27 · January 6, 2015, 1:18am

Modified code

#!/bin/sh

_fileName=$1
_tableName=$(basename ${_fileName})
_SQLFile=${_tableName}".sql"

tr "|" "," < ${_fileName} | awk -F"," -v tableNM=$_tableName -v sqlfn=${_SQLFile} 'BEGIN{sqlInsert="INSERT INTO table (columns) values (cValues)"}
                                  { if( NR==1)
                                    {
                                      gsub("\"","",$0)
                                      clName=$0
                                      next
                                    }
                                    gsub("table",tableNM,sqlInsert)
                                    gsub("columns",clName,sqlInsert)
                                    gsub("\"","'\''",$0)
                                    gsub("cValues",$0,sqlInsert)
                                    print  sqlInsert";\n" > sqlfn;
                                  }'



sqlplus -s $user/$passwrd <<ENDOFSQL
whenever sqlerror exit sql.sqlcode;
@${_SQLFile}
select sysdate from dual;
exit;
ENDOFSQL

RetCode=$?

Check the return code from SQL Plus
if [ $RetCode != 0 ]
then
    echo "********************"
    echo "ERROR: The SQL Plus Command Failed. RetCode: $RetCode"
else
    echo "********************"
    echo "SQL Plus Successfully Ran. RetCode: $RetCode"
fi

chetanojha · January 6, 2015, 11:25am

Hi Akshay,

The first code works perfect for me. But I need these into one script. so that I can execute the test.sql from the single script only. So that once I fire the script, it will create the test.sql (or any other intermediate file), fire it against the database, and insert the records in the table.

One more questions, what do I need to change in case of ~ separated fields?

Thanks

---------- Post updated at 11:25 AM ---------- Previous update was at 07:09 AM ----------

Chaps. this must be my last question on this topic now.

How can I also include tilda (~) along with the pipe separators?

Some files have tilda and some have pipe.

Akshay_Hegde · January 6, 2015, 11:31am

Ok use this

function clean(x){ gsub(/[��\047\042]/,x); gsub(FS,",") }
FNR==1{ clean(); h = $0 ; next}
{
	clean("\047");
	print "insert into",table,"("h")","values","("$0");"
}

akshay@Aix:/tmp$ awk -vFS="~" -vtable="Employee" -f import.awk infile >test.sql

chetanojha · January 6, 2015, 12:10pm

This doesn't work. As while executing, it create INSERT statement with PIPE in it instead of Comma as shown below:

INSERT INTO EMPLOYEE (NAME|DEPT|EMPNO|SALARY|DOB|DOJ).......

akshay hegde:

Ok use this

function clean(x){ gsub(/[��\047\042]/,x); gsub(FS,",") }
FNR==1{ clean(); h = $0 ; next}
{
   clean("\047");
   print "insert into",table,"("h")","values","("$0");"
}

akshay@Aix:/tmp$ awk -vFS="~" -vtable="Employee" -f import.awk infile >test.sql

Akshay_Hegde · January 6, 2015, 12:25pm

or try this

if sep is ~

awk -vFS="\\\~" -vtable="Employee" -f import.awk infile

else if sep is |

awk -vFS="\\\|" -vtable="Employee" -f import.awk infile

RudiC · January 6, 2015, 2:56pm

Try

awk     '       {gsub (/[|~]/,",")
                 gsub (/�|�/, "\047")
                }
         NR==1  {gsub (/\047/,"")
                 TMP = "insert into EMPLOYEE (" $0 ") values ("
                 next
                }
                {print TMP, $0, ");"
                }
        '  file

chetanojha · January 7, 2015, 7:02am

Guys,

Thanks a lot for your prompt and detailed help. I am not able to put together script which can help me to dynamically create the insert script with different separators. And this will also insert data (using the insert statements created with your help) into the database table and cleanup the intermediate files.

Once again thanks a ton for your help especially RudyC, Akshay and Pravin. You guys really rock!! Keep the good work going.

Thanks.