Create table

Hi

I want create table based on csv file .I have come up with some informations getting columns names and then no idea from that
.Please any help


file.txt

col1|col2|col3|col4|col5
1234|zxxxx|xcvvv|300|null

file.sh

file=$1
c1=`head -1 $file|tr "|" "\n" |cat -n`
echo "$c1"
cnt=`echo "$c1"|nawk -F " " '{print $1}'`
#echo "$cnt"

for i in `echo $c1`
do
echo $i
done

Thanks in advance
MR

If file.txt is your source file, what is the expected output?

Expecting like that

"create table tablename(
col1 varchar2(10),
col2 number(10,2)
...).

Regards,
MR

Hi,

How u will identify the datatypes with width of the coloumns.

Hi
Sorry for not providing complete information

using the below ,will find out max length
 awk -F "|" '{ if (length($83) > max) max = length($83) } END { print max }'    file


cat filename|egrep '[0-9]' ---for numbers

car filename |egrep '[a-z][A-Z]' --for strings

Something like that ?

awk -F'|' '
NR==1 {
   for (c=1; c<=NF; c++) {
      colName[c] = $c;
   }
   colCount = NF;
   next;
}
{
   for (c=1; c<=NF; c++) {
      if ($c !~ /^[0-9]+$/) colIsText[c]++;
      if (length($c) > colLength[c]) colLength[c] = length($c);
   }
}
END {
   print "create table tablename("
   for (c=1; c<=colCount; c++) {
      out = "  " colName[c];
      if (colIsText[c]) 
         out = out " varchar2(" colLength[c] ")";
      else
         out = out " number(" colLength[c] ",2)";
      if (c != colCount) out = out ",";
      print out
   }
   print ");";
}
' inputfile

Input file:

col1|col2|col3|col4|col5
1234|zxxxx|xcvvv|300|null
12|abcdefghij|xyz|1024|comment

Output:

create table tablename(
  col1 number(4,2),
  col2 varchar2(10),
  col3 varchar2(5),
  col4 number(4,2),
  col5 varchar2(7)
);

Three problems remaining :

  • Table name (fixed to tablename in the script)
  • Precision for numbers (fixed to 2 in the script)
  • Columns with null value
    Jean-Pierre.

Hi Aigles,

Thanks a lot for your reply .can you please explain? .Is there any way to check for date format.Regarding the decimal and precesion uisng awk check for numbers with decimal ,if any decimal ,want hard code to decimal(38,4) else only number with the length.

ex:

col1|col2|col3|col4|col5
1234|zxxxx|xcvvv|300.200|10/08/2008 

output

create table tablename(
  col1 number(4),
  col2 varchar2(10),
  col3 varchar2(5),
  col4 number(38,4),
  col5 date
);

Regards,
MR

New version:

awk -F'|' '
NR==1 {
   for ( c=1; c<=NF; c++ ) {
      colName[c] = $c;
   }
   colCount = NF;
   next;
}
{
   for ( c=1; c<=NF; c++ ) {
           if ( $c ~ /^[0-9]+$/        )          colIsNum[c]++
      else if ( $c ~ /^[0-9]*\.[0-9]*$/ )         colIsDec[c]++
      else if ( $c ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ ) colIsDat[c]++
      else                                        colIsTxt[c]++;
      if (length($c) > colLength[c]) colLength[c] = length($c);
   }
}
END {
   print "create table tablename("
   for ( c=1; c<=colCount; c++ ) {
      out = "  " colName[c];
           if ( colIsTxt[c] ) out = out " varchar2(" colLength[c] ")"
      else if ( colIsDat[c] ) out = out " date"
      else if ( colIsDec[c] ) out = out " number(38,4)"
      else if ( colIsNum[c] ) out = out " number(" colLength[c] ")"
      else                    out = out " Error!";
      if ( c != colCount ) out = out ",";
      print out
   }
   print ");";
}
' inputfile

Inputfile :

col1|col2|col3|col4|col5
1234|zxxxx|999|300.2|01/01/1970
12|abcdefghij|xyz|1024|10/08/2010

Output :

create table tablename(
  col1 number(4),
  col2 varchar2(10),
  col3 varchar2(3),
  col4 number(38,4),
  col5 date
);

Jean-Pierre.

Hi Aigles,
Thanks for your reply .Its working for all columns except last column.
For the last column,set data type with column name .
problem with 4th column,its start with decimal places but the o/p showing varchar2(13)

Input file

col1|col2|col3|col4|col5|col6|col7|col8
1234|zxxxx|999|   |01/01/1970|||xxxx
12|abcdefghij|xyz|13.0000000000|10/08/2010||commentsxx|REAL xxxx- xxxx WILL ONLY BE xxxx; renegotiated 12/16/2000, yyy 47106812 ;payment xxx  

Output

create table tablename(
  col1 number(4),
  col2 varchar2(10),
  col3 varchar2(3),
  varchar2(13),
  col5 date,
  col6 varchar2(2),
  col7 varchar2(10),
  varchar2(67) col8
);

Regards,
MR

The 4th Column in row #2 doesn't contain a number (only spaces), so the column is assigned the varchar2 type.

Jean-Pierre.

---------- Post updated at 08:27 ---------- Previous update was at 08:18 ----------

With your last sample datas, i don't have the problem on the last column name :
Input file:

col1|col2|col3|col4|col5|col6|col7|col8
1234|zxxxx|999|   |01/01/1970|||xxxx
12|abcdefghij|xyz|13.0000000000|10/08/2010||commentsxx|REAL xxxx- xxxx WILL ONLY BE xxxx; renegotiated 12/16/2000, yyy 47106812 ;p
ment xxx

Output:

create table tablename(
  col1 number(4),
  col2 varchar2(10),
  col3 varchar2(3),
  col4 varchar2(13),
  col5 date,
  col6 varchar2(),
  col7 varchar2(10),
  col8 varchar2(85)
);

but there is a problem with the 6th column which is empty for all rows.

Have you modified the posted script ?
Our outputs are not the same.

Jean-Pierre.

Hi Aigles,
There is no space in the 4 thc column ,but it still showing string data type.Plese advise

input file

col1|col2|col3|col4|col5|col6|col7|col8
1234|zxxxx|999|32.3000|01/01/1970|||xxxx
12|abcdefghij|xyz|13.0000000000|10/08/2010||commentsxx|REAL xxxx- xxxx WILL ONLY BE xxxx; renegotiated 12/16/2000, yyy 47106812 ;payment xxx

output

create table tablename(
  col1 number(4),
  col2 varchar2(10),
  col3 varchar2(3),
  col4 varchar2(13),
  col5 date,
  col6 varchar2(),
  col7 varchar2(10),
  col8 varchar2(85)
);

Regards,
MR

I ran the script with your datas, and the result seems fine for me :

$ cat mohan.sh
awk -F'|' '
NR==1 {
   for ( c=1; c<=NF; c++ ) {
      colName[c] = $c;
   }
   colCount = NF;
   next;
}
{
   for ( c=1; c<=NF; c++ ) {
           if ( $c ~ /^[0-9]+$/        )          colIsNum[c]++
      else if ( $c ~ /^[0-9]*\.[0-9]*$/ )         colIsDec[c]++
      else if ( $c ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ ) colIsDat[c]++
      else                                        colIsTxt[c]++;
      if (length($c) > colLength[c]) colLength[c] = length($c);
   }
}
END {
   print "create table tablename("
   for ( c=1; c<=colCount; c++ ) {
      out = "  " colName[c];
           if ( colIsTxt[c] ) out = out " varchar2(" colLength[c] ")"
      else if ( colIsDat[c] ) out = out " date"
      else if ( colIsDec[c] ) out = out " number(38,4)"
      else if ( colIsNum[c] ) out = out " number(" colLength[c] ")"
      else                    out = out " Error!";
      if ( c != colCount ) out = out ",";
      print out
   }
   print ");";
}
' mohan.txt
$ cat mohan.txt
col1|col2|col3|col4|col5|col6|col7|col8
1234|zxxxx|999|32.3000|01/01/1970|||xxxx
12|abcdefghij|xyz|13.0000000000|10/08/2010||commentsxx|REAL xxxx- xxxx WILL ONLY BE xxxx; renegotiated 12/16/2000, yyy 47106812 ;payment xxx
$ ./mohan.sh
create table tablename(
  col1 number(4),
  col2 varchar2(10),
  col3 varchar2(3),
  col4 number(38,4),
  col5 date,
  col6 varchar2(),
  col7 varchar2(10),
  col8 varchar2(85)
);
$ 

Jean-Pierre.

Hi Aigless,

Thanks for your great help .Want to check with you ,if the columns contains spaces or null and numbers ,will not assign number data type .The i/p file some of the columns are null and some with date/number values ,its assigned to varchar2 data type.Is it any otherway to check to ignore null or spaces and assign correspoing data type.

Thanks in advance
MR

The following new version of the AWK program ignore null and spaces.

awk -F'|' '
BEGIN {
   colDefaultLength = 1;
}
NR==1 {
   for ( c=1; c<=NF; c++ ) {
      colName[c] = $c;
   }
   colCount = NF;
   next;
}
{
   for ( c=1; c<=NF; c++ ) {
      if (length($c) > colLength[c]) colLength[c] = length($c);
           if ( $c ~ /^ *$/ )                     continue;
           if ( $c ~ /^[0-9]+$/        )          colIsNum[c]++
      else if ( $c ~ /^[0-9]*\.[0-9]*$/ )         colIsDec[c]++
      else if ( $c ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ ) colIsDat[c]++
      else                                        colIsTxt[c]++;
   }
}
END {
   print "create table tablename("
   for ( c=1; c<=colCount; c++ ) {
      if ( colLength[c]+0 == 0 ) colLength[c] = colDefaultLength;
      out = "  " colName[c];
           if ( colIsTxt[c] ) out = out " varchar2(" colLength[c] ")"
      else if ( colIsDat[c] ) out = out " date"
      else if ( colIsDec[c] ) out = out " number(38,4)"
      else if ( colIsNum[c] ) out = out " number(" colLength[c] ")"
      else                    out = out " varchar2(" colLength[c] ")";
      if ( c != colCount ) out = out ",";
      print out
   }
   print ");";
}
' inputfile

Inputfile :

col1|col2|col3|col4|col5|col6|col7|col8
1234|zxxxx|999|32.3000|01/01/1970|||xxxx
12|abcdefghij|xyz|13.0000000000|10/08/2010||commentsxx|REAL xxxx- xxxx WILL ONLY BE xxxx
|not empty|foo|   |05/05/2010||text|

Output:

create table tablename(
  col1 number(4),
  col2 varchar2(10),
  col3 varchar2(3),
  col4 number(38,4),
  col5 date,
  col6 varchar2(1),
  col7 varchar2(10),
  col8 varchar2(33)
);

Jean-Pierre.

Hi Aigless,

Thanks a lot for your great help .Its working fine.

Regards,
MR

Hi Aigles,

Thanks for your help ,script is working fine.I need some additional modifcation on the script

If string data lenghth below 50 ,set string default length to 50 ,if between 50 to lessthan 100,set default data length to 100
If between 100 to less than 200 ,set default length to 500

If between 500 to less than 1000,set default length 1000.

Numbers data length ,no issue ,set to 38 whatvery length comes

If there is spaces or no value at the last column ,script to remove those values .

ex:
col1|col2|col3|col4|
xx|xx1|5566|8999|

Thanks in advance
MR

Try and adapt this new version (not tested)

awk -F'|' '
BEGIN {
   colDefaultLength = 1;
}
NR==1 {
   for ( c=1; c<=NF; c++ ) {
      colName[c] = $c;
   }
   colCount = NF;
   if (!colName[colCount]) colCount--;
   next;
}
{
   for ( c=1; c<=NF; c++ ) {
      if (length($c) > colLength[c]) colLength[c] = length($c);
           if ( $c ~ /^ *$/ )                     continue;
           if ( $c ~ /^[0-9]+$/        )          colIsNum[c]++
      else if ( $c ~ /^[0-9]*\.[0-9]*$/ )         colIsDec[c]++
      else if ( $c ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ ) colIsDat[c]++
      else                                        colIsTxt[c]++;
   }
}
END {
   print "create table tablename("
   for ( c=1; c<=colCount; c++ ) {

      if ( colLength[c]+0 == 0 ) colLength[c] = colDefaultLength;
      var2Len = colLength[c];
           if (var2Len <   50) var2Len = 50
      else if (var2Len <  100) var2Len = 100
      else if (var2Len <  500) var2Len = 500
      else if (var2Len < 1000) var2Len = 1000;

      out = "  " colName[c];
           if ( colIsTxt[c] ) out = out " varchar2(" var2Len ")"
      else if ( colIsDat[c] ) out = out " date"
      else if ( colIsDec[c] ) out = out " number(38,4)"
      else if ( colIsNum[c] ) out = out " number(" colLength[c] ")"
      else                    out = out " varchar2(" var2Len ")";
      if ( c != colCount ) out = out ",";

      print out

   }
   print ");";
}
' inputfile

Jean-Pierre.

Hi Aigles,

Thanks a lot,its working fine.

Regards,
MR

Hi Aigles,

The script workinf fine if last filed is null and data also null(After col4) ,if any column data also null,its giving Error data type .For Instance for 2 column having null values ,when executing the script its giving Error data type.Is there any way to check only last field column name ,if column name exist and then no data set string 5o or if no column name and no data,consider upto col4.Your help is greatly appreciated.

col1|col2|col3|col4|
xxx| |10/08/2003|234506|
yyy| |12/03/2003|345678|

output

 
create table (col1 varchar2(50),
                  col2 Error!,
                  col3 date,
                  col4 number)

Regards,
MR

Are you sure of your script ?
The string 'Error!' doesn't exist in the script.
When I execute the script of my last post with your last sample datas, the result is :

create table tablename(
  col1 varchar2(50),
  col2 varchar2(50),
  col3 date,
  col4 number(6)
);

A new version !

  • Column name and no data -> Varchar2(50)
  • No column name and datas -> column name set by the scriipt = Column<#column>
  • No column name and no data -> column ignored
awk -F'|' '
BEGIN {
   colDefaultLength = 1;
}
NR==1 {
   for ( c=1; c<=NF; c++ ) {
      colName[c] = $c;
   }
   colCount = NF;
   if (!colName[colCount]) colCount--;
   next;
}
{
   for ( c=1; c<=NF; c++ ) {
      if (length($c) > colLength[c]) colLength[c] = length($c);
           if ( $c ~ /^ *$/ )                     continue;
           if ( $c ~ /^[0-9]+$/        )          colIsNum[c]++
      else if ( $c ~ /^[0-9]*\.[0-9]*$/ )         colIsDec[c]++
      else if ( $c ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ ) colIsDat[c]++
      else                                        colIsTxt[c]++;
   }
}
END {
   print "create table tablename("
   for ( c=1; c<=colCount; c++ ) {

      if ( colName[c] == "" ) {
         if (colLength[c]+0 == 0) {
            continue;
         } else {
            colName[c] = "Column" c;
         }
      }


      if ( colLength[c]+0 == 0 ) colLength[c] = colDefaultLength;
      var2Len = colLength[c];
           if (var2Len <   50) var2Len = 50
      else if (var2Len <  100) var2Len = 100
      else if (var2Len <  500) var2Len = 500
      else if (var2Len < 1000) var2Len = 1000;

      out = "  " colName[c];
           if ( colIsTxt[c] ) out = out " varchar2(" var2Len ")"
      else if ( colIsDat[c] ) out = out " date"
      else if ( colIsDec[c] ) out = out " number(38,4)"
      else if ( colIsNum[c] ) out = out " number(" colLength[c] ")"
      else                    out = out " varchar2(" var2Len ")";
      if ( c != colCount ) out = out ",";

      print out

   }
   print ");";
}
' inputfile

Inputfile

MyCol1|MyCol2|||MyCol5|MyCol6|
xxx| |111||10/08/2003|234506|
yyy| |222||12/03/2003|345678|

Result:

create table tablename(
  MyCol1 varchar2(50),
  MyCol2 varchar2(50),
  Column3 number(3),
  MyCol5 date,
  MyCol6 number(6)
);

Jean-Pierre.