Creating a file from an existing file using CUT: is it the best option?

Dear All,

I have a requirement in which I have to load a file placed in an FTP location into my database. The process I'll follow is as below:
1) Get the files using FTP.
2) Create the desired load files, as I have to load only 19 of the 104 fields available in the file.
The fields I require can only be distinguished by their position, e.g. columns 1-3 are one value, column 5 is another, 16-33 is another, and so on for all 19 fields.

I'm using the cut command to create a new file with "|" as the delimiter, so while cutting the fields I'm appending the delimiter.
3) The file created above will then be loaded into the database using the SQLLDR utility, with pipe "|" as the delimiter.

My concern is step 2): even for a file of 10,000 lines (each line 3,159 characters long), it takes around 20 minutes :mad:. And I have to process more than 7,500,000 records per day.

Is there any other method to achieve what I want in a much shorter time :wall:?

I'm new to Unix, so please point out if you see any area for improvement. :slight_smile:

Unix box is: SunOS 5.8

Thanks!!

Give this a try:

while read LINE
do
  SQL=
  # character ranges of the 19 required fields
  for i in 1-3 5 18-33 48-63 64-65 66-83 84-115 116-147 259-260 261-292 303-306 307-310 326-327 328-333 334 2126-2127 2206-2219 2220-2225 2226-2237
  do
    # cut each range out of the line and append the "|" delimiter
    SQL=${SQL}$(echo "$LINE" | cut -c $i)"|"
  done
  echo "$SQL" >> test_load_file.txt
done < XYZ.cdr

First, various *nix flavors of cut are very slow because they read one byte at a time. Try awk with substr, and make sure you use the C locale.
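For example, a minimal sketch of the idea, showing only the first four ranges from the loop above (extend the print with the rest; nawk is used because the old Solaris awk has record-length limits that long lines like these can hit):

LC_ALL=C nawk '{
  # substr($0, start, length): pull each fixed-width field out of the
  # record and join the pieces with "|" as the delimiter
  print substr($0,1,3) "|" substr($0,5,1) "|" substr($0,18,16) "|" substr($0,48,16)
}' XYZ.cdr > test_load_file.txt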
Second, with sqlldr you don't need to create a temp file to load. You can just mention the columns you need, with positions and lengths, in your control card. My experience tells me it will be very fast.


Upload your input data file; maybe you don't need the cut command at all.

If the columns are separated by spaces or some other delimiter, we can take them directly with:

awk '{print $2, $5}' your_data.file   # list whichever fields you need

Hi All,

Thanks for replying.

@binlib -->
I have tried using awk but it gives a "value too long" error.
And I didn't know there was a way to pick data position-wise in SQLLDR.
Could you please share an example based on the sample feed file I'm attaching? :confused:

@rdcwayx -->
There is no delimiter in the feed file; I'm only aware of the string positions which I require :(.

@anchal_khare -->
Will try out your method :slight_smile:

Also, I am going to try one more method, in which I'm thinking of placing a "|" delimiter at the required positions using sed and then taking out the required fields using nawk, roughly as sketched below. Will keep you posted on the status of the process.
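A rough, untested sketch of that idea for the first two fields only (columns 1-3 and column 5, with column 4 as filler in between):

# insert "|" after columns 3, 4 and 5, then let nawk keep only
# the pieces that are wanted
sed 's/^\(.\{3\}\)\(.\)\(.\)/\1|\2|\3|/' XYZ.cdr | nawk -F'|' '{print $1 "|" $3}'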

Regards.

You can also write a control file for sqlldr which will do just that.

Check the POSITION syntax in sqlldr control files; it should be just what you are looking for.
I would write an example, but I'm at home without access to Oracle; there is plenty online, though, and from memory it looks roughly like the sketch below.
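An untested sketch from memory (the table and column names are placeholders, and the positions are the first three ranges from the original post):

-- SQL*Loader control file: pick fixed-position fields straight
-- from the raw file, no intermediate delimited file needed
LOAD DATA
INFILE 'XYZ.cdr'
APPEND
INTO TABLE my_table
TRAILING NULLCOLS
(
  col1 POSITION(1:3)   CHAR,
  col2 POSITION(5:5)   CHAR,
  col3 POSITION(18:33) CHAR
)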


If awk gives a "value too long" error, try this:

while read line
do
  # e.g. columns 1-3 are one value, column 5 is another, 16-33 is another, and so on
  # (${var:offset:length} offsets are zero-based; this needs bash or ksh93)
  echo ${line:0:3} ${line:4:1} ${line:15:18} ......
done < infile

Dear All,
Thanks for your response.

Got the solution by just mentioning the field positions in the SQLLDR control file.

LOAD DATA
APPEND
INTO TABLE <table name>
TRAILING NULLCOLS
(
col1 POSITION(1:3) CHAR NULLIF col1=BLANKS,
:
:
)
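The load itself is then a single command, something like this (the user, control file and log names here are placeholders):

sqlldr userid=scott/tiger control=load_cdr.ctl data=XYZ.cdr log=load_cdr.log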

And it's very fast: 10,000 records are inserted in < 2 sec. :smiley:

But I still learned a lot from the whole process of trying out various things to get the job done.