Perl-to-Oracle performance: DBI package vs. 'sqlplus' usage

I'm wondering if anybody has already tried, or knows about, the performance of processing some Oracle stuff from Perl.
I see it could be done with the DBI package (which, I guess, is an interface to OCI, but who knows how efficient it is...) with all the gimmicks around it (define, open, parce, bind, ...), or it can be done by invoking 'sqlplus' and having it piped (by the

open my $pp, '-|', q| sqlplus .... |;

) or by executing the command and capturing its output into an array:

my @return = qx| sqlplus .... |;

Either way, using 'sqlplus' seems easier to code than using the DBI package.
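For concreteness, the two sqlplus variants above can be tried with any command; in this sketch `echo` is a harmless stand-in for the real sqlplus invocation:

```perl
use strict;
use warnings;

# Variant 1: an open pipe; 'echo' stands in for the real sqlplus command.
open my $pp, '-|', 'echo pipe-line' or die "cannot fork: $!";
my @piped = <$pp>;
close $pp;

# Variant 2: qx runs the command to completion and returns all output at once.
my @return = qx| echo qx-line |;

chomp @piped;
chomp @return;
print "$piped[0] / $return[0]\n";
```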

Does anyone have any information about that, or maybe a link to such a review?

Appreciate your opinions!

Dude,

You are concerned about Oracle DB connection performance... and it shows that you lack some basic understanding of it. I'm not a god of Oracle, but I'll try to explain a few things here so that you can see why your approach is wrong.

When you connect to Oracle, you establish a session. That consumes some memory on the server (let's say 0.5 MiB) and takes some time as well; let's say 0.5 s. Everything you do in Oracle is done in a transaction (DDL, DML, ...), and a transaction is ended by any commit (DDL commits implicitly).
Whenever you make a single insert into the DB, it takes some time. Let's say 0.0000001 s.

Now imagine that you are comparing sqlplus and DBI, and in both cases you wish to insert a single row of data. If one of them inserts the row in 0.0000002 s instead of 0.0000001 s, then... which one is faster? Do you remember that establishing the session took 0.5 s?
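The back-of-envelope arithmetic above can be made explicit. A minimal sketch, using the same made-up figures (0.5 s to connect, a 2x difference per insert):

```perl
use strict;
use warnings;

my $connect = 0.5;         # session establishment, seconds (made-up figure)
my $fast    = 0.0000001;   # per-insert time for the "fast" tool (made-up)
my $slow    = 0.0000002;   # per-insert time for the "slow" tool (made-up)
my $rows    = 1000;

my $total_fast = $connect + $rows * $fast;
my $total_slow = $connect + $rows * $slow;

# The connection cost dwarfs the per-row difference: a 2x gap per insert
# changes the total by only 0.0001 s out of roughly half a second.
printf "fast: %.4f s, slow: %.4f s\n", $total_fast, $total_slow;
```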

It is possible that you would like to compare data-insertion performance. In that case I can see the following options:

  1. Inserts using SQL*Loader in direct-path mode (note that additional constraints should be taken into consideration)
  2. Inserts using SQL*Loader in conventional (non-direct) mode, or some other form of bulk loading (JDBC or something else)
  3. Inserts using non-bulk loading (SQL*Loader with a commit after every row, or something like repeating a dumb insert n times from Perl)
  4. Inserts using one session and one transaction for every row inserted into the DB

Performance is best for 1. and worst for 4.
If you wish to load a lot of data into the DB then I suggest bulk loading (JDBC might be an option), or, if you already have files like .csv, you might use SQL*Loader (in conventional or direct-path mode).
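As an illustration only (the table, column and file names here are hypothetical), a minimal SQL*Loader control file for such a CSV load might look like this:

```
-- load.ctl: hypothetical table and columns, for illustration only
LOAD DATA
INFILE 'input.csv'
APPEND
INTO TABLE my_table
FIELDS TERMINATED BY ','
(id, name, amount)
```

It would be invoked as `sqlldr userid=... control=load.ctl direct=true` for a direct-path load, or without `direct=true` for a conventional-path load.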

Could you please specify:

  • What do you understand by "the performance"?
  • What kind of data would you like to load?
  • What Oracle version are you using?
  • How much data do you have? (e.g. 20 000 files of 1 kB each, or 1 file of 1 TB)
  • How often is the data supposed to be loaded?
  • Is the loader running on the same machine as the Oracle DB?

Just a general hint: try to avoid using shell + sqlplus if you are dealing with a large amount of data and complex logic.

Great hint! :slight_smile:

Generally, for bulk loading I don't prefer a shell invoking a client to do that; instead, establish a connection (a specific connection handle is returned) to the database. Since you are using Perl, interfaces for connecting to an Oracle database are widely available.

Also, the commit interval (a commit after some number of records) plays a vital role, and it should be determined beforehand.
1) Too few records per commit will bring down the performance.
2) Too many records per commit risks filling the redo log buffer or corrupting it, and if the operation fails, the retry logic has to work through all those records again.
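The commit-interval logic can be sketched in plain Perl. The loop below only counts commits; in real code the commented lines would be actual DBI calls (`$sth->execute`, `$dbh->commit`), so the numbers are purely illustrative:

```perl
use strict;
use warnings;

my $rows       = 10_000;
my $batch_size = 500;    # the commit interval to tune
my $commits    = 0;

for my $i (1 .. $rows) {
    # real code would do: $sth->execute(@row);
    if ($i % $batch_size == 0) {
        # real code would do: $dbh->commit;
        $commits++;
    }
}
$commits++ if $rows % $batch_size;   # final commit for a partial batch

print "rows: $rows, commits: $commits\n";
```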

As adderek requested earlier, please post some stats about the data you are trying to load.

Something that I didn't mention before (because it is of little value to the original question):
When you use SQL*Loader in direct-path mode, it writes the data blocks itself, largely bypassing the SQL engine (or rather, it is part of Oracle itself). This is why it can be the fastest solution: you skip one interface (possibly more).
When using SQL*Loader in direct-path mode you should remember that it has a lot of constraints (e.g. it "locks" the table). When dealing with a huge amount of data (e.g. terabytes per hour) you are probably sentenced to that option plus multiple tricks/hacks. In any other case: establish a connection (...)

Very nice!
I appreciate your answers, guys, but aren't you avoiding the answer instead?!
'Loader'?!
'Shell'?!
I did not ask you to compare every possibility for loading data into Oracle (why would I?! I did not ask about that)!!
Also, I did not ask about different versions and different configurations of Oracle!!
(Let's leave aside the different OSes, Windows, Linux, ..., different machines, days of the week, weather, magnetic storms and aliens from outer space, as well as the mood of the developer, sysadmin and DBA!)

I am talking about the two specified ways of communicating (!!!) with Oracle from (!!!) Perl! (Do not consider different versions of Perl; the DBI package is already available!)
That simply means: running the same set of operations with the Perl DBI package, and running the SAME set of operations through 'sqlplus' via a Perl pipe, as I have shown!!

The Oracle session is going to be opened either way! Everything processed inside Oracle is going to be processed the same way!
You would like to say it is the most expensive part? Fine!! I have heard you!! But that part does not depend on the way it is established from Perl code!

Would you like to say that both the pipe (open pp, '-|', ...) and the 'executed' form (... = qx|...|) are run in a new shell session? If so, I could see your point. I am not sure how Perl performs this task; I would guess it is done by system calls inside its own session.

I see that the 'pipe' and the 'executed' forms should not really be compared, as the pipe can be left open and used as needed, while the qx|...| must complete right away. So they cannot be used in the same way in all cases.
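That difference can be demonstrated with any child process; here `$^X` (the running perl binary) stands in for sqlplus. This is only a sketch of the two mechanisms, not a benchmark:

```perl
use strict;
use warnings;

# A pipe stays open: the child runs while we consume output as needed.
open my $pp, '-|', $^X, '-e', 'print "row $_\n" for 1..3'
    or die "cannot fork: $!";
my $first = <$pp>;   # read one row now, the rest later
my @rest  = <$pp>;
close $pp;

# qx must run the whole command to completion before returning anything.
my $cmd = join ' ', $^X, '-e', q{'print "row $_\n" for 1..3'};
my @all = qx{$cmd};

print scalar(@rest) + 1, " vs ", scalar(@all), "\n";
```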

But DBI and the 'pipe' can be compared, as both can stay open as long as desired and be used for the same stuff.

The ACTIONS against Oracle I am talking about are repeated iterations to retrieve and update some record(s) according to a list of specifications. More specifically, I have an input file; every file record requires some analysis of the Oracle data related to that record and, maybe, an update or insert.
The number of file records is pretty significant: hundreds of thousands to a couple of million records.
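A rough sketch of that per-record loop, with a plain hash standing in for the Oracle table; in real DBI code the lookup, update and insert would each be a prepared statement, created once and re-executed per record:

```perl
use strict;
use warnings;

# %db stands in for the Oracle table; in real code these three branches
# would be prepared SELECT, UPDATE and INSERT statements, reused per record.
my %db = (1001 => 5);
my @file_records = ([1001, 2], [1002, 7]);

for my $rec (@file_records) {
    my ($key, $val) = @$rec;
    if (exists $db{$key}) {
        $db{$key} += $val;   # UPDATE path: record already present
    } else {
        $db{$key} = $val;    # INSERT path: record is new
    }
}
print "$db{1001} $db{1002}\n";
```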

Therefore, besides the Oracle processing itself (sure, it has to be as efficient as possible, but that is a different story), the way of requesting and changing data from/in Oracle is still important.

By 'performance', honorable gurus, I mean the amount of time the operations take.

So, I guess it is now a little clearer that all those questions aren't relevant:

In that case here are your answers:

Yes, I did.

Yes, there are multiple ways of doing that - you got most of them listed above.

Yes, we have.

Good :slight_smile:

If you restrict your question to a simple "compare the known-to-be-suboptimal sqlplus with the powerful DBI" then:
DBI is much faster than piped sqlplus. This is your answer. You can get it yourself by running some simple tests.
By the way, it is "parse", not "parce". You should read about how SQL is executed; then you would see that sqlplus needs to parse the statement as well.
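One way to run such a simple test is the core Benchmark module. The two subs below are placeholders where a real DBI `execute` and a real piped-sqlplus round trip would go, so the resulting numbers are meaningless until you substitute the real calls:

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Placeholder workloads: replace these bodies with a real
# $sth->execute(...) call and a real piped-sqlplus round trip.
sub via_dbi     { my $x = 0; $x += $_ for 1 .. 100; return $x }
sub via_sqlplus { my $x = 0; $x += $_ for 1 .. 200; return $x }

# Run each candidate for at least one CPU second and report the rates.
timethese(-1, {
    dbi     => \&via_dbi,
    sqlplus => \&via_sqlplus,
});
```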

If you can use sqlplus to execute a data pump, then you might use the SQL*Loader engine, and this might be much faster.

Try reading the answers above again - it should help you.

This is an ill-posed question. Perl DBI is faster than sqlplus as a front end. That answers the question.

But: Oracle caches previously parsed queries and reuses them.
So even faster parsing gains you little.

OK, I've been coding in Oracle since version 2, and now we're on 19-um-whatever. I'm older than dirt.
However, SQL code is 85% of performance. Period. Full-table scans instead of using indexes, for example. Reducing excessive disk I/O for queries. All of this is a programmer/application-designer problem. Or an Exadata problem, if you know what that is. Not DBI vs. sqlplus.

Learn to use tkprof. PL/SQL Developer. TOAD. Read a Tom Kyte book.

The reason you are not getting the answers you want is your presupposition:

speed up front = speed in the back end.

It does not work that way, except maybe in special situations, which are usually covered by tools like sqlldr or by interesting specialty objects like clustered tables.

adderek, jim mcnamara
I've got your opinion that DBI is faster than the 'pipe' solution, and that how the work is done inside Oracle matters much more than how the requests are brought to Oracle.
I have noted sql*loader as a good option.
Thank you for that!

Generally speaking, I've got that the overhead of how you reach Oracle is fairly insignificant relative to how the Oracle work itself is processed.
So the more convenient and easier-to-code way is reasonably the more appropriate one!

(I am interested in all the other 'funny stuff', like 'exadata', tkprof, '...Developer', TOAD and 'clustered tables', and in everything else implied by such a widely... referenced area! :slight_smile:
I hope that for some readers it will be a good guideline to learning the Oracle tools and market.
One of the smart ways to hide information is to bury it under tons of unrelated information.)

Anyway, thank you for participating and for your opinions!

Wow... So there was an Oracle 2? :eek:

I would add Oracle SQL Developer (which has superb functionality but is an unstable memory hog... and is free, whereas the much better, non-free Toad costs a lot).

Tom Kyte from AskTom? Then it might be a great book indeed.
I would add the books by Steven Feuerstein (although they are mostly about PL/SQL).

Thanks for the post and for the summary :smiley: