Time of query execution much different between 3 servers

omonoiatis9 · January 11, 2016, 5:31am

Hello,

I have 3 AIX 6.1 machines running the Informix 11.7 database engine.
One of these servers is the database server and the other 2 connect to it.
I am running a test to measure query execution time from each of these servers, and I see that at specific times one of them has a much longer response time.
Here is an example of the test:

server1:  11:00:00  11:01:01  11:02:00  11:03:00  11:04:00
          0m0,05s   0m0,08s   0m0,08s   0m0,05s   0m0,05s

server2:  11:00:00  11:01:00  11:02:00  11:03:00  11:04:00
          0m0,08s   0m0,10s   0m0,06s   0m0,08s   0m0,08s

server3:  11:00:00  11:01:00  11:02:00  11:03:00  11:04:00
          0m0,07s   0m0,65s   0m0,27s   0m0,25s   0m0,50s

You can see that the execution time on server3 is much longer than on the other 2, and I noticed that this happens during the first 5-6 minutes of each hour.
I am trying to figure out what causes this extra delay at those times, but I cannot find anything. I checked the crons to see if something automatic runs at those times and causes the delay, but I don't think my answer is there.

Any help will be appreciated.
Thank you.

MadeInGermany · January 11, 2016, 12:24pm

Examine the crontabs! The following command greps the cron jobs that are started at the beginning of an hour.

awk '/^[^#]/ && $1~/^0*[0-4]$/' /var/spool/cron/crontabs/*
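The one-liner above only catches entries whose field is a single plain number. A slightly broader sketch follows, assuming the standard crontab layout (minute field first) and a cut-off of minute 5 to match the reported window. The function name `cron_hour_start_jobs` is made up for illustration. It also flags `*`, comma lists, ranges and step forms, although not every cron implementation accepts steps.

```shell
# List crontab entries whose minute field can fire in minutes 0..5 of an hour.
# Usage: cron_hour_start_jobs /var/spool/cron/crontabs/*
cron_hour_start_jobs() {
    awk '
        # Return 1 if the minute-field expression can fire at a minute <= max.
        function fires_early(field, max,    n, parts, i, p, lo_hi) {
            n = split(field, parts, ",")
            for (i = 1; i <= n; i++) {
                p = parts[i]
                sub(/\/.*$/, "", p)          # a step still starts at the low end
                if (p == "*") return 1
                if (p ~ /^[0-9]+-[0-9]+$/) {
                    split(p, lo_hi, "-")
                    if (lo_hi[1] + 0 <= max) return 1
                } else if (p ~ /^[0-9]+$/ && p + 0 <= max) {
                    return 1
                }
            }
            return 0
        }
        /^[ \t]*#/ || NF < 6 { next }        # skip comments and non-job lines
        fires_early($1, 5) { print FILENAME ": " $0 }
    ' "$@"
}
```

Each hit is prefixed with the crontab file it came from, which on most systems is the name of the owning user.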

bakunin · January 11, 2016, 5:52pm

What makes you think the answer is not in the crontabs? (The question is meant seriously: what have you done to come to that conclusion?)

You might want to start with the basic tool for all things performance: vmstat. Use

vmstat -tw 1 | tee -a /some/log/file

and analyse the log. See if there is any significant difference between the first 5-6 minutes of an hour and the rest of the time. See if there is a difference between the first 5-6 minutes of the hour on server3 and the other two servers.
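One way to do that comparison without reading the whole log by eye is sketched below. It assumes that each data line starts with the run-queue column (r), followed by the blocked column (b), and ends with the hh:mm:ss timestamp added by -t. Other columns shift between system types, so they are left alone here. The function name `vmstat_hour_start_summary` and the 6-minute cut-off are illustrative choices only.

```shell
# Summarise a "vmstat -t" log: average run queue (r) and blocked threads (b)
# for samples taken in minutes 00-05 of any hour versus all other samples.
# Usage: vmstat_hour_start_summary /some/log/file
vmstat_hour_start_summary() {
    awk '
        # Data lines: first field numeric, last field an hh:mm:ss timestamp.
        $1 ~ /^[0-9]+$/ && $NF ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            split($NF, t, ":")
            k = (t[2] + 0 < 6) ? "start" : "rest"
            n[k]++; r[k] += $1; b[k] += $2
        }
        END {
            split("start rest", keys, " ")
            for (i = 1; i <= 2; i++) {
                k = keys[i]
                if (n[k] > 0)
                    printf "%s samples=%d avg_r=%.2f avg_b=%.2f\n", k, n[k], r[k] / n[k], b[k] / n[k]
            }
        }
    ' "$@"
}
```

Running it against the logs of server3 and of one of the healthy servers gives two pairs of lines that can be compared directly. A clear gap between "start" and "rest" on server3 only would point at the machine itself rather than at the database.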

I hope this helps.

bakunin

omonoiatis9 · January 12, 2016, 3:47am

Hello guys,

Thank you for your responses.
To answer your question about the crons: the crons on server3 are scheduled during the early morning hours, and for the ones that run during working hours I did some tests.
For the crons that run in the last or first minutes of an hour, I disabled one cron at a time to see if the execution time would improve. When I saw no difference, I enabled that cron again and disabled a different one for the next hour. I ended up disabling every cron, one at a time, and got no improvement.
That's why I said that the answer is probably not there, unless you can suggest a different way to test it.
By the way, I followed your suggestion to create a vmstat report. I will let it run for a few hours and then examine it.

omonoiatis9 · January 13, 2016, 5:26am

I let vmstat run for 2 hours and looked at the reports.
The CPU usage of the server never reaches 100%, and it does not seem to vary much between the period when the delay occurs and the time before and after it.

bakunin · January 15, 2016, 11:47am

That was not the question. If you want to find out for yourself what information you can glean from vmstat output, you might want to read a little treatise about the topic.

I hope this helps.

bakunin

omonoiatis9 · January 22, 2016, 10:53am

Thank you, bakunin.
The link is indeed very useful!
However, when I read the reports, server3, which is the one with the delay problem, seems healthy (meaning no paging in/out, no blocked threads, and wait is low).
One of the other servers shows some blocked threads from time to time, but that server doesn't have any problem with SQL delay.

vbe · January 22, 2016, 11:37am

I searched your first post (but it's Friday, so forgive me if I skipped something...) looking for the configurations of the 3 servers and found nothing...
Saying they are 3 servers running AIX 6.1 only tells us they have the same OS... they can be quite different in size, resources, processor speed etc... and they can also be on different networks...
We could even imagine 2 are LPARs on the same physical machine (doing nothing...) and the 3rd an LPAR on a periodically hard-working machine...
...
Last but not least, some apps can be running periodically - not using crontabs haha
I am thinking of something like ControlM or even TSM, since you are on AIX and chances are the backups are done by Tivoli...
My 2 cents in addition to Bakunin's
With what you have given us so far it can only be free speculation; you need to gather far more information to be able to start any serious interpretation
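To catch periodic work that is not started from a crontab, one option is to log the busiest processes once a minute and look at what shows up in the slow window. The sketch below is only one way to do it. The helper name `top_cpu` is made up, and it assumes its input has the form "pcpu pid user args" with one header line, for example from ps -eo pcpu,pid,user,args.

```shell
# Print a timestamp line followed by the N busiest processes read from stdin.
# Input: one header line, then lines of the form "pcpu pid user args".
top_cpu() {
    n=${1:-5}
    echo "=== $(date '+%Y-%m-%d %H:%M:%S')"
    # Drop the header, sort by the first column (CPU %) descending, keep N lines.
    awk 'NR > 1' | sort -k1,1nr | head -n "$n"
}
```

It could be driven by a loop such as: while :; do ps -eo pcpu,pid,user,args | top_cpu 5; sleep 60; done >> /some/log/file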

All the best

omonoiatis9 · January 26, 2016, 1:19am

Hello vbe,
Thank you for your post.
As per your request, here is some more information about the servers:

server    lcpu  mem      Page Space  Size
server1   18    71680MB  paging01    10224MB
                         paging00    10224MB
server2   24    51200MB  paging01    10224MB
                         paging00    10224MB
server3   6     8192MB   paging01    5048MB
                         paging00    5048MB

Note also that another difference between the 3 servers is that the first 2 are physical machines while server3 is virtual (with VIO in the middle).
As for the comment about backups: no backup is running at the times when I take the measurements, and Tivoli is disabled because another backup application is used.

I hope this helps more. If you need additional info, please feel free to ask.

dukessd · January 26, 2016, 9:21pm

There's your problemsss
The slow one is tiny (little cpu or memory) and virtualised... which may or may not be a factor (probably is ;0)
HTH

omonoiatis9 · January 27, 2016, 7:10am

So you think it is a matter of resources?
If I add more CPU and memory, will it improve the server's behavior?

bakunin · January 27, 2016, 8:47am

omonoiatis9, nobody is able to answer that. dukessd has voiced a conjecture (read: educated guess) and maybe he is right, maybe not. As long as you do not publish any data (like the data I asked for a week ago) nobody will be able to really help you. This is not because dukessd (or I, for that matter) is unable or unwilling to help you, but because we do not know your systems and there are literally hundreds of possible causes.

dukessd himself has called what he said a guess - a good one and certainly based on his considerable experience, but a guess nevertheless. The situation is like you calling a doctor via telephone and asking him: "my left side hurts, tell me what it is."

Show us some data, then we can (maybe) tell you what the problem is. But so far you have only told us some generalities and therefore you get only generalities back.

Sorry, but that's the way it is.

I hope this helps.

bakunin

omonoiatis9 · January 28, 2016, 4:37am

Hello bakunin,
What data are you referring to? I did ask: if you want any more info, tell me exactly what you want and I will give it to you.
If you are referring to the vmstat reports, I have attached 2 reports to compare.
Note that the file vmstat_report_rea.txt refers to server3 and the file vmstat_report_zeus.txt refers to server1.
If you need more data, please be specific about what you need.

thank you.