Ldapsearch takes minutes when using FQDN vs IP

Devyn · May 15, 2015, 11:29am

Hey All,

ldapsearch takes minutes when using FQDN vs IP. What could be some of the reasons for that?

Cheers,
DH

agent.kgb · May 15, 2015, 11:34am

DNS resolution. Check how long it takes to resolve a host name against your DNS server.

$ time host -n myldapserver

Devyn · May 15, 2015, 1:15pm

Seems quick. Yet the ldapsearch command has the same issue despite the quick resolution:

# time host -n ad01
ad01.test.com has address 10.0.0.10

real     0m0.11s
user    0m0.00s
sys      0m0.00s
#

nslookup and ping resolve quickly too, but not ldapsearch. Is there anything else that could be checked? There's a debug option to ldapsearch, but I see no extra messages and have no idea where the extra logs are kept on this AIX system, which has these filesets installed:

  idsldap.clt32bit62.rte    6.2.0.32    C     F    Directory Server - 32 bit
  idsldap.clt64bit62.rte    6.2.0.32    C     F    Directory Server - 64 bit
  idsldap.clt_max_crypto32bit62.rte
  idsldap.clt_max_crypto64bit62.rte
  idsldap.cltbase62.adt     6.2.0.32    C     F    Directory Server - Base Client
  idsldap.cltbase62.rte     6.2.0.32    C     F    Directory Server - Base Client
  idsldap.cltjava62.rte     6.2.0.32    C     F    Directory Server - Java Client
  idsldap.ent62.rte          6.2.0.3    C     F    Directory Server - Entitlement
  idsldap.msg62.en_US       6.2.0.32    C     F    Directory Server - Messages -
  idsldap.srv_max_cryptobase64bit62.rte
  idsldap.srvbase64bit62.rte
  idsldap.srvproxy64bit62.rte
  idsldap.webadmin62.rte    6.2.0.32    C     F    Directory Server - Web
  idsldap.webadmin_max_crypto62.rte



grep -v "#" /etc/netsvc.conf
hosts=bind,local

I don't define anything in /etc/hosts, though; I'm keeping it clean. It's worth noting that ssh to both the IP and the FQDN is also slow.

Cheers,
DH

agent.kgb · May 15, 2015, 2:09pm

If you'd like to trace IBM LDAP client libraries.

Good luck speaking with IBM Tivoli support! I hope you are familiar with IBM escalation procedures.

Devyn · May 15, 2015, 3:08pm

Yes I've had the misf.... er pleasure of speaking with support. (cough)

On the debug options, that looks like fun. Going to try it.

As an aside, there are tprof, svmon, truss etc. for tracing, but none of these seems to grab a trace from the start of a process I'm launching. On Linux I can run strace <PROGRAM> and it traces from the start. How do I do the same on AIX?

Cheers,
DH

jlliagre · May 15, 2015, 3:15pm

truss -f <program> should trace a process and its children from the start.
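
A minimal sketch of how that might look in practice, assuming the truss on your AIX level supports -d (timestamps) and -o (output file) alongside -f; please check the man page. trace_from_start is a hypothetical helper name, not something AIX ships:

```shell
# Hypothetical helper: trace a command from its very first system call.
# Assumes AIX truss flags: -f follows children, -d adds a timestamp to
# each line, -o writes the trace to a file instead of stderr.
trace_from_start() {
    out=$1      # file that receives the trace
    shift       # the remaining arguments are the command to run
    truss -f -d -o "$out" "$@"
}

# Example (commented out so that sourcing this file runs nothing):
# trace_from_start /tmp/host.truss host -n ad01
```

With timestamps in the trace, a long pause shows up as a jump between two consecutive lines.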

bakunin · May 16, 2015, 1:08pm

I can wholeheartedly relate to that. Once they had the most expensive machines and the best support. Their systems are still the most expensive ones.

The svmon you mentioned is a tool for performance monitoring: it keeps short-term and long-term statistics about virtual memory consumption, for the system as a whole as well as on a per-process basis. It is a fantastic tool which I dearly missed on other Unix-like systems, but it won't help you in your case.

Btw.:

Some basic entries (like "localhost") should definitely be in /etc/hosts; anything less is not even supported. If you ever set up HACMP, even removing the "::1" entry (the IPv6 localhost alias) will confuse the cluster daemons, regardless of whether you use IPv6 or not! A clean /etc/hosts is one (and in fact a good) thing, but carrying this to one extreme is as bad as carrying it to the other.

I hope this helps.

bakunin

Devyn · May 19, 2015, 8:54am

Thanks all for the replies. Appreciated. The LDAP logs are below. As suspected, it's an issue with host resolution, but now I need more details as to why:

/tmp/debug.log
134:20:08:09 T1 K30539837 ldap_msg_table_send_message entered: table=11000ec50 msg=11000edd0 msgid=1
134:20:08:09 T1 K30539837 ldap_write_msg entered: ld=11000e850, lm=11000edd0
134:20:08:09 T1 K30539837 open_ldap_connection: ld(11000e850), lc(11000ea70)
134:20:08:09 T1 K30539837 open_connection: entered sb(11000ea88) host(admaster01) port(389)
134:20:08:09 T1 K30539837 ids_getaddrinfo: host(admaster01), port(389), res(fffffffffffdca8)
134:20:08:32 T1 K30539837 ids_getaddrinfo: rc=0
134:20:08:32 T1 K30539837 tds_connect: socket(4), address(fffffffffffdd00), address_len(16), connect_to(0)
134:20:08:32 T1 K30539837 open_connection: connect rc=0
134:20:08:32 T1 K30539837 open_connection: returning rc=0

In the log above, ids_getaddrinfo takes 23 seconds to return (20:08:09 to 20:08:32). The truss trace yields this:

11862262:       28442773: _poll(0x0FFFFFFFFFFFB1D0, 1, 10000)   = 0
11862262:       28442773: close(3)                              = 0
11862262:       28442773: socket(2, 2, 0)                       = 3
11862262:       28442773: getsockopt(3, 65535, 4104, 0x0FFFFFFFFFFFB104, 0x0FFFFFFFFFFFB100) = 0
11862262:       28442773: connext(3, 0x09001000A0022198, 16)    = 0
11862262:       28442773: _esend(3, 0x0FFFFFFFFFFFC030, 38, 0, 0x0000000000000000) = 38

11862262:       28442773: _poll(0x0FFFFFFFFFFFB1D0, 1, 20000) (sleeping...)


(pauses here)


11862262:       28442773: _poll(0x0FFFFFFFFFFFB1D0, 1, 20000)   = 1
11862262:       28442773: _enrecvfrom(3, 0x0FFFFFFFFFFFD380, 1024, 0, 0x0FFFFFFFFFFFB990, 0x0FFFFFFFFFFFB1B8, 0x0000000000000000) = 38
11862262:       28442773: close(3)                              = 0
11862262:       28442773: socket(2, 2, 0)                       = 3
11862262:       28442773: getsockopt(3, 65535, 4104, 0x0FFFFFFFFFFFB104, 0x0FFFFFFFFFFFB100) = 0
11862262:       28442773: connext(3, 0x09001000A0022198, 16)    = 0
11862262:       28442773: _esend(3, 0x0FFFFFFFFFFFC030, 37, 0, 0x0000000000000000) = 37
11862262:       28442773: _poll(0x0FFFFFFFFFFFB1D0, 1, 5000)    = 1
11862262:       28442773: _enrecvfrom(3, 0x0FFFFFFFFFFFD380, 1024, 0, 0x0FFFFFFFFFFFB990, 0x0FFFFFFFFFFFB1B8, 0x0000000000000000) = 111
11862262:       28442773: _esend(3, 0x0FFFFFFFFFFFC030, 28, 0, 0x0000000000000000) = 28
11862262:       28442773: _poll(0x0FFFFFFFFFFFB1D0, 1, 5000)    = 1
11862262:       28442773: _enrecvfrom(3, 0x0FFFFFFFFFFFD380, 1024, 0, 0x0FFFFFFFFFFFB990, 0x0FFFFFFFFFFFB1B8, 0x0000000000000000) = 103
11862262:       28442773: close(3)                              = 0
11862262:       28442773: kopen("/etc/hosts", O_RDONLY)         = 3
11862262:       28442773: kioctl(3, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
11862262:       28442773: kfcntl(3, F_SETFD, 0x0000000000000001) = 0
11862262:       28442773: kioctl(3, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
11862262:       28442773: kread(3, " #   I B M _ P R O L O G".., 4096) = 2162
11862262:       28442773: kread(3, " #   I B M _ P R O L O G".., 4096) = 0
11862262:       28442773: close(3)                              = 0
11862262:       28442773: socket(2, 1, 0)                       = 3

So not much more, and I still couldn't find out why ldapsearch takes a while when all other OS commands return the address for the hostname adm01 instantaneously. Nothing is visible on the Windows side of things. I tried reversing the /etc/netsvc.conf entries again, but to no effect. It has been set to hosts=bind,local.

Cheers,
DH

subrkann · May 19, 2015, 9:07am

Can you try to ping your DNS IP from your machine and let us know?

# cat /etc/resolv.conf

The output of the above command will show the primary and secondary (if configured) nameserver IPs. Make sure you are able to reach those IPs without any ping drops, especially the primary one.

Devyn · May 19, 2015, 9:51am

10.0.0.3 is an AIX-based forwarding DNS server that forwards to the AD/DNS servers below. It is working and responds quickly; if I take it down, I can't resolve the admaster0? servers below. Primary DNS servers are:

10.0.0.1 admaster01 / DNS01
10.0.0.2 admaster02 / DNS02

I tried the direct route to the primary DNS servers above, same result.

root [aixdns01] /tmp: cat /etc/resolv.conf
nameserver      10.0.0.3
domain            aix.b.a
search             aix.b.a b.a 
root [aixdns01] /tmp: ping aixdns01
PING aixdns01.aix.b.a (10.0.0.3): 56 data bytes
64 bytes from 10.0.0.3: icmp_seq=0 ttl=255 time=0 ms

--- aixdns01.aix.b.a ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0/0/0 ms
root [aixdns01] /tmp: ping admaster01
PING admaster01.b.a (10.0.0.1): 56 data bytes
64 bytes from 10.0.0.1: icmp_seq=0 ttl=128 time=0 ms
64 bytes from 10.0.0.1: icmp_seq=1 ttl=128 time=0 ms

--- admaster01.b.a ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0/0/0 ms
root [aixdns01] /tmp: cat /etc/netsvc.conf|grep -v "#"
hosts=bind,local
root [aixdns01] /tmp:

The DNS server local to AIX is a long story, but we are using it as a forwarder to the AD servers for now.

Cheers,
DH

subrkann · May 19, 2015, 12:46pm

So your AD and DNS services are running on one server. I can see you are able to ping your primary DNS IP without any problems. This is not the issue I was suspecting; it looks like something else.

---------- Post updated at 07:46 PM ---------- Previous update was at 07:19 PM ----------

Just noticed some non-standard configuration in your /etc/resolv.conf file.

You can keep either domain or search, but not both at the same time; having both is unusual. Is there any specific reason for keeping both domain and search in your /etc/resolv.conf? If there is no specific reason, then try updating your /etc/resolv.conf and /etc/netsvc.conf files as follows:

Take a backup of these two files:

# cp -p /etc/resolv.conf /etc/resolv.conf.MMDDYY
# cp -p /etc/netsvc.conf /etc/netsvc.conf.MMDDYY

Update your /etc/resolv.conf to read exactly as below:

# vi /etc/resolv.conf
search    aix.b.a b.a
nameserver    10.0.0.3

Update /etc/netsvc.conf file:

# vi /etc/netsvc.conf

Search for "hosts=bind,local" in /etc/netsvc.conf and modify it as follows:

hosts = local4 , bind4
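
To double-check what the resolver will actually see after the edit, something like the following may help. It is only a sketch: hosts_order is a hypothetical helper, and it assumes the file uses the plain "hosts = ..." syntax shown above with # starting a comment.

```shell
# Hypothetical helper: print every "hosts" line of a netsvc.conf-style
# file with comments and whitespace stripped.
hosts_order() {
    file=${1:-/etc/netsvc.conf}
    # Drop everything after '#', remove blanks, keep only hosts= lines.
    sed -e 's/#.*$//' -e 's/[[:space:]]//g' "$file" | grep '^hosts='
}

# Example (commented out so that sourcing this file runs nothing):
# hosts_order /etc/netsvc.conf
```

Unlike a plain grep -v "#", this keeps a hosts line that carries a trailing comment. As far as I know, AIX also honours an NSORDER environment variable that overrides this file for a single command (for example NSORDER=local4,bind4 ping -c 1 admaster01), which would allow testing without editing anything; treat that as an assumption to verify on your system.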

Devyn · May 20, 2015, 11:02am

Setting netsvc.conf did the trick, irrespective of what was in resolv.conf. Thanks, guys, for all the help on this. I wish the debug logs had provided more hints towards this, but it's great: everything works quickly now.

# grep -v "#" /etc/netsvc.conf
hosts=bind4,local4
#

Earlier I disabled IPv6 but didn't update the /etc/netsvc.conf file from bind,local as above. I changed it back to the original and it was slow again, then back to only bind4,local4 and it's fast again, so that was it. Do I click something on the forum to give you points for helping out?

Cheers,
DH

subrkann · May 20, 2015, 11:15am

Good to know it is fixed.

bakunin · May 21, 2015, 4:22am

You can use the "thanks"-feature to mark the most useful post(s) IYO.

Regarding your problem: DNS is a decentralized service, and because networks were considered inherently unreliable (coping with this is actually the main point of the design of TCP/IP), timeouts are relatively long. You don't want to get a "host not known" every time a 1-second hiccup of the network occurs.

This is perhaps why it took so long: it tried to resolve the IPv6 localhost first, and only after the timeout for this query ran out did it issue another query for IPv4.

To verify this assumption try adding the line:

::1       loopback localhost

to the file /etc/hosts and switch back to hosts = local, bind. If my suspicion is correct, it should work without the timeouts.
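
Another way to narrow this down, sketched under a few assumptions: that the AIX host command accepts -t for the record type in -n mode, and that date supports +%s. lookup_seconds is a hypothetical helper; it times one query per record type so that a slow IPv6 (AAAA) lookup stands out from the IPv4 (A) one.

```shell
# Hypothetical helper: report how many whole seconds one DNS query takes.
# Assumes "host -n -t TYPE NAME" and "date +%s" are available.
lookup_seconds() {
    rtype=$1    # record type, e.g. A or AAAA
    name=$2     # host name to resolve
    start=$(date +%s)
    host -n -t "$rtype" "$name" >/dev/null 2>&1
    end=$(date +%s)
    echo "$rtype $name: $((end - start))s"
}

# Example (commented out so that sourcing this file runs nothing):
# lookup_seconds A admaster01
# lookup_seconds AAAA admaster01
```

If the AAAA query takes many seconds while the A query returns at once, that would support the explanation above.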

Anyway, glad you solved it and thanks for posting the final solution.

bakunin