Solaris[TM] sd driver tag queuing problems/sd_max_throttle
How does one fix SCSI disk tag queuing problems?
By setting sd_max_throttle to a lower value in /etc/system.
sd_max_throttle, an sd driver tunable parameter, determines the maximum
number of commands that the sd driver can queue for each target for
submission to the HBA (Host Bus Adapter) driver. By default,
sd_max_throttle is 256.
Since SCSI tag queuing (SCSI_OPTIONS_TAG, 0x80) is enabled by default in
Solaris, commands can be queued up too quickly (up to the limit of 256)
for the sd driver to handle when the disk controller is fully populated
with targets or has very fast disks (e.g., RAID devices). Once this
happens, tagged command timeouts and retries, or SCSI transport failure
messages, are often displayed:
-> WARNING: /io-unit@f,e1200000/sbi@0,0/dma@0,81000/esp@0,80000 (esp1):
-> Disconnected tagged cmds (1) timeout for Target 1.0
-> WARNING: /io-unit@f,e1200000/sbi@0,0/dma@0,81000/esp@0,80000/sd@1,0 (sd16):
-> Error for command 'write' Error Level: Retryable
-> WARNING: /io-unit@f,e0200000/sbi@0,0/dma@0,81000/esp@0,80000/sd@3,0 (sd3):
-> SCSI transport failed: reason 'timeout': retrying command
-> WARNING: /io-unit@f,e0200000/sbi@0,0/dma@0,81000/esp@0,80000/sd@3,0 (sd3):
-> unix: SCSI transport failed: reason 'incomplete': retrying command
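To check whether a system is showing these symptoms, the warnings can be tallied per device. The following is a minimal Python sketch, assuming the two-line layout shown above (a WARNING line ending in the instance name such as "(sd3):", followed by the message line). The function name summarize_tag_symptoms and the symptom labels are hypothetical, and real /var/adm/messages lines carry timestamp prefixes that this sketch simply ignores by searching rather than anchoring at line start.

```python
import re
from collections import Counter

# Hypothetical patterns, derived only from the sample warnings above.
DEVICE = re.compile(r"\(([a-z]+\d+)\):\s*$")  # e.g. "(esp1):" or "(sd16):"
TAG_TIMEOUT = re.compile(r"Disconnected tagged cmds \(\d+\) timeout for Target \d+\.\d+")
RETRYABLE = re.compile(r"Error for command '(\w+)' Error Level: Retryable")
TRANSPORT = re.compile(r"SCSI transport failed: reason '(\w+)'")


def summarize_tag_symptoms(lines):
    """Count tag-queuing symptoms, keyed by (device instance, symptom)."""
    counts = Counter()
    device = None  # instance named by the most recent WARNING line
    for line in lines:
        m = DEVICE.search(line)
        if m:
            device = m.group(1)
            continue
        if TAG_TIMEOUT.search(line):
            counts[(device, "tagged cmd timeout")] += 1
            continue
        m = RETRYABLE.search(line)
        if m:
            counts[(device, "retryable " + m.group(1))] += 1
            continue
        m = TRANSPORT.search(line)
        if m:
            counts[(device, "transport " + m.group(1))] += 1
    return counts
```

Feeding it the eight warning lines above attributes the tagged timeout to esp1, the retryable write to sd16, and both transport failures to sd3.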
Setting sd_max_throttle to a value much smaller than the default of 256
can fix the problem.
To what value should sd_max_throttle be set? That depends on how many SCSI
targets are in the system. A workable rule of thumb is to keep the total
number of queued commands (the number of targets multiplied by
sd_max_throttle) below 100. For example, with 6 fast SCSI targets and
sd_max_throttle set to 16, at most 6 x 16 = 96 commands can be queued.
If tagged command timeouts are still seen, add the following to /etc/system:
set sd:sd_max_throttle = 16
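The rule of thumb above is simple arithmetic: pick the largest per-target throttle such that targets x throttle stays below 100. A minimal Python sketch of that calculation follows, assuming the per-target interpretation used in the example; the helper names max_throttle and etc_system_line are hypothetical, and the budget of 100 is only the rule of thumb stated above, not a driver limit.

```python
def max_throttle(num_targets: int, budget: int = 100) -> int:
    """Largest per-target sd_max_throttle with num_targets * throttle < budget."""
    if num_targets < 1:
        raise ValueError("need at least one target")
    # Total must stay strictly below the budget, as in "total < 100".
    throttle = (budget - 1) // num_targets
    if throttle < 1:
        raise ValueError("budget too small for this many targets")
    # Never suggest more than the driver default of 256.
    return min(throttle, 256)


def etc_system_line(throttle: int) -> str:
    """Format the /etc/system entry in the same form as the line above."""
    return f"set sd:sd_max_throttle = {throttle}"
```

For the 6-target example, max_throttle(6) gives 99 // 6 = 16, and etc_system_line(16) reproduces the setting shown above.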