Hello,
The organization I work for uses SCOM (Microsoft System Center Operations Manager) for data center management and alerting. Since the client was installed on our Linux servers, we have been getting messages from SCOM stating "DPC Time Percentage is too high". This is happening on all of our MySQL cluster servers. From my research, it appears that this message relates to software interrupts.
Running top or mpstat shows that %si (software interrupt time) on processor 7 is frequently over 20%:
Cpu0 : 41.0%us, 15.3%sy, 0.0%ni, 35.7%id, 0.0%wa, 0.0%hi, 8.0%si, 0.0%st
Cpu1 : 25.7%us, 17.3%sy, 0.0%ni, 52.0%id, 0.0%wa, 0.0%hi, 5.0%si, 0.0%st
Cpu2 : 21.9%us, 1.3%sy, 0.0%ni, 75.7%id, 0.0%wa, 0.0%hi, 1.0%si, 0.0%st
Cpu3 : 14.0%us, 9.3%sy, 0.0%ni, 73.1%id, 0.0%wa, 0.0%hi, 3.7%si, 0.0%st
Cpu4 : 55.3%us, 4.3%sy, 0.0%ni, 38.3%id, 0.0%wa, 0.0%hi, 2.0%si, 0.0%st
Cpu5 : 53.3%us, 4.6%sy, 0.0%ni, 40.1%id, 0.0%wa, 0.0%hi, 2.0%si, 0.0%st
Cpu6 : 5.0%us, 9.0%sy, 0.0%ni, 83.7%id, 1.0%wa, 0.0%hi, 1.3%si, 0.0%st
Cpu7 : 50.7%us, 4.3%sy, 0.0%ni, 1.3%id, 0.0%wa, 11.6%hi, 32.1%si, 0.0%st
mpstat -P ALL 60
Linux 2.6.18-238.9.1.el5 () 10/13/2014
02:19:27 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
02:20:27 PM all 29.34 0.00 4.64 0.06 0.91 4.35 0.00 60.70 17469.47
02:20:27 PM 0 12.07 0.00 3.62 0.07 0.00 0.42 0.00 83.83 1000.03
02:20:27 PM 1 38.68 0.00 4.33 0.05 0.00 2.23 0.00 54.70 0.00
02:20:27 PM 2 8.79 0.00 1.97 0.00 0.00 0.55 0.00 88.70 0.00
02:20:27 PM 3 28.72 0.00 5.50 0.12 0.00 2.05 0.00 63.61 0.53
02:20:27 PM 4 53.98 0.00 3.74 0.00 0.00 1.70 0.00 40.59 0.00
02:20:27 PM 5 44.97 0.00 5.08 0.00 0.00 1.93 0.00 48.01 0.58
02:20:27 PM 6 35.75 0.00 4.02 0.02 0.00 1.37 0.00 58.85 0.00
02:20:27 PM 7 11.74 0.00 8.85 0.20 7.28 24.59 0.00 47.34 16468.35
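To put a number on how lopsided the mpstat sample above is, here is a minimal Python sketch that computes each CPU's share of intr/s and its combined %irq + %soft. It assumes the 12-hour timestamp layout shown above (a time field followed by AM/PM); the names MPSTAT and parse_mpstat are made up for this illustration.

```python
# Rows copied from the mpstat sample above (header line plus one interval).
MPSTAT = """\
02:19:27 PM CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
02:20:27 PM all 29.34 0.00 4.64 0.06 0.91 4.35 0.00 60.70 17469.47
02:20:27 PM 0 12.07 0.00 3.62 0.07 0.00 0.42 0.00 83.83 1000.03
02:20:27 PM 1 38.68 0.00 4.33 0.05 0.00 2.23 0.00 54.70 0.00
02:20:27 PM 2 8.79 0.00 1.97 0.00 0.00 0.55 0.00 88.70 0.00
02:20:27 PM 3 28.72 0.00 5.50 0.12 0.00 2.05 0.00 63.61 0.53
02:20:27 PM 4 53.98 0.00 3.74 0.00 0.00 1.70 0.00 40.59 0.00
02:20:27 PM 5 44.97 0.00 5.08 0.00 0.00 1.93 0.00 48.01 0.58
02:20:27 PM 6 35.75 0.00 4.02 0.02 0.00 1.37 0.00 58.85 0.00
02:20:27 PM 7 11.74 0.00 8.85 0.20 7.28 24.59 0.00 47.34 16468.35
"""


def parse_mpstat(text):
    """Map each CPU label ('all', '0', ...) to a dict of its numeric columns."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header = lines[0][2:]  # assumption: first two fields are time and AM/PM
    rows = {}
    for fields in lines[1:]:
        record = dict(zip(header, fields[2:]))
        cpu = record.pop("CPU")
        rows[cpu] = {name: float(value) for name, value in record.items()}
    return rows


rows = parse_mpstat(MPSTAT)
per_cpu = {cpu: r for cpu, r in rows.items() if cpu != "all"}
total = sum(r["intr/s"] for r in per_cpu.values())

for cpu, r in sorted(per_cpu.items()):
    share = r["intr/s"] / total
    print(f"CPU{cpu}: {share:6.1%} of intr/s, {r['%irq'] + r['%soft']:5.2f}% irq+soft")
```

On this sample CPU 7 accounts for roughly 94% of the interrupts per second and about 32% of its time goes to %irq plus %soft, while the other CPUs are close to zero on both.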
According to /proc/interrupts, IRQ 185 appears to be the largest source of interrupts on processor 7. This is the same on all 4 servers in question; on each of them IRQ 185 is listed as "IO-APIC-level megasas, eth1, eth0".
cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 1385547152 1 0 0 0 80 5 57382 IO-APIC-edge timer
1: 0 0 0 0 0 0 0 2 IO-APIC-edge i8042
8: 0 0 0 0 0 0 0 1 IO-APIC-edge rtc
9: 0 0 0 0 0 1 0 34 IO-APIC-level acpi
11: 0 0 323 0 0 0 0 127 IO-APIC-level ehci_hcd:usb1, ohci_hcd:usb2, ohci_hcd:usb3
12: 0 0 0 0 0 0 0 5 IO-APIC-edge i8042
138: 24 0 85097 1927115 17543366 4371772 26364546 4915645 PCI-MSI eth3
154: 22 0 55073 1919698 9263542 111344311 28653821 119374902 PCI-MSI eth2
185: 2 1 0 2 1 336790701 12301729 3763055601 IO-APIC-level megasas, eth1, eth0
NMI: 7588535 7138711 7412871 7375055 7517698 8340865 8123444 8485641
LOC: 1384277563 1384278693 1384279520 1384278027 1384279083 1384265499 1384279672 1384273293
ERR: 0
MIS: 0
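As a cross-check of the table above, the following Python sketch ranks the numbered (device) IRQs by their count on one CPU. It is only an illustration: SAMPLE is an excerpt of the output above, parse_interrupts and device_irqs_on are hypothetical helper names, and the counters are cumulative since boot, so this shows the long-run distribution rather than the current rate.

```python
# Excerpt of the /proc/interrupts output above (whitespace-separated fields).
SAMPLE = """\
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 1385547152 1 0 0 0 80 5 57382 IO-APIC-edge timer
138: 24 0 85097 1927115 17543366 4371772 26364546 4915645 PCI-MSI eth3
154: 22 0 55073 1919698 9263542 111344311 28653821 119374902 PCI-MSI eth2
185: 2 1 0 2 1 336790701 12301729 3763055601 IO-APIC-level megasas, eth1, eth0
NMI: 7588535 7138711 7412871 7375055 7517698 8340865 8123444 8485641
ERR: 0
"""


def parse_interrupts(text):
    """Return (cpu_names, {label: (per_cpu_counts, description)})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()
    table = {}
    for line in lines[1:]:
        # Split on the first colon only; descriptions may contain colons too.
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:len(cpus)]:
            if not field.isdigit():
                break
            counts.append(int(field))
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return cpus, table


def device_irqs_on(text, cpu_name):
    """List (count, irq, description) for numbered IRQs on one CPU, largest first."""
    cpus, table = parse_interrupts(text)
    idx = cpus.index(cpu_name)
    rows = [(counts[idx], irq, desc)
            for irq, (counts, desc) in table.items()
            if irq.isdigit() and len(counts) > idx]
    return sorted(rows, reverse=True)


ranked = device_irqs_on(SAMPLE, "CPU7")
total = sum(count for count, _, _ in ranked)
for count, irq, desc in ranked:
    print(f"IRQ {irq:>3}: {count / total:6.1%} of CPU7 device interrupts ({desc})")
```

Rows such as NMI, LOC and ERR are skipped on purpose, since they are per-CPU or summary counters rather than device interrupt lines.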
The mask in /proc/irq/185/smp_affinity appears to pin IRQ 185 to CPU7 (the last word, 0x80, has only bit 7 set):
cat /proc/irq/185/smp_affinity
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000080
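For anyone checking the mask by hand, here is a small Python sketch of how such a value maps to CPU numbers. It assumes the usual layout of comma-separated 32-bit hex words with the most significant word first; cpus_from_affinity_mask is a name invented for this example.

```python
def cpus_from_affinity_mask(mask_text):
    """Return the sorted CPU indices whose bits are set in an smp_affinity mask."""
    # Dropping the commas leaves one big hex number, most significant word first.
    value = int(mask_text.strip().replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


mask = "00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000080"
print(cpus_from_affinity_mask(mask))  # [7]: only bit 7 is set, so only CPU7 is allowed
```

A mask of ff would come back as CPUs 0 through 7, which is a quick way to sanity-check the bit ordering.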
Can anyone offer assistance on the steps needed to determine whether this is actually a problem on these servers? The average load on these servers is typically about 3.5, so the servers seem to be running fine. These are Red Hat 5.6 servers.
Thanks,
Chris.