[Cialug] stuck load average and ntp?

Aaron Porter atporter at gmail.com
Fri May 15 10:56:04 CDT 2009


On Fri, May 15, 2009 at 8:47 AM, John Lengeling
<John.Lengeling at radisys.com> wrote:
> The load average isn't coming from any file...don't know why you are
> messing with wtmp...it is coming directly from the kernel/scheduler.

This I agree with.

> If you have a load average of 6 then you have 6 procs running.   See the
> second line of the top output "6 running", so you probably have runaway
> procs consuming all CPU.

This is bunk. Remember that load average is only tenuously linked to
CPU usage. What you're looking at is the number of processes in the
run queue (on Linux, plus those in uninterruptible sleep) averaged over
1/5/15 min. A busy CPU is only one of many reasons why a process in a
runnable state might be waiting. A quick look at the output in the
original message gives us "77.4%id, 21.9%wa" -- so the CPU(s? not full
top output) are 77.4% idle, but something(s) are doing a bit of IO
that's slow (busy disk, nfs, etc).
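
If you want the raw numbers without top in the way, here is a minimal
Python sketch that reads them straight from the kernel. It assumes a
Linux box with /proc mounted; the function name parse_loadavg is just
something I made up for illustration. The fourth field of /proc/loadavg
is "currently runnable / total" scheduling entities, which is a handy
sanity check against the averages.

```python
import os


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict.

    Expected layout (Linux): "1min 5min 15min runnable/total last_pid"
    """
    fields = text.split()
    one, five, fifteen = (float(x) for x in fields[:3])
    runnable, total = (int(x) for x in fields[3].split("/"))
    return {
        "1min": one,
        "5min": five,
        "15min": fifteen,
        "runnable": runnable,  # entities runnable right now
        "total": total,        # entities that exist right now
    }


# Only touch /proc when it is actually there (i.e. on Linux).
if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print(parse_loadavg(f.read()))
```

A load of 6 with only 1 entity runnable would suggest the rest of the
load is coming from processes blocked on IO, not from CPU hogs.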

> Kill the procs or reboot and your load average will go down.

While a reboot will indeed drive the load average down, I'd suggest
checking ps for processes in the "D" state (uninterruptible sleep) and
consider attaching to them with strace to see whether they're actually
doing some real work. And I do agree that ntpd not-from-cron is a
smart thing.
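
For the "D" state check, a rough Python sketch of what ps is doing
under the hood, assuming Linux and the usual /proc/<pid>/stat layout of
"pid (comm) state ...". The names parse_stat and d_state_processes are
hypothetical helpers, not anything from a library.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # ')' characters, so split on the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """List (pid, comm) for every process in uninterruptible sleep."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan, or unreadable entry
        if state == "D":
            found.append((pid, comm))
    return found


# Only scan when /proc is actually there (i.e. on Linux).
if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, comm in d_state_processes():
        print(pid, comm)
```

Any pid it prints is a candidate for "strace -p <pid>". A single
snapshot can catch a process that is only briefly in D, so run it a few
times and look for the ones that stay there.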

