[Cialug] Guarantee SSH availability

Nicolai nicolai-cialug at chocolatine.org
Tue Jun 28 12:48:40 CDT 2011


On Tue, Jun 28, 2011 at 12:06:17PM -0500, Kenneth Younger wrote:

> I know this isn't HUGE traffic, but it was getting 3000-4000 pageviews
> an hour at peak, which is the most traffic I've ever seen on a site I
> run.

That is roughly one per second (0.8 to 1.1); impressive, but not
demanding for today's hardware.  For static pages, a 200MHz Pentium Pro
would be more than enough.
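
The arithmetic behind that figure, as a quick sketch (Python is only my
choice for illustration; the function name is made up):

```python
# Convert the quoted peak pageviews per hour into pageviews per second.
SECONDS_PER_HOUR = 3600


def per_second(pageviews_per_hour):
    """Average rate per second for a given hourly count."""
    return pageviews_per_hour / SECONDS_PER_HOUR


low = per_second(3000)
high = per_second(4000)
print(f"{low:.2f} to {high:.2f} pageviews per second")
```

This is an average over the hour; short bursts within the hour could be
higher.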

> How would I tell if it ran out of memory, or the CPU was just fully pegged, or
> the disk started thrashing?

The logs may give an indication.  Check first for anything that is
unexpected, and make careful note of timestamps.

Also, does cron mail you the output of its work?  To give an example, I
have a script that runs once per hour, and if it can't connect to a
remote server (or fails for another reason), cron sends a mail saying so
with a helpful error message.  You can use these bits of information to
help draw a picture of the state of the machine during the time in
question.  Even an absence of logs can be informative, particularly if
some data is logged but not everything you would expect.
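
A minimal sketch of that kind of hourly check, assuming a plain TCP
connection is a good enough test.  This is not my actual script; the
host, port, and function name are hypothetical.  It prints only on
failure, because cron mails whatever a job outputs:

```python
#!/usr/bin/env python3
"""Connectivity check meant to be run from cron (illustrative sketch)."""
import socket
import sys


def check_connection(host, port, timeout=10):
    """Return (True, "") on success or (False, reason) on failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:
        # Covers refused connections, timeouts, and DNS failures.
        return False, f"cannot connect to {host}:{port}: {exc}"


if __name__ == "__main__":
    # Targets are given as host:port arguments; with none, nothing runs.
    for target in sys.argv[1:]:
        host, _, port = target.rpartition(":")
        ok, reason = check_connection(host, int(port))
        if not ok:
            # Stay silent on success so that mail arrives only on failure.
            print(reason, file=sys.stderr)
```

A crontab entry along these lines would run it at the top of every hour
(the path and host are placeholders):

    0 * * * * /usr/local/bin/check_remote.py remote.example.org:22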

Do you have bandwidth logs/graphs?  They can sometimes be highly
informative.

And since SSH was unavailable, did you go to the machine in person?
If so, what was printed on the screen/console?

Start with the logs.  Write down anything pertinent (by hand) that
suggests a daemon was either working correctly or not, going down the
list of all packages running at the time.  Try to correlate the logs.
For example, if Apache logs show CGI activity that would create
users/$user.html, verify that said file exists and is properly written.
Use whatever information is available to see if the software on your
machine functioned as intended.
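
The file check can be scripted once you have a list of what the logs say
should exist.  A rough sketch; the file names below are hypothetical
stand-ins for the users/$user.html files, and "properly written" is
reduced here to "exists and is not empty":

```python
import os


def missing_or_empty(paths):
    """Return the paths that are not regular files or have zero length."""
    problems = []
    for path in paths:
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            problems.append(path)
    return problems


# Hypothetical names; build the real list from the Apache logs.
expected = ["users/alice.html", "users/bob.html"]
for path in missing_or_empty(expected):
    print(f"expected but missing or empty: {path}")
```

A non-empty file can still be truncated or corrupt, so open a few by
hand as well.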

Using the logs, draw a timeline of symptoms.  That will bring you closer
to determining what happened.
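
If doing it by hand gets tedious, the merge can be sketched in a few
lines.  This assumes classic syslog timestamps ("Jun 28 12:06:17") and
Apache access-log timestamps ("[28/Jun/2011:12:06:17 -0500]"), both in
the machine's local time; the function names and sample lines are made
up for illustration:

```python
import re
from datetime import datetime

# Two common timestamp layouts; other daemons may need their own patterns.
SYSLOG_RE = re.compile(r"^([A-Z][a-z]{2}) +(\d{1,2}) (\d{2}:\d{2}:\d{2})")
APACHE_RE = re.compile(
    r"\[(\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]"
)


def parse_timestamp(line, year):
    """Return a naive datetime for a log line, or None if unrecognized.

    Classic syslog lines carry no year, so the caller supplies one.
    The Apache UTC offset is ignored (assumption: same local time zone).
    """
    m = APACHE_RE.search(line)
    if m:
        return datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S")
    m = SYSLOG_RE.match(line)
    if m:
        month, day, clock = m.groups()
        return datetime.strptime(
            f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
        )
    return None


def build_timeline(sources, year):
    """Merge {label: [lines]} into (time, label, line) tuples sorted by time."""
    events = []
    for label, lines in sources.items():
        for line in lines:
            when = parse_timestamp(line, year)
            if when is not None:
                events.append((when, label, line.rstrip("\n")))
    events.sort(key=lambda event: event[0])
    return events


# Made-up sample lines, only to show the shape of the input.
sources = {
    "apache": ['192.0.2.10 - - [28/Jun/2011:12:06:17 -0500] "GET / HTTP/1.1" 200 512'],
    "syslog": ["Jun 28 12:05:59 web1 exampled[1234]: example entry"],
}
for when, label, line in build_timeline(sources, year=2011):
    print(when.isoformat(sep=" "), label, line)
```

Lines without a recognizable timestamp are dropped, so look for gaps in
the merged output and go back to the raw files to see what was there.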

Interested in hearing more,

Nicolai

