[Phys-l] web server fiasco



On 09/03/2010 04:35 PM, Larry Smith wrote:

> http://www.av8n.com/physics/uncertainty.htm#sec-quad-roots
>
> My browser is getting a 404 error on this URL.

My entire web site was offline (or worse) for more than 48 hours.
It's back now.

In case you're wondering what's worse than being offline: For
quite a while during the outage, it was serving _wrong pages_.

As you might imagine, I'm shopping for a new hosting service.
If anybody knows of one they can recommend, please contact
me off-list.

I cannot imagine how it could take a major hosting service
48 hours to recover from *any* type of outage, even if it
means firing up a brand-new spare system and restoring
everything from backup. That's an order of magnitude too
long.

I very rarely send email to the hosting service's "support"
desk ... but when I do, I expect somebody to READ it and
give a thoughtful reply. Recently I've received nothing
but platitudinous off-topic replies, and I'm fed up.



==================

By way of constructive advice:

RAID is wonderful for protecting against some types of disk
failure, including classic "disk crashes". However:
-- It doesn't protect against all failure modes. Examples
include fire, theft, operating-system errors, and malicious
software (viruses, Trojan horses, et cetera).
-- Therefore you still need a good backup system.
-- You _must_ monitor the health of the array, and promptly
replace any element that is sick or dead. Otherwise using
RAID is _worse than useless_.

If you have a non-RAID array of N disks and one of them fails,
you lose 1/N of your data (assuming the data is spread evenly
across the disks). If two of them fail, you lose 2/N of your
data.

If you are using something like RAID-5 and one of the disks
fails, you lose nothing, provided you replace that disk and
the array finishes rebuilding before anything else fails. If
a second disk fails before then, you lose _everything_.
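
To make the comparison concrete, here is a minimal sketch of the
arithmetic. The function name and its parameters are my own
invention for illustration, and it assumes the data is spread
evenly across the disks and that nothing gets replaced between
failures.

```python
def fraction_lost(n_disks, n_failed, raid5):
    """Fraction of the data lost when n_failed of n_disks have failed.

    Assumptions (illustrative only): data is spread evenly across
    the disks, and for RAID-5 no failed disk was replaced and
    rebuilt between failures.
    """
    if not 0 <= n_failed <= n_disks:
        raise ValueError("n_failed must be between 0 and n_disks")
    if raid5:
        # Single parity: one failure costs nothing, two or more cost everything.
        return 0.0 if n_failed <= 1 else 1.0
    # Independent disks: each failure takes only its own share of the data.
    return n_failed / n_disks


# Compare a 4-disk non-RAID set with a 4-disk RAID-5 array.
for failed in (1, 2):
    print(failed,
          fraction_lost(4, failed, raid5=False),
          fraction_lost(4, failed, raid5=True))
```

With four disks, the first failure costs a quarter of the data
without RAID and nothing with RAID-5. The second failure costs
half the data without RAID and all of it with RAID-5, which is
why an unmonitored array is the worse bet.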

It boggles the mind, but this week was not even the first
time that I've seen somebody operate a RAID array and either
not monitor it, or not respond quickly enough when one of
the elements was known to be failing.
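
Monitoring need not be elaborate. As a rough sketch, assuming
Linux software RAID (md), the kernel reports array status in
/proc/mdstat, where a line such as "[3/2] [U_U]" means three
members are expected, two are active, and the one marked "_" is
missing. The function name and the sample text below are made up
for illustration; this is not a substitute for a real alerting
setup.

```python
import re

# Start of an array stanza, e.g. "md0 : active raid5 sdc1[2] ..."
ARRAY_RE = re.compile(r"^(md\S*)\s*:")
# Member status field, e.g. "[3/2] [U_U]" (expected/active, then per-member flags)
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return {array_name: flags} for every array with a missing member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        m = ARRAY_RE.match(line)
        if m:
            current = m.group(1)
            continue
        s = STATUS_RE.search(line)
        if s and current is not None:
            expected, active, flags = int(s.group(1)), int(s.group(2)), s.group(3)
            if active < expected or "_" in flags:
                degraded[current] = flags
    return degraded


# Hypothetical sample in the style of /proc/mdstat, for illustration only.
SAMPLE = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 sdc1[2] sdb1[1] sda1[0]
      1953260544 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md1 : active raid5 sdf1[2] sde1[1](F) sdd1[0]
      1953260544 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))  # prints {'md1': 'U_U'}
```

On a real machine you would feed it the contents of /proc/mdstat
from a cron job and send mail whenever the result is non-empty.
The mdadm tool also has a monitor mode intended for this job.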

===

Western Digital makes some very nice disks. For some time
they have been selling "RAID-class" disks that are
considerably more expensive than almost-identical non-RAID
disks. It used to be possible to buy the cheaper disks
and flip one bit in the firmware configuration to make
them behave exactly the same as the expensive disks.
Recently, though, WD got wise to this, and the bit is
no longer flippable.

Many hardware RAID controllers will not tolerate the
non-RAID-class disks, because of timing issues. However,
the point of my story is that Linux's _software_ RAID
works great with the cheaper disks. This allows you
to save a significant amount of money.