Re: [Phys-l] web server fiasco



TWC = The Worst Cable company.....

Mark
________________________________________
From: phys-l-bounces@carnot.physics.buffalo.edu [phys-l-bounces@carnot.physics.buffalo.edu] On Behalf Of Michael Edmiston [edmiston@bluffton.edu]
Sent: Saturday, September 04, 2010 12:15 PM
To: Forum for Physics Educators
Subject: Re: [Phys-l] web server fiasco

Losing your hosting service is indeed a pain, but I have seen it take 24 to
48 hours to get our Bluffton University system back up, and it is not due to
incompetence. I'm not the Bluffton University network manager, but I am
his friend and next-door neighbor. I know some of the details, but not as
much as he knows, so I might not get the story totally correct.

We use RAID-5 and we do monitor the condition of the drives, and we do
replace a drive as soon as it begins looking unreliable. We also have
backups, but I don't know how often the backup is performed.

We have had power failures in which we have lost two of the RAID-5 drives
even though there was no prior indication that either of them was becoming
unreliable. We indeed have UPS devices on all the network servers, but when
we lose one of three phases of power feeding the university, it is common
for things to be really screwed up for a while. All large three-phase
motors are supposed to shut down when one phase goes, but some HVAC devices
always seem to try to keep running for a while before they blow their
thermal breakers. This allows the big motors (5 to 10 horsepower) to
back-feed the missing phase with some voltage that is too low, and varying.
This seems to play havoc with the UPS devices. We have also had power
restore, fail, restore, fail, etc. in fairly rapid succession before it
completely fails, or stays on. A similar thing can happen when power is
restored from a complete failure.

We believe the funny things that happen in the power cause failures of some
computerized equipment even when that equipment is "protected" by UPS
devices. If we have a nasty power failure that causes us to lose a couple of
RAID-5 drives, we might also lose one or more of the server computers. It
can take 4 or 5 hours after power is restored just to figure out what is
working and what is not. Then, if we indeed lost two RAID-5 disks, we have
to restore from the backup, and that can take a long time (another 4 or 5
hours). Additional time is needed if some servers
have been lost. Then, once the system is up, the manager does some
reliability testing before making things available to the public again.
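
A minimal sketch of why losing two drives forces a restore from backup, assuming a hypothetical four-drive array with three data blocks and one XOR parity block per stripe. The block contents and helper names are made up for illustration, and real RAID-5 rotates the parity block across the drives, which this sketch ignores.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-drive array: three data blocks plus parity.
data = [b"phys", b"-l a", b"rchv"]
parity = xor_blocks(data)
stripe = data + [parity]

def rebuild(stripe, lost):
    """Rebuild the single block at index `lost` from the surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost]
    return xor_blocks(survivors)

# One drive lost: the missing block comes back exactly.
assert rebuild(stripe, 1) == stripe[1]

# Two drives lost (indices 1 and 2): the survivors only reveal the XOR of
# the two missing blocks, not either block on its own.
combined = xor_blocks([stripe[0], stripe[3]])
assert combined == xor_blocks([stripe[1], stripe[2]])
```

With one block missing, the XOR of the rest reproduces it. With two missing, there is one equation and two unknowns, so the array cannot rebuild itself and the data has to come from the backup.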

To make matters worse, we have something like six or seven servers for the
network, and several different RAID-5 systems. We have generally stocked
two replacement drives, but once we lost three in the same power failure. I
think two were on one server and the third was on a different server. Since
we only had two replacements in stock, we had to have the third drive
shipped overnight in order to get the system back up. Not having sufficient
replacement parts on hand is an easy way to make it take over 24 hours to
get something back up. How many spare disk drives and spare
server computers do you think an organization should have on hand? I don't
know the answer to that, but I do know we have been down waiting for parts
even though we have at least one spare of everything.

I'm sorry if I have misstated anything above. I get bits and pieces of what
happened from my neighbor, but I am not the one directly involved.

Right now I am fighting a different battle. In my neighborhood we have many
homes with Time-Warner cable modems. We can pay a monthly fee for internet
connectivity at 1.5 Mbps or 7 Mbps or 15 Mbps. I am paying for 7 Mbps just
like many of my neighbors. Because Time Warner is overextended in our
village, I can get 7 Mbps between 2:00 AM and 7:00 AM, but starting about
7:00 AM the speed slows down throughout the day until it bottoms out at
about 0.8 Mbps around 8:00 PM and then slowly gets up to about 4 Mbps around
midnight, and finally to the advertised speed by 2:00 AM. TW knows about
the problem, but hasn't got new equipment installed yet even though this has
been going on for several months. Worse, they are still charging us for the
7 Mbps and the billing states that if we don't pay any portion of the billed
charges they will disconnect us, and we will have to pay a reconnect charge
to get service back. This is not a nice company.
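
To put those speeds in perspective, here is a rough sketch of the arithmetic, using the figures above. The 100-megabyte file size is an assumption chosen only for illustration, and the function name is hypothetical.

```python
ADVERTISED_MBPS = 7.0  # the tier being paid for, in megabits per second

# Approximate speeds reported above at different times of day.
measured_mbps = {
    "2:00 AM to 7:00 AM": 7.0,
    "around 8:00 PM": 0.8,
    "around midnight": 4.0,
}

FILE_MEGABYTES = 100  # assumed file size, for illustration only

def download_seconds(size_megabytes, speed_mbps):
    """Seconds to move a file of the given size; 1 byte is 8 bits."""
    return size_megabytes * 8 / speed_mbps

for label, mbps in measured_mbps.items():
    share = mbps / ADVERTISED_MBPS
    minutes = download_seconds(FILE_MEGABYTES, mbps) / 60
    print(f"{label}: {share:.0%} of advertised speed, "
          f"about {minutes:.1f} minutes for {FILE_MEGABYTES} MB")
```

The sketch ignores protocol overhead, so real transfers would be somewhat slower than these figures.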


Michael D. Edmiston, Ph.D.
Professor of Chemistry and Physics
Bluffton University
1 University Drive
Bluffton, OH 45817
419.358.3270
edmiston@bluffton.edu


_______________________________________________
Forum for Physics Educators
Phys-l@carnot.physics.buffalo.edu
https://carnot.physics.buffalo.edu/mailman/listinfo/phys-l