
Re: [Phys-l] web server fiasco



Losing your hosting service is indeed a pain, but I have seen it take 24 to 48 hours to get our Bluffton University system back up, and it is not due to incompetence. I'm not the Bluffton University network manager, but I am his friend and next-door neighbor. I know some of the details, but not as much as he knows, so I might not get the story totally correct.

We use RAID-5 and we do monitor the condition of the drives, and we do replace a drive as soon as it begins looking unreliable. We also have backups, but I don't know how often the backup is performed.

We have had power failures in which we lost two of the RAID-5 drives even though there was no prior indication that either of them was becoming unreliable. We do have UPS devices on all the network servers, but when we lose one of the three phases of power feeding the university, it is common for things to be really screwed up for a while. All large three-phase motors are supposed to shut down when one phase goes, but some HVAC devices always seem to try to keep running for a while before they blow their thermal breakers. This allows the big motors (5 to 10 horsepower) to back-feed the missing phase with a voltage that is too low and fluctuating. This seems to play havoc with the UPS devices. We have also had the power restore, fail, restore, fail, etc. in fairly rapid succession before it either fails completely or stays on. A similar thing can happen when power is restored after a complete failure.

We believe the funny things that happen in the power cause failures of some computerized equipment even when that equipment is "protected" by UPS devices. If we have a nasty power failure that causes us to lose a couple of RAID-5 drives, we might lose one or more of the server computers as well. It can take 4 or 5 hours after power is restored just to figure out what is working and what is not. Then, if we did lose two RAID-5 disks, we have to restore from the backup, and that can take a long time (another 4 or 5 hours). Additional time is needed if some servers have been lost. Finally, once the system is up, the manager does some reliability testing before making things available to the public again.

To make matters worse, we have something like six or seven servers for the network, and several different RAID-5 systems. We generally stock two replacement drives, but once we lost three in the same power failure. I think two were on one server and the third was on a different server. Since we had only two replacements in stock, we had to have the third drive shipped overnight in order to get the system back up. That is an easy way to make it take over 24 hours to get something back up... not having sufficient replacement parts on hand. How many spare disk drives and spare server computers do you think an organization should have on hand? I don't know the answer to that, but I do know we have been down waiting for parts even though we have at least one spare of everything.

I'm sorry if I have misstated anything above. I get bits and pieces of what happened from my neighbor, but I am not the one directly involved.

Right now I am fighting a different battle. In my neighborhood many homes have Time Warner cable modems. We can pay a monthly fee for internet connectivity at 1.5 Mbps, 7 Mbps, or 15 Mbps. I am paying for 7 Mbps, just like many of my neighbors. Because Time Warner is overextended in our village, I can get 7 Mbps between 2:00 AM and 7:00 AM, but starting at about 7:00 AM the speed drops throughout the day until it bottoms out at about 0.8 Mbps around 8:00 PM. It then slowly climbs back to about 4 Mbps around midnight, and finally reaches the advertised speed by 2:00 AM. TW knows about the problem but still hasn't installed new equipment, even though this has been going on for several months. Worse, they are still charging us for 7 Mbps, and the bill states that if we don't pay any portion of the billed charges they will disconnect us, and we will have to pay a reconnect charge to get service back. This is not a nice company.


Michael D. Edmiston, Ph.D.
Professor of Chemistry and Physics
Bluffton University
1 University Drive
Bluffton, OH 45817
419.358.3270
edmiston@bluffton.edu