
Stupid Data Center Tricks

timothy posted more than 4 years ago | from the ejector-seat-toilet dept.

Networking 305

jcatcw writes "A university network is brought down when two network cables are plugged into the wrong hub. An employee is injured after an ill-timed entry into a data center. Overheated systems are shut down by a thermostat setting changed from Fahrenheit to Celsius. And, of course, Big Red Buttons. These are just a few of the data center disasters caused by human folly."

305 comments

bad article is bad (5, Insightful)

X0563511 (793323) | more than 4 years ago | (#33256488)

The summary reads like a digg post, and has two different links that, in actuality, link to the exact same thing.

This needs some fixin'.

Re:bad article is bad (0, Redundant)

X0563511 (793323) | more than 4 years ago | (#33256498)

Oh. And the summary text is, verbatim, the first part of the article. Wow, Timothy... this was just bad.

Re:bad article is bad (3, Funny)

Timex (11710) | more than 4 years ago | (#33256518)

the summary text is, verbatim, the first part of the article.

It is my personal observation that this seems to be the best way to get anything on the front page: using the article text as the "summary". Isn't it nice to see that Slashdot submitters are so original in their writing skill? :D

Re:bad article is bad (1, Insightful)

Anonymous Coward | more than 4 years ago | (#33256538)

For me, it's usually more due to time constraints and the belief that if I spend too much time composing a narrative, somebody else will submit it first.

Re:bad article is bad (2, Interesting)

OnlineAlias (828288) | more than 4 years ago | (#33256772)

The first one, at the IU school of medicine, I'm very familiar with that place...they have no data center to speak of, and I do not know that person. I never heard of that incident. Also, who doesn't run spanning tree with BPDU guard and other such protections? I know IU does, for a fact.

Something is very very wrong with that article.

Re:bad article is bad (1)

Cylix (55374) | more than 4 years ago | (#33256824)

They said hub so maybe it was from the 90s?

That was a big danger back in the day when running a lot of hubs and reserving switches closer to the core.

So either it was a limitation of funds that led to the problem or a limitation of intelligence.

Re:bad article is bad (2, Interesting)

commodore64_love (1445365) | more than 4 years ago | (#33256740)

But.....

I only got a 200 on my English SAT. I's got no writin' skills. That's why I became a computer geek instead.

Re:bad article is bad (1, Offtopic)

The Grim Reefer2 (1195989) | more than 4 years ago | (#33257246)

Isn't it nice to see that Slashdot submitters are so original in their writing skill? :D

I believe you meant to say, "Ain't it gr8 2 C th@ /. submitters R so 0Ri9IN4l @ writin' skillz?"

Re:bad article is bad (2, Informative)

dsoltesz (563978) | more than 4 years ago | (#33256752)

*yawn* That's because it was on digg [digg.com], posted in a nearly identical fashion, two days ago. Agreed. Bad article is bad. And now it's old.

Re:bad article is bad (1)

Mitchell314 (1576581) | more than 4 years ago | (#33256786)

. . . so that makes it perfect for /. !
:P

Re:bad article is bad (1)

kefler (938387) | more than 4 years ago | (#33257140)

Agreed. I just buried it.

Re:bad article is bad (1, Informative)

Anonymous Coward | more than 4 years ago | (#33256502)

Let me help out a bit

Printable Version [computerworld.com]

Re:bad article is bad (2, Interesting)

Anonymous Coward | more than 4 years ago | (#33256512)

I seem to remember in the early days of Telehouse London an engineer switched off power to the entire building. Only two routes out of the UK remained (one was a 256k satellite connection) that had their own back-up power.

Re:bad article is bad (5, Insightful)

macwhizkid (864124) | more than 4 years ago | (#33256716)

Article also needs fixin' in the lessons learned from the incidents described. Look, I'm sorry, but if your hospital network was inadvertently taken down by a "rogue wireless access point", the lesson to be learned isn't that "human errors account for more problems than technical errors" -- it's that your network design is fundamentally flawed.

Or the woman who backed up the office database, reinstalled SQL server, and backed up the new (empty) server on the same tape. Yeah, a new tape would have solved that problem. Or, you know, not being a mindless automaton. Reminds me of a quote one of my high school teachers was fond of: "Life is hard. But life is really hard if you're stupid."

Re:bad article is bad (1)

amorsen (7485) | more than 4 years ago | (#33256908)

Or the woman who backed up the office database, reinstalled SQL server, and backed up the new (empty) server on the same tape. Yeah, a new tape would have solved that problem. Or, you know, not being a mindless automaton.

It is not obvious to someone replacing the backup tape whether the backup is appended to the previous backup or replaces the previous one entirely. The former was not all that uncommon back when backup tapes had decent sizes. These days, when you need 4 tapes to back up a single drive, no one appends.

Of course there are tons of other things wrong with a one-tape backup schedule, but again she couldn't necessarily be expected to know about them.

Re:bad article is bad (2, Insightful)

macwhizkid (864124) | more than 4 years ago | (#33257000)

It is not obvious to someone replacing the backup tape whether the backup is appended to the previous backup or replaces the previous one entirely. The former was not all that uncommon back when backup tapes had decent sizes. These days, when you need 4 tapes to back up a single drive, no one appends.

Yeah, it's not clear from TFA whether she thought there was enough space, or was just clueless. Regardless, though, when you have mission critical data on a single drive you shut it down, put it in a fire safe until you're ready to restore, whatever. But you don't just casually keep using it. And who backs up a test database install anyway?

It's just interesting that the first story in the article was a technical problem (poor network design/admin) being blamed on user error (unauthorized wireless AP/network cable plugged into wrong switch), while the second story was procedural user error (do the backup every day, no matter what) being blamed on a technical problem (the backup system).

Re:bad article is bad (2, Insightful)

fishbowl (7759) | more than 4 years ago | (#33257036)

>These days, when you need 4 tapes to back up a single drive, no one appends.

These days with LTO-4, my biggest problem is having enough time to guarantee a daily backup.

Oh really? (1)

Iriscal (1563535) | more than 4 years ago | (#33256496)

So this is why Comcast has been stonewalling me with their excuses.

Network meltdown due to hub cross-connects (5, Interesting)

Florian Weimer (88405) | more than 4 years ago | (#33256508)

Can this really happen easily? I thought for really ugly things to happen, you need to have switches (without working STP, that is).

Re:Network meltdown due to hub cross-connects (1)

Lehk228 (705449) | more than 4 years ago | (#33256562)

a hub can also be a switch. I have worked with people who referred to both switches and repeaters as hubs

Re:Network meltdown due to hub cross-connects (4, Interesting)

ianalis (833346) | more than 4 years ago | (#33256718)

According to CCNA Sem 1, a hub is a multiport repeater that operates in layer 1. A switch is a multiport bridge that operates in layer 2. I thought these definitions are universally accepted and used, until I used non-Cisco devices. I now have to refer to L2 and L3 switches even if CCNA taught me that these are switches and routers, respectively.

Re:Network meltdown due to hub cross-connects (2, Interesting)

X0563511 (793323) | more than 4 years ago | (#33256792)

It's so irritating when you ask for a hub, and someone hands you a switch. Stores do the same thing. It's hard enough to find hubs, let alone find them when the categorization lumps them together.

No, I said hub. I don't want switching. I want bits coming in one port to come back out of all the others.

You can do that with a switch, but getting a switch that can do that is a bit more pricey than a real hub...

Re:Network meltdown due to hub cross-connects (2, Insightful)

coryking (104614) | more than 4 years ago | (#33256916)

Re:Network meltdown due to hub cross-connects (0, Offtopic)

coryking (104614) | more than 4 years ago | (#33256950)

Stupid slashdot botching my HTML...

Re:Network meltdown due to hub cross-connects (2, Interesting)

Mad Bad Rabbit (539142) | more than 4 years ago | (#33256974)

Cheap deep-packet inspection (using an old hub and Wireshark) ?

Re:Network meltdown due to hub cross-connects (3, Informative)

Pentium100 (1240090) | more than 4 years ago | (#33256566)

This should work quite OK with hubs. A hub, after all, sends the packet to every port except the one where it came from. So two hubs in a loop should just forward the same packet back and forth all the time.
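
The "back and forth" behaviour described above can be sketched with a toy model. This is a minimal illustration, not a faithful layer 1 simulation: it assumes exactly two hubs, ignores collisions and timing, and the function name `simulate_hub_loop` is made up for this example.

```python
def simulate_hub_loop(num_links, hops):
    """Count frames in flight between two hubs joined by `num_links` cables.

    Assumed, simplified model: a hub repeats a frame out of every port
    except the one it arrived on. Collisions and timing are ignored.
    """
    # One frame arrives on a host port of the first hub, so it is repeated
    # onto every inter-hub link.
    in_flight = num_links
    history = [in_flight]
    for _ in range(hops):
        # Each frame arriving on one link is repeated onto the other links.
        in_flight = in_flight * (num_links - 1)
        history.append(in_flight)
    return history


if __name__ == "__main__":
    for links in (1, 2, 3):
        print(links, "link(s):", simulate_hub_loop(links, 4))
```

In this model a single cable lets the frame die out after one hop, two cables keep the same frames circulating indefinitely, and a third cable makes the count grow on every hop.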

Re:Network meltdown due to hub cross-connects (4, Informative)

omglolbah (731566) | more than 4 years ago | (#33256584)

Oh yes, it works quite well for sabotaging a network.

It used to be a constant issue at LAN parties where "pranksters" would do it before going to sleep... Usually we never found them but when we did we flogged them with cat5 cables stripped of insulation :p

Re:Network meltdown due to hub cross-connects (1)

betterunixthanunix (980855) | more than 4 years ago | (#33256622)

I saw this happen at my high school once -- someone thought it would be funny to connect one port of an old switch to another port on that same switch. The entire network was flooded for a day while the IT staff tried to figure out where the switch was.

That was years ago though, I would have thought that by now, these issues had been resolved.

Re:Network meltdown due to hub cross-connects (1)

jimicus (737525) | more than 4 years ago | (#33256646)

It has in theory. Spanning tree should take care of it.

Though I have seen interop issues which prevent any traffic from going between two different vendors' STP-enabled switches.

Human error rate (1)

frisket (149522) | more than 4 years ago | (#33256788)

Human error rate is enormously variable [hawaii.edu], but for infrequently-occurring tasks (those you only do occasionally, not every day), a value of between 1% and 2% is a useful approximation.

I am fortunate in working in an organisation with perhaps the best and most competent ops manager I have ever worked with, but even with well-written procedures and well-trained ops staff, errors still occur — but very rarely.
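
To put the 1% to 2% figure in perspective, here is a rough sketch of how a small per-task error rate adds up over many tasks. It assumes each task fails independently with the same probability, which is a simplification of my own and not something the linked data claims; the function name is hypothetical.

```python
def prob_at_least_one_error(per_task_rate, num_tasks):
    """Chance that at least one of `num_tasks` independent tasks goes wrong."""
    # P(no errors at all) = (1 - p)^n, so P(at least one) = 1 - (1 - p)^n.
    return 1.0 - (1.0 - per_task_rate) ** num_tasks


if __name__ == "__main__":
    for rate in (0.01, 0.02):
        for tasks in (10, 50, 100):
            chance = prob_at_least_one_error(rate, tasks)
            print(f"rate={rate:.0%} tasks={tasks}: {chance:.1%}")
```

Under that independence assumption, even a 1% rate makes at least one mistake quite likely once a team has performed a few dozen occasional tasks.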

Re:Network meltdown due to hub cross-connects (1)

MoogMan (442253) | more than 4 years ago | (#33256856)

Reading TFA, it was almost certainly because STP wasn't set up correctly. For instance, if the switchport in question had bpduguard enabled then it would have become disabled as soon as the erroneous hub was added, resulting in a localised issue not a network-wide problem.

It's an issue that many Network Engineers learn the hard way exactly once and fix quickly by reviewing their STP configuration and in many cases, introduce QoS for sanity.

"We didn't do an official lessons learned [exercise] after this, it was just more of a 'don't do that again,'" says Bowers

Well, apart from that guy.

Re:Network meltdown due to hub cross-connects (0)

Anonymous Coward | more than 4 years ago | (#33256988)

BPDU guard would not help since hubs do not generate BPDUs as they are not bridges.

Re:Network meltdown due to hub cross-connects (2, Informative)

Shimbo (100005) | more than 4 years ago | (#33257034)

Can this really happen easily? I thought for really ugly things to happen, you need to have switches (without working STP, that is).

Spanning tree can not deal with the situation where there is a loop on a single port, which you can do easily by attaching a consumer grade switch. There are various workarounds (such as BPDU protection) but they aren't standard, and require manual configuration. Once your network gets big enough, you probably can't afford not to use them, though.

Router Plugged Into Itself (5, Funny)

Anonymous Coward | more than 4 years ago | (#33256514)

Where I work a couple years ago one of the non-technical people decided to plug a router into itself. Ended up bringing down the whole network for ~25 people in a company which depended on the Internet (Internet marketing company).

Unfortunately one of the tech guys figured it out literally as everyone was standing by the elevator waiting for it to take us home. We were that close to freedom :(

Re:Router Plugged Into Itself (1)

mister_dave (1613441) | more than 4 years ago | (#33256608)

Internet marketing company

Spam?

Re:Router Plugged Into Itself (1)

X0563511 (793323) | more than 4 years ago | (#33256800)

Nah, those guys just lease dedicated servers until they get an abuse takedown, then move on (or bitch and whine to squeeze that server for all it's worth)

Re:Router Plugged Into Itself (0)

Anonymous Coward | more than 4 years ago | (#33256794)

The Question is, why did a nono-person decide to plug the router to anywhere..

Don't try this at work... (2, Interesting)

alphatel (1450715) | more than 4 years ago | (#33256526)

  • Plug all the ethernet-like T1 cables into a switch
  • Change the administrator password and forget what you changed it to
  • Hang everything off a single power strip, no UPS
  • Buy expensive remote management cards but don't bother to configure them

Re:Don't try this at work... (3, Interesting)

v1 (525388) | more than 4 years ago | (#33256610)

- run thinnet lines along the floor under people's desks, for them to occasionally get kicked and aggravate loose crimps, taking entire banks of computers (in a different wing of the building) off the LAN with maddening irregularity

- plug a critical switch into one of the ups's "surge only" outlets

- install expensive new baytech RPMs on the servers at all remote locations, and forget to configure several of the servers to "power on after power failure".

- on the one local server you cannot remote manage, plug its inaccessible monitor into a wall outlet

honorable mention:

- junk the last service machine you have lying around that has a scsi card in it while you still have a few servers using scsi drives

Not using Cisco ACLs (3, Interesting)

Nimey (114278) | more than 4 years ago | (#33256556)

Our entire network was brought down a few years ago when a student plugged a consumer router into his dorm room's port. Said router provided DHCP, and having two conflicting DHCP servers on the network terminally confused everything that didn't use static IPs.

Took our networking guys hours to trace that one down.

Re:Not using Cisco ACLs (3, Insightful)

omglolbah (731566) | more than 4 years ago | (#33256574)

Amusingly anyone who ever worked as tech crew at a lan party knows that this is the first thing you look for... :p

Re:Not using Cisco ACLs (2, Interesting)

GuldKalle (1065310) | more than 4 years ago | (#33256606)

I had that error too, on a city-wide network. The solution? Get an IP from the offending router, go to its web interface, use the default password to get in, and disable DHCP.

Re:Not using Cisco ACLs (4, Insightful)

blair1q (305137) | more than 4 years ago | (#33256992)

Or unplug it.

The slow part is figuring out that that's the problem. The first time it happens to you.

Which is why it's good to have oldbies around, to whom lots of weird shit has happened.

Re:Not using Cisco ACLs (4, Informative)

jimicus (737525) | more than 4 years ago | (#33256656)

Hours?

You get something on the network which has an IP from the offending DHCP server, use ARP to establish what that DHCP server's MAC address is, then look up the switches' own tables to figure out which port that MAC is plugged into, switch that port off, and wait for the equipment owner to start complaining. Takes about 3-5 minutes to do by hand, and some switches can do it automatically.
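
The lookup chain described above (IP to MAC via ARP, then MAC to port via the switch's table) could be sketched roughly as follows. The tables here are plain dictionaries with made-up addresses and port names; collecting them from a real host and switch is left out, and `normalize_mac` and `find_rogue_port` are hypothetical helper names.

```python
HEX_DIGITS = "0123456789abcdef"


def normalize_mac(mac):
    """Strip separators so 00:1A:2B:3C:4D:5E and 001a.2b3c.4d5e compare equal."""
    return "".join(ch for ch in mac.lower() if ch in HEX_DIGITS)


def find_rogue_port(rogue_ip, arp_table, switch_mac_table):
    """Map the rogue DHCP server's IP to a MAC, then to a switch port."""
    mac = arp_table.get(rogue_ip)
    if mac is None:
        return None  # The IP has not been resolved via ARP yet.
    wanted = normalize_mac(mac)
    for learned_mac, port in switch_mac_table.items():
        if normalize_mac(learned_mac) == wanted:
            return port
    return None  # MAC not learned on this switch; try the next one upstream.


if __name__ == "__main__":
    # Illustrative data only.
    arp_table = {"192.168.1.1": "00:1A:2B:3C:4D:5E"}
    switch_mac_table = {"001a.2b3c.4d5e": "Gi0/12", "0011.2233.4455": "Gi0/3"}
    print(find_rogue_port("192.168.1.1", arp_table, switch_mac_table))
```

If the port turns out to be an uplink to another switch, the same lookup would be repeated there with that switch's table.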

Re:Not using Cisco ACLs (0)

Anonymous Coward | more than 4 years ago | (#33256754)

Uh, it took hours to figure out that was the problem genius.

Re:Not using Cisco ACLs (1)

X0563511 (793323) | more than 4 years ago | (#33256814)

Hmm. People seem to get an address from one of two subnets, randomly. I wonder what the problem could be!?

That, and people seem to be afraid of firing up the ol' packet sniffer... it would have been REALLY clear (immediately) what the problem is, should someone do that.

If you don't have (or don't know how to make) a passive tap, GTFO.

Re:Not using Cisco ACLs (3, Insightful)

Gumbercules!! (1158841) | more than 4 years ago | (#33256880)

I have to agree with this guy. As soon as IP addresses started being assigned incorrectly, the first thing I would be doing is checking the DHCP server. ipconfig /all on a Windows box (so maybe 3 seconds of typing) would give this answer.

More to the point, though - why was another DHCP allowed on the network? Can your switches not block or refuse to route DHCP traffic from the wrong host?? Otherwise every single student who brings in their own wifi box is going to shut down the network.

Re:Not using Cisco ACLs (1)

PrimordialSoup (1065284) | more than 4 years ago | (#33256872)

hours to figure out..in which room of the dorm it was plugged in !

Re:Not using Cisco ACLs (2, Informative)

eric2hill (33085) | more than 4 years ago | (#33256932)

Cisco switches have a wonderful feature called dhcp snooping.

ip dhcp snooping
Followed by
ip dhcp snooping trust
on your port that supplies DHCP to the network. This ensures that only the trusted port can hand out dhcp addresses, and as a bonus, the switch tells you which MAC has which IP.
show ip dhcp snooping binding
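
As a rough illustration of the idea behind those commands, here is a toy model of what a snooping switch does. It is only a sketch under my own simplifying assumptions, not how any vendor implements it: the class name, method name, and port names are all made up.

```python
# DHCP message types that only a server should ever send.
SERVER_MESSAGES = {"OFFER", "ACK", "NAK"}


class SnoopingSwitch:
    """Toy model: server replies are only accepted on trusted ports."""

    def __init__(self, trusted_ports):
        self.trusted_ports = set(trusted_ports)
        self.bindings = {}  # client MAC -> leased IP, learned from ACKs

    def handle(self, ingress_port, msg_type, client_mac, ip=None):
        """Return True if the message is forwarded, False if it is dropped."""
        if msg_type in SERVER_MESSAGES and ingress_port not in self.trusted_ports:
            return False  # Rogue server reply from an untrusted port.
        if msg_type == "ACK":
            self.bindings[client_mac] = ip  # Remember which MAC got which IP.
        return True


if __name__ == "__main__":
    switch = SnoopingSwitch(trusted_ports={"Gi0/1"})
    print(switch.handle("Gi0/7", "OFFER", "aa:aa", "192.168.1.50"))
    print(switch.handle("Gi0/1", "ACK", "aa:aa", "10.0.0.50"))
    print(switch.bindings)
```

Client messages such as DISCOVER pass through from any port in this model; only server-side replies are restricted to the trusted port.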

Re:Not using Cisco ACLs (1)

Nimey (114278) | more than 4 years ago | (#33257128)

*shrug* Most likely they'd never considered a "hostile" DHCP server on the network (lots of other things could have killed the network, so they thought), and had never seen what that looks like.

OTOH we can't pay very well, so we can't get top-notch talent.

Re:Not using Cisco ACLs (1)

jimicus (737525) | more than 4 years ago | (#33257198)

*shrug* Most likely they'd never considered a "hostile" DHCP server on the network (lots of other things could have killed the network, so they thought), and had never seen what that looks like.

OTOH we can't pay very well, so we can't get top-notch talent.

My employer develops router firmware. Our engineers are experts at finding odd ways to kill the network ;)

Re:Not using Cisco ACLs (1)

Darth_brooks (180756) | more than 4 years ago | (#33257210)

That just tells you what it's plugged in to. Doesn't necessarily tell you *where* it is, it just narrows it down. And if you can't disable that switch port remotely....hoo boy...and since it's in a dorm you have the risk of multiple patches in a single room or worse, someone smart enough to say "hey, this doesn't work in my room, lemme try my friend's room down the hall..."

Goes back to the old line "I've lost a server. Literally lost it. It's up, it responds to ping, I just can't *find* it."

Re:Not using Cisco ACLs (1)

TangoMargarine (1617195) | more than 4 years ago | (#33256798)

Your signature is particularly appropriate in this situation :-)

Re:Not using Cisco ACLs (2, Funny)

contrapunctus (907549) | more than 4 years ago | (#33256842)

I have done this error before :)

What surprised me was that the linksys router assigned IP numbers up through the uplink connection. I thought that was impossible, guess not.

Quad Graphics 2000 (5, Interesting)

Anonymous Coward | more than 4 years ago | (#33256570)

In the summer of 2000 I worked at Quad/Graphics (printer, at least at that time, of Time, Newsweek, Playboy, and several other big-name publications). I was on a team of interns inventorying the company's computer equipment -- scanning bar coded equipment, and giving bar codes to those odds and ends that managed to slip through the cracks in the previous years. (It's amazing what grew legs and walked from one plant to another 40 miles away without being noticed.)

One of my co-workers got curious about the unlabeled big red button in the server room. Because he lied about hitting it, the servers were down for a day and a half while a team tried to find out what wiring or environmental monitor fault caused the shutdown. That little stunt cost my co-worker his job and cost the company several million dollars in productivity. It slowed or stopped work at three plants in Wisconsin, one in New York, and one in Georgia.

The real pisser was the guilty party lying about it, thereby starting the wild goose chase. If he had been honest, or even claimed it was an accident, the servers would have all been up within the hour, and at most plants little or no productivity would have been lost.

The reality: a 20 year old's shame cost a company millions.

Re:Quad Graphics 2000 (3, Insightful)

Anonymous Coward | more than 4 years ago | (#33256636)

Why the fuck was the button unlabeled? That's the REAL MISTAKE.

Re:Quad Graphics 2000 (0)

Anonymous Coward | more than 4 years ago | (#33256946)

What label do you give to a big red button? All I can think of is "Big Red Button - Do Not Press."

How about... (1)

denzacar (181829) | more than 4 years ago | (#33257072)

ICBM Launch Control - Moscow, Leningrad, Novosibirsk
WAIT FOR PRESIDENT'S ORDERS BEFORE PRESSING

Deathly silence after someone does press the button should be adequate punishment.
Naturally, potential super-criminals, James Bond villains and right-leaning survivalist nationalist employees should have the button's real purpose explained to them, to avoid accidents caused by someone deciding to rid the world of communism during their lunch break.

Re:Quad Graphics 2000 (5, Funny)

FictionPimp (712802) | more than 4 years ago | (#33256660)

Well, where I work some maintenance genius decided that the location of the red button (near the entrance door) was too risky. They said people coming in the door could hit it while trying to turn on the lights.

Their solution? They moved it to behind the racks. So every time I bend down to move or check something I have to be conscious not to turn off the power to the entire room with my ass.

Re:Quad Graphics 2000 (1, Informative)

Anonymous Coward | more than 4 years ago | (#33256802)

Someone needs a Molly-guard

Re:Quad Graphics 2000 (3, Informative)

X0563511 (793323) | more than 4 years ago | (#33256830)

Hmm, if only someone could invent some kind of cover [wiktionary.org] to prevent accidental use...

I think a compounding issue is that the facilities guy (or higher up) is a cheapass.

Obligatory: The Etherkiller (2, Funny)

Anonymous Coward | more than 4 years ago | (#33256594)

The Etherkiller [fiftythree.org]

From TFA (1)

ep32g79 (538056) | more than 4 years ago | (#33256624)

Sure, technology causes its share of headaches, but human error accounts for roughly 70% of all data-center problems.

And 70% of all statistics are made up on the spot.

Re:From TFA (0)

Anonymous Coward | more than 4 years ago | (#33256676)

Sure, technology causes its share of headaches, but human error accounts for roughly 70% of all data-center problems.

And 70% of all statistics are made up on the spot.

According to the Uptime Institute, a New York-based research and consulting organization that focuses on data-center performance, human error causes roughly 70% of the problems that plague data centers today. The group analyzed 4,500 data-center incidents, including 400 full downtime events

70% of all Slashdot readers who make snarky comments don't read the actual article.

Video (5, Funny)

AnonymousClown (1788472) | more than 4 years ago | (#33256644)

Here's a video of a tech worker explaining why these things happen. [youtube.com]

It's very disturbing and you'll see why these things happen.

Re:Video FTW (2, Insightful)

dsoltesz (563978) | more than 4 years ago | (#33256776)

Thank you... you've single-handedly made spending my time on recycled, old digg news completely and totally worth it.

Our University... (1)

arhhook (995275) | more than 4 years ago | (#33256664)

Our University was brought to its knees when a student in the residence halls was putzing around and accidentally installed a DHCP server on his box. Because the effects were unknown to the student that installed the DHCP server, it took about a day before they knew what was going on and disabled his switchport on the network.

I got a good one too! (1)

debile (812761) | more than 4 years ago | (#33256674)

Someone plugged a home router into the network at the government office where I was doing consulting. (He wanted a switch to plug in a networked printer.)

The router started giving 192.168.x.x IP to everyone on the floor, soon including a few servers (including the Lotus Notes one)

Took 3 days for the admins to find out the source of the problem and where the router was... abysmal loss of productivity. Needless to say, I gave them a good speech on not routing 192.168 packets on the network and isolating their networks.

Re:I got a good one too! (1)

tagno25 (1518033) | more than 4 years ago | (#33256700)

Took 3 days for the admins to find out the source of the problem and where the router was... abysmal loss of productivity. Needless to say, I gave them a good speech on not routing 192.168 packets on the network and isolating their networks.

The biggest problem there is that the servers were getting their IP from a DHCP server.

Re:I got a good one too! (1)

Bengie (1121981) | more than 4 years ago | (#33256748)

or that the servers are not on their own vLAN with an ACL that doesn't block other vLAN's DHCP

Re:I got a good one too! (1)

ledow (319597) | more than 4 years ago | (#33256760)

And that the switches weren't blocking DHCP from anything but the authorised DHCP server, and that it took 3 days to track down a rogue DHCP server (not hard, you usually get the MAC address in seconds, trace that to a port, disconnect the port and see who shouts that their network connection isn't working - if it's a remote switch on the end of that port, go to that switch, rinse and repeat).

Hell, it would take less than an hour if you just pulled cables at random until that MAC disappeared.

Like most of the things in the story - incompetent admins and IT setups allow human error to be amplified. Seriously, one of them is basically a hospital network not using spanning-tree.

Re:I got a good one too! (4, Funny)

Yvan256 (722131) | more than 4 years ago | (#33256774)

192.168.x.x? That's amazing. I've got the same IPs on my luggage.

Don't forget accidentally triggering the Halon (1)

RogueWarrior65 (678876) | more than 4 years ago | (#33256738)

Way back in the day at the B.U. computer center, the machine room had an extensive Halon fire system with nozzles under the raised flooring and on the ceiling. Pretty big room that housed an IBM mainframe, about a half dozen tape drives, maybe 50 refrigerator-sized disk drives, racks and racks of magnetic tape, a laser printer the size of a small car, networking hardware, etc. etc. One day, the maintenance people were walking through and their two-way radios set off the secondary fire alarm. At that point, you had about 10 seconds to escape. Watching the security camera video afterward was highly entertaining. One moment you saw the operator standing in front of the consoles and the next you saw him bolting out of the double doors.

Re:Don't forget accidentally triggering the Halon (0)

Anonymous Coward | more than 4 years ago | (#33256996)

We've had that happen with our CO2 canisters. Somebody set off the smoke detectors and the guard on duty wasn't at his desk to cancel the 30 second countdown, so out went the power and in went all the CO2 gas. Took us 3 hours to cycle the air in the comp room and get everything rebooted again.

Re:Don't forget accidentally triggering the Halon (0)

Anonymous Coward | more than 4 years ago | (#33257086)

One time, we had a problem with two interns getting trapped in our serving room when one of them managed to trigger the alarm system. Doors locked, air pumped out, replaced by argon gas. Resetting the alarms wasn't a problem, and we managed to evacuate the bodies inside of some carpet rolls before anybody could ask too many difficult questions...

Re:Don't forget accidentally triggering the Halon (1)

Szechuan Vanilla (1363495) | more than 4 years ago | (#33257148)

Bah, amateurs: REAL computer operation personnel can breathe Halon...

Re:Don't forget accidentally triggering the Halon (1)

mrchilly0 (1809392) | more than 4 years ago | (#33257228)

We had a tech in one of our hubs go out for a smoke. He came back in, (this is where the details are sketchy) and he thinks he exhaled his last drag as he walked in the hub. An alarm sounded (later found out it wasn't even the smoke detector) and he hit the dump button. Video showed him tripping over cables as he ran to the OPPOSITE exit since that's the side he parked on. He is the only tech that I know in the company that was subjected to weekly drug tests for 3 months.

My favourite human error - a true story (5, Interesting)

Kupfernigk (1190345) | more than 4 years ago | (#33256782)

This was a server room at an (unnamed) UK PLC. The air conditioning had remote management, and the remote management notified the maintenance people that attention was needed. So someone was sent out, on a Friday afternoon.

When he arrived, most of the staff had gone home and the skeleton IT staff didn't want to hang around. So, they sent him away on the basis that his work wasn't "scheduled".

Everybody came back on Monday to find totally fried servers.

cascade failures (3, Interesting)

Velox_SwiftFox (57902) | more than 4 years ago | (#33256822)

How can this leave out the standard cascade failure scenario?

Trying to achieve redundancy, someone gets what they think is worst-case-30A of servers with multiple power supplies, plugs one power supply on each into one PDU rated 30A, one power supply into the other.

They may or may not know that the derated capacity of the circuit is only 24A. The data center is unlikely to warn them, as they only appear to be using 15A per circuit at most.

Anyway, something happens to one of the PDUs and the power is lost from it. Perhaps power factor corrections (remember the derating?) and cron jobs running at midnight on all the servers that raise the load high simultaneously. Maybe just the failure of one of the PDUs that was feared, causing the attempt at "redundancy".

In any case, all of the load is then put on the remaining circuit, and it always fails. The whole rack loses power.
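
The arithmetic behind that scenario can be sketched as below. The 80% derating factor is inferred from the 24A-on-a-30A-circuit figure above; the function name and the returned keys are made up for this example, and real sizing should follow the facility's own rules.

```python
def pdu_check(total_load_amps, breaker_rating_amps=30.0, derating=0.8):
    """Compare a dual-fed rack's load against one PDU's derated capacity."""
    usable = breaker_rating_amps * derating  # 30A * 0.8 = 24A in the story
    per_pdu_normal = total_load_amps / 2     # load split evenly across two PDUs
    return {
        "usable_amps": usable,
        "normal_per_pdu": per_pdu_normal,
        "ok_normal": per_pdu_normal <= usable,
        # After one PDU drops, the survivor carries the whole rack.
        "ok_after_failover": total_load_amps <= usable,
    }


if __name__ == "__main__":
    print(pdu_check(30.0))  # the rack from the story
    print(pdu_check(24.0))  # a rack sized so one PDU can carry it alone
```

The point of the sketch is that the redundancy only holds if the whole rack fits within a single circuit's derated capacity, not within the two circuits combined.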

Power strips (with on/off buttons) are bad (2, Funny)

gavving (1689168) | more than 4 years ago | (#33256860)

So I'm working in this company's datacenter on their networking equipment. But it's installed in such a crappy way that there's a floor tile pulled right next to the rack and the cables are run down into that hole. I'm working around on the equipment and step down into the hole by accident. At that point I notice that it's suddenly a lot quieter where I'm standing; I look down and realize I'd just stepped on the power button of a power strip that most of the networking equipment was plugged into. Oh Sh!t. At the time the room was empty except for me, so I quickly turn the strip back on. About the time the switches are just finishing coming back up, one of the company's IT guys comes in and asks if anything's going on. I look at him a little confused and say "I'm not sure, what's up?". The network's back up by the time they noticed it.... I probably should have admitted it, but no harm, no foul. :)

Re:Power strips (with on/off buttons) are bad (3, Insightful)

Velox_SwiftFox (57902) | more than 4 years ago | (#33256910)

Covering those power strip buttons with a hardened glob fixing them in the "on" position is what an electric glue gun is for.

Re:Power strips (with on/off buttons) are bad (0)

Anonymous Coward | more than 4 years ago | (#33257010)

Those are surge protectors. They are to stop fires if something is drawing too much power, or if lightning hits they save equipment.

Please do not disable them.

OT - Anyone know any LAN mapping software? (-1, Offtopic)

GuyFawkes (729054) | more than 4 years ago | (#33256892)

Just something to scan for example from 10.0.0.1 to 10.0.0.254 and produce a map.

Something like NetworkView, but hopefully free / open, ideally to run on Windows.
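Failing a ready-made tool, a crude sweep of that range can be sketched with nothing but the Python standard library. This is only a starting point under assumptions: the helper names are invented here, the TCP port probed (22) is an arbitrary choice, and a host that blocks that port will look dead.

```python
import ipaddress
import socket


def hosts_in_range(first, last):
    """List every IPv4 address from first to last, inclusive."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    return [str(ipaddress.IPv4Address(n)) for n in range(start, end + 1)]


def tcp_port_open(host, port=22, timeout=0.3):
    """Return True if a TCP connection to host:port succeeds (assumed probe)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


targets = hosts_in_range("10.0.0.1", "10.0.0.254")
print(len(targets), "addresses to probe")

# To actually sweep the network (slow, and only on a LAN you administer):
# alive = [h for h in targets if tcp_port_open(h)]
```

This gives a list of responding hosts rather than a drawn map, so it is no replacement for something like NetworkView.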

data centers 101 (4, Funny)

ei4anb (625481) | more than 4 years ago | (#33256938)

Those data centers in the article sound huge, some may even have up to ten servers!

Re:data centers 101 (1)

cowboy76Spain (815442) | more than 4 years ago | (#33257104)

Well, those are probably the ones with fewer people, and less experienced people, servicing them.... you can try to manage the first couple of servers with some "flexibility"; when you have hundreds of them everything must be done "by the book" or things definitely go wrong.

When I got to my current job, a couple of servers (our first rack servers) were installed, and nobody was "in charge" of them. Being a guy with initiative, I did the best that I could with them even though I only had experience in programming. The second funny thing I found was that, when one of the mirrored disks failed and I called for a spare, I gave back the good one. The first funny thing was that the backup set up by the people who installed the machines didn't really back up anything of importance (it was funny because we found out about it just after #2).

Oops... (1)

ReederDa (1874738) | more than 4 years ago | (#33256944)

You've got to admit, although the results were disastrous, someone will remember this and have a good laugh over it. I am now.

Electrical Contractors (1)

EmagGeek (574360) | more than 4 years ago | (#33256964)

I can definitely relate to that one. I've never had one that didn't try to deviate from plan to increase their profit on the job. I've even seen them put breakers in a panel that weren't connected to anything to make it appear as if they ran the circuit, when all they did was piggyback a circuit on another one to save the cost of running the wire. By the time you find the problem, they're long gone.

Gotta watch them like a hawk and make sure they do everything they're supposed to do.

Re:Electrical Contractors (1)

John Hasler (414242) | more than 4 years ago | (#33257262)

> By the time you find the problem, they're long gone.

That's why payment should not be authorized until the work has been inspected and signed off.

Mainframe days story (5, Interesting)

assemblerex (1275164) | more than 4 years ago | (#33256994)

The old tape machines (six foot tall) used to put out a tremendous amount of heat. Space is at a premium, so in the mainframe room the drives were normally put edge to edge, with one pushing air in and the other pulling air out. The machines had two 10-12" fans per unit, so stacking two or three units was fine. One site had so many machines side to side (over 7), the air coming out the last machine regularly set things on FIRE. It was not uncommon for the machine to ignite lint going through the stack, with it coming out the end as a small explosion like dust in a grain silo explosion. A fire extinguisher was kept on hand, and the wall eventually got a stainless steel panel because it was so common.

Re:Mainframe days story (1)

Idarubicin (579475) | more than 4 years ago | (#33257084)

One site had so many machines side to side (over 7), the air coming out the last machine regularly set things on FIRE. It was not uncommon for the machine to ignite lint going through the stack, with it coming out the end as a small explosion like dust in a grain silo explosion. A fire extinguisher was kept on hand, and the wall eventually got a stainless steel panel because it was so common.

I call BS.

Thermodynamics 101: If the air coming out of the last unit is hot enough to ignite things, then what is the minimum temperature of the stuff inside?

I can maybe believe that there was some sort of electrical fault inside that was infrequently arcing (maybe when a dust bunny passed through the fans?) and that might have caused the apparent problem. But there's no way to have functional electronics that are hot enough to ignite organic matter.

Don't forget the classics (0)

coryking (104614) | more than 4 years ago | (#33257014)

I've seen a network brought down when a student (or employee) plugged their toy windows 2000 server into the campus network. Said "server" was configured as a domain controller (or whatever they called it before active directory, it's been a while). Toss in DHCP and their box got DOS'd as the entire campus tried using them for authentication.

Good times. Can you even do that kind of thing these days?

FedEx, get insurance/ship your server (3, Interesting)

AnAdventurer (1548515) | more than 4 years ago | (#33257020)

When I was IT manager for a big retail mfg we had a cross-country move from the SF bay area to TN (closer to shipping hubs and lower tax rates). I was hired for the new plant, and I was there setting up everything (I did not know the company knew next to nothing about technology). The last thing shipped before the company shut down for the move was the data server, sent via 2-day FedEx. The CFO packed it up and shipped it out. As the driver pulled away from the bay, the server fell off the bumper and onto the cement. They picked it up (looking undamaged in its box). When I opened it there was a shower of parts. A hard drive had detached from the case but not the cable and had swung around in that case like a flail. The CFO had NOT INSURED the shipment or taken anything apart. That and much more to save $50 here and there.

DC close to water (0)

jalewis (85802) | more than 4 years ago | (#33257108)

I worked in a datacenter that was two blocks from the harbor. The datacenter is on the second floor, but what the hell do you do if you're in the building and there is a flood, or if you're at home and have to get to the DC? It reminds me of New Orleans, but that didn't stop them from building it.

Big Red Button Story (1)

trydk (930014) | more than 4 years ago | (#33257152)

Many years ago I worked at a mainframe installation (IBM S/360 to give you an idea of my age ;-). The computer was installed at the back of a huge room with plenty of space for expansion. For some incomprehensible reason BRBs (Big Red Buttons) were placed along the skirting board every ten feet or so, which had hitherto not been a problem -- with all the space nobody came near during daily (and nightly) operations.

Every morning at around two AM a guy came with a load of cassettes containing cheques from the banks for clearing. He usually just opened the door to the room and shoved each cassette in to slide, like curling stones, across the floor to the cheque sorter.

And one morning, well ... A cassette decided to slide all the way across the room and unerringly triggered one of the BRBs square on. Half a night's work to be redone.

Data center power (3, Interesting)

PPH (736903) | more than 4 years ago | (#33257156)

Back when I worked for Boeing, we had an "interesting" condition in our major Seattle area data center (the one built right on top of a major earthquake fault line). It seems that the contractors who had built the power system had cut a few corners and used a couple of incorrect bolts on lugs in some switchgear. The result of this was that, over time, poor connections could lead to high temperatures and electrical fires. So, plans were made to do maintenance work on the panels.

Initially, it was believed that the system, a dually redundant utility feed with diesel gen sets, UPS supplies and redundant circuits feeding each rack could be shut down in sections. So the repairs could be done on one part at a time, keeping critical systems running on the alternate circuits. No such luck. It seems that bolts were not the only thing contractors skimped upon. We had half of a dual power system. We had to shut down the entire server center (and the company) over an extended weekend*.

*Antics ensued here as well. The IT folks took months putting together a shut down/power up plan which considered numerous dependencies between systems. Everything had a scheduled time and everyone was supposed to check in with coordinators before touching anything. But on the shutdown day, the DNS folks came in early (there was a football game on TV they didn't want to miss) and pulled the plug on their stuff, effectively bringing everything else to a screeching halt.
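For what it's worth, the "numerous dependencies between systems" part is a textbook dependency-ordering problem. A minimal sketch, assuming a made-up dependency map (none of these service names come from the actual plan) and Python 3.9+ for `graphlib`:

```python
from graphlib import TopologicalSorter

# Hypothetical map: each service lists what it needs running first.
depends_on = {
    "dns": set(),
    "database": {"dns"},
    "app_server": {"dns", "database"},
    "web_frontend": {"app_server"},
}

# Power-up order: dependencies come before the things that need them.
power_up = list(TopologicalSorter(depends_on).static_order())

# Shutdown is the reverse: dependents go down first, DNS goes down last.
shutdown = list(reversed(power_up))

print("power up:", power_up)
print("shut down:", shutdown)
```

In any ordering like this, DNS ends up last on the shutdown list, which is exactly the step that got skipped. No script would have stopped anyone from pulling the plug early, of course.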

This Simply Demonstrates ... (1)

smpoole7 (1467717) | more than 4 years ago | (#33257242)

... that an idiot with his/her hand on a switch, a breaker or a power cord is more dangerous than even the worst computer bug.

(Judging from the houses that I see on my way to work each morning, some people shouldn't even be allowed to buy PAINT without supervision. And we provide them with computers and access to the Internet nowadays!)

(If that doesn't terrify you, you have nerves of steel.)

web guy vs sales dude (1)

nuonguy (264254) | more than 4 years ago | (#33257280)

http://www.youtube.com/watch?v=7wRxASytPuQ [youtube.com] is the most common reason servers go down. Come on, show of hands, how many of you have been a part of a scenario like this?

Ethernet routing loops FTL (1)

Just Brew It! (636086) | more than 4 years ago | (#33257282)

We've had multiple incidents nearly identical to one of the stupid tricks described in the article. One of our (former) techs had a habit of running two cables between the same pair of switches... or even plugging both ends of a single cable into the same switch! Needless to say, neither of these scenarios ends well.
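Both mistakes show up as a cycle if you write the cabling down as a list of links. A rough sketch of an offline check using union-find; the switch names and the link list are invented for illustration, and this is no substitute for running spanning tree on the switches themselves:

```python
def find_loop_links(links):
    """Return the links that close a loop, given (switch_a, switch_b) pairs."""
    parent = {}

    def find(x):
        # Follow parent pointers to the group's root, compressing the path.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    redundant = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # Both ends are already connected, so this cable forms a loop.
            redundant.append((a, b))
        else:
            parent[root_a] = root_b
    return redundant


# Hypothetical cabling: a doubled sw1-sw2 link and a cable looped into sw3.
links = [("sw1", "sw2"), ("sw2", "sw3"), ("sw1", "sw2"), ("sw3", "sw3")]
loops = find_loop_links(links)
print(loops)
```

The second sw1-sw2 cable and the sw3-to-sw3 cable are the ones flagged, matching the two habits described above.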