[DM-MUG] Fwd: Re: [MacLaw] Sidekick Customer Data Lost in the Clouds

Tue Oct 13 16:59:58 CDT 2009

I just love that some of the words are censored, and some are not.

This certainly has more info than I have heard before. It also is the  
type of thing we SAN admins fear. It takes three days to transfer my  
data, and that is only my tiny 9 TB SAN (also with a single LTO-3 tape  
drive, which slows things down.)
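As a rough sanity check on those numbers, here is a back-of-the-envelope
sketch in Python. It assumes the published LTO-3 native (uncompressed) rate
of 80 MB/s and decimal terabytes. The function names are made up for
illustration, and real throughput depends on compression, file sizes, and
whether the drive can be kept streaming.

```python
# Back-of-the-envelope tape transfer times (illustrative sketch only).
TB = 10**12               # decimal terabyte, in bytes
MB = 10**6                # decimal megabyte, in bytes
LTO3_NATIVE_MB_S = 80     # LTO-3 native (uncompressed) rate, MB/s


def transfer_hours(size_tb, rate_mb_s, drives=1):
    """Hours to move size_tb terabytes at rate_mb_s per drive."""
    seconds = (size_tb * TB) / (rate_mb_s * MB * drives)
    return seconds / 3600


def effective_rate_mb_s(size_tb, hours):
    """Average MB/s implied by moving size_tb terabytes in the given hours."""
    return (size_tb * TB) / (hours * 3600) / MB


# Best case: 9 TB through a single LTO-3 drive at full native speed.
print(f"{transfer_hours(9, LTO3_NATIVE_MB_S):.2f} hours")

# What a three-day (72 hour) transfer of 9 TB implies on average.
print(f"{effective_rate_mb_s(9, 72):.1f} MB/s")
```

At native speed, 9 TB through one drive works out to about 31 hours, so a
three-day transfer implies an average of roughly 35 MB/s, which is
plausible once tape changes and a drive that is not always streaming are
factored in.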

I'm quite surprised. All I've heard is that MS is saying the data is
gone. I realize that it is better to under-promise and over-deliver,
but I would have figured they'd under-promise a little less,
considering they have people believing their data is _gone_.

As for taking out the parity: that's what would keep me awake, if I  
didn't have backups. Hardware failures can cause other hardware  
failures. Many hardware failures spell trouble. SANs are a _lot_ of
hardware. Mine is 24 hard drives in two RAID arrays, with a Fibre
Channel switch, an Ethernet switch, and three Xserves with two hard
drives apiece. And that's a little SAN.
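To make the parity worry concrete, here is a minimal sketch of
single-parity (RAID 5 style) protection in Python. It is a simplified
model, not a description of how the EMC array in question works
internally: the "drives" are just short byte strings, and xor_blocks is a
hypothetical helper written for this example.

```python
# Simplified single-parity model: parity is the XOR of all data blocks.
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy "data drives" and the parity block computed from them.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Lose drive 1: XOR of the surviving data blocks and the parity rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])

# Lose the parity as well: the survivors alone no longer determine the
# missing block, so there is nothing left to rebuild from.
print(xor_blocks([data[0], data[2]]) == data[1])
```

With the parity intact, one lost member can be rebuilt from the rest.
Once the parity is gone too, the array has no redundancy left and any
further loss has to come back from backups.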

As for not telling the client that things died for four days: that is
unconscionable. They should have had a second device on order within  
ten minutes, so when they did tell MS, they could say that it was  
already on order and would be here within X number of hours.

Screwing up this badly _should_ put EMC out of business. I would  
cancel any contracts with them, if I had them. Anyone want to buy the  
Absolut building?

On Oct 13, 2009, at 4:25 PM, Victoria L. Herring wrote:

>> Here's what my source inside the situation is saying. Pretty sure it's
>> not *entirely* MS' fault, even if they're getting the majority of the
>> blame:
>> Danger, purchased by Microsoft, was moved into a Verizon Business
>> datacenter in Kent, WA a short while ago. While this had to do with the
>> MS assimilation, it was done as a one-for-one move from Danger to a DC
>> that MS uses heavily. (MS didn't re-write, port, migrate to winblows,
>> etc.) The backend service uses a variety of hardware: load balancers,
>> firewalls, web and application servers, and an EMC SAN (Storage Area
>> Network, think huge drive array connected with fiber).
>>
>> Well, last Tuesday, the EMC SAN took a dump on itself. What I mean by
>> that is the backplane let the magic blue smoke out. Usually, in the
>> heavy iron class of datacenter products like an EMC SAN, this means you
>> fail over to the redundant backplane and life continues on. Not this
>> time, folks. In the process of dying, it took out the parity drives.
>> What does that mean? It means the fancy RAID lost its ability to
>> actually be a RAID. How much data got eaten by this mega-oops? 800TB.
>> Why wasn't it backed up? It was, to offsite tape, like it's supposed to
>> be. But when the array is toast, you can't just start copying shit back.
>>
>> Apparently EMC has been on site since Tuesday, but didn't actually
>> inform Danger/MS that their data is in the crapper until Friday
>> afternoon. On top of that, EMC has done nothing to bring in replacement
>> equipment between Tuesday and Friday. (In the Enterprise support world,
>> that's fucking retarded; multi-million dollar support contracts are
>> that expensive for a reason.)
>>
>> So what's being done? Well, the good news is that the complex was
>> slated to be migrated into the Verizon Business cloud services (not
>> MS's cloud per se, but it's MS's effort). And as a part of that
>> migration a newer, shinier SAN array was in the process of being
>> implemented. But space isn't ready for it on the datacenter floor, and
>> you can't just toss the EMC RAID and place this one in its place: it's
>> a different vendor and is 2 racks instead of one. This means it's being
>> shoehorned into a different part of the datacenter than was originally
>> planned, one that doesn't have the necessary 3-phase power installed.
>> So there's a bit of work to be done. Not to mention the restoral of
>> 800TB of backup data from offsite tape.
>>
>> Time to restoral? Looking like Wednesday at the earliest, with techs
>> working all weekend.
>>
>> Lessons to be learned?
>> *Buy a f'n phone that doesn't store its address book and your personal
>> data somewhere else, and one you personally can back up yourself.
>>
>> *Don't expect EMC to actually respond to fixing your core business
>> application in any reasonable amount of time. They've gotten lazy;
>> consider other vendors.
>>
>> *Just because your phone says T-Mobile on it, and T-Mobile is crediting
>> you a month's service, doesn't mean they fucked up.
>>
>> *Just because Microsoft is involved, doesn't mean MS f**ked up.
>>
>> *And lastly, it's not always a "server" that f**ks up.
>>
>
> -- 
> Victoria L. Herring, Des Moines, Iowa. Blogs:
> http://blog.JourneyZing.com  [photography];
> http://www.herringlaw.com  [civilrights/discrimination];
> http://victorialherring.typepad.com/serendipity/  [personal].