Whoa. This is astonishing. T-Mobile is telling all users of Sidekick mobile devices that all of their data has been lost, and is blaming a server failure at Microsoft’s Danger subsidiary. These devices automatically store their data in the cloud and restore it from the cloud after a power loss – at least, they used to. Read the full article here. Making the situation worse, resetting the device is often the first troubleshooting step when something goes wrong, and the service has been unavailable for over a week, so many devices were reset in the course of troubleshooting the outage.

This is unreal. I understand that small businesses don’t have the hardware or resources to test their disaster recovery plan (although they really should take the time to create one). This is different, though. This is big business. Their core business is keeping their customers’ data.

Even if all of their disk-based online replicas and backups failed, you would expect them to have some longer-term archives. We may never know the full circumstances behind this loss, but I wanted to take a moment to preach about a pet peeve: dependence on online backups.

I’ll start with the lowest level: RAID. I think we’re all on board with this but, just in case: RAID is not a backup strategy. Assume a hypothetical RAID that gives you 100% disk availability. Even then, RAID offers you no protection at all from application errors, viruses, user error, malicious users, or corruption.

Replication, online copies, snapshots, and many other online processes suffer from the same weaknesses if not properly planned. One of the worst-case scenarios is a rogue administrator. Careful plans need to account for the possibility of a rogue administrator trying to remove data. Compromised systems and virus outbreaks are similar situations.
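
To make that concrete, here is a toy Python sketch of why an online replica shares the fate of its primary. Everything in it is hypothetical and made up for illustration (the MirroredStore class, its method names, the "contacts" key); it is not a model of how Danger's systems actually worked, just the general failure mode.

```python
import copy


class MirroredStore:
    """Toy model (hypothetical): a primary with a synchronous online replica."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def write(self, key, value):
        # Every write is mirrored to the replica immediately.
        self.primary[key] = value
        self.replica[key] = value

    def delete(self, key):
        # Deletes are mirrored just as faithfully as writes.
        self.primary.pop(key, None)
        self.replica.pop(key, None)

    def offline_backup(self):
        # A point-in-time copy that later operations cannot reach.
        return copy.deepcopy(self.primary)


store = MirroredStore()
store.write("contacts", ["alice", "bob"])
backup = store.offline_backup()

# User error, an application bug, or a rogue administrator:
store.delete("contacts")

print("contacts" in store.primary)  # False
print("contacts" in store.replica)  # False
print(backup["contacts"])           # ['alice', 'bob']
```

The replica did its job perfectly, which is exactly the problem: it copied the mistake. Only the copy that sits outside the reach of the live system still has the data.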

Again, we don’t have a definitive answer yet about what happened at Danger, but it’s hard to imagine a scenario that off-site backups, as part of a comprehensive disaster recovery plan, would have failed to resolve.

Update 10/16/09 – Microsoft announced that most of the data was going to be recovered. Interesting. I really hope we get a case study out of this. I can only speculate that they are shelling out some big bucks to some data recovery consultants at this point. I hope we find out someday.