How DigitalOcean Cloud Services Crashed a Whole Business


What happened this week to Raisup shows that, alongside the convenience of cloud services, you need an experienced local partner that is available 24/7.

A great deal of traffic has passed through the network since cloud data centers first appeared, and cloud services are now a fact of life. Today, most modern products, from mobile applications to enterprise software, rely on cloud infrastructure. This is not just a trend: once a company builds a product or service on cloud services, it can enjoy unprecedented flexibility. Outages, crashes and other problems seem almost nonexistent, because the cloud is always there.

If there is a jump in demand, the “cloud” adapts itself by starting new servers to meet it. Once demand drops, the cloud automatically shuts down the additional servers to save costs. Some cloud services also offer protection from various attacks, not to mention the ease of creating additional development and testing environments.


The cloud is always there, but

Developers and executives have grown accustomed to the idea that the cloud is always there. But an incident involving a company that uses DigitalOcean, one of the largest and best-known cloud service providers, shows the danger of relying heavily on a single cloud provider.

Raisup is a small start-up in its initial stage. Although it is operated by two people, it has already acquired very large customers, some of them on the Fortune 500 list of the largest US companies by revenue. Customers use its service to find business collaborations, and the company used DigitalOcean to provide the infrastructure for its platform.

On the face of it, the choice of DigitalOcean is very logical: it is a large and well-known company that provides cloud computing infrastructure. The company invests heavily in its relationship with developers and has large developer communities that share information and guides on using its cloud products.

On May 29, the programmer behind Raisup, Nicolas Beauvais (@w3Nicolas), received an email from DigitalOcean stating that, in view of suspicious activity, his account had been locked effective immediately.

As he wrote on Twitter: “But 2 days ago, @DigitalOcean decided that it was probably malicious (spoiler alert: it’s not) and locked down our whole account.” This meant that the company could not access its droplets or backups, and all of its services were down.


What happened

A backup operation performed by one of the scripts running on the cloud infrastructure triggered one of DigitalOcean’s fraud-prevention mechanisms.

DigitalOcean disabled the account, which immediately took down all of the company’s services.

The terrified CTO sent countless emails and reached out to the company via social media. After 12 hours, DigitalOcean restored the account with an automated email announcing that everything was back to normal. But that was not the end of the story, because a few hours later the account was locked again. He contacted the company’s customer service repeatedly and documented those inquiries in a series of agonized tweets.

Meanwhile, none of the company’s services were working. In addition, the backups, which included customer information, were also stored on DigitalOcean’s servers, which prevented the CTO from switching to another provider.

DigitalOcean finally informed him that not only would it refuse to provide service to Raisup, it would also refuse to give access to the backups on its servers. That was a death blow to a company left without its infrastructure and without its data. Anyone who has posted a cake recipe and been blocked from Facebook knows how frustrating it is to send emails and explanations knowing that there is no one to read them. In this case, what was at stake was not a social networking account but a whole business, and a livelihood collapsing with it.

After the public outcry, DigitalOcean responded: “We hear and understand your concerns and apologize for how this was handled. We have restored the account and are doing a thorough investigation of this incident. We will post a public postmortem to provide full transparency for our customers and the community.”

Fortunately, Beauvais has several thousand followers on Twitter and is a developer with a presence in the open source community. His outcry caused a major shock and a strong response in the developer community, especially as developers had become used to cloud services being available all the time, or at least 99% of it.

The possibility that a cloud provider could abruptly and automatically cut off an existing company, with support offering only terse automated replies instead of a human response, shook quite a few developers and managers, who suddenly realized that their own companies might be at risk. The outcry quickly reached DigitalOcean, which promptly restored the account and promised a postmortem. For shaken customers, it is not certain that this was enough, and the storm is still continuing.


Lessons learned

Some of the discussions that developed around this case were interesting and instructive. Most respondents, many of them senior executives and managers at companies (some of them very large), agreed on a few important lessons:

LESSON 1 : Make sure that backups of the company’s information are kept elsewhere, so that one disabled account will not destroy the company. If Raisup had been making regular backups to another cloud service, a local server or another region (for data recovery), it would not have been in a situation where one arbitrary decision could shatter its business.

LESSON 2 : Do not run critical business services on a provider without a local partner, or on a provider that does not give you the contact information needed for escalations. This means contacts at a higher level than the automated customer service channel: a regional or national manager, and that person’s manager. A business that earns money and has growth aspirations must demand escalation contacts from its infrastructure suppliers, or have its services managed by a local partner that can escalate and respond immediately.

LESSON 3 : Programmers should not tie the code to the infrastructure. They should write it in such a way that moving the code to another availability zone (AZ), region or provider and deploying it there is relatively simple and does not require rewriting the product.
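Lessons 1 and 3 can be combined in a small sketch. The Python below is only an illustration under stated assumptions, not how Raisup or DigitalOcean work: `BackupTarget`, `LocalDirTarget` and `replicate_backup` are hypothetical names invented for this example, and local directories stand in for storage at independent providers. The idea is that the backup code talks to a small interface rather than to one provider’s API, and that a backup only counts as done once enough independent copies have been verified.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Protocol, Sequence


class BackupTarget(Protocol):
    """Hypothetical interface: any place that can store and return named blobs."""

    name: str

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class LocalDirTarget:
    """Stores backups in a directory (a stand-in for an off-provider location)."""

    def __init__(self, name: str, root: Path) -> None:
        self.name = name
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self.root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


def replicate_backup(
    key: str,
    data: bytes,
    targets: Sequence[BackupTarget],
    min_copies: int = 2,
) -> list[str]:
    """Write data to every target, read each copy back, and compare checksums.

    Returns the names of the targets holding a verified copy and raises
    RuntimeError if fewer than min_copies copies could be verified.
    """
    expected = hashlib.sha256(data).hexdigest()
    verified: list[str] = []
    for target in targets:
        try:
            target.put(key, data)
            if hashlib.sha256(target.get(key)).hexdigest() == expected:
                verified.append(target.name)
        except OSError:
            # One unreachable target must not stop the remaining copies.
            continue
    if len(verified) < min_copies:
        raise RuntimeError(f"only {len(verified)} verified copies of {key!r}")
    return verified


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        targets = [
            LocalDirTarget("primary", Path(tmp) / "primary"),
            LocalDirTarget("offsite", Path(tmp) / "offsite"),
        ]
        print(replicate_backup("customers.dump", b"example data", targets))
```

In a real setup, each target would wrap a different provider or an on-premises server, under separate accounts, so that one locked account cannot take every copy with it. Because the rest of the code depends only on the `put` and `get` methods, replacing one provider means writing one new target class rather than rewriting the product.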