A Brief History of Cloud Computing and Security
According to recent research, 50% of organizations use more than one public cloud infrastructure vendor, choosing between Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, and a host of others. Of those using more than one provider, 85% are managing up to four1, seeking the best fit for their applications and hedging against downtime and lock-in to any single provider. As with any trend in cloud adoption, the practice of Shadow IT also feeds this fragmentation. When looking at an evolution like this, it helps to first understand the historical precedents that led us to this point, to learn from the past and remind ourselves of the progress made by others that we now enjoy. Let’s take a brief trip through the history of cloud computing to see how we arrived at our multi-cloud reality.
The Space Age and Dreamers
Around the same time that John F. Kennedy inspired the United States with his decisive proclamation that “We choose to go to the moon!”, leaders in computer science were dreaming of a terrestrial future with similarly aspirational bounds. While working at the U.S. Pentagon’s Advanced Research Projects Agency (ARPA, now known as DARPA), then Director of the Information Processing Techniques Office J. C. R. Licklider wrote a memo to his colleagues describing a network of computers that spoke the same language and allowed data to be transmitted and worked on by programs “somewhere else”2. From his 1963 memo:
‘Consider the situation in which several different centers are netted together, each center being highly individualistic and having its own special language and its own special way of doing things. Is it not desirable, or even necessary for all the centers to agree upon some language or, at least, upon some conventions for asking such questions as “What language do you speak?”’3
‘The programs I have located throughout the system…I would like to bring them in as relocatable binary programs…either at “bring-in time” or at “run-time.”’ 3
“With a sophisticated network-control system, I would not decide whether to send the data and have them worked on by programs somewhere else, or bring in programs and have them work on my data. I have no great objection to making that decision, for a while at any rate, but, in principle, it seems better for the computer, or the network, somehow, to do that.”3
Here he is describing the precursors to the internet and to our now ubiquitous TCP/IP protocols, which allow a myriad of connected devices to speak with each other and with the cloud. His prediction of bringing in programs at “run-time” is all too familiar today in our browser-based access to cloud applications, as is his foresight that the physical location of those programs would not matter, leaving it to a computer, or the network, to decide how to allocate resources.
Shared resources also sparked concern for Licklider:
“I do not want to use material from a file that is in the process of being changed by someone else. There may be, in our mutual activities, something approximately analogous to military security classification. If so, how will we handle it?” 3
While we have since solved the challenge of collaborative editing in cloud applications, he was pointing to an issue that would eventually become of paramount importance to the information security community: how to handle sensitive data held in a location you do not physically own.
J. C. R. Licklider’s predictions quickly became reality, and further efforts at ARPA resulted in the first iteration of the internet, ARPANET. His influence on the development of the internet and cloud computing is undeniable, and the title of his memo quoted above, “Memorandum For Members and Affiliates of the Intergalactic Computer Network”, aspires to greatness beyond what many thought possible.
Virtual (Computing) Reality
In parallel with the efforts of ARPA and its many university collaborators to connect computing devices together, IBM was developing a way to make its large “mainframe” computers more cost efficient for customers. In 1972 it released the VM/370 operating system, an early milestone in virtualized computing.4 From the 1972 program announcement:
VM/370 is a multi-access time-shared system with two major elements:
- The Control Program (CP) which provides an environment where multiple concurrent virtual machines can run different operating systems, such as OS, OS/VS, DOS and DOS/VS, in time-shared mode.
- The Conversational Monitor System (CMS) which provides a general-purpose, time-sharing capability.
Running multiple operating systems on one mainframe through the Control Program, akin to today’s concept of a hypervisor, dramatically expanded the value customers could gain from these systems and set the stage for virtualizing data center servers in the years to come. Time-sharing through CMS gave users the ability to log in and interact with individual VMs, a concept still used today in virtualization software and any time you log in to access a cloud service.
Through the ’80s and ’90s, the rise of personal computers drew much attention away from the development of mainframe and early datacenter computing environments. Then in 1998, VMware filed a patent for a “Virtualization system including a virtual machine monitor for a computer with a segmented architecture”5 which was “particularly well-adapted for virtualizing computers in which the hardware processor has an Intel x86 architecture”5, and began selling its technology a year later. While others entered the virtualization space around the same time, VMware quickly took the lead by focusing on the difficult task of virtualizing the widely used x86 architecture, expanding the value of many existing datacenter infrastructure investments.
Cloud computing would likely not exist without the resource efficiency of virtualization. Commercial offerings like Amazon Web Services (AWS), Microsoft Azure, and others achieve their economies of scale through virtualized infrastructure, making high-end computing affordable (and sometimes free) for just about anyone.
With no ties to specific hardware, the abstraction from physical location that Licklider predicted begins to meet reality. Applications can exist anywhere, be accessed from anywhere, and be moved as needed, allowing cloud operators to update underlying hardware without downtime for the services running on it. Abstraction from physical location also means that virtualized software and infrastructure can exist far from you – and from your country. Cross-border data regulation remains a developing issue, with the E.U.’s General Data Protection Regulation (GDPR) having arguably the broadest reach to date.
Everything Over the Internet
For an enterprise organization running a datacenter in the late ’90s, starting to virtualize infrastructure made clear economic sense. In retrospect, it also created an excellent business model: commercial vendors could build out virtualized infrastructure and offer software to others, who would pay less up front for access than it would cost to host and maintain it themselves. Salesforce.com jumped on this opportunity early, taking on the likes of Oracle and SAP in the CRM market in 1999.
In 2003, engineer Benjamin Black proposed a new infrastructure for Amazon.com that was “…completely standardized, completely automated, and relied extensively on web services for things like storage”6, also mentioning “the possibility of selling virtual servers as a service.”6 CEO Jeff Bezos took notice, and later reflected:
“…we were spending too much time on fine-grained coordination between our network engineering groups and our applications programming groups. Basically, what we decided to do is build a [set of APIs] between those two layers so that you could just do coarse-grained coordination between those two groups.”7
“On the surface, superficially, [cloud computing] appears to be very different [from our retailing business].” “But the fact is we’ve been running a web-scale application for a long time, and we needed to build this set of infrastructure web services just to be able to manage our own internal house.”8
That infrastructure build-out eventually turned into a commercial service in 2006, with the launch of Elastic Compute Cloud (EC2) from Amazon Web Services (AWS). From their 2006 announcement:
“Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud…designed to make web-scale computing easier for developers. Amazon EC2 changes the economics of computing by allowing you to pay only for capacity that you actually use.”9
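To make “pay only for capacity that you actually use” concrete, here is a minimal sketch using the modern boto3 Python SDK (which post-dates the 2006 announcement); the AMI ID and instance type are placeholders. Compute becomes an API call: launch an instance when you need it, terminate it when you are done, and billing follows.

```python
# A minimal sketch of on-demand compute with the AWS SDK for Python (boto3).
# The AMI ID and instance type are placeholders; boto3 post-dates the 2006
# EC2 announcement, but the pay-for-what-you-use model it exposes is the same.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a single instance on demand; you pay only while it runs.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder machine image
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print(f"Launched {instance_id}")

# ...do the work that needed the capacity...

# Terminate when finished, so the meter stops.
ec2.terminate_instances(InstanceIds=[instance_id])
```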
The early success of Salesforce.com, Amazon, and several others proved the economic model of delivering services over the internet, firmly cementing cloud computing as a viable way to consume software and computing infrastructure. Rapid growth in cloud services resulted in vast amounts of data landing in the hands of organizations that did not own it – and couldn’t practically be held liable for how it was accessed. Cloud Access Security Brokers (CASBs) were first proposed in 2012, offering visibility into where cloud data is located, protection for it within services, and access controls. While CASB is a logical countermeasure to cloud data loss and compliance risk, many IT organizations are still in the early stages of evaluating it.
Enter the Multi-Cloud
With the release of Microsoft Azure in 2010 and Google Cloud Platform in 2011, attractive alternatives to AWS entered the market and spurred experimentation. Competition was inevitable, but it also created a scenario in which choosing just one provider was no longer necessary, or even beneficial. Linux provider Red Hat puts it well:
“You might find the perfect cloud solution for 1 aspect of your enterprise—a proprietary cloud fine-tuned for hosting a proprietary app, an affordable cloud perfect for archiving public records, a cloud that scales broadly for hosting systems with highly variable use rates—but no single cloud can do everything. (Or, rather, no single cloud can do everything well.)”
Fault tolerance can also come into play, with a handful of major cloud outages proving that redundancy across multiple cloud providers can be a sound enterprise strategy. The most pertinent question to arise from this trend, however, is how to manage it all. Manually configuring multiple cloud environments is naturally a time-consuming effort. To speed deployment, the concept of infrastructure-as-code (alternatively, “programmable infrastructure”) was developed, evolving the nature of cloud computing once again. Author Chris Riley describes the concept:
“…instead of manually configuring infrastructure you can write scripts to do it. But not just scripts, you can actually fully incorporate the configuration in your application’s code.”
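As a deliberately simplified illustration of that idea, the sketch below declares the desired infrastructure as data that can live in the application’s repository and “applies” it across providers. The resource definitions and the provision/apply functions are hypothetical stand-ins, not the API of Puppet, Chef, Ansible, or any other tool; the point is that the same definition, checked into code, produces the same environment every time it is applied.

```python
# A hypothetical infrastructure-as-code sketch: desired state is data,
# and an "apply" step reconciles each provider against it. The provider
# calls here are illustrative stubs, not any real tool's API.

DESIRED_INFRASTRUCTURE = {
    "aws": [
        {"type": "vm", "name": "web-1", "size": "small"},
        {"type": "bucket", "name": "static-assets"},
    ],
    "azure": [
        # cross-cloud redundancy declared in the same file
        {"type": "vm", "name": "web-dr-1", "size": "small"},
    ],
}


def provision(provider: str, resource: dict) -> None:
    """Stand-in for a provider driver; a real tool would call the cloud API here."""
    print(f"[{provider}] ensuring {resource['type']} '{resource['name']}' exists")


def apply(desired: dict) -> None:
    """Walk the declared state and make each provider match it."""
    for provider, resources in desired.items():
        for resource in resources:
            provision(provider, resource)


if __name__ == "__main__":
    apply(DESIRED_INFRASTRUCTURE)  # same definition, every environment, every run
```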
Commercial vendors like Puppet Labs, Chef, and Ansible have built technology on this premise, allowing automated deployment across multiple cloud providers. For security, the challenge of fragmentation is similar, and so are the solutions. Data and applications need to be protected from external and internal threats, as well as from misconfiguration. AWS, Azure, and Google all publish well-documented shared responsibility models that divide security duties between the provider and the customer.
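To ground the customer’s half of that split, the hedged sketch below uses the boto3 SDK to block public access on an S3 bucket, one of the more common misconfigurations. The bucket name is a placeholder, and the call shown is just one small example of the configuration work that remains the customer’s responsibility regardless of provider.

```python
# The customer's side of the shared responsibility model: the provider secures
# the underlying infrastructure, but how you configure your own resources is up
# to you. Blocking public access on an S3 bucket guards against a common
# misconfiguration. The bucket name below is a placeholder.
import boto3

s3 = boto3.client("s3")

s3.put_public_access_block(
    Bucket="example-sensitive-data-bucket",  # placeholder bucket name
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)
print("Public access blocked at the bucket level")
```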