Monday, April 25, 2011

April 21, the day the cloud was out!

Some blogs percolate for a while, waiting for a good day to be put to paper. For this one - about the cost of reliable cloud - that day was last Thursday, the day a network mishap caused an outage at Amazon Web Services that took down more than 100 customers, many of them cloud providers themselves, while at the same time a similar mishap at Sony's PlayStation network stopped about 70 million gamers from connecting.

A lot has been written about the outage, so I won't repeat that here (if you want to catch up, I suggest you read TechCrunch for the PlayStation story, GigaOM for an overview of the Amazon issue, eWeek for some interesting analysis and the DotCloud blog for a very readable explanation of day-to-day use of a public cloud service like Amazon).
Instead, let's focus on the big picture of the cost of reliable cloud, comparing it - again - with the move from mainframe to distributed computing.  History tends to repeat itself, especially in IT where generations of technology tend to be heavily siloed, and staffed with different generations of people who often do not even sit at the same table in the canteen.

Somewhere during the 1980s, IT pros started to realize they could get the same processing power for significantly less money by selecting distributed servers - until then used mainly for scientific work - instead of traditional mainframes. Soon after, companies began porting existing applications to the new platforms, focusing initially on applications that were more compute than I/O or data intensive (sound familiar?). And indeed, initially the new departmental platform, not requiring all the resource-intensive water cooling, air-conditioning or a raised floor, did seem a lot more cost effective. But it didn't take long after the initial proofs-of-concept to find that for some applications we needed more processors, clusters, uninterruptible power supplies and other redundant features. On the storage front, we started to use the same - not so cheap - storage solutions as we did on the mainframe. And shortly thereafter, the distributed boxes started to look and cost about the same as the systems they were replacing. Let's not forget that at the same time, smart mainframe developers - pushed by the competition from distributed systems - found ways to abandon water cooling, leverage off-the-shelf standard components like NICs and RISC processors, and even re-examined their licensing cost for specific (Linux) workloads.

What does this all have to do with cloud computing? Here too, we see that running a certain "compute intensive" workload can be done much faster and cheaper in the new environment. But when replacing these one-off batch jobs with services that have higher availability and reliability needs, the picture changes. We need to have redundant copies and failover machines in the data center, and in many cases a backup data center in another part of the country, preferably located on an alternate power grid and connected to multiple network backbone providers. All of a sudden it sounds a lot like the typical set-up a bank would have - and likely with a similar cost profile. So far cloud seems to offer cost benefits, but will this cost advantage still exist if we need to replicate our whole cloud setup at a second vendor, in the worst case doubling the cost? Now cloud has many other advantages beyond cost (elasticity, scalability, ubiquitous access, pay per use, etc.), so many specific use cases are still ideally suited for the cloud, but if a certain application has no need for these, then it may be worth (re-)considering the business case for a move to the cloud.

Regarding redundancy, the cloud business case actually has two opposing vectors. On one hand, the fact that as a user you have less control (you can't fly there and 'kick' the server) leads to requiring a secondary backup installation. On the other hand, the cloud - with its pay-as-you-go model - offers much more efficient ways to arrange a backup configuration for running your applications than finding your own second location and filling it with shiny new kit.
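To put some rough numbers on these two vectors, here is a back-of-the-envelope sketch in Python. All rates and hours are made up for illustration, and the function and parameter names are my own; real pricing will look different.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def redundancy_costs(primary_rate, standby_rate, failover_hours):
    """Return the yearly cost of three setups, in the currency of the rates.

    primary_rate   -- hypothetical hourly price of the full production setup
    standby_rate   -- hypothetical hourly price of a scaled-down standby
    failover_hours -- hours per year the standby runs at full production size
    """
    single = primary_rate * HOURS_PER_YEAR
    # Worst case: the whole setup replicated at a second vendor.
    full_duplicate = 2 * single
    # Pay-as-you-go: a small standby all year, scaled up only during failover.
    scale_up = (primary_rate - standby_rate) * failover_hours
    pay_as_you_go = single + standby_rate * HOURS_PER_YEAR + scale_up
    return {
        "single provider": single,
        "full duplicate": full_duplicate,
        "pay-as-you-go standby": pay_as_you_go,
    }


# Assumed figures: 10 per hour for production, 1 per hour for the standby,
# and two days per year running the standby at full size.
costs = redundancy_costs(primary_rate=10, standby_rate=1, failover_hours=48)
for option, yearly in costs.items():
    print(f"{option}: {yearly}")
```

With these assumed figures the standby adds about 10 percent to the yearly bill rather than 100 percent, which is the efficiency argument in a nutshell.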

At the same time you have to take into account the likelihood of the cloud having capacity available at the moment you need it. If your own data center fails, I am sure you could find another cloud provider with some capacity. But in a case like this, where one of the largest (cloud) data centers in North America had issues, all customers could, in principle, be looking to move their workloads elsewhere, and it is not certain that enough excess capacity would be available at alternate providers.

In this specific case there seem to have been technical issues inside Amazon's and Sony's data centers that caused a series of events impacting the services running within, but what if there had been a major physical problem, such as a large fire or accident? With more mission-critical applications moving to the cloud, companies need to make contingency planning a top priority.

As a result of last week's incidents, enterprises should take stock and assess which services are truly vital to their customers and/or to their own continuity. Organizations (and the world) seem to be more resilient than you might expect. In the Netherlands we saw telcos lose mobile service in a large part of the country for periods of up to half a day, and ISPs unable to offer Internet services for as long as a week in certain regions. And yet, these companies did not go under. Each industry, government and organization will have to assess its own priorities, success criteria and standards. As I mentioned in a past blog post, aiming to continue your on-demand video services after a major flood, hurricane or nuclear disaster may be over-shooting what is necessary under those circumstances - survival will be the only concern in a case like that.

Looking at the reactions in the market so far, the responses of the vendors impacted by the Amazon mishap seem to be a lot more benign than the reactions of impacted PlayStation gamers. Maybe because these vendors feel they have a good thing going and the last thing they want to do is kill it with too much honesty. What is refreshing is that not too many vendors crawled out of the woodwork saying "private cloud, private cloud, private cloud" - not even my colleagues who recently published a book on the topic. Good! Nobody loves "I told you so" types and there's no point in kicking your opponent when he is down. But the case for private cloud did get a bit better, or is it just me thinking that?

Thursday, April 21, 2011

5 Simple Rules for Creating a Cloud Strategy

Over the last 18 months, the media, technologists, analysts, CIOs and CEOs have all been talking about the cloud. Now, most organizations are embarking on some form of cloud computing, but as always, technology is the relatively easy part. For those of you wanting to keep your feet firmly on the ground, and looking to set your direction and strategy, here are some simple guidelines to help you along the way.

CCL image courtesy of deleted.scenes - http://www.flickr.com/photos/dephineprieur/3571763921/sizes/s/in/photostream/
1. Start your cloud strategy (and any cloud project) with an exit strategy
Now this may seem like contradictory advice, but there is a greater risk of vendor lock-in with cloud computing than there is with traditional on-premises software. I'm sure you are aware of the downsides of vendor monopolies, such as high cost, low responsiveness and inflexible vendor business practices, which result in vendor lock-in and prohibitive switching costs.
A second important reason to avoid vendor lock-in to any cloud computing service is that, unlike today, where software is purchased and then used to deliver services to customers, in the cloud era organizations are directly dependent on external providers to deliver those services. The impact of a breakdown or contract termination of an as-a-service delivery is much more immediate to the business; it is therefore imperative to have a plan B.
Ideally you will architect your cloud services and contracts in such a way that you can move to an alternative (Plan B) within a reasonable timeframe. The definition of reasonable depends on the type of business and may vary between six months and six seconds. Standards, although currently just emerging, will play a crucial role, and I recommend that you consider any implementations that are not based on such standards as temporary. Meanwhile the automation capabilities of vendor-neutral management tools can help enable such exit strategies in a cost-effective manner.
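For the technically inclined, one way to picture such an architecture is a thin, vendor-neutral interface that your applications code against, with one adapter per provider behind it. The Python sketch below is only an illustration: `ObjectStore`, `InMemoryStore` and `migrate` are hypothetical names, and the in-memory class merely stands in for a real provider adapter.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ObjectStore(ABC):
    """Hypothetical vendor-neutral interface the application codes against."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def keys(self) -> List[str]: ...


class InMemoryStore(ObjectStore):
    """Stand-in for a real provider adapter; keeps everything in a dict."""

    def __init__(self, provider: str) -> None:
        self.provider = provider
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def keys(self) -> List[str]:
        return sorted(self._objects)


def migrate(source: ObjectStore, target: ObjectStore) -> int:
    """Copy every object to the Plan B provider and return how many moved."""
    moved = 0
    for key in source.keys():
        target.put(key, source.get(key))
        moved += 1
    return moved


plan_a = InMemoryStore("provider-a")
plan_b = InMemoryStore("provider-b")
plan_a.put("invoices/2011-04.csv", b"id,amount\n1,100\n")
plan_a.put("orders/2011-04.csv", b"id,item\n1,widget\n")
moved = migrate(plan_a, plan_b)
print(f"moved {moved} objects from {plan_a.provider} to {plan_b.provider}")
```

Because the application only ever sees `ObjectStore`, switching to Plan B becomes a matter of writing one more adapter and running the migration, rather than rewriting the application.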


2. Design IT as a supply chain
Although deceptively simple, this analogy will help to change both the way the IT organization thinks and acts and how other departments perceive IT. Supply chain thinking allows you to both buy and build; it allows you to make all your decisions in the context of what your company needs in order to deliver services to its customers. It allows you to look not only at what your IT department owns in-house but also at all the services you source externally. A good rule of thumb in setting your IT supply chain strategy is the lean mantra: Only do what adds value to your customers and remove any steps/activities/processes that don't add value or aren't legally required.
A supply chain approach means dynamically balancing resources (both internal and external) against rapidly changing goals and constraints towards an optimal end result. This is a very different game from traditional IT, where IT operations avoided changing anything that was not broken to "keep the lights on" in the most reliable and stable manner possible. Just as in a car factory, where the product mix changes constantly as new products are introduced and others are phased out, the supply chain will enable IT to switch services on and off as and when required.

3. Use a portfolio approach for deciding WHAT to move to the cloud
A good cloud strategy is as much about WHAT to do as about HOW to do it. A portfolio approach helps you identify which services could be moved to the cloud and deliver cost savings and agility to the business, versus the more business critical services which - at this time - are less desirable to move to the cloud.
A portfolio approach needs to begin with the strategic goals of your organization in mind. These goals may vary from becoming customer focused to launching products in emerging markets or reducing cost within the business.
Once the strategic goals are identified, these need to be matched to business or market constraints, for example legislation, resources, geography or finance. The final step would be to map the goals and constraints to existing IT services and your cloud-based opportunities.
From this exercise you will produce a roadmap of what you need to do, how you allocate human, financial and technical resources to deliver the most critical services, and how you monitor progress against your plan.
IT portfolio planning is not a one-off exercise and not something that can be implemented overnight. It will constantly evolve as the business evolves and as IT gains maturity and experience in consuming and deploying cloud services. It is a good starting point for determining your "low hanging cloud fruit", which could quickly add considerable value in terms of benefits or cost savings, and those services which are too business critical to be put into the cloud.
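As a toy illustration of the mapping exercise described above, the Python sketch below filters and ranks a handful of services. The 1-to-5 scores, the criticality threshold and the example entries are invented for the purpose; in practice they would come out of your own goals and constraints.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Service:
    name: str
    expected_benefit: int        # 1 (low) to 5 (high), scored against strategic goals
    business_criticality: int    # 1 (low) to 5 (high)
    blocked_by_constraint: bool  # e.g. legislation or geography rules it out


def low_hanging_fruit(services: List[Service], max_criticality: int = 3) -> List[Service]:
    """Return services that can move now, highest expected benefit first."""
    eligible = [
        s for s in services
        if not s.blocked_by_constraint and s.business_criticality <= max_criticality
    ]
    return sorted(eligible, key=lambda s: s.expected_benefit, reverse=True)


# Hypothetical portfolio with made-up scores.
portfolio = [
    Service("online training", 4, 2, False),
    Service("video conferencing", 5, 2, False),
    Service("payment processing", 5, 5, False),
    Service("customer records", 3, 3, True),
]
candidates = [s.name for s in low_hanging_fruit(portfolio)]
print(candidates)
```

Re-running such a ranking whenever goals, constraints or scores change fits the point that portfolio planning is not a one-off exercise.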


4. Make service costing a core competency
If cloud computing is going to achieve one of the goals it is heralded for, then it needs to remove unnecessary cost - not just move CapEx to OpEx.
Regardless of whether a service is completely rendered in house, composed from various sourced components or procured completely as a service, it is essential to understand the exact cost characteristics and the impact on the overall cost of doing business.
IT needs to be able to determine the cost of each service delivered rather than just the cost of individual IT functions. For example, services may include payment processing, issuing tickets, sending invoices, creating an order, facilitating video conference calls and delivering online training courses.
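A minimal way to get from IT function costs to per-service costs is to allocate each function's cost by usage share. The Python sketch below assumes you already know, per cost component, what percentage each service consumes; the components, percentages and amounts are invented for illustration.

```python
from typing import Dict


def cost_per_service(
    component_costs: Dict[str, float],
    usage_percent: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Allocate each component's cost to services by their usage percentage."""
    totals = {}
    for service, shares in usage_percent.items():
        totals[service] = sum(
            component_costs[component] * percent / 100
            for component, percent in shares.items()
        )
    return totals


# Hypothetical monthly cost per IT function.
component_costs = {"servers": 1200, "storage": 600, "network": 200}

# Hypothetical share of each function consumed by each service, in percent.
usage_percent = {
    "payment processing": {"servers": 50, "storage": 25, "network": 50},
    "sending invoices": {"servers": 50, "storage": 75, "network": 50},
}

monthly = cost_per_service(component_costs, usage_percent)
for service, cost in monthly.items():
    print(f"{service}: {cost}")
```

The same function total can hide very different service costs, which is exactly why the per-service view matters.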
In electronics manufacturing the heads of production are not interested in how much the company spends in total on plastic versus aluminum or copper; they want to know whether they can offer their new product at a competitive price compared to their competitors. In the same way, IT needs to prepare itself for such discussions: can it deliver services at an optimal cost, and if not, can it suggest a viable alternative?
Cost reduction is not the only reason for, or benefit of, adopting cloud computing. Increased accountability, agility and reliability are a few other areas which cloud computing can really impact positively. So whilst increasing transparency in the cost of IT services will give you insight into areas to optimize from a cost perspective, you may need to balance this view with potentially needing to increase investment in the short term for greater business agility.


5. Treat security as a service
Security, or to be precise "fear of a lack of security", is consistently cited as the biggest barrier to cloud computing. However, a recent study conducted by Management Insight and sponsored by CA Technologies showed that 80 percent of mid-sized and large enterprise organizations have implemented at least one cloud service, with nearly half saying they have implemented more than six cloud services. This adoption is happening even though 68 percent of respondents cite security as a barrier to cloud adoption. These survey results indicate that for now cost and speed of deployment are leading reasons for cloud adoption and are strong enough to offset the perceived risks associated with deploying cloud services. This may be sufficient for now, but as increasingly sensitive data and applications are selected to migrate to the cloud, organizations will quickly reach an impasse.
The security challenge with the "supply chain" model is an organization's ability to dynamically control access of a variety of users across a changing portfolio of applications running internally and externally from multiple suppliers.
The cloud computing model demands a rethink about how security is approached - making it an enabler instead of an inhibitor. The challenge is not denying access to everyone, but letting the right people access the data they need to have access to (and no more than that), and actively controlling the usage to prevent any out-of-the-ordinary activities.
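In code, those two halves (grant only what is needed, and watch for unusual usage) could look something like the Python sketch below. The users, permission strings and the "three times the usual level" rule are all assumptions made for the example, not a recommendation for a specific product or threshold.

```python
from typing import Dict, List, Set

# Hypothetical entitlements: each user gets only the permissions listed here.
ENTITLEMENTS: Dict[str, Set[str]] = {
    "alice": {"invoices:read", "invoices:write"},
    "bob": {"invoices:read"},
}


def is_allowed(user: str, permission: str) -> bool:
    """Grant only what is explicitly listed; unknown users get nothing."""
    return permission in ENTITLEMENTS.get(user, set())


def unusual_users(
    today: Dict[str, int], baseline: Dict[str, int], factor: int = 3
) -> List[str]:
    """Flag users whose access count exceeds `factor` times their usual level.

    Users without a baseline are flagged as soon as they show any activity.
    """
    return sorted(
        user for user, count in today.items()
        if count > factor * baseline.get(user, 0)
    )


# Made-up access counts for one day versus each user's usual daily level.
flagged = unusual_users(
    today={"alice": 40, "bob": 500},
    baseline={"alice": 30, "bob": 20},
)
print(is_allowed("bob", "invoices:write"), flagged)
```

The point of the sketch is the shape of the policy: access is an explicit allow-list per user, and monitoring is a separate, ongoing check rather than a one-time gate.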

Now having shared some advice and suggested first steps to start planning, I'd like to add one last thought.
The appropriate speed for deploying cloud computing depends on the culture and current state of your organization, and realistically you aren't going to be in the cloud tomorrow. The US Federal government issued a directive called "Cloud First." This is basically a type of "comply or explain" policy, which states that cloud options should be evaluated first (comply) unless there are significant reasons not to do so (in which case departments should explain). Pushing the accelerator that hard might not be everyone's cup of tea, but whatever you do: don't rush in, but also make sure you do not get left behind!