The future of SmartCloud Enterprise+ aka Cloud Managed Services

Because of the recent SoftLayer acquisition, SmartCloud Enterprise (SCE) was sunset on January 31, 2014. There were just too many overlaps between the two offerings, and SoftLayer seemed to me the more mature platform with more functionality. So the fact that SCE was stopped (and functionality like SCAS merged into SoftLayer) was not much of a surprise.

But what does this mean for SCE+ – or, what should it mean for it?

First of all, it means a name change. As announced on this year’s Pulse (IBM’s cloud conference), SCE+ will be rebranded to Cloud Managed Services (CMS).

Second, the good news: CMS/SCE+ will stay, with a strong roadmap at least until 2017 (well, the roadmap is specified until 2017, so it is very likely that CMS/SCE+ will stay even beyond that – but who knows what happens in IT in the next 5 years 🙂).

But why two offerings anyway?

In essence, SCE+ and SoftLayer are positioned the same way SCE+ and SCE were originally positioned:

  • SCE+ for cloud enabled workloads
  • SoftLayer for cloud native workloads

[Figure: Cloud native vs. cloud enabled workloads]

To understand this positioning a little better, let's discuss the current capabilities of each offering and the currently planned roadmap items:

SoftLayer

SoftLayer provides a highly flexible IaaS platform for cloud centric workloads. The underlying infrastructure is highly standardized and gives the client full control over everything above the hypervisor, including the operating system. Even if the client subscribes to one of the offered management options, this mainly means that SoftLayer performs limited management tasks on a best-effort basis, without real SLAs, and the client maintains full admin access to its instances.

The platform provides high flexibility, so all kinds of setups can be implemented (by clients), but the responsibility for a given setup remains with the client, not with SoftLayer. In a nutshell, SoftLayer provides an IaaS environment with a very high degree of freedom and control for clients, without taking over responsibility for anything above the hypervisor.

These capabilities fit well for cloud centric or self-managed development (DevOps) workloads, but less so for traditional highly available workloads like SAP.

Cloud Managed Services (formerly known as SmartCloud Enterprise+)

CMS, on the other hand, was designed and built to meet exactly the requirements of highly available, managed production workloads originally hosted in clients' data centers. CMS provides SLAs and technologies for accommodating highly available workloads, such as clustering and disaster recovery (R1.4). While D/R setups can also be created on SoftLayer, the client must design, build and run them; he cannot receive them as a service. This is the main differentiator in the SoftLayer/CMS positioning. CMS is less flexible above the hypervisor, as it provides managed, highly available operating systems as a service with given SLAs. To meet these SLAs, standards must be followed and the underlying infrastructure must be technically capable of providing them (Tier 1 storage).

Due to the guaranteed service levels on the OS layer, this is IBM’s preferred platform for PaaS offerings of rather traditional software stacks like SAP or Oracle applications.

Summary

There are a lot of use cases where SoftLayer does not fit and CMS is the answer to fulfill the requirements. Given the clearly distinct workloads for SoftLayer and CMS, there is no reason to think about a CMS retirement.

IBM SmartCloud Enterprise+ 1.3 in a nutshell

On November 19, 2013, IBM SmartCloud Enterprise+ (SCE+) version 1.3 was released. While every new SCE+ release has brought some interesting improvements, I’m particularly excited about 1.3. Tons of new features and improvements were implemented, making it worth having a closer look at the highlights of this version of SCE+.

Completely new portal. Let's be polite: the old portal had major room for improvement. The new portal was completely rewritten and now meets the requirements clients have for such an interface.

New virtual machine (VM) sizes. New standard configurations were introduced, including Jumbo for x86 VMs. Even more important, however, are the new maximum possible configurations for a single VM:

  • Up to 64 vCPUs
  • Up to 128 GB RAM
  • Up to 48 TB storage

These new configurations can enable more workloads to run on SCE+.

Clustering. Even more workloads can now be enabled because of the new clustering options. Clients can choose between operating system (OS) based clustering (for all operating systems and platforms supported on SCE+) or simple anti-collocation, which enables clients to cluster VMs at the application level. Anti-collocation means that two VMs will not be provisioned on the same physical host, ensuring availability of at least one node in case a host goes down.
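To illustrate the anti-collocation idea, here is a minimal conceptual sketch of such a placement rule; this is not the actual SCE+ placement logic, and the host and group names are made up:

```python
# Conceptual sketch of anti-collocation placement; not SCE+ internals.
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    vms: dict = field(default_factory=dict)  # vm name -> anti-collocation group

def pick_host(hosts, vm_name, group):
    """Place the VM on the first host that runs no VM of the same group."""
    for host in hosts:
        if group not in host.vms.values():
            host.vms[vm_name] = group
            return host.name
    raise RuntimeError("no host satisfies the anti-collocation constraint")

hosts = [Host("phys-01"), Host("phys-02")]
print(pick_host(hosts, "db-node-a", "db-cluster"))  # phys-01
print(pick_host(hosts, "db-node-b", "db-cluster"))  # phys-02, never phys-01
```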

It is important to mention that service level agreements (SLAs) are still based on the individual VM, so there is no aggregated SLA for a cluster.

Anti-collocation (and clustering) does not guarantee that the physical hosts are located in different physical buildings. Even in dual-site SCE+ data centers, the nodes of a cluster might still be located on one site. This limitation could potentially be removed in a later release of SCE+.

Unmanaged instances. Clients can request unmanaged virtual machines on SCE+ with the following limitations:

  • Managed VMs cannot be transformed to unmanaged ones (or the other way around)
  • Clustering is not available on unmanaged VMs
  • Unmanaged VMs must be based on available SCE+ images; there is still no way to import custom images
  • Migration services are not available for unmanaged instances

Migration services. Migration services for x86 and IBM Power System platforms can now be contracted as an optional part of an SCE+ contract.

Active Directory integration. SCE+ now supports the integration of complex Microsoft Active Directory setups, including a client-specific isolated domain or even joining (managed) VMs to the client's Active Directory (AD) forest.

Database and middleware alerting and management. Besides management of the operating system, clients can now choose database and middleware management as an option, in two flavors:

  • Alerting only. The client maintains responsibility, but will be alerted by an automated monitoring system in case of failure.
  • Management. IBM provides management for selected database and middleware products (mainly IBM DB2 database software, MS SQL, Oracle, Sybase and IBM WebSphere products).

Custom hostnames and FQDNs. Custom hostnames and fully qualified domain names (FQDNs) can now be chosen during the provisioning of a VM.

Load balancer as a service. Besides the currently available virtual software load balancer (vLBF), load balancing as a service is now also available. The new service is based on industry-leading hardware appliances and provides features like SSL offloading. Currently, load balancing as a service is only supported within a single site.

Increased number of security zones. Although three security zones remain the standard, clients can request up to 12 security zones during onboarding if required by the design of their environment. Additional security zones can also be requested after onboarding through a Request for Service (RFS), but provisioning is then subject to availability. In any case, there is a hard limit of 12 security zones per client.

Summary

SCE+ 1.3 is a milestone in terms of features and new possibilities. It enables a lot more workloads to be supported on SCE+ and SCE+ based offerings like IBM SmartCloud for SAP (SC4SAP) and IBM SmartCloud for Oracle Applications (SC4Oracle).

IBM SmartCloud Enterprise+ disaster recovery considerations for DB2

Disaster recovery on IBM SmartCloud Enterprise+ (SCE+) usually refers to infrastructure-based disaster recovery. Disaster recovery (DR) solutions on the infrastructure level replicate whole virtual machines (VMs), including all data, from the main production site to the DR site. The advantage of this approach is that if a disaster occurs, an exact copy of the production environment, including all OS settings and patches, is available on the DR site. The VMs on the DR site can then be started and take over the load quite seamlessly (apart from the nasty reconfiguration of site-specific network settings like IP ranges).
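To give an impression of what that reconfiguration involves, here is a minimal sketch that maps production IP addresses to the DR site's ranges; the subnets and address plan are purely hypothetical, and a real failover would also touch routes, DNS and application configuration:

```python
# Conceptual sketch of remapping production IPs to DR-site ranges after a
# failover; the subnets below are hypothetical examples.
import ipaddress

# production subnet -> corresponding DR subnet (hypothetical address plan)
SUBNET_MAP = {
    ipaddress.ip_network("10.10.1.0/24"): ipaddress.ip_network("10.20.1.0/24"),
    ipaddress.ip_network("10.10.2.0/24"): ipaddress.ip_network("10.20.2.0/24"),
}

def dr_address(prod_ip: str) -> str:
    """Keep the host part of the address, swap the site-specific prefix."""
    ip = ipaddress.ip_address(prod_ip)
    for prod_net, dr_net in SUBNET_MAP.items():
        if ip in prod_net:
            offset = int(ip) - int(prod_net.network_address)
            return str(dr_net.network_address + offset)
    raise ValueError(f"{prod_ip} is not in a mapped production subnet")

print(dr_address("10.10.1.37"))  # -> 10.20.1.37
```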

It is planned to provide an infrastructure as a service (IaaS) DR solution as part of the SmartCloud Enterprise+ offering in a later release.

Although IaaS DR solutions do their job well, they are rather expensive and complex. Mirroring complete virtual machine images not only costs a lot of storage space but also consumes considerable network bandwidth. So the question solution architects should ask is whether IaaS-based DR is really required.

In many scenarios, a more cost-effective and less complex solution is to consider application-level disaster recovery. Let's take DB2 as an example of the many middleware products and applications that can either be clustered at the application level or keep a cold standby aside. DB2 allows us to leverage its HADR function to collect all database update operations and queue them for distribution to other nodes.

Those collected updates can be sent over the network, using a variety of technologies or protocols. The interval between send operations depends on the recovery point objective (RPO) target. A shorter interval between send operations provides a better RPO but might generate more data traffic.

[Figure: DB2 HADR setup]
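As a rough illustration of what such an HADR pairing involves, below is a minimal sketch of the HADR-specific configuration, assuming a hypothetical database SAMPLE, instance db2inst1 and host names prod-db and dr-db; prerequisites such as enabling log archiving and restoring a backup of the primary on the standby are omitted:

```python
# Minimal sketch of a DB2 HADR pairing between a production and a DR site.
# Database, instance and host names are hypothetical; only the HADR-specific
# configuration is shown.
import subprocess

def db2(cmd: str) -> None:
    """Run a DB2 CLP command and fail loudly if it returns an error."""
    subprocess.run(["db2", cmd], check=True)

def configure_hadr(db, local_host, remote_host, remote_inst, port=55001):
    db2(f"update db cfg for {db} using HADR_LOCAL_HOST {local_host}")
    db2(f"update db cfg for {db} using HADR_LOCAL_SVC {port}")
    db2(f"update db cfg for {db} using HADR_REMOTE_HOST {remote_host}")
    db2(f"update db cfg for {db} using HADR_REMOTE_SVC {port}")
    db2(f"update db cfg for {db} using HADR_REMOTE_INST {remote_inst}")
    # SYNC/NEARSYNC give a tighter RPO; ASYNC/SUPERASYNC reduce traffic and
    # latency between the two sites.
    db2(f"update db cfg for {db} using HADR_SYNCMODE ASYNC")

# On the DR site (standby is started first), then on the production site:
# configure_hadr("SAMPLE", "dr-db", "prod-db", "db2inst1")
# db2("start hadr on db SAMPLE as standby")
# configure_hadr("SAMPLE", "prod-db", "dr-db", "db2inst1")
# db2("start hadr on db SAMPLE as primary")
# In case of a disaster: db2("takeover hadr on db SAMPLE by force")
```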

Another advantage of such a setup is the ability to easily perform failover tests. Because the DR systems are up and running all the time, their proper function can be tested at any time by simply accessing them.

However, the drawback of such an application-level DR solution is that the standby system must be up and running all the time, and it must be ensured that all updates to the primary system (like configuration changes and software patching) are also applied to the standby servers.

Summary

Application-level disaster recovery is not the solution for each and every scenario, but it can be a valid, cost-effective and less complex alternative. Sometimes a combination of infrastructure- and application-level DR might be the best solution for an environment.

Making SCE+ elastic

IBM SmartCloud Enterprise+ (SCE+) is a highly scalable cloud infrastructure targeted at productive, cloud enabled, managed workloads. These target workloads make SCE+ a little heavier than SmartCloud Enterprise (SCE). Onboarding a new client alone takes a fair number of days to establish all the management systems for that client. Something similar applies to provisioning virtual machines: due to the service activation process that must be completed before a managed system can be put into production, provisioning a virtual instance takes at least a few hours instead of just minutes on SCE. However, that is still a major improvement over traditional IT, where provisioning a new server can take up to several weeks!

The benefit of SCE+ is the management and high reliability of the platform. This makes it the perfect environment for core services and production workloads that can easily grow over time to meet the business requirements of tomorrow. Even scaling down is possible to a certain extent. However, if you need to react to heavy, short peaks of load, you need a platform that is truly elastic and can scale up and down on an hourly basis.

What you need is an elastic component on top of SCE+: leverage SCE+'s high reliability for the important core functions of your business application, but keep the possibility to scale out to a much lighter – and probably cheaper – platform to cover short load peaks. SCE provides all the features you would expect from such an elastic platform.

[Video: f5 BIG-IP dynamically scaling out to SmartCloud Enterprise]
As shown in the video, f5's BIG-IP appliance can be used to control the load on servers and scale out dynamically to SCE as required. The base infrastructure could be SCE+ as well as any other environment. What is required, however, is a network connection between SCE and SCE+, as this is not part of either offering.
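To make the scale-out logic concrete, here is a minimal sketch of the decision a small controller (or the load balancer itself) has to take; the thresholds, node names and utilization samples are hypothetical, and actual provisioning would go through the SCE API and the load balancer configuration:

```python
# Conceptual sketch of burst scaling: keep the steady base on SCE+ and
# add/remove SCE instances only for short load peaks.
SCALE_OUT_AT = 0.80   # average utilization above which a burst node is added
SCALE_IN_AT = 0.30    # average utilization below which a burst node is removed

def reconcile(load: float, burst_nodes: list) -> str:
    """Decide the next action based on the measured average utilization."""
    if load > SCALE_OUT_AT:
        burst_nodes.append(f"sce-burst-{len(burst_nodes) + 1}")  # hypothetical name
        return f"scale out: provision {burst_nodes[-1]} on SCE"
    if load < SCALE_IN_AT and burst_nodes:
        return f"scale in: deprovision {burst_nodes.pop()}"
    return "steady state: serve from SCE+ only"

nodes = []
for sample in (0.55, 0.90, 0.92, 0.25, 0.20):  # hypothetical utilization samples
    print(reconcile(sample, nodes))
```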

The easiest way to establish layer-three connectivity between SCE and SCE+ is to create a Virtual Private Network (VPN) tunnel between the two environments. Several options are feasible:

  • Connecting a software VPN device or appliance on SCE to the official SCE+ VPN service
  • Connecting a hardware VPN device or appliance on SCE+ to the official SCE VPN service
  • Using the Virtela Meet VPN service to interconnect VPN tunnels from both environments

Summary

SmartCloud Enterprise can be used to make SmartCloud Enterprise+ elastic. Several options exist to interconnect the two environments and leverage the benefits of both for specific use cases!

The future of managed cloud services

At the very beginning, there were unmanaged virtual machines as a service. These services were mainly used for development and test purposes or to host horizontal, born-on-the-web workloads. For more traditional production workloads, another type of cloud service is required. This type of service needs to be more fault tolerant to provide the required availability for so-called cloud enabled workloads, but it also needs to provide certain management capabilities to ensure stable operation. Managed infrastructure as a service (IaaS) solutions such as IBM's SmartCloud Enterprise+ provide these service levels and capabilities.

But is this the future of managed cloud services?

I don't think so. Although this model fits some workloads, it is just the start of the journey.

The ultimate goal of managed cloud services is to receive higher-value services out of the cloud. In my opinion, the future of managed cloud services lies in platform as a service (PaaS) offerings.

Today's managed IaaS offerings cause some challenges, both for the service provider and for the service receiver. One reason for this is the higher complexity that comes with the shared responsibility for an IaaS managed virtual machine (VM). In this scenario, the service provider manages the VM up to the operating system level, but the client is responsible for the middleware and application on top. In such a setup, it is extremely hard to cleanly separate the operating system (OS) from the services running on top of it. Many applications require specific OS settings or access to operating system resources.

A managed PaaS overcomes these challenges by providing a consistent software stack from a single source. All components are designed for the purpose of the platform. If the PaaS service is a database, the OS settings, I/O subsystem and storage are designed and configured to provide a robust service with decent performance. The client can rely on the service provider's expertise and does not need to support a database – or any other middleware – on an operating system platform into whose configuration he has no real insight and over whose settings he has no full control.

In one of my previous blog posts, on how middleware topology changes because of cloud, I discussed the change in middleware and database architecture. This is exactly where PaaS comes into play. As they move away from consolidated database and middleware clusters, clients require agile, elastic and managed databases and middleware as a service to be able to handle the higher number of instances.

Summary

Due to their complexity and limitations, managed IaaS offerings will become a niche for accommodating non-standard software. For middleware and databases with a certain market share, managed PaaS offerings will become a commodity.

Active Directory on a managed IaaS

In a hybrid cloud environment, parts of the infrastructure are located in a public or shared cloud environment whereas other parts are in a different environment, either on a private cloud or on a traditional infrastructure. As long as this is all managed by one service provider, there is not much of a problem. But usually that’s not the case.

While servers located in the traditional infrastructure are often managed by the client himself, the servers hosted in a managed shared-cloud environment are operated by the service provider of that cloud. In the case of managed infrastructure as a service (IaaS), that means management up to the operating system level. Everything above the operating system normally remains the responsibility of the client himself, because he knows the combination of middleware and application best.

This setup leads to all sorts of challenges. For all servers in the cloud we have a strict responsibility boundary; however, the layers above the OS are highly dependent on the OS settings, and it is very hard to isolate the impact of changes made in one layer on the other. The situation gets even more challenging when we talk about services that span not only the responsibility boundaries of a single host but also different environments (public/shared cloud and private cloud/traditional IT).

Microsoft Active Directory currently gives clients and service providers some headaches.

Let's briefly look at the interests of the different parties:

The service provider wants to maintain exclusive administrative rights at the OS level for the servers under his responsibility. Otherwise, it would be impossible to guarantee any service level agreements (SLAs) and/or a certain contracted level of security.

The client requires the servers belonging to him to be in a single, or at least a consistent, environment. This starts with a certain server naming convention, but also includes DNS suffixes and namespaces.

At first sight, these requirements sound reasonable, but with respect to Microsoft Active Directory, they are somewhat conflicting.

When we talk about exclusive administrative rights at the OS level in an MS ADS environment, we need to separate the environments, based on responsibility, into different ADS forests. Otherwise, the owner of the forest's root domain automatically holds Enterprise Admin rights and can create domain and server admin user IDs in all subdomains of the forest at will.

[Figure: ADS trust relationship]

However, if we split the servers into two different ADS forests, they would also live in different namespaces. Furthermore, we would need to find a solution for how users and services of one forest can access resources in the other forest. This can be handled by trusts, but that would introduce a lot of complexity and would be a perfect source for all kinds of problems.

And there is another limitation of the two-forest solution: no domain controllers of the forest owned by the client could be hosted in the cloud environment. That is a real problem, especially when we consider that most clients would like to move most of their easy Windows workloads (like domain controllers) into the cloud.

There are no easy answers to that.

One solution could be to reduce complexity by moving all Windows servers into the cloud and letting the cloud provider manage not only the pure server OS but also the Active Directory service. However, this would require the service provider to offer ADS management as a service, including all tasks that come along with it (like OUs, user IDs, certificates and public keys).

Another possibility would be for one party not to insist on its exclusive administrative rights and to accept this as a risk. If ownership of the domains lies with the service provider, the client can have the rights to operate his ADS settings and probably local server admin IDs for the servers that are not in the cloud.

[Figure: ADS single domain]

Summary

There is currently no single solution to that problem. The client's requirements and the service provider's capabilities need to be considered when designing the future environment. In any case, this needs to be done carefully and well in advance to limit later surprises!