The future of SmartCloud Enterprise+ aka Cloud Managed Services

Because of the recent SoftLayer acquisition, SmartCloud Enterprise (SCE) was sunset on January 31st, 2014. There were just too many overlaps between the two offerings, and SoftLayer seemed to me the more mature platform with more functionality. So it was not much of a surprise that SCE was stopped (and that functionality like SCAS is being merged into SoftLayer).

But what does this mean for SCE+ – or, what should it mean for it?

First of all, it means a name change. As announced at this year’s Pulse (IBM’s cloud conference), SCE+ will be rebranded to Cloud Managed Services (CMS).

Second, the good news: CMS/SCE+ will stay, with a strong roadmap at least until 2017. (The roadmap is specified until 2017, so it is very likely that CMS/SCE+ will stay even beyond that. But who knows what happens in IT in the next five years 🙂)

But why two offerings anyway?

In essence, SCE+ and SoftLayer are positioned the same way as SCE+ and SCE were originally positioned:

  • SCE+ for cloud enabled workloads
  • SoftLayer for cloud native workloads

[Figure: cloud native vs. cloud enabled workloads]

To understand this positioning a little better, let’s discuss the current capabilities of each offering and the currently planned roadmap items:

SoftLayer

SoftLayer provides a highly flexible IaaS platform for cloud centric workloads. The underlying infrastructure is highly standardized and gives the client full control over everything above the hypervisor, including the operating system. Even if the client subscribes to one of the offered management options, this mainly means that SoftLayer performs limited management tasks on a best-effort basis, without real SLAs, while the client retains full admin access to its instances.

The platform provides high flexibility, so all kinds of setups can be implemented (by clients), but the responsibility for a given setup remains with the client, not with SoftLayer. In a nutshell, SoftLayer provides an IaaS environment with a very high degree of freedom and control for clients without taking over responsibility for anything above the hypervisor.

These capabilities fit well for cloud centric or self-managed development (DevOps) workloads, but less well for traditional highly available workloads like SAP.

Cloud Managed Services (formerly known as SmartCloud Enterprise+)

CMS, on the other hand, was designed and built to meet exactly the requirements of highly available, managed production workloads originally hosted in clients’ datacenters. CMS provides SLAs and technologies for accommodating highly available workloads, such as clustering and disaster recovery (R1.4). While D/R setups can also be created on SoftLayer, the client must design, build and run them there and cannot receive them as a service. This is the main differentiator in the SoftLayer / CMS positioning. CMS is less flexible above the hypervisor, as it provides managed, highly available operating systems as a service with given SLAs. To meet these SLAs, standards must be met and the underlying infrastructure must be technically capable of supporting them (Tier 1 storage).

Due to the guaranteed service levels on the OS layer, this is IBM’s preferred platform for PaaS offerings of rather traditional software stacks like SAP or Oracle applications.

Summary

There are a lot of use cases where SoftLayer does not fit and CMS is the answer to fulfill the requirements. Because SoftLayer and CMS target clearly distinct workloads, there is no reason to think about a CMS retirement.

IBM SmartCloud Enterprise+ 1.3 in a nutshell

On November 19, 2013, IBM SmartCloud Enterprise+ (SCE+) version 1.3 was released. While every new SCE+ release has brought some interesting improvements, I’m particularly excited about 1.3. Tons of new features and improvements were implemented, making it worth having a closer look at the highlights of this version of SCE+.

Completely new portal. Let’s be polite: the old portal had major room for improvement. The new portal was rewritten from scratch and now meets the requirements clients have for such an interface.

New virtual machine (VM) sizes. New standard configurations were introduced including Jumbo for x86 VMs. However, what is even more important are the new maximum possible configurations for a single VM, which can be:

  • Up to 64 vCPUs
  • Up to 128 GB RAM
  • Up to 48 TB storage

These new configurations can enable more workloads to run on SCE+.

Clustering. Even more workloads can now be enabled because of the new clustering options. Clients can choose between operating system (OS) based clustering (for all operating systems and platforms supported on SCE+) or simple anti-collocation, which enables clients to cluster VMs on the application level. Anti-collocation means that two VMs will not be provisioned on the same physical host, to ensure availability of at least one node in case a host goes down.

It is important to mention that service level agreements (SLAs) are still based on the individual VM, so there is no aggregated SLA for a cluster.

Anti-collocation (and clustering) does not guarantee that the physical hosts are located in different physical buildings. Even in dual-site SCE+ data centers, the different nodes of a cluster might still be located on one site. This limitation could potentially be removed in a later release of SCE+.
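To make the difference between host separation and site separation concrete, here is a minimal Python sketch. It is not how SCE+ places VMs internally; the `Host` type, the host and site names, and the placement function are all hypothetical and only illustrate the rule described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    # Hypothetical model of a physical host; names are illustrative only.
    name: str
    site: str


def place_anti_collocated(vms, hosts):
    """Assign each VM of one cluster to its own physical host."""
    if len(vms) > len(hosts):
        raise ValueError("not enough hosts to keep all cluster VMs apart")
    # zip pairs VMs and hosts one-to-one, so no host is used twice.
    return dict(zip(vms, hosts))


def distinct_sites(placement):
    """Count how many different sites the placed VMs ended up on."""
    return len({host.site for host in placement.values()})


hosts = [
    Host("host-a", "site-1"),
    Host("host-b", "site-1"),
    Host("host-c", "site-2"),
]
placement = place_anti_collocated(["db-node-1", "db-node-2"], hosts)

# The two nodes sit on different hosts, yet both hosts are on site-1.
print(placement["db-node-1"].name, placement["db-node-2"].name)
print(distinct_sites(placement))
```

In this sketch the anti-collocation rule is satisfied, but a site-wide outage would still take down both nodes, which is exactly the limitation noted above.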

Unmanaged instances. Clients can request unmanaged virtual machines on SCE+ with the following limitations:

  • Managed VMs cannot be transformed to unmanaged ones (or the other way around)
  • Clustering is not available on unmanaged VMs
  • Unmanaged VMs must be based on available SCE+ images; there is still no way to import custom images
  • Migration services are not available for unmanaged instances

Migration services. Migration services for x86 and IBM Power System platforms can now be contracted as an optional part of an SCE+ contract.

Active Directory integration. SCE+ now supports the integration of complex Microsoft Active Directory setups, including a client-specific isolated domain or even joining (managed) VMs to the client’s Active Directory (AD) forest.

Database and middleware alerting and management. Besides the management of the operating system, clients can now choose database and middleware management as an option in two flavors:

  • Alerting only. The client maintains responsibility, but will be alerted by an automated monitoring system in case of failure.
  • Management. IBM provides management for selected database and middleware products (mainly IBM DB2 database software, MS SQL, Oracle, Sybase and IBM WebSphere products).

Custom hostnames and FQDN. Custom hostnames and fully qualified domain names (FQDN) can now be chosen during the provisioning of a server VM.

Load Balancer as a service. Besides the currently available virtual software load balancer (vLBF), load balancing as a service is also available. The new service is based on industry leading hardware appliances and provides features like SSL offloading. Currently, load balancing is only supported within a single site.

Increased number of security zones. Although three security zones remain standard, clients can request up to 12 security zones when onboarding if the design of their environment requires it. Additional security zones can also be requested after onboarding through a Request for Service (RFS), but provisioning is then subject to availability. In any case, there is a hard limit of 12 security zones per client.
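The zone rule can be written down as a small check. This is only an illustrative Python sketch: the constant and function names are made up for this example, and the "subject to availability" condition for post-onboarding requests is not modeled.

```python
DEFAULT_SECURITY_ZONES = 3   # standard number of zones per client
MAX_SECURITY_ZONES = 12      # hard limit per client


def can_request_zones(current_zones, additional_zones):
    """Return True if the request stays within the per-client hard limit."""
    if current_zones < 0 or additional_zones < 1:
        raise ValueError("zone counts must be non-negative and the request positive")
    return current_zones + additional_zones <= MAX_SECURITY_ZONES


# A client with the standard setup asks for nine more zones: allowed.
print(can_request_zones(DEFAULT_SECURITY_ZONES, 9))
# One more zone on top of twelve would exceed the hard limit.
print(can_request_zones(12, 1))
```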

Summary

SCE+ 1.3 is a milestone in terms of features and new possibilities. It enables a lot more workloads to be supported on SCE+ and SCE+ based offerings like IBM SmartCloud for SAP (SC4SAP) and IBM SmartCloud for Oracle Applications (SC4Oracle).

IBM SmartCloud Enterprise+ disaster recovery considerations for DB2

Disaster recovery on IBM SmartCloud Enterprise+ (SCE+) usually refers to infrastructure based disaster recovery. Disaster recovery (DR) solutions on the infrastructure level replicate whole virtual machines (VMs), including all data, from the main production site to the DR site. The advantage of this approach is that if a disaster occurs, an exact copy of the production environment, including all OS settings and patches, is available on the DR site. The VMs on the DR site can then be started and take over the load quite seamlessly (apart from the nasty reconfiguration of site specific network settings like IP ranges).

It is planned to provide an infrastructure as a service (IaaS) DR solution as part of the SmartCloud Enterprise+ offering in a later release.

Although IaaS DR solutions do their job well, they are rather expensive and complex. Mirroring complete virtual machine images not only costs a lot of storage space but also consumes network bandwidth. So, the question that solution architects should ask is whether IaaS based DR is really required!

In many scenarios, a more cost effective and less complex solution is application level disaster recovery. Let’s take DB2 as an example of the many middleware products and applications that can either be clustered on the application level or keep a cold standby aside. DB2 allows us to leverage its HADR function to collect all database update operations and queue them for distribution to other nodes.

Those collected updates can be sent over the network, using a variety of technologies or protocols. The interval between send operations depends on the recovery point objective (RPO) target. A shorter interval between send operations provides a better RPO but might generate more data traffic.
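The trade-off between send interval and RPO can be sketched with a deliberately simplified model. The Python below is an assumption-laden illustration, not a description of how DB2 HADR schedules its transfers: it assumes updates are shipped in fixed-interval batches and that each batch takes a known time to arrive. All names and numbers are hypothetical.

```python
def worst_case_rpo_seconds(send_interval_s, transfer_time_s):
    """Upper bound on the window of updates that can be lost.

    An update made right after a send waits a full interval for the
    next send and is only safe once that batch has been transferred.
    """
    if send_interval_s <= 0 or transfer_time_s < 0:
        raise ValueError("interval must be positive, transfer time non-negative")
    return send_interval_s + transfer_time_s


def sends_per_hour(send_interval_s):
    """Number of send operations per hour for a given interval."""
    if send_interval_s <= 0:
        raise ValueError("interval must be positive")
    return 3600 / send_interval_s


# Compare a 5-minute and a 1-minute interval, each with a 5-second transfer.
for interval in (300, 60):
    print(interval, worst_case_rpo_seconds(interval, 5), sends_per_hour(interval))
```

Under these assumptions, cutting the interval from five minutes to one minute shrinks the worst-case loss window from 305 to 65 seconds, at the price of five times as many send operations.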

[Figure: DB2 HADR setup]

Another advantage of such a setup is the ability to perform failover tests easily. Because the DR systems are up and running all the time, their proper function can be tested at any time by just accessing them.

However, the drawback of such an application level DR solution is the fact that the standby system is required to be up and running all the time and it must be ensured that all updates to the primary system itself (like changed configurations and software patching) are also done on the standby servers.

Summary

Application level disaster recovery is not the solution for each and every scenario, but it can be a valid, cost effective and less complex alternative. Sometimes a combination of infrastructure and application level DR might be the best solution for an environment.