Embracing ARM in the Cloud Datacenter


Cloud datacenter infrastructure is proliferating quickly, driven by exponential data growth and the hyperscale elasticity needed to absorb rapid fluctuations in Cloud compute workloads while meeting long-term capacity plans.

These scale-out, dense-server Cloud platforms are tiered across application-optimized software stacks, underpinned by flexible pools of homogeneous server, networking and storage infrastructure designed and configured to meet exacting performance-per-watt targets. For Cloud operators, bottom-line revenue is tightly coupled to aggregate I/O, power consumption and associated costs. Space savings, compute density and thermal versatility play important roles in the equation.

In parallel, reflecting the growing popularity of open computing platforms, datacenter architects increasingly seek greater ‘openness’ at the server processing layer. They naturally want the freedom and flexibility to source processors from multiple vendors, giving them greater leverage to negotiate performance specifications, power profiles and pricing. They also seek a healthy, established software ecosystem that can rival the x86 software ecosystem for robustness and longevity.

They needn’t look far.

Enter ARM

The arrival of server-caliber 64-bit ARM technology marks a pivotal change in the Cloud infrastructure market, and will help fuel massive multi-vendor innovation around ARM processors’ core building blocks. With products leveraging ARM processors coming to market from AMD and others in variants from ‘brawny’ to ‘wimpy’, Cloud datacenter designers stand to benefit from an expanded array of processing performance options to meet highly targeted workload needs at competitive price points and with significant power efficiency. Allocated to the appropriate applications, ARM can provide long-term strategic value.

Cold data storage is a good early candidate for ARM processors. Put simply, you don’t need high-performance x86 processors to occasionally transfer small volumes of data from hard drives to the network. You need enough processing power to support CRC calculations and other data integrity measures, but x86 processing may be overkill. Server-caliber ARM processors can provide more than enough processing performance for these functions, with additional options for further maximizing I/O. AMD’s ARM-based solution, for example, provides 14 integrated SATA ports for direct connectivity to disk drives.
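To give a sense of how light these integrity checks are, here is a minimal Python sketch of the kind of CRC computation a cold-storage node might run per block; the helper names are illustrative, not part of any actual storage stack:

```python
import zlib

def checksum(block: bytes) -> int:
    # CRC-32 over one block of cold data; cheap enough that a
    # modest ARM core keeps pace with disk and network transfer rates.
    return zlib.crc32(block) & 0xFFFFFFFF

def verify(block: bytes, expected: int) -> bool:
    # Recompute on read-back and compare against the stored checksum.
    return checksum(block) == expected

# Standard CRC-32 check value for the ASCII string "123456789".
assert checksum(b"123456789") == 0xCBF43926
```

The entire workload per block is one pass over memory plus a compare, which is why a lower-power core suffices.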

Media streaming is another compelling use case for ARM. Here you need “good-enough” processing performance to ensure smooth multimedia content delivery, while balancing compute density and costs against hyperscale content and throughput volumes. Single-chip ARM-based SoCs at the server processing layer will provide significant power efficiency and thermal benefits that enable low-power, high-density racks to sustain the requisite I/O at a low cost of ownership.

Web applications written in PHP or other high-level languages are good candidates for ARM. These apps aren’t machine dependent and can run on ARM or x86 alike, so a typical LAMP stack application just runs, with no code recompilation required. Applications written in compiled languages such as FORTRAN or C++ will need to be recompiled, and this is where the software community will take time to validate and support ARM. But an enterprise with a large body of code written in Java, for instance, will be able to get its software up and running quickly on ARM.

Using ARM, big web properties with hyperscale compute resources – such as Google, Baidu, Facebook, Amazon Web Services, etc. – could translate even relatively small performance per watt gains into huge aggregate financial benefits. These entities are particularly well attuned to the value of open architectures, and are keenly interested in exploiting this trend at the processing layer.

For datacenter operators running high-concurrency workloads on dense front-end web servers, ARM-based processors can also counter some of the performance degradation issues that arise from the inefficient use of x86 processor caches, while simultaneously offering power efficiency and high server processor density. A significant number of datacenter workloads have inherently low instructions per clock (IPC) and high cache miss rates. For these workloads, ARM-based processors with smaller cores and caches can deliver performance equivalent to that of traditional server processors with large cores and caches, while minimizing power and area requirements.
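The access-pattern distinction behind those IPC numbers can be sketched in a few lines of Python. This is purely illustrative: real IPC and cache miss rates are read from hardware performance counters (e.g. with perf on Linux), not from interpreted code, but it shows the two memory behaviors being contrasted:

```python
import random

def traverse_sequential(data):
    # Streams through memory in order: the prefetcher keeps the
    # cache hot, so hit rates and IPC stay high.
    return sum(data)

def traverse_random(data, order):
    # Pointer-chasing-style access: each index lands on an
    # unpredictable cache line, so miss rates climb and IPC drops,
    # leaving a large core's resources idle much of the time.
    return sum(data[i] for i in order)

data = list(range(1_000_000))
order = list(range(len(data)))
random.shuffle(order)

# Both traversals compute the same result; only the memory
# behavior differs.
assert traverse_sequential(data) == traverse_random(data, order)
```

Workloads dominated by the second pattern gain little from a big core's large caches, which is why smaller ARM cores can match them at lower power.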

The research and academia domain will take a leadership role in evaluating and implementing ARM for high-performance computing (HPC) applications, which, to date, have been dominated by high-performance x86 processors. Commodity ARM processors have already made inroads into this domain, as evidenced by the Barcelona Supercomputing Center’s (BSC) selection of low-power ARM-based Samsung processors to build a new supercomputer. Via its Mont-Blanc initiative, BSC is endeavoring to build a 200 petaflop system that uses only 10 MW of power. Though ARM isn’t a likely candidate in the short term for matrix multiplication-intensive applications like neural networks or machine learning, it can be well suited for the numerous single-purpose HPC apps that don’t need floating-point precision but do need improved power efficiency.
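The Mont-Blanc target implies an ambitious efficiency figure; a quick back-of-envelope check makes the scale concrete:

```python
# Mont-Blanc target figures from the text: 200 petaflops within a
# 10 MW power envelope.
target_flops = 200e15   # 200 petaflops, in floating-point ops per second
power_watts = 10e6      # 10 MW

efficiency = target_flops / power_watts
print(efficiency / 1e9)   # prints 20.0 -> 20 gigaflops per watt
```

Twenty gigaflops per watt is the kind of efficiency ceiling that motivates looking beyond conventional high-power server processors.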

Network function virtualization (NFV) is another attractive use case for ARM-based processors, enabling networking/telecom service providers to simplify infrastructure deployment and management via a fully virtualized communications framework (Figure 1). With NFV, much of the intelligence currently built into proprietary, specialized hardware is accomplished instead with software running on general-purpose servers. By abstracting network devices such as routers and gateways within a virtual server, storage, and network environment, core network functionality can be scaled and managed with newfound agility. And by minimizing dependencies on customized, integrated hardware, operating system, middleware and application stacks, service providers can also accelerate development cycles and help avoid vendor lock-in.

Figure 1
Network function virtualization is becoming a widely used method of leveraging high-performance general-purpose processors to perform a variety of specialized network functions without having to use custom hardware.

ARM-based processors are particularly well suited for NFV in part because they can be tightly mated to network interface controllers in silicon. This results in NFV-optimized SoCs that consume minimal power while providing significant space-saving benefits. This latter attribute is particularly valuable for field-deployed telco equipment in dense metropolitan environments, where a pole-mountable, shoebox-sized system is far more desirable to deploy than a floor-standing, cabinet-sized system.

Working with companies like ENEA, AMD is helping pave the way for service providers to design and deploy NFV infrastructures such as Open Platform for NFV (OPNFV) on ARM. Leveraging advanced software-defined networking (SDN) capabilities, NFV holds the promise to meet exacting network performance, management flexibility, and cost requirements. Initiatives such as the OpenDataPlane (ODP) open source project help the cause by providing open application programming environments that are portable across networking SoCs with various instruction sets and architectures.

ARM is also distinguished by its robust security capabilities, which stem from its extensive use in mobile device applications. The ARMv8 architecture takes these capabilities a step further, particularly with regard to encryption and decryption processing efficiency, while simultaneously introducing 64-bit ARM support for datacenter infrastructure already accustomed to 64-bit x86 computing. ARMv8 cryptography extensions incorporate instructions to accelerate Advanced Encryption Standard (AES) and Secure Hash Algorithm (SHA) cryptography algorithms. This native, hardware-based cryptography support avoids the performance, power and cost penalties introduced via pure software-based implementations, and can complement existing hardware accelerators.
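The cryptography extensions are transparent to applications. As a sketch (in Python, using the standard library's hashlib, which defers to an underlying crypto library such as OpenSSL), a routine hash call can pick up the ARMv8 SHA instructions automatically when that library is built to use them; no application code changes are needed:

```python
import hashlib

def digest(payload: bytes) -> str:
    # SHA-256 over a message. On ARMv8 with the cryptography
    # extensions, the underlying library can execute this with native
    # SHA instructions, avoiding the performance and power penalties
    # of a pure software implementation.
    return hashlib.sha256(payload).hexdigest()

# Well-known SHA-256 test vector for the message "abc".
assert digest(b"abc") == (
    "ba7816bf8f01cfea414140de5dae2223"
    "b00361a396177a9cb410ff61f20015ad"
)
```

The same transparency applies to AES: applications call the usual library interfaces and the hardware instructions are used underneath where available.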

ARM-based processors enabled with TrustZone technology allow multiple computing devices to be grouped into key-protected zones that operate as unified security domains, backed by hardware-based access controls. This ability to support multiple independent security profiles on a single core is a compelling feature for datacenter architects seeking, for example, to build a secure private network, implement advanced DRM schemes, or establish a trusted boot capability.

Evolving with ARM

Momentum continues to grow for ARM in the Cloud datacenter, yielding significant industry advancements. Red Hat recently made its Red Hat Enterprise Linux (RHEL) Server for ARM Development Preview available to members of its ARM Partner Early Access Program, providing a common standards-based operating system for existing 64-bit ARM hardware. HP recently announced its first ARM-based servers, and AMD’s first 64-bit ARM implementations – high-performance, low-power K12 cores, AMD Opteron A-Series ‘Seattle’ server processors, and AMD Embedded R-Series ‘Hierofalcon’ SoCs – are on the near horizon.

ARM penetration and propagation into the Cloud datacenter will be facilitated in part by its vast, established software ecosystem. Continued innovation in open source software enablement and complementary virtualization technologies will further advance the transition to Cloud datacenter infrastructure based on both x86 and ARM processor platforms, aligned appropriately to the applications that stand to gain the most performance and/or power benefits from each architecture.

Leveraging ARM at the processing layer, dense-server Cloud datacenters can scale out with greater flexibility and power efficiency, reducing total cost of ownership. Equally important, with ARM, datacenter architects now have the freedom to choose processors from a wider, more open, and more competitive marketplace.
