Focus Areas

EP Analytics’ expertise and tools help enterprises maximize the return on investment of their HPC systems. We assist clients with Performance Characterization, Energy Efficiency, System Design, and Emerging Technology Integration.

Performance Prediction

Energy Efficiency

From 2005 to 2010, the electricity consumed by HPC systems and server farms/datacenters in the United States increased by 36% (from 7 gigawatts to almost 10 gigawatts of total demand), according to a 2011 report on electricity growth within datacenters [2]. This growth was actually dampened by the global recession; the Environmental Protection Agency had forecast it to double, and it is very likely to do so in the 2010–2015 timeframe as the US economy continues to rebound [3]. Utility costs and constraints on power delivery are limiting the expansion of large-scale computing systems just as society and commercial enterprises are coming to rely more heavily on such systems in the domains of High Performance Computing (HPC), Cloud Computing, and “Hyperscale” computing. Developing more energy-efficient modes of operation for large-scale server installations, with minimal performance disturbance, will reduce the cost of operation, reduce carbon emissions, and enable the same computation to be completed with less energy, helping usher in an era of “Green Computing.”

To bring green computing to reality, EP Analytics, Inc. envisions an Energy Efficiency Management Platform (E2MP), a power-aware, green computing technology that can enact performance-neutral power and reliability management policies on high performance computing (HPC) centers. E2MP’s design allows it to take a system-wide, holistic view of power and reliability management and dynamically make fine-grain power and thermal adaptations at the compute-node level and at the facility level in response to the behavior of the applications running in the facility. E2MP continuously monitors a number of important metrics, including chip temperature, instantaneous per-component power draw, and ambient room temperature. It then relates those metrics via predictive models to specific application software behavior (e.g., the quantity of main-memory traffic) and uses those relationships to steer systems towards better energy efficiency and reliability. This is a transformative vision that requires an integrated and symbiotic hardware/software stack for managing energy and has the potential to significantly reduce the overall energy consumption of servers and server farms/datacenters.
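As a rough illustration of the "relate metrics to application behavior, then steer the system" idea, consider a minimal sketch in Python. Everything here is invented for illustration: the simple slowdown model (compute-bound work scales with frequency, memory-bound work does not) and the function names are not part of any real E2MP interface.

```python
def predicted_slowdown(freq_ghz, f_max_ghz, mem_bound_frac):
    """Toy model: only the compute-bound fraction of the work slows down
    when the clock drops; memory-bound time is frequency-insensitive."""
    compute_frac = 1.0 - mem_bound_frac
    return compute_frac * (f_max_ghz / freq_ghz) + mem_bound_frac

def choose_frequency(freqs_ghz, mem_bound_frac, loss_bound=0.05):
    """Pick the lowest frequency whose predicted slowdown stays within
    the performance-neutrality bound (default 5%)."""
    f_max = max(freqs_ghz)
    for f in sorted(freqs_ghz):
        if predicted_slowdown(f, f_max, mem_bound_frac) <= 1.0 + loss_bound:
            return f
    return f_max
```

Under this toy model, a heavily memory-bound application (e.g., 80% of its time waiting on memory) can run at a much lower clock than a compute-bound one while staying inside the same 5% performance bound, which is exactly the kind of application-aware adaptation the platform is meant to automate.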


Related Papers & Presentations

October, 2015

Building Blocks for a System-wide Power and Thermal Management Framework

Abstract: Next generation Exascale systems face the difficult challenge of managing the power and thermal constraints that come from packaging more transistors into a smaller space while adding more processors into a single system. To combat this, HPC center operators are looking for methodologies to save operational energy. Energy consumption in an HPC center is governed by the complex interactions between a number of different components. Without a coordinated and system-wide perspective on reducing energy consumption, isolated actions taken on one component with the intent to lower energy consumption can actually have the opposite effect on another component, thereby canceling out the net effect. For example, increasing the setpoint (or ambient temperature) to save cooling energy can lead to increased compute-node fan power and increased chip leakage power. This paper presents the building blocks required to develop and implement a system-wide framework that can take a coordinated approach to enact thermal and power management decisions at the compute-node level (e.g., CPU speed throttling) and at the infrastructure level (e.g., selecting the optimal setpoint). These building blocks consist of a suite of models that characterize the thermal and power footprint of different computations and capture the relationships between computational properties and datacenter operating conditions.
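The setpoint example in the abstract can be made concrete with a toy total-power model. All coefficients below are invented for illustration (they are not from the paper); the point is only that cooling power falls with the setpoint while fan and leakage power rise, so the facility-wide optimum is interior and can only be found by modeling the components together.

```python
def total_power(setpoint_c, it_power_kw=100.0):
    """Toy facility power model (kW) vs. cooling setpoint (deg C).
    Coefficients are invented to show the qualitative coupling only."""
    # Cooling overhead shrinks as the setpoint rises.
    cooling = it_power_kw * max(0.1, 0.5 - 0.02 * (setpoint_c - 18))
    # Compute-node fan power grows superlinearly with ambient temperature.
    fans = 2.0 + 0.4 * max(0, setpoint_c - 18) ** 1.5
    # Chip leakage power grows roughly exponentially with temperature.
    leakage = 5.0 * 1.05 ** (setpoint_c - 18)
    return it_power_kw + cooling + fans + leakage

# Search the allowed setpoint range for the coordinated optimum.
best_setpoint = min(range(18, 31), key=total_power)
```

With these made-up numbers the minimum sits strictly between the coldest and warmest settings: pushing the setpoint to either extreme, in isolation, raises total power, which is the cancellation effect the paper's coordinated framework is designed to avoid.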

Ananta Tiwari, Adam Jundt, William A. Ward, Jr.†, Roy Campbell†, and Laura Carrington
†High Performance Computing Modernization Program, U.S. Dept. of Defense

Accepted to: ICPADS (International Conference on Parallel and Distributed Systems), 2015. Available upon request.

September, 2014

Efficient Speed (ES): Adaptive DVFS and Clock Modulation for Energy Efficiency

Abstract: Meeting the 20MW power envelope sought for exascale is one of the greatest challenges in designing that class of systems. Addressing this challenge requires an over-provisioned and dynamically reconfigurable system with fine-grained control over the power and speed of the individual cores. In this paper, we present EfficientSpeed (ES), a library that improves energy efficiency in scientific computing by carefully selecting the speed of the processor. The run-time component of ES adjusts the speed of the processor (via DVFS and clock modulation) dynamically while preserving the desired level of performance. These adjustments are based on online performance and energy measurements, user-selected policies that dictate the aggressiveness of adjustments, and user-defined performance requirements. Our results quantify the best energy savings that can be achieved by controlling the speed of the processor, with today’s technology, at the cost of negligible performance degradation. We then demonstrate that ES is effective in automatically calibrating the speed of execution in real applications, saving energy and meeting the desired performance goal. We evaluate ES on GAMESS, an ab initio quantum chemistry package. We show that ES respects the stipulated 5% performance loss bound and achieves a 16% decrease in the energy required to complete the execution while running with a power draw that is 18% lower.

Pietro Cicotti, Ananta Tiwari, and Laura Carrington

Accepted to: CLUSTER, 2014. Available upon request.

September, 2014

Characterizing the Performance-Energy Tradeoff of Low-Power ARM Processors in HPC

Abstract: Deploying large numbers of small, low-power cores has been gaining traction recently as a design strategy in high performance computing (HPC). The ARM platform that dominates the embedded and mobile computing segments is now being considered as an alternative to the high-end x86 processors that largely dominate HPC because peak performance per watt may be substantially improved using off-the-shelf commodity processors. In this work we methodically characterize the performance and energy of HPC computations drawn from a number of problem domains on current ARM and x86 processors. Unsurprisingly, we find that the performance, energy, and energy-delay product of applications running on these platforms vary significantly across problem types and inputs. Using static program analysis we further show that this variation can be explained largely in terms of the capabilities of two processor subsystems: the floating point/SIMD units and the cache/memory hierarchy, and that static analysis of this kind is sufficient to predict which platform is best for a particular application/input pair. In the context of these findings, we evaluate how some of the key architectural changes being made for upcoming 64-bit ARM platforms may impact HPC application performance.

Michael Laurenzano, Ananta Tiwari, Adam Jundt, Joshua Peraza, Laura Carrington, William Ward, Jr.†, and Roy Campbell†
†High Performance Computing Modernization Program, U.S. Dept. of Defense

Accepted to: Euro-Par, 2014. Available at Springer.

August, 2014

Adaptive Model-Driven Facility-Wide Management of Energy Efficiency and Reliability

Abstract: We present the blueprint for the Energy Efficiency Management Platform (E2MP), a power-aware, green computing technology that can enact performance-neutral power and reliability management policies on high performance computing (HPC) centers. E2MP’s design allows it to take a system-wide, holistic view of power and reliability management and dynamically make fine-grain power and thermal adaptations at the compute-node level and at the facility level in response to the behavior of the applications running in the facility.
E2MP continuously monitors a number of important metrics, including chip temperature, instantaneous per-component power draw, and ambient room temperature. It then relates those metrics via predictive models to specific application software behavior (e.g., the quantity of main-memory traffic) and uses those relationships to steer systems towards better energy efficiency and reliability.

Ananta Tiwari, Michael Laurenzano, Adam Jundt, William Ward, Jr.†, Roy Campbell†, and Laura Carrington
†High Performance Computing Modernization Program, U.S. Dept. of Defense

Accepted to: MODSIM (Workshop on Modeling & Simulation of Systems and Applications, sponsored by the U.S. Department of Energy, Office of Advanced Scientific Computing Research), 2014. Available upon request.

Want to know more about our services and expertise? Contact Us Today