Recently someone asked me, "What are the consequences of not adopting a DCIM solution for my data center? Does it really matter?" Good question; here's my perspective:
In its 2011 Data Center Industry Census, Datacenter Dynamics reported that data center operators still cite energy costs and availability as their top concerns. This insight--coupled with the fact that enterprises will continually compete to grow market share with more web services, more apps, and more reach (e.g., emerging markets)--indicates that data center operators need to run their data centers much more efficiently than ever to keep up with escalating business demands.
Therefore, the disadvantages of not using a DCIM solution become evident when enterprises can't compete because they are taxed by data center inefficiencies that curb how quickly and adeptly a business can grow. For example:
• Unreliable web services that frustrate customers
• Limited or late-to-market apps that hinder the workforce
• Unpredictable data center operating costs that squeeze profitability
This is where OpenData software by Modius can help. OpenData provides both visibility into and real-time decision support for data center infrastructure, so you can better manage availability and energy consumption. OpenData helps data center operators optimize the performance of their critical infrastructure, specifically the entire power and cooling chain from the grid to the server. For instance, with OpenData, you can arrive at a power usage baseline from which comparisons can be made to determine the effectiveness of optimization strategies. OpenData also provides a multi-site view to manage critical infrastructure performance as an ecosystem vs. isolated islands of equipment--from a single pane of glass. And, because OpenData monitors granular data for the entire power and cooling chain, you can validate--or invalidate--in near real-time whether day-to-day tactical measures to improve data center performance are actually working.
We’re proud to announce Modius will be joining Intel at its Gartner Data Center Conference booth at the annual show, taking place Dec. 5-8 at Caesars Palace in Las Vegas.
Earlier this year, Modius and Intel announced their plans for integrating OpenData Enterprise Edition with Intel DCM to capture power and temperature data from a broad range of servers with Intel processors.
Intel DCM enhances our OpenData platform with increased visibility and analytics of server-level performance by providing thermal and energy intelligence from the CPU and power supply. This helps both IT and facility managers better understand and manage the power consumption at the rack and server levels, as well as the cooling and airflow distribution requirements of their computing sources.
Gartner show attendees will be able to experience first-hand how our OpenData® Enterprise Edition integrates Intel® Data Center Manager (Intel DCM) to capture and manage server-level power and thermal data. By combining server-level data with performance data captured from other infrastructure equipment (e.g., CRACs and air handlers), the OpenData system allows data center managers to analyze many rack-level challenges in the data center, including cooling and airflow distribution—which can be impacted by system power density, variable system processing loads, asset turnover and malfunctioning or broken equipment.
The integration of Intel DCM data-capture capabilities directly into Modius OpenData enables a set of functionality not previously available, including:
- Device- and rack-level capacity analysis with real-time alarming for power deviations
- Device- and rack-level temperature analysis with real-time alarming for temperature deviations
- End-to-end visibility of the power chain from the UPS to individual servers
- Floor-level cold-spot and hot-spot identification and remediation
The benefits to data center managers from this technology integration include:
- Improved data center efficiency and lower power costs
- Expanded data center capacity
- Early warning of data center problems and potential outages
If you’re going to be in Vegas at the show, please stop by Intel’s booth (booth #J) and see us. We’d love to show you, firsthand, how OpenData Enterprise Edition with Intel DCM can help you better manage power and thermal conditions at the rack and server level.
Donald Klein, VP Marketing & Business Development
Modius customer, Dave Shroyer of NetApp, will be a featured panelist at the Silicon Valley Leadership Group’s Annual Data Center Efficiency Summit, taking place today at the IBM Almaden Research Center in San Jose.
Shroyer, a work place resources (WPR) manager with NetApp’s Integration & Technology Group, joins executives from Triton, IBM and Cisco in presenting “From the Killer App to the Chiller Tap: Holistic Management of IT and Facilities.”
The panel discussed how the management and monitoring of IT and facilities increasingly need to be done in a coordinated fashion in order to achieve maximum IT performance and maximum facilities efficiency. Historically, separate IT management tools have managed IT's killer apps, while separate facilities management tools have managed power and cooling. A new generation of monitoring and management tools is emerging that sources data from everywhere in the data center – apps, servers, switches, storage, UPSs, CRAC units, air handlers and chillers – and promises to deliver a holistic approach to managing the data center as an integrated unit.
Shroyer and his colleagues shared their insights on exactly how far away we are from such an ideal end-state, as well as when we will be able to report data center metrics in terms of billions of SpecWeb per GigaWatt. They discussed their own integration of such tools, how they use them and where these promising data center management software tools are headed.
Marina Thiry, Director of Product Marketing
Different isn't always better, but better is always different.
We are sometimes asked how Modius OpenData differs from a BMS: “Why should I consider Modius OpenData when I already have a BMS?”
In short, the answer comes down to using the right tool for the job. A BMS is installed at a large building to monitor and control the environment within that building, for example: lighting, ventilation, and fire systems. It helps facility managers better manage the building’s physical space and environmental conditions, including safety compliance. As concerns about energy conservation have gained critical mass, feature enhancements to BMSs have evolved to become more attuned to energy efficiency and sustainability. However, this doesn’t make a BMS a good tool for data center optimization any more than scissors can be substituted for a scalpel.
Unlike a BMS, OpenData software by Modius was designed to uncover the true state of the data center by continually measuring all data points from all equipment, and providing the incisive decision support required to continually optimize infrastructure performance. Both facility and IT managers use OpenData to gain visibility across their data center operations, to arrive at an energy consumption baseline, and then to continually optimize the critical infrastructure of the data center—from racks to CRACs. The effectiveness of the tool used for this purpose is determined by the:
- operational intelligence enabled by the reach and granularity of data capture, accuracy of the analytics, and the extensibility of the feature set to utilize the latest data center metrics
- unified alarm system to mitigate operational risk
- ease-of-use and flexibility of the tool to simplify the job
To illustrate, following are the top three differences between OpenData and a typical BMS that make OpenData the right tool to use for managing and optimizing data center performance.
- OpenData provides the operational intelligence, enabled by the reach and granularity of data capture, accuracy of the analytics, and the extensibility of the feature set, to utilize the latest data center metrics. Modius understands that data center managers don’t know what type of analysis they will need to solve a future problem. Thus, OpenData provides all data points from all devices, enabling data center managers to run any calculation and create new dashboards and reports whenever needed. This broad and granular data capture enables managers to confidently assess their XUE, available redundant capacity, and any other data center metric required for analysis. Moreover, because all of the data points provided can be computed at will, the latest data center metrics can be implemented at any time. In contrast, a BMS requires identifying a set of data points upon its installation. Subsequent changes to that data set require a service request (and service fee), which means that even if the data is collected in real-time, it may not be available to you when needed. Thus, the difficulty and expense of enabling the networked communications and reporting for real-time optimization from a BMS is far beyond what most would consider a “reasonable effort” to achieve.
- OpenData provides a unified alarm system to mitigate operational risk. With OpenData, end-users can easily set thresholds on any data point, on any device, and edit thresholds at any time. Alarms can be configured with multiple levels of escalation, each with a unique action. Alarms can be managed independently or in bulk, and the user interface displays different alarm states at a glance. In contrast, with a typical BMS integration the system only reports alarms native to the device—i.e., it doesn’t have access to alarms other than its own mechanical equipment. When data center managers take the extra steps to implement unified alarming (e.g., by feeding into the BMS the relay outputs or OPC server-to-server connections from the various subcomponents), they will often only get the summary alarms as a consequence of the cost charged per point and/or the expense of additional hardware modules and programming services to perform the communication integration with third-party equipment. Thus, when personnel receive an alarm, they have to turn to the console of the monitoring system that “owns” the alarming device to understand what is happening.
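To make the threshold-and-escalation idea concrete, here is a generic sketch of multi-level alarming on a single data point. The class, point names, and actions are hypothetical illustrations, not the OpenData API.

```python
from dataclasses import dataclass, field

@dataclass
class Threshold:
    """An alarm threshold on one data point, with escalation levels."""
    point: str                                   # e.g., "rack12.inlet_temp_C"
    levels: list = field(default_factory=list)   # (limit, severity, action) tuples

    def evaluate(self, value):
        """Return (severity, action) for the highest level crossed, or None."""
        fired = None
        for limit, severity, action in sorted(self.levels):
            if value >= limit:
                fired = (severity, action)
        return fired

# Example: a rack inlet temperature with two escalation levels
t = Threshold("rack12.inlet_temp_C", [
    (27.0, "warning",  "email facilities"),
    (32.0, "critical", "page on-call and open a ticket"),
])

print(t.evaluate(25.0))   # None -- within limits
print(t.evaluate(28.5))   # ('warning', 'email facilities')
print(t.evaluate(33.0))   # ('critical', 'page on-call and open a ticket')
```

The useful property is that every data point, from any device, can carry its own thresholds in one uniform structure, rather than alarms living only inside the device that owns them.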
- Ease of use and flexibility to simplify the job. OpenData is designed to be user-driven: it is completely configurable by the end-user, and no coding is required, period. Learning how to use OpenData takes approximately a day. For example, OpenData enables users to add new calculations, adjust thresholds, add and remove equipment, and even add new sites. In contrast, making proactive changes to a BMS is virtually impossible to administer independently. Because the BMS is typically one component of a vendor’s total environmental control solution, the notion of “flexibility” is constrained to what is compatible with the rest of the vendor’s solution offerings. Consequently, a BMS adheres to rigid programming and calculations that frequently require a specialist to implement changes to the configuration, data sets, and thresholds.
In summary, the only thing constant in data centers is flux. Getting the right information you need—when you need it—is crucial for data center up-time and optimization. For the purpose of performance monitoring and optimization, using a BMS is more problematic and ultimately more expensive because it is not designed for broad and granular data capture, analysis and user configuration. Ask yourself: What would it take to generate an accurate PUE report solely using a BMS?
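To ground that question: the PUE report itself is simple arithmetic over granular metered points; the hard part is reaching all of those points. A minimal sketch with invented meter readings:

```python
# PUE = total facility power / IT power, summed from granular meters.
# All point names and kW figures below are invented for illustration.

it_loads_kw = {"pdu_a": 180.0, "pdu_b": 165.0}           # power delivered to IT
overhead_kw = {"chillers": 120.0, "crahs": 45.0,
               "ups_losses": 18.0, "lighting": 7.0}      # everything else

it_kw = sum(it_loads_kw.values())
total_kw = it_kw + sum(overhead_kw.values())
pue = total_kw / it_kw
print(f"PUE = {pue:.2f}")   # PUE = 1.55
```

If any of those meters is locked behind a per-point service fee or reports only a summary alarm, the "accurate PUE report" stalls before the division ever happens.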
The following table summarizes key differences between OpenData and a BMS, including the impact to data center managers.
The “X” refers to the usage-effectiveness metric du jour, whether it is PUE, pPUE, CUE, WUE, or something new.
There has been plenty of discussion of PUE and related efficiency/effectiveness metrics of late (Modius PUE Blog posts: 1, 2, 3): how to measure them, where to measure, when to measure, and how to indicate which variation was used. Improved efficiency can reduce both energy costs and the environmental impact of a data center. Both are excellent goals, but it seems to me that the most common driver for improving efficiency is a capacity problem. Efficiency initiatives are often started, or certainly accelerated, when a facility is approaching its power and/or cooling limits and the organization is facing a capital expenditure to expand capacity.
When managing a multi-site enterprise, understanding the interaction between capacity and efficiency becomes even more important. Which sites are operating most efficiently? Which sites are nearing capacity? Which sites are candidates for decommissioning, efficiency efforts, or capital expansion?
For now, I will gracefully skip past the thorny questions about efficiency metrics that are comparable across sites. Let’s postulate for a moment that a reasonable solution has been achieved. How do I take advantage of it and utilize it to make management decisions?
Consider looking at your enterprise sites on a “bubble chart,” as in Figure 1. A bubble chart enables visualization of three numeric parameters in a single plot. In this case, the X axis shows utilized capacity. The Y axis shows PUE. The size of each bubble reflects the total IT power load.
Before going into the gory details of the metrics being plotted, just consider in general what this plot tells us about the sites. We can see immediately that three sites are above 80% capacity. Of the three, the Fargo site is clearly the largest, and is operating the most inefficiently. That would be the clear choice for initiating an efficiency program, ahead of even the less-efficient sites at Chicago and Orlando, which are not yet pushing their capacity limits. One might also consider shifting some of the IT load, if possible, to a site with lower PUE and lower utilized capacity, such as Detroit.
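The triage logic the chart supports can be sketched in a few lines. The site figures below are invented to match the narrative (the original figure's data isn't reproduced here), and the scoring rules are one plausible heuristic, not a prescribed method.

```python
# Each site: (utilized capacity %, PUE, IT load kW) -- invented figures.
sites = {
    "Fargo":    (88, 2.1, 900),
    "Boston":   (84, 1.6, 400),   # hypothetical site
    "Austin":   (82, 1.7, 350),   # hypothetical site
    "Chicago":  (65, 2.0, 500),
    "Orlando":  (60, 1.9, 450),
    "Detroit":  (45, 1.4, 300),
}

# Sites pushing their capacity limits are the efficiency-program candidates;
# among those, prioritize by inefficiency (PUE) weighted by IT load.
candidates = {s: v for s, v in sites.items() if v[0] > 80}
target = max(candidates, key=lambda s: candidates[s][1] * candidates[s][2])
print(target)   # Fargo

# A crude combined score (lower is better) finds a site with both spare
# capacity and a low PUE, as a destination for shifted IT load.
relief = min(sites, key=lambda s: sites[s][0] + 100 * (sites[s][1] - 1))
print(relief)   # Detroit
```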
In this example, I could have chosen to plot DCiE (Data Center Infrastructure Efficiency) vs. available capacity, rather than the complementary metrics PUE vs. utilized capacity. This simply changes the “bad” quadrant from the upper right to the lower left; the choice is mainly a matter of preference.
Efficiency is also generally well-bounded as a numeric parameter, between 0 and 100, while PUE can become arbitrarily large. (Yes, I’m ignoring the theoretical possibility of nominal PUE less than 1 with local renewable generation. Which is more likely in the near future, a solar data center with a DCiE of 200% or a start-up site with a PUE of 20?) Nonetheless, PUE appears to be the metric of choice these days, and it works great for this purpose.
Whenever presenting capacity as a single number for a given site, one should always present the most-constrained resource. When efficiency is measured by PUE or a similar power-related metric, then capacity should express either the utilized power or cooling capacity, whichever is greater. In a system with redundancy, be sure to take that into account.
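A sketch of that rule, with invented figures and one simplifying assumption: redundancy is N+1 across equal-sized units, so only (units-1)/units of nameplate capacity is actually usable.

```python
def utilized_capacity(power_kw, power_cap_kw, cooling_kw, cooling_cap_kw,
                      redundancy="N+1", units=4):
    """Return utilized capacity (%) of the more constrained resource.

    Assumes N+1 redundancy over `units` equal units, so usable capacity
    is (units - 1) / units of nameplate. Figures are illustrative.
    """
    usable = (units - 1) / units if redundancy == "N+1" else 1.0
    power_pct = 100 * power_kw / (power_cap_kw * usable)
    cooling_pct = 100 * cooling_kw / (cooling_cap_kw * usable)
    return max(power_pct, cooling_pct)

# 600 kW drawn against 1 MW of power, 500 kW of heat against 800 kW of
# cooling, both provisioned as four units in an N+1 arrangement:
print(round(utilized_capacity(600, 1000, 500, 800), 1))   # 83.3 -- cooling-bound
```

Note that the site reports as 83% utilized even though raw power utilization is only 60% of nameplate: the redundancy derating and the cooling constraint dominate.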
The size of the bubble can, of course, also be modified to reflect total power, power cost, carbon footprint, or whatever other metric is helpful in evaluating the importance of each site and the impact of changes.
This visualization isn’t limited to comparing across sites. Rooms or zones within a large data center could also be compared, using a variant of the “partial” PUE (pPUE) metrics suggested by the Green Grid. It can also be used to track and understand the evolution of a single site, as shown in Figure 2.
This plot shows an idealized data-center evolution as would be presented on the site-performance bubble chart. New sites begin with a small IT load, low utilized capacity, and a high PUE. As the data center grows, efficiency improves, but eventually it reaches a limit of some kind. Initiating efficiency efforts will regain capacity, moving the bubble down and left. This leaves room for continued growth, hopefully in concert with continuous efficiency improvements.
Finally, when efficiency efforts are no longer providing benefit, capital expenditure is required to add capacity, pushing the bubble back to the left.
Those of you who took Astronomy 101 might view Figure 2 as almost a Hertzsprung-Russell diagram for data centers!
Whether tracking the evolution of a single data center, or evaluating the status of all data centers across the enterprise, the Data Center Performance bubble chart can help understand and manage the interplay between efficiency and capacity.
Modius OpenData has recently reached an intriguing milestone. Over half of our customers are currently running the OpenData® Enterprise Edition server software on virtual machines (VM). Most new installations are starting out virtualized, and a number of existing customers have successfully migrated from a hard server to a virtual one.
In many cases, some or all of the Collector modules are also virtualized “in the cloud,” at least when gathering data from networked equipment and network-connected power and building management systems. It’s of course challenging to implement a serial connection or tie into a relay from a virtual machine. It will be some time before all possible sensor inputs are network-enabled, so 100% virtual data collection is a ways off. Nonetheless, we consider greater than 50% head-end virtualization to be an important achievement.
This does not mean that all those virtual installations are running in the capital-C Cloud, on the capital-I Internet. Modius has hosted trial proof-of-concept systems for prospective customers on public virtual machines, and a small number of customers have chosen to host their servers “in the wild.” But the vast majority of our installations, both hardware and virtual, are running inside the corporate firewall.
Many enterprise IT departments are moving to a virtualized environment internally. In many cases, it has been made very difficult for a department to purchase new actual hardware. The internal “cloud” infrastructure allows for more efficient usage of resources such as memory, CPU cycles, and storage. Ultimately, this translates to more efficient use of electrical power and better capacity management. These same goals are a big part of OpenData’s fundamental purpose, so it only makes sense that the software would play well with a virtualized IT infrastructure.
There are two additional benefits of virtualization. One is availability. Whether hardware or virtual, OpenData Collectors can be configured to fail-over to a secondary server. The database can be installed separately as part of the enterprise SAN. If desired, the servers can be clustered through the usual high-availability (HA) configurations. All of these capabilities are only enhanced in a highly distributed virtual environment, where the VM infrastructure may be able to dynamically re-deploy software or activate cluster nodes in a number of possible physical locations, depending on the nature of the outage.
Even without an HA configuration, routine backups can be made of the entire virtual machine, not simply the data and configurations. In the event of an outage or corruption, the backed-up VM can be restored to production operation almost instantly.
The second advantage is scalability. Virtual machines can be incrementally upgraded in CPU, memory, and storage capabilities. With a hardware installation, incremental expansion is a time-consuming, risky, and therefore costly process. It is usually more cost-effective to simply purchase hardware that is already scaled to support the largest planned installation. In the meantime, you have inefficient unused capacity taking up space and power, possibly for years. On a virtual machine, the environment can be “right-sized” for the system in its initial scope.
Overall, the advantages of virtualization apply to OpenData as with any other enterprise software. Lower up-front costs, lower long-term TCO, increased reliability, and reduced environmental impact. All terms that we at Modius, and our customers, love to hear.
Speaking to a standing-room-only audience at the 2011 Uptime Symposium, Modius CEO Craig Compiano talked about the evolution of data center maturity: keeping pace with business needs. He introduced Modius’ Data Center Optimization Roadmap, which illustrates how optimization capabilities can be logically divided and accomplished in incremental steps. These steps deliver tangible benefits that continue to be leveraged as data center capabilities mature and become more strategically relevant to the enterprise they support.
The value of this roadmap immediately resonates with anyone who has worked on a long-term IT project—like managing a data center, for instance. All too often failures occur because the project team did not have the foresight to discern how their technology implementation might evolve over time. Consequently, early investments become outmoded in about 18 months, and the stakeholders are confronted with rapidly diminishing returns on their investment, if they are ever fully realized at all.
Instead of thinking about adding functionality and capacity in terms of incremental hardware (e.g., adding more servers), consider maximizing the capacity of your current investment, such that resources are more economically utilized within the existing infrastructure (e.g., identifying stranded capacity). Let’s take a closer look at the Data Center Optimization Roadmap to see how this can be accomplished.
Modius sees the operational maturity of the data center in three stages. At each stage, the operational maturity of the data center increases with the level of strategic relevance it provides to the enterprise.
Stage 1 is device-centric: Continuous optimization requires gaining visibility of data center assets—from racks to CRACs—including those assets at different sites. Whether assets are being monitored from across the hall or across the continent, granular visibility into each device is necessary to understand how resources are being utilized by themselves and within the larger system that is the network.
The only way to accomplish this is by measuring where, when, and at what rate power is being consumed. Device-level visibility enables us to eke out every kW of power, to maintain safe yet miserly cooling levels, and to ensure every square foot of the data center floor is being used effectively. (Walking around the data center floor and spot-checking metered readings is no longer effective.)
With this device-level insight, you can identify tactical ways of maximizing utilization or reducing energy consumption. And, as a result of more efficient use of resources, businesses can defer capital expenses.
Stage 2 is business user-centric: The second stage in advancing data center optimization requires the alignment of data center performance information with the business user’s requirements. (By business users, we mean either internal users, such as a department or a cost center at an enterprise, or external users, such as the customers at a co-lo facility.) This level of optimization can only be achieved once the mechanisms are in place to ensure visibility of data center assets by their end users, per Stage 1. For example, monitoring and decision support tools must have the ability to logically sort and filter equipment by business groups, rather than the physical location of equipment in a data center (e.g., racks, rows or zones). Likewise, these tools must be flexible to accommodate business-driven time periods, rather than time periods convenient only to data center operations.
By enabling this business user-centric view—that is, by making data center operational intelligence meaningful to the end-users of the data center—IT and Facility personnel can now engage business users in a productive dialog about how their business requirements impact data center resources. Now, data center managers can begin to optimize throughput and productivity in a way that is meaningful to the business, which significantly advances the strategic relevance of the data center to the enterprise.
Stage 3 is enterprise-centric: The third stage in advancing data center optimization requires integrating data center operational intelligence with enterprise business intelligence (BI). We are not suggesting anything complicated or unwieldy; only that by including data center performance and resource data, enterprises can paint a more complete picture of the true cost of doing business. By aligning “back end” data center operations with “front end” enterprise business processes, we can understand how market pressures impact the infrastructure, which in turn helps improve business continuity and mitigate risk.
For example, product and marketing managers can now have visibility into the data center resources supporting their web services. They can drill down to their back-office systems and account for the commissioning and decommissioning of servers, plus the energy and human capital required to run and manage those services. Another example: supply-chain or sourcing managers can now see where, and at what rate, energy is being consumed across data center operations, enterprise-wide. This enables them to make better decisions about where to source energy, in addition to forecasting how much is needed.
These improvements manifest as enterprise agility: the ability to respond rapidly to dynamic market and economic pressures. It is at this stage of maturity that a data center can have a profound impact on whether a business can compete and win in the marketplace.
Marina Thiry, Director of Product Marketing
Early last year, Gartner published a short research note that has since had an unexpectedly significant impact on the vocabulary of data center management professionals. Prior to March 2010, when Dave Cappuccio published “Beyond IT,” the term ‘data center infrastructure management’ (or DCIM) was rarely used. Instead, the most common terms describing software to manage power and cooling infrastructure were ‘data center monitoring,’ ‘data center asset tracking,’ or ‘BMS for data center.’ We know this because here at Modius we use an inbound analytics application to track the search terms internet users type to find our web site.
By the end of last month (April 2011), the simple search term DCIM had outpaced all of them! Go to any web site keyword-tracking service (e.g., www.hubspot.com) and see for yourself. In April, there were over 10,000 queries for DCIM on one of the major search engines alone. As a longtime enterprise software vendor, I find it hard to remember ever seeing a new title for a software category emerge so suddenly and so prominently. Now everyone uses it, and every week, it seems, a new vendor claims DCIM credentials.
From our perspective here at Modius, we find this amusing, because we have been offering this kind of functionality in our flagship software product, OpenData, since long before the term DCIM was around. Nonetheless, we now find ourselves in a maelstrom of interest as this new software label gains more buzz and credibility. So what exactly is DCIM?
The graphic below is my summary of the major points from the original research note. Note that DCIM was originally positioned as filling a gap between the major categories of IT Systems Management and Building Management or Building Automation Systems.
As more and more software vendors have jumped on the DCIM bandwagon, we have noticed that four distinct sub-categories, or segments, have emerged:
- Monitoring tools for centralized alarm management and real-time performance tracking of diverse types of equipment across the power and cooling chains (e.g., Modius OpenData)
- Calendar-based tools for tracking equipment lifecycles (i.e., particularly with respect to recording original shipment documentation, maintenance events, depreciation schedules, etc.)
- Workflow tools specifically designed around data center planning and change management (e.g., “If I put this server in this rack, what is the impact on my power & cooling systems?”)
- Tools for control and automation of cooling sub-systems (e.g., usually computer room air conditioning systems or air-handling units)
At Modius, we focus on segment #1. We find that connecting to a diverse population of power and cooling equipment from a range of vendors is a difficult task in and of itself. Not only are the interface challenges non-trivial (e.g., translation across multiple communication protocols), but the data storage and management problems associated with collecting this much data are also significant.
Moreover, we are puzzled at the number of segment #3 applications which position themselves as DCIM tools, yet don’t have any real-time data capabilities of any significance. We believe for those systems to be the most effective, they really need to leverage a monitoring tool in segment #1.
So, in conclusion--and not surprisingly--we define the DCIM software category as a collection of different types of tools for different purposes, depending on your business objectives. But one point we like to stress to all of our customers is that real-time performance tracking is the foundation of this category, and we are looking either to build out new capabilities over time or to partner with other software companies pursuing other areas of DCIM functionality. After all, improving the performance of a facility is the ultimate end goal, and before we do anything else, we can’t manage what we can’t measure.
PUE in an Imperfect World
Last week I started discussing the instrumentation and measurement of PUE when the data center shares resources with other facilities. The most common shared resource is chilled water, such as from a common campus or building mechanical yard. We looked at the simple way to allocate a portion of the power consumed by the mechanical equipment to the overall power consumed by the data center.
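The allocation just described can be sketched in a few lines. All figures below are invented for illustration; with full sub-metering you would read them directly from your meters.

```python
# Allocate shared mechanical-yard power by the data center's share of the
# chilled-water (cooling) load, then fold it into PUE. Invented figures.

dc_it_kw = 400.0           # metered IT load
dc_other_kw = 60.0         # in-room fans, lighting, UPS losses, etc.
mech_yard_kw = 250.0       # chillers, pumps, cooling tower (shared plant)

dc_cooling_kw = 470.0      # metered chilled-water load to the data center
campus_cooling_kw = 940.0  # total chilled-water load on the plant

dc_share = dc_cooling_kw / campus_cooling_kw
dc_total_kw = dc_it_kw + dc_other_kw + mech_yard_kw * dc_share
pue = dc_total_kw / dc_it_kw
print(f"PUE = {pue:.2f}")   # (400 + 60 + 125) / 400 -> PUE = 1.46
```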
The approach there assumed perfect sub-metering of both the power and chilled water, for both the data center and the mechanical yard. Lovely situation if you have it or can afford to quickly achieve it, but not terribly common out in the hard, cold (but not always cold enough for servers) world. Thus, we must turn to estimates and approximations.
Of course, any approximations made will degrade the ability to compare PUEs across facilities--already a tricky task. The primary goal is to provide a metric to measure improvement. Here are a few scenarios that fall short of the ideal, but will give you something to work with:
- Can’t measure data-center heat load, but have good electrical sub-metering. Use electrical power as a substitute for cooling load. Every watt going in ends up as heat, and there usually aren’t too many people in the space routinely. Works best if you’re also measuring the power to all other non-data-center cooled space. The ratio of the two will get you close to the ratio of their cooling loads. If there are people in a space routinely, add 1 kWh of load per head per 8-hr day of light office work.
- Water temperature is easy, but can’t install a flow meter. Many CRAHs control their cooling power through a variable valve. Reported “Cooling Load” is actually the percentage opening of the valve. Get the valve characteristics curve from the manufacturer. Your monitoring system can then convert the cooling load to an estimated flow. Add up the flows from all CRAHs to get the total.
- Have the heat loads, but don’t know the mechanical yard’s electrical power. Use a clamp-on hand meter to take some spot measurements. From these you can calculate a Coefficient of Performance (COP) for the mechanical yard, i.e., the cooling power delivered per unit of electrical power consumed. Try to measure it at a couple of different load levels, as the real COP will depend on the % load.
- I’ve got no information about the mechanical yard. Not true. The control system knows the overall load on the mechanical yard. It knows which pumps are on, how many compressor stages are operating, and whether the cooling-tower fan is running. If you have variable-speed drives, it knows what speed they’re running. You should be able to get from the manufacturer at least a nominal COP curve for the tower and chiller and nominal power curves for pumps and fans. Somebody had all these numbers when they designed the system, after all.
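The spot-measurement scenario above lends itself to a small calculation. Here’s a minimal sketch, with invented readings (none from a real facility), of deriving a COP curve from two clamp-on measurements and using it to estimate the mechanical power attributable to a known data-center heat load:

```python
# Estimate mechanical-yard electrical power from spot COP measurements.
# All numbers below are illustrative placeholders, not real-site data.

# Spot measurements: (fraction of rated cooling load, electrical kW, cooling kW)
spot_readings = [
    (0.4, 95.0, 380.0),   # clamp-on meter reading at ~40% load
    (0.7, 140.0, 665.0),  # reading at ~70% load
]

# COP = cooling power delivered / electrical power consumed
cops = [(load, cool_kw / elec_kw) for load, elec_kw, cool_kw in spot_readings]

def cop_at(load_fraction):
    """Linearly interpolate/extrapolate COP between the two spot readings."""
    (l0, c0), (l1, c1) = cops
    return c0 + (c1 - c0) * (load_fraction - l0) / (l1 - l0)

# Given a known data-center heat load and the yard's current load level,
# estimate the electrical power attributable to cooling that load.
dc_heat_kw = 500.0        # data-center cooling load (from h = C*q*dT, say)
yard_load_fraction = 0.6  # current overall load on the mechanical yard

estimated_elec_kw = dc_heat_kw / cop_at(yard_load_fraction)
print(f"COP at {yard_load_fraction:.0%} load: {cop_at(yard_load_fraction):.2f}")
print(f"Estimated mechanical power for the data center: {estimated_elec_kw:.0f} kW")
```

Two readings only support a straight-line fit; more spot measurements at different load levels would justify a better curve, but even this crude version beats a single fixed COP.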
Whatever number you come up with, perform a sanity check against the DOE’s DCPro online tool. Are you in the ballpark? Heads up, DCPro will ask you many questions about your facility that you may or may not be prepared to answer. For that reason alone, it’s an excellent exercise.
It’s interesting to note that even the Perfect World of absolute instrumentation can expose some unexpected inter-dependencies. Since the efficiency of the mechanical yard depends on its overall load level, the value of the data-center PUE can be affected by the load level in the rest of the facility. During off hours, when the overall load drops in the office space, the data center will have a larger share of the chilled-water resource. The chiller and/or cooling-tower efficiency will decline at the same time. The resulting increase in instantaneous data center PUE does not reflect a sudden problem in the data center’s operations, though it might suggest an opportunity for efficiency improvements in the overall control strategy.
PUE is a very simple metric, just a ratio of two power measurements, but depending on your specific facility configuration and level of instrumentation, it can be remarkably tricky to “get it right.” Thus, the ever-expanding array of tier levels and partial alternative measurements. Relatively small incremental investments can steadily improve the quality of your estimates. When reporting to management, don’t hide the fact that you are providing an estimated value. You’ll only buy yourself more grief later when the reported PUE changes significantly due to an improvement in the calculation itself, instead of any real operational changes.
The trade-off in coming to a reasonable overall PUE is between investing in instrumentation and investing in a bit of research about your equipment and the associated estimation calculations. In either case, studying the resulting number as it varies over the hours, days, and seasons can provide excellent insight into the operational behavior of your data center.
Last week I wrote a little about measuring the total power in a data center when all of the facility infrastructure is dedicated to supporting the data center. Another common situation is a data center in a mixed environment, such as a corporate campus or an office tower, in which the facility resources are shared. The most common shared resource is the chilled-water system, often supplied from a common “mechanical yard.” As difficult as it sometimes can be to set up continuous power monitoring for a stand-alone data center, it is considerably trickier when the mechanical yard is shared. Again, simple in principle, but often surprisingly painful in practice.
One way to address this problem is to use The Green Grid’s partial PUE, or pPUE. While the number should not be used as a comparison against other data centers, it provides a metric to use for tracking improvements within the data center.
This isn’t always a satisfactory approach, however. Given that there is a mechanical yard, it’s pretty much guaranteed to be a major component of the overall non-IT power overhead. Using a pPUE that covers only the remaining system, without measuring--or at least estimating--the mechanical yard’s contribution, masks both the overall impact of the data center and the impact of any efficiency improvements you make.
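To see how much the yard can mask, here’s a toy comparison--all figures invented--of a pPUE that stops at the room boundary against a full PUE that includes the yard’s allocated share:

```python
# Toy comparison of partial PUE (mechanical yard excluded) vs full PUE.
# All figures are invented for illustration.

it_kw = 1000.0         # IT equipment load
ups_dist_kw = 120.0    # UPS + distribution losses inside the data center
lighting_kw = 30.0     # other in-room overhead
mech_yard_kw = 250.0   # data center's allocated share of the mechanical yard

# pPUE over the measured boundary (everything except the mechanical yard)
ppue = (it_kw + ups_dist_kw + lighting_kw) / it_kw

# Full PUE including the mechanical-yard allocation
pue = (it_kw + ups_dist_kw + lighting_kw + mech_yard_kw) / it_kw

print(f"pPUE (yard excluded): {ppue:.2f}")   # 1.15
print(f"PUE  (yard included): {pue:.2f}")    # 1.40
```

With these (made-up) numbers, the yard accounts for more of the overhead than everything inside the room combined; halving the in-room losses would barely move the full PUE, which is exactly the signal the pPUE hides.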
There are a number of ways to incorporate the mechanical yard in the PUE calculations. Full instrumentation is always nice to have, but most of us have to fall back on approximations. Fundamentally, you want to know how much energy the mechanical yard consumes and what portion of the cooling load is allocated to the data center.
The Perfect World
In an ideal situation, you have the mechanical yard’s power continuously sub-metered—chillers, cooling towers, and all associated pumps and fans. Not unusual to have a single distribution point where the measurement can be made. Perhaps even a dedicated ATS. Then, for the ideal solution, all you need is sub-metering of the chilled water going into the data center.
The heat load, h, of any fluid cooling system can be calculated from the temperature change, ∆T, and the overall flow rate, q: h = Cq∆T, where C is a constant that depends on the type of fluid and the units used. As much as I dislike non-metric units, it is easy to remember that C=500 when temperature is in °F and flow rate is in gal/min, giving heat load in BTU/h. (Please don’t tell my physics instructors I used BTUs in public.) Regardless of units, the total power to allocate to your data center overhead is P_dc = P_mech (h_dc / h_mech). Since what matters is the ratio, the constant C cancels out and you have P_dc = P_mech ((q∆T)_dc / (q∆T)_mech).
You’re pretty much guaranteed to have the overall temperature and flow data for the main chilled-water loop in the BMS system already, so you have (q∆T)_mech. Much less likely to have the same data for just the pipes going in and out of your data center. If you do, hurrah, you’re in The Perfect World, and you’re probably already monitoring your full PUE and didn’t need to read this article at all.
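In code, the allocation is a one-liner once you have the four numbers. Here’s a minimal sketch with invented figures (the flows, temperatures, and yard power are placeholders, not from any real site):

```python
# Allocate mechanical-yard power to the data center using h = C*q*dT.
# C cancels in the ratio, so raw q*dT products suffice.
# All numbers are illustrative placeholders.

# Main chilled-water loop (from the BMS)
q_mech = 1200.0   # gal/min, total loop flow
dt_mech = 12.0    # °F, return minus supply temperature

# Data-center branch (from sub-metering or an ultrasonic flow meter)
q_dc = 400.0      # gal/min
dt_dc = 10.5      # °F

p_mech_kw = 310.0  # sub-metered mechanical-yard electrical power

# P_dc = P_mech * (q*dT)_dc / (q*dT)_mech
p_dc_kw = p_mech_kw * (q_dc * dt_dc) / (q_mech * dt_mech)
print(f"Mechanical power allocated to the data center: {p_dc_kw:.1f} kW")
```

Note that the units of q and ∆T only need to be consistent between the two measurements, since they appear in a ratio.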
Perfect and You Don’t Even Know It
Don’t forget to check the information from your floor-level cooling equipment as well. Some of them do measure and report their own chilled-water statistics, in which case no additional instrumentation is needed. In the interest of brand neutrality, I won’t go into specific names and models in this article, but feel free to contact me with questions about the information available from different equipment.
If you’re not already sub-metered, but you have access to a straight stretch of pipe at least a couple of feet long, then consider installing an ultrasonic flow meter. You’ll need to strap a transmitter and a receiver to the pipe, under the insulation, typically at least a foot apart along the pipe. No need to stop the flow or interrupt operation in any way. Either inflow or outflow is fine. If they’re not the same, get a mop--you have other, more pressing problems; focus on leak detection, not energy monitoring.
If the pipe is metal, then place surface temperature sensors directly on the outside of the inflow and outflow pipes, and insulate them well from the outside air. Might not be the exact same temperature as the water, but you can get very close, and you’re really most concerned about the temperature difference anyway. For non-metal pipes, you will have to insert probes into the water flow. You might have available access ports, if you’re lucky.
The Rest of Us
Next week I’ll discuss some of the options available for the large population of data centers that don’t have perfect instrumentation, and can’t afford the time and/or money to purchase and install it right now.