Posted by Donald Klein on Tue, Aug 31, 2010 @ 01:26 PM
You may have asked yourself, “Why do I need another monitoring and reporting product if I already have five?” True, you most likely don’t need another monitoring product, but rather what you really, really need is a system to link these systems together.
Why? Because several different monitoring systems operating in their own silos don’t help you improve your business. Instead, what you need to do is build business logic for optimization and capacity expansion strategies, as well as decrease the time spent repairing problems.
To do this effectively, you need a super system: what we call the “mother of all monitors”. This is a system that can not only collect a superset of monitoring data from different point solutions, but also connect directly to other devices that may not currently be monitored (e.g. generators, transfer switches, breaker panels, etc.). And it needs to do this with the kind of scalability, analytics, and ability to integrate with other management systems that you would expect from an enterprise-class tool.
Here at Modius, we are already seeing this happen in the field. There is a current trend among data center managers to link their monitoring platforms together so that they have one common central platform from which to view and navigate to distributed monitoring systems. We have designed our application, OpenData, with a “Monitor of Monitors” architecture in order to provide operators with a single pane of glass into both the facilities infrastructure (including power chain, cooling, and redundancies) and IT system-level information.

The key problems solved are:
- System-level metrics - Link system-level IT metrics to facilities capacities
- Troubleshooting - Accelerate troubleshooting and fault dependency mapping
- Alarm management - Reduction in “noise-level” alarms
- Analytics - Building business-level metrics (BI) for capacity, efficiency, etc.
- Controls-based integrations - Improved automation based on broad data capture
Here is some more detail on each of these benefit areas …
1) System-level metrics
Typically, IT system-level metrics are collected by system management tools, which provide logical properties based on MIB-2 or the Host MIB (RFC 1514). This gives IT managers data on the operating health of the equipment and on capacity related to CPU, disk, I/O, and memory. What management systems typically do not provide, however, is how facilities (power, cooling, etc.) impact the cost of operations, or how much cooling is actually optimal.
By linking IT system-level metrics with unified facilities monitoring through a single portal, higher level business and operating metrics can be formulated to reduce the cost of operations by tuning available cooling resources to the actual needs of each server instance or other IT gear.
2) Troubleshooting
By consolidating event and performance data into a single view, you can quickly trace a cascade of failures and see how facility equipment is involved. An example could be a PDU failure and which devices are in the path of the affected circuit. In redundant environments there will be a fail-over to the second PDU, but in most cases a successful hand-off is difficult to predict. By linking facilities systems (BMS, PDUs, UPS, genset) with system-level IT information, the relationships are documented, visualized, correlated, and actively monitored.
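The "what is in the path of the affected circuit" question can be sketched as a small graph walk. This is only an illustration, not how OpenData implements it: the topology, the device names, and the `impact_of_failure` helper are all hypothetical.

```python
from collections import deque

# Hypothetical power chain: each key feeds the devices listed in its value.
POWER_FEEDS = {
    "UPS-A": ["PDU-A1"],
    "UPS-B": ["PDU-B1"],
    "PDU-A1": ["server-01", "server-02"],
    "PDU-B1": ["server-02", "server-03"],  # server-02 is dual-corded
}


def reachable(starts, feeds, blocked=frozenset()):
    """Return every node reachable from `starts` without passing through `blocked`."""
    start = [s for s in starts if s not in blocked]
    seen = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for child in feeds.get(node, []):
            if child not in seen and child not in blocked:
                seen.add(child)
                queue.append(child)
    return seen


def impact_of_failure(failed, feeds):
    """Classify devices downstream of `failed` by whether another feed survives."""
    children = {c for kids in feeds.values() for c in kids}
    roots = [n for n in feeds if n not in children]  # top of each power chain
    still_powered = reachable(roots, feeds, blocked={failed})
    in_path = reachable([failed], feeds) - {failed}
    return {
        "lost_power": sorted(in_path - still_powered),
        "lost_redundancy": sorted(in_path & still_powered),
    }


print(impact_of_failure("PDU-A1", POWER_FEEDS))
```

In this toy topology, a failure of PDU-A1 leaves server-01 without power, while server-02 keeps running on its second feed but has lost its redundancy.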
3) Reduction in rogue alarms
By linking point solutions and consolidating event-level data, a complete historical view may be achieved. Through this historical view, alarm flows can be optimized and reduced operationally. An example would be a BMS that receives alarms at such a rate that they become noise, because they are not easily tuned. Contextually, it is also very difficult to tell what a typical operating condition is, because the history is not long or broad enough to proactively set truly meaningful thresholds or deviations.
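One common way to turn history into thresholds is to alarm only on readings that fall outside a band around the historical mean. The sketch below assumes a simple mean plus or minus `k` standard deviations rule; the function names, the sample temperatures, and the choice of `k` are illustrative assumptions, not a description of any product's alarm logic.

```python
from statistics import mean, stdev


def baseline_thresholds(history, k=3.0):
    """Derive (low, high) alarm thresholds as mean +/- k sample standard deviations."""
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def out_of_band(readings, low, high):
    """Keep only the readings that fall outside the learned band."""
    return [r for r in readings if r < low or r > high]


# Hypothetical supply-air temperatures (deg C) collected during normal operation.
history = [22.0, 22.5, 21.5, 22.0, 22.0]
low, high = baseline_thresholds(history)

# Only the readings that deviate from the learned norm raise an alarm.
print(out_of_band([22.4, 23.5, 20.0], low, high))
```

The longer and broader the history, the more meaningful the band becomes, which is the argument for consolidating the data in one place.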
4) BI-based business metrics
With a single point of consolidation, you can quickly build reports and dashboards across platforms. An example would be a stock-chart-type view in which you can visualize a period of time. This is used to determine deviations from the norm which might cause downtime or affect operational performance. With several independent systems it becomes impossible to correlate based on time or carry enough history to gain the insight necessary to prevent a potential outage.
5) Single application launch point
The “Monitor of Monitors” architecture brings a unified structure for gaining access to operational and control systems. An example use case would be to identify cooling requirements based on broad-based data capture (e.g. an array of environmental sensors at the rack level, or real-time server-inlet temperatures taken directly from the servers themselves) and then tie the resulting performance metrics into building control systems to tune VFDs and cooling output. Integrating the BMS application directly with the monitoring system provides the real-time data and the feedback mechanism required to optimize cooling and cost without overheating the IT equipment.
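The feedback mechanism can be pictured as a simple control step: take the hottest server-inlet temperature, compare it with a setpoint, and nudge the fan speed accordingly. The sketch below is a bare proportional step under stated assumptions; the function name, setpoint, gain, and speed limits are hypothetical values chosen for illustration, not recommended operating parameters, and a real BMS integration would involve far more safeguards.

```python
def next_fan_speed(current_pct, inlet_temps_c, setpoint_c=25.0,
                   gain=5.0, min_pct=30.0, max_pct=100.0):
    """Return a new VFD speed (percent) from the hottest inlet temperature.

    One proportional step: speed changes by `gain` percent per degree C
    of error, then is clamped to the allowed range.
    """
    error = max(inlet_temps_c) - setpoint_c  # positive when too hot
    proposed = current_pct + gain * error
    return max(min_pct, min(max_pct, proposed))


# Hottest inlet is 2 deg C over the setpoint, so the fan speeds up.
print(next_fan_speed(50.0, [24.0, 27.0]))
# All inlets are well under the setpoint, so the fan slows to its floor.
print(next_fan_speed(50.0, [20.0, 21.0]))
```

Driving the loop from the hottest inlet rather than the average is a conservative choice: it trades some efficiency for the guarantee that no rack is left overheating.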
Conclusion
If you would like more detail on how Modius can help with any of the above topic areas, please reach out directly using info@modius.com, and we will be happy to set up an appointment.
Posted by Donald Klein on Mon, Aug 23, 2010 @ 06:20 PM
Here at Modius, we are seeing an increasing number of requests among Co-locations (Co-los) and Managed Service Providers (MSPs) to help them capture more robust and accurate power measurement data. In one sense, this trend is nothing new because all data centers—whether captive inside an enterprise or an outsourced service provider—need accurate power measurement, typically for improving:
- Capacity optimization
- Energy efficiency
- Uptime assurance
But we find that Co-los and MSPs have a special need that takes power reporting to the next level: providing disaggregated energy consumption and power usage data by customer at a very granular level, often by rack or even a group of servers. Typically, they need detailed power metering for each customer, principally for:
- More accurate customer billing
- Detailed status reporting to the customer (in real-time) through a customer portal
Customers are now wanting this information not only to be sure their power bills are accurate, but also to try and determine their available power capacity, usage trends, and accurate data to support reporting on PUE and Carbon management. Or even more of a challenge, they need to unify data across different locations because their customers are spread across several different buildings.
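At its simplest, disaggregating consumption by customer is a roll-up of rack-level meter readings against a rack-to-customer map. The sketch below assumes interval readings already expressed in kWh; the rack IDs, customer names, tariff, and function names are all hypothetical.

```python
from collections import defaultdict


def energy_by_customer(readings, rack_owner):
    """Sum kWh per customer from (rack_id, kwh) interval readings."""
    totals = defaultdict(float)
    for rack_id, kwh in readings:
        totals[rack_owner[rack_id]] += kwh
    return dict(totals)


def bill(totals, rate_per_kwh):
    """Convert per-customer kWh into a charge, rounded to cents."""
    return {customer: round(kwh * rate_per_kwh, 2)
            for customer, kwh in totals.items()}


# Hypothetical rack assignments and metered intervals.
rack_owner = {"R1": "acme", "R2": "acme", "R3": "globex"}
readings = [("R1", 10.0), ("R2", 5.0), ("R1", 2.5), ("R3", 4.0)]

totals = energy_by_customer(readings, rack_owner)
print(totals)
print(bill(totals, rate_per_kwh=0.12))
```

The same per-customer totals can feed both the invoice and a customer portal, and because each reading carries only a rack ID, racks in different buildings roll up to the same customer without special handling.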
Theoretically, some of this data can be captured from the servers. In fact, with distributed systems management tools, reporting on server energy consumption (at the server level) is relatively commonplace. But this data source is incomplete. What if you want to factor in cooling and other related energy consumption? Or what if you also want environmental reporting for bottom/middle/top for each rack? Now, this is much more challenging …
In general, most Co-los don’t have access to the server instrumentation data at the chassis level. And in terms of power and cooling, we’ve found that most co-location providers are still struggling to unify a broad range of equipment into a single monitoring fabric and extend the framework across disparate systems and locations.
Happily, there are several Co-lo operators taking the initiative by unifying their monitoring of power and cooling equipment with a real-time data center monitoring and measurement system like Modius OpenData. And many are augmenting power and cooling data by installing new breaker-level metering and. Moreover, many are even using this data to create centralized customer portals to provide their customers with reporting and a real-time view of their power capacity and consumption. Further, they are adding a layer of analytics and baselines on energy efficiency and reliability.
As the industry becomes more competitive, service providers cannot continue with business as usual. Many Co-los and MSPs have taken this initiative so that they can differentiate themselves, have better visibility on how they can extend their internal resources, and provide PUE and Carbon reporting services to their customers.
We believe the underlying driver behind this trend is the fact that an increasing number of corporations and enterprises with large IT departments are being tasked by their senior management to provide comprehensive reports on power usage and relative efficiency, regardless of whether the enterprise owns its own data center facilities or outsources part of its infrastructure.
Be it end-users, Co-los or MSPs, everyone is increasingly looking to software providers like Modius to solve the comprehensive measurement and reporting problem, and we believe they are finding that Modius OpenData is the right product at the right time and value.
Posted by Mark Harris on Mon, Jun 07, 2010 @ 03:56 PM
OK we have heard about the 'Greening' world around us, the price of power, the costs of cooling, the need for energy efficiency and ultimately The Green Grid's "PUE" KPI for a few years now. What originally sounded like a great way to definitively calculate the energy efficiency of getting IT work done, still seems like a great way to do so, but also seems like just the START of the journey...
Remembering that a lot of work went into the creation of PUE, it is considered by many to be a great place to start TODAY towards the goal of optimizing energy usage. Remember, you can't optimize that which you don't understand. That said, PUE may not be viewed down the road as the single best metric, but for now, it is MUCH better than what we had just a few years ago. Nothing. PUE is a metric that is well understood and can be determined for ANY END-USER that chooses to calculate it. It can be calculated in real-time using a fairly small investment in time and resources.
Today the EPA took the next step to allow end-users to compare their energy conservation and efficiency efforts to those of their peers. Basically, any company that wishes to can audit their PUE, document their findings, hire a PROFESSIONAL (recognized audit partner) to verify their claims, and then submit to the EPA. Those data centers that rank in the top 25% of their peer group will be considered as having an 'Energy Star' compliant data center. (And the bragging rights that go with the star).
So what does this mean to the industry? Well, I think we'll hear a lot of companies applaud the move by the EPA for Energy Star data center recognition. Many companies have worked hard to eliminate energy inefficiencies and love telling the world about their successes. The new Energy Star rating will allow this message to be even louder, since it will provide some apples-to-apples comparison. It supports the ROI measurements for these efforts. Peers will get a sense of what is POSSIBLE from people running like environments. Some CIOs and CFOs will stand up and say, "Why is my closest competitor X% more energy efficient making the same type of widget?"
We will also see a bunch of complaining about the use of 'PUE' as the main KPI used in the determination for Energy Star. The more vocal opponents will argue that PUE as a KPI is flawed from the start or meaningless, and can be manipulated or contrived by the unscrupulous. In turn, we'll see a resurgence of pushes for "DCeP" (or one of the 10 proposed proxies) as a better KPI from these nay-sayers. I say it's good to see more energy on KPIs like DCeP, but we need some forcing function, NOW! Remember, the goal is to get companies to ACT NOW... mid course corrections welcome!
I think PUE was a great first step. I think Energy Star for Servers and then Energy Star for Data Centers is a great SECOND step(s), but why would we be naive enough to think all of this would stop there?
Energy Star for Data Centers is circa 2010. Perhaps the folks at EPA will have an Energy-Star-PLUS recognition in 2012 (they could call it "Energy Star for Data Centers 2012" or similar nomenclature) based upon any potentially agreed upon proxy for DCeP. Or perhaps they would use a different metric/KPI? Not sure. But what I am sure of is that we need to force ENERGY EFFICIENCY PROGRESS NOW. For companies to stand up, articulate their best practices and be tested and challenged by their constituents. We all need to LISTEN and LEARN from each other.
Status quo will no longer work. As an industry we need to push the design and re-architectures of existing space to be highly efficient. Too much waste in the past and nobody really understood it. We need to do the hard work, build containment aisles or modify air flow based on inlet-temp or overall pressure, we need to install sensors and monitoring, install spot cooling, refresh older hardware servers, etc. etc etc.
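For anyone who has not run the numbers, PUE is simply total facility energy divided by the energy delivered to the IT equipment, measured over the same period. Here is a minimal worked example; the function name and the kWh figures are made up for illustration.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical month: 1,800 kWh drawn by the whole facility,
# of which 1,000 kWh reached the IT gear.
value = pue(1800.0, 1000.0)
print(f"PUE = {value:.2f}")
# Share of facility energy that goes to cooling, power conversion, lighting, etc.
print(f"Overhead share = {1 - 1 / value:.0%}")
```

A PUE of 1.0 would mean every kilowatt-hour entering the building reaches the IT equipment; anything above that is overhead.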
The energy efficiency work has just started, and it's a very long road ahead. Let's stay on track and work towards a common goal. Doing more with less, making every KiloWatt count, reducing the cost of doing business. Remember, we are all on the same planet, using the same resources.
The EPA's "Energy Star for Data Centers" 2010 is a GOOD thing...
Posted by Mark Harris on Fri, Jun 04, 2010 @ 01:34 PM
I was flipping through the 2007 Report to Congress issued by Jonathan Koomey ("Report to Congress on Server and Data Center Energy Efficiency Public Law 109-431") and on Page 10 came across a very easy to read but impactful diagram which provides some great insight into the future of the IT industry, and can be discussed in terms of end-users as well.

I suspect that this chart could be applied more or less to ANY individual company in their quest for energy efficiency. If there is some level of 'greening' at play in a corporation, then this chart can be a crystal ball into your 5 possible futures.
You can see from the diagram varying impacts on energy consumption, (starting at the top) going from taking NO NEW ACTION, all the way through DOING EVERYTHING POSSIBLE. I would suggest today that most companies are somewhere approaching the "Improved Operations Scenario". If you look above, you'll see this green curve essentially takes the overhead out of operations, but does very little to have any significant long term effect on the SLOPE of the curve.
In the chart, the "State of the Art Scenario" is a good depiction of what is POSSIBLE (expected) if all business processes are tuned and all equipment is refreshed with the latest. This would create a real-time infrastructure ("RTI" as defined by Gartner) that self-tunes based upon demand. Most importantly... It would also lower the most basic cost per transaction. A CPU cycle would actually cost less!
These are very exciting times ahead...
Posted by Mark Harris on Thu, Apr 29, 2010 @ 07:09 AM
Understanding the power consumption of any given discrete device in the data center may be accomplished in a number of ways including measurement and modeling technologies. While many approaches have been tried over the years, today there are four main ways to determine the power being consumed.
- Faceplate Values. Each manufacturer places a service value ‘plate’ which identifies things like model and serial numbers, manufacturer’s contact information, safety certifications and power requirements. The power requirements are usually listed as the voltage range acceptable for the included power supplies, as well as the maximum current to be drawn by any configuration and working condition of the device. For a complex device, this faceplate power consumption value is listed as the maximum possible and may be 4 or 5 times the actual power being drawn in normal operating conditions. Since this is printed information required on every device, it adds essentially no administrative cost.
- iPDU Monitoring per outlet. Newer environments have begun to deploy measured or metered power distribution devices within each rack. These iPDUs have enough intelligence to allow network inquiries to be made of the iPDU itself, with the most granular of these devices offering discrete values for the power being consumed PER-OUTLET. These PER-OUTLET iPDUs make ideal sources of raw power consumption values, although they tend to be costly.
- Monitoring via operating system service. Most modern hardware telco, server and switch designs and their associated operating systems include what is known as ‘System Services’ or ‘Daemons’ which are intended to allow access to granular operating information. In most modern cases, device drivers are included in the standard software builds which enable power consumption metrics to be read from the actual power supply unit, assuming that the power supply was instrumented in hardware when the device was manufactured. In cases where this hardware instrumentation exists, there are no additive costs to gaining access to the power consumption for these devices across an IT infrastructure.
- Modeling the device. It could be argued that a tremendous portion of the installed IT equipment that was purchased more than 3 years ago has little or no instrumentation capability in hardware. In these cases it is impossible to programmatically read power consumption metrics. Instead, one approach has been to estimate the power consumed from a model of the device's hardware configuration. Mostly for servers, it could be argued that a good approximation for a device can be calculated by knowing an inventory of components inside each device, and then the power consumption of each of those components. Coupled with some workload information, a fair assessment of consumption can be derived.
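The modeling approach in the last item can be sketched in a few lines: sum an estimate for each component in the inventory, scaled by workload. The sketch below assumes each component draws somewhere between an idle and a peak wattage and interpolates linearly with utilization. The wattage table and the linear assumption are illustrative only, and power-supply conversion losses are ignored.

```python
# Hypothetical (idle_watts, peak_watts) per component type.
COMPONENT_WATTS = {
    "cpu": (15.0, 95.0),
    "dimm": (2.0, 4.0),
    "hdd": (5.0, 9.0),
}


def estimate_power(inventory, utilization):
    """Estimate device draw in watts from a component inventory.

    `inventory` maps component type -> count; `utilization` is 0.0 to 1.0.
    Each component is interpolated linearly between its idle and peak draw.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    total = 0.0
    for component, count in inventory.items():
        idle, peak = COMPONENT_WATTS[component]
        total += count * (idle + (peak - idle) * utilization)
    return total


server = {"cpu": 2, "dimm": 8, "hdd": 4}
print(estimate_power(server, 0.5))   # typical load
print(estimate_power(server, 1.0))   # closer to a faceplate-style worst case
```

Comparing the two outputs shows why faceplate-style maximums overstate normal consumption, even in a toy model.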
It should be noted that each and every Enterprise will likely find themselves dealing with MULTIPLE approaches (from the above list) in determining power consumption. Some devices and configurations will lend themselves to highly granular network inquiry, while other older devices may need to be modeled to determine power. It is these sources of power consumption that will need to be gathered, normalized and then ultimately fed into some form of higher value asset or resource management suite.
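One way to normalize mixed sources is to record whatever each device can offer and then pick the most direct measurement available. The sketch below is an assumption about how that could look; the source names and the preference order (measured per-outlet first, faceplate last) are hypothetical choices rather than a prescribed ranking.

```python
# Hypothetical preference order, most direct measurement first.
SOURCE_PRIORITY = ["ipdu_outlet", "os_service", "model", "faceplate"]


def best_power_reading(readings):
    """Pick the preferred available source from {source: watts or None}.

    Returns a (source, watts) pair so downstream tools know how much
    to trust the number.
    """
    for source in SOURCE_PRIORITY:
        watts = readings.get(source)
        if watts is not None:
            return source, watts
    raise LookupError("no power reading available for this device")


# An older server with no instrumentation: only a model and a faceplate value.
print(best_power_reading({"os_service": None, "model": 162.0, "faceplate": 750.0}))
# A newer server on a per-outlet iPDU.
print(best_power_reading({"ipdu_outlet": 148.5, "faceplate": 750.0}))
```

Keeping the source label alongside the value lets the asset or resource management suite weigh a modeled estimate differently from a metered one.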