Posted by Donald Klein on Tue, Aug 31, 2010 @ 01:26 PM
You may have asked yourself, “Why do I need another monitoring and reporting product if I already have five?” True, you most likely don’t need another monitoring product, but rather what you really, really need is a system to link these systems together.
Why? Because several different monitoring systems operating in their own silos don’t help you improve your business. Instead, what you need to do is build business logic for optimization and capacity expansion strategies, as well as decrease the time spent repairing problems.
To do this effectively, you need a super system: what we call the “mother of all monitors”. This is a system that can not only collect a superset of monitoring data from different point solutions, but also connect directly to other devices that may not currently be monitored (e.g. generators, transfer switches, breaker panels, etc.). And it needs to do this with the kind of scalability, analytics, and ability to integrate with other management systems that you would expect from an enterprise-class tool.
Here at Modius, we are already seeing this happen in the field. There is a current trend among data center managers to link their monitoring platforms together so that they have one common central platform from which to view and navigate to distributed monitoring systems. We have designed our application, OpenData, with a “Monitor of Monitors” architecture in order to provide operators with a single pane of glass into both the facilities infrastructure (including power chain, cooling, and redundancies) and IT system-level information.

The key problems solved are:
- System-level metrics - Link system-level IT metrics to facilities capacities
- Troubleshooting - Accelerate troubleshooting and fault dependency mapping
- Alarm management - Reduction in “noise-level” alarms
- Analytics - Building business-level metrics (BI) for capacity, efficiency, etc.
- Controls-based integrations - Improved automation based on broad data capture
Here is some more detail on each of these benefit areas …
1) System-level metrics
Typically, IT system-level metrics are collected by system management tools and will provide logical properties based on MIB-2 or the Host MIB (RFC 1514). This provides IT managers with data on the operating health of the equipment and capacity related to CPU, disk, I/O and memory. What management systems typically do not provide, however, is how facilities (power, cooling, etc.) impact the cost of operations and the optimal amount of cooling.
By linking IT system-level metrics with unified facilities monitoring through a single portal, higher level business and operating metrics can be formulated to reduce the cost of operations by tuning available cooling resources to the actual needs of each server instance or other IT gear.
2) Troubleshooting
By consolidating event and performance data into a single view, you can quickly trace a cascade of failures and see how facility equipment is involved. An example could be a PDU failure and which devices are in the path of the affected circuit. In redundant environments there will be a fail-over to the second PDU, but in most cases a successful hand-off is difficult to guarantee in advance. By linking the facilities systems (BMS, PDUs, UPS, genset) with system-level IT information, the relationships are documented, visualized, correlated and actively monitored.
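As a rough illustration of the PDU example, here is a minimal Python sketch of fault dependency mapping. The topology, the device names and the function names are all made up for illustration; this is not how OpenData models the power chain. It also assumes that any feed not downstream of the failed component is still healthy.

```python
from collections import deque

# Hypothetical power-chain topology: each key feeds the devices listed under it.
FEEDS = {
    "UPS-A": ["PDU-A1"],
    "UPS-B": ["PDU-B1"],
    "PDU-A1": ["rack1-srv1", "rack1-srv2"],
    "PDU-B1": ["rack1-srv1"],  # srv1 is dual-corded, srv2 is not
}

def downstream(topology, failed):
    """Return every device reachable from the failed component (breadth-first)."""
    seen, queue = set(), deque([failed])
    while queue:
        node = queue.popleft()
        for child in topology.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def feeders(topology, device):
    """Return the components that directly feed a device."""
    return {src for src, children in topology.items() if device in children}

def impact_of_failure(topology, failed):
    """Split affected devices into those with a surviving feed and those without."""
    affected = downstream(topology, failed)
    report = {"lost_power": [], "lost_redundancy": []}
    for dev in sorted(affected):
        # A feed survives only if it is neither the failed unit nor downstream of it.
        surviving = feeders(topology, dev) - {failed} - affected
        key = "lost_redundancy" if surviving else "lost_power"
        report[key].append(dev)
    return report

if __name__ == "__main__":
    print(impact_of_failure(FEEDS, "PDU-A1"))
```

In this toy topology, losing PDU-A1 leaves the dual-corded server running on its second feed while the single-corded server goes dark, which is the kind of relationship that is hard to see when each system is monitored in its own silo.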
3) Reduction in rogue alarms
By linking point solutions and consolidating event-level data, a complete historical view may be achieved. Through this historical view, alarm flows can be optimized and reduced operationally. An example would be a BMS receiving alarms at such a rate that they become noise, because they are not easily tuned. Contextually, it is also very difficult to see what a typical operating condition is, as there is not enough history, or not broad enough history, to proactively set truly meaningful thresholds or deviations.
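To make the idea of history-based thresholds concrete, here is a small Python sketch. It assumes one simple rule (mean plus or minus k standard deviations of past readings); that rule, the function names and the sample values are illustrative assumptions, not a description of any particular product.

```python
import statistics

def derive_thresholds(history, k=3.0):
    """Derive a low/high alarm band from historical readings.

    Assumption for illustration: 'normal' is the mean of the history,
    and anything more than k standard deviations away is alarm-worthy.
    """
    mean = statistics.fmean(history)
    spread = statistics.pstdev(history)
    return mean - k * spread, mean + k * spread

def filter_alarms(readings, low, high):
    """Keep only the readings that fall outside the learned band."""
    return [r for r in readings if r < low or r > high]

if __name__ == "__main__":
    # Hypothetical supply-air temperatures (deg C) from consolidated history.
    history = [20, 22, 20, 22]
    low, high = derive_thresholds(history)
    print(low, high)
    print(filter_alarms([21, 25, 17.5, 23.9], low, high))
```

The more history that is consolidated, the better such a band reflects real operating conditions, which is why a broad historical view matters for cutting alarm noise.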
4) BI-based business metrics
With a single point of consolidation, you can quickly build reports and dashboards across platforms. An example would be a stock-chart-type view in which you can visualize a period of time. This is used to determine deviations from the norm which might cause downtime or affect operational performance. With several independent systems it becomes impossible to correlate based on time or carry enough history to gain the insight necessary to prevent a potential outage.
5) Single application launch point
The “Monitor of Monitors” architecture brings a unified structure to gain access to operational and control systems. An example use case would be to identify cooling requirements based on broad-based data capture (e.g. an array of environmental sensors at the rack level, or real-time server-inlet temperatures taken directly from the servers themselves) and then tie the resulting performance metrics into building control systems to tune VFDs and cooling output. Integrating the BMS application directly with the monitoring system provides the real-time data and feedback mechanism required to optimize cooling and cost without overheating the IT equipment.
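The feedback loop described here can be sketched in a few lines of Python. This is a deliberately simple proportional rule driven by the hottest server-inlet reading; the target temperature, gain and speed limits are illustrative assumptions, and a real building control system would apply its own tuned control logic and safety interlocks.

```python
def next_fan_speed(current_pct, inlet_temps_c, target_c=25.0,
                   gain_pct_per_c=5.0, min_pct=30.0, max_pct=100.0):
    """Suggest the next VFD speed (percent) from server-inlet temperatures.

    All parameter defaults are hypothetical. The hottest inlet drives the
    decision so that no server is allowed to overheat for the sake of savings.
    """
    hottest = max(inlet_temps_c)
    error = hottest - target_c                    # positive means too warm
    proposed = current_pct + gain_pct_per_c * error
    return max(min_pct, min(max_pct, proposed))  # clamp to the drive's range

if __name__ == "__main__":
    # Hottest inlet is 2 deg C over target, so the fan speeds up.
    print(next_fan_speed(60.0, [22.0, 24.0, 27.0]))
    # Every inlet is well under target, so the fan can slow down and save energy.
    print(next_fan_speed(60.0, [20.0, 21.0]))
```

The point is less the control rule than the data path: the controller can only do this if real-time inlet temperatures from the IT side are available to the facilities side.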
Conclusion
If you would like more detail on how Modius can help with any of the above topic areas, please reach out directly using info@modius.com, and we will be happy to set up an appointment.
Posted by Mark Harris on Fri, Jun 25, 2010 @ 11:40 AM
I spend a great deal of time talking about data center efficiency and the technologies available to assist in driving efficiency up. Additionally, a great deal of my time is spent discussing how to determine success in the process(es). What I find is that there is still a fundamental lack of appreciation for the need for 'continuous' real-time monitoring to measure success using industry norms such as PUE, DCiE, TCE and SWaP. I can't tell you how many times someone will tell me that their PUE is a given value, and look at me oddly when I ask 'WHEN was that?'. It would be like me saying 'I remember that I was hungry sometime this year'. The first response would clearly be 'WHEN was that?'

Most best practice guidelines and organizations involved here (such as The Green Grid and ITIL) are very clear that the improvement process must be continuous, and therefore the monitoring in support of that goal must be as well. PUE, for instance, WILL vary from moment to moment based upon time of day and day of year. It is greatly affected by IT loads AND the weather, for example. PUE therefore needs to be a running figure, and ideally monitored regularly enough that the Business IT folks can determine trending and other impacts of new business applications, infrastructure investments, and operational changes as they affect the bottom line.
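For anyone who wants to see what a 'running' PUE looks like, here is a minimal Python sketch. PUE is total facility power divided by IT equipment power, and DCiE is the reciprocal expressed as a percentage; the sample readings, timestamps and names below are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    timestamp: str      # when the reading was taken
    facility_kw: float  # total facility power at that moment
    it_kw: float        # IT equipment power at that moment

def pue(sample):
    """PUE = total facility power / IT equipment power."""
    if sample.it_kw <= 0:
        raise ValueError("IT load must be positive to compute PUE")
    return sample.facility_kw / sample.it_kw

def dcie(sample):
    """DCiE = IT equipment power / total facility power, as a percentage."""
    return 100.0 * sample.it_kw / sample.facility_kw

def pue_series(samples):
    """Return (timestamp, PUE) pairs so the figure can be trended over time."""
    return [(s.timestamp, round(pue(s), 2)) for s in samples]

if __name__ == "__main__":
    # Hypothetical readings: a cool night versus a hot afternoon.
    readings = [Sample("03:00", 900.0, 600.0), Sample("15:00", 1300.0, 650.0)]
    print(pue_series(readings))
```

With these made-up numbers the same data center reports a noticeably different PUE at 03:00 than at 15:00, which is exactly why 'WHEN was that?' is the right question.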
Monitoring technologies should be deployed that are installed permanently. In general, 'more is better' for data center monitoring. The more meters, values, sensors and instrumentation you can find and monitor, the more likely you'll have the raw information needed to analyze the data center's performance. Remember, PUE is just ONE KPI that has enough backing to be considered an indicator of success or progress. There surely will be many other KPIs determined internally which will require various sets of raw data points. More *IS* better!
We all get hungry every 4 hours, so why would we monitor our precious data centers any less often?
Posted by Mark Harris on Fri, May 28, 2010 @ 04:54 PM
While I've seen my share of some pristine new data centers over the past few years, as well as a huge number of large scale retro-fit projects where old centers are being turned into new usable data center space, I have also seen an alarming number of older 'house of cards' data centers that are up in modern production and appear to be 'hands-off'.
These data centers are typically chock full of older devices and interconnects that were passed down from generation to generation of IT managers, each of whom came to realize that what they had inherited was unmanageable. While it is true that these data centers will ultimately find their way into extinction in a world focused on operational efficiency, pro-active management and best practices, we can all feel the pain involved when we encounter something like this.

Above is one of the most interesting centers I've seen, and would appear to have conflicting priorities as to what is required to move forward. While I don't have a comprehensive sequence of steps required to migrate to a highly supportable, efficient and monitored data center, let me suggest one step that will help tremendously... Find the YELLOW patch cord and disconnect it.
Seriously, when I saw this photo I had to laugh and take a second look. Was it some new thermal blanketing technology? Or a way to eliminate blanking panels? The reason I make light here is that there are countless data centers that have similar out-of-spec designs and would benefit from adopting new data center technologies, new power distribution, cooling and monitoring solutions, but are challenged by WHERE TO BEGIN and the magnitude of the task at hand.
In the monitoring world, for instance, where Modius delivers value, we regularly find data centers with NO VISIBILITY into their energy usage, and we can easily identify hundreds or thousands of points of monitorable data that would help get energy usage under control. We are ready, willing and able to take on chaos and make sense of it.
Posted by Mark Harris on Sun, Apr 04, 2010 @ 07:13 AM
The granular management of all assets being placed or moved within a datacenter has become highly desirable over the past several years. It is important to note that most major companies will claim to have already solved their asset management needs with an array of typically disconnected, and many times complex, sets of tabular asset manager products. These same companies are now quietly looking for 'something else' to help get them to where they 'really' need to be...
The newest generation of asset management suites is focused on visually representing assets with a drag-and-drop approach to adds, moves and changes. These new lifecycle management suites allow equipment to be added, moved or changed in existing facilities in a highly predictable and efficient manner. Examples of these modern suites include Aperture, Altima/Netzoom, Rackwise, nLyte, Avocent, ShowRack, APC, VisualDatacenter, Raritan/dcTrack, FieldView and a handful of others. Each of these management software suites has been crafted to allow complex data centers to be visually articulated with a high degree of fidelity, identifying everything from the manufacturer, model and serial number, to the purchase date, PO number, owner’s name and physical location.
In a typical scenario, the user will graphically navigate using a drill-down tool which mimics the ‘Google Earth’ model… starting with very macro views and then selectively drilling down to progressively more detailed views of smaller areas. In each view, various operational metrics are constantly reported, such as ‘power being consumed’ within the current view. Ultimately, single discrete values can be displayed.
Historically, these suites have relied on ‘faceplate’ information. This faceplate information is based upon the manufacturer’s published specification for a given device. It is usually the maximum value. A 1U web server, for instance, may have a published faceplate power consumption of 450 Watts, but the actual power draw in normal operation may be a much lower 150 Watts or less. This discrepancy creates the potential for huge errors and inefficiencies when planning for overall capacity and expansion opportunities.
Consequently, one of the newest customer requirements needing to be addressed by EACH of the asset management suite vendors is to add real-time metric data. The desired metric data will obviously include the value for power consumption, but may also include less intuitive values for fan speeds, inlet and CPU temperature, CPU and RAM utilization, available disk space, etc. While these values are relatively easy to come by as an individual user of each system, many different technologies must be exercised to programmatically and remotely retrieve these values in real-time.
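To put numbers on that discrepancy, here is a small Python sketch comparing how many servers fit in a rack's power budget when planning by faceplate versus by measured draw. The 4,500 Watt rack budget, the safety margin and the function name are hypothetical; only the 450 Watt and 150 Watt figures come from the example above.

```python
def servers_per_budget(budget_w, per_server_w, margin_pct=0):
    """How many servers fit in a power budget, with an optional safety margin.

    Integer arithmetic is used throughout so the result is exact.
    margin_pct inflates the per-server figure (e.g. 20 means plan for 120%).
    """
    if per_server_w <= 0:
        raise ValueError("per-server power must be positive")
    return (budget_w * 100) // (per_server_w * (100 + margin_pct))

if __name__ == "__main__":
    rack_budget_w = 4500  # hypothetical rack power budget
    print(servers_per_budget(rack_budget_w, 450))                 # faceplate plan
    print(servers_per_budget(rack_budget_w, 150))                 # measured plan
    print(servers_per_budget(rack_budget_w, 150, margin_pct=20))  # measured, with headroom
```

Planning by faceplate fits 10 servers in this hypothetical rack, while planning by measured draw fits 30, or 25 with 20% headroom. In practice you would plan against measured peak draw rather than a typical value, which is one more reason continuous measurement matters.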
This is currently where many of the latest generation of visual Asset Managers struggle. While their systems are amazing at handling the visual manipulation of IT assets, moving racks and routers along floorplans and data centers, the systems are simply not built with a large enterprise in mind when it comes to gathering Real-Time metric data. Gathering metric data for 12 servers at a trade-show is very appealing, but doing the same type of metric gathering in production against 12,000 or 112,000 servers is a bigger fish to fry. To do so requires a distributed collection architecture that is purpose built to collect any and all data from any device which is network addressable.
Real-Time monitoring with OpenData is the technology that will support the replacement of these faceplate ‘theoretical’ values with actual observed values… allowing a significantly more accurate view for planning purposes. Modius' OpenData(r) is built on a fully distributed bus architecture, is firewall friendly, and can be deployed easily to meet any asset management tool's need for Real-Time monitoring. OpenData SUPPORTS rather than replaces Asset Management suites, and has been crafted with APIs and Web Services interfaces to allow the metric data gathered by OpenData to be CONSUMED by any number of other applications, including the current crop of Visual Asset Management solutions. The combination of a best-of-breed visual asset management tool with a highly granular metric monitoring solution like Modius OpenData allows business costs to be much better understood, and ultimately will allow existing data centers to provide significantly more capacity and increase the lifespan of the data center itself.