Parting the clouds for IT admins: We chat to CloudPhysics
Part 1: Kabuki theatre with sales and pre-sales
A company's IT infrastructure is crucial to its survival and success, which is why most companies invest heavily in that area to reduce risk and increase performance.
Despite these investments, the complex and dynamic nature of IT infrastructure environments means that most companies do not have a complete picture of their IT infrastructure and the workloads that run on it. There is a risk and performance penalty associated with this. The risk lies in making changes to your largely unknown environment - like adding resources, adding workloads and changing policies.
The performance penalty arises from the fact that an unknown IT infrastructure cannot use its resources optimally.
I recently spoke to the CEO of a company hoping to exploit this situation: CloudPhysics. It has developed a non-disruptive technique for collecting data about VMware vSphere environments and the workloads that run on them. This data is analysed using VMware knowledge bases and anonymised data from other CloudPhysics customers.
Based on this analysis, IT administrators are given a set of recommendations with precise execution directions to reduce risk and optimise performance in their IT infrastructure.
CloudPhysics also offers services that allow IT administrators to dry-run emergency procedures such as fail-overs against the real production environment, and to evaluate the impact of new technology on it.
I interviewed John Blumenthal, CloudPhysics CEO, to get more insight into its product and to hear his views on the market.
WtH: So how are things going John?
John Blumenthal: I am quite amazed by the attention we have received. The launch at VMworld in San Francisco went very well and we seem to have a strong appeal to VMware admins and specialists who design and implement IT infrastructure.
CloudPhysics overall process
WtH: CloudPhysics extensively gathers information and suggests changes and optimisation strategies. Does it implement them as well?
John Blumenthal: Not yet. We do make recommendations; our goal is to not just find a problem, but also to provide an execution path and a remediation plan in the analytics that we are delivering. We believe that is the next generation of how data is put into use by a VMware admin or designer.
It is not enough just to index and search this data looking for correlations; you have to find actual causation. There is a well-known phrase in data science: correlation is not causation. Often the two are not related at all.
We think a lot of log analytics platforms, which effectively allow you to do these kinds of searches, fall short of what an administrator needs: an analytics-based answer that provides a direction on what to do.
However, we do not take the final step, which is to actually execute the recommendation. That has more to do with the nature of a SaaS service attaching to your network, and the concerns people have about a remote system actually executing changes in your environment. So we go as far as the data and the execution plan, but not the actual execution at this time.
WtH: Do you imagine developing a perhaps locally installed add-on that does allow for execution in the near future?
John Blumenthal: We do. We ultimately intend to do that and deal with all the security concerns. So in that sense it will look like a highly informed resource management approach. Among the team members we have people who were responsible for much of the resource scheduler at VMware.
Our idea has been to implement greater quality and quantity of the analytics that drive those changes. As the market adopts our solution, we will step forward with options for making these final changes, as you pointed out.
WtH: So a large portion of your team consists of ex-VMware and ex-Google employees right?
John Blumenthal: Yes indeed. One of my co-founders, Irfan Ahmad, was a core member of the DRS team and the author of Storage DRS and Storage I/O Control, and Carl Waldspurger, who works with us as an advisor, was the principal engineer responsible for the original architecture and implementation of DRS. Carl spends quite a bit of time with us on architecture and direction.
Our goal is ultimately to model and simulate an entire data centre.
VMware and Cloud Physics
WtH: Is it true that VMware is looking at CloudPhysics and scratching its head, thinking it should have come up with this solution itself?
John Blumenthal: VMware was and is a great company and many of us have made our careers there, so in many ways it is regarded as the mothership. Many of the things we were working on were not really within the scope of the work being done at VMware, mainly because this is a SaaS-oriented approach to delivering analytics, unlike the on-premise approach that VMware took.
We have many discussions with VMware [as] we are in the partnership programme. We still have a great deal of allegiance and interest in offering more value to VMware customers.
WtH: How quickly can CloudPhysics include new technologies like PernixData, Infinio and others, and suggest recommendations on these?
John Blumenthal: Something like PernixData is a very interesting layer that your IT infrastructure might contain.
Our goal is ultimately to model and simulate an entire data centre. Today we have broken that down into smaller discrete simulations, one of which has to do with caching. We have a caching analytics service with a module that allows us to work with any vendor and tweak that model to incorporate how their caching mechanism works. We sit down with a lot of storage vendors, such as Fusion-io and Proximal Data, and we know the PernixData guys well from our time at VMware.
It would be great to sit down with Satyam Vaghani and Frank Denneman to update the information on PernixData, so that a user can run a CloudPhysics service before procuring Pernix and understand the value proposition and benefit of introducing it before they even purchase. They can use real data to do that, avoiding a proof of concept and the cost and effort that comes with it.
WtH: So CloudPhysics customers can dry-run new technology to see how it will impact their actual production environment?
John Blumenthal: That is one of our main use cases, yes. The procurement process is often a very wasteful exercise in today's IT infrastructures, because your storage vendors do not actually know your environment and, conversely, you don't really understand their technology.
The way the dance goes today is a kind of kabuki theatre with sales and pre-sales. It involves trying to replicate a production system and generate data that may or may not be indicative of what would actually happen in production.
So we looked at this and said: we can build a model of caching technology that has a mathematical basis to it. We then gather workload traces non-disruptively from a cluster – and that is our secret sauce, being able to do these collections non-disruptively.
Then we run these traces through our simulator and, to within two to three per cent variance, indicate to a user what the benefit would be for one workload, or a group of workloads, running in a cluster with a cache of a certain size. That benefit is highly accurate and highly quantitative.
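CloudPhysics has not published its simulator's internals, but the general technique Blumenthal describes – trace-driven cache simulation – can be sketched in a few lines. The following hypothetical illustration replays a recorded block-access trace against an LRU cache of a candidate size and reports the hit rate, which is how one could compare cache sizes before buying any hardware:

```python
from collections import OrderedDict

def lru_hit_rate(trace, cache_blocks):
    """Replay a block-access trace against an LRU cache of the given
    size (in blocks) and return the fraction of accesses served from cache."""
    cache = OrderedDict()  # block id -> None, ordered by recency of use
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = None
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(trace) if trace else 0.0

# Compare candidate cache sizes against the same recorded trace.
trace = [1, 2, 3, 1, 2, 4, 1, 2, 3, 4] * 100
for size in (2, 3, 4):
    print(f"{size} blocks -> hit rate {lru_hit_rate(trace, size):.3f}")
```

A production-grade simulator would model the vendor's actual eviction and write-back policies, multiple concurrent workloads, and latency effects, but the principle is the same: the prediction comes from replaying real production traces rather than from a synthetic benchmark.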
You can avoid a lot of danger involved with making wrong purchases this way too.
Being able to simulate exactly what is in production and do that non-disruptively without having to spin up a proof of concept is what we believe is the future of how IT infrastructure will be sold.
WtH: Do you consider this to be your biggest use case?
John Blumenthal: It is the one that is bringing revenue to the company most immediately. We built this as one of our first services about a year ago and it gathered [the] interest of many storage vendors. It is the basis of the company's first revenues.
But expanding upon that are other services we have introduced that are focused less on procurement accuracy and efficiency, and more on risk and safety.
For example, we have a High Availability simulator, which has an HA health check service attached to it. This is based on the work that Frank Denneman and Duncan Epping have put together in their analysis and writings on High Availability. We have encapsulated much of that in our HA simulator and HA health check services.
The nature of the problem we are solving here is that as you provision virtual machines or modify HA policy groups, you don't have visibility into the impact of those changes.
Meaning you do not know whether you have reserved enough resources elsewhere to succeed in the event of a fail-over. Our simulators allow you to look at the consequences of a particular change and understand very accurately whether you are wasting resources by having too much capacity, or have too little, in which case a fail-over will not succeed.
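The admission check at the heart of such a simulation can be illustrated with a toy model. This hypothetical sketch assumes a single resource dimension (say, memory reservations) and a first-fit-decreasing re-placement heuristic; real vSphere HA admission control is considerably more involved:

```python
def survives_host_failure(hosts, failed):
    """Check whether the VMs on the failed host can be re-placed on the
    remaining hosts using a first-fit-decreasing heuristic.
    `hosts` maps host name -> (capacity, [vm reservations])."""
    # Free headroom on each surviving host.
    headroom = {h: cap - sum(vms)
                for h, (cap, vms) in hosts.items() if h != failed}
    # Try to place the failed host's VMs, largest first.
    for vm in sorted(hosts[failed][1], reverse=True):
        target = next((h for h, free in headroom.items() if free >= vm), None)
        if target is None:
            return False  # not enough spare capacity anywhere
        headroom[target] -= vm
    return True

# A cluster survives if the loss of any single host can be absorbed.
cluster = {
    "esx1": (100, [40, 30]),
    "esx2": (100, [50]),
    "esx3": (100, [20, 20]),
}
print(all(survives_host_failure(cluster, h) for h in cluster))
```

Run against every host in turn, a check like this answers both questions from the interview: too much headroom means wasted capacity, too little means a fail-over that cannot succeed.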
Additionally, we have a couple of other services that are focused on understanding particular operational hazards; these are starting to kick in and attract serious interest among our user base.
Blumenthal claims his firm provides Google-level IT infrastructure utilisation through its sensing and analysis of data from VMware virtualised servers. The second part of Willem's interview with John Blumenthal covers this and will be published next week. ®