How Microsoft is serious about supporting Linux and cloud rivals with OMS

When Microsoft first announced its new Operations Management Suite (OMS) cloud monitoring service last May, handling Linux systems wasn’t an afterthought, but the tools certainly didn’t have parity with what you could do for Windows Server.

At the time, Jeremy Winter, who runs the OMS team, described Linux as being “on our roadmap to come”, with the first option being to deploy Microsoft’s own management agent into a Linux VM – which could be on AWS or VMware, not just Windows Server or Azure.

In the old Microsoft days, things might have stayed at that level, like a cloud version of the System Center approach. But the way Microsoft approaches cross-platform support is rather different now, and even veterans of System Center like Robert Reynolds understand that Microsoft needs to fit in with the existing Linux ecosystem. That’s why the Linux agent for OMS is now a plugin for the popular Fluentd data collector, even though Microsoft had originally experimented with an agent for the Linux systemd service manager.

Working with the community

“When we started the preview, we had our own management agents,” Reynolds explains, “but the Linux community said to us ‘we already have agents deployed, there’s already an open agent infrastructure’. So we worked with the community to find out which agent they preferred, which one they thought would be the long-term choice and the majority of people we’re working with have adopted Fluentd, so we went with that.”

And in a move that would once have been unusual but is quickly becoming ‘business as usual’ at Microsoft, the OMS plugin for Fluentd is being open sourced. “We’re Microsoft, so there’s scepticism,” admits Reynolds. “We’re going to earn it. Part of earning that support and trust in the Linux community is being part of it, so that’s how we’re making decisions for Linux support and how we’re delivering them.”

There’s already support for connecting to existing open source monitoring services like Nagios and Zabbix from OMS. “We have the ability to plug in to those existing data streams. So instead of having to go replace an entire infrastructure or technology that’s already there, things like Nagios and Zabbix we can immediately connect to and start to pump the data in from.”

The OMS team has also started working on allowing customers to create custom logs and environments – that work will continue through the next year, he says. But the willingness to work with the Linux community has already led to what he calls “steady growth – 10% and more a month – in on-boarding Linux machines” since the Fluentd plugin came out last October.

Supporting rivals

There’s the same commitment to supporting VMware and rival clouds like AWS and OpenStack. “With our backup service in OMS, we support VMware backups, so machines running on VMware can be backed up from one VMware environment to another VMware environment that’s running on-premises, and we managed to simplify a lot of the technology that’s needed there.

“That also allows us to do VMware to Azure to give you failover sites. And we did Red Hat support so that you can have VMware and Red Hat instances that are being protected to Azure.”

The same approach applies to managing virtual machines, wherever they are. “There are two ways to think about a bunch of VMs,” he points out. “We can go and put agents in them, but the other option is that the platform itself will provide some monitoring and manageability and we can plug into that instead. In AWS, for example, that’s CloudWatch, and over time we’ll connect to those APIs and be able to collect data and analyse that.”

For alerts, you can already use a webhook to send an OMS alert to a range of services. “You can just cut and paste the webhook URL from PagerDuty, Zendesk, Slack; anything that supports webhook. And that’s a big, big list.”
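As a rough illustration of the webhook mechanism Reynolds describes, the Python sketch below builds the kind of HTTP POST an alerting service might send to a pasted webhook URL. The function names, the example URL and the payload shape are assumptions made for illustration (the single `text` field follows the simple form that Slack incoming webhooks accept); this is not OMS’s actual alert format.

```python
import json
import urllib.request


def build_alert_request(webhook_url, alert_name, severity, description):
    """Build a JSON POST request for a webhook endpoint.

    The payload layout is hypothetical: a single "text" field,
    which is the simplest form many chat-style webhooks accept.
    """
    payload = {"text": f"[{severity.upper()}] {alert_name}: {description}"}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(request, timeout=10):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Building the request separately from sending it means the payload can be inspected or logged before anything goes over the network; calling `send_alert(build_alert_request(url, ...))` with a real webhook URL would deliver the message.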

“The notion of making sure that it’s truly on any operating system is key,” says Reynolds. “If you want to get the 100% view, at the right level of fidelity, into your environment, this is where we’re bringing all this data together.”

On-demand development

Traditional Microsoft customers on Windows Server and Azure get the benefit of this new attitude as well. For example, OMS can show you on the dashboard which of your systems are patched and up-to-date and which are missing updates you could apply. That doesn’t yet work with the new Windows Update for Business service for Windows 10; it only covers updates published through Windows Update. But Reynolds says: “If customers want Windows Update for Business, OMS will do that.”

“This is a very different mode of development to how we’ve operated in the past,” he explains. “Before, we would have gone and looked at our competitors in the space. We would have said ‘here’s a list of the union of all their features and we have to get as many of those as we can before we have a shippable product’. This approach is more continuous improvement – we start with something that has value, that’s been validated with the private customer cohorts we work with.”

(Cohorts is one of the terms you hear at Microsoft a lot now. It means the group of users who have signed up for a specific level of preview, like the Windows 10 Insider Fast Ring; for business customers, Microsoft runs private previews under NDA.)

“Then we improve that rapidly as we go to the public cohorts and we gain more experience with customers using that. So it’s the customers using the service who are going to drive it.”

“It’s a very different model of software delivery for us,” Reynolds observes. “We pivot, we deliver rapidly and, sometimes, reliably on our schedules. Really, it’s driven by quality and customer feedback.” Rapidly means about 300 updates a month – Reynolds notes that “we’re constantly making small improvements, driving the service forward and polishing it.”

Next stage of OMS

One common request is to be able to take action right from the dashboards. “Customers want to be able to automate the patching of the things that are showing up here as critical,” says Reynolds. “We’re getting a lot of feedback saying ‘let me set a rule with a set of policies and based on this assessment that you’re showing me, go fix all the missing critical updates’.”
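To make the request concrete, here is a minimal Python sketch of what such a rule might look like: it takes an update assessment and a simple policy, and returns the updates to apply per machine. Everything in it, including the `MissingUpdate` record, the `plan_remediation` function and the policy parameters, is hypothetical and does not reflect any existing OMS feature or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MissingUpdate:
    """One row of a hypothetical update assessment."""
    machine: str
    update_id: str
    severity: str  # for example "critical", "important" or "low"


def plan_remediation(assessment, severities=("critical",), excluded_machines=()):
    """Group the updates that match the policy by machine.

    The policy here is deliberately simple: which severities to fix
    and which machines to leave alone.
    """
    plan = {}
    for item in assessment:
        if item.severity in severities and item.machine not in excluded_machines:
            plan.setdefault(item.machine, []).append(item.update_id)
    return plan
```

With an assessment that lists two critical updates on one server and a low-severity update on another, the default policy would return only the first server and its two update IDs, leaving the actual patching to whatever automation acts on the plan.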

Those sorts of options will come in the next stage of OMS. “There’s insight and there’s action. With OMS, we’ve got the platform for insight and we’re starting to build a solution that enables customers and community partners to scale that. Then the next step is going to corrective action. We’ll get to the first step of that later on,” he says.

There’s no date for that kind of automation, but OMS gets updated every month, and Reynolds says the priority is what customers are asking for.
