Episode 34

Observability Explained with Mike Maciag

Mike Maciag, Chief Marketing Officer at Dynatrace, joins Tech Transforms to talk about the power of observability. Careful monitoring is of paramount importance for any successful operation, and observability can take your agency to the next level. Listen in as Carolyn and Mark get some tips and tricks for improving cybersecurity posture with the most accurate technology.

Episode Table of Contents

[00:31] The Vital Role That Observability Plays in IT
[10:40] Observability: When You’re Asking the Systems to Share
[22:48] The President’s Memo on User Experience
[34:01] Let Machines Do the Stuff That Doesn’t Matter

Episode Links and Resources

The Vital Role That Observability Plays in IT

Carolyn: Today, we get to welcome Mike Maciag, who is Chief Marketing Officer of Dynatrace. One of our own, one of the clan, is here with us today. As CMO, Mike is responsible for Dynatrace's global marketing organization. We're really excited to hear his expert opinion on observability and the vital role it plays in IT, and especially in the cloud.

Mike: Thank you, Carolyn. Mark, nice to be with you both today. I know this has been a long time coming, but I'm excited to be sitting down and talking with you today.

Carolyn: We've talked with a few of our guests about APM. Just recently we talked to a former CIO at the VA. He is very bullish on APM, and he talked a lot about the advances they were able to make at the VA with it. At least within the VA, APM moved from a nice-to-have to a must-have. To dive right in, Mike, what I'd really like to hear you talk about is this: there's the APM part, but in my mind, and I might be positioning this wrong, observability is like APM 2.0. Can you speak to APM versus observability? What's the difference?

Mike: As long as we're talking about terms, we might want to mix monitoring in there as well. These are all terms that get thrown around: is it monitoring, is it APM, is it observability? And what they mean has changed a lot. Let me start with the simplest definition, and then maybe we can unpack it from there. Think of observability as the umbrella term, the broadest term, sitting above all of this.

Monitoring, APM, Observability

Mike: Observability fully includes APM, and it also subsumes monitoring, both of the things we've been doing. There are two megatrends in the industry that have been driving this move toward observability. One is the move to the cloud.

More and more systems are moving to cloud architectures and, probably more important, to digitally native architectures. We're going from monolithic systems that we could see, touch, and understand, to increasingly complex cloud and even multi-cloud architectures driven by microservices and the like.

The reason for that movement is that it has made application development faster and easier, which feeds the second megatrend: digital transformation. Digital transformation means fundamentally rethinking everything I've been doing in every aspect of my business, whether that's on the front end in the services I provide or in the backend machine-to-machine conversations happening in cloud architectures. And we're trying to figure out how to automate more of it.

Does that make sense as a starting point? Observability is the umbrella that fully subsumes monitoring and fully subsumes APM, and the drivers making that happen are cloud and digital transformation. I can get into more detail.

Mark: That absolutely hits the mark. And we also say end-user performance or experience.

Mike: That's right.

Carolyn: Yes, that sets me straight, because saying observability is APM 2.0 is wrong. APM, like you said, sits underneath observability. It might be one way to start implementing an observability platform in your organization, but it's not all of it.

Where the User Touches the Applications

Mike: Yes. When you say observability, what pops into my mind is this: there's APM; there's infrastructure monitoring, meaning what's going on in the infrastructure underneath the applications; and there's, as Mark was alluding to, digital experience management, which asks where the end user fits into all of this.

Increasingly, security is in there as well, as part of making sure systems achieve what they need to achieve. Because really, we think about a world where software works perfectly; the expectation is that we live in a world where software works perfectly. Now, that's a vision, and it's a long way off. But to make it happen on an end-to-end basis, you really need to bring all of those things together.

I often think of APM as the high ground in all this, because APM is where the user touches the applications. It's where business needs meet IT needs, and it's what people can touch. It's a very interesting place to enter, and obviously an important part of it. But it's absolutely essential to also monitor the infrastructure underneath it, as well as the user experience.

Mark: You mentioned a couple of different things, and in the federal market there are two I'd like to talk about if we have time. One is the executive order the president issued at the end of the calendar year on end-user experience. That was something very new coming out of the government, so maybe we can talk about it a little later.

Infrastructure Monitoring Is Observability

Mark: The second one, since you mentioned security, is zero trust. Everybody is trying to figure out ways to improve their cybersecurity posture, and people like Carolyn and me are figuring out how we can tap into the cybersecurity budgets that have been allocated to it.

Carolyn: Well, okay, for our listeners, I want to back up a little and define APM: it's application performance monitoring. You made me realize we hadn't defined it, Mike, when you said there's infrastructure monitoring too. In my mind, application performance monitoring includes infrastructure monitoring, but you're right, not necessarily. That would be observability.

Mike: Yes. Carolyn, without getting too inside baseball about Dynatrace, I understand why you think about it that way, because our APM does in fact include it. We think of APM as full-stack: it goes all the way down to the infrastructure it's monitoring. When people work with Dynatrace, they get that included. So it absolutely makes sense that, given how steeped you are in Dynatrace, you think of it that way. The rest of the world does not, by the way.

They think of APM and infrastructure as two different things that you basically buy à la carte. We don't think they can be separated. The goal here is to simplify cloud complexity to the point where you can get a precise root-cause answer if something goes wrong, and drill all the way down to, "Here's the specific line of code that's causing it," or, "Here's the piece of infrastructure that's causing it."

How Can We Better Position the Concepts of Observability in Federal

Mike: Let's say it's a Kubernetes environment, and a container spins down within a second. If it does that 60 times an hour, you need to be able to find it as it comes and goes. That's why you need full-stack coverage.

Mark: You said some interesting things there, Mike, and I want to dig into this a little deeper. In the federal space, we feel like we're three to five years behind the commercial market in using these concepts. With observability, and even APM, we rarely see RFPs that list APM.

We might see infrastructure monitoring, or other terminology like that, but we rarely see these concepts. And the government has been in this transformation, moving to the cloud, for years. Some agencies have had more success than others. Can we talk a little about how we might better position the concepts and terminology of observability in the federal space?

Mike: You mentioned three to five years behind; exactly how many years is debatable. But the curve the commercial space has gone through increasingly looks like exactly the curve the federal space is on. With the move to the cloud, whether trusted clouds or public clouds, the same kind of breakup of monolithic architectures has taken place.

When you break up monolithic architectures, you gain speed, scalability, and flexibility. The other truth I think you'll run into is that complexity comes with it too. And guess what: no one, and I'm guessing the federal government is the same, is getting additional resources to monitor all this the old way.

Observability: When You’re Asking the Systems to Share

Mike: Monitoring is the idea that a system should be able to be monitored: you can tell whether it's up or down, and then the team goes and figures out its health from their monitoring.

When you move into observability, you're asking the systems themselves to share, to become observable, to put out data that says, "Hey, here's what's going on with me," so that you can begin to understand them that way.

The purpose is to simplify that complexity, so that even when you don't have more resources to get your job done, you can still stay on top of it. The last thing people want is to get bogged down in monitoring and be unable to innovate, unable to drive the new apps that deliver better services for citizens and more security in DoD-oriented areas, et cetera. That's where this idea of observability comes in.

I'll go one step further. Observability today does not include the concepts of intelligence and automation, but we think it should. That's because the overwhelming amount of data these systems generate is really beyond the capability of the old ways, where I put some data up on dashboards.

I look at the dashboards, figure out what's going on, and get a good sense of it. It's just not possible to stay on top of things that way. We think of it as moving to a world where we're providing answers, and those answers allow people to automate more and get more out of their teams.

Mark: Well, that's a good answer.

What the World Is Lacking in Terms of Security

Mark: We'll get that out to the sales team right away. One thing you mentioned that we run into is security. As you can imagine, some of our customers have very different, more stringent security requirements than others. That adds a level of complexity, and it's certainly an issue we see come up a lot. Is that the same kind of answer we would give about security?

Mike: Yes. One thing we're seeing more and more of in the security sphere is the question of how you think about security in real time and precisely identify security issues in production. We have all kinds of things in the world that try to keep the bad guys, the bad actors, or the bad code out. We have even more things that test and say, "Okay, before I do a check-in, let me run a static code analysis on this and understand whether it has known vulnerabilities."

What the world has been lacking is the ability to say, "Okay, now there's something out there. How do I know who has it, or what systems have it, and how do I precisely identify it and fix it?" Log4Shell helped us see this in very specific ways. All of a sudden there was a zero-day exploit out there, discovered in a very popular open-source package, and the entire world had to find it and fix it overnight.

Identifying Vulnerability Through Observability

Mike: By providing observability across the whole stack and understanding where the vulnerable package existed, our customers at Dynatrace were able to find it instantaneously. The minute it was identified as a vulnerability, we could show specifically what was going on, and at least help people with the question, "How am I going to get to the point where I know exactly what happened and can close that door as fast as I possibly can?"

Now, as we move on, it gets to, "Okay, great. Now let's move to, 'I'm going to take automatic action and remediate that.'" There's more and more of that going on, and security is playing an increasingly large role in this. We should really be talking about DevSecOps teams, to correct myself. They are increasingly expected to build security into the applications and the infrastructure, and to ensure it through things like what we're doing.

Carolyn: How do you see observability fitting into DevSecOps?

Mike: It's an absolutely essential piece of it, and here's why. DevSecOps, in the broadest, simplest terms, is the idea that responsibility for all of this shifts left. Here's what I mean by shift left: it used to be that we'd write monolithic code and throw it over the wall, the people on the other side would operate it, and there'd be this finger-pointing game of, "It didn't work well." "What I gave you worked. Your system must be messed up," et cetera.

DevSecOps, in the broadest sense, says let's shift that responsibility left and give development the responsibility to build operability, reliability, and resiliency into the product, as well as to build in the product's security from the beginning.

How Observability Fits Into DevSecOps

Mike: To make that happen, you need to provide the instrumentation so developers know what's happening in production, or what would happen once they put the code into production.

Then if I can provide precise root cause and take it to the next level, "Not only did this go wrong, or could it go wrong, or was there a slowdown, but here's specifically why," I can go fix it faster. Really, the purpose behind all of this is that the world wants and expects flawless and secure interactions. Whether that's machine to person or machine to machine, you expect it to be flawless.

That's a fair expectation. As the world goes more and more digital, which is the whole idea of digital transformation, we expect this flawless result. The commercial sector may in many ways be more forgiving than parts of the federal sector, which is where you and your audience are.

The impact of something going wrong, or of a wrong assumption in software that makes an interaction fail, can be immense. It hits not hundreds or thousands of users, but tens of thousands, millions, even hundreds of millions of citizens.

Mark: Well, it could be life-dependent. In the DoD and IC space, where mission criticality means life or death, it couldn't get any more grave than that.

Making Decisions With Precise Accuracy Is Required

Mike: Yes, that's absolutely right. A big part of this, then, is all the data these modern systems are putting out. How do you take that data and turn it into an answer, so you know specifically what's happening? And then, if I can get my answers precise enough, how do I automate based on them as things go on?

Mark, to build on your life-and-death scenario, I sometimes talk about this from a self-driving car's perspective. A car needs to observe everything going on in its environment in real time: What's it like outside? What's the speed limit? Where am I on the road? Are there other issues to deal with? But then it needs to make decisions, and it needs to make them with precise accuracy.

In order to automate, you need to be able to make decisions with precise accuracy. You can't approach a crosswalk in a self-driving car, if that day ever comes, and be unsure whether it's a shadow or a pedestrian. You just can't; you need to get down to that level of precision.

It's no different in IT, and no different in the observability space: if you're going to automate remediation and free people to innovate, that has to happen with very precise root cause and a causal AI underneath it.

Mark: Well, that's a great example; it puts it in context so everybody can understand.

Monitoring Versus Observability

Mark: Carolyn, if it's okay with you, Mike started tapping into this whole DevSecOps concept, and I wanted to ask a question about that. Maybe you could peel it back a little further, Mike. In a recent Dark Reading article, you stated that today's rapid pace of innovation, coupled with the complexity of modern software development, has elevated the need for automated orchestration.

Mike: Yes.

Mark: Can you talk a little bit about this and how do you see this changing for us?

Mike: Yes. I don't remember the entire context of the article, but I certainly understand the subject and what we're talking about. This complexity curve is not going to stop, as we go from monolithic architectures to cloud architectures, to containers and microservices, to multi-cloud, to huge scale. These systems change constantly; the change just does not stop.

These systems are all generating immense amounts of data, in variety, in volume, and in the speed at which they generate it. Basically, what that says is that the way you manage your systems has to change.

We started this conversation with monitoring versus observability. That's a good example of how we need to change our mindsets as we go through this. You have to change the way teams work as well, and that means moving teams from reactive, "Hey, I've got a problem. How do I go fix it?"

Observability Data

Mike: To proactive: looking at observability data, anticipating what problems are going to come up, and addressing them before they impact end users. Otherwise, people would just be completely buried and there'll be...