Episode 52

UK Home Office: Metrics Meets Service with Dimitris Perdikou

Dimitris Perdikou, Head of Engineering at the UK Home Office, Migration and Borders, joins Carolyn and Mark to discuss the innovative undertakings of one of the largest and most successful cloud platforms in the UK. With over 3,000 technical users and millions of end users, Dimitris sheds some light on his experience with SRE, user experience, and service monitoring.

Episode Table of Contents

  • [0:21] Inside the Massive Programs That the UK Home Office Offers
  • [7:00] The Importance of Observing Cost Efficiency
  • [12:25] The Monitoring Pack of the UK Home Office
  • [17:59] UK Home Office Take on a Good User Experience
  • [24:09] Why UK Home Office Didn't Have to Reinvent the Wheel
  • [30:20] Let the Experts Do Their Job
  • Episode Links and Resources

Episode Links and Resources

Transcript

Carolyn:

Welcome to Tech Transforms, sponsored by Dynatrace. I'm Carolyn Ford. Each week, Mark Snell and I talk with top influencers to explore how the US government is harnessing the power of technology to solve complex challenges and improve our lives. Hi, I'm Carolyn Ford, here with Mark Snell. Hey, Mark.

Mark:

Hey, Carolyn. Good to see you.

Carolyn:

You too, and today, guess what? We're talking to somebody from across the pond. Today, we're welcoming Dimitris Perdikou, Head of Engineering at the UK Home Office. Dimitris, did I just slaughter your name, or did I get it right?

Dimitris:

No, that was good that time. Thanks, Carolyn.

Carolyn:

Okay. All right. Dimitris is leading one of the largest and most successful cloud platforms in the UK, with over 3,000 technical users and millions of end users, so we're really looking forward to getting some insights today on observability, SRE adoption, and maybe how he navigates things like user experience. Welcome to Tech Transforms, Dimitris.

Dimitris:

It's great to be here.

Mark:

Hi, Dimitris.

Dimitris:

Hey, Mark.

Carolyn:

So, let's first start out with, for all of us on the American side, tell us a little bit more about your role as Head of Engineering for Migration and Borders. You know, what... Well, I'll stop there and just let you talk.

Dimitris:

[Inaudible]

Carolyn:

You got a lot of people you're managing. This just gives me even more appreciation for the fact that you took some time to talk to us.

Dimitris:

[Inaudible]

Mark:

As you all continue to grow in your expansion of the platforms in the UK, can you talk about some of the technology that you're introducing, some of the new tools that you're applying onto this?

Dimitris:

[Inaudible]

Dimitris:

And I think particularly, that monitoring and observability space is really exciting at the moment. There just seem to be more products than I can get my head around every day. I was just talking to Gartner a few minutes ago about getting their insights into that as well. We're obviously looking at moving to more managed services where we can. I don't really want to be managing services that I don't have to if there's a good offering on the market, but often that comes at a cost, and there might not be an open source variant, so we're always weighing whether it's worth going with something open source versus paying someone to take that pressure away from us.

Mark:

Well, you mentioned observability. Can you talk a little bit more about the roles that technologies like artificial intelligence, machine learning, or anything in observability are playing when it comes to the work for Migration and Borders?

Dimitris:

Yeah, sure. We've actually got quite a mixed observability stack at the moment, from some older tools like Zabbix to traditional logging like ELK, and we've got quite a mix in that monitoring, observability, and alerting space, between your big open source option, Prometheus, and one of the leading licensed products, which is Dynatrace. And there's always a bit of a challenge, I think, between those two. Dynatrace definitely gives us a lot, and I've really started to be impressed over the last few years with the AI and machine learning there. I remember when I first went to implement it, I was a bit taken aback, along with our security team, about the amount of access it needs, because it's almost got to access half the estate.

Dimitris:

[Inaudible]

Mark:

How are those technologies holding up to the scale that you need to deliver for such a huge audience?

Dimitris:

Yeah, sure. I think when we first moved to it, because we're also quite big in the FinOps space and saving money, one of the big things we do is shut down most of our test environments every single evening and on the weekend to save money. And I think when we first did that, some of our AI and ML was really confused, because it couldn't work out what was going on. It had been trained on how a production system running 24/7 works, so when it saw something disappear overnight, it wasn't really sure how to react. It's come a long way now. I think they've been adapting the models and training them, and getting them to learn faster, whether or not they actually learn what's happening every evening when things get shut down.
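As a rough illustration of the kind of scheduled shutdown Dimitris describes, the sketch below stops AWS EC2 instances tagged as test or dev environments. The tag convention, region, and scheduling mechanism are illustrative assumptions for the example, not the Home Office's actual setup.

```python
import boto3

# Hypothetical tag convention used to mark non-production environments.
TEST_ENV_TAG_KEY = "environment"
TEST_ENV_TAG_VALUES = ["test", "dev"]

def stop_test_instances(region: str = "eu-west-2") -> list[str]:
    """Stop every running EC2 instance tagged as a test/dev environment."""
    ec2 = boto3.client("ec2", region_name=region)
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": f"tag:{TEST_ENV_TAG_KEY}", "Values": TEST_ENV_TAG_VALUES},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]

    instance_ids = [
        inst["InstanceId"] for res in reservations for inst in res["Instances"]
    ]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
    return instance_ids

if __name__ == "__main__":
    # Typically triggered by a nightly scheduler (e.g. cron or an EventBridge rule),
    # with a matching start-up job each morning.
    stopped = stop_test_instances()
    print(f"Stopped {len(stopped)} test instances: {stopped}")
```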

Carolyn:

That's interesting. Mark, do you know, is that a practice in the US? Like, US agencies, to shut down test environments every night?

Mark:

I don't know. I mean, I imagine it's agency by agency.

Carolyn:

Yeah.

Mark:

I don't think there's any sort of unified approach on that.

Carolyn:

Yeah, so you guys do it to save money, Dimitris?

Dimitris:

[Inaudible]

Carolyn:

Oh.

Dimitris:

They've only got so much capacity in their data centers anyway, and so I'm sure the more customers they can impress, the better.

Mark:

That's kind of interesting, because that seems counterintuitive to what I think they would want to do. I would think they'd want to increase utilization, but that's great that they do that.

Dimitris:

[Inaudible]

Carolyn:

That's [inaudible].

Dimitris:

... papers out there.

Carolyn:

That's awesome, but it's a lot, man. I can't even shut my computer down every night. It's too much to ask.

Dimitris:

[Inaudible]

Carolyn:

Okay, I'm going to rat hole for a second, because the carbon footprint, you just totally piqued my interest. Have you been able to monitor the carbon footprint since you started shutting down the test environments, and how big of an impact has it made?

Dimitris:

We haven't really looked at it from that angle yet. We did it for the financial optimization benefit. The carbon footprint, I would say, has only really become a big discussion in the last year or two. All the big providers, so your Azure, GCP, and Amazon, have all started bringing out dashboards over the last few years, and that's one of my topics of reading for this year, actually. I haven't gone and installed it yet, but I know they've got a dashboard that shows you roughly what your carbon emissions, basically your usage, are, and how you track over time.

Carolyn:

That's so cool.

Mark:

Dimitris, you just made everybody on both sides of the political aisle happy.

Carolyn:

Okay, but I also love money. Can you tell us about how much money it saved you?

Dimitris:

[Inaudible]

Carolyn:

This is... Like, we could end the podcast now. I'm so happy to hear about this, and it's new to me. I want to switch gears though, a little bit, and talk about site reliability engineering. This is a term, SRE, that's fairly new to me. I started hearing it... I mean, I'm not in engineering, so I started hearing it about a year ago, and it's become, at least for me, a little bit of a buzzword, but definitely really important over the last few years. So, can you talk about how SRE is being utilized and adapted across industries?

Dimitris:

Yeah, sure. I have to admit, I mean, I heard about it, I think about three years ago now.

Carolyn:

Oh, I don't feel so bad.

Dimitris:

[Inaudible]

Dimitris:

So I helped form the SRE team within our department, I think about three years ago now, and we've been experimenting with exactly how to run this. We initially ran it with a two-pronged approach: one prong focused on writing down the best practice and really getting to the bottom of that. What does good monitoring look like? What does monitoring your service, as opposed to the infrastructure, look like? And what does a good RCA (root cause analysis) template look like, if you followed it? The second prong was about working with each of the application teams across the department, and really seeing whether they'd met those standards or had any feedback on improvements they were expecting.

Dimitris:

[Inaudible]

Carolyn:

No, everybody's baby is the most beautiful.

Dimitris:

Exactly.

Carolyn:

For sure, but okay, what you just described, and Mark, I... What you just described sounds a lot like user experience, which is a big push here in the US, and in fact, an executive order came out... Has it been a year? Over a year, Mark?

Mark:

It's been about a year. It was vague in what it was saying, but generally it was saying, "Look, you need to improve your user experience, whether it's internal or external."

Carolyn:

Yeah, and it sounds like your SRE team has figured out like, "All right, here's what we want to monitor, and here's how we're going to determine whether or not we've improved the user experience." Am I simplifying SRE too much?

Dimitris:

[Inaudible]

Carolyn:

The user like me, the end user?

Dimitris:

Yeah, the end user.

Carolyn:

Okay.

Dimitris:

So in traditional monitoring, a few decades back, someone would have just said, "How much CPU is your server using? If it goes above 80% or 90%, then throw me an alert," or memory, or whatever that metric might be. Now, we're saying, "Is the user getting a response time in X number of milliseconds?" And if that changes, yes, you need the supporting information. Maybe the CPU or memory has gone up, but that's not the important thing, because your CPU might be bouncing around all the time, and it's really hard to get that right. If the user's still getting a good user experience, that's what really matters at the end of the day.
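To make that contrast concrete, here is a minimal sketch of a service-level check along the lines Dimitris describes, alerting on user-facing response time rather than raw CPU. The 500 ms objective and the in-memory latency samples are illustrative assumptions, not the Home Office's actual SLO or tooling.

```python
import statistics

# Illustrative service-level objective: 95% of requests complete within 500 ms.
SLO_P95_MS = 500

def p95_latency(samples_ms: list[float]) -> float:
    """95th-percentile response time from a window of request latencies."""
    return statistics.quantiles(samples_ms, n=100)[94]

def check_user_experience(samples_ms: list[float]) -> None:
    observed = p95_latency(samples_ms)
    if observed > SLO_P95_MS:
        # Alert on what the user sees; CPU/memory are supporting context, not the trigger.
        print(f"ALERT: p95 latency {observed:.0f} ms breaches the {SLO_P95_MS} ms objective")
    else:
        print(f"OK: p95 latency {observed:.0f} ms is within the objective")

if __name__ == "__main__":
    check_user_experience([120, 180, 200, 230, 250, 300, 310, 340, 420, 950])
```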

Mark:

Well, that's really cool that you took the initiative to set these standards. Is that kind of... Is it going across other entities of the government, or just within your organization?

Dimitris:

We only focus within our department, but we're trying to join up the SRE teams across government departments. I think in the UK, it's a bit different than the US, in that we've got a central government department, part of our Cabinet Office, called GDS, the Government Digital Service, which is kind of the front door for user experience for the general public. They maintain what we call GOV.UK, which is where a lot of people go to find information and then get linked out to the services from other departments. One of the things they do is distribute some of the standards for the wider government, to try and encourage people to follow them, and they were some of the first in UK government, I think, to start on this SRE journey.

Carolyn:

So as Mark mentioned, our user experience executive order was vague; it just said, "Go make the user experience better." So I would be curious... I would love to know some of the other things that you guys monitor besides CPU usage. Like, are there two or three others that would be at the top of the list, where you're like, "If we can nail this, we're delivering good user experience"?

Dimitris:

…synthetic transactions. [Inaudible]

Carolyn:

I'm sorry. What does that mean? Like, the synthetic word gets thrown around a lot, and I'm like, why do you say synthetic? Why don't you just say testing?

Dimitris:

[Inaudible]

Dimitris:

And the good thing is, with more modern applications, even if they're doing an API call, it might not be a system that is user focused or public focused. It might be a system-to-system transaction, which is two APIs calling each other, and you can do the same thing, because an API call is actually very similar. Every time you call that API for another service, you're expecting the same kind of response back in the same kind of time. So you can use that throughout your-
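As a rough illustration of the kind of synthetic transaction Dimitris describes, the sketch below calls an API endpoint the way a real client would and checks both the status code and the response time. The URL and the response-time budget are hypothetical, included only to show the shape of such a check.

```python
import time
import requests

# Hypothetical endpoint and budget; real synthetic checks run on a schedule
# from outside the system, mimicking a user or a dependent service.
ENDPOINT = "https://api.example.gov.uk/status"
MAX_RESPONSE_MS = 800

def run_synthetic_check(url: str = ENDPOINT) -> bool:
    """Call the API like a real client and verify status and latency."""
    start = time.monotonic()
    try:
        response = requests.get(url, timeout=5)
    except requests.RequestException as exc:
        print(f"FAIL: request error: {exc}")
        return False
    elapsed_ms = (time.monotonic() - start) * 1000

    ok = response.status_code == 200 and elapsed_ms <= MAX_RESPONSE_MS
    print(f"{'OK' if ok else 'FAIL'}: HTTP {response.status_code} in {elapsed_ms:.0f} ms")
    return ok

if __name__ == "__main__":
    run_synthetic_check()
```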

Carolyn:

So basically, what you're saying is it's exactly what it says it is. It's not a person doing the test. I think I could have figured that out if I'd just thought it through. Okay, so you're using synthetic testing. What else are you looking for to determine whether or not you're having a good user experience?

Dimitris:

[Inaudible]

Mark:

So Dimitris, are these self-imposed or self-created SLOs that you have put in place within the organization, or are they actually formalized with your customers, and your customers say, "Hey, these are the things that we want"?

Dimitris:

They're self-imposed. Obviously, some are backed by certain legal requirements around the uptime of some of our systems, particularly the more critical ones, but they're largely self-imposed. I suppose giving it a focus from the get-go of an application is the most important thing, because it starts changing everyone's behavior and approach to how they deal with the systems.

Carolyn:

So, do you think that there are improvements to be made when it comes to SRE development, and what would those improvements be?

Dimitris:

[Inaudible]

Dimitris:

I think the big one for us that's changed is that there are a lot of organizations that have been around for a long time, not your new startups, that have a lot of enterprise systems and a lot of traditional service management, and they're going to have a bigger change when they're moving to monitoring the service. Just this week, I was challenged by someone who wasn't able to monitor the exact same metric with a new tool, and I was like, "But why do you need to? Are you measuring the service or the metric?" And they were still challenging that, as if the way they'd always done it is the way they need to continue doing it.

Carolyn:

What a good question.

Mark:

That's a great question.

Carolyn:

"Are you monitoring the service or the metric?" That's a really important question. And they couldn't answer? They just said, "This is the way we've always done it"?

Dimitris:

Yeah. And they were like-

Mark:

That's interesting.

Dimitris:

[Inaudible]

Mark:

Are they open to looking at a new way?

Dimitris:

I hope so. That's my mission for a meeting later this week.

Carolyn:

All right, let's talk FedRAMP, Mark.

Mark:

Well, I wanted to get your thoughts around something, Dimitris. Security is a very important issue for everyone. In the US, as government agencies move their workloads to the cloud, they need to adhere to certain standards and control requirements. We call that FedRAMP, and that is a program that is really driving how people are moving to the cloud. Do you have a similar type of system in the UK?

Dimitris:

Yes, I've just been trying to understand a bit more about FedRAMP, actually. It seems like quite an interesting concept. As far as I know, we don't have anything as enforced as that, but we've got a dedicated organization, the NCSC, the National Cyber Security Centre, that focuses on cybersecurity all across government and really helps all the different departments. I'd actually highly recommend people go and have a look at their blog; if you just Google NCSC government UK, something like that, I'm sure you'll come across it. It gives a lot of best practice, and they recommend it because they're not only looking at how we do this as government, but at how private organizations should be doing it as well, which is quite useful. So a lot of the best practice guidance is used across there. Obviously, as a government department, we work very closely with them to talk through how we're doing things, how we're implementing things, how we secure it, and so on. The recent meetings they've been having have really helped us look at what specific areas we want to focus on, and how other people in government have solved those problems, so we don't have to reinvent the wheel.

Mark:

Okay, interesting.

Carolyn:

Mark.

Mark:

Are you working with all of the major cloud service providers?

Dimitris:

We are largely on Amazon at the moment, but we're also building our Microsoft Azure footprint as well.

Mark:

Oh, yeah. Got you. Okay.

Carolyn:

So Mark, I'm going to ask this question out loud. Who is behind FedRAMP for us? Is it Congress? Who enforces it?

Mark:

Well, no. Well, actually, FedRAMP has a PMO, and it's the government who is enforcing that, so yeah. And there are controls and requirements that all vendors need to adhere to, and be authorized and accredited against, before they can deploy into the cloud.

Carolyn:

How long has it been around? Like, I mean, it's been around as long as I can remember, as long as I've been in this space, but is it like 15, 20 years?

Mark:

I don't know, but it's been around quite a few years, but the FedRAMP Authorization Act was just signed into law recently.

Carolyn:

Huh. Okay.

Mark:

That's kind of big news, because they've been trying to do it for years, but Congress, they would never pass it.

Carolyn:

Yeah. Well, I'm just thinking about what Dimitris said about NCSC, so it's being driven by the cybersecurity side of things. Would you say that's the big impetus around FedRAMP for us, is cybersecurity?

Mark:

I think it's GSA.

Carolyn:

Okay. Okay.

Dimitris:

Sorry, what's GSA?

Mark:

Edit this out, if that's not right.

Carolyn:

I absolutely won't. GSA. Okay, what does that stand for, Mark? We should know this.

Mark:

General Services Administration.

Carolyn:

There you go.

Dimitris:

[Inaudible]

Mark:

I think it's pretty extensive, but yeah-

Carolyn:

And it's public, right? Like, the-

Mark:

Yeah.

Carolyn:

I'll find a link, Dimitris, and I'll send it to you.

Dimitris:

All right, thank you.

Carolyn:

Yeah. All right, I'm going to... Mark, do you have any more questions for Dimitris, serious ones before we get to the questions that I care about?

Mark:

Serious ones? No.

Carolyn:

Okay. Dimitris, I'm going to give you the last word before we move on to the fun questions, but do you have any other thoughts that you would like to share with our listeners?

Dimitris:

Elastic, New Relic... [Inaudible]

Carolyn:

[Inaudible]

Dimitris:

[Inaudible]

Mark:

Well, that's interesting you bring that up, because we call that do-it-yourself here in the US, where there are government agencies that have pockets of teams that want to build things themselves. I know there are positives and negatives about doing it that way, but the fact that now you have to maintain and develop and keep these code streams up to date, that's a challenge. You mentioned development time, and freeing up your developers to do other things. I think that's a big issue, so it's kind of a commercial, in a lot of ways, for COTS-type technologies, you know?

Carolyn:

Yeah.

Dimitris:

[Inaudible]

Carolyn:

There we go.

Mark:

That's right.

Carolyn:

We just got the sound bite right there, Mark. Well, and I mean, I'm thinking about like growing up. I grew up on a farm. We had a well, and my dad didn't know how to fix wells. And the well would go out, I swear to you, every time he would leave town, and so we would be without water, like no water. And he would always tell my mom, "No, I'll fix it when I get back." So sometimes, we were without water for like two or three weeks, and I will tell you what. He always called in a specialist in the end, to fix the damn well. So, I'm just, you know, thinking about how just what you said, Dimitris, like let the experts do that. Give time back to your developers. Focus on the high-stakes tasks.

Dimitris:

[Inaudible]

Carolyn:

Yeah, and when you do that, guess who suffers. Me, because I don't have water for three weeks. It's about the little people. All right, we're going to go to our tech talk questions, which are just some fun, quick-hit questions, and if there's one that we ask you that you don't want to answer, do it anyway. I'm just kidding. Just say you don't want to. So I'm going to ask the first question. If you could wave a magic wand, what would you wish for in technology? What would you bring into being?

Dimitris:

That's a hard one.

Mark:

This will tell us if you're a Trekkie or Star Wars.

Carolyn:

It's very important. Think carefully, Dimitris.

Dimitris:

Anything in technology? [Inaudible]

Carolyn:

Anything.

Mark:

Anything.

Carolyn:

Anything.

Dimitris:

I don't know. My imagination isn't that good at the end of the day.

Carolyn:

Come on. You don't want to teleport?

Dimitris:

[Inaudible]

Mark:

Reading minds. That's dangerous.

Dimitris:

Yeah, exactly.

Carolyn:

You want to read minds?

Dimitris:

[Inaudible]

Mark:

That can get you in a lot of trouble in your relationships, Dimitris.

Dimitris:

Yeah. No, you hear those tips like, "Oh, someone moves in a certain way," or they move their arms in a certain way in a meeting. I never remember any of those things, so if something can just tell me that all the time, so I can-

Carolyn:

What about like a heads-up display?

Dimitris:

Yeah.

Carolyn:

"Okay, she just crossed her arms. You need to walk away."

Dimitris:

Yeah, exactly. Something like that.

Mark:

Well, that's been happening for thousands of years.

Carolyn:

All right, Mark.

Mark:

So-

Carolyn:

Oh, go ahead.

Mark:

So Dimitris, what do you think that the next big technology for this coming year is going to be?

Dimitris:

[Inaudible]

Mark:

Well, you know, around the corner.

Carolyn:

Whatever you want to say.

Mark:

What's the next big thing coming around the corner? Yeah.

Dimitris:

For me, the next big thing, which I'm trying to learn more about, and which I feel is so far beyond my understanding, is quantum computing. It's been talked about for quite a while, and it seems to be picking up slowly in terms of actual use cases. I can't say I know that much about it; it's on my reading list to really get my head around it, but I think when it comes in, it's going to change a lot of how we do things. It's a completely different mindset. It's not like just learning a new programming language, and I say that like that's so simple anyway. It's going to be a massive change. Everyone's announcing things all the time. I just can't work out when it's actually going to make a meaningful difference to how we develop our applications.

Carolyn:

Yeah, I agree. It's kind of scary, too. Like, I mean, mostly I get my information from sci-fi, so it really goes down the scary, dark hole, but I mean, that quantum computing... And it feels like we're right on the cusp, right? We're right on the edge of it happening, so... All right, I'm always looking for something new to read, or watch, or listen to, so do you have any favorites? And it can be work related, it can be just for fun, whatever you want to share with us.

Dimitris:

I'd go for... I'm slowly getting addicted to podcasts. I hadn't listened to any podcasts until about two years ago, and some of my friends would be a bit shocked, because they were always trying to convince me. I'd go for something called The Happiness Lab by Laurie Santos. She's part of... I can't remember, one of the big American universities. She's a psychologist by trade, but in both my personal life and in work and leadership, it's helped me become a lot more focused on other people, on managing and leading other people. Some of the tips she gives, although they might sometimes be more personally focused or work focused, overlap a lot across those kinds of skill sets, so I'd highly recommend it. She's got quite a back catalog now as well, which I recommend.

Mark:

Well, that's interesting.

Carolyn:

Yeah, that's a good one.

Mark:

Have you ever read The Art of Happiness by the Dalai Lama?

Dimitris:

No. It's on my growing book list on Amazon.

Mark:

Yeah.

Carolyn:

Yeah. It's a good one. In fact, it's on my shelf behind me, so all right. We're going to let you go, Dimitris, because you've been very generous with your time, and I know that you have other commitments, so thank you so much for joining us today.

Dimitris:

Thank you very much for having me.

Mark:

It's a pleasure to meet you, Dimitris.

Carolyn:

Yes. And thank you, listeners, for joining us on Tech Transforms. Make sure you share this episode and give us a like, and we will talk to you next week. Thanks for joining Tech Transforms, sponsored by Dynatrace. For more Tech Transforms, follow us on LinkedIn, Twitter, and Instagram.

About the Podcast

Tech Transforms
Tech Transforms talks to some of the most prominent influencers shaping government technology.

About your hosts

Carolyn Ford

Carolyn Ford is a passionate leader, doer, adventurer, guided by her father's philosophy: "leave everything and everyone better than you found them."
She brings over two decades of marketing experience to the intersection of technology, innovation, humanity, and the public good.

Carolyn Ford is passionate about connecting with people to learn how the power of technology is impacting their lives and how they are using technology to shape the world. She has worked in high tech and federal-focused cybersecurity for more than 15 years. Prior to co-hosting Tech Transforms, Carolyn launched and hosted the award-winning podcast "To The Point Cybersecurity".