
Day Two Cloud 136: The Role And Responsibilities Of A Kubernetes Operator (Sponsored)

Episode 136

Today on the Day Two Cloud podcast we examine the role and responsibilities of Kubernetes operators; that is, the humans in charge of running Kubernetes. An operator does more than just update the cluster and make sure nodes don’t go down. Kubernetes operators have to support application and security teams, handle capacity planning, keep an eye on versioning, and more.

This episode is sponsored by F5’s NGINX team. Our guests are Jenn Gile, Sr Manager of Product Marketing; and Brian Ehlert, Sr Product Manager.

We discuss:

  • Responsibilities of a Kubernetes operator
  • When this role makes sense in a Kubernetes adoption cycle
  • Essential skills and background for this role
  • Reasons to bring a Kubernetes operator into the organization
  • The need to understand networking and infrastructure
  • More


  1. Kubernetes operators are an important part of getting Kubernetes into production and keeping it there.
  2. To be a good K8s operator, you need to really understand Kubernetes networking concepts and tools.
  3. NGINX’s Microservices March 2022 program is a 1-month free program on Kubernetes networking that can give you a good foundation and help you decide if this is something you want to pursue further.

Show Links:


NGINX Microservices March 2022 Educational Program


@GileJenn – Jenn Gile on Twitter

@BrianEh – Brian Ehlert on Twitter



[00:00:05.210] – Ethan
Welcome to Day Two Cloud. Today we got a sponsored show coming at you from F5 Networks, the NGINX team within F5. And we are talking today about Kubernetes operators, the humans, the people that would be titled Kubernetes operators. What did you think of this show, Ned?

[00:00:21.420] – Ned
I thought it was really interesting how they approached what an operator might be responsible for. And it’s more than you might think. It’s not just updating the cluster and making sure nodes don’t go down. They really dug deep into the fact you have to support the application teams and the security teams and be very mindful of versioning within the Kubernetes cluster. So that’s what jumped out to me.

[00:00:42.750] – Ethan
Yes. Please enjoy this conversation with Jenn Gile, senior manager of product marketing at F5 Networks for the NGINX team, and also Brian Ehlert, senior product manager. Brian and Jenn, welcome to Day Two Cloud. Thank you for joining us. Jenn, first question to you. What is a Kubernetes operator? Because we’re talking about a human being, right?

[00:01:05.120] – Jenn
Yeah. We’re not talking about the machines. We’re talking about the people. The Kubernetes operator, which we’ve seen called a lot of different things (you might have seen it as a cluster operator), is kind of analogous to what we would have called a sysadmin in the pre-cloud era. So we’re seeing a lot of different actual job titles for people who are in this domain. It can be a cloud architect, it could be an SRE, maybe there’s a general IT Ops title. They’re probably part of a larger org, maybe called a platform ops team. In terms of the job description, we’re looking at people who are responsible for Kubernetes as a piece of infrastructure. They’re going to be helping other teams run Kubernetes. They may have planning and monitoring responsibilities, scaling, resilience, larger tasks, anything that’s going to help the business run Kubernetes.

[00:02:02.730] – Ned
Okay. So the difference would be they’re not necessarily deploying applications in the cluster and babysitting those applications. They’re responsible for the health, management, and monitoring of the cluster as a whole. Would that be correct?

[00:02:15.340] – Jenn
I’m going to say you’re partially correct, because it really depends on the size of the team and the company. A Kubernetes operator could be your entire job or it could be part of your job. If you’re in a really small company, let’s say a startup where people do 50,000 different things, you might actually be doing both. In bigger teams, it’s probably your sole job, whereas the people focused on the code are somewhere else.

[00:02:43.770] – Ned
Okay. So that begs the question, at what point in my Kubernetes adoption lifecycle does this role actually become useful?

[00:02:52.590] – Jenn
I would say, and Brian, I’d love to hear your thoughts here too, I think it really depends on how invested you are in Kubernetes, how complex it is, how big it is. What are you doing with it? How many services, how many teams? Brian, what do you think?

[00:03:09.120] – Brian
I’m going to look at that a little differently, Jenn, as somebody who grew up in IT a long, long time ago. I’m going to say that it’s a useful skill set to have, and it’s a useful focus to have, whenever you start your Kubernetes journey. Once your business makes a commitment to Kubernetes, you need somebody in there that has a clue. It’s the same way we looked at virtualization back in the day, right? I mean, back in the early 2000s when we were working with VMware without a GUI, for example. You get into the same type of thing because it’s the accompanying skill set. You’ve got virtualization underneath, you’ve got Kubernetes above that, and you still have a relationship to storage and networking and compute and everything else that goes in there.

[00:04:02.600] – Ethan
So when you say someone with a clue, it sounds like you’re someone who understands infrastructure and all the components, because just because we have abstracted stuff away doesn’t mean we don’t need to understand storage and networking and all of that. Is that what you’re getting at?

[00:04:16.370] – Brian
Absolutely. So there’s definitely a skill set here, especially when you get into problem solving and troubleshooting, right? In the Kubernetes world, yes, you have Kubernetes taking care of things. A pod runs out of memory, it gets recycled. But obviously somebody has to realize at some point in time that the resource limit, or whatever is going on in that pod, is a bad thing, and they need to have the skill set to investigate that. And yeah, it’s running out of memory because the resource limit is too tight, because that’s how we set that in the Kubernetes world. The VM might be fine, the node might be running along perfectly happy without a clue in the world, but the pods are crashing all over the place, or a storage driver might be misbehaving, or something like that. So you still have to have the skill set to troubleshoot these things and to look at it as a system, because you’re not just looking at one specific thing, you’re not just focused on a single application. There’s more to it.
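The failure mode Brian describes, a pod restarting because its memory limit is set too tight, comes from the resources block of the pod spec. A minimal sketch of where that knob lives; all names and values here are illustrative, not from the show, and the manifest is modeled as a Python dict standing in for YAML:

```python
# Illustrative pod spec as a Python dict; in practice this is YAML applied
# with kubectl. The memory limit is the knob Brian mentions: if the app's
# working set exceeds it, the kernel OOM-kills the container and Kubernetes
# restarts it (the pod status shows reason "OOMKilled").
pod_spec = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "example-app"},       # hypothetical name
    "spec": {
        "containers": [{
            "name": "app",
            "image": "example/app:1.0",        # hypothetical image
            "resources": {
                "requests": {"memory": "128Mi", "cpu": "250m"},
                "limits": {"memory": "256Mi", "cpu": "500m"},
            },
        }],
    },
}

def memory_limit_mi(spec: dict) -> int:
    """Return the first container's memory limit in MiB."""
    limit = spec["spec"]["containers"][0]["resources"]["limits"]["memory"]
    assert limit.endswith("Mi"), "sketch only handles Mi units"
    return int(limit[:-2])
```

The operator's judgment call is whether 256Mi is "too tight" for this workload, which is exactly the kind of investigation Brian is talking about.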

[00:05:18.490] – Ethan
You’re still managing an infrastructure stack, an application delivery stack, of things that happen to come under the common heading, the umbrella, of Kubernetes. But all those components that make up that stack are still there. There’s one fine distinction I want to understand here, though, as we define this Kubernetes operator role. Will that operator be deploying applications, writing the YAML or kubectl commands and all that, the things that are going to deploy the app, describe it, and so on? Or will that person be strictly focused on cluster management: how many nodes do I need to have in the cluster to handle the load, and so on?

[00:05:53.530] – Brian
We see both among our customer base. So generally it’s involved in all of the above, right? There are portions of the configuration that are handed off to the app team. As we think about the Kubernetes API and all the various objects and role-based access control and whatnot, there are certain aspects of configuration that are given off to the app team. But a lot of the operators that are responsible for the infrastructure are also responsible for deployment onto the infrastructure. That’s just a common pattern that we generally see among our customers.

[00:06:29.250] – Ethan
I ask that partly in the context of the Certified Kubernetes Administrator program, because part of that training is you’re deploying apps. That’s part of what they’re teaching you to do.

[00:06:39.390] – Brian
Like I say, you get down into other aspects of that, and some shops dole that out. So as you get into specific aspects of networking, there might be certain aspects of ingress configuration or something like that that the app team might own, because they might own the pathing and the rewrite rules and some other things. But the deployment, that’s owned by the operator, the individual who is also part of the same team of operators that’s responsible for keeping the platform healthy.

[00:07:08.130] – Ned
Okay, so it’s not just a matter of knowing how Kubernetes at a cluster level works, and it’s not a matter of having some general knowledge of networking and storage. You also have to have some domain-specific knowledge around writing YAML and deploying manifests, or whatever other technologies they might be using, whether it’s Helm or something else, to get that deployment done.

[00:07:30.330] – Brian
And it’s interesting when you talk about apps and deployment, because most of the infrastructure bits that manage Kubernetes run on Kubernetes. So it’s like an inception thing, right? As you add infrastructure pieces to Kubernetes, guess what, that’s an app that you’re deploying on top of Kubernetes, to manage Kubernetes, to a certain degree. So yes, you’re going to run into all this stuff anyway, even though it might not be the customer-facing application, or what a dev team is producing, or some other thing that’s making money for the business. This is part of what you need to keep the world running, right?

[00:08:09.940] – Ned
It’s turtles all the way down.

[00:08:12.470] – Ned
It reminds me a lot of back in the VMware days; we were all there. When I first started using it, your vCenter would run on a separate physical system, but then eventually everybody moved their vCenter inside the VMware cluster. And that was scary, because now you were running the thing that was managing the thing, in the thing. And I was like, oh no!

[00:08:32.690] – Brian
Just don’t let it live migrate itself and you’re okay.

[00:08:38.930] – Ned
That’s never happened to me. Anyway. So, for someone who is looking to become a Kubernetes operator, what additional skills should they pick up or focus on?

[00:08:53.210] – Jenn
We see a lot of people focusing on what we’ve talked about so far, building the apps and everything. Then there’s the hidden stuff, the not-as-flashy or fun stuff, that sometimes we call the plumbing. It’s the networking parts. It’s your traffic management stuff. It’s your visibility and monitoring. And Brian, like you said, it’s not quite the same as it would be outside the cluster, because you’re probably dealing with YAML and you’re dealing with Kubernetes-native tools, and they don’t perform in the same way necessarily, or they have different names. So, for example, if we’re talking Kubernetes networking and traffic management tools, there’s this little thing called an Ingress controller, and why they decided to call it that is a longer story, but in layperson’s terms, it’s a fancy load balancer. So all that knowledge of using a load balancer outside of Kubernetes is definitely going to serve you, because you can be thinking about the load balancing algorithms and the traffic that goes through it, but also understanding how that connects into the services and what kind of information you can get, realizing that you’re looking at L7 traffic management at that point.

[00:10:15.490] – Jenn
And so often you’re going to have a similar skill set but a new vocabulary.
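In concrete terms, the “fancy load balancer” Jenn mentions is configured through an Ingress resource that maps L7 details like host and path to a backend Service. A rough sketch of that mapping, with hypothetical names, modeled as plain Python dicts rather than YAML:

```python
# Minimal Ingress manifest (networking.k8s.io/v1) as a Python dict.
# Host, paths, and service names are hypothetical.
ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {"name": "web-ingress"},
    "spec": {
        "rules": [{
            "host": "shop.example.com",
            "http": {"paths": [{
                "path": "/api",
                "pathType": "Prefix",
                "backend": {"service": {"name": "api-svc",
                                        "port": {"number": 80}}},
            }]},
        }],
    },
}

def backend_for(ing: dict, host: str, path: str):
    """Walk the rules the way an ingress controller conceptually does:
    match the host, then prefix-match the path, return the Service name."""
    for rule in ing["spec"]["rules"]:
        if rule["host"] != host:
            continue
        for p in rule["http"]["paths"]:
            if path.startswith(p["path"]):
                return p["backend"]["service"]["name"]
    return None  # no rule matched; a real controller serves its default backend
```

This is the “new vocabulary, similar skill set” point: the matching logic is familiar L7 load balancer territory, just declared as a Kubernetes object.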

[00:10:22.350] – Ned
Right. One thing I want to focus in on a little bit is the visibility and observability tools that you need for Kubernetes, because it is a different paradigm. And the example, Brian, you were giving of a pod that keeps recycling itself because it’s running out of memory: you need a way to pick up on that and realize that’s bad.

[00:10:42.130] – Jenn
Undesirable behavior. Right.

[00:10:45.650] – Brian
And there are other behaviors that happen. I mean, back in the day, the virtualization admin skill set was an art form. It was more art than science, because there were things you just knew: an app was behaving badly and you’re in a terminal server situation, right? You kind of picked up these things, and they were self-observed skills. It’s like CPU throttling in a Kubernetes environment. Only the pod knows that it’s being CPU throttled, right? Outside of that pod, nobody has any clue that these things are going on. So a lot of it is, you have to get the visibility, but then you have to learn and understand enough that you know what to look for. You know the actual signs to look for, to understand what the symptoms are.
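Brian’s point that only the pod knows it’s being throttled maps to cgroup accounting: inside a CPU-limited container, the cgroup v2 cpu.stat file counts throttled scheduler periods, and nothing at the node level surfaces that for you. A sketch of reading that signal, assuming the cgroup v2 format; the sample numbers are made up:

```python
# What /sys/fs/cgroup/cpu.stat can look like inside a CPU-limited container
# (cgroup v2 key/value format; values here are illustrative).
SAMPLE_CPU_STAT = """\
usage_usec 1250000
user_usec 900000
system_usec 350000
nr_periods 1000
nr_throttled 400
throttled_usec 5200000
"""

def throttle_ratio(cpu_stat_text: str) -> float:
    """Fraction of CFS scheduler periods in which the cgroup was throttled.
    A persistently high ratio is the 'symptom' an operator should recognize."""
    stats = dict(line.split() for line in cpu_stat_text.strip().splitlines())
    periods = int(stats["nr_periods"])
    throttled = int(stats["nr_throttled"])
    return throttled / periods if periods else 0.0
```

In practice you would scrape this via your metrics stack rather than read the file by hand, but the underlying signal is the same.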

[00:11:34.450] – Ethan
It’s still CPU and memory, isn’t it, Brian?

[00:11:38.910] – Brian
There are limited resources, right. It always comes down to limited resources; check the physical layer first. That always applies. But we’re looking at it differently, right? We’ve got physical hardware underneath. We abstract that away with virtualization. Now we pile Kubernetes nodes on top of that, and we’re running containers. So we’ve got all these different layers of abstraction going on there. We’ve got networking-level abstraction at two levels. We’ve got hardware-level abstraction at three different levels. And then we get into the complexity of looking at a node as a system, and how one pod actually impacts another pod, or how the load on the system as a whole impacts everybody else that’s on there. So it’s part of this whole art form. And I do call it an art form, because I know there’s so much you can learn through certification and whatnot, but then there’s just the hands-on: you have to do these things and you have to have some understanding of the world around you. That’s where you get into the art form and the skill set that I think really makes folks excel.

[00:12:47.540] – Ethan
Now, if I am a competent network engineer and I bring that network engineering experience to the Kubernetes domain, is that going to serve me well? Or is it more like I need to unlearn so that I can think correctly about how Kubernetes moves packets around the cluster?

[00:13:02.950] – Brian
It’s different. They still move around the cluster, but it’s different. It’s just like the word kube-proxy: it’s not a proxy. Everybody gets stuck with the word proxy in their head, but it’s not a proxy. The basic skills apply and the basic rules apply, but how you go about troubleshooting it is different. It’s just fundamentally different because of where you are.

[00:13:27.460] – Ethan
So I need to take my networking knowledge and then take some time to learn how Kubernetes moves traffic around the cluster, how it flows from the client into, well, the Ingress controller, whatever the term is, into the container that’s actually going to serve up content, and then back out. Once I understand those sorts of patterns, life is going to be better for me.

[00:13:49.740] – Brian
Yeah, to start out with. And then you get into the complexity of service meshes and ingress controllers and how you expose your services, and people start to talk about egress rules and whatnot. So it all plays in.

[00:14:03.830] – Ethan
Now, do I have to get into tools like Project Calico? There are several other projects that could be delivering networking services for Kubernetes. Is there something like that I should focus on, or should I just start with what Kubernetes gives me out of the box?

[00:14:19.290] – Brian
If you’re just starting with Kubernetes, right, go with what you get out of the box. I look at it as, when you get into the CNI, when you get into things like Calico and Flannel, that’s where the networking team, the folks that are not the Kubernetes platform team, is now getting involved in Kubernetes as a stack, and they want to get greater value out of Kubernetes as a stack from a networking perspective. So you can get into the more complex things that projects like that do. It’s just like Jenn mentioned with ingress. You can start Kubernetes without ingress, but at some point in time, somebody’s probably going to want the flexibility of header rewrites and path reroutes and a few other things. So then you introduce ingress. And I think that’s one of the great things about the Kubernetes world: you can layer in the tools as you need them.

[00:15:11.060] – Ned
Something else that jumps out to me in the world of Kubernetes and the skills involved: Kubernetes itself is an orchestrator. It likes to do things in an automated way. So as an operator, I’ve got to imagine that understanding automation and being able to wrangle it is going to be a huge part of your job.

[00:15:32.610] – Brian
It’s all workflows. I mean, workflow engines were all the rage 10 or 15 years ago. Everybody had a new workflow engine, right? And we get to it now, and you look at the Kubernetes system, and we take for granted the huge amount of coordination under the hood of this eventually consistent system. You define what you want through YAML, and then Kubernetes just makes it happen for you. Some people are like, it’s magic. Some people don’t take the time to ever understand what’s actually happening. They just know that they get their results out at the end. But sooner or later, somebody has to understand what’s going on there.
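The “eventually consistent” coordination Brian describes is the controller pattern: compare declared desired state with observed state and correct the difference, over and over. A toy sketch of the idea (not real controller code; pod naming is simplified):

```python
def reconcile(desired_replicas: int, actual: list) -> list:
    """One reconcile pass: create or delete pods until the observed set
    matches the declared desired count. `actual` is a list of pod names."""
    actual = list(actual)
    while len(actual) < desired_replicas:
        actual.append(f"pod-{len(actual)}")   # create a missing pod
    while len(actual) > desired_replicas:
        actual.pop()                          # remove a surplus pod
    return actual

# Repeated passes converge on whatever the YAML declares, regardless of
# how the current state came about (crashes, scale changes, manual deletes):
state = ["pod-0"]
state = reconcile(3, state)   # scale up to 3 pods
state = reconcile(2, state)   # scale down to 2 pods
```

Real controllers watch the API server and take one corrective step per event rather than looping like this, but the declare-observe-correct shape is the same.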

[00:16:11.140] – Ned
Now, there’s a lot to learn here. Obviously, Kubernetes is a wide and vast sea that you could get lost in pretty easily. Do you have any recommendations? If someone is just starting to learn to be a Kubernetes operator, they have that goal in mind. What are some ways that they can gain additional knowledge?

[00:16:31.950] – Jenn
I’m a big fan of use case based learning: thinking about something that you want to achieve, then going out, studying that thing, and implementing it, because you can look at theory all day, but if you’re not putting it into practice, it’s nice, but it’s not going to help a whole lot. You’re probably not starting off with a blue-green deployment, but that’s the first thing that popped into my head. If that’s what you’re starting off with, great: focus on studying that part, focus on all the components that go along with it, and for now, maybe don’t worry about anything else. But there are still going to be those fundamental things that you do want to study. I would say start with the basics of what Kubernetes networking is and how it’s different from more traditional networking models. What’s a node? What’s a cluster? How do these things move around? How are they connected? What are the ways that I can move things around? Get that really good foundational knowledge before getting into the use cases.

[00:17:34.780] – Ethan
So this use case method that you describe, it’s pretty common. A lot of us do that. We have some problem to solve, some itch to scratch, and so we’ll dive in and make the technology solve this problem that we have. Okay, with Kubernetes, most of us trying to scratch that I-need-to-learn-Kubernetes itch will come up with a project that’s not going to be some massive-scale thing. So is that going to be good enough? Like, say I want to use minikube on my laptop to learn. Is that adequate to teach me what I need to know? Or should I really be looking at multiple physical nodes so that I have a real-world sense of how Kubernetes does what it does?

[00:18:09.430] – Jenn
I think something like minikube is a great first step. If you don’t have access to those multi-node clusters immediately, start with what you do have access to. But let’s pretend you have the job already. This is the case for a lot of people we talk with: they implement Kubernetes and they’re like, hey, you look like you should be able to do this, get in there. So there is that on-the-job learning with what is available. But if you are thinking about architecting Kubernetes long term, you’re probably thinking, okay, today I only have a single cluster; tomorrow I might have five. So getting into some of those practice environments with multiple clusters is going to be key for sure.

[00:18:54.590] – Ethan
Now we’re kind of implying running Kubernetes on bare metal, too; that’s sort of the context of this conversation here. What about the managed Kubernetes services on the sundry clouds that are out there?

[00:19:05.520] – Jenn
Yeah, I mean, they definitely make it easier to get started in a lot of ways. There’s a lot of plug-and-play functionality, whether it’s managed Kubernetes from a cloud provider or something like Rancher or OpenShift. We have a lot of customers choosing to use those because you can get up and going faster. While they all have slightly different names and processes for things, they all do the same thing at the end of the day, right? It’s all Kubernetes under the hood.

[00:19:36.510] – Brian
They might have different interfaces, for example. You’ll run into that, too. The fundamentals are generally the same; the fundamentals all relate. It’s just, okay, your company standardizes on one over the other, so you’re going to be used to seeing their interfaces. That’s really what it ends up playing out to be.

[00:19:55.620] – Ethan
What I’m hearing is, rather than start with Azure Kubernetes Service, let’s say, start with doing it on bare metal and then go to the cloud managed service. That sounds like the better way to go.

[00:20:07.390] – Jenn
Yeah, I would think so. And we do see some people choosing to go back to bare metal. So even if you know, hey, we use Azure, I should just learn Azure, there could be a future where that won’t be the case, or you may be adopting a different cloud. So yeah, bare metal is great.

[00:20:23.240] – Ned
Yeah. We recently had a conversation that was talking about exactly that. You may think you’re only going to ever use this one cloud, but the reality is that’s probably not going to be the case. So having a good understanding of the general concepts beyond just what a particular cloud offers you is going to be very beneficial to you.

[00:20:40.440] – Ethan
Do either of you have an opinion on the CKA program?

[00:20:43.950] – Jenn
An opinion or multiple opinions?

[00:20:46.490] – Ethan
I mean, multiple opinions is great. Give us that nuance Jenn.

[00:20:50.730] – Jenn
At the risk of slightly repeating myself, hands-on is great. Hands-on is what’s going to give you the most, both education and value. But that said, we both work with a lot of people at F5 who pursue the CKA and find a lot of value in it, and we see a lot of people in the industry who have it. So my view is that it’s really great to pair with some real-world experience. It’s kind of like any certification, where it can help you learn the language and get the high-level context before you have to apply it. We hear great things from our colleagues who are pursuing the CKA. A particular thing that gets mentioned fairly often is doing study groups along with the CKA and learning from other people. Super valuable. Brian, I know you have thoughts on this one as well.

[00:21:44.730] – Brian
I have thoughts on certification in general. When I was in ops, when I was in IT, I was all about certification, right? Because it’s the way you prove yourself to your manager sometimes. Great tool. The CKA: I’ve talked to people that have gone through and passed the CKA, and they’re like, I don’t know how I would have passed that without some actual experience. It’s just a little too hard to be purely book-learned and theoretical about. You actually have to go through the paces, do the things, and actually understand what you’re doing to pass the exam. And from that aspect, I think it’s a really good, rigorous certification.

[00:22:26.830] – Ned
I can second that as someone who has failed the certification, and I was fairly close on it. I just wasn’t fast enough. I think that’s what I learned: you not only need to know this stuff, you need to know it pretty intuitively and be able to type things out quickly in a terminal. If you can’t do that, you’re going to struggle to pass. But the process of studying for it did teach me a ton about Kubernetes in general, so I think it was definitely beneficial, even if I didn’t get the certification.

[00:22:56.110] – Jenn
Yeah. Those timed tests are a whole different conversation, right?

[00:22:59.280] – Ned
Oh, yeah. Another thing that we have in our list here is Microservices March. That’s not something I’m familiar with. So, Jenn, can you tell me what Microservices March is?

[00:23:10.700] – Jenn
Yeah, we kicked it off at NGINX last March. Unsurprisingly, it’s called Microservices March, and it happens in March. We kept it kind of easy there. The whole point of Microservices March is it’s a time when our group focuses our outward programs, resources, et cetera, on a microservices topic and provides that education to the community. For this year, the topic we’ve gone narrow on is Kubernetes networking. The reason we chose that topic is because we find a lot of the people we work with, whether they’re using open source or looking at commercial products, are adopting Kubernetes without that skill set, and it causes problems. The things we hear from people who are already in Kubernetes are things any survey will probably tell you as well: there are security issues, it’s complicated, scaling it is hard. Knowledge is often something that people call out. So what we’ve done with Microservices March this year is made a four-week free program where people can come and learn the very beginning of Kubernetes networking: what is a node, what is NodePort, load balancer, et cetera, progressively up through, okay, what do API gateway use cases look like in Kubernetes?

[00:24:37.730] – Jenn
How can I deploy those? What can I do with them? What kind of tools? Spoiler: it’s not the same as doing it outside Kubernetes. How do I make my clusters secure and resilient? That’s not just adding a WAF; that’s authentication, rate limiting, and the like. And then advanced scenarios: canary, blue-green, service mesh. We talk a lot about service mesh, and at just about any conference you go to these days it makes up half the Kubernetes topics, which makes it sound like everybody’s using it. But what we’re going to be talking about more is how you decide when you’re actually going to get some value out of using a mesh, because it adds a lot more complexity.
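The NodePort and LoadBalancer basics Jenn puts at the start of the curriculum differ, at the manifest level, only in the Service type field. A sketch of the two, as Python dicts standing in for YAML manifests; names and ports are hypothetical:

```python
# Two ways to expose the same pods, differing only in the Service type.
def service(name: str, svc_type: str, port: int) -> dict:
    """Build a minimal Service manifest; names and ports are hypothetical."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": svc_type,            # ClusterIP, NodePort, or LoadBalancer
            "selector": {"app": name},   # which pods receive the traffic
            "ports": [{"port": port, "targetPort": port}],
        },
    }

# NodePort: reachable on a port opened on every node's IP.
node_port = service("web", "NodePort", 8080)
# LoadBalancer: additionally asks the cloud provider for an external LB.
load_balancer = service("web", "LoadBalancer", 80)
```

An Ingress controller then typically sits in front of Services like these to add the L7 behavior (host and path routing) that a plain Service type cannot express.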

[00:25:23.590] – Ned
Okay. Yeah. Service mesh is something we’ve covered a few times before in the show. And the end result of most of those conversations is, well, we’re not using it yet, but we are thinking about it. I don’t think our use case warrants it, but maybe it will in the future. So maybe it’s a goal we’re moving towards.

[00:25:42.130] – Ethan
Ned, I think the title of one of our episodes was You Don’t Need a Service Mesh.

[00:25:46.470] – Jenn
I think I listened to that one. And you’re probably right, most people don’t need one, right? If you’re earlier in your Kubernetes usage, if you’re not complicated enough, if you’re not doing a lot of automation, you’re probably not going to get the value, right?

[00:26:04.260] – Ned
But it’s like, you don’t need a service mesh until you do. And being able to recognize that moment when you say, okay, I have enough problems, and a service mesh solves those problems.

[00:26:13.660] – Jenn
Well, and thinking about those problems from day one and thinking, okay, I just launched my first cluster. I don’t want a mesh right now. I don’t need a mesh right now. But what will it look like in a couple of years when I do?

[00:26:26.210] – Ned
Right. And how do you put it in? At least, you mentioned one thing people run into as a stumbling block: scaling problems. So what type of scaling problems do you see folks encountering that you’re going to help them understand better during Microservices March?

[00:26:42.110] – Jenn
Yeah, it’s kind of ironic, right? Because that’s the whole point of Kubernetes: to help you scale. Where we see the most scaling problems has to do with limitations based on how they’re handling the traffic moving through the system. If, for example, you’re trying to bring in all of your incoming traffic using something like the LoadBalancer service object, it’s going to start to have problems at higher volumes, and you will be limited in what you can scale. Brian, I think I’m going to pitch it over to you, because this is something I know you think about a lot and talk to a lot of customers about.

[00:27:20.390] – Brian
Yeah. You can look at scalability a lot of different ways. We have some customers that are extremely latency sensitive with their applications, where every single hop matters, every nanosecond matters, and we have other customers with other types of APIs where it’s not such a big deal. So you get into scalability from different aspects. It’s just like, if you deploy certain infrastructure pieces, do I do it as a Deployment? Do I do it as a DaemonSet? And what are the implications of doing those things? Do customers allow their clusters to auto scale horizontally, right, dynamically add nodes? We have one customer I was talking to recently: they’re setting up an Ingress controller, but they actually want it to horizontally scale from tens of pods to hundreds of pods in order to handle their bursting. So you get into some of these scaling scenarios, and you can slice and dice them different ways depending on what the situation is. But there’s always all kinds of scaling. So we want to maximize the throughput we can get out of the pod, and then the rest comes down to design. Topology design.

[00:28:41.950] – Brian
This is where we get back to: which CNI do you choose? Because that might have some impact. How do you actually model it within your environment? Because that can actually have some impact.
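The burst scaling Brian describes, an ingress controller growing from tens to hundreds of pods, is typically declared with a HorizontalPodAutoscaler. A sketch under assumed, illustrative numbers, including the proportional rule the autoscaler applies (simplified; the real HPA also handles tolerances and stabilization windows):

```python
import math

# Illustrative HPA manifest (autoscaling/v2) as a Python dict;
# target, bounds, and names are hypothetical.
hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "ingress-hpa"},
    "spec": {
        "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment",
                           "name": "ingress-controller"},
        "minReplicas": 10,
        "maxReplicas": 200,
        "metrics": [{"type": "Resource", "resource": {
            "name": "cpu",
            "target": {"type": "Utilization", "averageUtilization": 70}}}],
    },
}

def desired_replicas(current: int, current_util: float, target_util: float,
                     lo: int, hi: int) -> int:
    """Simplified core of the HPA calculation: scale replicas in proportion
    to observed vs. target utilization, clamped to the configured bounds."""
    want = math.ceil(current * current_util / target_util)
    return max(lo, min(hi, want))
```

For example, 10 pods averaging 140% of a 70% CPU target would be scaled toward 20 pods, which is how a burst walks a deployment from tens toward hundreds of replicas.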

[00:28:56.690] – Jenn
Another side of the scalability has to do with non-functional requirements that are built into your services, your apps. We see a lot of people building authentication and authorization into the app, and that works fine if you have one app and you’re not updating it all that frequently, but it can start to cause limitations on how you’re able to develop those apps. So it’s kind of a different way to think about scaling: thinking about what non-functional things I can offload from that app. I don’t need that app to handle authentication; that can be done by Kubernetes traffic management tools, just for example.

[00:29:38.740] – Ned
Right. That leads into security issues and dealing with the various security problems that might crop up. I know that’s something you mentioned as well. I’m going to throw this one over to Brian. What should people be learning about, or be mindful of, when they’re assessing security in the context of Kubernetes?

[00:29:55.910] – Brian
Oh, my. We can get into so many things here. I’m going to avoid CVEs, but at the same time, generally we talk about networking, right? So we talk about application-level security scenarios, but we get into all aspects of things. You get into CVEs and base images and platform security and isolation, all the same problems that we’ve dealt with in operations for years, right? So isolate it, and those kinds of things. But you do get into some unique things with Kubernetes from a security perspective, because you can deploy all the same tools. You can put a WAF at the edge of your cluster if you really want to do that; you can put a denial-of-service product at the edge of your cluster, as in it’s right there in your cluster, if you want to do that. We have scenarios these days where we see more customers moving workloads into the cluster. So something they might have used a load balancer for outside the cluster, they’re taking those functions and moving them into the cluster. So we can talk about security from an app perspective, like Jenn mentioned early on: OIDC and JWT authentication and tunneling traffic, and where service mesh comes in with mTLS and whatnot.

[00:31:23.290] – Brian
Or we can talk about migrating functions where you’re reducing your layers as you have through the system. So there’s lots of different ways to think about this.

[00:31:35.170] – Jenn
Lately we're seeing a lot of people have requirements for end-to-end encryption. That's becoming pretty prevalent, especially if they are subject to some kind of zero trust architecture requirements. And implementing end to end is different in Kubernetes. There are more things to get to, and when you're getting it between the services, that is when you start to consider needing a mesh for those specialized use cases.
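To make the "mesh for service-to-service encryption" point concrete, here is one way a mesh expresses it. This example uses Istio's PeerAuthentication resource as an illustration; other meshes have equivalent objects, and the namespace name is hypothetical.

```yaml
# Illustrative only: enforce mutual TLS between all workloads in a
# namespace, so pod-to-pod traffic is encrypted and mutually
# authenticated without any application code changes.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: prod        # applies to every workload in this namespace
spec:
  mtls:
    mode: STRICT         # reject any plaintext traffic between pods
```

This is the kind of specialized end-to-end requirement that is hard to retrofit into each app but is a one-object policy once a mesh is in place.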

[00:32:05.970] – Ethan
The complexity here is somewhat endless. You said zero trust and my brain started going: how would I bolt on zero trust to a Pod architecture distributed throughout the cluster? I don't know. I mean, yeah, there's a lot to that.

[00:32:19.230] – Brian
It's a thing, and something that companies are starting to wrangle with. Right. So there's the zero trust advisory out of the White House recently. There are organizations trying to think about zero trust in a Kubernetes world, what that means for the business, and then what it means for what they can accomplish when it comes to Kubernetes. Right.

[00:32:41.470] – Brian
Because we have the flexibility to bolt on sidecars, for example, and a few other things, because the platform gives us those capabilities. But now we've got trade-offs with performance, because we're adding hops into the system, and you get into some of these things. So how much can you squeak out of this to align with that business requirement over there?

[00:33:05.130] – Ethan
You had opened up this bit on security talking about WAF, Brian. So let's map WAF functionality onto this architecture. You described maybe WAF at the edge of the cluster. Does that mean it's the front door to the cluster, but not a part of the cluster? As long as I can get through the WAF, then my request will make it into the cluster and be serviced. Or is the WAF functionally in the cluster itself?

[00:33:28.690] – Brian
Yeah, well, you can do it in a couple of different places, right? So generally we think of applying it at the edge of the cluster. Traditionally we're used to WAF being some appliance that some network person takes care of. Right.

[00:33:41.000] – Brian
That’s outside the physical cluster.

[00:33:43.530] – Ethan
I have been so blessed to have done exactly that. Yes.

[00:33:49.030] – Brian
Now we take the capability of Ingress, right? And we can take that WAF capability and move it into the edge of the cluster itself, because that's basically where Ingress sits: at the edge of the physical cluster, if you want to think of a traditional network topology kind of model here. And the only thing after Ingress is internal Kubernetes networking, so whatever your CNI is behind that. So we can stick a WAF there, we can stick a denial-of-service product there. And you could still have one way out at the edge to do your major protection. But we can also bring API-specific settings and tunings right into the edge of the cluster and apply additional rules and additional protections there. So you might have behaviors that get in through your first level because of the gross-level filtering it's doing there, but now we want something finer-grained, better tuned to the application, right at the very edge of the cluster.

[00:34:50.420] – Ethan
Now, going back to the conversation, what is the role of a Kubernetes operator? As we begin to layer on these additional services, the amount we can expect one human being to do becomes untenable after a while: having them do the WAF and the cluster management and app deployment and so on. And so I'm suspecting that the larger the org, the more fine-grained not just our policies are, but our roles and responsibilities are, and they're going to be spread out across multiple people and maybe teams.

[00:35:18.630] – Jenn
That's where we're starting to see these larger organizations adopting a platform ops team model where they have multiple people. And yeah, they may have some more focus. There might be someone who is only focused on these WAF policies and security policies, and they're not doing other things.

[00:35:37.070] – Brian
And you just go back to the Kubernetes API: the security policy object for the WAF configuration is the responsibility of the security team. The platform operator makes sure the thing is deployed and the WAF is running and whatnot, and the security team takes care of the rest. The application team takes care of their piece. We're also seeing the true role-based access control model happening through the Kubernetes API. We're starting to see a lot of that in the larger enterprise customers as well.
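The split Brian describes maps directly onto Kubernetes RBAC: the platform operator can grant the security team write access to WAF policy objects and nothing else. A minimal sketch, with a hypothetical WAF custom resource group and team name; substitute your WAF product's actual CRD.

```yaml
# Illustrative only: the security team may edit WAF policy objects in
# one namespace, but cannot touch Deployments, Services, etc.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: waf-policy-editor
  namespace: prod
rules:
  - apiGroups: ["appprotect.example.com"]   # hypothetical WAF CRD group
    resources: ["wafpolicies"]              # hypothetical custom resource
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: security-team-waf
  namespace: prod
subjects:
  - kind: Group
    name: security-team      # as asserted by the cluster's authentication layer
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: waf-policy-editor
  apiGroup: rbac.authorization.k8s.io
```

Each team gets its own Role scoped to its objects, so responsibility for a single manifest can be divided along organizational lines.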

[00:36:11.410] – Ethan
And the more fine-grained the security policies need to be, the more that becomes a necessity, because it becomes a knowledge domain expertise challenge. One of the challenges I had dealing with WAFs is I didn't build the web app, and the WAF was protecting some custom web app we developed internally. I don't know how to sanitize the field input for this. I don't even know what the app does. Something to do with finance, that's about as much as they tell me. And there was this frustration of having an expectation that, hey, network human, you need to run this WAF. It's like, well, I can put traffic into the thing, but I don't know how to build the policies. That's a domain expertise I don't have. It's got to be someone else working with me to get that policy delivered. And it feels like we can get there very quickly with a Kubernetes cluster as well, as we layer services and services and services on top.

[00:37:00.920] – Brian
Yeah. I mean, security operations is finally starting to become a thing. It's definitely a specialty that's starting to show up. And within the Kubernetes world, it's a little unique just because of the way things interplay. I mean, it's a bit generic, but at the same time, it's a bit unique, because they still need to understand how it applies in that world. Right.

[00:37:21.900] – Ned
And they need to be involved earlier on in the conversations when the design is happening and the developers are working on their application, because that’s when the security team can pump the brakes on something or they can start designing the policies that will go into the WAF that’s going to help the application team.

[00:37:38.870] – Ned
We’ve talked to a few folks that are building platform teams or are on one. And one of the things they keep telling me is their main job is to empower the application developers to do things on their own by giving them solid templates. Is that something that you’re seeing out in the wild as well?

[00:37:56.730] – Brian
Yeah. We generally talk to a lot of customers about this as a mode of self-service; that's generally the terminology that's used. They're enabling their teams, whether that be through custom resource definitions or through GitOps, with them directly managing YAML, like I said, for certain aspects of the configuration. But for larger companies, enabling self-service, where operations truly becomes more of a utility, where the network becomes a utility and Kubernetes compute becomes a utility, is definitely a heavy trend that we see in larger organizations. Right.

[00:38:37.810] – Ned
So set up some guide rails.

[00:38:40.190] – Brian
Exactly. Guardrails, not gates. That whole thing.
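Two of the most common guardrails a platform team stamps out per namespace are resource quotas and default limits: app teams can deploy whatever they like inside those bounds without being able to starve the cluster. A minimal sketch, with hypothetical namespace names and sizing.

```yaml
# Illustrative only: per-team guardrails a platform team might template.
# Caps the namespace's total resource requests and pod count.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    pods: "100"
---
# Supplies sane defaults for containers that declare no limits,
# so a forgotten resources block doesn't become a noisy neighbor.
apiVersion: v1
kind: LimitRange
metadata:
  name: team-defaults
  namespace: team-a
spec:
  limits:
    - type: Container
      default:              # applied when a container sets no limits
        cpu: 500m
        memory: 512Mi
      defaultRequest:       # applied when a container sets no requests
        cpu: 100m
        memory: 128Mi
```

Because these are ordinary Kubernetes objects, they fit the GitOps model Brian mentions: the template lives in a repo, and new teams get a namespace cut from it.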

[00:38:48.390] – Ned
Beyond empowering the application teams, are there any other scenarios you can think of where the Kubernetes operator is making things better for the folks consuming the cluster?

[00:38:59.200] – Jenn
Yeah, Kubernetes version management is a whole thing, and it may not sound like it, but there are typically at least three distinct things you have to think about if you're going to be changing your Kubernetes version. So what just came out? 1.22? Am I a version behind?

[00:39:20.910] – Brian
1.23. You are behind.

[00:39:21.760] – Jenn
Good grief. So let's say I'm going to adopt 1.23. It's not as simple as just moving to that platform version. I have to look at two other things, at least minimally. One is going to be the APIs, because with each version, new APIs are added and old ones are removed. So, for example, if you upgrade the platform without looking at the APIs, you might break some things. The other area is, which of your tools are compatible with that platform version? If you again upgrade the platform, but let's say your Ingress controller is not compatible with it, you're hosed. So what the operator can do, or really anybody who's responsible for keeping an eye on version management, but the operator is really in a great place to be doing it, is work cross-functionally to figure out what are all the things that are going to be impacted, and who are all the people who are going to have to go in and update resources or policies or whatever the case may be.

[00:40:27.180] – Jenn
And that's not a one-week project. In all likelihood, it's probably, okay, our goal is to adopt it in a month or two, let's make a plan today. So they can be really helpful for that.

[00:40:39.760] – Brian
And that’s a really good point, Jenn, because Kubernetes releases on a regular schedule. They release, I think it’s four versions a year.

[00:40:48.050] – Jenn
Roughly quarterly, something like that.

[00:40:51.930] – Ned
They dropped it down to three now.

[00:40:52.870] – Brian
Yeah, they dropped it down to three, but it's continuously sliding. Right. And then they have their standard deprecation schedule and whatnot. So basically, you're constantly updating the platform, and the Kubernetes API objects are updating right along with you. Jenn brings this up because we recently ran into this with Ingress, which graduated from v1beta1 to v1. And it's like, oh my gosh, now we've got to be compatible with v1, customers have to be compatible with v1. There was this beautiful crossover period where objects were magically updated in the background through the Kubernetes API machinery, which a lot of people didn't realize. They got this free pass during it. But as soon as the removal happens, they don't get that anymore. And if they didn't realize they needed to update their YAML files, and I realize it might just be a simple version string with everything else staying the same, they upgrade Kubernetes and boom, all of a sudden it's broken. So it's just an interesting mix of things that you actually have to pay attention to, unlike the traditional enterprise world, where it's like, oh yes, that thing is supported on that version for five years.
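The Ingress graduation Brian describes is a good concrete case: the beta API was removed in Kubernetes 1.22, and the v1 object is not just a version-string change. A before/after sketch, with hypothetical names:

```yaml
# Pre-1.22 manifest (networking.k8s.io/v1beta1, removed in Kubernetes 1.22):
#
#   apiVersion: networking.k8s.io/v1beta1
#   kind: Ingress
#   spec:
#     rules:
#       - http:
#           paths:
#             - path: /
#               backend:
#                 serviceName: web
#                 servicePort: 80
#
# The v1 equivalent: new apiVersion, a required pathType, and a
# restructured backend.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
spec:
  ingressClassName: nginx   # replaces the kubernetes.io/ingress.class annotation
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix   # required field in v1
            backend:
              service:
                name: web
                port:
                  number: 80
```

During the crossover period the API server served old objects under both versions, which is the "free pass" in the conversation; once v1beta1 was removed, manifests still written against it simply failed to apply.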

[00:42:09.570] – Brian
We don’t live in that kind of world anymore very much. Yeah.

[00:42:13.980] – Jenn
Having someone who’s paying attention to even just which versions are supported, you don’t want to end up in a situation where your cloud provider forces you to upgrade and you’re not ready for it. And we’ve seen it happen.

[00:42:28.770] – Brian
We had it happen to some customers: "So-and-so cloud provider actually upgraded our clusters under us. What do we do?"

[00:42:40.870] – Ned
That sounds messy. Yeah. It sounds like you need to have someone who's mindful of the march of versions, getting in and monitoring what's compatible and what isn't. That almost sounds like a full-time job on its own if your clusters are big enough.

[00:42:58.550] – Jenn
Could be, yeah.

[00:43:00.070] – Ned
One thing I want to come back around to is this Microservices March, because we touched on it briefly, but I want to get a little more detail from you, Jenn, on what's included in this one-month program. Is it a free thing that anybody can sign up for? What expectations are there of the person taking it? Do they have to attend webinars? Give me the general background on it.

[00:43:23.820] – Jenn
Absolutely. So it's designed to be a choose-your-own-adventure. There are no expectations. It's completely free. If you were to do everything, it's roughly 16 hours over the course of March, which, if you're planning for it, is about 4 hours a week. That's manageable. So I'll go over what the units are and then what you get to do with each unit, what you can choose. The first week we're going to be looking at architecting Kubernetes clusters for high-traffic websites, and that's sort of our introduction. If you're already super familiar with how traffic flows around Kubernetes, maybe you don't need the first week. If you already know how to use an Ingress controller and you've seen it scale, you're probably okay skipping it. But if you've never done that before, you'll start the week off by attending a livestream. We're working with a Kubernetes training provider called Learnk8s, and they're going to be doing the training part. So you start with a high-level webinar, live, so you can ask questions. Then after that you're given a collection of extra reading and videos so that you can decide, you know, I knew all of X, but I didn't know anything about Y. I want to go learn more about the Gateway API, or I want to go learn more about Ingress controllers.

[00:44:44.810] – Jenn
You can go do that. I'm not going to tell you which part to go and learn about. And then the third part is going to be a hands-on lab where you get to take what you've learned. Again, this is going back to that whole idea that concepts are great, but without practice they're kind of useless. And so you get to take that and do a little hands-on exercise. The first week's lab is going to be deploying an Ingress controller, running some traffic through it, watching it scale up and down. Pretty simple, but important stuff. Each week will follow that same format of livestream, reading and videos, hands-on lab. The second week is on exposing APIs: how do you deal with API gateways? The third week is going to be looking at microservices security patterns: sidecars, service mesh, WAF, all kinds of fun stuff. And then the fourth week is on advanced deployment strategies. Basically, you've got the whole thing up and running. How do you keep it there? How do you roll out a new app version without taking everyone offline? It includes access to our Slack community, which we've just recently stood up.

[00:45:51.280] – Jenn
We've been wanting to do it for a while, and we thought this is a really great time to bring in a whole bunch of new people to talk nerdy stuff on our Slack, which is what we want. And so, yeah, basically, the full commitment, if you choose to do the whole thing, is only 16 hours. Opt into whatever you want. It very much fits whatever your needs are.

[00:46:12.380] – Ned
Right. The most important part for me is you get hands-on experience. It's not just watching someone talk at you through a screen for 16 hours. That's not my style of learning.

[00:46:22.610] – Jenn
I don’t think that’s anyone’s, but I could be wrong.

[00:46:26.200] – Ethan
And speaking to that, it sounds like if I get into the Microservices March program, the hands-on lab, is that stuff provided for me, or is it bring-your-own-lab?

[00:46:35.870] – Jenn
It's all free. We've done this in a way that really caters more to people who are not necessarily interested in standing up minikube on their machine or going full lab environment. So it'll be a browser-based lab. It's all self-contained. And so, in some ways, you can still break it, certainly, but, again with the guardrail terminology, we're making sure that you're not going to have a total fail with something that's not relevant to what you're supposed to be learning about. So it's kind of that dip-your-toe-in-the-water thing. We will offer a version of it where, if you want to go stand it up on minikube on your own, you can have at it. You can do it completely outside that environment.

[00:47:21.620] – Ned
Okay. And if folks want to sign up for it, is there an easy URL to remember?

[00:47:27.620] – Jenn
There is. It's nginx.com/mm, the mm being for Microservices March.

[00:47:34.630] – Ned
That’s pretty easy to remember. I appreciate that you kept it very short and brief.

[00:47:41.010] – Jenn
I like easy.

[00:47:42.180] – Ethan
OK, nginx.com/mm for Microservices March. I am thinking about going in for that one myself. We'll see. Jenn, if people want to follow you or reach out to you or Brian, is there somewhere you'd recommend they go?

[00:47:56.870] – Jenn
Yeah, I mean, we're both on the standard social media, but where we do most of our writing is on the NGINX blog, so that's nginx.com/blog. We kept it simple with that one, too. You can also occasionally find us over on The New Stack.

[00:48:12.850] – Ethan
Excellent. Well, thanks to both of you for joining us today on Day Two Cloud, and thanks for sponsoring the show. We appreciate that. Ned and I, we've got hungry families to feed, and you guys help us do that. Virtual high fives to you out there listening. We much appreciate you staying through to the end. And if you have suggestions for future shows, we would love to hear them. You can hit Ned or me up on Twitter at Day Two Cloud Show or fill out the form on Ned's fancy website. And if you like these engineering-oriented, kind of nerdy shows, you can go to packetpushers.net/subscribe, and you're going to get a list with a bunch of links to all our different podcasts, blogs, newsletters and so on. It's all nerdy content designed for your professional career development, and it's all free. Until then, just remember: Cloud is what happens while IT is making other plans.
