Day Two Cloud 175: Deploying Kubernetes And Managing Clusters

Episode 175

On today’s Day Two Cloud we continue our Kubernetes conversation with guest Michael Levan. Today’s show focuses on Kubernetes deployments and managing clusters once they’re up and running. We discuss whether Kubernetes is really more complex than traditional application infrastructure; examine management options such as GitOps, manifests, and kubectl; share useful tools; and talk about why Kubernetes is all about APIs.

Sponsor: Kolide

Kolide is an endpoint security solution that helps your end users solve security problems themselves. They get smarter about security and you get more compliant computing. Find out more at

Show Links:

Day Two Cloud 174: Building Kubernetes Clusters – Packet Pushers

Kubernetes Unpacked – Packet Pushers

@TheNJDevOpsGuy – Michael Levan on Twitter

Michael Levan on LinkedIn

Kubernetes For Network Engineers – YouTube

Service Mesh And Ingress For Kubernetes – YouTube – Michael’s blog posts



[00:00:01.290] – Ethan
Sponsor Kolide is an endpoint security solution that helps your end users solve their security problems themselves. They get smarter about security, and you get more compliant computing. Find out slash day two cloud. That’s Com multicloud. Welcome to Day Two Cloud, and we’re bringing it today with part two of our Deploying Kubernetes series with our guest Michael Levan. Michael is a leader in Kubernetes and containerization. You can find out all about him at michael And of course, as we mentioned along the way, he is the host of the Kubernetes Unpacked podcast. And in today’s show, Ned, we did the building of clusters last week, and this week we’re talking about managing clusters. What did you take away from this episode?

[00:00:59.660] – Ned
I think one of the biggest things I took away was the philosophical point behind managing Kubernetes and the fact that it’s all API driven. So if you’re coming from a CLI background where you’re used to interacting with things at the command line or through scripts, things shift a little bit with Kubernetes. But the more things change, the more they stay the same.

[00:01:19.930] – Ethan
The more things change, the more they stay the same, indeed. Enjoy our conversation with Michael Levan. You can find him on Twitter at @TheNJDevOpsGuy. Michael, you came back for part two, and I’ve got to address what, for me, is the elephant in the room. And that’s the axiom you kind of hear out there, that Kubernetes is complex. Is Kubernetes really that much more complex than traditional application deployment platforms, what we’ve done with VMs for years, where you’ve got middleboxes and you can do layer seven application rewriting, and there’s networking wizardry that goes on? There’s application performance monitoring and all that stuff.

[00:01:54.410] – Speaker 3
So I think the answer to that is Kubernetes is a data center in itself. So throughout history, from a tech perspective, we would have platforms or tools, whatever you want to call them, that did VMs and hypervisors and networking and security and storage and applications and yada yada. With Kubernetes, it’s all under one roof. It’s all one thing, and you’re managing all of it with an API. And because of that, it can be extremely complex. So, yeah, I’ll argue that if somebody thinks that Kubernetes isn’t complex, they didn’t dive in enough yet.

[00:02:39.560] – Ethan
Well, to put a finer point on it, is it more complex than the way we used to do things?

[00:02:47.290] – Speaker 3
No, it’s not more complex. Like networking isn’t a new thing. Storage isn’t a new thing. Deploying applications isn’t a new thing. The only difference is you’re managing your infrastructure with an API. You’re managing your platform with an API. So that’s the big difference. But no, networking is networking. Storage is storage. Infrastructure is infrastructure. In fact, I tell everybody all the time when people reach out to me, hey, how do I get into Kubernetes? What’s the first step? And I say, do you have a sysadmin background? Have you managed data centers and all this? And if they say no, I say that’s the best place to start.

[00:03:27.190] – Ethan
Oh, well, actually that’s kind of comforting. So if you’ve got a background in multitiered applications with load balancers and firewalls and all that stuff, you have a basis upon which to learn Kubernetes because you’ll kind of know what’s going on.

[00:03:40.830] – Speaker 3
You’re 70% of the way there.

[00:03:43.020] – Ned
Yeah, just a lot of mental translation from this is how I did it previously to this is how I intend to do it now. I guess another portion of that is you’re taking a bunch of different job functions that would typically be separate people. So you had your storage admins and your network admins, and maybe you had load balancer admins that were their own specialized group. And now you’re compressing all of that into a single person, potentially, who’s now responsible for understanding the complexities of all these different things that are under the Kubernetes umbrella. And I would argue that’s maybe one of the problems: the idea that you can compress the knowledge from all these different arenas down into a single human being.

[00:04:25.560] – Speaker 3
Yeah, I think that it’s a huge problem right now. My recommendation always to every organization that I talk to is you should have a high velocity team. You should have five to six people on the team managing your Kubernetes environment. One of them is a security expert, one of them is a networking expert, one of them is an infrastructure expert, one of them has a software development background. Unfortunately, organizations don’t do that. And I think that’s one of the biggest problems right now, why people think Kubernetes is just complex as a whole and they can’t implement it, et cetera. But I think that if more organizations had high velocity teams like that, implementing Kubernetes would be far smoother.

[00:05:01.240] – Ned
Michael, I have one Kubernetes. I just need one person. One thing I know, because I’ve done a little bit of the training for the Certified Kubernetes Administrator, or CKA, certification, is that the training is extremely CLI heavy. You are just being hit over the head with kubectl, whether you say kube-cuddle or kube-C-T-L or however you want to say that. And I’m curious, in the real world, are people actually spending all their time at the command line running kubectl, or are they using something else to manage Kubernetes?

[00:05:42.370] – Speaker 3
Yeah, they definitely shouldn’t be. If you’re spending all of your time using kubectl, you’re probably doing Kubernetes wrong. You should be doing things like implementing a GitOps workflow. You should be doing things like deploying via CI/CD, for example. If you’re doing some type of local development and you’re running kubectl apply on a Kubernetes manifest, that’s fine. But there shouldn’t be people dedicated in production to just running kubectl apply or kubectl create all day. Absolutely not.

[00:06:13.690] – Ned
Yeah, I remember it was asking me to do things like create a persistent volume and then attach that to a pod and doing all these other things using kubectl. In the back of my mind, I’m like, I don’t think you would actually do it this way.

[00:06:28.540] – Speaker 3
Yeah, not on the local terminal. No. You would automate that process in whatever automation tool or platform you’re using. But for a lot of that automation, you’re still going to need to understand and know the kubectl commands. So you still need to use them and run them. But the way that you’re using them and running them is going to be different from the certification. So if you go to the certification, like you said, it’s all lab based. You have 2 hours. You’re doing everything on a terminal manually. You’re not going to do it manually like that. You’re going to have maybe a CI/CD pipeline or a script that’s running the commands for you to get all that up and running. You’re not going to just be sitting on the terminal like that.

[00:07:11.790] – Ethan
But the tools that you’d be using instead for your automation are not actually making kubectl calls. They’re hitting the Kubernetes API directly. Right?

[00:07:21.300] – Speaker 3
Right. Yes. So, for example, if you think about tools like Flux or Argo CD, which are GitOps-based tools, they’re looking at changes in your source control. If you have a Kubernetes manifest in your repo and it changes, the tool makes an API call to Kubernetes. But technically it is all still doing the same thing. Because remember, when you’re using kubectl commands, what is it doing? It’s making an API call. When you run kubectl apply or kubectl create, you’re just making a POST API call. It’s pretty much what you’re doing. You’re making API calls all day in Kubernetes because Kubernetes itself is an API. All of the resources that you create, it’s all driven by an API. So regardless of whether you’re using kubectl or a CI/CD pipeline or GitOps or whatever the case may be, you’re always making API calls. So underneath the hood, it’s all doing the same thing. Yeah.
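
To make the "it’s all API calls" point concrete, here is a minimal Python sketch of how a manifest maps to the REST request that creates it. The helper name create_request and the small PLURALS table are illustrative assumptions, not part of any Kubernetes client library; the URL layout (core-group objects under /api, named groups under /apis) follows the published Kubernetes API conventions.

```python
# Plural resource names for a few common kinds (illustrative subset only).
PLURALS = {"Pod": "pods", "Deployment": "deployments", "Service": "services"}


def create_request(manifest, default_namespace="default"):
    """Return the (HTTP method, URL path) used to create a namespaced object.

    Hypothetical helper for illustration; real clients also handle
    cluster-scoped resources, authentication, and discovery.
    """
    api_version = manifest["apiVersion"]
    # Core-group objects ("v1") live under /api; named groups ("apps/v1") under /apis.
    prefix = "/api/" if "/" not in api_version else "/apis/"
    namespace = manifest.get("metadata", {}).get("namespace", default_namespace)
    resource = PLURALS[manifest["kind"]]
    return "POST", f"{prefix}{api_version}/namespaces/{namespace}/{resource}"


pod = {"apiVersion": "v1", "kind": "Pod", "metadata": {"name": "web"}}
deploy = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web", "namespace": "prod"},
}

print(create_request(pod))     # ('POST', '/api/v1/namespaces/default/pods')
print(create_request(deploy))  # ('POST', '/apis/apps/v1/namespaces/prod/deployments')
```

Whether the caller is a person at a terminal, a pipeline, or a GitOps controller, the request that reaches the API server has this same shape.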

[00:08:13.930] – Ethan
My first exposure to a model like that was sitting in a Juniper site, and the folks were talking about how the Junos CLI is in fact just an API client. That’s all it really is. When you are making a call or typing a command in at that Junos CLI, it’s doing an API call in the background. And they even showed it to us. There was a network capture. They showed the API call being made, and they said their journey to automation as Juniper was a lot easier because of that. Because not every network operating system was built that way. And so some other folks were struggling mightily to know how to move on from a CLI-oriented configuration stack to being friendly to automation tools and that kind of thing. Well, okay. So then, if that’s the case, is it possible, with all the Kubernetes-related tools and the automation that’s out there, that we don’t see Kubernetes anymore eventually, Mike? Maybe we end up using a platform that might be K8s at its heart, but it’s abstracted away from us in a way that we don’t know or care that it’s Kubernetes under there.

[00:09:18.710] – Speaker 3
At some point, what we know of Kubernetes today will go away. Whether it’s called Kubernetes or whether it’s called something else, the platform itself will eventually die and go away. The thing that won’t is what Kubernetes gave us. Because underneath the hood, right, taking out the technical marketing and the buzzwords and the “Kubernetes is making our lives easier,” what Kubernetes is doing is it’s allowing us to manage our entire data center with an API. Making API calls for storage, for applications, for infrastructure. It’s allowing us to manage everything with an API, or rather manage a platform with an API. That underlying piece of the puzzle is not going away, at least anytime soon. Until we come up with something where nobody uses APIs anymore because they’re using whatever, that’s when it’ll go away. But what Kubernetes gave us, the ability that it gave us, will not go away, which is managing everything with an API. Kubernetes itself will eventually move on and be something else, or whatever the case may be.

[00:10:29.220] – Ethan
Well, okay, so being very specific here, you think the Kubernetes APIs that have become kind of the de facto standard now for managing a data center, you think we might, as an industry, move away from those, but the model of making API calls to manage our data centers is going to stay, right?

[00:10:47.880] – Speaker 3
Yeah. I mean, eventually Kubernetes is not going to be a thing anymore.

[00:10:51.750] – Ethan
Whether it’s five years... Eventually, you mean years, I assume?

[00:10:56.360] – Speaker 3
Yes, absolutely. We’re going to see Kubernetes around for a very long time. And what’s going to end up happening, and this is just my prediction: Kubernetes isn’t just going to go away one day and there’s going to be a new thing. I don’t think that there’s going to be this whole orchestration wars like we saw with Docker Swarm and Mesos and Kubernetes. Maybe we’ll see Nomad in there, whatever. But the whole idea of Kubernetes is going to eventually move away as a platform. But the underlying piece that we get from it, managing everything with an API, that’s not going to go away. Now, in terms of when it’s going to go away, every company right now is focusing on Kubernetes or orchestration in general, and containerization. With that being said, by the time everybody ends up implementing it, and we’re not even there yet, not a lot of organizations are doing it, it’s going to take a couple of years to get everybody actually utilizing Kubernetes, and then that’s going to have to sit in a data center or wherever for years. So we’ve probably got a good ten to 15 years of this genre before it ends up moving into something else.

[00:12:10.760] – Speaker 3
I’m sure you guys know Eric, right? I know you do. We were having a conversation the other day and we were just talking about Kubernetes and stuff, and just like my focus points around it and all that, and he said something that I really like. Kubernetes right now is what vSphere was in 2005, 2006. So we’re right there. We’re just breaking into it. The door has just opened, right?

[00:12:41.010] – Ned
And if I think about where I was at with VMware in that time range, right, I was probably setting up some of my first clusters and doing some basic vMotions and really just finally glomming onto what VMs could do. And I was working at a very small company. I was working at like a 250-person company. So that’s probably how I would be exposed to Kubernetes today: several years down the line, after it’s already kind of made its way through some of the more bleeding-edge or larger companies and is now trickling down. And, yeah, the death of VMware has been wildly exaggerated. It’s still doing extremely well, and you would be hard pressed to walk into a data center that’s not running VMware somewhere. So the idea that Kubernetes is just at that point, that’s 16 years ago. Oh, my God, that’s 16 years ago.

[00:13:34.010] – Speaker 3
Deep breath.

[00:13:35.360] – Ned
The idea that 16 years ago, if we’re at that same inflection point for Kubernetes, then it is still going to be a force to be reckoned with in 20 years.

[00:13:45.090] – Speaker 3
Absolutely. 100%. And yeah, you know what? We’re kind of already seeing a transition, which is interesting. On the last episode, we were talking about the fact that you can have Fargate profiles in EKS, for example, now. So literally, with a Fargate profile in EKS, you’re not managing the control plane anymore and you’re not managing the worker nodes anymore. So we’re actually already seeing it: we’re just doing everything with an API. We’re seeing it, we’re there, we’re there. So it’s just a matter of how much more we get there at this point. Yeah.

[00:14:21.100] – Ned
Wow, that is an interesting way to think about it: the control plane is actually the important part, and everything else, how the worker nodes get instantiated and are managed, is going to just evolve and change over time. So with that in mind, let’s kind of shift back to managing a Kubernetes cluster, which was ostensibly the topic of the conversation. In terms of keeping my Kubernetes clusters up to date, I probably do want to eventually take advantage of new features as they’re rolled out. How up to date should I be keeping my Kubernetes clusters?

[00:14:57.860] – Speaker 3
Yeah, I mean, I don’t think you ever want to be more than one major version behind. So 1.25 is out now from the Kubernetes API. If you’re running 1.23, you should probably start to think about your move to 1.24. But you know, it’s also going to depend on what’s being added and removed. So, for example, since I believe it was 1.22 or 1.23, PSP, or Pod Security Policies, which is essentially what OPA is doing at this point, was deprecated. In 1.25, they’re completely removed. So if you’ve got an entire environment right now running Pod Security Policies, you’ve got to think about what your path forward is. If you try to just upgrade to 1.25, half of your environment may break and die in a fire. So there’s going to be that piece. There’s also the security piece as well. Listen, Kubernetes is, at the heart of it, a bunch of apps. Apps have vulnerabilities. You may have something in 1.23 that was fixed in 1.25. From a vulnerability perspective, I can’t think of anything off the top of my head, it’s just an example. But moving forward, of course we’re going to see that. Everything has vulnerabilities.

[00:16:15.700] – Speaker 3
You’re going to want to upgrade from a vulnerability perspective. Everything that we know about upgrading versions, just in infrastructure in general, that we’ve been doing for 20 years, it’s the same concept.
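
The "no more than one version behind" rule of thumb can be sketched as a small check. This is only an illustration: the function names and the max_behind threshold are assumptions made for this example, and note that in Kubernetes’ own numbering the step from 1.23 to 1.24 is a minor release (the leading 1 is the major version).

```python
def parse_version(version):
    """Split a string such as 'v1.25.3' or '1.25' into (major, minor) integers."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def releases_behind(running, latest):
    """Count how many minor releases the running version trails the latest."""
    running_major, running_minor = parse_version(running)
    latest_major, latest_minor = parse_version(latest)
    if running_major != latest_major:
        raise ValueError("major versions differ; compare manually")
    return max(0, latest_minor - running_minor)


def should_plan_upgrade(running, latest, max_behind=1):
    """Apply the rule of thumb: start planning once you trail by more than max_behind."""
    return releases_behind(running, latest) > max_behind


print(releases_behind("1.23", "1.25"))      # 2
print(should_plan_upgrade("1.23", "1.25"))  # True
print(should_plan_upgrade("1.24", "1.25"))  # False
```

A check like this says nothing about removed APIs such as Pod Security Policies; that still needs a review of the release notes for each version in between.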

[00:16:28.460] – Ned
When you upgrade Kubernetes, what does it actually mean to upgrade a Kubernetes cluster? Because we know the control plane is made up of multiple components. So what am I actually updating when I move from 1.24 to 1.25?

[00:16:43.210] – Speaker 3
Yeah, so you’re upgrading a few things. Number one, you’re going to be upgrading all of the control plane components: etcd, the API server, the scheduler, yada yada. You may also be upgrading your container runtime, which we haven’t even talked about, and it’s a humongous discussion in itself, but the container runtime is also being updated. And then the version of kubectl that you’re using, the command line, is being updated. So it’s overall the API that you’re updating. The API contains all of these different components, which are all being updated. And again, that upgrade path is going to be 1000% different based on where you’re running it. So with kubeadm, for example, there is a command, I think it’s kubeadm upgrade, and you specify what version you want to go to. In AKS, for example, and I’m not recommending doing it this way per se, I just did it to see what would happen, on the az CLI, when you use az aks, there’s an upgrade command, and you could just figure out what version you want to go to and go do an upgrade. So there are different methods to upgrade depending on where you’re running.

[00:17:52.910] – Speaker 3
Got you.

[00:17:53.710] – Ned
Okay. And in terms of the upgrade process, do you recommend doing an in-place upgrade where I’m actually upgrading the software on each node, or do you recommend more of a roll-out, roll-in approach, where I’m rolling in a new node and retiring an old one and just kind of doing a rolling upgrade instead?

[00:18:13.810] – Speaker 3
Yeah, I think rolling upgrades are usually the way to go. For example, if you’re running kubeadm, you can upgrade one node at a time to make sure that everything’s working properly. You can upgrade your control plane components, and then you can upgrade your worker nodes, and then obviously, you can roll back if you want to. And the same rule applies, I know we’re talking about clusters right now, but I just want to put the word out there, to upgrading pods. So it’s essentially like blue-green deployments, almost. You have something called rolling updates. There are multiple different ways, like canary deployments and stuff, but you can have one subset of your pods running version 1.1 of your application. If you want to upgrade to version 1.2, you can do a rolling update where some customers are still on 1.1, some are on 1.2, etc.
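
A rolling update can be pictured as replacing a few pods at a time while the rest keep serving the old version. The Python simulation below is a simplified sketch of that idea, not how the Deployment controller is implemented; rolling_update and batch_size are names made up for this example (a real Deployment tunes the pace with its maxSurge and maxUnavailable settings).

```python
def rolling_update(pods, new_version, batch_size=1):
    """Yield a snapshot of pod versions after each batch has been replaced.

    `pods` is a list of version strings, one entry per running pod.
    """
    pods = list(pods)  # work on a copy so the caller's list is untouched
    for start in range(0, len(pods), batch_size):
        for i in range(start, min(start + batch_size, len(pods))):
            pods[i] = new_version
        yield list(pods)


steps = list(rolling_update(["1.1", "1.1", "1.1"], "1.2"))
for snapshot in steps:
    print(snapshot)
# ['1.2', '1.1', '1.1']
# ['1.2', '1.2', '1.1']
# ['1.2', '1.2', '1.2']
```

The intermediate snapshots are the window in which some customers are still on 1.1 while others are already on 1.2.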

[00:19:05.510] – Ethan
I did not expect you to say that because I guess I’m thinking about the OpenStack model, which was you don’t upgrade OpenStack in place, you build a new cluster, an OpenStack cluster off to the side, and migrate your apps to it, because that’s just the only way you can do it and manage the risk effectively. But you’re saying with Kubernetes, I can actually do a staggered in-place upgrade?

[00:19:27.640] – Speaker 3
Yeah, absolutely. I would recommend that. I wouldn’t recommend creating a new cluster and just trying to migrate over. Yeah, from a platform perspective, we saw that with OpenStack. We saw that with, like, ESXi, for example. We always wanted to create new clusters and move everything over. Never wanted to do in-place. But yeah, with Kubernetes, it’s absolutely doable.

[00:19:47.160] – Ethan
Well, the implication there is there’s some kind of inter-version compatibility between at least the previous and the current, whereas you couldn’t count on that very often with a lot of other platforms. If you were going to a new version, it was kind of an all-or-nothing approach, right?

[00:20:01.710] – Speaker 3
Yeah, with Kubernetes, it’s very different because everything is more or less from an API perspective. So you have different major versions, you have different minor versions. There are always different upgrade paths. Like, if you Google around, like, what’s the upgrade path from 1.23 to 1.24? It’s going to show you step by step: you should go to this minor version, or maybe you don’t have to. It’s all going to depend on what minor version you’re on and what major version you’re on.
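
As a rough sketch of what a step-by-step upgrade path looks like, the Python function below lists the intermediate releases between two versions. It assumes the common rule of moving one minor release at a time, which is how kubeadm documents its upgrades; the function name upgrade_path is hypothetical, and managed services may apply their own rules.

```python
def upgrade_path(current, target):
    """List the minor releases to step through, one at a time, from current to target."""
    cur_major, cur_minor = (int(p) for p in current.split(".")[:2])
    tgt_major, tgt_minor = (int(p) for p in target.split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("crossing major versions is outside this sketch")
    if tgt_minor < cur_minor:
        raise ValueError("target is older than the current version")
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


print(upgrade_path("1.23", "1.25"))  # ['1.24', '1.25']
print(upgrade_path("1.25", "1.25"))  # []
```

Each stop on the path is also a natural checkpoint for reading that release’s deprecation and removal notes before moving on.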

[00:20:27.540] – Ethan
Michael, I want to move on to sort of the hands-on, day-to-day operational stuff now. We said earlier in this episode that most folks are not doing what tends to be taught to you in Certified Kubernetes Administrator training, which is a lot of kubectl commands, a lot of building YAML and kubectl apply and this kind of stuff. But could you at least explain that traditional CKA training-style approach as a baseline, and then maybe, I guess, we should transition to what happens in the real world.

[00:21:02.210] – Speaker 3
Yeah. So just to clarify, those commands that you’re using, like the kubectl commands, building Kubernetes manifests, all of that, and applying them and deploying them, that still does occur, but in production, you’re just not doing it on the terminal. There’s just not a person sitting there on the terminal typing a gajillion kubectl commands, but those commands are still being used. They’re just being used from an abstracted perspective or from an automated perspective. Yeah. So essentially with the CKA, and this is a certification that I actually recommend people go through. I’m not one for certifications. I have very few, and they expired very long ago. So I never say, like, hey, you should go get all these certifications, but the Kubernetes certifications are actually all hands-on, so there’s no multiple choice. You’re chucked into an environment, and it’s like, go figure it out and go deploy it and go fix it, so you’re actually getting that hands-on experience, which is very, very cool, and I certainly do wish more certifications went down that path. So from the CKA perspective, you’re essentially in a terminal or in multiple terminals, and you have certain scenarios that you have to fix or deploy.

[00:22:16.560] – Speaker 3
And that’s a combination of using kubectl commands and running YAML to be able to deploy your Kubernetes resources, to edit them and all that.

[00:22:28.090] – Ethan
We’re taking a short break from the podcast to tell you about sponsor Kolide. Kolide is an endpoint security solution, and they use a resource that most of us in IT would never really think about: the end users. Because end users are where problems start, right, not solutions. Well, Kolide challenges that thinking, because if you can leverage your end users to mitigate the security issues that they are carrying around in their backpacks, that is a huge win. Now, let’s say you’re doing your device management the traditional way with an MDM. Well, you know the joy of loading agents onto employee devices. Agents impact performance, and they can be a privacy horror show, privacy being a thing all your users know about now. So Kolide does things differently. Instead of forcing changes on your users, Kolide notifies folks via Slack when their devices are insecure, and then provides step-by-step instructions on how to solve the problem. And using this Kolide approach, the interaction feels more friendly, more educational, more inclusive, and less intrusive, because now IT isn’t doing something to your device. Instead, you’re working with IT to help keep the company secure. It’s the whole attitude of, we’re all in this together, and as IT, you still get the views you need into the managed device fleet.

[00:23:42.810] – Ethan
Kolide provides a single dashboard that lets you monitor the security of everything, whether the endpoints are running on Mac, Windows, or Linux, so you can easily demonstrate compliance to your auditors, customers and the C-suite. Give Kolide a shot to meet your compliance goals by putting users first. Visit SaaS day Two cloud Find out how. And if you visit slash Dayto cloud, they’re going to send you a goodie bag, including a T-shirt, just for activating a free trial. That is Daytoloud. And now back to today’s episode. That approach you described feels very manual. Even if I’m not doing them by hand, even if I’m using automated tools to do that, it still feels manual, I think, is the best way to put it. It’s a process that I’m engaging to bring up a thing. Is that typically what’s going on? You’ve blogged about some other tooling, more of a GitOps approach. Is that the right way to do it? The GitOps way.

[00:24:48.270] – Speaker 3
Yeah, I would say so. For deploying Kubernetes resources, going with a GitOps approach, whether you’re using Argo, whether you’re using Flux, whether you’re using one of the other GitOps controllers out there, absolutely is the best way to deploy in today’s world. Before GitOps, what we would have to do is, let’s say we had a CI/CD pipeline. Well, we would have to run a whole bunch of kubectl apply or create commands in our CI/CD pipeline, which was automated, but it was like automating a manual effort. It feels like duct tape. But with GitOps, it’s an actual controller. So each tool that you use is a controller, so it feels native from a Kubernetes perspective. Just like there’s a deployment controller and there’s an ingress controller, there’s a GitOps controller, and you manage it and utilize it the same way. So it’s declarative. It feels very native in a Kubernetes methodology. And you’re not running a bunch of commands, because it’s a controller, and that controller is installed on your Kubernetes cluster. It’s, how can I put it, a way more repeatable process versus having to put a whole bunch of kubectl commands in a CI/CD pipeline or deploying it via your terminal.

[00:26:09.000] – Ethan
Walk us through what that feels like, then. Because for those of us that are very CLI-centric, or have been for much of our careers, we’re used to making things happen manually. What you just described, there’s a controller and there’s a GitOps thingy and it’s all magical and there’s a pipeline. It all feels, like, so hands-off and a little bit convoluted. It feels like it’s out of our control, what’s going on. And we engineers tend to really want control of things.

[00:26:34.210] – Speaker 3
Sure. So the good thing is you’re not losing any more control than if you’re using any other controller. When you’re deploying a Kubernetes pod, it’s the same thing as deploying via GitOps, because it’s all using a controller on the back end. So here’s what it kind of looks like. First you deploy the controller. So let’s say you’re using Argo. You deploy Argo, which just runs as pods inside of your Kubernetes cluster. And then at that point you can utilize Argo, and you can go to the UI and use Argo commands and all of that. And then you set up application deployments via Argo. So for example, let’s say you have a Kubernetes manifest that’s pointing to a repo called Test. You would tell Argo, hey, I have an application that’s sitting in the Test repository, check it, make sure that it’s deployed, and then it’s pretty much hands-off at that point. What ends up happening is, let’s say you go into the Test repository and you have your Kubernetes manifest there, and you change the container image version from 1.1 to 1.2. Argo is doing interval check-ins, and you can set this up.

[00:27:46.120] – Speaker 3
You want it every 30 seconds, every 60 seconds, whatever. It’s going to look at the repository: hey, are we up to date? Hey, are we up to date? AKA, is my current state the same thing as my desired state, which is what a controller does. So it all comes back to what Kubernetes controllers do as a whole. Kubernetes controllers confirm that your current state is your desired state. If you did a Kubernetes deployment and you had two pods in that deployment, it’s the controller’s job to say, do I still have two pods running? Do I still have two pods running? If you have one pod running, the controller says, oh, deploy a second one. Same thing with GitOps. GitOps is doing the same thing. That’s why people are moving towards GitOps, because it’s a declarative way, it’s a native way to manage Kubernetes deployments. So that GitOps controller is going to look at that Test repository and it’s going to say, am I deployed properly? Am I deployed properly? If you change that Kubernetes manifest, it’s going to say, oh, the container image changed from 1.1 to 1.2. I’m going to run it, I’m going to update it.

[00:28:47.580] – Speaker 3
Oh, replicas changed from three to four, gotta deploy another one.
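
The "is my current state my desired state?" check described here can be sketched in a few lines of Python. This is a toy model for illustration only: the reconcile function and the two-field state dictionaries are assumptions made for this example and are not how Argo CD, Flux, or the built-in controllers are written.

```python
def reconcile(desired, live):
    """Compare desired and live state and return the actions needed to converge."""
    actions = []
    if live["image"] != desired["image"]:
        actions.append(f"update image {live['image']} -> {desired['image']}")
    diff = desired["replicas"] - live["replicas"]
    if diff > 0:
        actions.append(f"start {diff} pod(s)")
    elif diff < 0:
        actions.append(f"stop {-diff} pod(s)")
    return actions


# Desired state as read from the manifest in Git; live state as seen in the cluster.
desired = {"image": "app:1.2", "replicas": 4}
live = {"image": "app:1.1", "replicas": 3}

print(reconcile(desired, live))
# ['update image app:1.1 -> app:1.2', 'start 1 pod(s)']
print(reconcile(desired, desired))
# []
```

A controller simply runs this comparison on an interval; an empty action list means the cluster already matches what is in the repository.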

[00:28:52.910] – Ned
This reminds me a lot of the debate between the way that Puppet tended to approach things versus the way that Ansible tended to approach things. And let me just kind of unpack that for a second, right? The Ansible way was very much I’m pushing a configuration out to a destination machine. So I have this Ansible playbook, and when I run Ansible, it goes and connects via SSH to a machine or to a switch or whatever, and pushes the configuration onto that switch. It has some sort of evaluation to check the current status of the switch so it can be idempotent, but it’s a push mechanism. Whereas Puppet had the capability to install an agent essentially on all the machines, and then that agent would run and do what Argo is doing. It would check in periodically to the Puppet server and go, hey, what’s the current configuration? Has it changed? No.

[00:29:44.290] – Ned
Well, I’m just going to check locally and make sure I’m still compliant. But if the configuration had changed, it would pull down a new copy of that configuration and do an evaluation loop and then apply the updated configuration to the machine. So we’ve been down this path before, right? This is not new ground we’re treading. We just have to do that shift in our mind from this is what we called it when we were doing it with virtual machines and Puppet and Ansible; now we’re calling it GitOps and Argo. Is there more to it than that, or am I oversimplifying it?

[00:30:19.160] – Speaker 3
No, you’re spot on. If you look at any of my content, my blogs, my videos, anything, when I’m describing GitOps, I always say the same thing: GitOps is configuration management for Kubernetes. It’s not so different. It’s the same thing. Like, for example, I used to use PowerShell DSC a lot, which was like your configuration management for your PowerShell environments. And you would set up intervals where it would constantly check in. Same thing. We’re just calling it something different. Same technology we’ve been using with Puppet and Ansible.
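
The pull model that Puppet agents, PowerShell DSC, and GitOps controllers share can be sketched as an agent that checks in, compares what it fetched with what it last applied, and only acts on a change. The PullAgent class below is a hypothetical Python illustration; the fetch_config callable stands in for a Puppet server or a Git repository and is an assumption of this example.

```python
import hashlib


class PullAgent:
    """Toy pull-based agent: apply the configuration only when it has changed."""

    def __init__(self, fetch_config):
        self.fetch_config = fetch_config  # callable returning the config as text
        self.applied_digest = None        # fingerprint of the last applied config
        self.apply_count = 0

    def check_in(self):
        """Fetch the config; return True if a new version was applied."""
        config = self.fetch_config()
        digest = hashlib.sha256(config.encode()).hexdigest()
        if digest == self.applied_digest:
            return False  # already compliant, nothing to do
        self.applied_digest = digest
        self.apply_count += 1  # a real agent would apply the config here
        return True


source = {"config": "replicas: 3"}
agent = PullAgent(lambda: source["config"])

first = agent.check_in()   # True: first time this config is seen
second = agent.check_in()  # False: nothing changed since the last check-in
source["config"] = "replicas: 4"
third = agent.check_in()   # True: the source changed, so it is applied again
print(first, second, third)  # True False True
```

In a push model, by contrast, nothing happens until someone or something runs the tool against the target.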

[00:30:47.740] – Ned
That makes me feel better, right?

[00:30:51.490] – Speaker 3
Yeah. I think the reality with everything new that we see, it’s always based on something else. Very rarely does anything come out that’s like new. It’s always based on something else.

[00:31:10.010] – Ned
I mean, to get to a philosophical point that you brought up earlier, the fact that Kubernetes is the final shift towards API-first interaction, I think that is a true paradigm shift. Sure, there is a mapping of what we had to what is now. But I think the big change, and you pinpointed it exactly, is we’re moving to an API-first realm where everything is configurable via the API. And Kubernetes does that. The cloud operators do it. And I think that’s kind of the operational model moving forward for IT.

[00:31:48.640] – Speaker 3
Absolutely. Yeah. So I guess to rephrase what I said before is it’s not that we don’t have new philosophies and new ways of doing things. It’s that the underlying pieces don’t change. Like, for example, yes, we’re moving towards a more API driven approach, but like, APIs are APIs. We’ve had APIs for years. It’s all the same thing. Yeah.

[00:32:12.110] – Ethan
If I go this GitOps route, am I making my Kubernetes environment more complex or less complex? Or did I just ask the wrong question?

[00:32:23.890] – Speaker 3
Sorry, guys, my camera is about to die. So if it does, that’s why my camera goes off. I was trying to change it, but I’m sorry, could you ask me that question one more time? Apologies.

[00:32:34.540] – Ethan
If I go the GitOps route, which sounds cool, like, that’s probably what I should be doing, am I making my Kubernetes environment more complex or less complex, or is that not even the right question to ask?

[00:32:45.160] – Speaker 3
Yeah, I mean, you’re making your environment more efficient at that point because without GitOps, how do you deploy? You’ve got to run kubectl commands locally or in a CI/CD pipeline or something like that. With GitOps, you’re just making your deployments less complex and you’re making things more efficient.

[00:33:05.290] – Ned
Yeah, nice.

[00:33:06.460] – Speaker 3
On the flip side, you gotta learn GitOps and you have to learn a new tool.

[00:33:16.390] – Ned
To that point, aside from the GitOps way, what other tools am I going to be adopting as I move into managing my Kubernetes cluster? Can you give us just a broad overview of either the types of tools or specific tools you see being used out in the wild?

[00:33:35.290] – Speaker 3
Yeah, so if you go to I forget the exact link, I will give it to you guys so you can put it in the show notes. There’s a CNCF landscape page and it shows all of the tools. You know what page I’m talking about? I do, yeah. So what tools are you going to learn? Well, I guess it depends on which one of the thousands.

[00:34:05.210] – Ethan
Pick some highlights though, because we’ve got Argo and Flux, which you’ve mentioned tied to GitOps. Maybe talk about those two and some other ones that you see commonly deployed out there.

[00:34:14.190] – Speaker 3
Yeah. So I think that there are these overarching categories. So if you look at service meshes, there’s a bunch of tools. If you look at monitoring, there’s a bunch of tools. If you look at observability, there’s a bunch of tools. Choose your own adventure. But what I will say is there are certain tools that you’ll most likely see in each category. So for example, from a service mesh perspective, you’re probably going to see Istio or Linkerd. From a secrets management perspective, you’re probably going to see Vault, HashiCorp Vault, which isn’t Kubernetes-centric, of course, but a lot of people are using it for secrets management. From a monitoring and observability perspective, a lot of the time you’ll see Prometheus and Grafana. So there are a ton of tools out there. But what I would say is this: don’t focus too much on the tools that you’re using. Focus more on what you’re trying to implement. So for example, if you know, hey, I need to implement monitoring and observability, well, we could talk about tools all day. We could talk about Container Insights on AWS, we could talk about Azure Monitor, we could talk about Prometheus, we could talk about Grafana.

[00:35:20.940] – Speaker 3
But the question isn’t necessarily what tools you’re using, the question is what are you trying to accomplish. So with Istio, for example, great service mesh, but the question is why do you need a service mesh? You know?

[00:35:34.290] – Ethan
Yeah, okay. So I can get mired down the rat hole of tools because they’re cool and I see people blogging about them, but I need to start at the proper foundation, and that is: why am I doing this? And as soon as I can answer why I’m doing this, then what the tools are that fill in that gap comes into focus. It’s got to be fair to point out that on the CNCF page that we’re talking about, with the endless number of logos and project names, not all of those are equal in popularity. A lot of them are very similar to what another one on that same list does. And there may not be one right answer, but as you said, Michael, there tend to be a few projects that percolate up to the top. And so getting your head around those and some of the ones you mentioned there would be the right place to start. You’re going to find more community, you’re going to find more documentation around them or people’s blogs that show you how to use these tools. So you’ve got that getting-started point. Now I do want to drill into one thing here though, and that is monitoring.

[00:36:37.910] – Ethan
If I am monitoring a Kubernetes cluster, man, there are so many layers in that stack, it just seems a little bit overwhelming and complex. I don’t want a generic tool that assumes I know what I want to be monitoring, like the network monitoring systems of old: hope you know what SNMP stuff you want to monitor, because we’re not going to tell you. Okay, is there some kind of a go-to monitoring solution for Kubernetes that helps you get started?

[00:37:02.060] – Speaker 3
Absolutely. So the really cool thing about Kubernetes is it’s not even about which tool to go with. It’s about the metrics endpoint. So you have a metrics endpoint on a Kubernetes cluster, which essentially exposes every metric. If you set it up in that fashion, you can decide if you want to expose certain metrics for certain APIs and certain resources. But once you expose that metrics endpoint and you expose the resources, like pods, deployments, whatever you want to get captured in that metrics endpoint, it’s just there. Like at that point, it’s just there for the monitoring or observability tool you’re using.

[00:37:39.720] – Ethan
So for example, you’re saying monitoring endpoint. I don’t know that it’s obvious what that is. Does that mean I’m telling Kubernetes, here are some things I want you to expose, and then there’s an endpoint created there that I can now pull?

[00:37:54.860] – Speaker 3
Yes. So on the Kubernetes control plane, you’re going to have that metrics endpoint. So it’s literally just that, /metrics, for almost all of the Kubernetes resources. So for pods, for deployments, etc. There’s always a metrics endpoint. And then that metrics endpoint gets consumed by whatever monitoring and observability tool you’re using.
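To make the `/metrics` idea concrete: Kubernetes components serve metrics in the Prometheus text exposition format, one sample per line, and a monitoring tool scrapes and parses that text. Below is a minimal Python sketch of the parsing step. The sample payload is made up for illustration, and `parse_metrics` is a hypothetical helper, not part of any library; it assumes samples carry no trailing timestamp. On a real cluster, the raw text could be fetched with something like `kubectl get --raw /metrics`.

```python
# Illustrative payload in Prometheus text exposition format (values invented).
SAMPLE = """\
# HELP apiserver_request_total Counter of apiserver requests.
# TYPE apiserver_request_total counter
apiserver_request_total{code="200",verb="GET"} 1027
apiserver_request_total{code="404",verb="GET"} 3
process_cpu_seconds_total 12.5
"""


def parse_metrics(text: str) -> dict:
    """Map each series (name plus labels) to its numeric value.

    Simplified: skips HELP/TYPE comment lines and assumes no timestamps.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


samples = parse_metrics(SAMPLE)
for series, value in samples.items():
    print(series, value)
```

A real tool such as Prometheus does this scraping and parsing for you; the sketch only shows why the tool needs nothing more than the endpoint.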

[00:38:16.910] – Ethan
So just like with network stuff, there’s SNMP. And if the SNMP OID is populated by that particular device, I can pull it and I can get back data in whatever format the MIB tells me it is. There’s a metrics endpoint that I can pull. And again, I would kind of need to know what it is. I’m assuming there’s something that describes for me what the metric is that I’m pulling and what it means to me. Okay, that’s fine. There’s all those metrics. So you’re telling me the tools that I’m choosing are going to know about all these endpoints and start me off with a good set of metrics endpoints that I should be monitoring?

[00:38:55.090] – Speaker 3
Yes. So the good thing is that each tool only needs to know about the metrics endpoint in the Kubernetes API. The metrics endpoint consumes the data from the pods, from your ingress, from your deployments, or whatever else you’re exposing to the metrics endpoint. Once your Kubernetes resources and objects are exposed to the metrics endpoint, that metrics endpoint is the only thing that gets consumed by your monitoring and observability tool. So at that point, all the data is there. You just have to expose it to the metrics endpoint. So luckily, your monitoring and observability tool only has to look at that metrics endpoint. That’s it.

[00:39:35.660] – Ethan
So I have to pick and choose within my Kubernetes configuration what elements within the API I want to expose to the metrics endpoint. So it’s not like Kubernetes is just loaded with thousands of metrics endpoints that can be consumed. I have to basically turn the switch on.

[00:39:52.780] – Speaker 3
For some Kubernetes resources, yes. For the basic ones, like your pods, your deployments, your services, the metrics endpoint is on by default. But if you have, like, a certain third-party operator or controller that you’re using for maybe ingress or whatever the case may be, you may have to turn on the metrics endpoint. Well, not turn on the metrics endpoint, but you would have to turn on metrics to be consumed by the metrics endpoint.

[00:40:22.570] – Ethan
Now, does this tie to OpenTelemetry in some way?

[00:40:27.790] – Speaker 3
Yes. So OpenTelemetry is observability that isn’t tied to a specific platform. So, for example, Prometheus is an observability tool, but of course you have to use Prometheus for that. With OpenTelemetry, you can pull metrics from multiple places into one location. But here’s the weird thing about OpenTelemetry, and maybe I’m thinking about it the wrong way, but OpenTelemetry is, whatever you want to call it, an endpoint, a platform, a tool in itself. So then aren’t you tied to OpenTelemetry if you use OpenTelemetry? So it’s weird. It’s like an inception thing for me that maybe I don’t fully understand the point, but when I see OpenTelemetry, I’m kind of just like, you could use any observability tool to do the same thing.

[00:41:32.740] – Ethan
Not yet.

[00:41:33.700] – Speaker 3
Yeah, I’m not sold on the idea. I like the fact that the whole idea behind OpenTelemetry is like your open source version of getting observability data, whether it’s traces, whether it’s your logs, whether it’s your, what am I forgetting? Traces, logs, metrics. There you go. But I kind of feel like it’s doing what other tools and platforms are already doing. I think the whole idea behind it is to not be locked in, but then you’re kind of locked into OpenTelemetry.

[00:42:14.510] – Speaker 3
I don’t know. It’s admittedly a path that I haven’t gone down incredibly in depth, so I’ll say that I could be wrong here. Whoever’s listening, if you want to tell me I’m wrong, please feel free to do so. I love knowing when I’m wrong. So maybe there’s something that I’m missing from the OpenTelemetry piece.

[00:42:33.340] – Ethan
Now, as I’m setting up my monitoring platform, whatever it is, I’ve got my metrics endpoints that are there. Are there very useful things I very much want to be monitoring because they tell me important things like, I don’t know, resource exhaustion or something within my cluster?

[00:42:51.340] – Speaker 3
Yeah. So at that point, you could set up, for example, like an audit policy to where you can describe in your policy what exactly you want to consume. You can consume everything, which is obviously going to be a lot, or you can consume certain resources. Well, yes, certain Kubernetes resources and objects.

[00:43:15.040] – Ethan
If you were setting this up, what would be the resources and objects you’d be monitoring for darn sure?

[00:43:21.270] – Speaker 3
Yeah. So I would say from my perspective, I turn it all on in the beginning, and then I see what I actually need and what I don’t. Because here’s the thing. When you’re retrieving metrics for a pod, right, what you’re actually doing is you’re retrieving metrics for an application. So in that application, when you’re retrieving those logs, those metrics, those traces, there may be some things that you care about and some things that you don’t care about, but you don’t know until you know. Once you know, you can say, okay, I’m going to turn off the metric for this, or whatever, because I actually don’t need it. It also depends on what path you’re taking from an observability perspective. So monitoring is about looking at data. Observability is about doing something with the data. So with monitoring, you have your graphs, you can see what’s going on, your CPU utilization, memory, network bandwidth, et cetera. Observability is about, oh, I have this trace that keeps failing, and I need to perform some type of action on it. So it also depends on what your end goal of monitoring and observability is.

[00:44:39.860] – Ethan
My end goal is for everything to be amazing and stay tuned. That’s what I want.

[00:44:46.010] – Speaker 3
Exactly. And that’s exactly why I tell everybody, turn it all on and see what you need and see what you don’t need, because you can always flip the switches. Let’s say you have an audit policy and you’re consuming everything. At some point you may say, you know what? For this, I only need to consume logging in and authentication for whatever Kubernetes resource. So it kind of all depends on what your end goal there is. Because here’s the thing. Just like with any other tech stack or any other tool or any other platform, you could turn it all on and kind of see what happens. And then maybe you might not need a piece of it and then you turn it off, but you really don’t know until you know.

[00:45:27.850] – Ethan
Well, there are some things that sound like much more interesting data points than they turn out to be in practice. And then there are also the more obscure ones that you don’t think you need until you have a problem and you’re troubleshooting. Then you find out, oh, if I’d been monitoring that, then I would have seen this coming. And you don’t know until you’ve had some experience, and then you know.

[00:45:47.410] – Speaker 3
Exactly. And then at that point you can set up certain alarms or certain repeatable processes. Let’s say something in your application fails, your observability tool picks it up. You can then create some type of alert based on that metric. You can even maybe create some type of repeatable process for that metric.
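The "alert based on that metric" idea can be sketched as a simple threshold rule evaluated against scraped samples. This is a hypothetical Python illustration: `AlertRule`, `evaluate`, the metric names, and the action labels are all invented for the example and do not correspond to any particular tool's rule syntax.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    metric: str       # series name to watch (hypothetical names below)
    threshold: float  # fire when the sample exceeds this value
    action: str       # label for the repeatable process to trigger


def evaluate(rules, samples):
    """Return the actions for every rule whose metric exceeds its threshold."""
    fired = []
    for rule in rules:
        value = samples.get(rule.metric)
        # A metric that was never exposed cannot fire a rule.
        if value is not None and value > rule.threshold:
            fired.append(rule.action)
    return fired


rules = [
    AlertRule("app_failed_traces_total", 5, "restart-worker"),
    AlertRule("node_memory_used_ratio", 0.9, "page-on-call"),
]
samples = {"app_failed_traces_total": 12.0, "node_memory_used_ratio": 0.42}

fired = evaluate(rules, samples)
print(fired)
```

Note the connection to the "turn it all on" advice: a rule can only fire for a metric that is actually being collected.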

[00:46:07.240] – Ethan
Well, Michael, this is the end of part two, and for those of you listening, we recorded part one and part two back to back. So we’ve been at this kind of marathon recording session here for a couple of hours, getting all this information in. And Michael, as a reminder to the audience, you are the host of the Kubernetes Unpacked podcast on the Packet Pushers Podcast Network. What kind of conversations have you been having on that show?

[00:46:32.510] – Speaker 3
Yeah, so funnily enough, when I first started the podcast, I was like, I feel like it’s going to be in a specific area of Kubernetes. Maybe it’s going to be more about Kubernetes in the cloud or whatever, but because of the vast number of guests that I’ve had on and the different walks of life, because Kubernetes is just such a beast in itself, I’ve just been having conversations about everything and anything Kubernetes from a production perspective. So anything from how to deploy with Terraform, to what service mesh to use, to how to think about security. It’s literally been everything and anything. So if you do listen to the podcast, everything from a how-to-run-Kubernetes-in-production perspective, that’s what it’s all about.

[00:47:20.440] – Ethan
Okay, very practical then. Hands-on and engineering-friendly is what I’m hearing there.

[00:47:25.030] – Speaker 3
Yeah, yes, very engineering focused. We’re not throwing any fluff into that podcast. It’s all engineering heavy, ready to go implement these ideas into production. Alright.

[00:47:37.360] – Ethan
And Michael, where can people find you on the Internet?

[00:47:40.180] – Speaker 3
Yep. So on LinkedIn pretty heavily. You can just look up Michael Levan, L-E-V-A-N. On Twitter, @TheNJDevOpsGuy. You can also check out my GitHub at AdminTurnedDevOps, and I post a lot of blogs on dev.to. So dev.to, TheNJDevOpsGuy.

[00:47:57.160] – Ethan
Very good. Well, thank you for joining us and for making the time. And if you are listening out there and you missed last week’s episode, part one, don’t miss that. That was where we focused on building clusters rather than managing them, managing them being the focus of today’s show. Virtual high fives to you for tuning in, by the way. Again, you are an awesome human with impeccable taste in podcasts. And as Michael was just describing, his Kubernetes Unpacked podcast is great. You can find it wherever you listen to podcasts and then subscribe. And if you have suggestions for future Day Two Cloud episodes, Ned and I want to hear them. You can hit us up on Twitter at @DayTwoCloudShow or go to daytwocloud.io and fill out the request form. And if you are looking for more folks in our world to interact with, you can do that at the Packet Pushers free Slack group. It is open to everybody, vendors included. Just go to packetpushers.net/slack and join. It’s a marketing-free zone for engineers to chat, compare notes, tell war stories and solve problems together. Again, that’s packetpushers.net/slack.

[00:48:55.840] – Ethan
And until then, just remember, cloud is what happens while IT is making other plans.
