On today’s Day Two Cloud podcast we talk about Infrastructure as Code (IaC) and the patterns and practices you might want to put in place, because IaC is, at its heart, code. So you might want to apply some software development practices to it, particularly for the parts of your team who know what they’re doing with infrastructure but may not be familiar with things like repositories, reusability, unit tests, and so on.
Our guest is Rosemary Wang, a Developer Advocate at HashiCorp and author of Infrastructure as Code, Patterns and Practices.
Show Links:
@joatmon08 – Rosemary Wang on Twitter
Infrastructure as Code, Patterns and Practices – Manning
Design Patterns: Elements of Reusable Object-Oriented Software – Addison Wesley
Book Review : Design Patterns: Elements of Reusable Object-Oriented Software – Gary Woodfine
Transcript:
[00:00:01.130] – Ethan
Why should you care about CDN77 to retain those 17 out of 20 people who click away due to buffering? CDN77 is a global content delivery network optimized for video and backed by skilled 24/7 support. Visit cdn77.com/packetpushers to get your free unlimited trial. [00:00:31.930] – Ned
Welcome to Day Two Cloud. And today we’re going to be talking about infrastructure as code, the patterns and practices you might want to put in place, because I don’t know if you know this, Ethan, but infrastructure as code is code, and you might want to apply some software development practices and patterns to it. [00:00:51.410] – Ethan
And from an operational perspective, for me as an infrastructure human, when you haven’t done a lot of coding and you begin to think about what is the right way to do this, this is where I found a conversation like this incredibly helpful to begin thinking about the patterns, the way you should store data and the way you should do testing and so on. Those patterns make themselves apparent once you’ve been doing it for a while. But if you’re starting out from zero, like many of us are, because we’ve been artisanally hand crafting our configurations for many years now, and now we want to automate it, how do we do it right? A show like this helps you do it right. [00:01:28.540] – Ned
Yeah. And we picked the perfect human to do this. Our guest today is Rosemary Wang. She’s a developer advocate at HashiCorp. And she wrote a whole book about this exact topic. So enjoy this episode with her. Well, Rosemary, welcome to the show. And hey, you wrote a whole dang book, so congratulations on that. How was that experience for you and would you ever do it again? [00:01:52.950] – Rosemary
Yeah, thank you for the congratulations. The experience of writing a book was great, but also terrible. I’ve never experienced such highs and such lows, partly because of writing each chapter, finishing it, and then getting it reviewed and realizing that I completely missed a million things. And I think it was worth putting it all down on paper. I had a lot of these ideas for a long time, so for me it was worth it. But I don’t know if I would ever do it again just because I don’t think I have another topic that I would write this much on. The book is like 300, 400 pages, including examples. And I’m like, do I have 300 or 400 more pages of a topic that I actually talk about? I don’t think so. [00:02:39.410] – Ned
Not now. Who knows? Who knows what the future holds, right? [00:02:42.740] – Rosemary
Yeah. Yeah. [00:02:43.560] – Ned
And reading through the book, it was very clear to me that you have been thinking about this stuff for a while. This wasn’t just thoughts you dashed off on a piece of paper. This was stuff that you’ve lived in for a while. And just to clarify about the book, for those who haven’t seen it or haven’t leafed through it: since you work for HashiCorp, as some people might know, people could reasonably assume that this book and this podcast we’re doing right now is just a thin veneer to talk about Terraform. Is that true? [00:03:16.350] – Rosemary
Terraform is not the only thing out there. I love it, everybody’s like, oh, you know, it’s all about Terraform. I hate to tell you, everybody, I’m sorry, grab the tissue boxes. Terraform is not the only tool out there. It’s not the only infrastructure as code tool out there either. I had been working on infrastructure as code for a long time, and it predated Terraform even. And the book itself, Infrastructure as Code, Patterns and Practices, is called Patterns and Practices because I realized that a lot of the things I was learning from the software development space, as well as from other tools outside of Terraform, we weren’t really bringing into Terraform, or we weren’t really thinking about across all of these different functions. So it is a book for a lot of different infrastructure as code use cases, not just for Terraform. And it’s adjacent, of course. A lot of people use Terraform. So naturally, when it came down to the examples, everybody who read an early manuscript was like, can you just put it all in Terraform? And I was like, well, I was trying to avoid it, but here we go. [00:04:20.050] – Ned
Yeah. And the book mostly has Python examples that then translate into Terraform, correct? [00:04:27.030] – Rosemary
Oh, yes, that was a whole interesting debate too. Initially, I wrote everything in Terraform, and it was a fair assessment from early readers where they said, hey, I’m a developer, I want to learn this stuff, but I don’t really care much for Terraform specifically. I don’t really understand domain specific languages. And when you write a book that tends to be tool specific, it doesn’t have much longevity. We all know that tools upgrade constantly. And when you write a patterns book and you want to show code examples and make sure there’s longevity to the concrete examples you are showing, it’s really difficult to rely on a tool. So the result was that after some discussion across different readers, different reviewers and the publisher, we decided, you know what, let’s try it in Python. Some people like it, some people hate it. I understand it’s polarizing. Don’t worry. For the writer, myself, it was really difficult sometimes going from Python, converting it to Terraform JSON, and then maybe sometimes taking Terraform HCL, HashiCorp Configuration Language, and then porting it into JSON too. So it was lots of complexity from the abstraction. But I think it helps aid in the longevity of the patterns and examples. [00:05:47.460] – Ned
It was interesting to see how it would be laid out in a general purpose programming language, instead of some of the stuff that Terraform kind of hides from you, does in the background for you, so you don’t have to think about it. Python clearly made you think a little deeper about how that would actually be implemented as opposed to having a tool do it for you. Let’s dig into some of these patterns and practices, and I’ll start with a chapter that I found fascinating, which is module patterns for infrastructure as code. You introduced a bunch of terminology that I was unfamiliar with. I don’t know if these are existing terms that I just hadn’t encountered before, but you introduced things like Singleton, Composite, Factory, and Builder. What was the thought process behind developing those types? [00:06:32.750] – Rosemary
Sure. So if you are in the software development space, or you’ve been in sort of the software architecture space, you’ve probably heard of a book called Design Patterns. It was a conceptual book describing a lot of patterns that you could use to write code, and it was specifically applied to software engineering. Right. The idea is, how do you build composable, maintainable, sustainable code across teams? And the first time I came across Design Patterns as a book and had to read it, I was pairing with a Java programmer. I was not a programmer by any means, and I had to learn these design patterns because it’s very specific. Sometimes it’s really specific per language. But in this space, in the Java space specifically, this programmer was particularly influenced by it, so I had to learn it. The terms in my book come from the Design Patterns book. However, they have been modified and adjusted for the infrastructure as code context because there are some differences. One big difference is that infrastructure as code is mostly declarative, meaning I want it this way. Right. But programming languages tend to be imperative, meaning for these sorts of things, I want to loop in this manner, or if I want this, then I get this. Right. [00:08:00.010] – Rosemary
So it’s conditional, loop based, it’s imperative, it’s explaining the how and not the target what. So the patterns don’t fully apply. And that’s where I wanted to at least have some familiar terms for people who knew the development patterns but didn’t understand them in the context of infrastructure. [00:08:21.390] – Ned
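To make the factory idea a bit more concrete, here is a minimal Python sketch in the spirit of the book’s Python-to-JSON approach (not an excerpt from it); the resource type and attribute names are hypothetical, and the imperative class simply emits a declarative definition for a tool to apply.

```python
import json


class NetworkFactory:
    """Imperative code (the "how") that emits a declarative definition (the "what")."""

    def __init__(self, name: str, cidr: str = "10.0.0.0/16"):
        self.name = name
        self.cidr = cidr

    def build(self) -> dict:
        # Hypothetical resource type and attributes, for illustration only.
        return {
            "resource": {
                "example_network": {
                    self.name: {
                        "name": self.name,
                        "cidr_block": self.cidr,
                    }
                }
            }
        }


if __name__ == "__main__":
    # Write the definition out as JSON for a provisioning tool to consume.
    print(json.dumps(NetworkFactory("hello-network").build(), indent=2))
```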
That is something I keep coming up against: infrastructure as code is code, yes, but like you said, it’s a different type of code. It’s not application code, it’s infrastructure code. And so you do have to treat it like its own animal, and some of the concepts and the patterns and practices don’t translate one to one. So I appreciate that you kind of created this translation bridge for people who might be familiar with the concepts from application code. [00:08:49.690] – Ethan
Since you made that abstraction, would you say that the book was aimed more at developers or more at operations and infrastructure folks? [00:08:56.810] – Rosemary
It was actually aimed at both, and that was a really ambitious approach. The reason why I aimed at both, and I think in some ways it had to be aimed at both, is because you get a lot of developers now who need to work a little bit on infrastructure because of public cloud offerings, right? They have to have some knowledge. And I was getting a lot of developers coming to me saying, what’s the minimum I need to know about infrastructure? But then I also got the reverse, which was an administrator, an infrastructure engineer, platform engineer, whatever. Someone who works on infrastructure in their day to day. And they were like, I’m having a really tough time scaling this across my teams. I don’t know the practices that I should know. And the thing is that a lot of these practices that we’ve at least adapted for infrastructure came from software development in the first place. And so that translation and bridge, which is a great term, I just found was missing. So that’s why the book tries to address both and comes at it toward the middle. [00:10:01.310] – Ned
Okay, yeah, that’s the feeling that I got. And I come definitely from the admin side of things; the last time I did any actual programming was back in college, and I don’t want to say when that was. It was not recent. Another portion of the book you got into was the way in which you create abstractions for modules and link them together. And you described this concept of inversion of control and dependency inversion. And I’m going to be honest with you, I read that section like five times. I highlighted things, went back, and I’m still not sure I got it. So pretty much the whole reason I asked you to be on the show is just so you could explain it to me. Maybe it’ll actually click. [00:10:49.150] – Rosemary
Yeah, I’m going to be honest, too. Most of these design patterns I have probably had to read in the software books 10, 15 times and then try to translate. And translating to infrastructure, it was even harder at times because I found myself going back. But once you get a sense of inversion of control, as well as dependency inversion, it becomes a really useful way of describing how you decouple infrastructure, groups of infrastructure, from each other. So we’ll start with inversion of control. Inversion of control, I think in terms of the concept, sounds very cool, right? You’re like, oh, yeah, you’re inverting control in ways that you don’t want some kind of higher level resource, right, or a low level resource communicating with each other in a specific way. So inversion of control is probably better referred to as the Hollywood principle: I’ll call you, you don’t call me. Right? And the reason why it’s important is that when you describe one resource that depends on another, you can have the lower resource, let’s say a network, call the server and say, I will give you my network ID. But the problem is, what happens when you have 100 servers? [00:12:12.570] – Rosemary
It’s not really intuitive. When you have 100 servers, you have to have that network push out 100 network IDs of the same ID to 100 servers. How does it know which server to push to? Don’t know. Instead, with inversion of control, what you’re doing is saying, okay, the server will call the network instead of the network calling the server. So if the server needs the ID, the server will pull the information from the network. So it’s a pull model, not a push model necessarily. That’s kind of a different way of thinking about it. But in a more tangible example, that means that you can add any number of servers. You could have 200, 300 servers. And as long as they know which network they need to retrieve the identifier from, they can just call the network and get the information they need. [00:13:01.530] – Ned
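A minimal sketch of that pull model in Python, using made-up Network and Server classes rather than any real provider objects: the servers call the network for its identifier, and the network never needs to know how many servers exist.

```python
class Network:
    """Low-level resource. It exposes its identifier but never pushes it anywhere."""

    def __init__(self, name: str):
        self.name = name

    @property
    def network_id(self) -> str:
        # Stand-in for a data source lookup or API call in a real tool.
        return f"net-{self.name}"


class Server:
    """Higher-level resource that pulls what it needs from the network."""

    def __init__(self, name: str, network: Network):
        self.name = name
        # The server calls the network; the network does not call the server.
        self.network_id = network.network_id


if __name__ == "__main__":
    network = Network("prod")
    # Any number of servers can pull the same identifier without the network
    # having to track them.
    servers = [Server(f"app-{i}", network) for i in range(3)]
    print([s.network_id for s in servers])
```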
Okay. I think that’s something I was definitely missing: that push-pull comparison and the Hollywood principle. I like that. That might actually stick in my brain better than inversion of control. [00:13:13.790] – Rosemary
Yeah, I think it’s a higher level way of saying the metaphor: you don’t call us, we call you. Right. So the servers or any higher level resource will call the lower level resource. Okay. Yeah. So that’s inversion of control; it’s very fancy sounding. Then there’s dependency inversion, and for dependency inversion there’s not a great pithy metaphor. Sorry. But you can think of it kind of as a translator, or a layer between the server and the network. Right. So we’re going to go back to the server and network example because I think that’s probably one of the more simplistic dependencies we could describe. A server needs information about the network identifier, and it’s probably not a great idea for that server just to call the network and say, give me your IP address, right, or give me the IP address range that you’re sitting on, for example. That server ideally should go through some middle layer, a middleman or a middle person of some kind. I’m trying to find a good analogy like the Hollywood one, but you maybe call your agent and your agent calls the studio for what’s going on. [00:14:40.170] – Rosemary
You don’t want to call the studio directly. So we can make that analogy. [00:14:44.800] – Ned
Okay. [00:14:45.770] – Rosemary
Yeah. And dependency inversion is a lot harder to convince people to do, because why would you need another layer in between? Why do you want a piece of middleware, like a middle component? It doesn’t really make sense, but it actually makes a lot of sense in infrastructure. Because if, let’s say, the network identifier changes, you don’t want the server affected necessarily. Right. Maybe all the server needs is a network ID, so you want to make sure that middle layer keeps passing that network ID. Then you don’t have to care if the network changes its subnets or its name or whatever other identifier you have. So it protects the server from changes to the lower dependency. [00:15:29.140] – Ethan
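Here is one way that middle layer could look in Python, sketched with a hypothetical abstraction rather than a real API: the server depends only on the interface, so swapping where the network ID comes from (cloud API, state file, configuration store) never touches the server code.

```python
from abc import ABC, abstractmethod


class NetworkMetadata(ABC):
    """The middle layer: an abstraction both sides depend on."""

    @abstractmethod
    def get_network_id(self) -> str:
        ...


class CloudNetworkMetadata(NetworkMetadata):
    """Concrete source: look the ID up from a cloud provider or state file."""

    def get_network_id(self) -> str:
        return "net-12345"  # placeholder for an API or state lookup


class ConfigStoreMetadata(NetworkMetadata):
    """Another concrete source: read the ID from a configuration store."""

    def get_network_id(self) -> str:
        return "net-12345"  # placeholder for a key-value store lookup


def build_server(name: str, metadata: NetworkMetadata) -> dict:
    # The server only knows about the abstraction, not the network itself.
    return {"name": name, "network_id": metadata.get_network_id()}


if __name__ == "__main__":
    print(build_server("app-1", CloudNetworkMetadata()))
```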
Does that mean the piece that sits in the middle would be like a source of truth? [00:15:35.610] – Rosemary
It could be. It could very well be a source of truth. Exactly. [00:15:39.950] – Ned
It almost sounds to me like, in your example, if something changes about the network that the server doesn’t have to care about, then ideally the server never knows that that portion of the network changed, as long as the ID it’s using remains relevant. And that middle piece is what’s extracting the information the server might actually need and holding it for the server to retrieve whenever it needs it. [00:16:04.720] – Rosemary
Exactly, yes. So you just have the information you need and it could be a source of truth. Right. The idea is that sometimes people do store it in a separate place, they store it in some configuration manager and that becomes a source of truth for the network, for example, that the server resides on or the server is using. So it very well could be a source of truth, but it doesn’t have to be. There are also many other variants of this where it may not be a source of truth, it just might be like an infrastructure API even. [00:16:36.270] – Ethan
I bring up Source of Truth because that’s often used for networking, since you brought that up as an example, Rosemary, where the source of truth may not be actually holding configuration state or anything like that, but it is where you’re supposed to go as your baseline for what reality is. And if you’re mismatched from that source of truth, then you’re out of step with the intended configuration of the network. And so you go to the source of truth as this is the arbiter of what reality is supposed to be, whether it is or not, and everyone can go check that repository of data to know where things are at or pull the next IP range that needs to be pulled, et cetera. [00:17:13.610] – Rosemary
Yeah, exactly. I think the networking space does this very well, actually. Right. Because a lot of these things do get pulled from a shared configuration space or shared configuration store somewhere. It’s a lot harder in cloud, I think, because with cloud offerings in particular, it’s so easy just to provision these resources. It’s just like, oh, I’ll just connect it all up, and then you never actually realize where the configuration is going or what configuration you’re using. It’s very hard to keep track of. It’s extremely dynamic. And so the result is that I see dependency inversion constructed a lot more carefully in the data center, or for devices in general, versus cloud offerings, where it’s easier to ignore it as a principle because you can just get what you want and who cares where the resources are. [00:18:07.610] – Ned
I feel like it would be useful in the sense of being able to abstract a server configuration so rather than it having to know which network API to tap to get the information. Now it just knows how to tap that middle piece to get the information it actually needs. And that middle piece is responsible for translating from the network API to the information it needs. So I think it could help in that way too. [00:18:32.470] – Rosemary
Exactly. [00:18:34.310] – Ned
All right. Well, Ethan, you mentioned a repository source of truth, and I’m going to get into a difficult area now, Rosemary, so follow me down the rabbit hole. I know you have some opinions on this. When you get started with infrastructure as code, you’re usually doing it on your own. You’re maybe saving your files to a local directory. You’re versioning things by just making, like, version two of the directory, version three. But eventually you’ve got to put that stuff in version control, especially if you’re going to work with a larger team. So what are some considerations around repository structure, versioning and branching? And I know that’s a broad topic, so feel free to carve off whatever slice you want to start with. [00:19:16.290] – Rosemary
This is the one that’s going to get me in hot water. I can already feel it, because there have been entire conversations where entire teams have been broken up over repository structure and branching. So we’re going to try to do this as neutrally as possible. I have, of course, my own personal opinions, so everybody, I will articulate my personal opinions and everybody can disagree with them. So I guess we’ll start with repository structure, because I think that it’s something you think about too early on, most of the time. It’s easy to say, especially in infrastructure as code, let’s just put it in one repository and then we’ll run it and everything gets created at the same time. Or if you’re doing some kind of configuration, every configuration gets applied at the same time. And eventually you’ll start breaking things down into folders and subfolders. For example, if you have maybe Ansible playbooks that you’re running, and Terraform, and maybe Packer image builds for some reason or other, you’re going to start dividing those into different folders in one repository, right? That is what is fondly called a mono repo, or a single repository structure. [00:20:34.020] – Rosemary
And it’s very natural to start that way. There hasn’t been a time in infrastructure as code when I’ve worked with folks where they haven’t started with some kind of single repository structure. Because it’s easy, you can see everything, you can express the dependencies within that code. So it makes it really easy to find everything and to find where things are linking to each other, and that makes it great. But single repositories don’t scale well, especially when you have more and more people working on them. And it doesn’t scale well in a couple of different ways. Right? First, access control. Unless you have a really good build system that handles the access control of who is going to go into those folders and make changes, one of the biggest problems that you’ll have with a single repository structure is, well, you can’t just give everybody access to make changes to everything in that repository. Maybe you only want a sysadmin working with an Ansible playbook; you don’t want them necessarily working on, I don’t know, some API automation script for a Cisco switch, which is in a different folder. And access control becomes really difficult in a single repository structure. [00:21:45.500] – Ethan
I’m laughing because you said you can’t just give everybody access to everything. And I’m like, yeah, but you can and you do. Yeah, that is what happened so many times. [00:21:54.530] – Rosemary
You can and you do. And there’s no judgment. If anybody wants to do that and has the security posture in which they wish to do this, that is okay. Go into it knowing that you are giving a lot of different people access to a code base or to infrastructure code that they may or may not know what to do with. Right. And if you have controls in place to gate, to test, to understand what the impact is if someone does make a change, then that’s fine. There are a lot of big, large organizations that do have a mono repo and they do it very well. And that’s because they’ve built tools as well as mechanisms to check that if someone is interacting with that one repository, everything is going to still work the way they expect it to. So there’s a lot of benefit to a single repository, specifically the ease of dependency management. You can just go in and see, this Ansible playbook must run first versus this Terraform must run second, et cetera. Inevitably, however, in infrastructure as code, most people end up going to a multi repository structure, or multi repo structure, and that is where the problems start to get solved from a collaboration standpoint, but they don’t necessarily get solved from, I guess, sort of a cultural or knowledge transfer standpoint, right? [00:23:27.810] – Rosemary
When you have a multiple repository structure, it’s very easy to forget other configurations exist, so you don’t necessarily know, oh, this Ansible playbook should run before or after this Terraform, for example. And so there are benefits to both mono repo and multi repo, at least in my opinion. Usually it’s probably better just to start with multi repo, because unless you have a really good build tool that is able to handle recursive folder searching as well as differences and access control, it’s very difficult to scale this when you have more and more people. [00:24:06.110] – Ethan
The challenge with that though, doing multi repo out of the gate is having enough knowledge about what the structure is going to need to be so that you’re not having to change it and change it and change it as you go forward. Do you have any guidelines around that? [00:24:19.730] – Rosemary
Yeah, so I separate by provider, and I also separate by tool sometimes as well. This allows me to swap: if, let’s say, we didn’t want to use Ansible, we wanted to shift to Puppet, for example, then I can actually control and understand which repositories have what kind of information in them and the kinds of different mechanisms, because both tools work differently. So that’s one big problem with playbooks, for example, or Terraform modules, that I think is okay in a mono repo. It’s not purely multi repo and not purely mono repo either, right? Some people choose to say, okay, I’ll separate the Terraform modules into their own repository, but they’ll have subdirectories in there, and that works just fine, right? But putting every bit of infrastructure, every bit of platform configuration into a mono repo tends to overload build systems really quickly. So usually it’s a combination of multi repo per tool or per cloud provider, and then subdividing into different functions within that repository. [00:25:34.570] – Ned
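As a rough sketch of that combination, with entirely hypothetical repository and folder names, the layout might look something like this: one repository per tool or provider, subdivided by function inside each.

```
terraform-aws-modules/      # one repo per tool and provider
  networks/
  compute/
ansible-playbooks/          # configuration management in its own repo
  baseline/
  monitoring/
live-environments/          # configurations that consume the modules
  production/
  staging/
```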
Okay. Now, stuff isn’t going to stay the same in the repository. You’re going to want to make changes. When I think of the mono repo, I think a big challenge is when I do want to introduce a change, I can’t just test the change in that one folder; I have to test everything in the repository because of that one change. Whereas I might be able to do just basic testing in a multi repo scenario with just the repo that was updated. Though I guess eventually I want to do an integration test against all the repositories, depending on what I’m changing, so I guess I’m still in the same problem. What are the approaches in terms of updates and versioning for the different repo structures? [00:26:18.170] – Rosemary
Yeah, and that also gets to a very tricky discussion around testing. I highly recommend everybody test. I’m just making this as a plain statement: testing is important. But when it comes to, let’s say, modules, right, we’re going to exclude sort of the configuration aspect, because playbooks are going to be different than the configuration itself, which will get applied. So playbooks or modules are groupings of resources or pieces of automation. Testing those in isolation tends to be something that, if you did a mono repo, you would test per subdirectory, or per each of those modules or groupings, right? And you could test them in isolation. And what I usually say is, do the minimum integration test that you would need for you to know that it works. So for example, if you have a network module and you need to know whether or not it is actually a public network module that works correctly, set that up as part of your integration test. Just set up the public network and then put a server on it. Test that it actually has public routing, something like that. But you don’t have to set up private networks for it. [00:27:31.890] – Rosemary
You don’t have to set up everything. But when it comes to multi repo specifically, you are right that multi repo is a little easier. You can break out the testing and just say, just test this module when I make a change. But you lose a little bit of the integration capability. So having that good minimum integration testing capability built as part of that singular repository, as part of that module or grouping, whatever it is, will help alleviate your concerns over any changes that you might make. Then there’s the actual usage of the modules or the groupings, and that’s when you’re applying it to production or applying it to staging. And when that happens, that route has a whole series of other testing requirements as well. That is when you can truly test, truly integration test, right? And that involves a lot more end to end: does this work, does this work, doing your due diligence on testing each piece of functionality. And in that case, most configurations, like if you’re creating something for staging, production, et cetera, most of those tend to actually be mono repos. So they have a repository for a Kubernetes cluster, and then you have production, staging, maybe test, as subdirectories in there. [00:28:57.790] – Rosemary
And hopefully you duplicate them and they’re all the same, but most of the time they’re not. And the result is that your tests will exist within those subdirectories. So your tests will generally exist within subdirectories. If you have a multi repo approach in which you have no subdirectories involved, then it would be top level testing, and make sure you just do your integration test. [00:29:24.470] – Ned
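A minimal sketch of that kind of minimum integration test in Python, using pytest-style functions against hypothetical module outputs (for example, parsed from a tool’s output or state after an apply); the output names are invented for illustration.

```python
import ipaddress

# Hypothetical outputs captured after applying the public network module,
# e.g. parsed from the provisioning tool's output or state.
module_outputs = {
    "public_subnet_cidr": "10.0.1.0/24",
    "route_table_routes": ["10.0.0.0/16", "0.0.0.0/0"],
    "internet_gateway_attached": True,
}


def test_public_subnet_has_a_valid_cidr():
    # The module should have produced a parseable CIDR block.
    assert ipaddress.ip_network(module_outputs["public_subnet_cidr"])


def test_public_network_routes_to_the_internet():
    # A "public" network module should have a default route out of a gateway.
    assert "0.0.0.0/0" in module_outputs["route_table_routes"]
    assert module_outputs["internet_gateway_attached"]
```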
Okay, what about branching? Because I’m going to back up and explain branching as I understood it at one point, and I don’t know if this is correct, so you can correct me if I’m wrong. The branching that I had been taught, at least, and been instructed to teach others, was you might have a branch per environment. So you have your main branch. That’s the branch that’s the source of truth. And then if I want to push an update to my development environment, I would push any change from my main branch to my development branch. So I’d have these long lived branches per environment, and changes would come from main as a feature or a bug or whatever, and then get merged into the various environments. Is that a good pattern? Is that a bad pattern? Hit me with your best shot. Tell me what I’m doing wrong. [00:30:15.430] – Rosemary
I mean, it’s not the worst pattern, but it isn’t necessarily what I would recommend. When you do branch per environment, it is very easy to cherry pick. So for those who are not as familiar with version control in general, it’s very easy when you have an environment per branch to just pick the things that are most important to push to production, or push to main, and then just kind of nudge them in there. The problem is then it’s different than the sequence of changes that you applied to any of the other branches. So the reason why it gets to be a little bit complicated when you have a branch per environment is that you are changing time. It’s almost like you’re saying, I applied steps A, B, C to dev, and then suddenly now you’ve only applied B, but then in production you realize, oh, I don’t need B, so then you never applied that step B at all. Which means now your dev, your staging and your production are all different. So you’re changing the timeline of the changes that you would apply. And that’s kind of the problem in general with feature branching, or separating branches by feature or by environment: it’s very easy to forget the sequence of changes that you’re applying. [00:31:44.590] – Rosemary
And infrastructure works best, and infrastructure automation works best, when there is a well defined, sequential series of changes to be applied. That gives you predictability, and it allows you to test more predictably as well. And so I think that people do feature branching in infrastructure as code pretty successfully, but it doesn’t necessarily mean dividing by environment. Sometimes for folks it means dividing by features themselves. So maybe someone is implementing a license change on one branch, versus another person implementing a new network on another branch. [00:32:29.870] – Ethan
Let’s pause the podcast for a bit. Research suggests that 17 out of 20 people will click away due to buffering or stalling. And I am definitely one of those 17. There’s lots of stuff to watch out there and there’s no reason to wait around. If your company delivers online media, consider CDN77. They are a globally distributed content delivery network and they’re optimized for video on demand as well as live video. CDN77 is not some newcomer to the scene. They are used today by many popular sites and apps, including Udemy, ESL Gaming, live sports and various social media platforms. And that makes sense to me. CDN77 has scale. They have a massive network with distribution points all over the globe and plenty of redundancy. While that means you shouldn’t have problems, what happens when you do need tech support? CDN77 offers 24/7 support staffed by a team of engineers. No chatbots, no tickets getting routed around queues while no one actually does anything. Just no nonsense dedication to your issue, to get your online media back to 100%. To prove that CDN77 will work for your content delivery, visit cdn77.com/packetpushers. [00:33:37.480] – Ethan
That’s cdn77.com/packetpushers to get a free trial with no duration or traffic limits. For serious proof of concept testing, you can push hard: cdn77.com/packetpushers. And now back to this week’s episode. [00:33:58.330] – Ethan
Can you talk to us about testing some more, Rosemary? Because I find testing with infrastructure as code a little bit abstract compared to applications. With applications, you stand up the app on an environment, whatever that test environment is, and you go forth and you test, and you’re not affecting the production environment. With infrastructure, you can’t always do it like that, because sometimes what you’re trying to roll a change out to is production, because there is nothing else you can do and there’s no platform for testing as such. So compare and contrast that for us. How is testing IaC similar to testing applications, and how is it different? [00:34:32.630] – Rosemary
Yeah, with the testing, you want it to be similar. We’ll assume the worst case scenario where you have no development environment, right? You don’t even have a test environment for infrastructure. This is what happens sometimes; you just have to go straight to production. And so in that case, testing is about capturing the known knowns and making sure that everybody else understands that these are the things that should be consistent about this system. Right. This is how the system should work. If anything, it is more about knowledge. I treat testing for infrastructure as being about giving someone else the knowledge to understand how a system works. Not fully that the system itself is reliable, or it’s 100% perfect, or it’s 100% aligned with expectations. That’s impossible, especially when you don’t have a development environment. [00:35:30.330] – Ethan
You’re being really abstract at the moment. But if I have knowledge about, say, networking, which is my specialty, if I know that the network should have X, Y, and Z routes appearing in the routing table, is that the knowledge? And that’s a thing I’m testing for after I’ve made my change. Is that what you’re getting at? [00:35:45.810] – Rosemary
Exactly, yes. And hopefully someone else, like, if I have never seen this network before, and you’re telling me, Rosemary, here’s this network, it’s just not working, I should be able to look at this test and say, hey, it should have X, Y, Z routes on it; somehow I only see X and Y. There’s something a little bit weird about this, right? So it’s matching expectations and reality so that someone else on your team can also look at it. [00:36:11.240] – Ethan
Well, it’s automating what a complex network change would have included anyway. At the end, before you blessed that the change is successful, you would have gone through a series of ten or 20 tests that verify you’ve got this many routing adjacencies and this many routes in the routing table and whatever all the other critical tests are to validate that the infrastructure is indeed at the state you expect it to be in now. You’re doing it in some kind of an automated way. [00:36:34.540] – Rosemary
Okay, exactly. Yeah. And automation just helps to, A, make it a little bit faster, and B, communicate it, so that you don’t have to necessarily put it in a piece of documentation somewhere that someone has to go look for. Because I used to have checklists, basically. When I used to put in changes, there were checklists of, oh, this is exactly how we know that this is working. And I always forgot something on the checklist. Right. So you can think of it as a way to do this automated checklist, especially when you have a lot of infrastructure to make changes to. But in general, testing without sort of that safety net allows you to get some more information about what’s happening in production. It’s about knowledge. Again, it’s not fully about a functional, perfectly available, resilient system. It can help with that. But the real value of testing comes when you’re able to use it in a development environment or use it in a testing environment that will help mitigate any risk of failure in production. But when you have to go straight to production, testing itself is more about knowledge. [00:37:48.470] – Ned
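A small sketch of that automated checklist idea in Python, again pytest-style; the expected route set and the lookup function are hypothetical placeholders for whatever device API or cloud SDK you would actually query.

```python
# The team's documented expectation, living next to the infrastructure code.
EXPECTED_ROUTES = {"10.1.0.0/16", "10.2.0.0/16", "0.0.0.0/0"}


def fetch_actual_routes() -> set:
    # Placeholder for a device API call, a parsed routing table, or a cloud SDK lookup.
    return {"10.1.0.0/16", "10.2.0.0/16", "0.0.0.0/0"}


def test_routing_table_matches_expectations():
    actual = fetch_actual_routes()
    missing = EXPECTED_ROUTES - actual
    unexpected = actual - EXPECTED_ROUTES
    # A failure points straight at what is out of step with the documented intent.
    assert not missing, f"routes missing from the table: {missing}"
    assert not unexpected, f"routes nobody expected: {unexpected}"
```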
Yeah. And I would say the other important part of that is you’re updating the tests, in theory, as you’re updating the code, and both of them live together. So rather than having that checklist, which lives somewhere else and is updated on its own cadence, now you’re updating them together. And so that knowledge gets captured, including the change that you’re making: what should it look like now, now that I’m putting that change into place? [00:38:10.830] – Rosemary
Yeah, exactly. And the benefit to that, too, is that you don’t always know what impact your change might have. And if maybe an adjacent test fails that says, hey, this network isn’t connecting correctly, that’s maybe an indicator to you that your change may not have gone well, and you get an early indication that there’s maybe something you should debug further. And so with testing, a lot of people think about tests as, okay, there’s a pressure to do this really well. There’s a pressure to make sure that it always produces 100% availability of my system. And the reality is, most of the time it’s just a great way to get visibility into how your system is supposed to behave and your expectations about the functions of that system as well. So I think with going straight to production, it’s a very different kind of mentality. If you do have the luxury of a testing environment or a development environment, you can use it to mitigate or catch problems or failures before they go to production. [00:39:14.130] – Ethan
That’s a very practical side of it. Do you also include things like compliance checking for regulatory compliance or security scanning? Would that also be general testing or is that a separate process? [00:39:26.870] – Rosemary
I treat it as general testing, and all the security people are probably like, no, why are you just lumping it under tests? I’ve said this many times, but to me, security tests and compliance tests are tests, or scans, or static analysis. At the end of the day, they’re taking information about that system, metadata about that system, and then understanding it just in the security context or compliance context. Right? So when you think about a network, we’ll take a cloud network security group rule, for example, or a network group rule. Should it be open to 0.0.0.0/0? You know, it can’t be 0.0.0.0/0, because that means it’s open to the entire world. It’s a very simplistic example, but, you know, TCP to 0.0.0.0/0, we shouldn’t have that, right? There’s the functional standpoint, which you could test for, but it’s also a security problem too, right? And in that case, it’s the same test. It’s parsing for metadata about that rule. I’m sorry for all the security and compliance folks out there who are listening, but testing for security and compliance, at the end of the day, yes, it’s for a different requirement, but it is all pretty much testing. [00:40:49.390] – Ethan
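That kind of metadata check can be written like any other test. Here is a hedged Python sketch against hypothetical security group rules (for example, pulled from a plan output or a cloud API response); the rule structure is invented for illustration.

```python
# Hypothetical security group rules, e.g. parsed from a plan or a cloud API.
rules = [
    {"protocol": "tcp", "port": 443, "cidr": "10.0.0.0/16"},
    {"protocol": "tcp", "port": 22, "cidr": "10.0.2.0/24"},
]


def test_no_rule_is_open_to_the_world():
    # The same metadata parse serves both a functional and a compliance purpose:
    # a rule allowing 0.0.0.0/0 would make this test fail.
    offending = [rule for rule in rules if rule["cidr"] == "0.0.0.0/0"]
    assert not offending, f"rules open to the entire internet: {offending}"
```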
I can see that testing being separated out along organizational lines, where they just really want to compartmentalize all of those functions, and so they’re going to say, no, DevOps people need to run these tests and that’s your thing. But at the same time, do you want to pass the functional tests and then fail the security tests, but have that be in production? Wouldn’t you want to test it all at once? [00:41:10.550] – Rosemary
Yeah. Can’t it be functional and secure? I mean, I hope so. I kind of hope so. [00:41:17.030] – Ned
We had an interesting conversation yesterday with another guest who was talking from the security side of things, and he was like, as part of the security tests, we should add value to the operations folks and help them catch potential functional issues based off of the deep security scanning we’re doing. So it sounds like you can meet in the middle. It doesn’t have to be a battle. Cool. One other thing that I come across a lot is, when you’re writing your infrastructure as code, you want to avoid hard coding values into it. You want it to be abstract, you want it to be reusable. But those values have to live somewhere, especially if you’re going to use it across a few different environments. So I think my big question, and I haven’t gotten a satisfactory answer or a perfect setup, and it probably doesn’t exist, but where should you be storing your configuration values versus the infrastructure as code? [00:42:15.050] – Rosemary
Yeah, well, is it infrastructure as code or just configuration? There’s like a whole debate about it. We’re not going to go down that rabbit hole. There’s a whole rabbit hole. Actually, the best way to answer this is to describe the same answers that folks in the software development space have been pondering over. The same question actually exists for software development. Right. Where do you put those configurations? A lot of times people are putting them in files, separating them per environment, and then having their code or infrastructure as code retrieve that configuration from a file per environment. And you hope that those files per environment match. Right. But it’s not always the case. Sometimes you do need some differences, some minor differences between them. So what I would say is that the same principle applies, the same practice applies from an infrastructure as code standpoint: if you can, you use a top level set of configuration. Personally, my personal opinion on this one, so asterisk, disclaimer there, but my personal opinion on this one is that you use a top level configuration, and that top level configuration is reflective of all the things that should be pretty much the same across all these environments, across all these different common infrastructure as code elements, right? [00:43:35.590] – Rosemary
So this could be something like name tagging, or image identifiers, for example, if you’re using virtual machine images; CIDR blocks, if you have specific CIDR blocks that you know must be allocated to this particular application grouping or something. All of those things should probably remain mostly the same across all these different environments, because if you do change those values, it may or may not affect how the system behaves. So you make sure that those top level values are consistent across these multiple environments. And then if you do need to tune or tweak configurations across different environments, have a separate configuration per environment that is an override. So you can do this in Terraform, and you can say, okay, I’m going to pass this variable file for dev, and in this dev variable file maybe I’ll just allow traffic from this address, for example, that’s more specific to development. So having overrides is really important, but keeping a top level configuration is the most important thing to start with. Now, where do you store that? A couple of different places. Configuration managers, configuration stores are pretty popular, right? Sometimes people will just set up a configuration store in the form of a repository and they’ll put it in there. [00:45:01.970] – Rosemary
But it’s one central place that some other configuration, or infrastructure as code, or code can go and retrieve that information from. And then if you need to update it, you can update it. A lot of people prefer to bundle it with the code, though. And if you do that, do a file based approach; just separate the overrides based on environment. [00:45:27.050] – Ned
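One hedged sketch of the top-level-plus-override idea in Python, assuming hypothetical file names under a config/ directory; in Terraform the rough equivalent would be a shared variable file plus a per-environment file passed with -var-file.

```python
import json
from pathlib import Path

# Hypothetical layout:
#   config/common.json      -> values shared by every environment
#   config/dev.json         -> small overrides specific to development
#   config/production.json  -> small overrides specific to production


def load_config(environment: str, config_dir: str = "config") -> dict:
    base = json.loads(Path(config_dir, "common.json").read_text())
    override = json.loads(Path(config_dir, f"{environment}.json").read_text())
    # Environment-specific values win, but everything else stays identical
    # across dev, staging, and production.
    return {**base, **override}


if __name__ == "__main__":
    print(load_config("dev"))
```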
What about sensitive values? Because especially if I’m putting this stuff in a repository, I probably don’t want to store my API keys and my database passwords directly in the repository. So any recommendations around what to do about sensitive or secret values? And I’m not asking you to shill for HashiCorp or anything, just, like, generally what patterns and practices do you see at a higher level? [00:45:55.010] – Rosemary
Yeah, a lot of people store sensitive values by doing the most basic thing, which is encrypt them and commit them to version control. Some people do that. Now, this is not a podcast about secrets managers and how you scale them; we’re not going to go into that, or why you would be scaling your secrets in general. There are pitfalls to encrypting your secrets and storing them in version control, but there are people who do that, and there are folks who do that just with their configuration. So what they’ll do is they’ll do something like create a secrets-dev.json or secrets-dev.tf, we’ll do Terraform just because it’s easier, and then they’ll pass it in as an override. And then within Terraform itself, or whatever infrastructure as code they use, they’ll decrypt it. The repository itself has the ability to decrypt it. Now, it’s not the greatest pattern. There are some side effects to this in which you don’t necessarily audit the life cycle of that secret, but it does protect it to a certain degree. In other situations, what I like to do is store it in a secrets manager and just retrieve it dynamically. [00:47:14.990] – Rosemary
So in most infrastructure as code now, whether you’re using a configuration management tool or a provisioning tool, whatever you’re using nowadays, there’s this ability to call an API and then take that information and put it into a value that you need. And that ability to call the API is really valuable for secrets, because if you need to change that secret, you only need to go to wherever you’ve stored it, and the code will retrieve it from the API directly. Now, this does require network access, which some people don’t like. They’re like, if you need to apply this stuff offline, it’s not going to be possible. But if you are using infrastructure as code, you’re putting it on a CI framework, it’s going to be on routable networking, then it’s a perfectly sufficient configuration to put it in your code, retrieve it from the API of wherever you store your secret, and push it in dynamically. It’s a lot easier than statically hard coding it everywhere and trying to figure out where you put it. [00:48:25.010] – Ned
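As one possible sketch of that dynamic retrieval, here is a Python example assuming the hvac client for HashiCorp Vault with a KV v2 secrets engine and a token injected by the CI framework; the secret path and key are hypothetical.

```python
import os

import hvac  # assumption: HashiCorp Vault's Python client is installed


def fetch_database_password(secret_path: str = "myapp/database") -> str:
    # Authenticate with whatever the CI framework injects; a token keeps the sketch simple.
    client = hvac.Client(
        url=os.environ["VAULT_ADDR"],
        token=os.environ["VAULT_TOKEN"],
    )
    # Read the secret at apply time instead of committing it to the repository.
    response = client.secrets.kv.v2.read_secret_version(path=secret_path)
    return response["data"]["data"]["password"]
```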
All right, last question, and this is your chance to vent, Rosemary. If you’ve got something to get off your chest, is there some flim flam or misconception around managing infrastructure as code you’ve heard that just drives you crazy and you just want to dispel right away? [00:48:47.210] – Rosemary
This is actually a really tough question, mostly because most of the misconceptions, I think, we’ve sorted out in a couple of different ways. I will say that there is a huge amount of controversy around the actual term infrastructure as code. I think that there have been some folks who have thought about it as infrastructure as software, infrastructure as code, infrastructure as configuration. There’s a lot of variance to it, right? Because the claim is that domain specific languages are not technically infrastructure as code; really, the only way you can claim something is infrastructure as code is if it’s using a programming language. And there are entire threads dedicated to sorting this out. And what I will say, at least personally, when I see that is, well, here’s the problem. A lot of the things that we are doing with infrastructure now we’ve borrowed from the development space. We’ve borrowed these development practices, declarative, imperative. No one said infrastructure owns the declarative approach, right? It just makes it easier to manage infrastructure. And if it means that it’s easier to do it in a domain specific language, where it’s really opinionated and some people can learn it really quickly, that’s perfectly fine. [00:50:12.290] – Rosemary
It doesn’t make it less code, right? It doesn’t make it lesser because of it. It just means that it’s offering a very opinionated pattern, and you have to either learn to work within that opinionated pattern of a domain specific language, or you have to find an alternative. And in that case, if you want more flexibility, go use a programming language, or find a way to go lower level. Maybe write your own infrastructure code with your own programming language if you want that kind of customization. But if you want to take advantage of very specific behaviors, very specific patterns, and the declarative approach as it was intended for infrastructure, then a lot of these tools will offer domain specific languages, and they’re more accessible to people. So that’s my, I guess, side vent, which is: infrastructure as code is still infrastructure as code. You can still apply a lot of development practices to it. It doesn’t really change it. You can still argue about feature branching versus trunk based development, which we didn’t touch on, but you can still argue about all these development practices and still treat it almost as code. And it doesn’t matter if it’s domain specific programming. [00:51:27.070] – Rosemary
If it’s YAML, I know everybody’s like YAML. If it’s YAML or some other format. At the end of the day, if it works and it allows you to create predictable infrastructure changes and allows you to collaborate with your team safely and push changes to production, then who is to say that it’s not something as code? I don’t know. [00:51:50.050] – Ned
See, you’re taking the pragmatic approach and you’re just deflating all of these flame war threads out there. [00:51:56.630] – Rosemary
I’m not deflating them. I’m pretty sure they’ll just spring back up again. Somebody will be like, no, infrastructure should be a programming language. And I’m like, okay, that’s great, except what if people just don’t really care, they just want to create the infrastructure, in which case maybe it’s a little bit too much. In all honesty, I can understand why people would want programming languages to create infrastructure as well. Sometimes you do want some custom behaviors or you do want more flexibility, and there’s nothing wrong with that. But again, it doesn’t make it less as code. [00:52:35.090] – Ned
And I think that’s a perfect point to ride out on, Rosemary. If people are interested in picking up this book and reading more about infrastructure as code and patterns and practices, where can they go to get the book? [00:52:48.390] – Rosemary
So you can go to Manning.com. There’s Infrastructure as Code, Patterns and Practices. There’s an ebook as well as a printed copy that you can obtain from there. I believe it is also distributed through many different booksellers of your choice, but I would say the primary distribution is through Manning and its website. [00:53:10.910] – Ned
Okay, we will include a link in the show notes to that. Is there anywhere people can follow you? Do you have a blog, a Twitter presence? Where are you at on the internet? [00:53:19.430] – Rosemary
Yes. So I’m on the internet as jack of all trades, master of none. Right. So joatmon08 on Twitter. You can find me, Rosemary Wang, on LinkedIn as well. And you’ll find me speaking usually, or talking on the HashiCorp channels as well, so you can find me at any of those social media channels. [00:53:42.880] – Ned
Awesome. Rosemary Wang, thank you so much for appearing today on Day Two Cloud. And hey, listeners out there, high fives to you, virtual high fives to you for tuning in. If you have suggestions for future shows, we would love to hear them. You can hit us up on Twitter at Day Two Cloud Show or fill out the request form on our fancy website. It is daytwocloud.io. We have a whole tab just for you to suggest things. We would love to hear what you want to know more about. Did you know that Packet Pushers has a weekly newsletter? It’s called Human Infrastructure Magazine and it is loaded with the best stuff we have found on the internet, plus our own feature articles and commentary. It’s free and it does not suck. You can get the next issue via packetpushers.net/newsletter. Until next time, just remember: cloud is what happens while IT is making other plans.