Day Two Cloud 129: Practical Advice On Optimizing Cloud Costs

Episode 129

Optimizing cloud costs means much more than just looking at your bill and hunting down unused instances. It’s about how to understand the full lifecycle of cloud workloads, how to deal with management that wants predictable spending even as your actual usage varies, and how to set up repeatable processes for ongoing optimization.

Our guests are Fred Chagnon, Principal Research Director; and Jeremy Roberts, Workshop Director, both at Info-Tech Research Group.

We discuss:

  • The goals of cost optimization
  • Getting management buy-in for optimization processes and job roles
  • Communicating cost savings to management
  • Picking the right metrics
  • More


  1. Cost is just a metric. It can be measured.
  2. There are tools to do this, but without roles and responsibilities we don’t know who should be using them, and what they should be doing.

Sponsor: ITProTV

Start or grow your IT career with online training from ITProTV. From CompTIA to Cisco and Microsoft, ITProTV offers more than 5,800 hours of on-demand training. Courses are listed by category, certification, and job role. Day Two Cloud listeners can sign up and save 30% off all plans. Go to and use promo code CLOUD to save 30%.

Show Links:

Cloud FinOps – O’Reilly Media

@fredchagnon – Fred Chagnon on Twitter

Jeremy Roberts on LinkedIn

Info-Tech Research Group


[00:00:00.490] – Ethan
[AD] This episode of Day Two Cloud is brought to you in part by ITProTV. Start or grow your IT career with online IT training from ITProTV. We have a special offer for all you amazing Day Two Cloud listeners: sign up and save 30% off all plans. Go to ITPro dot TV slash day two cloud and use promo code CLOUD at checkout to save 30% off all plans. [/AD]

[00:00:31.290] – Ethan
Welcome to Day Two Cloud. Today we’re going to talk about cloud cost optimization. Our guests are Fred Chagnon, principal research director at Info-Tech Research Group, and Jeremy Roberts, workshop director for infrastructure and operations at Info-Tech Research Group. And these guys have done research and gone deep on how you optimize cloud costs. That doesn’t simply mean look at your bill and say, look, we shouldn’t be spending on that, we’re not even using it. I mean, that is a component of things, but they go way deeper because it’s also a cultural thing. It’s a process thing. It’s understanding, across the entire life cycle of the workloads that you put into cloud, how to optimize and control those costs, and how to deal with management when they are looking for a predictable budget.

[00:01:16.180] – Ethan
But the way you consume cloud isn’t predictable; it depends on what your business’s compute needs are from month to month. Ned is not with us today. He had to step out because reasons. And so you’ve just got me as the host. Please enjoy this conversation with Fred Chagnon and Jeremy Roberts of Info-Tech Research Group. Fred and Jeremy, welcome to Day Two Cloud today. It is lovely to have you both. Fred, starting with you: you pitched this show to us because you guys did some research about cloud cost optimization. So tell us what this research was. Who did you talk to to gather data, et cetera?

[00:01:56.010] – Fred
Sure. Jeremy and I work at a research and advisory firm. We get the benefit of advising on topics that we research, but sometimes it happens the other way. Sometimes we get asked questions that we have not fully researched yet. And cloud cost management was one of those slow burns: as our members were moving to the cloud, they started to come up with these questions. The research that we did was pretty organic in the sense that it primarily came from people who had actually solved some of these problems. When you’re talking to somebody, a CIO, an IT leader or an engineer, about challenges they’re having, oftentimes you do learn about other things that they’re doing very well. So a big part of the research that we’ve done in the body of cloud cost management has come from people who have solved part of the problem, but not necessarily the whole enchilada. And so taking in a whole bunch of these little tiny solutions, we have the benefit of aggregating it all. And that’s where things like a good framework can come from, or at least seeing what the entire problem is so that others can bite it off piece by piece.

[00:03:07.710] – Ethan
So you’re describing what would be a nuanced solution to a complex problem. But initially, if I’m a business and I’m starting to get those cloud bills in, I could be simply overwhelmed by the size of the bill that maybe I wasn’t expecting or didn’t quite budget for. And so it’s a simpler problem. It’s like the bill is too big. Fred, can you help me? Does it start like that?

[00:03:30.950] – Fred
It absolutely starts like that. One of the first things we want to do to help somebody is basically take a look at their bill and just start with that. Right. I feel like I’m knocking on the door saying, Show me your water bill, except that I actually do want to help them, not sell them more stuff. So we start there. But we make it very clear that this is where we’re starting. This is not where it ends. And in fact, the problem is not usually the high bill. And the solution isn’t usually simply just, oh, you’re spending a lot on Amazon EC2. You should probably use fewer of those or just buy some reserved instances. The nuanced problems end up being things like you’re spending a lot on EC2. Oh, I didn’t realize it was that high. And what were you spending three months ago? Oh, it’s a lot higher. Oh, somebody turned on some virtual machines, and we didn’t know about that. Right. And now we’re getting into governance problems, roles and responsibilities, problems.

[00:04:27.880] – Jeremy
Some fundamental changes as well. Right?

[00:04:29.710] – Jeremy
So thinking about it, you’re in the cloud now, and all of a sudden you have to account for costs that you haven’t traditionally accounted for before. Right. So folks will come to us with this bill, even if it’s a SaaS environment. Right. So it’s not necessarily EC2 instances or an instance in Azure virtual machines. It’s a cloud email client. Well, now I’ve actually got a total cost for an individual user, right, at a given point. And I didn’t have that before. So do I show that back to the business unit who hired the person, or does that exist in my domain as a central IT shop? Right. And so folks come to us very often with questions like that. And as Fred mentioned, we were able to sort of aggregate some of those problems and aggregate some of the responses and come up with some general statements that have helped us help further folks, if that makes sense.

[00:05:15.410] – Ethan
That’s an interesting metric, cost per user. So would that be the end consumer, or more like the IT operations person responsible for a given set of things that are being spun up in the AWS cloud?

[00:05:29.650] – Jeremy
Sure. So a good way to think about it is historically, TCO has been a fairly complicated formula. Right. So I’ve got a data center. I have some hardware in there. I’m depreciating it over a particular cycle. I have some folks who have to manage that hardware. I’ve got things like heating, cooling, insurance and all that stuff. But this is for a shared data center. So identifying my actual cost to serve an individual end user or my cost to support a particular application was quite difficult. And the cloud, all of a sudden takes that and says, we are going to outsource a lot of that underlying management to somebody else, and they are going to give us a fixed cost. So whereas before, we didn’t actually know exactly how much it cost us per user because we didn’t take the time to conduct that TCO. Now we actually do. And so now we have a new tool at our disposal. Right.

[00:06:15.200] – Jeremy
Which is granularity or measured service, which is a fundamental characteristic of cloud services. And some managers will come and say, well, now that I know how much it costs, right? It’s not a central cost that’s going to be shared by everybody. It’s you hired a person or you instantiated a virtual machine. Right. And it’s tagged back to your department or an application that you own or support. Does IT even have any business paying this bill? Do we care, or do we just show it back to you? And then you sort out any sort of behavioral issues that are going to introduce additional cost. Right.
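As a rough illustration of the showback idea Jeremy describes, the sketch below totals billing line items by a department tag. The line-item layout, the `department` tag key, and the `showback` helper are all assumptions made for this example; real provider billing exports have their own schemas.

```python
from collections import defaultdict

# Hypothetical line items; a real billing export carries many more columns.
LINE_ITEMS = [
    {"service": "compute", "cost": 120.00, "tags": {"department": "finance"}},
    {"service": "storage", "cost": 30.50, "tags": {"department": "finance"}},
    {"service": "compute", "cost": 75.25, "tags": {"department": "marketing"}},
    {"service": "email", "cost": 12.00, "tags": {}},  # nobody tagged this one
]


def showback(line_items, tag_key="department"):
    """Total cost per tag value; untagged spend gets its own bucket."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item["tags"].get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)


report = showback(LINE_ITEMS)
for owner, total in sorted(report.items()):
    print(f"{owner:<10} ${total:,.2f}")
```

Keeping an explicit "untagged" bucket is a deliberate choice here: it makes visible the spend that nobody can yet be shown.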

[00:06:46.190] – Ethan
Behavioral issues. Yeah.

[00:06:48.000] – Jeremy
Shame back, as we like to say. If you’re not going to incur or pay the cost yourself, excuse me, you can at least be shamed. So it’s about, of course, managing the overall cost. It’s about managing the complexity of your environment. It’s about managing variability and all that, which I’m sure we’ll talk about throughout the rest of this podcast. But it’s also about taking this new tool set and taking full advantage of it. Right. The things that were historically difficult for us to do are now easy. How do they fit into our conceptualization of what a good cost management framework actually looks like? And I know that that’s a very deep and philosophical question. I think Voltaire was the first to ask it. It’s something we confront on a daily basis as analysts in this space.

[00:07:32.130] – Ethan
Actually, so let’s drill into that. And invoke Voltaire, what is cloud cost optimization, which is really the title of this podcast? We’re not simply saying the cloud bill is too high. We gotta reduce cost. Yeah, that’s part of it, I suppose. But cloud cost optimization is more nuanced. So what are we talking about getting done here?

[00:07:53.770] – Fred
It starts with the price, and that’s usually the thing that spurs, but it’s really three dimensional with the things that we want to manage or optimize. So cost is certainly one. Variability, that’s another one. And again, when you come from a capex world where all the assets were bought and paid for, there isn’t cost variability. I’m sending traffic in the data center down a network pipe that was bought and paid for and storing it on disk that was bought and paid for. But in the cloud, that network could be data transfer out, and that could be very variable. My storage costs could be variable. I could be sending information to a database that’s billed by transaction and the number of transactions. If I have any seasonality at all to my workload, the number of transactions goes up and down. So that’s not to say, don’t use those services that are variable. But variability of the bill is certainly an issue because it creates problems predicting and budgeting.
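One way to put a number on the variability Fred describes is to measure how far each month's bill strays from the average. The sketch below is only an illustration: the monthly figures are invented, and the 15% tolerance band is an arbitrary assumption rather than a recommended threshold.

```python
from statistics import mean, pstdev

# Hypothetical monthly bills in dollars; month index 3 has a seasonal spike.
monthly_bills = [10_000, 10_400, 9_800, 15_200, 10_100, 9_900]


def coefficient_of_variation(values):
    """Standard deviation relative to the mean (0 means a perfectly flat bill)."""
    return pstdev(values) / mean(values)


def months_outside_band(values, tolerance=0.15):
    """Indexes of months deviating from the average by more than `tolerance`."""
    avg = mean(values)
    return [i for i, v in enumerate(values) if abs(v - avg) > tolerance * avg]


print(f"variation: {coefficient_of_variation(monthly_bills):.1%}")
print("months to explain:", months_outside_band(monthly_bills))
```

The point of flagging months rather than trimming them is the one made in the conversation: variable spend is not necessarily bad, but it needs an explanation that budgeting can use.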

[00:08:55.990] – Ethan
What kind of variability are customers typically seeing? It feels like it should be within a bound. Right, Fred? So it shouldn’t be wildly variable. It should be a little variable depending on your load and demands for a given month.

[00:09:11.890] – Fred
That’s right. And sometimes variability is a good thing. There are services that you could be procuring in the cloud that you don’t use one month and then use a lot of the next month, like security services that do incident detection. If you have no such incidents to follow up on, I think it’s GuardDuty, it doesn’t do a whole lot. But if all of a sudden you’ve got something that requires investigation, now your next month’s bill for that particular service is very high, and that’s very hard to predict. The classical approach would be that you’re paying a flat rate for a tool, and it seems like wasted money, except for the one month of the year you have to use it. And data recovery is another thing, right? The cost that it takes to restore data from the cloud back to another cloud, or maybe back to your data center: you only pay when that happens. And so there’s a variable cost to that. But you can always rationalize that as, well, that’s the cost of recovery.

[00:10:16.290] – Fred
But we didn’t have to do that before. And that’s why it’s a bit of a shift. So sometimes optimization, sometimes cost management, is really just about understanding that that’s the new cost model. And we just have to understand how we can explain it, not necessarily reduce it. So that’s the variability aspect. The other one is complexity. Jeremy mentioned that we didn’t do all the TCO stuff before in the data center because it was really hard. The cloud service providers have done it for us because that’s their business. They have to bill us. But if you look into the guts of cost and utilization reporting, or anything like Azure Advisor, another service that sits on top of their cost, it is super complex. And think about the people that are usually getting the invoices, right? Your IT asset managers or procurement people, they’re not trained in this stuff. So they just get a bill and go, wow, this is a five-page AWS bill. I don’t even understand anything that’s on here, but I know that it’s probably those infrastructure people or it’s probably IT. So what are the strategies we can use to reduce complexity and make the bill more understandable, categorizing things and so on? Because when you have clarity in your bill, you can manage the individual costs better, too, because you start to understand what they are.

[00:11:47.640] – Fred
But if the bill is complex, you can’t even get started.

[00:11:50.000] – Ethan
You’re talking about interpreting the bill. So taking that obtuse thing you get, that’s pages and pages long, from the cloud service provider, and having a framework so that you know how to interpret the thing. So I would compare it to back in the day, when we’d have WAN bills, and there’s a number of companies I worked with where there was some human that was responsible to audit those on a monthly or at least quarterly basis to make sure we weren’t paying for circuits that we’d asked to be disconnected. And, hey, three months later, we disconnected that, but we’re still paying for it. What’s going on? And then you’d go to the WAN provider and say, hey, guys, and get that all sorted out. It feels like a similar role here, except for the obtuseness of it, where there’s not a circuit ID you can match to a site and kind of figure it out from there.

[00:12:31.830] – Fred
That’s right.

[00:12:32.300] – Ethan
You just got pages and pages and pages of obtuseness.

[00:12:35.870] – Jeremy
It’s not always intuitive, either. Right. Like, what’s an Elastic Beanstalk? If you’re a payroll clerk, is this Ethan trying to sneak a drink by and expense it? What are people actually buying? What does this do? Why would I pay money for this? Why is it $0.17 but repeated 40,000 times? Right. There are things in the cloud that I think a lot of folks aren’t necessarily used to handling. This isn’t a universal problem. Sometimes it’s fairly simple. If it’s transactions or gigabytes of storage or something like that, it’s easy enough to interpret. But there’s that complexity of just getting this massive bill that, for an enterprise, might only be for maybe $17,000. But yeah, it is 40,000 lines. Right. There’s going to be some manpower required to actually parse the thing and to process it, and that historically hasn’t necessarily existed.
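To make the "$0.17 repeated 40,000 times" problem concrete, here is a small sketch that collapses a long list of line items into one row per service. The service names and amounts are invented for illustration, and `Decimal` is used on the assumption that exact cents matter more than speed in a billing summary.

```python
from collections import defaultdict
from decimal import Decimal


def summarize(line_items):
    """Collapse (service, cost) line items into per-service line counts and totals."""
    summary = defaultdict(lambda: {"lines": 0, "total": Decimal("0")})
    for service, cost in line_items:
        summary[service]["lines"] += 1
        summary[service]["total"] += cost
    # Largest spend first, so a reader sees what matters before the noise.
    return sorted(summary.items(), key=lambda kv: kv[1]["total"], reverse=True)


# Hypothetical bill: 40,000 tiny request charges plus a few storage charges.
bill = [("requests", Decimal("0.17"))] * 40_000 + [("storage", Decimal("250.00"))] * 4

for service, row in summarize(bill):
    print(f"{service:<10} {row['lines']:>6} lines  ${row['total']:,.2f}")
```

A summary like this does not replace the analysis role discussed next, but it gives whoever receives the invoice something they can read.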

[00:13:30.150] – Ethan
That manpower you talk about, is that an issue of educating the humans to understand what they’re reading, or of giving them a technological solution that can parse that bill and present it to them in a more understandable way?

[00:13:43.290] – Fred
I think the role is really at the root of a lot of these issues. That’s usually where we start in terms of our advice: understanding what all the roles are, because it’s still going to probably be that clerk’s position to get the invoice. But there are things that can happen before that invoice arrives that make that invoice more readable. And so when that invoice gets forwarded to somebody else just for analysis (and that’s a primitive way of doing analysis, by the way), all the metadata is already there and can help sort it out. The biggest challenge that we start to uncover when we get into this is just asking people whose job it is to A, get the bill, and B, who do you have to show it to? Who are the parties that are responsible for consuming that bill and reacting to it? We uncover that at pretty much every organization we’re talking to, there is a role that is missing because it didn’t exist before in the data center. Ethan, you mentioned the WAN person was doing it a lot in the case of the telco, but yeah, traditionally in IT, that role was not present.

[00:15:02.890] – Fred
There wasn’t somebody who was responsible for looking at a bill month by month or even more frequently than that, and making adjustments or splitting it up and showing it back to different business units. So we’ll go through sort of all the plan functions and all the build functions, like architecture: whose job is it to make sure that when we design clustered databases in the cloud, they’re done in a way that is cost optimal? Somebody needs to be accountable for that. And then in operations, it’s more about who’s getting reports, who’s going to get alerts if we exceed the budgets, and who sets what those budgets are. So there’s a lot of questions about who, and that’s where we start to find the gap: maybe it should have been the architecture team, but they didn’t know that. Maybe it should have been the engineering team, but it wasn’t really part of what they did. Or maybe there is a new role. And I’ve seen many different versions of this role. I mean, cloud cost manager, cloud... I think, what was the really cool one? Like cloud financial analyst, cloud economist.

[00:16:16.040] – Fred
I think that’s my favorite, but yeah, really, there’s opportunity also for maybe not a new position you hire for. It’s not like it’s not on your LinkedIn profile or anything, but certainly it’s a responsibility that somebody undertakes.
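The operational questions Fred raises (who gets alerts when a budget is exceeded, and who sets the budget) can be pictured as a budget record with a named owner and alert thresholds. This is a minimal sketch under assumed names: the `Budget` class, the owner label, and the 50/80/100% thresholds are illustrative, not any provider's budgeting API.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    owner: str            # hypothetical role that receives the alerts
    monthly_limit: float  # deciding who sets this is the governance question
    thresholds: tuple = (0.5, 0.8, 1.0)  # fractions of the limit that trigger alerts


def crossed_thresholds(budget, spend_to_date):
    """Return every threshold that month-to-date spend has reached."""
    if budget.monthly_limit <= 0:
        raise ValueError("monthly_limit must be positive")
    used = spend_to_date / budget.monthly_limit
    return [t for t in budget.thresholds if used >= t]


db_budget = Budget(owner="cloud-cost-analyst", monthly_limit=10_000.0)
for t in crossed_thresholds(db_budget, spend_to_date=8_500.0):
    print(f"notify {db_budget.owner}: spend passed {t:.0%} of budget")
```

Making `owner` a required field mirrors the point of the discussion: an alert with nobody accountable for it does not change anything.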

[00:16:32.700] – Ethan
But you’re also talking about that economic viewpoint being introduced to a project early on. That is, there are roles and responsibilities assigned to cost so that you don’t have an architect that’s like, I’m going to overbuild this thing because I’ve always overbuilt things and I’m going to get that money once. We have to have a different mindset when procuring cloud resources. You don’t buy the biggest instances. It’s different from that to get to that optimized cloud cost. But to pull that off, you’ve got to have that happen very early in the project, or it could be you’ve built something that’s hard to back out of architecturally.

[00:17:16.130] – Fred
That’s right. So there is certainly a role to play. We’ve been talking a lot about the governance space. The other space is in the architecture realm. Designing with cost in mind, that’s the newish part of the discipline. But guess what? We’ve been designing with other targets in mind the whole time. When an architect designs a system, they might know how many transactions it’s supposed to handle. They have a transactions-per-second number in their mind, and they want to hit that. They might overprovision or overbuild, but ultimately they want to hit that transactions per second. Well, why not set cost targets? Because we know now that we didn’t get all the money up front. It’s not capitally funded. We didn’t just get a million dollars as the budget we have to build it with. And I’m not just going to architect my solution now to meet that fixed cost. What I have to do instead is understand what the cost targets are for this platform I’m building or procuring or whatever, and how I can design the system such that it’s going to achieve those relative cost targets and not go wildly over.
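To show how a cost target can sit next to a transactions-per-second target, the sketch below projects a monthly bill for a per-transaction-billed service and checks it against a target. Every number here (the price per million transactions, the fixed monthly charge, the target) is a made-up assumption for illustration, not a quoted provider price.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000, assuming a 30-day month


def projected_monthly_cost(avg_tps, price_per_million, fixed_monthly=0.0):
    """Estimate a monthly bill from average transactions per second."""
    transactions = avg_tps * SECONDS_PER_MONTH
    return fixed_monthly + transactions / 1_000_000 * price_per_million


def within_target(avg_tps, price_per_million, monthly_target, fixed_monthly=0.0):
    """True if the projected bill stays at or under the cost target."""
    cost = projected_monthly_cost(avg_tps, price_per_million, fixed_monthly)
    return cost <= monthly_target


# Hypothetical design review: $1.25 per million transactions, $300 fixed, $1,000 target.
for tps in (200, 250):
    cost = projected_monthly_cost(tps, 1.25, fixed_monthly=300.0)
    verdict = "ok" if within_target(tps, 1.25, 1_000.0, 300.0) else "over target"
    print(f"{tps} tps -> ${cost:,.2f} ({verdict})")
```

Using an average rate hides the seasonality discussed earlier, so a real review would likely run the same arithmetic for peak months as well.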

[00:18:28.250] – Ethan
It does feel a bit antithetical to what happened with shadow IT in the cloud to begin with. Devs just wanted things to be easy. I can swipe a card and off I go. Yeah, now we’re saying there’s going to be a budgeting and cost analysis component to this IT project right from the beginning, like we always had. But I guess that’s because we’re all smarter now. We all know that it’s not going to be cheaper to run this in the cloud. I’m going to need to control costs and pay attention to this carefully. Am I right in saying that there is a bit of a shift here, where for some people that were used to just having things easy, it’s no longer as easy as it was? And it’s kind of like the security game, Fred, right? You’ve got to have those security people involved right up front, even though you don’t want to.

[00:19:15.150] – Fred
That’s right. The shift is that there’s a lot less work that’s done up front, and you pay for it month by month or hour by hour or sometimes second by second. And that does take a different mindset to prepare for. So that’s really ultimately the shift that is taking place. It’s like anything where you moved from something you built yourself to something you’re paying for at a utility rate.

[00:19:45.070] – Ethan
[AD] I’m going to interrupt the podcast for a minute here to talk about IT training. You remember the ransomware attack on the gas pipeline last year? It probably caught your attention. It caught mine. There’s a key thing here. Cybersecurity professionals are in demand to prevent that kind of thing, but there are not enough humans out there to fill all the positions. There’s over 500,000 open cybersecurity roles. You can become a cybersecurity professional if you get some training, some online training. It is never too late to start a new career in IT or move up the ladder. ITProTV has you covered for your training. They cover everything, CompTIA to Cisco, the EC-Council to Microsoft. They’ve got all of it, including the cloudy stuff: more than 5,800 hours of on-demand training. And then there’s the way they present the information. Some presenters are like they’re reading from the book and they’re super boring. That is not ITProTV’s format at all. They use engaging hosts who present the information in a talk show format and really keep it interesting, and they do it live. They’re live every day. And then once they’ve recorded that live show, it goes studio to web in 24 hours.

[00:20:56.390] – Ethan
As you’re digging through their website looking for content, all the courses are conveniently listed by category, certification, and job role. You can find what you’re looking for without a lot of trouble. And then when you pick the thing and you’re ready to go, you can stream ITProTV courses, either the live stuff or the on-demand stuff, from anywhere in the world via whatever platform you like: Roku, Apple TV, PC, or there’s apps on iOS or Android. Learn IT, pass your certs, and then get a great job, maybe in cybersecurity, with ITProTV. Visit ITPro dot tv slash day 2 cloud for 30% off all plans. Use promo code CLOUD at checkout. That’s ITPro dot tv slash day 2 cloud. Day two cloud is day, the number 2, cloud, and then use promo code CLOUD at checkout. One more time: ITPro dot tv slash day 2 cloud, and use promo code CLOUD at checkout to save 30% off all plans. And now let’s get back to the podcast. [/AD]

[00:22:00.490] – Ethan
Is there a way when I go through this process to communicate to management that costs are being managed or that maybe you figured out a way to save some money or something like that? How do you communicate that considering the arcaneness of our bills and such?

[00:22:17.350] – Fred
Well, if you’re not already hearing it from them, that’s first of all a good thing. Right. So this is a great way to be proactive. And I would just basically use those same three principles that we mentioned. Right. So you present management with this: there are going to be three challenges of cost in the cloud. One of them is going to be the price. One of them is going to be our ability to understand what we’re being billed for, the complexity. And one of them is going to be the variability. And so a cost management practice, and basically getting good at this discipline, is going to help control those three things. Not eliminate any one of them, but keep them all in some element of control.

[00:23:02.650] – Ethan
Yeah. Keep them all in some element of control. It’s funny. One of the challenges I’ve always had as an engineer trying to communicate to management is knowing what they know and don’t know, what they have context for when I’m communicating an idea. So that can be, don’t get too technical but still make a point, kind of a thing. Something they can hold on to and walk away with, and remember the important thing you’re trying to get across.

[00:23:28.900] – Jeremy
They don’t care about Elastic Beanstalks. No, you don’t want to. And this is something that we end pretty much every engagement that we do with: a targeted communication plan. Right. So what do you, as a developer, need to know? Well, guess what? You’re going to have to optimize some of your code because we’re being billed per transaction. What do you, as the end user, need to know? You’re going to log into a web portal to access this service where you didn’t before. Or maybe we just migrate the back end. You don’t need to know anything, right? It’s just smoother. It works better where you are. What do you, as the CFO, need to know? Well, guess what?

[00:23:56.260] – Jeremy
Now we’re going to either need to change our allocation or you’re going to have to give us the fund that we draw down. We’re not capitalizing our resources anymore. What do you, as the CEO need to know? Oh, we’re digitally transforming. There’s targets for communication that every individual group is going to need. And just by publishing a pamphlet and handing the same one out to everybody, you’re going to get nobody reading anything. So we always suggest that you create your communication with that in mind.

[00:24:23.400] – Fred
Yes, the exercise really is: all right, for each one of those audience members that Jeremy mentioned, what do they need to know? And how do we communicate that to them? Right.

[00:24:33.000] – Fred
So the CFO is a good example. What do they need to know? Well, they’re going to care about our ability to do chargeback and showback to different business units. Okay. They’ve been talking about chargebacks. Okay, that’s interesting. Let’s tell them that there are things that we need to do to establish the ability to do showbacks, i.e., tagging and categorization. And we are prevented from doing chargebacks because the financial operations people at the company actually don’t have the right accounting buckets to allow us to do that anyway. So we’re going to crawl, walk, run. We’re going to start by doing showbacks, and if they really want chargebacks, then we’re going to have to work with finance operations. It’s their problem to solve. The other aspect for management, and what they need to hold on to, is, again, going back to those three dimensions. With cost savings, I think the biggest myth about cloud is that we go to the cloud to save money. That’s something that most managers or leaders need to... We’re not saying, well, that’s a lie, that’s a fallacy, it’ll never happen. What we need them to understand is it is possible to save money in the cloud, but it’s probably not going to be the outcome, and we need to set their expectations for that.

[00:25:56.460] – Fred
The outcome is probably going to look like this. The things that we do in the cloud are cheaper when we do them in the cloud rather than building them ourselves. So in that sense, we are actually saving money. But our bill may be more expensive than what we’re used to paying in the past because guess what? We’re actually using a lot of that innovation in the cloud. So at the end of the day, we’ve saved money because we’re doing things we weren’t doing before, and we’re not paying as much for them as we would have in the past if we built them ourselves.

[00:26:27.860] – Ethan
You’re saying it’s more of a cost benefit analysis, Fred?

[00:26:30.980] – Ethan
Yeah, the dollar spent might be higher, but it’s such a big win. For these reasons, we get to do all this stuff as a business. It would have been harder for us to do before.

[00:26:39.210] – Fred
That’s right. I’ve unlocked new markets, or I’ve increased resiliency. I can do business now in more geographies than I could before. Benefit, benefit, benefit: things we couldn’t do before. That’s where I would shift the cost conversation.

[00:26:54.500] – Ethan
I mean, is that cost conversation still happening a lot?

[00:26:57.330] – Ethan
I guess. Is that myth still prevalent? Oh, I’m going to save money going to the cloud.

[00:27:02.190] – Fred
It is. Yeah, it is.

[00:27:04.670] – Fred
I’m equally surprised, but it has to do with where people are in that journey. So I am surprised when I hear that from somebody who is clearly very mature in the cloud. It’s sort of like, what, you haven’t realized this yet? But when you have an organization that still hasn’t put more than just a few workloads in the cloud, or maybe they’re just consuming SaaS, and now they’re starting to look at taking the platforms that are their bread and butter and putting them into platform as a service or something, they still believe that they’re going to save money in the cloud, and the money that they’re going to save is on, like, I don’t have to use IT as much. But what they don’t realize is they didn’t have to use IT as much when they moved to SaaS because that was sort of non-commodity stuff. But their IT department is very actively involved in those platforms that are still running in their data center, because that is their bread and butter. And when those go to Elastic Beanstalk or whatever, their IT department is still really involved in them.

[00:28:01.920] – Ethan
Yeah, I differentiate it, Fred, by whether it has to do with application delivery. Your business has an application for your internal support or for your customers, something that’s public facing. And you have to build a platform to deliver that application. Whether you’re doing that in your own data centers or colo facilities on your own metal, or you move that to cloud, you haven’t really shifted what has to happen to get that done. You still need technical expertise, deep technical expertise.

[00:28:32.350] – Fred
That’s right.

[00:28:32.740] – Ethan
To do that effectively. So you’re not going to make IT people go away just because you moved it to AWS, and there’s an API involved now.

[00:28:40.530] – Fred
Right. And that’s about helping them understand cloud. It’s unfortunate that we always use the same word, cloud, to describe SaaS, PaaS, and IaaS, because the differences are what you have to understand clearly. When I move Exchange to Exchange Online, that was moving to the cloud, and it means one thing. But when I move Oracle databases to Amazon Redshift or something like that, then that’s something else.

[00:29:08.420] – Ethan
Yeah. If you’re running your own Exchange and you just moved to O365, okay, that is basically outsourcing email. That’s a different animal than picking up where you’re doing your app delivery and moving it to a different platform. As soon as there’s a platform involved, you’ve got to have those technical experts.

[00:29:24.590] – Jeremy
Or even then hosting Exchange in an infrastructure environment, which you could theoretically do if you wanted to, right? It’s a virtual machine. You could run it in AWS. I think we actually ran SharePoint out of AWS for a while. At one point, it’s definitely a possible outcome. I’m not sure that it makes a lot of sense. But back to your original point, right: is this cost conversation still being waged in such simplistic terms? In my experience, I don’t tend to see people say the cloud should be cheaper. I tend to see people say, I’m not moving to the cloud because it’s more expensive. We say, well, if that’s the only thing that you’re considering, then absolutely, you’re going to have difficulty making that case. I think it was Benjamin Graham, who’s a famous investor. He’s like Warren Buffett’s mentor. He said cost is what you pay, value is what you get.

[00:30:11.460] – Jeremy
And so we like to frame the cloud in the context of value as opposed to strict cost outlay, because you’re almost always going to lose on the cost side, not necessarily universally. Right. There are some workloads that are excellent fits for cloud, and they can be optimized, and it could be a big money saver. But in terms of things like agility and resiliency and modernization and the ability to hire people to maintain your platform and elimination of technical debt and all these things, there’s a lot of potential value there. And so we want to make sure that we’re not just glossing over that. And I think a colleague of ours actually wrote a research called From Cost to Value. Having that cloud conversation.

[00:30:50.970] – Ethan
I want to overstate this a bit, Jeremy, but I would put it that cloud means you can take the infrastructure for granted. I think that is exaggerating it a bit. But the point I’m trying to make is as opposed to ordering servers, you have to rack and put operating systems on and put a hypervisor on and all of that and then have the people that got to maintain them and keep them going and fix the hard drives and the power supplies when they fail. You can forget all that. You can take all that for granted now because you’re in cloud. You can’t take any of the rest of it for granted. But it does give you this agility, the ability to stand up that infrastructure very quickly and get that tedious time done. No long buy cycles, no long waiting for stuff to show up in the dock and get it installed. Even if you’ve got all that and it’s networking, you need to stand up some component of the network so that it’s secure and multitenant. And all of that, you got to put in the requests and have people get all that done. Forget all that.

[00:31:51.670] – Ethan
Now you’ve got infrastructure as code. All that means is you can take that for granted and get some big value out of that and bring things to market more quickly. What’s that worth to you? It is quantifiable. You can put a dollar value on it if you dig deeply enough.

[00:32:07.360] – Jeremy
And some folks will. I mean, the way that I see that phrased very often, in the strategies that I have a hand in, is doing more valuable work. Right? Unless you are Amazon, right? Unless you are Azure, you’re not in the business of racking servers, right? You’re in the business of doing other stuff and that’s sort of a means to an end. So if I take that responsibility away from you, theoretically, and I say this theoretically, you should be doing something with that reclaimed time and effort that is specific to your niche or whatever that happens to be. So if I’m not spending time troubleshooting my Exchange server, maybe I’m building out workflows that are going to enable my people to process orders more quickly. Right. If I’m not worrying about the underlying infrastructure, my developers have more time and more of my energy to help optimize for a custom application that we’re building. Right.

[00:32:56.360] – Jeremy
So that’s really what the cloud is designed to do. I think, Fred, I’ve heard you refer to it as like a force multiplier. IT writ large as a force multiplier. That’s what the cloud. I think that’s the promise of the cloud. And if we go back to the origins, I know this is day two cloud. Let’s talk about day zero cloud first, if we go back to AWS, right there’s this myth floating around that Amazon created AWS because they just had all this extra compute that they needed to get rid of. They say Jeff Bezos walked into the office. He was just holding so much compute. It was falling out of his hands like a Grammy in Michael Jackson’s hands at the awards in the 80s. I just got so many of these, I’m dropping them everywhere. So much compute. I need to give it to someone. No, that’s not what happened. What happened was they found that every time they needed to stand up a new project, they were repeating a lot of work. So they’re not selling excess compute. They’re selling that systematization. They’re selling that standardization. They’re selling a process and a tool set.

[00:33:52.180] – Jeremy
Right. And I think that when you start thinking about cloud that way, the value starts to become a little bit more apparent.

[00:33:57.080] – Ethan
Okay. But I could argue this the other way. It’s because we are seeing cloud repatriation happen. So if I argue the point this way, we’re talking about undifferentiated heavy lifting. I don’t want to be in the business of rack and stack and running data centers, because why would I bother? But then you could say, well, wait a minute. If I took cloud operational principles like you were just saying, that’s really how AWS got started and what they were selling. And I bring those in house because Kubernetes, let’s say I decide to build Kubernetes expertise in house and then begin standing up applications there and go to that cloud model consuming infrastructure via API automating infrastructure stand up via pipelines that my Devs can mostly do. And all I got to worry is keeping my Kubernetes cluster up and running as an infrastructure professional. Well, is there a case to be made for bringing it back in the house?

[00:34:45.630] – Jeremy
Congratulations. You’ve invented the private cloud, but there is certainly a case to be made for it, right? They are operating infrastructure in a certain way, and then they’re charging you a margin on top of what it costs them to do it. If you can operate at a scale anywhere close to them or your core business involves infrastructure provisioning in a way that maybe a traditional business core business doesn’t involve it. You could certainly make the case that running a private cloud is valid for you, or even a hybrid cloud solution or something like that. Take an example of a major company like a Dropbox. I don’t have the numbers off the top of my head, but they were sort of the arch repatriator. And there was an article that Andreessen Horowitz, which is the venture capital firm, put out not too long ago talking about repatriating cloud solutions. And they were arguing that public cloud providers by their very existence and the fact that they have among their customer base, many US based startups are just shaving market cap off of these startups, right? Because once you get to a certain scale, of course, it makes sense for you to bring some of that back in house, right?

[00:35:48.440] – Jeremy
There are some things that you are actually adding value by managing internally. There are some things where scale plays in your favor or particular skill or expertise or developing your own hardware makes sense. I mean, think about Netflix as another example. We talk a lot about Dropbox and Netflix as being some of these big cloud customers. Netflix is an AWS customer. They’re loud and proud about their use of AWS for certain things. But when it comes down to the things that are core to their business, they actually will develop like their own hardware. Right? They’ve got ISP partners. They’ve got points of presence. They’ve got a CDN network that they’ve developed in house because to them, that was core to their business in a way that maybe some of the services that they purchased from Amazon aren’t. So I completely take your point.

[00:36:32.580] – Ethan
Well, you make an interesting point about scale. So that Andreessen Horowitz report that you mentioned was panned a bit by some people because I believe they cited Dropbox as the main example of those companies that repatriate a lot and save money. And people were saying, yeah, but that’s not a model for everybody because look how big Dropbox is, how much data they’re moving around and all of that. So yeah, of course they could save money. Could everybody save money repatriating? Maybe not. Probably not. In fact, some folks argued.

[00:37:03.330] – Fred
Yeah, it’s hard to say that perhaps they were trying to say in that report that you have to be of the size of Dropbox and to repatriate and save money. And that’s not necessarily true. But it is an example of what Jeremy was saying that Dropbox got to I think it was like 200,000 users or some point of scale that they decided at this point it’s going to make sense for us to bring it back or bring it back. I’m pretty sure they were born in the cloud, but at this point, it’s going to be better for us to we can now buy storage at a particular volume that it’s going to be cheaper for us to continue to grow predictably on premises.

[00:37:38.550] – Ethan
In this conversation, we’ve talked about managing the costs that we have, and there’s almost been this undercurrent of yeah, once you understand your cost, that’s probably the cost. But is there an element also of we’re wasting money here because certain workloads, certain database instances, whatever are spun up that no one’s really using or they were over provisioned. Can we catch those things, too? Is that a common problem?

[00:38:04.050] – Fred
It is. When we go through the roles and responsibilities and we simply just ask the question whose job it is to periodically audit the infrastructure for those sort of entities for things that are unused, things that could be turned off, to right size workloads. Frequently, I don’t have a case where somebody recognizes that’s not their job. Whether or not they’re actually doing that is a different story. And that goes back to again in the data center. When we had our own data center, there was never a priority on turning stuff off because somebody else is paying the space and power bill. And I mean, I’m guilty. We used to leave servers racked in the data center so that no one else would take the space, and we would only decommission a server when we had a newer one to put in there. I mean, that was the practice that we got in the habit of. So again, this is about understanding that when you’re not in the room and the lights are on, mom and dad are paying the bill so turn the lights off. Right. Whose job is it going to be in operations likely to periodically look at the environment and determine usage.

[00:39:14.420] – Fred
So it’s just a function that needs to be done. What I like most about the whole cost optimization practice is that it just fits so well into traditional IT operations because cost is just a number. Cost is a metric, and IT operations are really good at managing metrics as long as they know that it’s something they have to manage. So if there’s a number that I’ve got to keep my eye on and there are practices I can put in place to ensure that that number remains at a nominal level. Now I just need to know what those things are, and I just need to know how often I need to do them and put those processes and procedures in place and actually run them and report back on them.

[00:40:00.480] – Ethan
Well, Fred, you and I are network guys from way back. And of course, we had our network management stations and our red light, green light and measuring utilization of bandwidth links. And you make it sound like measuring something like that.

[00:40:12.890] – Fred
It is exactly measuring something. What’s the difference between measuring cost within a certain bounds and measuring CPU within a certain bounds, network bandwidth within a certain bounds, or disk utilization? We have to just go and figure out how we can get that data, right? And that’s it. AWS or Azure make that data available. We have to go and get it. Sometimes it comes in in a weird format. Sometimes it comes in. Oh, you got to turn on this Redshift database so you can get the cost and utilization reporting out of Amazon. Solvable problems. But we have to get the data out and we have to get the data in front of the right people again. Roles and responsibilities. Whose job is it and then make sure. Okay, fine. It’s Fred’s job to monitor cost reporting on a month to month or week to week basis. That’s fine. Now we know it’s Fred’s job. How does Fred get the data? Let’s go and make sure that he’s got the data. How do we want him to present the data? What does he do with the data? Does he just react on it, or does he send a report to somebody?
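As a rough illustration of Fred's point that cost can be watched like CPU or bandwidth, here is a minimal sketch that applies the same red/yellow/green thresholds to both. The class name `MetricBound` and every threshold value are hypothetical, chosen only for illustration; nothing here calls a real cloud billing API.

```python
from dataclasses import dataclass


@dataclass
class MetricBound:
    """Warning and critical thresholds for any numeric metric."""
    name: str
    warn: float
    crit: float

    def status(self, value: float) -> str:
        # Same red/yellow/green logic a network monitor applies to link utilization.
        if value >= self.crit:
            return "red"
        if value >= self.warn:
            return "yellow"
        return "green"


# Hypothetical bounds: CPU in percent, monthly spend in dollars.
cpu = MetricBound("cpu_percent", warn=70.0, crit=90.0)
spend = MetricBound("monthly_spend_usd", warn=8000.0, crit=10000.0)

print(cpu.name, cpu.status(75.0))
print(spend.name, spend.status(10500.0))
```

The design choice is that nothing in `status` knows whether the number is a dollar figure or a utilization percentage, which is the sense in which cost is just another metric for operations to own.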

[00:41:12.220] – Ethan
Yeah, that was my next question. What do I do with the data? Because if you’re making it IT Ops’ responsibility or you’re suggesting it’s appropriate for IT Ops to be monitoring cost.

[00:41:21.730] – Fred
Something that happens over and over again on a repeatable schedule is an operational task. Yes.

[00:41:29.110] – Ethan
Well, I mean, normally, if something was broken from an engineering perspective, a line goes down, a VM falls over. I fix it. If it’s a cost out of bounds, I’ve exceeded threshold on some cost. I mean, I can’t shut it down just because it went high. What is an action item? I take. Alert, I don’t know someone in accounting, a manager or something. You’re saying?

[00:41:53.650] – Fred
Maybe it’s an architecture problem there. Maybe we’re finding that it’s frequently over cost and that information needs to feedback to design. That’s a logical place to take it. Certainly it needs to go to management. They would rather hear from their IT Ops people who are looking at the bill in a more near real time manner than getting an invoice.

[00:42:17.050] – Jeremy
Fred, do you remember a little while back, you and I work with a public sector organization that tackled this exact problem. And what we did was we actually built out what we call the right sizing workflow. Right. So it was basically alert generated. All right. What happens now? What do we do right? They’re a specific organization that they had some budgeting issues and the amount of power that their IT folks had might have been a little bit different than what you would get in a private sector company or even a different type of organization. So keep that in mind. But basically what they said was all right. It’s going to escalate to probably the manager, or maybe even the CIO. The CIO will look at it and they’ll say, how extreme an overrun is this? Can I allocate money from somewhere else and just cover this? Is it an ongoing problem if, yes, I will solve the problem that way. And then we’ll sort of review how we ended up here. If no, I’m actually going to have to go and talk to our finance people and see if I can allocate additional funds and worst case, I’m going to actually have to go testify in front of a subcommittee at the State House to allocate these additional funds.

[00:43:21.010] – Jeremy
And that’s usually an everlasting job stopper if you do that too much. So they had a workflow they designed specifically for this, right. And they called it their right-sizing workflow. And it basically incorporated all of the layers of approval that they would need to do additional spend. They also at the architectural level, had incorporated, like buffers. Right. So it wasn’t like if they were a cent over, there was no money. They were projecting a cost, and they were hoping that it would land sort of within a range. And this is again, a lot of this was sort of prospective. They didn’t have a huge cloud footprint at the time, but they had addressed this issue. And this is how they had chosen to deal with it. One thing, or some best practice, excuse me, that we suggest in the cloud and the infrastructure side is set up alerts. The provider will give you a trend line saying, hey, you’re looking dangerously close to exceeding a budget. Unless you’re using credits, they are unlikely to actually cut off your service when you hit that budget. But they’ll send you an alert saying you’re getting close, and then you can do stuff also, like tag individual workloads for rightsizing.

[00:44:23.150] – Jeremy
So you can say we instantiated this. I’m going to give it a tag, and then I’m going to run a report and I’ve got a list of workloads that I need to review because they’ve been up for three months or however long. So there are techniques that you can use to manage this.
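The two techniques Jeremy mentions, a budget trend line and a tag-driven review list, can be sketched in a few lines. This is only an illustration under stated assumptions: the straight-line projection is a stand-in for whatever forecast a provider actually computes, and the `created` tag, the workload names, and the 90-day window are all hypothetical.

```python
from datetime import date, timedelta


def projected_month_end(spend_to_date: float, day_of_month: int, days_in_month: int) -> float:
    """Straight-line projection of month-end spend from month-to-date spend."""
    return spend_to_date / day_of_month * days_in_month


def due_for_review(workloads: list, today: date, max_age_days: int = 90) -> list:
    """Names of workloads whose hypothetical 'created' tag is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        w["name"]
        for w in workloads
        if date.fromisoformat(w["tags"]["created"]) < cutoff
    ]


# Hypothetical inventory; in practice this would come from the provider's tagging data.
workloads = [
    {"name": "report-db", "tags": {"created": "2021-09-01"}},
    {"name": "web-frontend", "tags": {"created": "2021-12-20"}},
]

budget = 10000.0
forecast = projected_month_end(6000.0, day_of_month=15, days_in_month=30)
if forecast > budget:
    print(f"Projected spend {forecast:.2f} exceeds budget {budget:.2f}")

print(due_for_review(workloads, today=date(2022, 1, 15)))
```

The review list is only as good as the tags, which is why the tag has to be applied when the workload is instantiated.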

[00:44:36.250] – Ethan
If I’m listening to this show and I’m an organization of whatever size, maybe I’m trying to decide if I’m too small of an org to go through all this headache with this cost optimization stuff. Is there a too small of an organization, or is this one of those things? If I’m smart, I’m baking this in from the beginning.

[00:44:52.930] – Jeremy
I would say if you’re smart, you’re baking it in from the beginning. But I think that the amount of effort that you put in is going to vary.

[00:44:58.330] – Fred
Yeah, I think there are basically sort of crawl, walk, run approaches to cost management, and there are sort of simple approaches you can take in the realms of governance and billing and alerting and showback and chargeback and budgeting in those kind of realms. That is easy and less effortful. But the moment any one of those becomes a problem area, you can up your maturity game a little bit, right. So maybe today I don’t have to do any showback. So I’m just not even going to put any effort into it until all of a sudden I’ve got business units saying, hey, we pay IT too much money and I thought, okay, well, you know what I’m going to do? I’m going to start to implement some tagging categorization, and I’m going to see if I can maybe even change my billing account profile with Microsoft so that I get multiple invoices one per business unit. I’ll do some work to ensure that I can do a little bit more showback, and now I’m upping my maturity in the showback space or in the reporting space, so I can tackle that problem. But I didn’t have to do that until it became an issue because we’re not saying that there’s one way to do cost management, and it looks like this and everybody should do it that way.

[00:46:16.630] – Fred
I think it’s good to just sort of like, keep an eye on where the problem areas are and there are levels of maturity that can be applied to each of those problem areas.
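A first step toward the showback Fred describes can be as small as summing billing line items by a business-unit tag. The sketch below assumes a hypothetical `business_unit` tag and a simplified line-item shape; real billing exports have many more fields and differ by provider.

```python
from collections import defaultdict


def showback(line_items: list, tag_key: str = "business_unit") -> dict:
    """Total cost per business unit; items without the tag land in 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)


# Hypothetical billing line items.
line_items = [
    {"resource": "vm-1", "cost": 100.0, "tags": {"business_unit": "sales"}},
    {"resource": "db-1", "cost": 50.0, "tags": {"business_unit": "sales"}},
    {"resource": "vm-2", "cost": 25.0, "tags": {}},
]

for unit, total in sorted(showback(line_items).items()):
    print(unit, total)
```

The size of the `untagged` bucket is itself a useful maturity signal: the larger it is, the less of the bill can be attributed to anyone.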

[00:46:25.760] – Jeremy
It’s pretty simple. I mean, it could be as basic as just turning on the Azure Advisor, and they’ll send you an alert saying, hey, it looks like this workload has been spinning, and you’re not using it or whatever. That’s easy enough. And then they can get as complicated as I’ve got this multi cloud deployment, and I’m paying Zesty or one of these cloud management platform vendors $100,000 a year to help me manage this wild deployment that I’ve got, and I’ve got a dedicated staff member, and then it could be everything in between. Right. And I think that’s really a decision that’s going to need to be taken based on the size of the platform, the relative maturity, the complexity of the environment, the prospect of potential savings that you may actually get. You don’t want to spend $100,000 to save $50.

[00:47:07.610] – Ethan
The point you’re making here, though, is if I haven’t set up the appropriate metrics I should be measuring. It’s not impossible to retrofit my existing environment.

[00:47:17.900] – Jeremy
No, it’s a little more difficult, right? Like you might have to tag things at the point of reinstantiation or have a rule going forward where Tags are mandatory and then work through your backlog or whatever. But yeah, you can do it.
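The retrofit Jeremy outlines, mandatory tags going forward plus a backlog to work through, could look something like the sketch below. The required tag names and the resource records are hypothetical; a real policy would normally be enforced with the provider's own policy tooling rather than a script.

```python
# Hypothetical tagging policy: every resource must carry these tag keys.
REQUIRED_TAGS = {"owner", "business_unit", "created"}


def missing_tags(resource: dict) -> list:
    """Required tag keys that the resource does not have, in sorted order."""
    return sorted(REQUIRED_TAGS - set(resource.get("tags", {})))


def tagging_backlog(resources: list) -> dict:
    """Map each non-compliant resource name to the tags it still needs."""
    backlog = {}
    for resource in resources:
        missing = missing_tags(resource)
        if missing:
            backlog[resource["name"]] = missing
    return backlog


# Hypothetical existing environment, tagged inconsistently.
resources = [
    {"name": "legacy-vm", "tags": {}},
    {"name": "api", "tags": {"owner": "fred", "business_unit": "it", "created": "2021-11-01"}},
    {"name": "etl", "tags": {"owner": "jeremy"}},
]

for name, missing in tagging_backlog(resources).items():
    print(name, "needs", ", ".join(missing))
```

New resources would be rejected or flagged by `missing_tags` at creation time, while `tagging_backlog` produces the list to clean up over time.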

[00:47:29.470] – Ethan
Yeah. Well, guys, it’s been a great conversation. I’ve enjoyed this very much for people that want to dig into this topic more and find out more stuff about how to manage their costs in the cloud. I know you guys do some research and can help there, and I think there’s some other resources you can recommend too Fred.

[00:47:46.870] – Fred
I really like the O’Reilly book Cloud FinOps by JR Storment and Michael Fuller. It’s very practical, and it informed a lot of what we did. It helped us figure out sort of those different areas, and it was published in December 2019, so it doesn’t feel like it’s too old yet. And yeah, that would be my sort of follow up reading for this topic.

[00:48:10.840] – Ethan
And Fred, are you on the Internet anywhere? You do anything social these days?

[00:48:15.610] – Fred
I’m on Twitter. I’m at FredChagnon. I’m sure that will be written down somewhere, so I don’t have to spell it. I’m on LinkedIn and yeah, that’s pretty much it. I’m findable.

[00:48:25.640] – Ethan
Same thing for you, Jeremy. Any resources you’d like to recommend or social places that people can find you.

[00:48:32.150] – Jeremy
So I strongly suggest doing the basic fundamentals courses for the cloud provider of your choice. So even if you don’t want to go all in and get all of your Azure certified DevOps and everything like that, do the fundamentals, and they’ll give you their basic cost saving techniques, and that can be 15 minutes in a module. It’s very well spent, right? I mean, Microsoft actually just has a list of things that you can do to save money, put it in a cheaper region, refactor it to fit your specific needs and stop lifting and shifting. They’ll give you a long list, and that’s not necessarily the Bible, but it’s a great place to start because they get these questions a lot. And this is what they suggest. So I very much recommend some of the vendor material on this because they’re going to have some specific advice. As far as keeping in touch with me, I’m unfortunately not on Twitter. I was at one point. I just couldn’t keep up with it, but you can find me on LinkedIn as Jeremy Roberts. I work at Infotech Research Group with Fred. So that’s how you can sort me through the doubtless dozens of other Jeremy Roberts and feel free to give me a call or send me a message. If you’re trying to sell me something, be creative. I do respond to those. My favorite is the guy who sent me a sales pitch. I said yes, and then he sent me the same sales pitch without responding to my original one two weeks later. So don’t be that guy, but everybody else hit me up.

[00:49:46.450] – Ethan
Now, Jeremy, when you’re not doing stuff at Infotech Research, you’re involved in polisci in some way, aren’t you?

[00:49:53.310] – Jeremy
Yes. I’m a trained political scientist. If you search me up, you can find some articles or one article anyway, that I’ve written on Arizona politics, and you can find an interview I did about that on the NPR affiliate in Arizona. He had a golden voice. Not quite as golden as yours, Ethan. And it was a great experience. So I’ll just say I like talking about myself and the work that I’ve done. So check that out too.

[00:50:15.830] – Ethan
Great stuff. Fred Chagnon and Jeremy Roberts, thank you, thanks to both of you for being on the show today. Virtual high fives to you out there for tuning in. If you have suggestions for future shows, we would love to hear them. You can hit up me or Ned Bellavance, who was absent today. But we both monitor on Twitter at day two cloud show. If you’re not a Twitter person, fill out the form on Ned’s fancy website Nedinthecloud dot com and let us know the topics you would like us to cover. Did you know that you don’t have to scream into the technology void alone? The Packet Pushers Podcast Network has a free Slack group that is open to everyone. Vendors included. Visit PacketPushers dot net slash slack and join. Read the rules. One of those rules is that it’s a marketing free zone for engineers to chat, compare notes, tell war stories, solve problems together, et cetera. PacketPushers dot net slash slack. We hope to see you in there. There’s about 1900 engineers as we’re recording this that are in that chat room. Until then, just remember, Cloud is what happens while IT is making other plans.
