
Day Two Cloud 112: Complex Multi-Cloud Networking


Today’s topic on Day Two Cloud is complex multi-cloud networking. In this episode, we focus on the challenges of stitching together a fabric across more than one public cloud. How do you architect a fabric given the constraints of each cloud? We also drill into the idea of API gateways.

Our guest is Chris Oliver, a network architect at NI who deals with multi-cloud networks as part of his day-to-day responsibilities.

We discuss:

  • What makes cloud networking complex
  • Whether on-prem networking knowledge translates easily to the cloud
  • Virtual network functions vs. cloud-native options
  • Stitching together multiple public clouds
  • The role of API gateways
  • More

Sponsor: Zesty

Zesty provides an autonomous cloud experience by leveraging advanced AI technology to manage the cloud for you. Their AI reacts in real time to capacity changes and enables companies to maximize cloud efficiency and reduce their AWS bill by more than 50%, completely hands-free. It’s cloud on auto-pilot. Find out how to spend less and do more at Zesty.co.

Show Links:

Chris Oliver On LinkedIn

Heavy Networking 589: Cloud Networking’s Good, Bad, And Ugly: What CSPs Don’t Tell You (Sponsored) – Packet Pushers



[00:00:02.050] – Ned
Zesty provides an autonomous cloud experience by leveraging advanced AI technology to manage the cloud for you. Their AI reacts in real time to capacity changes and enables companies to maximize cloud efficiency and reduce their AWS bill by more than 50%, completely hands-free. Cloud on autopilot. With Zesty, companies can spend less and do more. Check them out at Zesty dot Co.

[00:00:33.440] – Ethan
Welcome to Day Two Cloud. Our topic today: complex multi-cloud networking. This is a topic we’ve hit before on Day Two Cloud. Yes, cloud networking is hard. It’s so hard.

[00:00:46.230] – Ethan
But genuinely, what we get into in this show is: when you’re in several different cloud environments, and you’re constrained by what’s happening with the cloud native networking, but need to stitch all those environments together because of your application fabric, what does that look like? And how do you architect it? So we start off with the problems and then talk about an API gateway as a connectivity problem that needs to be solved. Our guest is Chris Oliver. Ned, what stood out to you in this conversation we had with Chris?

[00:01:18.020] – Ned
These are the kind of episodes that I really love, because we’re not just talking at a pure theoretical level. We’re not talking to someone from a vendor. We’re talking to someone who’s living this day to day and getting into some of the details of the implementation, actually seeing where things are broken and don’t work and then what they had to do to work around it. So that was what stood out to me is just the honesty and the clarity of the conversation.

[00:01:40.960] – Ethan
Honesty and clarity, and to some degree, the brutality of what Chris has had to deal with as a network architect designing a very complex solution for an API gateway he needed to deliver for his company, NI. Please enjoy this conversation with Chris Oliver.

[00:01:57.240] – Ethan
Chris Oliver, welcome to the Day Two Cloud podcast. And hey, I know you’ve been on the Packet Pushers podcast network before, but if you would, introduce yourself in just a sentence or two so people know who you are.

[00:02:08.360] – Chris
I’m Chris Oliver. I work for NI. I’m a network architect. Glad to be back on the show.

[00:02:16.210] – Ethan
All right, man, thank you very much. And the topic of this episode, why I wanted to get into this with you: you live in this world of complex multi-cloud networking. From some of the stories I’ve heard from you, you’re in several different clouds, you have on-prem stuff, you’ve got it all connected, and there are different projects that need to communicate with each other in different ways, depending on what environment they’re in. And it’s like, okay, Chris is living that. I was going to say misery. Maybe that’s too strong a word, Chris, but this is the pain of networking complexity in multi-cloud that we talk about.

[00:02:53.640] – Ethan
So first of all, set it up for us. Why, in your experience, Chris, is multi-cloud networking so complex? Because I could argue, I mean, it’s just networking, isn’t it?

[00:03:05.180] – Chris
It is just networking. I mean, all the protocols are the same. Nothing’s lost on that portion of it. But translating how each cloud goes about doing things makes it very difficult. And I would say, from the IT department, we probably wouldn’t have liked to have been in all the clouds we are. But business drags you around, so you get to figure it out. Projects start up before you know it. They found a feature they want at one cloud or another, and they’re off down that track. You get to figure out how to connect it and continue on.

[00:03:43.340] – Ned
Right. You said translation. In that way, you mean that the actual protocols down at the lower levels are the same, but the way they name things, or maybe the way they go about implementing them, changes depending on the cloud.

[00:03:57.920] – Chris
Absolutely. Everyone has their unique naming. They create totally different interfaces, and maybe the pathway you take implementing different pieces together differs, so you’ve got to figure out what’s different about each cloud and work with that. I mean, each cloud is very particular, right? They’ve defined their environment exactly the way they want it. And if you go against the grain, you really put yourself in a lot of pain.

[00:04:27.620] – Ethan
I mean, is this mostly a terminology problem? If I’m an on-prem network engineer and I know switching and routing pretty well, can’t I just go to whichever cloud it is, and once I kind of figure out their nomenclature, make a few assumptions and go? Or is it harder than that?

[00:04:47.240] – Chris
Once you kind of get into it, I guess you’re not completely having to start over each time, but you definitely have to figure out the new interface. Figure out the new API calls, figure out how to set up your consoles, do your scripting or anything else. I mean, there are a whole lot of layers in there that, just to get started, create a big hurdle before you can even begin the process. And then, like I said, they’re so prescriptive about how they want things done, and each one of them may have a slightly different idea.

[00:05:18.130] – Chris
So what you did one time won’t necessarily translate directly to additional clouds that you enter.

[00:05:26.990] – Ethan
So it’s not just a naming thing. In other words, I’m probably going to make some bad assumptions if I’m an on-prem engineer, is what I’m hearing you say. Does that sound right?

[00:05:41.000] – Chris
Yeah. I mean, let’s take Azure. If you have ExpressRoute connectivity, there’s a virtual router in there that drives the ExpressRoute connection. The last-mile connection in the cloud is all IPsec; you get a bunch of IPsec sessions between different VNets, and traffic can flow through there. They don’t really ever call it out, but when you pick your ExpressRoute size, you pick a gig or 500 meg or something like that.

[00:06:15.580] – Chris
That virtual instance sitting there was sized automatically on the back side to do that work. But that gateway can also do East-West, and you didn’t actually pick or plan performance-wise for it to work that way. Then you wind up down the road finding other performance issues, and it’s because you’ve got a 500-meg or one-gig instance running there that’s capable of 500 meg or a gig of traffic, you’ve got five or six hundred meg of traffic coming from on-prem, and then a bunch of East-West stuff going on.

[00:06:51.020] – Chris
Those things aren’t spelled out, but that’s the way the sizing works, and there isn’t really any way of controlling it. It’s just something you step into. In AWS, for the same scenario, you don’t get into that same problem.
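
A back-of-the-envelope sketch of the oversubscription Chris describes, in Python. The numbers, and the assumption that the gateway instance is provisioned to roughly the circuit speed, are illustrative, taken from his example rather than any published Azure spec:

```python
# Sketch: the ExpressRoute gateway instance is sized to the circuit
# speed you picked, but East-West (VNet-to-VNet) traffic shares the
# same instance -- capacity you never explicitly planned for.

def gateway_headroom_mbps(circuit_mbps, north_south_mbps, east_west_mbps):
    """Remaining capacity on the gateway instance, in Mb/s.

    Assumes the back-end instance is provisioned to roughly the
    circuit speed (as in Chris's example). Negative => oversubscribed.
    """
    return circuit_mbps - (north_south_mbps + east_west_mbps)

# A 1 Gb/s circuit carrying 600 Mb/s of on-prem traffic plus
# 500 Mb/s East-West is oversubscribed even though the circuit
# itself never fills:
print(gateway_headroom_mbps(1000, 600, 500))  # -100
```

Sizing the instance to a gig and a half up front, as Chris suggests later, leaves headroom: `gateway_headroom_mbps(1500, 600, 500)` returns `400`.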

[00:07:06.920] – Ethan
I’m curious, can we drill into this sizing thing? Okay, so let’s say I’m limited to 500 megabits per second. How are they doing that? Is it a shaper or a policer, or how nasty is it?

[00:07:17.160] – Chris
Yeah, it’s more that it’s just the instance size of the machine it’s running on: how many virtual CPUs and so on are associated with it. So it’s not something you can just go adjust, even though you’re paying. Maybe you only need a gig connection for your on-prem connectivity, but you need to have actually sized it as a gig and a half or something, since you have some East-West to take care of. Fortunately, you can size it up. There’s a penalty; you just pay more for it.

[00:07:48.800] – Chris
No big deal. But when you want to make it smaller, you can’t just shrink it. You have to destroy the whole thing and start over. Yeah, it’s a whole rebuild.

[00:07:57.490] – Ethan
There you go.

[00:07:58.010] – Ned
That’s the penalty. It’s like, oh, you want to pay us more? We don’t mind. Oh, you want to pay less? You silly person.

[00:08:06.970] – Ethan
Yes. I have that problem with my VPSes as well. I can size them up; they’re happy to do that for me. But as soon as I want to size them back down? No, sir. Rebuild.

[00:08:16.320] – Chris
Destroy, reset, start back over, and figure out why I was having performance problems in the first place. Those kinds of things. Every time you step into a new cloud, you’ve got to figure out little oddities like that.

[00:08:29.790] – Ned
Okay. And you said that was specific to ExpressRoute and Microsoft; you didn’t observe the same issue using Direct Connect with AWS. Is that just a difference in the way they implement the connection on their side?

[00:08:41.820] – Chris
Like I said, they’re very opinionated about how things work. They don’t have transitive routing in AWS, so you have VPC to Direct Connect, and those are individual paths; you don’t run into the same problem. If you’ve got a one-gig Direct Connect, it sizes itself to support one gig, and you’re not going to exceed that because you’re just going North-South. That changes things. So to do transitive routing, you wind up spinning up a CSR or an Aviatrix box, or you go down their transit gateway path, and those are totally different.

[00:09:26.390] – Chris
You don’t even size a transit gateway per se. It’s a different problem, right? It’s going to do East-West unless you’ve configured a bunch of policies to split the route tables and isolate everything.
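
The isolation Chris alludes to can be pictured with a toy model of transit gateway route tables (plain Python, not the AWS API): an attachment can only reach destinations whose routes propagate into the route table it is associated with.

```python
# Toy model of transit gateway route-table isolation. Associations
# map an attachment to the route table it uses for lookups;
# propagations map a route table to the set of attachments whose
# routes that table has learned.

def can_reach(associations, propagations, src, dst):
    table = associations[src]
    return dst in propagations.get(table, set())

# Spokes associate with an "isolated" table that only learns the
# Direct Connect attachment's routes, so North-South works but
# spoke-to-spoke East-West is blocked:
associations = {"vpc-a": "isolated", "vpc-b": "isolated", "dx": "shared"}
propagations = {"isolated": {"dx"}, "shared": {"vpc-a", "vpc-b"}}
print(can_reach(associations, propagations, "vpc-a", "dx"))     # True
print(can_reach(associations, propagations, "vpc-a", "vpc-b"))  # False
```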

[00:09:40.880] – Ethan
Per environment, you end up having to do a whiteboard diagram that shows every module that traffic is going to be transiting through, whatever the cloud object or virtual network function is, and then understand what the limits are along the way. And as you’re pointing out here, it’s not all the same per environment. It’s starting to get clear to me where the multi-cloud networking complexity comes in. Can I just say, Chris, that this sucks, man? It doesn’t sound like any fun at all.

[00:10:12.580] – Chris
It was definitely an interesting first couple of years when we first got into it, just getting your head into cloud networking, and then all of a sudden it’s: oh no, we’re going to move that from Azure to AWS, because one of the vendors in our overall stack wants to stop supporting Azure. So then you bail off and figure it out in another cloud and get it back online, because a full production environment has to come live very quickly.

[00:10:44.140] – Chris
There wasn’t the long dev time that was spent developing the entire product the first time around. Bail off, get it done, be finished in a week or two, and be ready for production again.

[00:10:55.930] – Ned
I think you made an important point there: you’re running a third-party piece of software and you have to stay under their support contract. And if they say, hey, we used to support Azure and AWS, but you know what, we’re going all AWS now, they’re de facto forcing you into a multi-cloud scenario that maybe you didn’t even want to be in.

[00:11:17.080] – Ned
Yeah. Someone should put that on the risk register when you’re incorporating COTS components: if we want to stay on their support, we have to follow where their support goes.

[00:11:27.820] – Chris
Exactly. Yeah, definitely. When you bring a third party into it, there’s another layer. So the business wants to drag you between different clouds, and then the features of the third party they may have selected may drive you into another cloud. I don’t think many would say they planned to be in multiple clouds. I think it starts out as a single cloud, and then something happens and you wind up in an additional cloud, with all the extra headache. And I’m speaking from a network perspective only; there’s a ton of other stuff when you think of administering everything else in each cloud, beyond just the network scope.

[00:12:03.740] – Ethan
Well, okay. When I do network design, Chris, one of the things I like to do is think about redundancy and resiliency: redundant switches and routers for mission-critical things, a first-hop redundancy protocol, which is a pretty common thing, dual paths where I can. Obviously those kinds of things are budget constrained, and so on. But the point is, I think a lot about having that resiliency so that if one thing breaks, the network keeps going. Can I bring that thinking to cloud? How does that work?

[00:12:34.750] – Chris
It’s more than bringing it. The patterns they use most of the time, almost entirely, force you into having redundant equipment in there. Say you have a VPC and you’re creating an IPsec session back to on-prem. It’s not like you’re creating a primary and a backup. It forces you into it; it actually creates four tunnels to do it. They create two VGWs, or two of whatever you want to consider terminates the IPsec sessions. There are two of them running there, and then you wind up with four tunnels on your end that you have to distribute across your redundant equipment.

[00:13:14.710] – Chris
That’s forced. With Direct Connect, you wind up with redundant stuff inside there too. Well, maybe with Direct Connect it’s not as forced, but it is highly recommended that you have two paths, and you will continue to get an alert from AWS every day that you don’t have two paths. You don’t have to do it, but you’re going to be annoyed by not doing it. You still think along those redundancy lines, but sometimes you have to flip things around.
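
The four-tunnel handout can be sketched like this (tunnel and router names are hypothetical; the point is that each on-prem device should terminate a share of the tunnels, not all of them):

```python
def distribute_tunnels(tunnels, routers):
    """Round-robin cloud-provided tunnels across redundant on-prem
    routers so no single device terminates all of them."""
    plan = {r: [] for r in routers}
    for i, tunnel in enumerate(tunnels):
        plan[routers[i % len(routers)]].append(tunnel)
    return plan

# Two cloud-side endpoints (VGWs) x two tunnels each = four tunnels:
plan = distribute_tunnels(
    ["vgw1-tun1", "vgw1-tun2", "vgw2-tun1", "vgw2-tun2"],
    ["rtr-a", "rtr-b"])
print(plan)
# {'rtr-a': ['vgw1-tun1', 'vgw2-tun1'], 'rtr-b': ['vgw1-tun2', 'vgw2-tun2']}
```

Note that round-robin by index happens to give each router one tunnel from each VGW, so losing either a router or a VGW still leaves a working path.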

[00:13:47.210] – Chris
Think about their availability zones. If you’re not using native cloud networking components, you do end up having to create two EC2 instances in different availability zones, and then figure out what happens with the subnet routing tables, making sure that traffic is preferred locally and doesn’t jump to another AZ. So you get to pay for it just because you’re trying to get to a gateway. There are lots of little things you’ve got to be careful with that you find out later in billing, things you didn’t think about.

[00:14:23.960] – Ned
The placement of the NAT gateway can be really decisive in how much you pay in cross-AZ traffic costs. Whoops, everything’s transiting over to a different subnet to get out. We’ve seen that one.
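
A quick sketch of why that placement bites. The per-GB rate here is a placeholder for illustration, not a quoted AWS price:

```python
def cross_az_cost(flows_gb, nat_az, rate_per_gb=0.01):
    """Charge for traffic that must hop AZs to reach the NAT gateway.

    flows_gb maps a source AZ to the GB it sends through the gateway;
    rate_per_gb is illustrative only. Traffic already in the NAT
    gateway's AZ pays nothing cross-AZ.
    """
    return sum(gb * rate_per_gb for az, gb in flows_gb.items() if az != nat_az)

# One NAT gateway in us-east-1a; workloads in 1b and 1c pay the toll:
flows = {"us-east-1a": 500, "us-east-1b": 800, "us-east-1c": 700}
print(cross_az_cost(flows, nat_az="us-east-1a"))  # 15.0
```

A NAT gateway per AZ trades that transfer charge for an extra gateway-hour charge, which is exactly the kind of billing trade-off Chris says you only discover later.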

[00:14:38.910] – Chris
Definitely. Things like that you just have to be careful with, and you won’t often see them fully documented, apart from people’s articles and such. Like you said, it’s a rabbit hole you can go down trying to figure out how to implement something.

[00:14:51.880] – Ned
Right. You did mention briefly using third-party appliances instead of cloud native functions. Is there a trade-off to using something like a virtual network function appliance as opposed to the native functions in a given cloud? And are there benefits to doing that over using the native solution?

[00:15:17.260] – Chris
When we first moved to the cloud, we took off down the path of learning all the native pieces, figuring out all the tools we need to use, and building everything that way.

[00:15:29.980] – Ethan
Because that’s the right way, right? Cloud native.

[00:15:32.080] – Chris
Even more subtle than that. If we’re bailing into the cloud and you’re paying by the minute for everything, I just want to make sure we’re doing things as efficiently as possible, and you would think their mechanisms are going to lead you down that path. I took that stance when we first dug into it, and after a year or so I was like, yeah, well, there are lots of little gaps in the cloud native tools, so what do the third parties bring to the table? And that’s where we quickly started looking at the Cisco offerings and the firewall offerings.

[00:16:09.880] – Chris
You can use those as some kind of edge routing function in a VPC, for traffic between VPCs or traffic back to on-prem. Use the old Cisco CSR or something. And they have pretty nice semi-automated architectures that allow you to attach VPCs to a hub, where that hub VPC contains a couple of CSR routers. You can do the same kind of thing with firewalls. With the traditional vendors there, you’re still very much artisanally configuring each piece of that. Then you can move to other third-party tools that were built for the cloud.

[00:16:48.150] – Chris
And they definitely follow the more SD-WAN-type, controller-based model: a single place to make your configs, with tons of stuff already automated into them.

[00:16:59.600] – Ethan
Artisanally, as in one at a time: you’re hitting devices and configuring them, even if there’s some automation there, versus a controller policy that pushes down into everything under its care and feeding.

[00:17:11.160] – Chris
Absolutely. Yeah. They kind of took your traditional CLI and wrapped a few little things around it to help a bit, but the vast majority of your configs would still be traditional artisanal configs.

[00:17:24.080] – Ethan
Again, the driver for going to something that’s not cloud native: you said it kindly in one passing statement, we found some gaps. Meaning what? There’s just some functionality that’s not there that you found you really needed?

[00:17:42.510] – Chris
The cloud native tools, well, let’s say the cloud native tools from AWS: they use BGP in your on-prem-to-cloud exchange, but there’s no BGP running between anything else. It’s all static routes. So if you take a transit gateway and set it up in one region, and set up a second transit gateway somewhere else for another set of hubs, the interaction between those two transit gateways is based on static routing, and it’s highly limited. I think you get 200 routes or something like that.

[00:18:20.380] – Chris
Of course, you can go ask and they’ll give you some upgrades and let you do some other things. But there are still a lot of limitations in there. So bringing the third parties in brings in your traditional control plane, BGP and so on, so you can manage inter-regional or inter-hub communications, or on-prem to hub. All those pieces become very much normalized at that point. As far as the control plane goes, you don’t really have to figure out anything special. You just configure BGP like you normally would.
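
That route cap is easy to sanity-check before a design trips over it. A small sketch, treating Chris’s recollection of roughly 200 static routes as an assumed default rather than a documented figure:

```python
ASSUMED_ROUTE_QUOTA = 200  # Chris's recollection; confirm against your account's actual quotas

def routes_over_quota(routes, quota=ASSUMED_ROUTE_QUOTA):
    """Return the static routes that won't fit in the table, in order."""
    return routes[quota:]

# 230 candidate prefixes against an assumed 200-route table:
routes = [f"10.{i}.0.0/16" for i in range(230)]
print(len(routes_over_quota(routes)))  # 30
```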

[00:18:53.320] – Chris
All the main features have been added to it. Even the third parties are pretty light on what they have, but they can definitely do the routing. They definitely have community strings and everything that you can use to help control what you do with the routing entries that show up. You work through those pieces.

[00:19:14.860] – Ethan
So it feels like, with the BGP implementation, you can’t actually build what you would do in a normal network. You can’t build a full BGP domain. You’re just using it as a protocol to exchange routes between two endpoints, and that’s where it begins and that’s where it ends. You’ve got these little islands of BGP.

[00:19:33.990] – Ethan
I don’t want to say it’s pointless. It’s almost pointless, though. Pointless is an exaggeration, but it’s kind of like, if I can’t use policy to bleed routes where I want, hide routes, set preferences, and stuff like that, all the power is gone.

[00:19:49.980] – Chris
To be fair to the cloud native side of it, in AWS, say, at this point you would have to build your own automation to manage all that static routing. You definitely have the ability, and you definitely could see that something failed somewhere, and you could write a script or a Lambda that might go change the static routes for you and move things around. It’s just that they haven’t fleshed that out for you.

[00:20:16.000] – Chris
You can build your own, or you can buy a third party that’s basically doing the same thing.
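
That “build your own” option boils down to conditional next-hop selection. A minimal sketch of the decision logic such a script or Lambda might run, with health checks and the actual route-replacement API call stubbed out (the ENI names are hypothetical):

```python
def choose_next_hops(candidates, healthy):
    """For each prefix, pick the first next hop that passes health
    checks. A real Lambda would then push the result as static
    routes (e.g. via the EC2 ReplaceRoute API). Returns a mapping of
    prefix -> chosen next hop, or None if nothing healthy remains."""
    return {prefix: next((nh for nh in prefs if nh in healthy), None)
            for prefix, prefs in candidates.items()}

candidates = {"10.1.0.0/16": ["eni-primary", "eni-backup"],
              "10.2.0.0/16": ["eni-primary", "eni-backup"]}

# Primary appliance fails its health check; traffic shifts:
print(choose_next_hops(candidates, healthy={"eni-backup"}))
# {'10.1.0.0/16': 'eni-backup', '10.2.0.0/16': 'eni-backup'}
```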

[00:20:22.140] – Ethan
Effectively, if you rolled your own, you’d be writing your own routing protocol, essentially. I mean, you’re not writing a routing protocol as such; you’re writing something conditional. But at the end of the day, you’re writing code that populates forwarding tables, ultimately by means of static routes. That’s really what you’re doing. That’s right. Okay, let’s move the conversation along, Chris, to multi-cloud in a little more detail. We’ve been talking about how multi-cloud networking is complex because the different cloud providers are playing by different rules, with a different scope and criteria for what you can do with networking there.

[00:21:02.160] – Ethan
Okay. Why do I actually need to stitch multi-cloud together into one big unified network fabric anyway? And I know you have some specific use cases; we’ve chatted with you offline about this. Because I could argue, wow, I probably don’t actually need that. Maybe everybody needs to talk to on-prem, but do I need this big network fabric where all the clouds can talk to one another? Maybe I don’t. Or maybe I do. Right, Chris?

[00:21:26.120] – Chris
Sure. So from that perspective, we actually did go down the path of saying, okay, for the most part we would like to see things done at the application layer, and we went off and delved into API gateways, trying to do as much as possible without really any networking. So say an application living in one VPC: it’ll have its listeners for the API gateway, typically just over SSH or SSL, just like you would expect for any other user-facing application.

[00:22:02.620] – Chris
So you get these pieces where the developers try to tie everything together without actually having network connectivity, or, not sure how to say that, they don’t have any private connectivity; they just have public Internet access through the cloud.

[00:22:15.860] – Ethan
And the gateway is there basically as a proxy for these API calls.

[00:22:21.330] – Ethan
So basically, as long as the gateway can get to the endpoints it needs, then all you need to worry about is talking to the gateway, which in theory would simplify networking. But it sounds like you’re leading up to or maybe it doesn’t.

[00:22:34.260] – Chris
Well, it does, for anything you’ve developed fully in that direction. It’s the stopgaps, and the on-prem resources that you don’t ever want to have any kind of public exposure, even highly constrained exposure where it must have the same certificates on each end to auth against, stuff like that. You still may not be in a situation where you can do that. So what we found is that going ahead and stitching together all the clouds, both within a single cloud between VPCs and between clouds, became necessary for that backside connectivity, for things that weren’t ready to be exposed to the API gateway, to speed up development.

[00:23:19.610] – Chris
So maybe in the long view, you can eliminate even more of the network functions underneath and just have application connectivity through SSL. But to get there, you have to build the road first.

[00:23:35.840] – Ned
Right. I think that was the early vision of AWS: that everything was just going to be brokered through these publicly available endpoints. So if you look at the very early days of AWS, they didn’t really have a networking construct. They had a smattering of services, and you would link those services together just by using the endpoint that was generated. When you added a service, they would say, okay, here’s your unique endpoint; go send all the traffic there to use our service. And then eventually people were like, but hey, I want to do a little bit more than that.

[00:24:08.880] – Ned
I don’t want to do everything through this public endpoint. I would rather use VPC or some sort of networking construct. And so that’s why they eventually created it, and they’re actually retiring the older version of their networking. I read that recently. I was like, wow, that’s been around for, I mean, in cloud years, like, a thousand years, right?

[00:24:29.560] – Chris
Exactly. A multiplier on everything. Like I said, in AWS that was the tack we were trying to take. And then you run into too many issues, and you wind up going ahead and creating your cloud network underneath to make sure VPCs communicate. It also, I guess, becomes a security thing: bringing the security folks into everything. If it’s all at the app tier, it’s harder for them to be plugged into what’s going to be exposed. It’s like, okay, we’ve kind of signed off.

[00:25:05.980] – Chris
We got your API gateway stood up, and we signed off on that. But it becomes much, much deeper for them to go in and figure out what’s allowed through the gateway, what resources need extra attention, what API calls need to be more specific, or whether something more like a web application firewall should be sitting there looking at those details. It adds another tier. Whereas if it’s just a network, you can insert those devices at a different layer, have them be part of the path for all traffic, and not be trying to figure out this crazy spider web that’s been built for the API gateway.

[00:25:43.020] – Ned
[AD] We pause this Day Two Cloud podcast for an important message from one of our sponsors. Cloud is hard. Predicting cloud costs is even harder. What you need is a friend to help out. What you need is Zesty. Zesty uses AI to proactively adapt cloud resources to real-time application needs without human intervention. Now, I know AI is a term that gets thrown around a lot. There’s a lot of hype and a lot of disillusionment, and that’s because vendors try to get AI to do everything instead of the thing that AI is actually good at.

[00:26:24.700] – Ned
And that thing is monitoring and optimizing repetitive and identifiable events. Guess what cloud cost optimization is? A problem of monitoring and optimizing repetitive and identifiable events. Zesty is using real deal AI in the way it was intended. Zesty’s technology leverages AI analysis and autonomous actions based on real time cloud data streams to automatically purchase and sell AWS commitments or in much plainer English. Zesty looks at the real time data from your cloud resources and then makes smart purchasing decisions to save you money. And you don’t have to do anything.

[00:27:11.270] – Ned
There are probably some alarm bells going off in your head: you just handed Zesty an unlimited credit card and permission to use it. That’s scary. Fortunately, Zesty offers a buyback guarantee for any overprovisioned commitment, so you’re not going to get stuck with a pile of reserved instances you don’t need due to a glitch in the matrix. That’s because Zesty makes money when you save money. That’s right: their fee is based on the savings they provide to you. If you’re not saving money, Zesty isn’t making money. That’s what we call aligned interests.

[00:27:50.880] – Ned
The result is an average savings of 50% on EC2 and a mere two minutes to onboard your account. If you’d like a friend who saves you time and money, go to Zesty dot Co and book a demo. That’s Zesty dot Co to book a demo and put your cloud cost optimization on autopilot. Now back to the episode. [/AD]

[00:28:13.180] – Ned
So if I’m thinking about the topology, would you only use API gateways for public facing endpoints? Or do you also use them internally between different business units so that they can talk to each other without connecting their networks together?

[00:28:28.240] – Chris
I don’t know if we want to bring up a specific vendor, but in the case of MuleSoft, they have lots of different implementation options. They have gateways that are expected to be public, gateways that you’re setting up as private, and ways of creating interlinks between the public and private pieces. MuleSoft is just a SaaS application running in AWS, so underneath it still has the same network connectivity, and you can privately connect to the VPC that gets spun up on your behalf.

[00:29:03.750] – Chris
Right. So when you purchase something, for that specific vendor they’ve created different entry points that you can use. There are public entry points, for anybody to make use of an API. There are private entry points where you’ve got shared certificates, so you can’t talk to that gateway unless you actually have the right certificate installed on each end. And then you have the truly internal case, where you’ve actually got instances sitting networked within the MuleSoft VPC, which gets networked into the corporate backbone with whatever your backhaul is for your network.

[00:29:39.070] – Chris
It depends on what you’re trying to achieve. And there are even ways of doing calls between public and private in there, where MuleSoft manages and adds the layers of security for you: OAuth, the application layer, the certificate layer, the network source IP layer. Stacks of different ways you can sort all that out. But did that cover the question?
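
Those stacked entry points can be caricatured as a simple admission policy. The class names and rules below are illustrative, not MuleSoft’s actual model:

```python
def admit(entry_point, caller_cert=None, on_backbone=False,
          pinned_cert="corp-gw-cert"):
    """Decide whether a caller may reach a gateway entry point.

    public   -> anyone on the Internet
    private  -> only callers holding the shared certificate
    internal -> only callers on the corporate backbone
    (pinned_cert and the class names are made-up examples)
    """
    if entry_point == "public":
        return True
    if entry_point == "private":
        return caller_cert == pinned_cert
    if entry_point == "internal":
        return on_backbone
    return False

print(admit("private", caller_cert="corp-gw-cert"))  # True
print(admit("internal"))                             # False
```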

[00:30:07.100] – Ned
Jeez, I don’t know. One thing that intrigued me is the idea of having a private API that doesn’t have all the authentication and the additional layers on there, and some of the commands may even be different. And then the API gateway can act as that translation layer and authentication layer for public access. Is that one of the things that you’re implementing today?

[00:30:27.460] – Chris
Yes, both. It’s used heavily for SaaS application integration. I’m speaking on behalf of the developers at that point; I don’t have detailed knowledge, but I know some of the design patterns for different structures: integration with Salesforce, so that Salesforce has access to our internal ERP system, stuff like that. It’s done through those gateways. As I said, with the different layers of security and the different offerings, there’s a significant number of ways you could go about securing that communication.

[00:31:06.220] – Ned
Yeah. You ticked those off pretty quickly, so let’s go back to a couple of those. Starting with authentication on the API gateway: can you just plug it into any identity provider out there, or are there specific ones that you prefer, or that work better?

[00:31:23.700] – Chris
Wow. It has so many options in that area. I think you could almost say you can plug it into any identity provider that you want. We stack those things to a certain point. Right. So it’s bound to a specific certificate: two nodes of the gateway can only establish a session if they have the same certificate.

[00:31:43.860] – Ned
Right.

[00:31:44.610] – Chris
Even at the application layer, maybe there’s a token that you’re using, or maybe a user ID and password; those pieces are all embedded in there. Maybe that’s all machine to machine, and then you add an actual end user, so you may be passing their credentials. So there are a lot of layers to that. I can’t speak to it in very much detail, but they’re definitely in there.

[00:32:09.500] – Ned
Okay, so I mean, that’s just the identity portion of sort of trying to prove who you are. Does the API gateway handle any of the sort of authorization, like what you’re able to do? Or does it rely on the back end applications to tell it all that?

[00:32:23.280] – Chris
I think you can use a little bit of both. Right. So when the gateway makes a query, maybe it has the authority to see a lot more data than what it’s going to present, because it’s sitting in the middle. Or the call may be truly passed all the way through to the original application, the one that has the permission set for what you can see. That’s probably one of the areas where the developers have a lot of headache: how much do I expose to the process? This token has the ability to see X, but the actual person making the request may only be able to see Y.
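A rough illustration of that X-versus-Y mismatch, with invented field names and entitlements: the gateway’s machine credential can read the whole record, but the gateway trims the response down to what the calling user is entitled to see.

```python
# Hypothetical scopes: the gateway's credential sees everything ("X"),
# each user's entitlement is narrower ("Y").
GATEWAY_SCOPE = {"id", "name", "salary", "ssn"}
USER_ENTITLEMENTS = {
    "alice": {"id", "name", "salary"},
    "bob":   {"id", "name"},
}

def fetch_record():
    # Stand-in for the backend query made with the gateway's own credential.
    return {"id": 7, "name": "Pat", "salary": 90000, "ssn": "000-00-0000"}

def handle(user):
    record = fetch_record()
    # Intersect the gateway's reach with the caller's entitlement.
    allowed = GATEWAY_SCOPE & USER_ENTITLEMENTS.get(user, set())
    return {k: v for k, v in record.items() if k in allowed}

print(handle("alice"))  # salary included, ssn stripped
print(handle("bob"))    # only id and name survive
```

The alternative Chris mentions, passing the caller’s credential all the way through, would move this filtering into the backend application instead of the gateway.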

[00:33:05.810] – Ned
Right, you could have some sort of DLP thing running on that API gateway that goes, oops, that’s data that can’t go out, got to scoop that right out, or just deny the request. Now, you’re setting up these API gateways in the context of this larger cloud network. So let’s talk a little bit about placement and sizing, and where you’re going to put these API gateways to make sure they keep working if one goes down.

[00:33:36.130] – Chris
Sure. As I said, a lot of things are forced with redundancy. You wind up with at least dual AZs in there, and you need to have the gateways split across different AZs.

[00:33:54.510] – Ethan
Is there any state maintenance you have to have between the API gateways, or do they function as independent boxes, effectively proxying one specific call, and that’s all we care about?

[00:34:04.420] – Chris
Oh, that’s probably the most simplistic layout, thinking of it as a box doing a single task. But it’s not even necessarily the case that the API gateway is an instance. It might just be that your web server is configured with a certain branch or a certain path that is only expected to be hit by a third-party, machine-to-machine request. So it’s not necessarily anything special for the gateway on the application side.

[00:34:48.350] – Chris
It might just be something that’s implemented in that web server. And, say, MuleSoft is reaching in, accessing that data, and formatting and structuring it differently to expose it to a different API call somewhere else, where some other application is making a request.

[00:35:07.310] – Ethan
The point being it’s too shallow to think of it as merely a proxy. That gateway can do data handling and manipulation: I got the API call back, it’s now sitting with me in the gateway, and I’ve got instructions to reformat it like this and then give it back to whoever the original requester was.

[00:35:23.760] – Chris
Absolutely. So, yeah, it could be something as simple as a specific call, and that’s all it does. Or it may be much more complicated, mixing data sets. The whole idea is that it’s a gateway into all kinds of stuff. Maybe you read some data from a flat file somewhere. Maybe you’re reading some data out of a SQL database somewhere; maybe it’s in Postgres. There’s no telling where your data sources are or all the links it’s reaching into. It’s designed to reach into a little bit of everything. Right. That’s the whole purpose of it.

[00:35:53.410] – Ethan
You’re making an API call to the gateway. The gateway is making a SQL query to some SQL database, bringing that data set back, normalizing it, and giving it back to you as JSON, maybe.
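A minimal sketch of that pattern, using an in-memory SQLite table as a stand-in for whatever database actually sits behind the gateway (table and field names invented):

```python
import json
import sqlite3

# Fake backend: the gateway would reach into a real database here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "acme", 120.5), (2, "globex", 80.0)])

def api_get_orders():
    """Answer an API call by querying SQL and normalizing rows into JSON."""
    rows = conn.execute("SELECT id, customer, total FROM orders ORDER BY id")
    payload = [{"id": i, "customer": c, "total": t} for i, c, t in rows]
    return json.dumps({"orders": payload})

print(api_get_orders())
```

The caller never sees SQL at all; the gateway owns the query and the shape of the JSON it hands back.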

[00:36:02.780] – Chris
Absolutely. Yeah. You got a lot going on in there.

[00:36:07.030] – Ethan
So going back to the state thing, then it may not be about requests that are coming through, but certainly policy and how translations are being done. All of that you’re going to want to have mirrored across however many API gateways you have there. Presumably they’d all be consistent.

[00:36:22.780] – Chris
Yes, definitely, definitely.

[00:36:25.320] – Ethan
But do we need to do something like, like, if you’ve got an old school load balancer pair, you would have state maintenance so that if there’s a failure in the primary unit, you fail over to the secondary. You’ve got state maintenance in an old school, active, passive. Do we have any of that kind of stuff to worry about?

[00:36:42.920] – Chris
So the API gateway is a SaaS application, and all those pieces are contained within the SaaS application itself. They are using load balancers, actually. You have private and public load balancers, and that’s really where the certificates and stuff get installed, not on the actual gateway itself, right? They’ve got that front-ending it in the SaaS app, like MuleSoft.

[00:37:09.400] – Ethan
So you said SaaS before, and that didn’t really hit me. You’re not running a MuleSoft instance in EC2 on AWS or something. You’ve actually got MuleSoft SaaS that you’re doing all of this with.

[00:37:24.090] – Chris
So you go dig into their console, which is then making calls to the AWS APIs to build your environment for you.

[00:37:34.220] – Ethan
Okay. You just made this whole networking thing that much more complicated.

[00:37:38.970] – Chris
It kind of is. To me, from a network perspective, it was just another SaaS application. Lots of SaaS applications, right, are just over the Internet over SSL; that’s all the interaction. Some of them you have private connectivity into, and you have to extend IPsec tunnels into the SaaS application’s environment. Or, I’m assuming, if you had enough traffic, you would build an actual physical network into a SaaS provider. We don’t do that for Salesforce or anybody, but you still have lots of interactions that need to take place.

[00:38:16.740] – Chris
Some of them need to reach back to internal systems that are private, so you wind up with some kind of private network connectivity into your SaaS provider. In this case, MuleSoft is just another SaaS to us, and most of the time the thought was it would be over Internet connectivity. But then you get into the cases where you need on-prem connectivity for the MuleSoft gateway or gateways, and you wind up having to extend your private network into the MuleSoft instance.

[00:38:48.900] – Ethan
Is that what you’ve ended up having to do?

[00:38:50.820] – Chris
Yes. So we’ve got both. You’ve got MuleSoft sitting there with connectivity over SSL to lots and lots of things. And then you go, oh, I need to also reach into something on-prem, or I need to reach into something that’s in a VPC somewhere. So you have all the backhaul needed to the MuleSoft instance for private connectivity.

[00:39:15.350] – Ethan
Okay. Okay. So now we go back to Ned’s placement question that he asked earlier. How do you decide where to backhaul what to? Do you bring it all hub-and-spoke back to on-prem and then relay it to whatever part of the multicloud needs that access, or do you have a bunch of direct connects from MuleSoft SaaS into each of your multicloud presences?

[00:39:43.910] – Chris
As we talked about on another podcast, we use Aviatrix. With Aviatrix, we’ve built out and created hubs in different regions, and those hubs are used to extend to third parties also. So you have private connections into NI-owned VPCs, and then connections into third-party instances like MuleSoft or Salesforce, which bring an IPsec session in and terminate it on the Aviatrix node in the cloud, in any given region. And then you have the full transport, and that’s where all the control plane stuff becomes very useful to control how routing is stitched together.

[00:40:24.500] – Chris
Do you go all the way to a colo before you go between two clouds? Do you go direct between clouds? In the rare case something’s wrong, do you hop between two clouds to get to on-prem?

[00:40:35.170] – Ethan
This does go back to the sponsored Aviatrix show that we recorded, Chris, where you were talking about a bunch of your use cases. Right? Okay, so to handle this way-over-the-top complex environment, you’ve ended up with Aviatrix as your baseline, which allows you to bring it all in and do full-featured routing policy. You basically said, I’m dismissing all of your cloud native stuff. I’m not doing that. I’m going to do Aviatrix, which lets me do grown-up networking, effectively.

[00:41:07.650] – Chris
Right? It’s the normalizer. It lets me step across clouds with the same expected feature sets, the same tools to control the routing, and a single place (well, as close to single as you can get with third parties) to do automation against, to manage your IPsec sessions and build connectivity to third-party SaaS applications.

[00:41:34.420] – Ethan
Brother. Okay, security. I want to add that layer to this. How are you doing it? It feels to me like in this example, getting access to the API gateway, the MuleSoft stuff, is a big deal. If I had access to that, I could do bad things, or at least attempt to. And I’m assuming you’re filtering access to even get to MuleSoft. Well, I guess that’s the question. Are you doing it with maybe traditional firewall filtering, simple packet-level access filtering, or are you doing things more at the MuleSoft level to control access to that gateway?

[00:42:08.320] – Chris
So I guess, like you would expect, it’s all security in depth. There are controls that are done within MuleSoft itself. They have lots of controls in there, usually more around certificates: issuing a certificate that’s only used between two endpoints, stuff like that, to close down those paths so only the known application can do the work. But then between cloud and on-prem, there are your next-gen firewalls sitting there, your application-aware firewalls, able to look at the actual calls and make sure that they’re structured and making appropriate calls, not just any old calls.

[00:42:53.000] – Chris
There are definitely layers of security going on there, from simple things like the source IP having to match, all the way down to deep, application-level inspection of the API calls.
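A toy sketch of that layering (invented addresses and paths, nothing vendor-specific): a network-level source check runs first, then a check that the call itself is one the application is expected to make, roughly what an application-aware firewall adds on top of packet filtering.

```python
# Hypothetical allowlists: which sources may call, and which calls are
# "structured and appropriate" for this application.
ALLOWED_SOURCES = {"10.20.5.9"}
ALLOWED_CALLS = {
    ("GET", "/api/v1/orders"),
    ("POST", "/api/v1/orders"),
}

def permit(source_ip, method, path, body=None):
    if source_ip not in ALLOWED_SOURCES:        # simple network-layer check
        return False
    if (method, path) not in ALLOWED_CALLS:     # is this call expected at all?
        return False
    if method == "POST" and not isinstance(body, dict):  # crude structure check
        return False
    return True

print(permit("10.20.5.9", "GET", "/api/v1/orders"))      # True
print(permit("10.20.5.9", "DELETE", "/api/v1/orders"))   # False: unexpected call
print(permit("203.0.113.7", "GET", "/api/v1/orders"))    # False: wrong source
```

Real application-aware firewalls inspect far more (schemas, payload contents, rate), but the ordering of cheap checks before deep ones is the same idea.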

[00:43:07.540] – Ethan
Yeah, you’re making an API call you shouldn’t be making, you’re asking a question that you don’t have the right to ask, so we’re going to clobber you. And that’s after all the regular authentication stuff and so on happens.

[00:43:18.140] – Chris
Oh, yes, yes.

[00:43:21.740] – Ethan
Well, troubleshooting all those layers. It must be an absolute joy.

[00:43:27.060] – Chris
Actually, it’s not been terrible, since the stuff is so structured for an API gateway. It’s probably a lot easier than a lot of things you can think of, like general applications where you don’t know what someone’s done or what they’re doing. Since it’s an API gateway, the structures are much easier to follow. Maybe it’s just easier to keep in your head because you only have to think of that one API call.

[00:43:51.180] – Ethan
But yeah, you said structure, and that’s the magic word, because I’ve done my share of web application firewall work, which is always a bit of a disaster. Devs never know what we’re supposed to allow into certain forms, and then they change stuff and don’t tell you, and the WAF breaks things. Yeah, never fun.

[00:44:07.010] – Chris
And then from a network perspective, we help stitch those kinds of appliances in there, but we’re not operating them either. So that’s the security team’s fun. I’m sure they would have a different answer.

[00:44:19.340] – Ned
“Fun,” I hear the air quotes.

[00:44:22.130] – Chris
Exactly. I’m sure they have a little different answer on what the troubleshooting looks like. Yeah.

[00:44:27.820] – Ned
Yeah, I got to imagine, from just a logging and monitoring perspective, that’s a huge challenge. It’s hard enough to get your hands around just logging for a particular cloud and their native services and then layer on those third party services and then another cloud. Are you using some sort of logging and monitoring tool that spans this whole architecture? Or is it a combination of the native tools in each cloud and then something else?

[00:44:55.140] – Chris
So obviously the native tools get used heavily, depending on what area of the company you’re in. For application developers, we have Logstash and Elasticsearch set up to do application-level logging, so they have their side of it where they can write their traps and send them to a known log server. And then you’ve got your cloud native stuff, your flow records, and all those pieces in there. And the security team has all their stuff. So it’s probably per area.

[00:45:31.160] – Chris
There’s probably something common that they could put together, but overall there are multiple different things you have to look through, usually by area of expertise. The application developer has their stuff; it is in Logstash, but it’s probably not data that everyone would necessarily want to dig into. The security team has their stuff. We have the Aviatrix tool, which is viewing NetFlow, and that winds up back in Logstash also, so you can kind of see what’s going on from each area.

[00:46:02.840] – Chris
Each area has its central piece, but you still have to get into the native tools here and there, too. We’re definitely not far enough down the path to have said, yeah, we’ve got a single place where we can do full logging and visualize everything.

[00:46:16.850] – Ned
No, I don’t think anyone will ever get to that place. Right. With any complicated environment, there’s no way to have just one tool and one thing to rule it all. But it sounds like Logstash is good at collecting everything, and with all the native sources you’ve got in the cloud, something like the Aviatrix product is able to collect logs from all these different native things and put them into one consistent place. Is that one of the things it’s doing for you?

[00:46:45.420] – Chris
It’s one of the things. It’s helping normalize a bunch of stuff and expose it through one screen, either in CoPilot, which is an analysis tool, or as we’re consolidating logging into Logstash. So you’ve kind of got two ways of viewing it: one that’s even more massaged going down to Logstash, more specific to what we’re looking for, and then the native tools. When you have to start digging around and you’re not picking up anything of interest, you start digging into the native tools or the third-party tools in a little more depth.
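The normalization idea can be sketched like this: two invented log shapes (a flow-record-style entry and an application log entry) are mapped into one common event format before anything central, like Logstash, consumes them. Field names and values here are made up for illustration.

```python
def normalize(source, raw):
    """Map heterogeneous log records into one common event shape."""
    if source == "flow":    # cloud flow-record-style entry
        return {"src": raw["srcaddr"], "dst": raw["dstaddr"],
                "action": raw["action"].lower(), "origin": "flow"}
    if source == "app":     # application log entry
        return {"src": raw["client_ip"], "dst": raw["host"],
                "action": "allow" if raw["status"] < 400 else "deny",
                "origin": "app"}
    raise ValueError(f"unknown source: {source}")

events = [
    normalize("flow", {"srcaddr": "10.0.0.1", "dstaddr": "10.0.0.2",
                       "action": "ACCEPT"}),
    normalize("app", {"client_ip": "10.0.0.3", "host": "api.internal",
                      "status": 403}),
]
for e in events:
    print(e)
```

Once everything shares one shape, a single screen or query can span sources that otherwise log completely differently, which is the "massaging" Chris describes.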

[00:47:24.200] – Ethan
Well, Chris, could you sum up this conversation from a standpoint of lessons learned? Here’s the context for why I’m asking: you started out in the beginning saying we’re gonna go cloud native and do all the things, and then you didn’t. Can you give some advice to people facing this, who are realizing their environment is going to be multicloud? Give them some tips on steps not to waste their time with, and why to consider this other architecture, because Chris has been there and he knows.

[00:47:56.920] – Chris
I mean, it’s a great learning experience to go dig into each cloud’s native offerings, and I wouldn’t say not to at least keep your finger on what’s going on there, because they are very big entities and they can turn out features very fast compared to probably any of the third parties. We never said we know we’re never going to use cloud native; we try to keep up a little bit and pay attention. But if you just need to get things online quickly, and you don’t really have decent control over what cloud you might be in, then I’d definitely go to the third-party market and look at the different offerings out there to help you stitch and manage the pieces together.

[00:48:43.290] – Chris
Otherwise, you’re really going to be chasing a lot, trying to figure out some of these pieces. Besides traditional firewalls, Cisco CSR routers, and SD-WAN platforms to stitch cloud to cloud, I don’t know how you’d actually achieve it with just the native functions. There’s a chicken-and-egg problem: you usually have to know the destination to start, but you can’t really build an IPsec tunnel between two clouds because neither end knows the other’s destination; it doesn’t tell you the destination until you start the process. Right.

[00:49:19.000] – Ethan
Well, Chris Oliver, thank you for joining us on Day two Cloud today. Are you a social person? Are you out on the Internet where people can follow you or ask you questions?

[00:49:29.630] – Chris
LinkedIn is probably the best place at this point to send stuff to. I haven’t historically followed it, but lately I’ve had to pay a lot more attention to it.

[00:49:40.420] – Ethan
Yeah, LinkedIn seems to have a lot of interesting conversations going on that are more consistently work-focused than Twitter, and I’ve seen a bunch of good threads there on LinkedIn, actually, whereas Twitter is, you know, Twitter. Alright. LinkedIn.

[00:49:58.780] – Ethan
And thank you for spending time with us on Day Two Cloud today, Chris. We really very much appreciate it. And thanks to you out there listening; virtual high fives for making it all the way to the end of this. Hopefully we got your brain spinning and thinking about things in multicloud, especially if you have a specific application that is driving you to multicloud, or that you have to supply networking to for the multicloud, like Chris’s example of MuleSoft SaaS and the API gateway he had to feed to a whole bunch of different environments.

[00:50:28.700] – Ethan
If you have suggestions for future shows, we would love to hear them. You can hit Ned or me up on Twitter at Day Two Cloud Show; Ned and I watch that. Or you can fill out the form on Ned’s fancy website if you’re not a Twitter human. Ned’s fancy website is Ned in the Cloud dot com. Packet Pushers has a weekly newsletter, Human Infrastructure Magazine. HIM is loaded with the very best stuff we found on the Internet, plus our own feature articles and commentary.

[00:50:51.850] – Ethan
It’s free. Doesn’t suck. It’s private. We don’t sell any of your information to anybody. You can get the next issue at Packet Pushers dot Net slash newsletter. Until then, just remember, Cloud is what happens while IT is making other plans.
