Day Two Cloud 089: Connect All The Cloud Things – AWS Networking In 2021

Episode 89

Cloud networking has changed in the past two or three years as major cloud providers have rolled out extensive new features and capabilities for connecting users and workloads to public cloud applications and services. Cloud networking is moving well beyond basic IPSec tunnels.

Today’s episode goes deep on AWS networking to find out what’s new in areas including cloud and SD-WAN, IPv6, edge connectivity, network firewalls, and gateway load balancers. We also discuss major network architecture mistakes that people make, including applying single-data-center thinking to multiple availability zones and regions.

Our guest is Nick Matthews, a network engineer and product manager who works on the AWS VPC team.

Show Links:

@nickpowpow – Nick Matthews on Twitter

Heavy Networking 433: An Insider’s Guide To AWS Transit Gateways – Packet Pushers

AWS re:Invent 2020: Networking best practices & tips with the Well-Architected Framework – YouTube

Deployment models for AWS Network Firewall – AWS

Architecture with an internet gateway and a NAT gateway – AWS

New – AWS Systems Manager Session Manager for Shell Access to EC2 Instances – AWS

 

Transcript:

[00:00:09.140] – Ethan
Welcome to Day Two Cloud. Boy, do we have a nerd fest for you today. We are going deep into AWS networking in 2021. That is, what’s changed? What’s been going on? What’s the right way to connect up all your AWS things? And our guest works deep in the bowels of AWS on the networking team, Nick Matthews, and he’s been on the Packet Pushers podcast network before. And Nick’s got so much to say about so many things. And I feel like, Ned, I almost feel like we ran out of time. There was a lot going on.

[00:00:41.120] – Ned
We definitely could have gone on for longer. And I was really happy to get out one of my personal gripes, which is NAT gateways. So I got to hassle him about NAT gateway pricing for a little bit, and he took it in stride and actually had very good reasons for why it costs what it does. So I guess, listener, you’ll just have to find out what those reasons are.

[00:00:58.640] – Ethan
Enjoy this show with Nick Matthews from AWS Networking.

[00:01:02.810] – Ethan
Nick, welcome to the show, man. And I know you’ve been on the Packet Pushers network before, but for the folks that have not met you, who are you and what do you do?

[00:01:12.170] – Nick
Yeah, Nick Matthews, product manager here at AWS, been here for about five years, I used to do a bunch of partner networking stuff and now I’m on the VPC team. So I just generally tell people I do network stuff.

[00:01:26.690] – Ethan
I’m just going to say VPC team. I think you just gave it away. You’re an AWS networking nerd. Through and through, through and through. All right, well, Nick, we were talking before we hit the record button, kicking around the idea of how cloud networking has changed in the last two or three years, because that’s about how long it’s been since you were last on the show. You were actually on Heavy Networking, a different show on the Packet Pushers network, on episode 433, where we were talking about transit gateways. So from your perspective, do you have some observations on how cloud networking has changed here since you were last on?

[00:01:59.380] – Nick
Yeah, I mean, I sort of view it as like pre transit gateway and post transit gateway just because, like, the conversations I had were so different. You know, I was talking to some internal folks about those days and we were sort of joking, laughing about how much pain we went through. But I think it’s also, I mean, I don’t know, it feels a little different. It’s hard to judge from the inside because everything I see is cloud.

[00:02:21.220] But, you know, when I read the Twitters or the Internet, blogs and whatnot, it seems like cloud networking is a higher priority on your general network engineers sort of standpoint. So I’m not sure if that’s just me being in the cloud and sort of being surrounded by bias or if that’s actually something you guys see as well.

[00:02:40.030] – Ethan
Well, I’ll make some observations. The big thing that’s changed is it’s becoming normal to fully integrate your cloud presence into your broader network. And there’s all kinds of different means that can deal with that. One of those is SDWAN, where one of the popular models is I am going to drop an SDWAN fabric edge node into my VPC and bring that VPC into my SDWAN fabric in that way. But then there’s all these other reference architectures that have emerged.

[00:03:09.590] You’ve got vendors like Equinix that are making it super easy just to plug and play and connect you up to whatever cloud provider you want to be in. And the documentation’s exploded around that. How to do that different service offerings.

[00:03:21.730] And so it’s gone from I’m going to do an IPsec tunnel, I guess, to like all these options to in a very mature way and manageable way, bring your cloud presence, your public cloud presence into the rest of your network.

[00:03:36.500] – Nick
Yes, we see a lot more, I think, with the customers I’ve talked to, I don’t know, 80, 90 percent are in some stage of SDWAN, you know, I think we were looking at the other day, it’s like it’s like a seven year old technology, but still like feels like people are still kicking around a little bit.

[00:03:51.590] So it’s weird, too, I guess I didn’t realize how much of it was driven by cloud. I was doing some reading on it the other day and, you know, cloud’s a big driver for SDWAN. So yeah, it’s sort of interesting, sort of an aligning of those two things, I suppose.

[00:04:06.360] – Ned
I’d like to dig into that a little bit more. Why would cloud be a driver for SDWAN? Because when I think about it, SDWAN is more about just controlling, having multiple Internet circuits to every location and sort of being able to compose it using appliances at those locations. I’m curious how the cloud fits into that structure.

[00:04:24.930] – Nick
Yeah, and I think, I mean, take a typical sort of network, and it’s a you-don’t-fix-it-unless-it’s-broken type thing, right? And then if all your users are at branches and offices, which obviously is not the case today, and all your applications are in a data center, then I think a lot of the traditional sort of network stuff works. But then, I don’t know, when that percentage changes, when 60 percent of stuff is not in your data center, it’s in a cloud of some sort, and the users want higher bandwidth, all that sort of stuff, then the network starts to feel a little more broken, I think.

[00:04:57.540] And so the more and more stuff that needs to sort of split off from just your data centers into the cloud or those kinds of things, that ends up driving higher bandwidth. And it’s like our users want better. That’s why you have QoS, right? So you have good connectivity from your branches to your data centers, but now it’s like you can’t do QoS to the cloud. So what do you do? You get a high quality SDWAN network, sort of the closest thing I guess we have these days.

[00:05:20.820] – Ned
That makes sense. Yeah, I definitely saw that when I was doing a lot of consulting. Most organizations were backhauling all of their branch traffic to the main data centers, which made sense, like you said, when your applications were there. But then I would be there migrating them to Office 365. And suddenly that broke the whole traffic mindset, because you’re sort of almost hair-pinning your traffic to a DC to go back out to wherever they were in Office 365. What other sort of major network architecture mistakes are you seeing in customers today with all the new options that are available?

[00:05:56.210] – Nick
Yeah, I think one of the challenges I still see people sort of work through is like, OK, you have a region, it’s built with availability zones. Right. And if you want to put an application like a web server across that, you put a network load balancer or an application load balancer in there and put your web servers in three different availability zones. It just works, like it’s just dead simple, right? But then, like, if you think about it from a networking perspective, what is that actually doing in the back end?

[00:06:21.440] Our availability zones are actually separate data centers that are spread out over miles. And so basically, like in an old-school data center world, you’ve done active, active, active across three data centers. Right. And like, we’ve hidden that magic from you. But then people come and go like, hey, why doesn’t my firewall work in all availability zones at all times? Like, let’s just take a second here. Like these are three different data centers across multiple miles and stuff.

[00:06:49.580] It’s not that simple to do like synchronous replication across these large networks. And so I think the largest thing is like we’ve made availability zones sort of like invisible almost. It feels like one data center. And then people try to bring these like sort of single data center concepts to the entire region. And so, like, some of the things are like making sure your traffic doesn’t unnecessarily go through different data centers. If you actually were managing separate physical entities, I think you’d think about it a lot more.

[00:07:18.530] Would you say, like, so we put the database on the other side of town and the web server over here?

[00:07:23.750] – Ned
Right.

[00:07:24.290] – Nick
Where should we put that? I think you’d think about it a lot more.

[00:07:28.010] – Ethan
Really? People are still making, I mean, you’re really just talking about the latency mistake, where you’re separating workloads from each other so far latency-wise that it has an impact on transaction performance.

[00:07:36.860] – Nick
Yeah, it’s that. Plus the data transfer. Right, because we have to send that stuff over private cable all over the place. Right. You know, one of the things we changed is we announced our encryption stuff. So if it leaves one of our trusted buildings, we encrypt it. So that’s cool. Now, it’s one of those things network engineers and security people appreciate. We announced that about a year ago. But, yeah, I think people still don’t get that.

[00:07:59.130] So it comes down to like when you put in your SDWAN devices, your firewalls, other things, or even like your VPN, like if you’re only using one of your VPN tunnels, like shame on you, because that VPN comes with two and like use both please, because they actually terminate in different sort of data center availability zones, you know, or like using one NAT gateway, it’s in one availability zone and you’re transferring traffic across from the other one. And there’s some cost tradeoffs there.

[00:08:23.450] But to me that’s like the highest-level thing I see. That, and what I call like the fake Visio diagrams I get, where someone’s taken like our diagrams off the Internet, like all the AWS icons, so it looks like a cloud diagram, and they’ve taken like their DMZ design and all the icons from that and just mashed them together to make it sound good.

[00:08:44.810] And you’re like, are you really taking everything in through a third party device for your client VPN, then sending it through a firewall, then it’s going through SDWAN and then to your actual AWS VPCs? And they go, oh no, no, this is just a provisional design. We haven’t actually made it work yet. I’m pretty sure it wouldn’t work the way you’ve just drawn these diagrams. And like, because the routing is a little bit different, because you have to like statically route to ENIs or you can do some transit gateway integrations.

[00:09:12.200] We have some like we have some new ingress routing stuff so you can set like a route table on a VPC for like ingress firewall stuff. So we’ve got like a whole new set of like sort of tips and tricks like the new gateway load balancers like that too. Right. So you can now load balance across appliances. But like some of these designs are like very like, oh, there’s just a line between these two things, like what is that line between your router and your firewall? Like, you need to think about that more. So those are those are the two I think I see the most.

[00:09:39.440] – Ned
Right. I would especially think of that in terms of if you want to deploy network virtual appliances in your VPC and you want a pair of them filtering all that traffic into two availability zones, you really want some awareness of stuff that’s in availability zone A gets its traffic from the virtual appliance that’s in A and B to B. Otherwise, you’ve got all this cross zone traffic adding latency and and cost to the whole process.
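
The zone-affinity idea Ned describes can be sketched in a few lines of Python. This is only a toy lookup under stated assumptions: one appliance per availability zone, and hypothetical subnet, zone, and appliance names. It does not call any AWS API; it just shows the decision of keeping traffic in its own zone and flagging the subnets that would have to cross zones.

```python
from typing import Dict, List, Tuple


def plan_next_hops(
    subnet_az: Dict[str, str],
    appliance_az: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Map each subnet to an appliance, preferring one in the same zone.

    Returns (next_hops, cross_zone_subnets). A subnet is listed in
    cross_zone_subnets when no appliance exists in its own zone, so its
    traffic would pay the extra latency and transfer cost.
    """
    if not appliance_az:
        raise ValueError("at least one appliance is required")

    # Pick one appliance per zone (first by name, to stay deterministic).
    by_zone: Dict[str, str] = {}
    for name in sorted(appliance_az):
        by_zone.setdefault(appliance_az[name], name)

    fallback = sorted(appliance_az)[0]
    next_hops: Dict[str, str] = {}
    cross_zone: List[str] = []
    for subnet in sorted(subnet_az):
        zone = subnet_az[subnet]
        if zone in by_zone:
            next_hops[subnet] = by_zone[zone]
        else:
            next_hops[subnet] = fallback
            cross_zone.append(subnet)
    return next_hops, cross_zone


# Hypothetical inventory: three subnets, but appliances in only two zones.
subnets = {"subnet-web-a": "az-a", "subnet-web-b": "az-b", "subnet-web-c": "az-c"}
appliances = {"fw-a": "az-a", "fw-b": "az-b"}
hops, cross = plan_next_hops(subnets, appliances)
```

Here the subnets in zones A and B stay local, while the zone C subnet is the one that shows up in `cross`, which is the case worth a second look in a real design.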

[00:10:07.160] – Nick
Yeah, totally. Totally. I was sort of like for a while there, like the one firewall advocate, if I want to even give myself that title, at AWS. I think we talked about it a lot in the last episode. And now, at re:Invent, we launched the Gateway Load Balancer, plus a native firewall. So it’s starting to become sort of like a thing you can do in a native way without being ashamed of it almost.

[00:10:34.370] And so I was personally very happy to see those things because it’s like, you want to run your own firewall? Great. Go for it. You want to go like hard core, run your own Linux open source firewall, some pfSense, or do it all yourself? Go for it. You can build that type of stuff now. Some customers were like, we only wanna see the first hundred bytes of a packet so we can do some complicated container routing stuff.

[00:10:54.210] Cool. Go for it. And then there’s the native option, too, if you want to do that. So that also made me sort of happy that, like, firewalls are now a first class citizen as opposed to like these like transit VPCs use a Lambda to monitor the ENI and switch route tables if it detects three ICMP failures, and that stuff works…but just I’ll stop there before I say something mean.

[00:11:17.490] – Ethan
Well, Nick, why I wanted to have you back on the show is to really catch us up on all the new things that are going on at AWS. You teased us with some of it. But it seemed like there was a thing when you and I were going back and forth in email, that you’re connecting all the AWS things, a bunch of new tools, some of which you’ve mentioned. There’s some IPv6 now. Well, let’s start there.

[00:11:36.780] Why don’t we just start with IPv6? Because there wasn’t any for quite a while within the AWS world, or it was quite limited, but now there’s a much more robust offering. Can you talk us through AWS and IPv6?

[00:11:48.750] – Nick
Yeah, you know, it’s interesting. I think back when I worked for Cisco something like ten years ago, I tried to be like the IPv6 guy for a while there. And I was Chicken Little and I told everyone to replace their 6509s because the TCAMs weren’t big enough for v6 routing tables. And I tried that for about six months until no one listened to me. Then I started working on SDN, and that went someplace. But I still have a love in my heart for v6.

[00:12:12.150] And I think one of the things we see with customers is, I think I view it sort of similar to security where like they go, yeah, we don’t do that in our datacenter today very well. But when we go to the cloud, we’ll do it right this time.

[00:12:25.750] – Ned
Sorry. That’s hilarious.

[00:12:29.010] – Nick
Yeah. And so, you know, I think customers are looking to AWS to be a leader in v6 because of that. So it’s the gateway for them to sort of do the right thing, if you will. And so, at least over the last couple of months, we’ve made quite a bit of progress. We announced v6 natively on EC2, or on VPC, I want to say three years ago at re:Invent, so you can put a v6 address on your instances.

[00:12:57.720] The challenge is really all the other things. So for example, we added it to the EC2 API, so you can access that via v6, you can put a v6 front end address on your network load balancer now, and those kinds of things. So I mean, the types of use cases we see customers do it for are just like the core architecture, like let’s get v6 on everything, that way when it comes big and we want to switch over to it, we’re sort of ready. We don’t have that technical debt sitting around for it. And so it’s all sort of a dual stack play today for the most part.

[00:13:30.930] – Ethan
That was my next question. It does sound like I still need v4 around for lots of things. But would this be the right emphasis that public facing AWS services I can expose more of those with v6 now?

[00:13:43.430] – Nick
Yeah, what we’re doing with network load balancer and those kinds of things allows you to do that. And so, yeah, I mean, I can talk more use cases if you guys are interested. It’s still sort of a burgeoning area because it’s it’s a dual stack thing.

[00:13:55.580] Like, one of the things I’ve seen is like some of these large managed service providers, when you come into AWS and you buy some sort of SaaS from an AWS partner or whatever. A lot of times they have a separate VPC for every single one of their customers. And some of those people will let you define your private IP addresses. And so they might have like multiple 10.1.0.0s and then they have like some sort of centralized provisioning system that they have to use to monitor these things.

[00:14:19.340] And they go, what do we do? We have all these overlapping addresses. And so in some cases, they’re using v6 dual stack there. And then, for example, you can put it all on a transit gateway and then just not advertise the v4 addresses and then use everything through v6, so you can run like a totally v6 sort of native network through Transit Gateway. It’s kind of an edge case, but it works.
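
The overlapping-address problem Nick describes is easy to see with Python’s standard `ipaddress` module. This is a minimal sketch, not anything a provider actually runs: the customer names are hypothetical, and the IPv6 ranges come from the 2001:db8::/32 documentation prefix rather than real allocations.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def overlapping_pairs(cidr_by_vpc: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of VPCs whose CIDR blocks overlap.

    All CIDRs in one call must be the same IP version (all v4 or all v6).
    """
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidr_by_vpc.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    ]


# Customers picked their own private IPv4 ranges, so two of them collide.
v4 = {
    "customer-a": "10.1.0.0/16",
    "customer-b": "10.1.0.0/16",
    "customer-c": "10.2.0.0/16",
}
# Each VPC also has its own IPv6 block (documentation prefix used here).
v6 = {
    "customer-a": "2001:db8:a::/56",
    "customer-b": "2001:db8:b::/56",
    "customer-c": "2001:db8:c::/56",
}

v4_conflicts = overlapping_pairs(v4)
v6_conflicts = overlapping_pairs(v6)
```

With these inputs the IPv4 list contains the customer-a/customer-b collision and the IPv6 list is empty, which is why advertising only the v6 prefixes through a shared hub sidesteps the conflict.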

[00:14:40.430] – Ned
Right. Yeah, I definitely saw that because one of the VARs I worked for was also doing hosting for a bunch of different companies. And they wanted to have that sort of centralized management view. And they had been looking more at like the PrivateLink endpoint, where you could just drop an endpoint into each VPC you’re managing and maintaining. You don’t have to worry about that address. But yeah, v6 would make sense too. That’s an interesting use case.

[00:15:06.230] – Ethan
Am I bringing my own v6 block Nick, or do I have to get something allocated to me from AWS?

[00:15:12.000] – Nick
Yeah, so we announced that I think also last year, so you can bring your own v6 addresses to AWS, but you can also bring your own v4 addresses. But prior to that, it was all like it came from Amazon’s sort of block and then it was ours. And if you wanted to leave, you couldn’t take it with you, which caused some customers some strife.

[00:15:31.430] So we launched bring your own IP address for v6, so you can bring your own v6 address, get it through ARIN or whatnot. And then essentially you advertise it to us over a peering link. You don’t advertise it over the Internet, and then we end up readvertising it through Amazon.

[00:15:47.580] – Ethan
OK, so I’m going to bring a /40, a /48, whatever it is, announce it to AWS, and AWS is going to announce it for me to the Internet.

[00:15:56.130] – Nick
OK, same thing for v4 too.

[00:15:57.660] – Ethan
Straight forward. OK, ok that’s good. Actually that’s really flexible I think. Yeah I think I’m OK with that Nick.

[00:16:05.220] – Nick
All right. Good news.

[00:16:07.360] – Ned
It’s got the Ethan stamp of approval right there, you should add that to the website. I mean.

[00:16:14.670] – Ethan
Talk to us a little bit more about NAT gateway and v6, or maybe just NAT gateway broadly, kind of where we’re at with that in 2021, because actually, Ned, I think this was one of your problems. Why are NAT gateways kind of pricey?

[00:16:27.210] – Ned
Yeah, yeah. I mean, I as someone who’s accidentally left NAT gateways deployed on a dev environment that I was just messing around with and then saw the bill, I was like, oh, oh, why are these so expensive when they’re not currently doing any sort of network translation for me. Why isn’t there just like a checkbox option? I don’t know.

[00:16:47.670] – Nick
Yeah, I mean, I can sort of wax poetic on this. I think the short story is that a lot of customers were running NAT instances and that kind of stuff, like some of the mild shots I was taking there against running your own infrastructure, trying to keep things highly available, that kind of stuff. So I think the answer maybe you’re looking for is that the NAT gateway is based on Hyperplane.

[00:17:12.000] Essentially it’s like this multitenant, distributed kind of state-sharing fabric that does network stuff. It also runs Transit Gateway, Network Load Balancer, PrivateLink, some other things. But essentially, like the way it works is like whenever you spin up a NAT gateway, we actually allocate bandwidth for you. I believe that number is like five gigs of bandwidth. So just by creating a NAT gateway, we’ve basically carved out like five gigs of throughput on the backend for you, because we don’t do QoS.

[00:17:35.880] Right. It’s more like a capacity management system. So when you spin up a NAT gateway, we’ve said, OK, we’ve allocated this much bandwidth for you across our network, for you to use. And if you go over five we will detect that and sort of scale it up. So part of it’s like we have a cost of actually allocating bandwidth to customers. But I mean, that’s obviously a benefit of v6 too, right? Because there’s a couple of ways around that.

[00:17:58.350] So, for example, you could create a security group that just doesn’t allow anything in and it only lets things out. And so, you know, the challenge with that is like you’re one security group rule change away from opening something up to the Internet because that thing’s gonna have a public IP address. And for health care finance people, that gives them the heebie jeebies. Right. Otherwise, with v6, you have the Internet egress-only gateway, which allows only things out and sort of is that diode type behavior. So, you know, if you don’t like your NAT gateway, prices just switch to v6. Should be pretty easy.
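
The "diode" behavior Nick mentions for the egress-only gateway can be modeled as a tiny stateful filter. This Python class is a toy illustration only, with hypothetical names and documentation-range addresses; it says nothing about how AWS implements the feature. The point is the rule itself: a flow started from inside may receive replies, and a flow started from outside is dropped.

```python
from typing import Set, Tuple


class EgressOnlyFilter:
    """Toy stateful filter: inbound packets pass only as replies to
    flows that an inside host opened first."""

    def __init__(self) -> None:
        # Each entry is (inside_address, outside_address).
        self._open_flows: Set[Tuple[str, str]] = set()

    def outbound(self, inside: str, outside: str) -> bool:
        """An inside host initiates a flow; always allowed, and remembered."""
        self._open_flows.add((inside, outside))
        return True

    def inbound(self, outside: str, inside: str) -> bool:
        """Allow inbound traffic only if it matches a remembered flow."""
        return (inside, outside) in self._open_flows


# Hypothetical addresses from the IPv6 documentation prefix.
instance = "2001:db8:a::10"
update_server = "2001:db8:ffff::1"
stranger = "2001:db8:ffff::99"

gw = EgressOnlyFilter()
blocked_before = gw.inbound(update_server, instance)   # nothing opened yet
gw.outbound(instance, update_server)                   # instance reaches out
reply_allowed = gw.inbound(update_server, instance)    # reply comes back
stranger_allowed = gw.inbound(stranger, instance)      # unsolicited traffic
```

Unlike the security-group approach, there is no rule here that someone could loosen by accident to let unsolicited traffic in, which is the property the health care and finance teams care about.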

[00:18:31.930] – Ned
Oh sweet, problem solved.

[00:18:31.930] – Ethan
Problem solved, Ok good!

[00:18:36.310] – Ned
Well, I mean, you’re dual stacked anyway. So anything you’re doing on v4, you could still do with v4 internally on that VPC and then you could use v6 for egress I guess.

[00:18:47.170] – Nick
Yup totally.

[00:18:47.170] – Ned
All right. I got a plan.

[00:18:51.130] – Ethan
So Nick, talk to us about what the major connectivity models look like in 2021. And we’re going to move on from v6 here. And I think because AWS has been around a long time, a lot of us were running workloads in it. OK, we can have an idea of how we would connect our VPC and maybe then connect to regions and then connect from there back to on prem and and so on. Some of us have learned some hard lessons.

[00:19:14.710] So talk us through 2021. What in your mind would be rules of thumb for connecting all of these different AWS components together? And think about it, you know, help us from that business perspective, with that resiliency in mind, you know, as a designer who really wants to make sure that app is getting delivered and staying available when maybe there’s an AWS outage. That doesn’t happen often, thankfully, but certainly it happens now and again.

[00:19:41.680] – Nick
Yeah, well, that’s a loaded one. So let’s see here. I think I did a 60 minute presentation at re:Invent on this where I spoke at like a hundred words a minute.

[00:19:52.060] So, I mean, if I had to break it down, I see a couple different architectures. The first is like, hey, we’re starting off on AWS. We have a couple of VPCs. We’re sort of testing it out. And that’s usually just a handful of VPNs. Right. And the key parts there are, if you’re doing VPNs to each VPC, you want to make sure you use both tunnels. I sort of touched on that, but also two devices on premises.

[00:20:15.610] So that actually is a combination of four tunnels per VPC. That’s sort of like base level doing networking right. You know, to go up from there, obviously, four tunnels per VPC can be a little bit of overhead, particularly as, you know, like a typical dev team is going to have dev, test, prod. They really should be putting those in separate VPCs. So you’ve got 12 tunnels per dev team. That’s where Transit Gateway comes in. So you can attach transit gateways to each one of the VPCs.

[00:20:45.940] Then you only need to create two tunnels to Transit Gateway, and it sort of multiplexes that VPN to the different VPCs. Then you start to get into sort of larger, hey, we’re a serious company about serious cloud stuff, and you get into Direct Connect. You know, Direct Connect, some of the things I see there, we actually just introduced like a fault isolation type thing so you can actually test different types of failure modes on your Direct Connect, so you can test like a region failure or port failure, those kinds of things.

[00:21:13.330] What I typically see with Direct Connect is, you know, for customers, it can be quite a bit of an investment, just because you’ve got to find a carrier to a site and then you’re paying Equinix or another DX partner for cross connect charges, you probably are putting your own hardware in there. So you’re procuring hardware and you’re connected into AWS, which has the port cost. So the cost can add up a little bit, and some customers will only put one in, you know, as they get started.

[00:21:39.610] Unfortunately, that thing has like a whole mountain of single points of failure. It’s in a single data center, a single port on a single piece of fiber, on a single lots of things. And so then the second thing they do is like, OK, well, we’ll just get a second device in that data center, which reduces some of the single points of failure there. But you actually still have quite a few, like it’s still sitting at one data center.

[00:22:03.730] And for the most part, that’s primarily like you’re in this one building, which has a distinct set of power and failure modes. I mean, the Equinixes and providers like that of the world, they’re highly available. But you’re still talking about one geographic location. And so the better way to do that is to get one DX connection into one site and then get a completely different set of stuff at a different site. And so then you have DX at two different sites.

[00:22:31.300] So maybe Dallas and Virginia or whatever it is, which then gives you geographical redundancy. And so we actually have SLAs on DX if you do that. So if you’re in more than one location, you can be eligible if you go through a couple of different things. But there’s like a three nines SLA there, and then you can mix it, right? You can do DX with Transit Gateway. If you start talking about like the higher end of this, where you start doing like 10, 20, 40.

[00:22:54.490] We just released one hundred gig Direct Connect. If you’re doing that kind of stuff, we tend to see customers like really double down on Direct Connect there and make sure that that’s all highly available to each VPC, and maybe you’re using Transit Gateway, maybe you’re not in those cases. It depends. Sometimes maybe that’s how your VPN type stuff is coming in. So you have the mix of VPN, and usually at that point also that’s where you have firewalls coming into AWS.

[00:23:21.360] That’s where your SDWAN is coming in, and it sort of starts becoming the central part of your WAN almost, in a lot of those cases, because if you’re doing one hundred gig Direct Connect, you’re probably also moving a lot of applications to AWS, is my guess. So that’s one of the things we start to see, and then that becomes a lot of the reference architectures around Transit Gateway, about, you know, can you, please, not build a DMZ using Transit Gateway. You can. But, you know, there’s a lot of scaling stuff there. So I think I answered your question and I’m sort of ranting at this point.
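
The tunnel arithmetic from the start of Nick’s answer can be written out as a short Python sketch. It only reproduces the counts quoted in the conversation (two tunnels per VPN, two on-premises devices, three VPCs per dev team); the function names are made up for illustration.

```python
TUNNELS_PER_VPN = 2  # each VPN comes with two tunnels, per the discussion


def tunnels_with_per_vpc_vpns(vpc_count: int, on_prem_devices: int = 2) -> int:
    """Every VPC gets its own VPN from every on-premises device."""
    return vpc_count * on_prem_devices * TUNNELS_PER_VPN


def tunnels_with_transit_gateway(on_prem_devices: int = 1) -> int:
    """VPNs land on the transit gateway, so the count no longer depends
    on how many VPCs sit behind it."""
    return on_prem_devices * TUNNELS_PER_VPN


per_vpc = tunnels_with_per_vpc_vpns(vpc_count=1)           # one VPC
per_dev_team = tunnels_with_per_vpc_vpns(vpc_count=3)      # dev, test, prod
via_tgw = tunnels_with_transit_gateway()                   # one VPN to the hub
via_tgw_two_devices = tunnels_with_transit_gateway(on_prem_devices=2)
```

The first model grows with every VPC a team adds, while the hub model stays flat, which is the scaling argument for Transit Gateway in this part of the conversation.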

[00:23:52.540] – Ethan
If I was to summarize it, what you are saying is you have the opportunity to build redundant connections in a lot of different places. So depending on what your architectural needs are, as you’ve grown as a company and where you’re putting workloads and how you’re spreading your workloads out, think about it like you would any other network connection you need to analyze for single points of failure and then engineer those out. There are cost tradeoffs. There are performance tradeoffs. But that’s how you have to think.

[00:24:17.950] It’s no different from any other networking we’ve ever done. AWS doesn’t make your single points of failure just magically disappear. You still have exposures you need to think about as a designer, too. And just like back in the day when we were doing, say, you know, active, passive or active active data centers, all of those principles still come into play. Maybe AWS is handling it somewhat and maybe some of the constructs are a little bit different.

[00:24:41.560] But from a design principle standpoint, what I’m hearing you say Nick is redundancy gives you that resiliency, have those multiple connections, engineer out those single points of failure.
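
The "analyze for single points of failure" step can be sketched as a small Python check. It is a simplified model under stated assumptions: each connection is described only by its site, device, and port, and the site and device names are hypothetical. Anything every connection has in common is reported as a shared single point of failure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Connection:
    site: str    # colocation facility
    device: str  # customer router at that facility
    port: str    # globally unique port id, e.g. "router-1/port-1"


def shared_single_points(connections: List[Connection]) -> Dict[str, str]:
    """Return each attribute whose value is identical on every connection.

    If all connections share a site, device, or port, losing that one
    element takes down every connection at once.
    """
    if not connections:
        return {}
    shared: Dict[str, str] = {}
    for attr in ("site", "device", "port"):
        values = {getattr(conn, attr) for conn in connections}
        if len(values) == 1:
            shared[attr] = values.pop()
    return shared


# Stage 1: a single connection, so everything is shared.
one_link = [Connection("virginia-colo", "router-1", "router-1/port-1")]

# Stage 2: a second device in the same building.
two_devices = one_link + [Connection("virginia-colo", "router-2", "router-2/port-1")]

# Stage 3: a second site entirely.
two_sites = one_link + [Connection("dallas-colo", "router-3", "router-3/port-1")]

stage1 = shared_single_points(one_link)
stage2 = shared_single_points(two_devices)
stage3 = shared_single_points(two_sites)
```

The three stages mirror the progression Nick walks through: one connection shares everything, a second device still shares the building, and only the second site leaves nothing in common.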

[00:24:53.740] – Nick
And basically we boil it all down, it’s single points of failure and BGP type stuff. Right. You know, the one thing I see actually people get tripped on a little bit is Direct Connect Gateway. So say you, you put in the investment and you buy a direct connect port in Virginia and you’re running your stuff in Virginia and then someone goes, hey, we need we need to do stuff in Europe. So you spin up something in Frankfurt, some customers start thinking like, oh, man, how do I go get a direct connect port in Frankfurt or Europe or whatever.

[00:25:20.660] And the answer is actually Direct Connect Gateway. And it’s actually much easier to explain to network people than it is application people, because I can basically say, like, hey, the Direct Connect Gateway is basically a route reflector. You advertise your routes to it and then it advertises routes on the backbone, and it’s highly available. A lot of people ask, like, is Direct Connect Gateway highly available? Do I need two? The answer is no. And so basically, if you have that one port in one place, you associate it with a Direct Connect Gateway, advertise your routes there, and then that can sort of scale out all over the world to wherever you want to go. So that’s one of the things I actually see people sort of miss, because I think we wrote the documentation for Direct Connect Gateway for developers, and it’s typically not developers who use it.

[00:26:00.920] – Ethan
If I could stick with that: if Direct Connect Gateway is like a route reflector, then it’s announcing routes, but it isn’t a transit point necessarily. Is that what you’re saying?

[00:26:07.960] – Nick
Right. Yeah, it’s not a data plane component. It’s basically just control plane advertising routes, BGP stuff.
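
The route-reflector analogy can be illustrated with a toy Python model. This is a teaching sketch only, with a hypothetical class and example prefixes, and it is not how the service is built: the object stores and re-advertises prefixes between on-premises and associated regions, and it has no method that forwards a packet, which mirrors the "control plane, not data plane" point.

```python
from typing import Dict, List, Set


class DirectConnectGatewayModel:
    """Toy control-plane model: learns prefixes and re-advertises them.
    It deliberately has no packet-forwarding method."""

    def __init__(self) -> None:
        self._on_prem_prefixes: Set[str] = set()
        self._region_prefixes: Dict[str, Set[str]] = {}

    def learn_from_on_prem(self, prefix: str) -> None:
        """A prefix advertised by the customer router over the one port."""
        self._on_prem_prefixes.add(prefix)

    def associate(self, region: str, vpc_prefix: str) -> None:
        """Associate a VPC prefix in some region with the gateway."""
        self._region_prefixes.setdefault(region, set()).add(vpc_prefix)

    def advertised_to_region(self, region: str) -> List[str]:
        """On-premises prefixes reach every associated region."""
        if region not in self._region_prefixes:
            return []
        return sorted(self._on_prem_prefixes)

    def advertised_to_on_prem(self) -> List[str]:
        """The customer router hears the prefixes of all associated regions."""
        prefixes: Set[str] = set()
        for region_set in self._region_prefixes.values():
            prefixes |= region_set
        return sorted(prefixes)


# One port in Virginia, workloads in Virginia and Frankfurt.
dxgw = DirectConnectGatewayModel()
dxgw.learn_from_on_prem("192.168.0.0/16")
dxgw.associate("us-east-1", "10.0.0.0/16")
dxgw.associate("eu-central-1", "10.50.0.0/16")

to_frankfurt = dxgw.advertised_to_region("eu-central-1")
to_on_prem = dxgw.advertised_to_on_prem()
to_unassociated = dxgw.advertised_to_region("ap-south-1")
```

The single on-premises advertisement shows up in Frankfurt without a second port there, which is the scenario from Nick’s Virginia and Frankfurt example.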

[00:26:13.210] – Ned
When I first encountered that direct connect gateway, it was very confusing to me as someone who’s more cloud and sort of application and I haven’t done a lot of WAN work I was like, what? How does this thing work? Why does it work? I don’t. What’s the purpose? And then after reading and watching a video and reading and then watching probably another video, I was like, oh, OK, now I got it. And then it was the how does that integrate with Transit Gateway?

[00:26:37.720] Those two, are they complementary? How do they work together? And eventually I pieced it all together, but yeah, definitely not immediately obvious for the non-networkers among us. So I think I’d like to shift things out a little bit from the VPCs to what’s going on with AWS and networking at the edge, and I’m thinking of things like Outposts and local zones. So maybe we can dive into those a little bit.

[00:27:05.420] – Nick
Yeah, totally. So Outposts, it’s been pretty exciting. It’s, I think, what, two years since that’s been out now, or at least announced. And so we’ve got real customers using that kind of stuff. And I did some work with that team originally, and there are some things that people should probably understand there. One thing we see is these sort of customers that were maybe on the later edge of cloud adoption, they go, oh, that’s scary.

[00:27:30.350] It’s not ours. It’s somewhere else. Outposts, that’s great. It’s ours. It’s in our data center. We’ll start there. That tends to be sort of like the wrong approach, because the advice is, if you can put it in a region, you should. It’s the best place for it. There’s more capacity, there’s more services, those kinds of things. But there’s a handful of stuff where it makes sense to have it within your premises.

[00:27:54.680] Right. Latency reasons or some very stringent regulation type stuff and those kinds of things. The idea of…

[00:28:01.580] – Ethan
Regulations like data governance kind of stuff?

[00:28:04.220] – Nick
Yeah, like data governance stuff, or, I don’t know, some customers signed a contract that says, like, you’re my end customer, your data will never, ever, ever, ever, ever leave this boundary. You know, maybe it’s not a regulation, maybe it’s a contract. I don’t know. And sometimes it’s latency, sometimes it’s the, I don’t know, regulations, or they wanted to be able to put a sticker on it. I don’t know.

[00:28:28.430] – Ned
I do like stickers. I totally understand that. What are the implications, if you do have an Outpost, for the rest of your networking? Do you need a Direct Connect to use Outposts, or a VPN tunnel? Like, how is it talking back to AWS and hooking into the rest of your cloud networking?

[00:28:47.270] – Nick
Yeah, so it’s got a local gateway on it, which, we’ve got yet another gateway now. It’s keeping us networking people in business. It’s great. So the way it works is it basically creates a secure tunnel from that gateway into AWS over whichever route it has to the APIs for Outposts. And so we announced a way to do that privately. So you can do it over like a private VIF now if you’re using Direct Connect. So some customers were getting Direct Connect and getting a public VIF. For those who may not know what a public VIF is, that’s basically when AWS advertises our entire public address space to you and you advertise some of your public address space back. And so anything that goes to AWS goes over Direct Connect.

[00:29:30.350] – Ethan
Rather than over the public Internet.

[00:29:32.150] – Nick
Yep, correct. Some customers were actually using private connectivity to get into AWS as well. So they would do like a private VIF with Direct Connect, and what is called a service link with Outposts, which basically does all that sort of control plane configuration and Outposts management from AWS, they were sending that over a private VIF to AWS using Direct Connect.

[00:29:56.360] So there were a couple of different ways to do that. The main thing to know is that they recommend a gig of throughput. They can do some lower throughputs. They would want to talk to you about that kind of stuff when you get into the sales process, but they sort of expect pretty good bandwidth there. Another one is the always-on aspect of it, because it’s the same services you get in a region, and those services sort of expect to always be connected to an AWS region.

[00:30:24.680] So some customers were like, hey, can we put this on an oil rig that’s only connected to the Internet thirty minutes a day? No, not a good idea. Even if, like, your Internet circuit goes down for ten or fifteen minutes pretty regularly, you’d want to test this stuff pretty well, because the services inside Outposts sort of expect to be in the region. They expect to be always on. There’s some cool stuff we’re doing there. But yeah, that’s some of the stuff I’m seeing around Outposts.

[00:30:52.430] – Ned
OK, and what about local zones? Because that’s a relatively new thing. I don’t know how many are deployed now. I feel like when I heard the announcement, there was like three in L.A. or something like that. So what’s going on with local zone networking versus all regions and all the availability zones in a region?

[00:31:11.860] – Nick
Yeah, so there’s this philosophical thing that sometimes we talk about at AWS, of like how many regions should there be? You know, should there be 30? Should there be one hundred? Should there be a thousand? And I don’t think we have an answer for that. Definitely not a public-facing answer. I don’t think even an internal-facing answer. And local zones are sort of that thing, because I think, for example, L.A. was a lot about like movie theaters and sort of film industry that need, like, low latency, high bandwidth, like editing, that kind of stuff.

[00:31:44.210] Right. I think networking-wise the idea is the same thing as Outposts, right? It’s supposed to be the same experience you get in the region. You know, there’s a handful of things you don’t get. For example, Transit Gateway’s not inside of a local zone right now. So you sort of have to still figure out how you want to connect those things in. But for the most part, like VPCs, Internet gateways, subnets, all that kind of stuff works.

[00:32:08.120] And so I know we announced a whole bunch of, I have to say a whole bunch now because honestly, I can’t keep track of the region counts and local zone counts. I probably should check more regularly. But yeah, I think they announced a whole bunch of local zones. I’m sort of nervous about saying a number here.

[00:32:26.420] – Ned
Whatever number you look up now is going to be wrong in a month anyway. A lot is good. Yeah, you’re right. Like with the region, you’ve got multiple availability zones. You’ve got the whole catalog of services. Not every region has every service, but it’s a much larger catalog. It sounds like local zones are going to have a slightly more limited catalog of services and more limited capacity. But, you know, if you need that lower latency, I guess, is that probably the primary reason people are adopting local zones, for the latency?

[00:32:57.790] – Nick
Yeah, you know, I think we may also start to see this in some countries as well. Like you might see a local zone show up in a particular country as opposed to a specific region. So we may see some of that, because right now at least, I think what we’re seeing in the US, where I think most of them have been announced, it’s mostly a latency play or potentially like a government, I don’t know, they say like, the state government, nothing can leave our state, because state boundaries are meaningful for cloud computing or whatnot. You may see some of the workloads like that kind of stuff.

[00:33:27.140] – Ethan
Kind of related to the edge, Nick, would be remote workers. Is there any new magic going on in AWS networking that would help this brave new world where everybody works from home, but we all have this AWS infrastructure to manage? We’ve got bastion hosts. But is that the best we can do still? Is there more we could be aware of?

[00:33:42.440] – Nick
Yeah, so I mean, I think it actually ties into a couple of things. Right. So, you know, we’ve got a client VPN service, it runs off OpenVPN. It’s a good service. It’s out there. We’ve seen obviously a lot of uptake on that lately. But around the bastion stuff, actually, you don’t need bastion hosts anymore in a lot of ways. So Session Manager has a feature where you basically run an agent on your instances and then you can access those things remotely through Session Manager. So you don’t need to put a public IP address on it. You don’t need to put a bastion host out there. All the logging, that sort of stuff works through that service. So we’re seeing a lot of customers move to that because it’s just more scalable, it’s less sort of manual, and running bastion hosts is not something I think anyone really loves. And so that’s some of the stuff we’re seeing. You know, remote workers, we haven’t seen too much change there other than obviously volume.

[00:34:40.040] I think the client VPN service has added some additional authentication options. I think SAML, OAuth type stuff. But outside of that, not a whole ton of change, actually.

[00:34:50.120] – Ned
Hmm, I’m curious about that session manager, is that something that runs through the browser or is that something I run locally on my machine to connect into these remote sessions?

[00:35:02.510] – Nick
Yeah, so I think it runs through the CLI. I have to admit, I have not used this one myself because I SSH into things like a Neanderthal

[00:35:11.530] – Ned
Don’t we all.

[00:35:14.350] – Nick
Yeah. I guess I can send a blog post on this stuff that’s pretty good. But essentially I think it’s either through the CLI or maybe through the console that it allows you to access instances.

[00:35:24.600] – Ned
Yeah, we will throw a link in the show notes for anybody who’s interested in diving more into that, because I am one of those people who is now interested to go check that out.

[00:35:33.340] – Ethan
Well, Nick, we’ve kind of worked through all of our questions that we had specifically. But I think there’s some other things you have that you wanted to highlight, especially in the vein of like, hey, maybe you used to do it this way within AWS, but now you should be doing it this way. And more towards the top of the show, for example, you mentioned Gateway Load Balancer. You talked about Amazon firewall, bring your own firewall kind of stuff. Want to talk to some of those things in more detail, and anything else that pops to mind?

[00:35:56.720] – Nick
Yeah, sure. Yeah. So like I said, I’m happy we have that sort of answer now to the firewall thing. Basically, in all my presentations, I’ve said like, here’s how to use firewalls, but there’s like a 30 second, like sort of like pharmaceutical warning like this may have unintended consequences. Your security team may not get the same experience. If you have increased downtime you’ll want to talk to your network engineer for further support. And so it’s basically to break it down.

[00:36:25.780] Right. Gateway Load Balancer is a load balancer for appliances. And so that can work, like in theory you could use it for SD-WAN as well. Or routers or, like I said, your own Linux stuff. It’s sort of purpose-built for firewalls, to be honest. The challenge before was that you had a couple of different architecture options, right? You could run it like a network load balancer and do like a firewall sandwich type thing.

[00:36:48.100] Where you have a network load balancer that goes to your firewall fleet, and your firewall fleet then forwards to another network load balancer, which actually goes to your application. You know, it’s about as much fun as it sounds. It was also like the only way to really auto scale out your firewalls. You know, we had some options with the Transit Gateway where you could create sort of VPN-based attachments and those kinds of things. It was similar.

[00:37:09.370] And I think I spent like two or three years of my life trying to make these things work with these vendors. And so Gateway Load Balancer, because like I said before, you want to create a Web application, put it across three AZs? Great, put a load balancer in front of it, put an auto scaling group, select all three AZs and you’re done. We wanted that same experience for network appliances, and so it handles a few hard problems, right?

[00:37:30.760] It handles the health checking problem. Like, is this thing up or down? It handles the auto scaling association, so you can assign an auto scaling group and that works. One of the kind of cool things we did, which I’m not sure how many people will fully appreciate, is we added a Geneve tunnel in between the load balancer and the firewall, which allows you to essentially, like, think about if you want to create firewall as a service. So let’s say that you did have a bunch of VPCs that had overlapping addresses and stuff like that, or you’re a firewall vendor and you wanted to vend this out to

[00:38:01.180] who knows how many customers. What you end up doing is sort of like PrivateLink. So PrivateLink is basically like a load balancer where you put the network interface in another VPC, kind of thing. And so you create the Gateway Load Balancer, do the Geneve tunnels to your firewalls, and then you put endpoints out in each one of your VPCs. So in your VPC, all you see is a new type of endpoint, and that’s like your firewall.

[00:38:25.510] You can send traffic to it and it’ll come back once it’s gone through the firewall. And it’s sort of like you don’t have to worry about really anything because then the firewall’s in another VPC, it’s all sort of separated. And through that Geneve tunnel, we can then send the metadata for what the incoming source was. So we can tell you like the instance and the VPC ID, those kind of things. So if you’re able to basically parse the Geneve metadata in the firewall, you can do some fancy stuff.
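To make the "parse the Geneve metadata" idea concrete, here is a minimal Python sketch of reading the fixed Geneve header and its TLV options, following the layout in RFC 8926. It is only an illustration: the option class `0xFFF0`, the option type `1`, and the 8-byte value in the sample packet are made-up placeholders, not the identifiers Gateway Load Balancer actually sends.

```python
import struct


def parse_geneve(packet: bytes) -> dict:
    """Parse the fixed 8-byte Geneve header plus any TLV options."""
    if len(packet) < 8:
        raise ValueError("a Geneve header is at least 8 bytes")
    first, _flags, protocol = struct.unpack("!BBH", packet[:4])
    version = first >> 6
    options_len = (first & 0x3F) * 4           # Opt Len counts 4-byte words
    vni = int.from_bytes(packet[4:7], "big")   # 24-bit virtual network identifier
    end = 8 + options_len
    if len(packet) < end:
        raise ValueError("packet shorter than its declared options length")

    options = []
    offset = 8
    while offset < end:
        opt_class, opt_type, length_byte = struct.unpack(
            "!HBB", packet[offset:offset + 4]
        )
        data_len = (length_byte & 0x1F) * 4    # low 5 bits, in 4-byte words
        data = packet[offset + 4:offset + 4 + data_len]
        options.append((opt_class, opt_type, data))
        offset += 4 + data_len

    return {
        "version": version,
        "protocol": protocol,
        "vni": vni,
        "options": options,
        "payload": packet[end:],
    }


# Hypothetical sample: one option (class 0xFFF0, type 1) carrying 8 bytes.
tenant_tag = (0x1122334455667788).to_bytes(8, "big")
sample = (
    struct.pack("!BBH", 3, 0, 0x0800)          # version 0, 3 words of options, IPv4 inside
    + (0xABCD).to_bytes(3, "big") + b"\x00"    # VNI + reserved byte
    + struct.pack("!HBB", 0xFFF0, 1, 2)        # option header, 2 words of data
    + tenant_tag
    + b"inner"
)
parsed = parse_geneve(sample)
```

An appliance that wanted per-tenant behavior would look up the option values it cares about in `parsed["options"]` before handing `parsed["payload"]` to its normal inspection path.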

[00:38:49.840] – Ethan
Separate your logs out per tenant, if you will. It feels like multitenancy.

[00:38:54.370] – Nick
So yeah. So we basically solved, like, my ask to the team was like, hey, give us high availability for firewalls, and they’re like, how about high availability plus multitenancy? I’m like, OK, that was kind of cool. And then we actually used that infrastructure to do network firewall as well. So we have Amazon Network Firewall. If you don’t feel like managing your own firewalls, it’s all based on Suricata.

[00:39:16.120] As far as I know, we have pretty good coverage of like the Suricata statements and rules and that kind of stuff. And so the biggest use case we see for that is, because we always argue that firewalls on AWS are different than on premises, because like, your users are doing stuff, going to like WW dot cool gambling dot bad website dotcom or whatever. And those are really easy to filter with regex. Right. But your actual AWS resources are more deterministic.

[00:39:46.150] So like, you know, they’re hitting this API and hitting this monitoring service, and like, we know which seven URLs you’re supposed to go to if you’re in AWS. And so that tends to be one of the very common ones, like, hey, only allow AWS services or only allow these six APIs that these things should be accessing, and filtering that egress is just a very easy, common thing we see people do. And now you don’t have to, like, manage more complex stuff. We can do a little more natively. So that makes me happy.
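The egress allowlist idea can be sketched in a few lines of Python. This is a generic illustration of suffix matching on destination hostnames, not Suricata rule syntax and not how Network Firewall is implemented; the `ALLOWED` entries are example placeholders.

```python
def is_allowed(hostname: str, allowed_suffixes) -> bool:
    """Return True if hostname equals, or is a subdomain of, an allowed entry."""
    host = hostname.lower().rstrip(".")
    for suffix in allowed_suffixes:
        entry = suffix.lower().lstrip(".")
        # Require a label boundary so "notamazonaws.com" does not match.
        if host == entry or host.endswith("." + entry):
            return True
    return False


# Placeholder allowlist: one wildcard-style suffix and one exact host.
ALLOWED = [".amazonaws.com", "monitoring.example.com"]
```

Because workloads only need a short, known list of destinations, a default-deny check like `is_allowed(host, ALLOWED)` stays small, which is the "deterministic" property described above.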

[00:40:12.140] – Ethan
Now, you said the Gateway Load Balancer was really aimed at firewalls specifically. But you also said I could cut through any network, well, I’ll call it an NFV, back there that I want. Do you see other use cases like that?

[00:40:25.320] – Nick
Yes. I mean, we’ve seen some customers with like the packet inspection engines of the world. The hard part is really just the Geneve tunnel that you have to do. So if you homegrow something or you have an off-the-shelf router, you just need to make sure that you can handle the Geneve tunnels on there, which is supported in basic Linux. I don’t know the details on that, but it’s there.

[00:40:44.690] – Ethan
It has been for a while. Yeah. Geneve, I keep hearing Geneve pop up more and more. It was like the odd man out for a long time and all of a sudden it’s like, yeah, Geneve. OK, I guess we’re all using Geneve now. Fine. OK.

[00:40:58.520] – Nick
Yeah. No, I’m sort of in the same boat. I thought it was like an academic protocol that would never take off. But the magic there is really the metadata. Like in VXLAN, for example, we wouldn’t be able to do the same multitenant stuff that we can with Geneve. And so that’s why we see some of that.

[00:41:21.300] – Ethan
Not with a single tunnel, you mean.

[00:41:23.870] – Nick
No, lots of tunnels, lots and lots and lots of tunnels, right, one tunnel per tenant, basically.

[00:41:27.900] – Ethan
That’s right, yeah.

[00:41:29.310] – Nick
And there’s only 16 million of those, like what could go wrong.

[00:41:34.110] – Ethan
Yeah. So you guys want the option of I can set up a Geneve tunnel and get metadata, get multitenancy as opposed to VXLAN. I’ve got 16 million VNIs what could possibly go wrong. Let’s set up a bazillion tunnels.

[00:41:45.140] – Nick
Yeah. Yeah basically.
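The "16 million" figure comes from the 24-bit VNI field that VXLAN and Geneve both carry. A quick Python check of that arithmetic, alongside the extra room Geneve leaves for per-packet options (its 6-bit option length field counts 4-byte words):

```python
VNI_BITS = 24        # VNI field width in both VXLAN and Geneve
OPT_LEN_BITS = 6     # Geneve "Opt Len" field width, counted in 4-byte words

max_vnis = 2 ** VNI_BITS
max_option_bytes = (2 ** OPT_LEN_BITS - 1) * 4

print(max_vnis)           # 16777216
print(max_option_bytes)   # 252
```

With VXLAN the VNI is the only identifier available, so tenant separation means one VNI per tenant. Geneve can keep a single tunnel and describe the source in up to 252 bytes of options.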

[00:41:46.700] – Ned
Yeah. So I’m curious what level of. So let me back this up. One of the things that I ran into when you described that sort of firewall sandwich where you’ve got an external network load balancer and an internal one and you’ve got your firewalls or whatever your virtual appliance is sitting in the middle. The problem I always ran into was if the vendors hadn’t updated their software to understand the cloud and how this should all work, then it just didn’t work because they didn’t understand the cloud.

[00:42:13.970] So have you had to work with a list of vendors to get it to work properly with the gateway load balancer, or have you done something where they don’t have to change anything? You’ve accommodated them.

[00:42:25.930] – Nick
I mean, they still need to do a little bit of accommodation, right, because they need to be able to support the Geneve Tunnel, so some are easier than others there. But outside of that, unless you want to do, like, more of the multitenancy identification stuff and go dig into the metadata of the Geneve, it should just mostly work.

[00:42:43.140] And I can’t remember, I think we launched with what, like 10 or 12 partners with Gateway Load Balancer. Most of them have like a CloudFormation template or something you can run. There’s a couple of things to know about Gateway Load Balancer. One is, in a lot of cases you can’t use NAT gateway with it, like in the same VPC. So you would naively sort of say, like, I want my instance, before it talks to the Internet,

[00:43:08.850] I want it to go through the firewall and get it scrubbed clean, healthy, nice, put a bow tie on it, whatever. And then after it comes back from the firewall, I want it to go to the NAT gateway and then go to the Internet. That’s like a very common thing that actually doesn’t work right now because the NAT gateway has to be inserted in between the Gateway Load Balancer endpoint and the Internet. So what ends up happening is you have to specify a more specific route inside the VPC.

[00:43:39.030] So if you create a VPC of like 10.0.0.0/16, you can’t create any routes for like 10.0.0.0 more specific than /16, like you can’t say, between these two instances in the same VPC, I want to put this firewall thing. That’s currently a limitation. There’s a good blog post, I could send you guys a link on this, on how to do that with Transit Gateway, where you can basically put the Gateway Load Balancer endpoint and the NAT gateways in like separate VPCs off Transit Gateway, and then you basically hairpin to the TGW to achieve that same thing.

[00:44:09.720] So we’re actually seeing probably more of customers doing it through the transit gateway method just because it’s that sort of centralized, you know.
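The route limitation described above, as it stood at the time of recording, can be expressed as a small Python check with the standard `ipaddress` module. The function name is hypothetical and this is not AWS's validation logic, just the rule as stated: a route whose destination falls inside the VPC CIDR with a longer prefix is the kind that could not be created.

```python
import ipaddress


def more_specific_than_vpc(vpc_cidr: str, route_cidr: str) -> bool:
    """True if route_cidr sits inside vpc_cidr with a longer prefix length."""
    vpc = ipaddress.ip_network(vpc_cidr)
    route = ipaddress.ip_network(route_cidr)
    return (
        route.version == vpc.version      # subnet_of() needs matching IP versions
        and route.subnet_of(vpc)
        and route.prefixlen > vpc.prefixlen
    )
```

For a VPC of `10.0.0.0/16`, a destination such as `10.0.1.0/24` returns `True` (the disallowed case), while a default route `0.0.0.0/0` returns `False`.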

[00:44:17.490] – Ethan
That’s interesting. OK, so you’re saying Gateway Load Balancer isn’t going to do NAT for me, as in it’s not acting as a proxy?

[00:44:26.490] – Nick
It doesn’t do that. It just takes traffic. It keeps it symmetric. Sends it to the same firewall. And when the firewall sends it back, it keeps it that way. So it’s like a symmetric layer three load balancer without any of the sort of proxy NAT stuff.
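One generic way to get that "same firewall in both directions" property is to hash a flow key that does not depend on direction. The Python sketch below shows that technique only; the episode does not describe how Gateway Load Balancer picks a target, and the function and appliance names are hypothetical.

```python
import hashlib


def pick_appliance(src_ip, src_port, dst_ip, dst_port, protocol, appliances):
    """Choose an appliance so both directions of a flow map to the same one."""
    # Sort the two endpoints so A->B and B->A build the identical key.
    low, high = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{protocol}|{low[0]}:{low[1]}|{high[0]}:{high[1]}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(appliances)
    return appliances[index]


firewalls = ["fw-a", "fw-b", "fw-c"]
forward = pick_appliance("10.0.1.5", 44321, "203.0.113.9", 443, "tcp", firewalls)
reverse = pick_appliance("203.0.113.9", 443, "10.0.1.5", 44321, "tcp", firewalls)
```

A plain modulo like this reshuffles flows whenever the appliance list changes, so a production load balancer would also need to track existing flows; that part is left out here.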

[00:44:39.870] – Ethan
Well, Nick, man, I just looked at the time. It’s like, wow, OK, we’ve been talking for over forty-five minutes. I think we’re wearing you out, man.

[00:44:49.650] So Nick, as you mentioned, you’ve got presentations that you’ve done at re:Invent and so on. I think you’re somewhat active on Twitter. Would you share with folks where they can either find your presentations or maybe you wrote a book or you’ve got a blog or Twitter or anything like that you want to share with people?

[00:45:05.310] – Nick
Yeah, I’m NickPowPow on Twitter. I do stuff there every now and then. I don’t know. I didn’t write a book. I helped write the study guide for networking like three years ago. Apparently that’s still somewhat accurate. And then, yeah, that’s pretty much it.

[00:45:18.480] – Ethan
Well, thanks for joining us, Nick. And again, if this is the first time you’ve heard Nick Matthews on the Packet Pushers podcast network, he’s been on some other shows, too. So you can go up to packetpushers.net, hit that search box, look for Nick Matthews’ name. And all the shows that he’s been on will appear, including the most recent one before this, which was Heavy Networking episode 433, where we talk in some detail about AWS transit gateways.

[00:45:41.580] That was just a couple of years ago, Nick. So again, thanks for making the time and for joining us today here on Day Two Cloud. And if you’re still listening, hey, man, you made it to the end. Thank you very much. Virtual high fives to you for tuning in. And if you’ve got suggestions for future shows, we would love to hear them. You could hit either Ned or I up via at Day Two Cloud show on Twitter. Ned and I both monitor that account, send us your clever ideas. And hey, we’ll talk about them. We’ll do our best.

[00:46:09.090] If you like engineering-oriented shows like this one, visit packetpushers.net/subscribe. All of our podcasts, newsletters and websites are there. It’s all nerdy content designed for your professional career development. And until then, just remember, cloud is what happens while IT is making other plans.
