Day Two Cloud 124: New Cloud Security Thinking

Episode 124

Today on Day Two Cloud, we talk about new ways of thinking about security for cloud. As organizations adopt cloud services, they’re applying on-prem security designs to cloud. Our guest is here to argue that this doesn’t work, and that you need a different approach.

Our guest is Adeel Ahmad, Implementation Services Lead at HashiCorp. This is not a sponsored show, and Adeel is speaking for himself.

We discuss:

  • The mistaken assumption that cloud security operates the same as on-prem
  • Why identity controls are essential to get right
  • Focusing on risk
  • Compliance complications
  • Hybrid and multi-cloud security challenges
  • More

Takeaways:

  1. Aim for architectural simplicity of the platform (allows for scale).
  2. Aim to reduce operational complexity (or avoid operational hazards) – (allows for distributed self-service cloud provisioning).
  3. Understand the principal (business or compliance) risk and question whether stipulated controls add any value.

Sponsor: Aviatrix

Check out Aviatrix’s Flight Training to learn about multi-cloud networking and security. It’s worth your time if you’re defining your company’s multi-cloud strategy or want to nail down your Aviatrix Certified Engineer certification. Get details and register at aviatrix.com/flight-training.

Show Links:

@devops_adeel – Adeel Ahmad on Twitter

Adeel Ahmad on LinkedIn

Transcript:

[00:00:00.190] – Ethan
Check out sponsor Aviatrix’s Flight Training to learn about multicloud networking and security from the Aviatrix perspective at aviatrix dot com slash flight-training. It’s worth your time if you’re defining your company’s multicloud strategy or want to nail down your Aviatrix Certified Engineer cert. Aviatrix dot com slash flight-training.

[00:00:25.950] – Ethan
Welcome to Day Two Cloud. Strap in, everybody. This show is basically getting on the back of a rocket ship and going for a ride. Our guest today is Adeel Ahmad, and we are talking about new security thinking for cloud. The big idea here is that a lot of folks are trying to take the security designs they built for an on-prem environment and apply them to cloud. Adeel’s arguing, very well I might add, that that doesn’t work, that it’s a dumb idea, and that it can actually lead to problems.

[00:00:53.630] – Ethan
And we need to rethink how we do security in Cloud and the rocket ship part is Adeel’s brain because the man is full throttle all the way. Wouldn’t you agree, Ned?

[00:01:02.640] – Ned
I would. And I don’t want to belabor the point. I don’t want to ramble on my own because Adeel has got so much to say. All I want to say is he asks two critical questions that everyone should be asking: the why and the so what. And if you’re not sure what that means, you will find out in this episode.

[00:01:19.550] – Ethan
Enjoy this episode with Adeel Ahmad, Implementation Services Lead at HashiCorp. Adeel Ahmad, welcome to the Day Two Cloud podcast. It’s fun to talk to you, man. I got to say, because I’ve listened to a couple hours of you on the HashiCast where you were talking about a lot of these security ideas. Ned’s listened to them as well. So we’re both pretty keen to dive into some of these concepts some more. But before we do that, you got to tell us, who are you? What do you do?

[00:01:49.870] – Adeel
Hi, Ethan. My name is Adeel. I’m from the UK, London. I work for HashiCorp, and I’ve been there for the last eleven months now. By the way, although I work for HashiCorp, whatever I’m about to say is not representative of HashiCorp. These are all my personal opinions, from the experiences I’ve picked up in this role as well as, significantly, in my last role. I worked for a tier-one investment bank, working on Google Cloud and working very closely with the UK regulations.

[00:02:22.240] – Ned
Okay, man.

[00:02:23.460] – Ethan
Got you. Okay, so we’re very clear on this. Everyone listening. Yeah, Adeel works for HashiCorp. This is not him speaking on behalf of HashiCorp. This is Adeel speaking on behalf of Adeel and all of his real world hands on experience with the security craziness, the crazy ideas you’re going to bring to us Adeel. We should start there to set the show up for everybody in a sentence. Or maybe two. Explain what SecOps folks are getting wrong about practical security in the Cloud. Maybe you could cite a few examples to help us get our heads around it.

[00:02:55.190] – Adeel
Yeah, sure. From my observation and my experience working in the cloud very closely with security, there is a common misunderstanding that the components in the cloud, the constructs in the cloud, are very much the same as the constructs on-prem, such as a VM or networking. With that in mind, there is this trend of taking the very same controls that you would apply on-prem and applying them in the cloud, especially around defense in depth, even though some of these multiple layers are not necessarily applicable in the cloud.

[00:03:41.950] – Adeel
Definitely. Especially when it comes to understanding the impact of some of these perceived risks.

[00:03:48.980] – Ned
Right. One thing that I’ve seen a lot of security professionals and even IT Ops folks do is they kind of treat the cloud as just another data center, and there’s so many more services and constructs in the cloud that they could be using, and they totally miss the boat. I think you have a couple of examples of cases where a security professional is treating the cloud like just another data center. Could you jump into one or two of those?

[00:04:12.720] – Adeel
Yes. I think one of the biggest examples is from my previous role, where I had to work hard with both the networking team and the security team to convince them that there’s no need to carry out multiple micro network segmentation in the cloud, especially when it comes to VPCs. Take Google Cloud, for example, where they have this concept of a shared VPC (I believe AWS has recently started rolling this out), where you can have multiple billing accounts or GCP projects attached to this shared VPC.

[00:04:50.180] – Adeel
Therefore, all of these different tenants are actually using the same VPC. This is what I really pushed hard with both InfoSec and our security team, having the backing of Google: I went as far as pushing for having a Dev environment and a Prod environment in what would seem to be a single subnet in Google Cloud. We must understand two things here.

[00:05:25.620] – Adeel
Most of us who work in the cloud know and understand that there is no broadcast domain. So when you think about a slash 24 subnet, for example, if there is no broadcast domain, then intrinsically two IP addresses within that range shouldn’t be able to talk to each other: because there is no ARP, they’ll be unable to discover each other in GCP. In reality, each IP address in that slash 24 is a slash 32, and therefore its own broadcast domain, and they actually route to each other provided there is a firewall rule that allows them to talk to each other.

[00:05:58.760] – Ethan
So Adeel, let’s just park right there for a second and recap that. In other words, the way routing from host to host happens in the cloud is not the same as we would think of it if we were building a traditional networking VLAN. There’s no broadcast domain as you’re talking about, so no means to discover. And in fact, they’re not even in the same what we would call layer two address space. They’re not in the same VLAN, so to speak, they have similar address space. They’re in a common block of IP addresses.

[00:06:28.930] – Ethan
But that doesn’t mean it’s functioning in the GCP cloud, in this example, like it would function on a switch on your on-prem network. So instead you’re saying what’s really happening is every host is its own standalone little domain. In order for each of those hosts to talk to one another, there’s got to be a firewall rule that permits that. And so if that’s the case, we can rethink then what the security looks like so that the hosts are protected one from another. Just doing the same old thing we did on-prem and applying it to this construct in GCP would make no sense.
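
As a rough illustration of the recap above (not something shown in the episode), the Python sketch below models a block of addresses where sharing a slash 24 grants nothing by itself, and traffic passes only when an explicit allow rule exists. The `AllowRule` class, the `can_talk` function, and all addresses are hypothetical; this is a toy model of default-deny behaviour, not how GCP implements its firewall.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowRule:
    """Hypothetical allow rule: one source host, one destination host, one port."""
    source: str
    destination: str
    port: int


def can_talk(src: str, dst: str, port: int, rules: set) -> bool:
    """Default deny: traffic passes only if an explicit rule matches exactly."""
    return AllowRule(src, dst, port) in rules


# Both hosts sit in the same /24 block, as in the Dev/Prod example.
subnet = ipaddress.ip_network("10.0.0.0/24")
dev_host = "10.0.0.10"
prod_host = "10.0.0.20"
same_block = all(ipaddress.ip_address(h) in subnet for h in (dev_host, prod_host))

# The only rule lets the dev host reach one specific peer on port 443.
rules = {AllowRule("10.0.0.10", "10.0.0.11", 443)}

print(same_block)                                  # True
print(can_talk(dev_host, prod_host, 443, rules))   # False
print(can_talk(dev_host, "10.0.0.11", 443, rules)) # True
```

In this model the address block is only a naming convenience: reachability is decided entirely by the rule set, which is the property the discussion relies on.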

[00:07:03.550] – Adeel
Exactly. That’s the point. Networking in the cloud is not networking. The constructs are not the same. I mean, they are just a facade. This is my opinion, again: I believe they’re just a facade to ease the transition for the consumer of the cloud moving away from on-prem onto the cloud. Especially because these networking constructs, whether we call them subnets, or Microsoft VNets, or VPCs, all have API endpoints. Right? So if it has an API endpoint, it’s a programmable object.

[00:07:39.220] – Adeel
When you understand that, you realize that you would manage it differently. In Google Cloud, for example, you have IAM permissions against a subnet. How do you have IAM permissions against a subnet? It makes sense to have IAM permissions against programmable objects. So you understand and realize that in reality they’re not subnets; it’s just a contiguous block. And inherently nothing in that block can talk to anything else unless there is some kind of default allow-all-internal firewall rule in place. Once you understand that, you realize: forget having multiple VPCs, why do I even need to have multiple subnets?

[00:08:16.450] – Adeel
Why do I even need to go down the whole static IP addressing? Why do I even need to do any subnet planning? Why do I need IPAM?

[00:08:23.320] – Ethan
Because your boundaries are different now. So again, on-prem, the traditional way you would design a network would be you build a block of addresses. They can probably talk to one another. If you need to firewall between a group of addresses, you have some kind of a layer three point that traffic has to route between, and then there’s firewall rules applied at that point. Access list rules, some kind of a control there. But since the paradigm has completely changed, again in your GCP example here, why would you build it that way?

[00:08:53.800] – Ethan
It doesn’t make sense when your checkpoints are in different places now. And so it does really change your network design. Again, just underscoring your point: doing design and applying a security paradigm up in the cloud exactly like you did on-prem doesn’t make sense. It isn’t the right thing. And I guess to your point, Adeel, it’s making things worse or needlessly complex.

[00:09:18.120] – Adeel
100%. And that’s the crux of it. When you read the cloud security reports and the DevOps reports that are out there, most of them show a trend: a lot of these vulnerabilities or security incidents are related to misconfigurations. I 100% believe the misconfiguration is related to the extreme complexity that we produce as consumers within the cloud. It’s important we understand that. And I believe, from my experience, there is a piece of education that as consumers we have a responsibility to take on ourselves, but our cloud providers also need to really push on the fact that there is a paradigm shift, how things have shifted, and how we should be expected to consume the cloud.

[00:10:06.350] – Adeel
I think some of that is also missing. I mean, you have public docs out there explaining that, but for a general enterprise, these are fundamental pieces that really need to be in place to help security and regulations really understand where the risk is, where the attack surface is, and therefore what the attack vector is.

[00:10:28.740] – Ned
Right, I’m wondering if the cloud providers to a certain degree. You kind of said they wanted to present these familiar constructs to the people consuming the cloud. So they called it a subnet, but that’s not really a subnet. And they called it a VPC, a virtual private cloud. But it’s not an accurate descriptor. So maybe they shot themselves in the foot a little bit by going with this familiar terminology that doesn’t actually map to the construct that it’s applied to. I guess another question that I would have is because there’s all these new features and solutions in the cloud.

[00:11:07.390] – Ned
Is there stuff that SecOps is missing out on by being so focused on the traditional way of approaching security?

[00:11:17.170] – Adeel
Let’s take this example, because we focus on these multiple lower layers. Essentially, if you look at the shared responsibility model, the clouds are very clear in saying that from the host down, everything is within the cloud’s responsibility, and they literally meant that, networking is your responsibility. But the networking on our layer is our responsibility. Well, it’s not networking, is it? Anything above that is just applications. And this is what we need to understand. Once we understand that, we realize: why don’t we start backwards, or sort of top down?

[00:11:57.870] – Adeel
Let’s start with securing the application and work our way downwards. In essence, there’s a lot of focus on the network: in my experience, when enterprises are going into the cloud, the very first thing they want to do is work on the network, and they’re asked to work on the perimeter. Before that, no one’s allowed to get in. So the perimeter needs to be secured. Right. But there is no perimeter. Even though the clouds have been saying there is no perimeter, why do we still focus on the perimeter?

[00:12:28.380] – Adeel
Because the penny has to drop for us to understand that a perimeter would mean that there is a broadcast domain and that you’re limiting this broadcast domain with the perimeter, which then makes you think it’s okay: even if the data is somewhat open, we have a secondary protection here that it can’t get out of. But it’s not true, right? For example, say you create a GKE cluster, you’ve put it inside the VPC, and you’ve now added a perimeter around the VPC, so anything that goes in and out must go through some virtual network appliance.

[00:13:01.860] – Adeel
But the moment you turn on a public IP address to reach GKE, it’s not using your Palo Alto or F5 or whatever virtual appliance it is that you’ve got. It’s now going through the back end of GCP’s underlay and from there going out to the Internet. So there is no perimeter.

[00:13:23.470] – Ned
It’s something we’ve brought up multiple times, especially in the context of networking: traditionally, in your data center, there are only a couple of ways out, and those ways out are guarded by metal boxes sitting there being the sentry for you, whether or not those were always configured properly, and usually with just a tangle of firewall rules that no one can actually comprehend. But those physical boxes were there. When you move to a cloud construct, you don’t have that physical box sitting there. And any developer who has sufficient permissions can just say, oh, I want a public IP address assigned to my instance, and they’ve got it.

[00:13:59.790] – Ned
And suddenly you have another entry point into the network. So what’s the answer? Instead of trying to set up this fake perimeter, I think you sort of alluded to it with the idea of identity being a big component and service accounts within GCP would be an example of that.

[00:14:18.850] – Adeel
It does all come down to identity. I was previously a network engineer, and we are all familiar with the concept of segregation around it, but we’ve never called it identity. Only now, when we’re exposed to these ideas, do we understand that we were treating IP addresses as identity, right?

[00:14:42.590] – Ethan
In fairness to the industry, there have been some attempts at identity, but there’s never been any one consistent theme, beyond typically the five-tuple, that ever became identity consistently across vendors. You did see some vendors doing some fancy stuff and adding a lot more metadata to give you a more intelligent identity of what that flow was, user context, and so on. But it’s been not industry standard, shall we say, Adeel. So just to go back to your point, yeah, IP address kind of has ended up being the default identity, for all of its shortcomings.

[00:15:17.730] – Adeel
100%. And what we need to understand is that the arena has changed, in that the identity is no longer the IP address, or one must accept that the identity is now operating at a different layer. To the extent that even networking professionals, in my opinion, need to be application-aware and understand that the application itself has now become the identity. And if you have, for example, two applications, you’re able to identify them and allow or deny access. Let’s put this in a networking perspective.

[00:15:52.810] – Adeel
When we have firewalls, we would allow a contiguous block that’s assigned to a set of applications to talk to another contiguous block that’s assigned to another set of applications. At this point, what we’re saying is that app A from this set of applications is only allowed to talk to app C from that set of applications, which is far more secure than allowing big contiguous blocks to talk to each other, because with those you don’t have any control. For example, if there’s a three-tier application, you have your web tier, app tier, and DB tier.

[00:16:24.580] – Adeel
Well, you’ll only allow the app tier to talk to the DB tier in another application or in another project. You can’t do that with networking unless you start assigning, as you say, making those subnets even smaller.
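
To make the contrast concrete, here is a minimal Python sketch, under stated assumptions, that compares a block-to-block rule with an identity-to-identity rule for a three-tier application. The identity strings, the `workloads` mapping, and both helper functions are hypothetical and only illustrate the idea; they do not correspond to any cloud provider’s API.

```python
import ipaddress

# Block-style rule: any host in the first block may reach any host in the second.
cidr_rule = (
    ipaddress.ip_network("10.1.0.0/24"),
    ipaddress.ip_network("10.2.0.0/24"),
)


def cidr_allows(src_ip: str, dst_ip: str) -> bool:
    """True when both addresses fall inside the allowed source and destination blocks."""
    src_net, dst_net = cidr_rule
    return (
        ipaddress.ip_address(src_ip) in src_net
        and ipaddress.ip_address(dst_ip) in dst_net
    )


# Identity-style rule: only the app tier of project A may reach the DB tier of project B.
identity_rules = {("app-tier@project-a", "db-tier@project-b")}


def identity_allows(src_identity: str, dst_identity: str) -> bool:
    """True only when this exact pair of identities has been allowed."""
    return (src_identity, dst_identity) in identity_rules


# Hypothetical workloads; the addresses could change without affecting identity rules.
workloads = {
    "10.1.0.5": "web-tier@project-a",
    "10.1.0.6": "app-tier@project-a",
    "10.2.0.9": "db-tier@project-b",
}

# The web tier reaches the database under the block rule...
print(cidr_allows("10.1.0.5", "10.2.0.9"))                              # True
# ...but not under the identity rule, which only names the app tier.
print(identity_allows(workloads["10.1.0.5"], workloads["10.2.0.9"]))    # False
print(identity_allows(workloads["10.1.0.6"], workloads["10.2.0.9"]))    # True
```

Getting the same precision from address blocks alone would mean carving each tier into its own small subnet, which is the planning overhead described above.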

[00:16:36.810] – Ethan
Okay, so it sounds like you’re talking about different groups of applications that are classified by some kind of metadata. Things like IP addresses are ephemeral, so you can’t bank on them as new nodes come up in an app pool, let’s say. Because we’re doing auto scaling, the IP address is going to be whatever it is; it really doesn’t matter. You still need to be able to enforce a security policy no matter what IP was assigned. And so you’re talking about identity again, higher up the stack, where that identity determines what that flow is.

[00:17:05.020] – Ethan
But what I’m not clear on, from this perspective, is where you’re talking about enforcement happening. Is there still a control layer in there that is mapping whatever that IP address, ephemeral though it is, just happens to be and doing an IP drop? Or are we talking about drops and blocks happening somewhere else, not at the IP layer at all?

[00:17:25.110] – Adeel
The point is, if we accept that the IP address is no longer an accepted identity, then all of these controls that we talk about, especially flow controls, are only around identity. And if the only accepted identity at that point, for example in Google Cloud, is the Google Cloud identity, then why would you add these other controls that are not attached to an identity, or that you’re forming as your own identity? They’re not identities. For example, say an application’s service account is allowed to access a GCS bucket, right.

[00:17:56.120] – Adeel
How do you apply a network layer to that, and how do you identify that?

[00:18:03.330] – Ethan
So you’re moving enforcement way up the stack. I mean, are we saying firewalls don’t matter anymore because of things like IAM controls?

[00:18:13.590] – Adeel
What I’m saying is IAM is the new firewall, even the firewall as a concept. Let’s go back to GCP firewall rules. GCP firewalls, although they’re managed centrally, are actually enforced at the host level. It’s actually a host-based firewall. It’s distributed. Right. Take the traditional concept of the firewall: it’s a centralized device that we expect all the traffic to come through, and at that point the firewall decides which way it goes. Right.

[00:18:49.750] – Adeel
But the moment we start taking a more decentralized enforcement approach, the concept of firewalls is not applicable here unless we say distributed host-based firewall. Again, if we say hosts are ephemeral, then the firewalling is now taking place at the application. An example would be Envoy proxy, let’s say, as that enforcement point, and then you have a central control plane, which is essentially what a service mesh like Istio does. As another example, let’s go back to, say, Google Cloud or AWS. Their control plane is IAM, and the enforcement, essentially, is happening at the underlying API endpoint. If you’re calling a GCS bucket or if you’re calling a VM, that API endpoint is protected by IAM and by an entity or principal that’s allowed to call the endpoint.
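
The idea of enforcement at the API endpoint can be sketched as a simple authorization check, shown below in Python. This is an illustrative toy model only: the role names, permission strings, principals, and the `is_allowed` function are all assumptions made for the example and are not the real IAM data model or API of Google Cloud or AWS.

```python
# Hypothetical roles and the permissions each role grants.
ROLE_PERMISSIONS = {
    "viewer": {"objects.get"},
    "writer": {"objects.get", "objects.create"},
}

# Hypothetical policy bindings: resource -> role -> set of principals.
bindings = {
    "bucket/reports": {
        "viewer": {"sa-app-a"},
        "writer": {"sa-app-c"},
    },
}


def is_allowed(principal: str, permission: str, resource: str) -> bool:
    """Check performed at the endpoint on every call; anything not bound is denied."""
    for role, members in bindings.get(resource, {}).items():
        if principal in members and permission in ROLE_PERMISSIONS[role]:
            return True
    return False


print(is_allowed("sa-app-a", "objects.get", "bucket/reports"))     # True
print(is_allowed("sa-app-a", "objects.create", "bucket/reports"))  # False
print(is_allowed("sa-unknown", "objects.get", "bucket/reports"))   # False
```

Note that the caller’s IP address never appears in the decision; the only inputs are who is calling, what they want to do, and on which resource.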

[00:19:37.470] – Ned
Right. Because we’re not just talking about virtual machines anymore with IP addresses. We’re talking about all the other cloud services that exist inside a public cloud, and those don’t have nice tidy IP addresses or ranges that you can necessarily assign. So we need that extra metadata to control access to those. The pushback I would have on this is the firewall doesn’t just do allow/deny lists based off of tuples, right. Some firewalls also do packet inspection, maybe even looking for suspicious traffic or malformed packets or requests that aren’t expected, and they’re filtering out that sort of stuff, too.

[00:20:16.580] – Ned
And if someone has managed to compromise, say, your application servers somehow and now they’re trying to land and expand. I don’t see the identity piece working to filter out what could potentially be a lateral attack. So how do you guard against that vector without a firewall doing that inspection?

[00:20:35.010] – Adeel
I’ll disagree, right, because the assumption here is that if you’re already taking a single-layer control, which is the identity-based approach, then you’re already applying the principle of least privilege access. Essentially, an application would only have access to what it needs. So if there was some vulnerability or some rogue action taking place within that application, your firewall will not prevent it from accessing what it already has access to. Right. And as for those applications that it never had access to in the first place, well, the firewall hasn’t really added any value.

[00:21:16.350] – Ethan
Well, okay, I’m going to push back, too, because Ned’s argument is the Firewall is going to see certain packets that trip a signature that fails some sort of a deep packet inspection and discard them before they ever get to the application.

[00:21:30.750] – Adeel
What’s the problem if it gets to the application?

[00:21:34.170] – Ethan
Let’s say the application is not patched to current vulnerability standards. Whatever’s come through, that payload can take advantage of that vulnerability.

[00:21:44.530] – Adeel
Well, there’s two things, right. Okay. I accept that that’s a scenario that can happen, at which point we’re now trying to cover or mitigate a different risk here. Right. The risk we’re trying to mitigate here is this unpatched vulnerability. So how does this unpatched vulnerability become a risk? Because of human error: they forgot to patch it. For you to think there is a potential risk in this scenario, we need to understand the context, which is that we do have a human-intervened process in place.

[00:22:25.920] – Adeel
Therefore, there is this risk of the inevitable human error, and at that point we think the control to mitigate that human error is this firewall. Do we not think that it’s the wrong control that’s been applied here? If the risk is human error, then surely the control that needs to be applied to the human error should be removing the human altogether.

[00:22:47.530] – Ned
Well, I think a good example of this would be a SQL injection attack, right. I have an application server. It should be talking to the SQL server through whatever interface, but somehow it’s been developed incorrectly, and there is this SQL injection vulnerability. If you have something in the path, say something like a WAF or something along those lines, that’s looking at that layer seven and going, oh, that looks like a SQL injection attack, so I’m just going to reject that, I think there still could be value in putting a device between the two different things.

[00:23:19.860] – Ned
Maybe not doing that through a traditional firewall, but still some type of filtering.
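
For readers less familiar with the vulnerability being debated, the following Python sketch shows what "developed incorrectly" can look like and how a parameterized query closes the hole in the application itself. It uses the standard library `sqlite3` module with an in-memory database; the table, the data, and the function names are made up for the example, and nothing here is taken from the episode.

```python
import sqlite3

# In-memory database with a small, made-up users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)


def find_unsafe(name: str) -> list:
    """Vulnerable: user input is concatenated straight into the SQL text."""
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_safe(name: str) -> list:
    """Parameterized: the driver passes the input as data, never as SQL."""
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


payload = "x' OR '1'='1"

print(len(find_unsafe(payload)))  # 2: the injected OR clause matches every row
print(find_safe(payload))         # []: no user is literally named like the payload
print(find_safe("bob"))           # [('bob',)]
```

A filtering device in the path and a fix in the code address the same risk at different layers, which is the trade-off the speakers go on to discuss.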

[00:23:26.890] – Adeel
I agree. Right. In principle, as in there needs to be a function that is able to detect and prevent that from taking place. I agree with that. Right.

[00:23:34.550] – Adeel
Whether that’s the firewall or not, that’s debatable. In my view, where this detection and control would or should take place is something like this (this is an example; I don’t think it exists today). Go back to the distributed Envoy proxy scenario. At this point in time, when we apply these kinds of service controls, such as service A can talk to service B, that enforcement takes place on the ingress of the Envoy proxy. However, if we were able to apply similar controls around, say, a SQL injection attack, with the ability to detect it or even to say, don’t allow anything outside of XYZ, and also apply that on the egress of an Envoy proxy, then you have a more distributed way of managing that instead of trying to centralize everything. I accept that at this point I don’t know whether this exists or not. But the point is that if we were to push for that today, we might be able to develop something like it. Our problem is that we try to rely on the current technology and apply that in the cloud, especially when...

[00:24:53.210] – Ethan
Part of this, I think, is driven by compliance stuff. So I’ve had highlighted here in the notes that I definitely want to hit this compliance thing. I’ve supported PCI, SOX, and HIPAA environments in the US, for example, and the way some of those regulations are written, they can be fairly prescriptive: you need to have a firewall here to separate these things, and so on. Are those regulatory bodies and compliance regulations keeping up with public cloud and the fact that the paradigm is changing, and that you can have the same security they intend for you to have, but in an updated, modern way, not just replicating what we’ve done on-prem?

[00:25:31.550] – Adeel
Yeah. So I accept that there is that, right. The regulations can be prescriptive and actually go to the nth degree to dictate or describe how to isolate data from unauthorized access. In the UK, I know that the FCA, which is the Financial Conduct Authority, takes its guidelines from the NCSC, which is the National Cyber Security Centre. They’re actually very up to date and have recently published guidelines around zero trust architecture in the cloud. Their guidelines are more high level and more of a recommendation.

[00:26:15.480] – Adeel
So the emphasis of the guidelines is the data, and ensuring that the data is isolated from unwanted access, and then they start providing recommended guidelines around that: okay, this can be done through, say, IAM; this can be done through mutual TLS. But actually the funny thing is they say that if the network is too large, such as an enterprise network, it should be treated as if it were the public internet; therefore, don’t trust the network. I appreciate and agree this will vary from country to country, but I strongly believe, and I think it’s one of the reasons why I joined the team at HashiCorp, that there needs to be a push from these vendors as well as enterprise consumers.

[00:27:05.700] – Adeel
When I was working in my last role, for example, we did come across these roadblocks, and we did start talking to the FCA, and we got clarity there. So in our case, it was more that the guidelines were not so clear; once we had directly engaged with them, we got clarity. But I do believe there’s a massive gap here, especially in the effort of building more clarity, or, let’s just say, working together with the likes of the FCA or PCI to educate them on the nature of cloud and how it works, so that they can be less prescriptive. And also for the cloud providers, as well as the cloud-based vendors, to understand how their products, or solutions within their platform, should be built to be PCI compliant, or how they can be built to be HIPAA compliant.

[00:28:10.150] – Adeel
Now, these are efforts that I 100% believe need to take place, especially given that they build the platform with their own opinion on how it should be. Therefore, I also believe it’s their responsibility to reach out to these regulations to help educate them and then help rewrite guidelines so that they are easy to consume.

[00:28:32.250] – Ned
Got you. So I will say that, to my knowledge, and I don’t know about GCP, both Azure and AWS do have guidance docs, architecture docs that recommend an architecture for PCI or HIPAA. Now, assuming you follow that, they don’t guarantee you’re going to pass, because they don’t want that legal responsibility. But they do have at least the guidance on that, and it takes advantage of some of this stuff. But I think it’s really an education component, partly for the cloud engineers. They need to educate themselves on different options out there.

[00:29:03.010] – Ned
But additionally, those cloud engineers also need to bring the compliance security folks into the conversation and let them know, hey, these are your options. And this is a compensating control for this. You no longer have to go with the traditional approach. There is another approach that meets the same ultimate goal.

[00:29:23.610] – Adeel
Yeah, there’s two things there, right? I mean, you’re right. There need to be more of those kinds of guidelines out there to say, hey, these are compliant. And I am certain that they probably read the guidelines, and based on the guidelines, this is the solution that they built, and they believe it to be compliant. They’re not saying certified, because the regulations haven’t certified it. But more so, even though I’ve read those guidelines, right, when I read those solution architectures or recommended reference architectures or recommended patterns, I realize that they haven’t gone to the regulations and said, hey, you need to change how you prescribe your guidelines, because look how I found it.

[00:30:06.080] – Adeel
Rather, what they’ve done is pull the public docs, and based on that, they’ve built out the solution architecture. Because if you look at, for example, the PCI compliant ones, they’re still recommending network segregation. They’re still recommending a separate VPC. And I’m talking about Google Cloud as well, right. Even though there are constructs where they know that the isolation is already achieved, they still recommend a separate VPC, maybe because they want to get something out quickly that is easily consumable, that people just sign off on and go on with, rather than actually going back to the regulations to say, hey, this network segregation that you’re talking about, it’s unfair.

[00:30:46.780] – Adeel
What is it that you ultimately want to achieve here? Is it application isolation? Is it network isolation? If so, we’ve already achieved that. Why are you prescribing network segmentation? As an example, this is the education that needs to happen. But when you look at the guidelines, they don’t reflect that.

[00:31:01.410] – Ethan
[AD] I’m rudely cutting into this conversation to ask you where you’re at with your multi cloud networking strategy. Because a few different multi cloud networking vendors.

[00:31:08.170] – Ethan
They’ve come on as podcasts and they’ve shared their approach here on the Packet Pushers Podcast Network. One of those vendors is today’s sponsor Aviatrix. And in fact, you heard from Aviatrix engineers and a customer as Ned and I nerded out with them on the day two cloud podcast, episode number 113. We covered their data plane that’s common across all the different clouds, giving you consistent network operations. Now, if Aviatrix isn’t a company name, you know very well, don’t just blow them off.

[00:31:35.470] – Ethan
I challenge you to consider all vendors that might solve your problems. And Aviatrix is going out of their way to make it easy for you to include them in your upcoming multi cloud networking Bake off. First, they are well funded. So they’re going to be around for a long time.

[00:31:49.390] – Ethan
Tell your boss, Aviatrix just closed a $200 million Series E funding round if you get asked. Second, Aviatrix is also offering Nerdy deep dives for you, the engineer, so that you can make an informed, nuanced decision about whether Aviatrix is the right multi cloud networking strategy for your organization. They call it flight training, and you can go for a 90 minutes hands on lab.

[00:32:10.840] – Ethan
A five hour deeper instructor led hands on experience and even prep for the Aviatrix Certified Engineer certification. So give day two cloud episode 113 a listen and then visit Aviatrix dot com slash flight-training to find out more. I’m hoping to take the five hour flight school training sometime myself soon if they can find room for me. Again, that is Aviatrix dot com slash flight-training and let them know you heard about it on the Packet Pushers Podcast network. And now back to today’s episode. [/AD]

[00:32:44.490] – Ethan
Well, the joke about these regulations, too, is that being compliant with the regulations does not necessarily mean that you’re secure as well. Something else. They’re a guideline. They’re a good place. You can go an awful long way with them, but just being compliant with a particular regulation does not guarantee a secure environment. And you’re coming at it from a different way, saying, hey, we can be compliant and not meet the regulation, or we can be secure and not meet the regulation and not be compliant, Adeel.

[00:33:16.030] – Adeel
There’s a good example of that. I can give you a good example, actually. So in my previous role, for example, they tried to anchor on the fact that we must encrypt sensitive data or confidential data with our own keys, especially PII data, and it has to be with our own keys, and we must demonstrate control and rotation, etc., etc. So I started looking deep into the guidelines, or the regulations and the prescriptions there. What do they say about whether you should encrypt with your own keys? First of all, they say that it’s actually enough for you to receive an SLA from your cloud service provider.

[00:34:00.310] – Adeel
This is, again, I’m talking about the UK here. As part of the shared responsibility model, it’s enough for you to get an SLA from your platform, as long as they have encrypted it, and they are managing it, and they are able to provide an auditing report. They demonstrate the whole rotation and management of those keys and the auditing capabilities around those, right? So their reports or annual reports are enough for the likes of the FCA to use and accept as compliant. That’s the first thing. The second thing, though, is, okay, let’s go with the scenario where we must demonstrate the ability to rotate and own those keys, et cetera.

[00:34:42.320] – Adeel
So in Google, they have something called CMEK, the customer-managed encryption key, where you would use Google Cloud KMS to generate a KMS key, and then you would encrypt a GCS bucket or a hard drive with this KMS key, and security professionals are believing that.

[00:34:58.590] – Adeel
Okay, this is how we are demonstrating the whole management and rotation of the key. And technically, we own the key because it’s under the ownership of the bank’s account. The truth, though, is that with a CMEK-enabled GCS bucket, well, what they mean there is that, first of all, the KMS key is not what’s encrypting the data. That’s a KEK, right, the key encryption key, which is encrypting the GCP-owned DEK, the data encryption key. So all you’ve done is demonstrate the rotation and management of a key that’s encrypting the key and not the data.

[00:35:38.900] – Adeel
So actually, you’re not compliant, but you’ve got a false sense of control, and all you’ve done is actually added an operational hazard and increased operational complexity. Because with KMS keys, there’s that danger where if you delete that KMS key, you’ve lost that data forever.
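
The KEK and DEK relationship described here can be sketched in a few lines. This is a conceptual model only, not how Google Cloud implements CMEK: `toy_encrypt` is a made-up, insecure XOR stream cipher standing in for real encryption, all variable names are hypothetical, and modelling rotation as a re-wrap of the DEK is an assumption made for illustration.

```python
import hashlib
import secrets


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy XOR stream cipher built on SHA-256. Illustration only, NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))


toy_decrypt = toy_encrypt  # XOR with the same keystream undoes the encryption

# Provider side: a data encryption key (DEK) encrypts the object itself.
dek = secrets.token_bytes(32)
data = b"customer record"
stored_ciphertext = toy_encrypt(dek, data)

# Customer side: the KMS key acts as a key encryption key (KEK) wrapping the DEK.
kek_v1 = secrets.token_bytes(32)
wrapped_dek = toy_encrypt(kek_v1, dek)

# Assumed rotation model: the DEK is re-wrapped under a new KEK.
# The stored data itself is never re-encrypted.
kek_v2 = secrets.token_bytes(32)
rewrapped_dek = toy_encrypt(kek_v2, toy_decrypt(kek_v1, wrapped_dek))
ciphertext_after_rotation = stored_ciphertext

# Reading the data means unwrapping the DEK with the current KEK first.
recovered = toy_decrypt(toy_decrypt(kek_v2, rewrapped_dek), stored_ciphertext)
```

In this model, rotating the customer key changes only the wrapped DEK, and losing every copy of the KEK leaves the DEK, and so the data, unrecoverable, which is the operational hazard raised above.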

[00:35:59.050] – Ned
Right. I think that’s an important thing to really draw out a little bit. And what you’re talking about is we have this idea in security of defense in depth that I need multiple layers of security and more layers is probably better. Right?

[00:36:13.320] – Ned
Because if they get through one layer, oh, there’s another layer. Now you can’t get through that one. But each of those layers, like managing your own key, that is another layer of administrative burden, complexity, and a possible failure as well. Because like I said, if you mishandle the key, or you lose the device that has the original key on it, you’re kind of up the creek without a paddle, as it were.

[00:36:39.410] – Adeel
100%. There is that piece, right? But again, you might have some security professionals that will show no empathy towards that, right? They’re showing no empathy towards the operational complexity or any workflows around that, simply just for the mandate: you must have it. How you do it, that’s up to you. Okay, fine. Fair enough. You’re security. Let’s go with the security perspective, right? Have you considered that each of these layers is an additional attack surface? Therefore, there’s an attack vector here. I’ll give you one example, from my last role again. When I was rolling out HashiCorp Vault, I ensured that everyone has access to Vault.

[00:37:18.040] – Adeel
There is no networking restriction on who can access Vault. Every client is essentially a Vault client, everyone is a Vault client, and we will control the RBAC through Vault policies. And the security team’s response to that was that we should have a broker. We should also have multiple layers of load balancers or firewalls. Actually, two layers of firewalls. And then obviously having a broker in between all of this stuff, right? To prevent what? What are you trying to prevent? What are you trying to mitigate here? A DDoS attack.

[00:37:53.410] – Adeel
I said, if they do a DDoS to the firewall, your Vault services are unavailable, right? Let’s consider that. Take all these other layers. If those layers were then compromised, and when I say compromised, at this point I mean they had a DDoS attack and therefore normal traffic can’t go through, then your Vault service is unavailable. So the impact is just as much as if you had your Vault service open to all the other clients. So really the problem hasn’t gone, right?

[00:38:24.470] – Ethan
You’re arguing the difference between designing a holistic security system where every step is designed to work as an integrated whole versus I on this team am responsible for this piece. And so therefore, I have to have that piece in place so that when things go sideways, I don’t get blamed, which sadly, is what often happens.

[00:38:42.030] – Adeel
That’s the truth. But the truth of the matter is, let’s take it back to a business level. Why is an enterprise going to the cloud? The bottom line is they want to save money. They’ve been promised. They’ve been given an ROI to say, if you go into the cloud, with the X number of VMs that you had on prem moved to the cloud, you’re saving XYZ money. But what’s not apparent there is that only if you were to design and consume the power consumption model this way, then you will save this money.

[00:39:11.780] – Adeel
Right. Which means that actually, those silos that you’ve created, or rather that you have on prem, and those human-intervened change review processes you have, if you want to apply that in the cloud, you’re actually more expensive. And what happens, and you’ll see this in my experience, is that in the third year of the program, the plug is pulled because it’s too expensive. We don’t see the returns, and they actually come back to on prem again. This is what you see, because the exec team doesn’t understand why they’re not saving the money that they’ve been promised.

[00:39:45.590] – Adeel
The point I’m trying to make here is that security or governance and all of these siloed functions probably don’t have, or deliberately don’t have, the business context. And they should. For example, let’s talk about some of the security controls and the impact of such. If, for example, there is a risk, I don’t know, let’s come back to the example of lateral movement and networking. So we have full confidence that all applications are secure at the top layer. However, security professionals are demanding and mandating that we should start specifying which ports are to be open on each of these hosts, with host-based firewalls.

[00:40:31.030] – Adeel
And in addition to that, let’s just add all of this stuff, right? What’s the risk here? We say it’s just best practice. The thing we need to understand here is, if there are no risks, well, let’s just say we did get a rogue attack, or rather unauthorized access into the network. My response will be, so what? Right? If there is no impact, why are you investing up front? Again, I’m not dismissing that these controls be applied, but I think if you take the business impact into consideration, it helps you prioritize those controls.

[00:41:10.540] – Ethan
Yeah. You’re arguing for risk assessment. What is the risk? And if the risk is tolerable enough, why are we killing ourselves with all of this complexity, either in the design or the cost to put this control point in? If the risk is tolerable, we can handle it. If we get hit with that thing, it’s fine.
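
One way to make the "so what?" question concrete is to put rough numbers on each proposed control and rank them. The sketch below is a hypothetical scoring scheme, not a method described in the episode: the `Control` class, the net-benefit formula, and every figure are made-up illustrations.

```python
from dataclasses import dataclass


@dataclass
class Control:
    name: str
    likelihood: float      # estimated chance per year that the risk occurs (0..1)
    impact: float          # estimated business cost if it does occur
    operating_cost: float  # estimated yearly cost of running the control

    @property
    def expected_loss(self) -> float:
        return self.likelihood * self.impact

    @property
    def net_benefit(self) -> float:
        # Assumes, for simplicity, that the control removes the risk entirely.
        return self.expected_loss - self.operating_cost


# Illustrative, invented figures only.
controls = [
    Control("host firewall port allow-lists", 0.05, 10_000, 40_000),
    Control("identity-based access policy", 0.30, 500_000, 20_000),
    Control("customer-managed key on base images", 0.01, 0, 15_000),
]

prioritized = sorted(controls, key=lambda c: c.net_benefit, reverse=True)
worth_doing = [c.name for c in prioritized if c.net_benefit > 0]
```

A control whose impact estimate is zero drops to the bottom of the list however "best practice" it sounds, which is the prioritization argument being made in the conversation.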

[00:41:28.070] – Adeel
Yeah. Especially, what you need to understand is, if you think, what’s the harm in doing that, is it not better to have multiple layers anyway, have you not considered the unintended consequences that have come about because of those? Right. Let’s understand that and start weighing up the pros and cons. And this is what we’re not doing, because again, the silos haven’t gone. We’re so isolated, with actually not having contextual architecture. And this is extremely important, especially even with this KMS piece, right? One of the things one security professional and I had a long debate about, let’s use the example of the VM image, is that security professionals were mandating that those VM images must be encrypted.

[00:42:18.230] – Adeel
With the KMS key. I asked them the difficult question. Why?

[00:42:26.690] – Adeel
Because someone might pull the image down from the GCS bucket and then fire it up on VMware or VirtualBox. This is where we need to understand that a VM image in the cloud is not the same as a VMDK, as an example, right? It’s not files. It’s an object that is represented as a file to us. But in reality, it’s sharded pieces of multiple streams that come together to represent this VM image to us. When you understand the underlying engine, the storage engine, for example, the image engine, right. They’re built up of multiple components Google Cloud has published.

[00:43:04.550] – Adeel
I know I’m always referencing back to Google Cloud, only because that’s where the majority of my cloud experience is. They publish all these white papers to explain how Andromeda works, how Colossus works. They will tell you how all of these images, in reality, in the back end, are just sharded into different types of objects. And even if you did, for example, happen to bring them all together, you can’t spin them up with VirtualBox as a VMDK. So that’s the first piece of understanding, right? But let’s just say you can.

[00:43:34.740] – Adeel
So what? I will ask the question again. So what? Okay, so someone managed to download a VM image that was baked by the bank. Okay, why are you baking a base golden image with sensitive information in it? A base image should just have a manifest of what the bank presumes to be a secure image. For example, it should have the Splunk agent installed, it should have different monitoring, but not any sensitive information, like a password to connect back home.

[00:44:18.990] – Adeel
Again, I know I’m being contextual here, right? But in a scenario where you have a secrets manager, or you have something like Vault, then essentially you would spin up Vault Agent, and once booted up, that would pull all these secrets or credentials at runtime and place them in the necessary ini file, and then those other agents will operate based on that. In our scenario, we did that, right? We did have our base images that way. So I don’t understand what the problem is if someone did, say, manage to find one. First of all, with the nature of the cloud, even an internal, say, Google employee can’t actually steal a disk and find anything in there.

[00:45:01.340] – Adeel
Secondly, the bucket where it is stored, again, that bucket is a back-end endpoint which no one else has access to. Let’s just say forget all of that, right? Forget all these other controls. Even if it was properly bare and open, what’s the problem? And I think that should be the first question we ask before we start creating all these controls for a nonexistent risk.
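
The boot-time pattern described in this answer, a generic image that carries no secrets and fetches them at runtime after proving its identity, can be modelled with a small in-memory stand-in. Nothing here is the Vault API: `SecretsManager`, `issue_identity_token`, the signing key, the instance name, and the secret path are all hypothetical, and an HMAC-signed token is only a simplification of a platform-issued machine identity.

```python
import hashlib
import hmac

# Stands in for the cloud platform's identity service (hypothetical key).
PLATFORM_SIGNING_KEY = b"hypothetical-platform-key"


def _sign(instance_id: str) -> str:
    return hmac.new(PLATFORM_SIGNING_KEY, instance_id.encode(), hashlib.sha256).hexdigest()


def issue_identity_token(instance_id: str) -> str:
    """What the platform would hand to a VM at boot to prove who it is."""
    return f"{instance_id}.{_sign(instance_id)}"


class SecretsManager:
    """Toy secrets manager: verifies identity first, then checks policy."""

    def __init__(self, secrets_by_path, policies):
        self._secrets = secrets_by_path
        self._policies = policies  # instance_id -> set of readable paths

    def read(self, token: str, path: str) -> str:
        instance_id, _, signature = token.rpartition(".")
        if not hmac.compare_digest(signature, _sign(instance_id)):
            raise PermissionError("identity could not be verified")
        if path not in self._policies.get(instance_id, set()):
            raise PermissionError("policy does not allow this path")
        return self._secrets[path]


manager = SecretsManager(
    secrets_by_path={"monitoring/splunk-token": "s3cr3t"},
    policies={"vm-app-01": {"monitoring/splunk-token"}},
)

# At boot, the VM presents its identity and pulls only what it is allowed to read.
token = issue_identity_token("vm-app-01")
splunk_token = manager.read(token, "monitoring/splunk-token")
```

Because the secret lives only in the manager, a copy of the base image on its own yields nothing sensitive, which is the point of the "so what?" question about someone downloading it.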

[00:45:24.000] – Ethan
I love the Adeel school of security. So what? So what if it happens? Is that so bad? It’s really an important question.

[00:45:32.620] – Ned
I like the two things that you keep coming back to: so what, and why? When someone comes to you with the control they want you to put in place, what’s the actual risk that you’re trying to mitigate against? And why is that a problem? The thing that I keep coming back to is you’re talking about all these additional services and options and features that exist in the cloud that allow you to approach a security issue from a totally different light, like having a secrets manager where you can store all of that sensitive data, so you don’t have to bake it into the golden image.

[00:46:06.940] – Ned
You can just have it dynamically, pull at boot up, and you can configure that Secrets manager to only accept requests from a validated identity, which gets back to our identity conversation. That VM has an identity on any of the clouds. They all have some version of that. And if it can’t verify the identity, the Secrets manager goes no, you can’t talk to me and get that information. So that completely makes sense to me. One last thing I want to bring up, and this is because we’ve really been focusing on GCP, and obviously there’s at least three big clouds out there and then other clouds as well.

[00:46:45.980] – Ned
Alternative public clouds, I’ve heard them called. How do your concepts map to a multi cloud world where an organization isn’t just dealing with GCP, but they’re also dealing with Azure and AWS, let’s say?

[00:47:06.190] – Adeel
That’s a gap. The reason why I say this is because if you were to consume AWS or Azure or GCP, and just that cloud alone, you’re taking advantage of their native cloud identity system to be able to manage that, right? The moment you start going multi platform, given that we don’t really have a good story around a vendor-neutral, solid, accepted identity system, it becomes quite difficult to do. From a user base. Let’s take it back to our user base. Right.

[00:47:41.960] – Adeel
We have multiple identity providers now, like Okta, like Centrify, all of these different ones for a human. And they’ve got a good story now where they try to add all these different factors to identify you properly. For example, the location, what device you’re coming from, what kind of plans on your phone, as in the usual business hours for you. Other factors. Right. And all of these build this kind of trust to say, okay, actually, you’re authorized to access XYZ. Or, for example, if you’re not coming from a trusted device, then you do have access to your emails, but you won’t have access to Git.

[00:48:22.290] – Adeel
There’s a good story there, but there needs to be a good story for machine authentication, because what I’m suggesting here is essentially, I’ve already assumed that the human element is now removed from the process. For you to remove the human element, that means that we need to have a good story around human authentication, I’m sorry, machine authentication and machine identity. I mean, you could possibly, for example, use Vault, HashiCorp Vault. I know I keep referring back to Vault as well, only because I’m quite passionate about it and I’ve been using it a lot.

[00:48:57.880] – Adeel
For example, you can create multiple authentication methods, like AWS, so that Vault then recognizes AWS machine identity because you just plug into the AWS platform. And the same goes for Google Cloud and Azure, right? So there is a potential where actually I can maybe, for example, have Azure VMs be able to consume content from a GCS bucket. And how we would do that would be, we would have to first speak to Vault. Vault will then provide dynamically generated credentials to the actual VM, and then the actual VM would use that to consume GCS. So it’s possible.
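
The cross-cloud flow outlined here, where a workload on one platform asks a central broker for a short-lived credential to a resource on another, can be sketched as below. This is a hypothetical model rather than Vault’s interface: `CredentialBroker`, `Lease`, the workload and bucket names, and the five-minute TTL are assumptions, and the platform identity check is reduced to a simple lookup.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    credential: str
    target: str
    expires_at: float

    def is_valid(self, now: float) -> bool:
        return now < self.expires_at


class CredentialBroker:
    """Hypothetical broker standing in for the role Vault plays in the example."""

    def __init__(self, trusted_platforms, grants, ttl_seconds=300.0):
        self._trusted = set(trusted_platforms)  # platforms with an auth method configured
        self._grants = grants                   # (platform, workload) -> allowed targets
        self._ttl = ttl_seconds

    def issue(self, platform: str, workload: str, target: str, now: float) -> Lease:
        if platform not in self._trusted:
            raise PermissionError(f"no auth method configured for {platform}")
        if target not in self._grants.get((platform, workload), set()):
            raise PermissionError(f"{workload} is not granted access to {target}")
        # A fresh credential is generated per request and expires on its own.
        return Lease(secrets.token_urlsafe(16), target, now + self._ttl)


broker = CredentialBroker(
    trusted_platforms={"aws", "azure", "gcp"},
    grants={("azure", "vm-reporting"): {"gcs://example-bucket"}},
)

start = time.time()
lease = broker.issue("azure", "vm-reporting", "gcs://example-bucket", now=start)
```

The credential still has to be handed to the VM, so this models the brokered approach as it is described, not a credential-less one.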

[00:49:39.090] – Adeel
But there is also the element of passing credentials around. One of the things I discussed in our HashiCast with Rob was that ideally, right, we need to move away from secret zero. And with Vault, essentially, I don’t like to call it secrets management, because I don’t think that does it justice. Rather, I think it’s dynamic access management. But the only problem is, how does it provide access? It provides access by providing or generating credentials, right? But what we really need to do is move away from credentials in the first place.

[00:50:11.830] – Adeel
Just like how Azure, AWS and Google have a good story when they do the whole IAM control, where you can say, actually, this VM has read access to this GCS bucket. There are no credentials that they pass around to make that happen. So today, for a multi cloud scenario, you need to have some kind of central, say, multi platform accessible function like Vault or, say, Consul as another example, Consul service mesh. Actually, forget Consul, just a federated service mesh, right, across multiple platforms that can also unify those kinds of resource identity, and Vault again can also kind of facilitate that as well.

[00:51:01.330] – Adeel
From the point where, if you were to integrate with every process or every communication process, so a VM talking to another VM, or talking to, say, BigQuery or RDS, must go to Vault, and Vault generates a credential on the fly and gives that to you. We can do that today. But in an ideal world, really, we should be moving towards something that is more credential-less. But it does mean that the identity will be centralized, or rather, it will be a ubiquitous identity that will be accepted across multiple platforms.

[00:51:33.420] – Adeel
I think that’s a future state, right?

[00:51:35.990] – Ethan
Yeah. It is a future state, and it feels like actually kind of a natural place for us to end up this conversation today Adeel. Man, I think you have fit in twice as many words as any of our other guests ever. You talk so fast.

[00:51:48.170] – Adeel
Sorry. Especially when I get passionate. I got so much in my mind, I feel like I need to unload and I have all this context out of my head. Everyone already knows what I’m talking about.

[00:52:02.930] – Ethan
No, it’s great. But compartmentalize three big things. Three takeaways from this episode, things you want to leave the listener with. Just keep them tight. Some bullet points that folks can walk away with from today.

[00:52:14.930] – Adeel
Yeah. Okay. There are three, and I’ll be succinct, right? First of all, security is everyone’s job. It’s about the awareness that actually every different function should have. So it shouldn’t be centralized back to the security professional. Second is understanding the business risk, and not just that: don’t be afraid to go ahead and actually carry out a streamlined validation process, even though these risks seem to be the same risks that appear on prem. Don’t be afraid to go ahead and run a validation process.

[00:52:51.800] – Adeel
That’s the second one. And the third is understanding the identity piece, understanding how identity is the primary layer, and what that looks like for each different cloud provider. Those are the three things I think are fundamental going to the cloud.

[00:53:13.560] – Ethan
The identity thing is especially big to me. The more I thought about some of your ideas when I was listening to the HashiCast that you were on earlier and so on. That’s the point. I keep coming back to that. Identity changes the game, how we think about securing application workloads and flows and so on. Adeel, how can folks follow you on the Internet?

[00:53:33.470] – Adeel
I’m on Twitter. My Twitter handle is DevOps underscore Adeel. That’s one I’m most active on, and I’m keen on getting lots of feedback from everyone. I may be wrong. These are all personal ideas I have. So I’d love to share this with everyone. And really, the more people can come together, maybe there’s some mature story that comes out of this.

[00:53:54.490] – Ethan
Yeah. And Adeel really does want to have more conversations with you. This whole show began because he pinged me on Twitter just to have a chat about some of the things he was talking about on a couple of hashicast podcasts. And we had some back and forth and some dialogue, and it turned into this show. And so, yeah, at DevOps underscore Adeel, hit him up with your thoughts and ideas and questions and let’s get a conversation going as a group of people that listen to Day two Cloud, that would be fantastic.

[00:54:19.670] – Ethan
So, Adeel, thanks to you again for appearing on day two Cloud. And if you’re still listening out there virtual high fives to you, you made it amazing. If you have suggestions for future shows, Ned and I want to hear them. We monitor at day two Cloud show on Twitter. So tweet us your ideas. And if you’re not a Twitter person, that’s cool, go to Ned’s fancy website ned in the cloud dot com. He’s got a form there and you can submit your ideas there. A little bit of housekeeping now.

[00:54:43.280] – Ethan
Did you know that you don’t have to scream into the technology void alone? You’re not alone out there because the Packet Pushers Podcast network has a free slack group that’s open to everybody. Visit Packet Pushers dot net slash Slack and join. It’s a marketing free zone for engineers to chat, compare notes, tell war stories, and solve problems together. Packet pushers dot net slash slack and we’ll see you in there. And until then, just remember, Cloud is what happens while IT is making other plans.
