
Day Two Cloud 145: Using Open Policy Agent For Cloud-Native Policy Enforcement

Episode 145


Today’s Day Two Cloud explores the Open Policy Agent (OPA), an open-source project that serves as a policy engine for cloud-native environments. According to the OPA website, OPA “provides a high-level declarative language that lets you specify policy as code and simple APIs to offload policy decision-making from your software. You can use OPA to enforce policies in microservices, Kubernetes, CI/CD pipelines, API gateways, and more.”

OPA is a graduated project in the Cloud Native Computing Foundation.

Our guest is Anders Eknert, Developer Advocate at Styra. Styra created OPA.

We discuss:

  • Anders’ own journey to OPA
  • OPA’s role in policy enforcement and security
  • The need for policy enforcement within microservices
  • How OPA decouples policy and enforcement
  • Rego, OPA’s policy language
  • OPA use cases
  • More

Sponsor: StrongDM

StrongDM is secure infrastructure access for the modern stack. StrongDM proxies connections between your infrastructure and sysadmins, giving your IT team auditable, policy-driven, IaC-configurable access to whatever they need, wherever they are. Find out more at strongdm.com/packetpushers.

Sponsor: ITProTV

Start or grow your IT career with online training from ITProTV. From CompTIA to Cisco and Microsoft, ITProTV offers more than 5,800 hours of on-demand training. Courses are listed by category, certification, and job role. Day Two Cloud listeners can sign up and save 30% off all plans. Go to and use promo code CLOUD to save 30%.

Tech Bytes: VMware

Stay tuned for our sponsored Tech Bytes conversation with VMware. We discuss vRealize Network Insight Universal. Our focus is the SaaS version of vRealize Network Insight and how it can help with your cloud migration project.

Show Links:

Open Policy Agent

Open Policy Agent Docs

@openpolicyagent – OPA on Twitter

OPA Slack

Styra Academy

The OPA AWS CloudFormation Hook


[00:00:00.970] – Ethan
Sponsor StrongDM is secure infrastructure access for the modern stack. StrongDM proxies connections between your infrastructure and sysadmins, giving your IT team auditable, policy-driven, IaC-configurable access to whatever they need, wherever they are. Find out more at strongdm.com/packetpushers. This episode of Day Two Cloud is brought to you in part by ITProTV. Start or grow your IT career with online IT training from ITProTV, and we have a special offer for all you amazing Day Two Cloud listeners: sign up and save 30% off all plans. Just before we start the show today: keep listening past the end. We’ve got a Tech Byte where we’re going to be chatting with VMware. Don’t miss it.

[00:01:01.250] – Ned
Welcome to Day Two Cloud. Today we’re talking about OPA. Hey, OPA, it’s the OPA, the Open Policy Agent. And we have Anders Eknert, a developer advocate from Styra, to steer us through the conversation and the nuances of what OPA is. What jumped out to you, Ethan?

[00:01:19.920] – Ethan
What was awesome was that I did not know what OPA was all about, exactly. I thought, oh, it’s a security tool, it’s some sort of Kubernetes-related firewall or something. I don’t know. And as I dug into this, doing the research for this show, Ned, that is not exactly what it’s all about. Yes, there is definitely a strong security component to OPA, but it does so much more.

[00:01:41.630] – Ned
Yes, it’s a wide open decision maker. Essentially, you give it a policy, it makes a decision. But we’re going to dig into all that and more with our guest, Anders Eknert, developer advocate at Styra. So enjoy the conversation. Well, Anders, welcome to Day Two Cloud. We’re very excited to have you here to talk about OPA, or I think it’s pronounced “OPA,” but we’re talking about Open Policy Agent. And I understand you discovered OPA through your own personal journey. Can you expand on that a little bit and define what OPA is along the way?

[00:02:17.270] – Anders
Yeah, sure. First of all, thanks. It’s great to be here. So, yeah, my own OPA journey started, I think, three or four years ago. I was in a team where for the last few years we had worked on kind of solving identity for the company I was at at the time, where you have all these standards: you have OAuth, you have OpenID Connect, you have SCIM for user provisioning and so on. But we kind of failed to find anything when we tried the same thing for authorization. So how I found OPA was eventually pretty much by random. I went to KubeCon in Barcelona that year, and we saw a couple of talks about OPA. And those talks were primarily, I think, targeting the Kubernetes crowd, which makes sense for a conference named KubeCon. Yeah. So they weren’t really targeting the app authorization use case, which was the scenario we were in. We had a few hundred microservices, I think we had 700 microservices, and those were distributed among, I think, 30 teams or something like that. And these teams had come and gone through mergers, acquisitions and whatnot.

[00:03:49.920] – Anders
So some of them were doing Python, some of them were doing Java, some .NET. And the problem with that is that when you try and solve identity, you have a centralized component: you have an identity provider, or you have many identity providers, but they all go in via some form of centralized component, which is your identity server. So when your users log in, there’s commonly some single sign-on provider or single sign-on solution, so no matter how many products you have, they can log in using the same credentials. It’s something that is opaque to the user; they don’t really know what’s going on underneath there. But the problem with authorization, which also makes it so much more interesting, is of course, especially for distributed systems, that there isn’t really a centralized component that you can rely on, because authorization decisions happen everywhere. You only log in once, right? And then that’s kind of persisted for a long duration of time. But authorization needs to be done, like, everywhere and all the time. So if you try and delegate that to a central component somewhere, that’s going to be slow and it’s going to be a bottleneck of your entire security stack.

[00:05:21.830] – Anders
So that was kind of the premise or the challenge that we were facing. And yeah, that’s kind of how we discovered OPA.

[00:05:32.810] – Ned
Okay. That puts a lot of pieces together for me, because I have fought the authentication dragon before, trying to get everybody on the single sign-on or using the same identity provider. That’s difficult enough. But then you’re talking about, how many, 300, 700 microservices that all have some sort of authorization they need to use in addition to the authentication. Like, yes, I know that you are who you say you are, but what should I allow you to do in my little sector of the microservice? So that is a tricky problem to solve. So do you think of OPA as a security tool? Is that its primary function? Because policy is in the name, and that makes me think of compliance and regulations more than security.

[00:06:17.990] – Anders
Yeah. I don’t know if there’s a perfect category other than the policy category itself, but the main concern is probably security. And I think often what we call compliance or regulations falls into that domain somehow, too. So at least for a lot of the regulation that we see people use policy for, or the kinds of rule sets and so on, whether they’re trying to conform to some standard or regulation, or whether it’s organizational rules and policy, I think it’s fair to place it in the security category.

[00:07:10.370] – Ethan
One of the words you used when you were describing OPA, in your journey to finding OPA. Opa. Sorry, I’ve got to say that. That’s the right way to say it, right, Anders? Opa.

[00:07:19.250] – Anders
Opa is the right way to say it.

[00:07:21.710] – Anders
There are two pronunciation challenges included here. There’s OPA, and there’s the policy language of OPA, which is Rego, spelled R-E-G-O. There are a lot of people saying it differently. So, yeah, we invented two concepts, and there are these discussions on the pronunciation of both. So I guess we failed there in some respect.

[00:07:54.110] – Ethan
No, I mean, right in the docs you say OPA and then you give a phonetic spelling of how it is pronounced, and Rego as well.

[00:08:02.480] – Anders
[00:08:02.480] – Anders
We try our best there.

[00:08:03.640] – Ethan
So on your journey there of discovery, around not the authentication problem so much as the authorization problem, the need for ongoing authorization, you ran into OPA. Opa. I did it again. Opa. So does that mean OPA would have something to do with zero trust?

[00:08:22.590] – Anders
Yeah, for sure. I think the basic idea behind zero trust is pretty much that you don’t assume anything around identity and you don’t assume anything around permissions; rather, you verify that at each step along the way, which is even more of a requirement when we talk about distributed systems and distributed architectures. The old way of doing it is, of course, you just put a gateway in front of your systems, and you’d have the gateway verify the identity of the caller, maybe do a lookup in some permissions table, and then it would just forward that request to the back end. And the back ends would do nothing about that, because obviously the request must have passed the gateway on its way here, and the gateway already did this verification. There is one obvious flaw with that model, and that’s, of course, that once you’re past the gateway, it’s an open highway. There is no verification. There’s nothing. And it doesn’t necessarily need to be a malicious actor; it could just be an internal system running inside of the cluster which would not go through that gateway on the perimeter of the system.

[00:09:47.240] – Anders
So, of course, if you’re aware of this, you might want to still go through that gateway, but it’s not going to be a hard requirement. So the zero trust model assumes nothing. You just verify identity, and if you don’t pass the identity check or you don’t pass the permission check, that’s where the request stops.

[00:10:11.020] – Ethan
Well, so why OPA to solve this problem, particularly? You went on a journey there and ended up at OPA. But if we focus on ongoing authorization, if we focus on zero trust, there are a lot of entrants in that space. There’s a lot of software and technology that’s been thrown at this problem. What was it about OPA that made you say, this is it? This is the one?

[00:10:36.790] – Anders
Yes, that’s a good question. I think for the zero trust model to work, you need something very lightweight, and you need to deploy that as close to your service as possible, because you have to do it all the time. You want to avoid the latency overhead of calling a service in another part of the data center, or even worse, another part of the world; that’s just not going to scale. If you add ten milliseconds for each hop to a service just to do authorization, and there are ten services involved, you have 100 milliseconds of latency overhead, and that’s before you’ve done any business logic. So the way OPA solves that is commonly that you deploy OPA as close to your service as possible, which is commonly called the sidecar pattern, if you’re familiar with how Kubernetes does things. So it’s basically running OPA on localhost. When you query OPA from your service, you send a query to localhost and you get back a decision, because that’s basically what OPA is: a decision engine. So you can offload that from your service rather than having the service determine it.

[00:11:59.310] – Anders
Should a doctor be able to retrieve these medical journals? You ask OPA: should this doctor be able to retrieve these medical journals? So it’s basically like an oracle for authorization or policy decisions.
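As a sketch of what that doctor question might look like in practice, a policy like the following could sit in the sidecar OPA. The package name, input field names, and data shape are illustrative assumptions, not from the episode; the syntax targets recent OPA versions via `import rego.v1`.

```rego
# Hypothetical policy; package and input field names are assumptions.
package authz

import rego.v1

# Deny by default; a rule below has to prove the request is allowed.
default allow := false

# A doctor may read the journals of patients assigned to them.
allow if {
	input.user.role == "doctor"
	input.action == "read"
	input.patient in input.user.assigned_patients
}
```

The service would then POST `{"input": {...}}` to OPA on localhost (by default `http://localhost:8181/v1/data/authz/allow`) and act on the `{"result": ...}` that comes back.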

[00:12:14.650] – Ned
Okay. It’s interesting to see how a lot of these zero trust or networking models in Kubernetes move to more of a decentralized model. Whereas before, you had that gateway or that load balancer that was kind of the bottleneck for all traffic. And once you got inside the network, you had the hard crunchy exterior and the soft gooey interior. If you wanted to use that firewall, that gateway, you had to send the traffic back out to it, and that added latency and processing time. And then you also had to spend a lot of money on a really big firewall appliance. The solution that I’ve seen from a networking perspective is to move it as a sidecar proxy next to the service that’s running. But wait, that sidecar proxy can do more than just networking firewall rules or whatever. It can now do policy and stuff like that. So I think it’s a really interesting model.

[00:13:08.410] – Anders
Yeah, definitely. And I think it’s a pretty common trend with all these kind of gateway vendors. They’re moving their gateways to be proxies on top of each individual service rather than having this, like, one big monolith component to do that.

[00:13:28.990] – Ned
I want to dig deeper into that. But first, I want to back up a second, because I have this habit of zooming out and in on things. I want to get a little more background on OPA and where it came from. So who is behind OPA? Who’s developing it and maintaining it?

[00:13:45.850] – Anders
Yeah, that would be my employer, which is Styra. It’s actually funny for me, because I’m a Swedish citizen, and “styra” is a Swedish word. One of the founders of Styra was originally from Finland, so he kind of brought that word with him to the States when they founded the company, and the domain name was luckily available. So “styra” means steering, or to navigate, or to govern, really, which is pretty much what Styra adds to OPA, since OPA is a distributed component. And again, we were talking about having OPAs running for 700 microservices, which means basically, in a zero trust model, you’d have 700 OPAs running in your cluster. So eventually you’re going to want a way of managing that. And that’s basically what Styra provides: a control plane to manage OPA at scale for large organizations.

[00:14:53.710] – Ethan
I thought OPA was open source. Is it the free open source model, or is there also a paid model?

[00:15:00.370] – Anders
There is no paid model for OPA. There’s no enterprise edition; there’s only the open source OPA. And I think that was also something important to me when we adopted OPA in our organization. I don’t really like when you have to pay for security, or when you have to pay to add things like single sign-on, things that to me are basic requirements. They aren’t things that I’d like to pay for. I can definitely understand, if you’re a large organization, paying for large-organization needs or requirements. That makes sense to me. It doesn’t make sense that you should pay just to have a basic feature set for security. So the business model of Styra was appealing to me on that side of the equation. And of course, now I’m here.

[00:16:01.320] – Ethan
Well, okay, that’s a fair question. What is Styra getting out of it? Because there’s money that’s going into OPA development. If this is truly free open source software and there’s no premium model, what are they getting out of it?

[00:16:17.050] – Anders
Yeah, that’s a good question. So while OPA is the distributed component, it comes with a few capabilities for remote management. There’s a feature called bundles, which basically means that you configure your OPA to periodically fetch policy and data from a remote endpoint. There is another of these management features for decision logging, so that each decision OPA makes can be logged and sent to a remote endpoint. There’s a status API to report the health of your OPA instances, and so on. So basically, Styra is on the other side of those calls. It provides a control plane and a management API for OPA.
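Those three management features are wired together in OPA’s configuration file. A minimal sketch, assuming a hypothetical control-plane service (the service name, URL, bundle path, and polling intervals are placeholders):

```yaml
# Hypothetical OPA config; service name, URL, and bundle path are placeholders.
services:
  control-plane:
    url: https://control-plane.example.com

bundles:
  authz:
    service: control-plane
    resource: bundles/authz.tar.gz
    polling:
      min_delay_seconds: 30
      max_delay_seconds: 120

decision_logs:
  service: control-plane

status:
  service: control-plane
```

With this in place, OPA periodically pulls policy and data from the control plane, ships every decision to it, and reports instance health: the three APIs a product like Styra’s sits on the other side of.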

[00:17:05.150] – Ned
Okay, yeah. Going back to our decentralized conversation, where you are kind of sprinkling these pieces throughout your architecture: you need some sort of centralized component, a control plane if you want to call it that, that’s managing and orchestrating the rollout of policies and verifying that they are successful, and that you can prove that through an audit trail if you need to. And that’s usually the thing that people end up having to pay for. So you can try out the decentralized thing, and there’s probably a way to hack a control plane together that you don’t have to pay for. But if you’re doing it at scale, if you’re doing it as an enterprise, or if you need to prove to auditors, then the paid version with support maybe makes more sense.

[00:17:48.890] – Anders
Yeah, for sure. And I tend to like that business model, because it doesn’t stop anyone. If you’re a small company with a handful of services, just throw your policies up in an S3 bucket, send your logs to, I don’t know, Logstash or whatever people are using these days. It’s not going to be as nice, but for the requirements of a small organization, you’ve come a long way just using open source tools or whatever cloud providers you have already.

[00:18:22.790] – Ned
Now I’m curious, since OPA is open source and it’s very general purpose, people can do whatever they want with it, and when people can do whatever they want with it, people do really weird things. So I’m curious, have there been some really weird use cases people have tried to add OPA to, and does that impact the roadmap and scope of the project?

[00:18:47.510] – Anders
Oh, yeah. For weird cases, I think there was one. I saw a repository where someone had used OPA to write an RPG engine, like a role-playing game engine. That’s basically what an RPG engine is, isn’t it? You provide some input and you get back a decision. So you wield your sword against the helmet of your opponent, and you throw a die in there, and then you get back: did it hit or not? That’s probably the most odd use case I’ve seen for OPA.

[00:19:32.330] – Ned
Yeah, I mean, you’re absolutely right. When I think about playing D&D, it’s all just a decision matrix with different…

[00:19:37.530] – Anders
Yeah, it is, isn’t it?

[00:19:39.590] – Ned
Yeah, that’s wild. I love that. I hope to find that.

[00:19:43.670] – Anders
Yeah, I’ll send you the link after. I think I can find it for you.

[00:19:47.720] – Ethan
Of the more mainstream use cases, Anders, would you say that OPA is HTTP-centric, or are there other protocols that it could cope with as well?

[00:19:56.450] – Anders
Yeah, it’s definitely HTTP-centric. I think it pretty much had to be, given the wide range of technologies that can integrate with OPA. HTTP is the lingua franca of modern times, isn’t it? So HTTP is the protocol, but what’s sent over the wire is just JSON. Your queries to OPA are JSON, and the decisions that come back are also JSON. That’s how OPA can be used across this wide set of heterogeneous technologies, because most of them understand HTTP, or can kind of retrofit an HTTP client in there somewhere. HTTP is definitely the common way of doing it. There are a few other models for deployment we have experimented with. There’s a WebAssembly option, where you can compile your policy into WebAssembly and have it evaluated in any system where a runtime for it is available. And there have been a few other projects to allow OPA to exist in places where we might not have access to HTTP or JSON.
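Concretely, that wire exchange is just JSON over HTTP against OPA’s Data API. A sketch, with the policy path and input fields as illustrative assumptions:

```
POST /v1/data/httpapi/authz/allow HTTP/1.1
Content-Type: application/json

{"input": {"method": "GET", "path": ["public", "logo.png"], "user": "alice"}}
```

The decision comes back as another small JSON document along the lines of `{"result": true}`, which the caller is free to act on however it sees fit.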

[00:21:23.690] – Ethan
That’s the control channel, if you will. We’re delivering JSON payloads over HTTP between the two endpoints: whoever’s asking OPA, and OPA returning the response. What about the resources that OPA is protecting? Would it be only HTTP-oriented services, or could I protect, say, an SSH service as well?

[00:21:46.310] – Anders
Yeah, you definitely could. And there’s even an integration to do that precisely. But most of these integrations still kind of work by somehow fitting in an HTTP client there to query an OPA. There’s also the Go option: since OPA is written in Go itself, if your service is written in Go, you can just use OPA as a library. So that’s another option, of course.

[00:22:16.260] – Ned
No, the WASM option in particular is interesting, because we did a whole show about WASM not too long ago. And what kind of jumped out to me is you can package up a WASM build into an OCI format and put it in a registry. So, and maybe I’m kind of spitballing here, but you could publish your OPA WASM images or whatever into a registry and just have them automatically be pulled as part of a deployment, and then just have them run. So that’s really neat that you could do that.

[00:22:51.410] – Anders
Yeah. It definitely opens up a lot of doors for us where we might struggle to run OPA the traditional way, using a server. You might not have networking, or you might not have the resource allocation required to run OPA in some environments. I talked to some developers the other week who were writing software for cars, and that’s a resource-constrained environment, I can imagine. So they couldn’t really run the OPA server there, but WebAssembly might be a viable option for them, right?

[00:23:38.050] – Ned
Yeah, I see. Especially in edge computing deployments, where you do have that sort of constrained environment, WebAssembly seems to be taking off a bit, and being able to just bake it right in there, that’s useful.

[00:23:51.030] – Anders
That is really useful.

[00:23:54.270] – Ned
When I was reading through the documentation, one of the things that jumped out to me is that, architecturally, OPA decouples policy decision-making from policy enforcement. What does that mean?

[00:24:11.850] – Anders
Yes. In practical terms it means OPA can tell you what to do, but it can’t enforce it, because whoever is asking the questions is going to have to actually enforce those decisions. You ask OPA a question and OPA provides you an answer, but what you do with that answer is still going to be your responsibility. If OPA says, no, this person should not have access to these files, and you’re using a web service, the correct response might be to return a 403 Forbidden response. In some other context, you might want to contact the authorities, or you might want to do, I don’t know what. So that’s enforcement, right? What you choose to do with that decision, that’s enforcement. And in some cases, you might even say, yeah, this is a dev environment, so even if this is not allowed, we’re still going to do it, because we might just log that this was a violation, and it’s going to fail in production. So that’s basically the difference between making decisions and then actually acting on those decisions. That’s enforcement. And OPA doesn’t do enforcement, because it’s so highly context-specific.

[00:25:37.330] – Anders
We don’t know how we should do that. That’s going to have to be up to you.

[00:25:43.960] – Ned
How do you write an enforcement engine that will work in a car operating system, versus something that’s serving up web pages, versus something that’s granting file access? And OPA is just like, yeah, don’t do that, leave that out. I can tell you what the decision is based on the policy you’ve written, and then you can do whatever you want with that information.

[00:26:08.790] – Ethan
We pause the podcast for a couple of minutes to introduce sponsor StrongDM’s secure infrastructure access platform. And if those words are meaningless, StrongDM goes like this. You know how managing servers, network gear, cloud VPCs, databases and so on is this horrifying mix of credentials that you saved in PuTTY and in super secure spreadsheets, and SSH keys on thumb drives, and that one doc in SharePoint you can never remember where it is? It sucks, right? StrongDM makes all that nasty mess go away. Install the client on your workstation and authenticate; policy syncs, and you get a list of infrastructure that you can hit. When you fire up a session, the client tunnels to the StrongDM gateway, and the gateway is the middleman. You know, it’s a proxy architecture. So the client hits the gateway and the gateway hits the stuff you’re trying to manage. But it’s not just a simple proxy; it is a secure gateway. The StrongDM admin configures the gateway to control what resources users can access. The gateway also observes the connections and logs who is doing what, database queries and kubectl commands, et cetera. And that should make all the security folks happy.

[00:27:14.100] – Ethan
Life with StrongDM means you can reduce the volume of credentials you’re tracking. If you’re the human managing everyone’s infrastructure access, you get better control over the infrastructure management plane. You can simplify firewall policy. You can centrally revoke someone’s access to everything they had access to with just a click. StrongDM invites you to 100% doubt this ad and go sign up for a no-BS demo. Do that at strongdm.com/packetpushers. They suggested we say no BS, and if you review their website, that is kind of their whole attitude. They solve a problem you have, and they want you to demo their solution and prove to yourself it will work. strongdm.com/packetpushers. And now back to the podcast. So then let’s drill into this. Describe how we get from policy to enforcement. So OPA is going to give some requester an answer: yes or no, you’re allowed to do this thing. So the requester’s got to know how to ask the question, which you said is a JSON representation. There’s going to be some kind of packaged-up JSON payload that gets sent to OPA. OPA is going to look at that and look at its policy and know that yes or no, this is approved.

[00:28:39.180] – Ethan
Send a JSON payload response back to the requester, and then the requester is going to go, oh, I can do this, I’m going to allow it; or, I’m not allowed to do this, I’m going to run my packet filter and disallow this command, or whatever it is. The requester, how does it know how to talk to OPA? Is there an integration?

[00:29:02.650] – Anders
How it knows what type of data to provide, that’s going to have to be agreed on out of band, since OPA is kind of agnostic to that, and it’s also general purpose. The way this works, if you take the Kubernetes API for an example: you can have OPA be on the receiving end of a webhook for dynamic admission control, which basically means if you say kubectl apply and you provide a deployment or something like that, the deployment, a.k.a. the JSON model of the deployment, is going to be provided to OPA as your input. So that’s going to be the query: is this deployment good to go or not? OPA then decides based on the policy that has been provided beforehand, or loaded into OPA, along with any kind of environmental data. That could be a user database; that could be the state of the cluster. You might want to have a policy that says you can deploy an ingress, but you can’t overwrite a path of a previously existing ingress, because that’s going to be a big problem. So you might want to have access to that type of data, what’s already in your cluster.

[00:30:21.500] – Anders
So you can use that for your decision making as well. So it’s based on the policy you have, and data. In this case, of course, the policy would be that we want to disallow any new deployment of an ingress if there’s a conflict in hosts or paths and so on. So that would be the actual policy. And then the data is provided both as part of the input, which is the new ingress, and as existing data, which is the current state of your cluster. That would be one example of a decision made based on what’s provided from the client, in this case the Kubernetes API. But the client could be a microservice asking an authorization question or whatnot. As long as it’s JSON or YAML, that’s something we can work with.
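A hedged sketch of that ingress-conflict check as a Rego admission rule: the `input.request` shape follows the Kubernetes AdmissionReview object, but the layout of the replicated cluster state under `data.kubernetes.ingresses` is an assumption about how cluster data was loaded into OPA, not something from the episode.

```rego
package kubernetes.admission

import rego.v1

# Deny a new Ingress whose host is already claimed by an existing one.
# data.kubernetes.ingresses is assumed to hold replicated cluster state,
# keyed by namespace and name.
deny contains msg if {
	input.request.kind.kind == "Ingress"
	newhost := input.request.object.spec.rules[_].host
	existing := data.kubernetes.ingresses[namespace][name]
	existing.spec.rules[_].host == newhost
	msg := sprintf("host %q is already claimed by ingress %v/%v", [newhost, namespace, name])
}
```

The webhook handler would then translate a non-empty `deny` set into a rejected AdmissionReview response; the enforcement itself still lives in Kubernetes.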

[00:31:17.490] – Ned
Right. The key to me with that in the Kubernetes example is there’s already this construct of admission controllers in Kubernetes. Their job is to intercept requests and make a decision, and then they can use OPA to help them make that decision. But that enforcement layer already exists in Kubernetes.

[00:31:41.030] – Anders
Yeah, exactly.

[00:31:42.630] – Ned
Any other thing that you’re trying to integrate it with, it needs to have that pre existing enforcement layer, or you have to write that and then have an integration through that enforcement layer with OPA.

[00:31:53.960] – Anders
Yeah, that’s right. I think OPA predates Kubernetes admission control by a year or two. It wasn’t built with that purpose in mind at first. And of course, Kubernetes admission control, I don’t think, was built with OPA in mind. But still, it just worked from day one. Once they released that feature, you could just point it at OPA and it would work just as you’d expect, which kind of speaks volumes to the idea of OPA, really: you don’t really need to build these specialized integrations. As long as you have HTTP and you have JSON, it’s pretty much just going to work.

[00:32:42.170] – Ethan
What if I want to run OPA at scale? That is, I may be asking OPA for decisions for thousands of requests per minute, something like that. Do I have to build out an open cluster or something like that?

[00:32:57.110] – Anders
Yeah, you might need to do that for the Kubernetes model, where with the kube API server and admission control there are a lot of servers asking, and they’re pointed at an OPA endpoint somewhere. In that case, you’re probably going to need to scale out your OPAs as well. We don’t really do state: when you send something into OPA, it doesn’t modify the state of the store. You just provide some input and you get the response back. There’s no modification done along the way. In general, I think just adding new instances or having them removed is quite undramatic compared to other components in a cluster.

[00:33:44.870] – Ned
Right. You’re going with that decentralized model if you’re attaching a sidecar. So as your app scales out horizontally, your policy processing power also scales out horizontally.

[00:33:56.580] – Anders
Yeah, exactly. You add a new pod, and that pod contains your service and OPA. So as your pods scale up, you scale up OPA as well. It’s like pretty much done automatically.

[00:34:10.010] – Ned
Okay, now we’ve kind of danced around this a little bit in terms of the policy language, Rego, but I do want to dig into how it works and what it does. And do you have any background on where the name came from? Why Rego, or R-E-G-O?

[00:34:28.790] – Anders
That’s a good question. I know I’ve heard this at some time, but I cannot remember what it was. So no, I can’t recall it.

[00:34:40.890] – Ned
Well, just generally speaking, can you give us an overview of how the policy language is constructed? And, you know, we’re not trying to learn it in the next 15 minutes, but just how easy is it to pick up and start using?

[00:34:54.770] – Anders
Yeah, of course. First, I think there are a couple of principles behind the design of Rego, which are basically that a Rego policy should mirror a real-world policy. And what is a real-world policy? It’s basically a set of rules. So I’d say that’s the main concept of Rego: you’re working with rules. You have a policy document, or a package, and then you add rules to that policy. So it’s meant to be read pretty much like any other policy: if this or that, then either allow or deny, and so on. So it’s basically elevating the if-else, or the if-then.

[00:35:44.690] – Ethan
It’s a really long else-if, or a case statement.

[00:35:47.540] – Anders
But that’s basically what it is. It’s turning the if-then clause around, kind of turning it on its head. So rather than saying, if that and that is true, then do this, because there’s not a whole lot of side effects in Rego, remember? So the then clause is not going to be where the action is. It’s the actual if that’s the policy, or that’s the rule. So Rego turns that around, and it says, allow is equal to true, and then comes the if: all of these conditions are true. And so that’s kind of the anatomy of a rule. You say this should be allowed if all of this is true, and then you can add more. If you have OR conditions, you might want to say, the request should be allowed if the user is admin, or the request is targeting a public endpoint, because anyone should have access to those. So the way you’d work with that is you’d have one rule that says allow if admin, and the other one would say allow if the request path is equal to public or, I don’t know, images or some other public path. And then you just add more rules as you need.

[00:37:17.910] – Anders
I think that was kind of the basic design of Rego, to mirror the needs of a policy where you have all these conditions. That’s really what a rule is. It’s just a bunch of conditions. Like, you require this and that and that. And writing that as a traditional if-else statement is kind of cumbersome.
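The rule anatomy Anders describes might look like this minimal sketch (the package name and input fields are hypothetical, not from the episode):

```rego
package authz

# Deny by default; the request is only allowed if some rule below fires.
default allow = false

# Allow if the user is an admin.
allow {
    input.user.role == "admin"
}

# Allow if the request targets a public endpoint.
allow {
    input.request.path[0] == "public"
}
```

Each `allow` rule is an independent OR branch; the conditions inside a rule body are ANDed together, which is exactly the "allow if all of this is true" shape described above.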

[00:37:43.650] – Ethan
I’m going to interrupt the podcast for a minute here to talk about IT training. You remember the ransomware attack on the gas pipeline last year? It caught your attention, probably caught mine. There’s a key thing here. Cybersecurity professionals are in demand to prevent that kind of thing, but there are not enough humans out there to fill all the positions. There are over 500,000 open cybersecurity roles. You can become a cybersecurity professional if you get some training, some online training. It is never too late to start a new career in IT or move up the ladder. ITProTV has you covered for your training. They cover everything, CompTIA to Cisco to EC-Council to Microsoft. They’ve got all of it, including the cloudy stuff, more than 5,800 hours of on-demand training. And the way they present the information... You know, some presenters are like they’re reading from the book, and they’re super boring. That is not the ITProTV format at all. They use engaging hosts that present the information in a talk show format and really keep it interesting. And they do it live. They’re live every day. And then once they’ve recorded that live show, it goes studio to web in 24 hours.

[00:38:55.020] – Ethan
As you’re digging through their website looking for content, all the courses are conveniently listed by category, certification, and job role. You can find what you’re looking for without a lot of trouble. And then when you pick the thing and you’re ready to go, you can stream ITProTV’s courses, either the live stuff or the on-demand stuff, from anywhere in the world via whatever platform you like: Roku, Apple TV, PC, or there are apps on iOS and Android. Learn it, pass your certs, and then get a great job, maybe in cybersecurity, with ITProTV. Visit itpro.tv/daytwocloud for 30% off all plans. Use promo code CLOUD at checkout. That’s itpro.tv/daytwocloud. Day two cloud is day, the number two, cloud. And then use promo code CLOUD at checkout. One more time: itpro.tv/daytwocloud, and use promo code CLOUD at checkout to save 30% off all plans. And now let’s get back to the podcast. I spent quite a bit of time digging through the Rego docs just to get a feel for it, and there’s lots of documentation there, and it feels like a fairly full-fledged programming language. Are there tools, IDE-style tools, that are going to help me write OPA policies?

[00:40:13.240] – Anders
Oh, yeah, for sure. There’s a plugin for VS Code, there’s one for IntelliJ IDEA, and I think there are some managed by the community. There are some for Emacs, for Vim, all of these editors, pretty much. But both the VS Code and the IDEA ones are managed by the open source project, the OPA project.

[00:40:36.630] – Ned
Okay. And yeah, looking at the language, it does seem relatively straightforward. You can kind of tell that OPA is Go-based, because Rego also has kind of a Go feel to it a little bit.

[00:40:49.890] – Anders
It’s in the name, at least.

[00:40:59.890] – Ned
Now I imagine these policies can get complex really quickly as you have all these different decision points. Does it have a concept of importing an existing template or module or something along those lines to help you build out a policy locally?

[00:41:19.150] – Anders
Yeah, definitely. There’s a lot of things to help you along the way once your policy starts growing. I think a pretty common pattern is to use helper rules. So if you have a rule called allow, and that in turn... Again, once the conditions start piling up, you want the user to be an admin, or you want the user to have some particular roles if they are trying to access this or that document and the request method is POST, and so on and so forth. Maybe you want to check for the existence of some headers or whatnot. So rather than doing that check in each of these allow rules, you just create a helper rule where you say, like, is_admin. So the rule name would be is_admin, and then your allow rule would just say allow if is_admin, and so on. So it reads more naturally, and you can kind of hide away the details of the rule unless you really need to see it.
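The helper-rule pattern could be sketched like this (rule and field names are illustrative only):

```rego
package authz

default allow = false

# The allow rules read naturally by delegating to helpers.
allow {
    is_admin
}

allow {
    is_public_endpoint
    input.request.method == "GET"
}

# Helper rule: true when any of the user's roles is "admin".
is_admin {
    input.user.roles[_] == "admin"
}

# Helper rule: true when the first path segment is "public".
is_public_endpoint {
    input.request.path[0] == "public"
}
```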

[00:42:26.560] – Ned
Got you. Could you then import that rule into multiple policies, like have it as almost like a library or a package that you would import to your policy?

[00:42:37.170] – Anders
Oh, yeah, definitely, definitely. So there’s this concept of packages and modules where you can refer to a package in some other file or module and have that imported into your main package.
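As a rough sketch of that package-and-import mechanism (file layout and names are hypothetical), a shared helper might live in one module:

```rego
# lib/authz_helpers.rego, a shared library module
package lib.authz

is_admin {
    input.user.roles[_] == "admin"
}
```

and a service policy could then import and reuse it:

```rego
# service policy importing the shared helper
package service.authz

import data.lib.authz

default allow = false

allow {
    authz.is_admin
}
```

All loaded packages are addressable under the `data` document, which is what makes this library-style reuse across policies possible.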

[00:42:55.070] – Ethan
Rego works like Lego. Ned, see what I did there?

[00:42:58.580] – Anders
Wow, that was connected. All the pieces are falling into place.

[00:43:09.790] – Ethan
So, Anders, as I build out my policy, and maybe I’ve used some of that Lego idea to build this thing from a bunch of different rules that I’ve created, how do I test this thing? I need to start in a dev environment before I go to prod, I’m going to assume. Is there some recommended workflow for that?

[00:43:27.850] – Anders
Yeah. Again, what we’re working with here really is policy as code. That’s the concept we’re working with here. That’s what OPA is, kind of the embodiment of policy as code. So when we start to treat policy as code, and not just like a PDF document in some executive or board member’s office drawer, then you can start to gain all the benefits of working with anything else as code. Which is, of course: you can work with pull requests, code reviews, you can work with tests, you can work with linters or static analysis of your files, maybe get recommendations like, did you know that you can do this instead? So, yes, that’s one of the main benefits of OPA, or policy as code. OPA ships with a framework, a very lightweight framework, I should say, for doing unit tests of your policies. So rather than querying OPA for decisions, which you’d be doing from your service, you can write a test that says: given that I am a doctor, I should have access to this medical record, or things like that. And given that I’m trying to access a public endpoint, I should be allowed to do so.

[00:44:52.040] – Anders
Or, like, given that I’m just an anonymous user, I should not have access to any of these endpoints. So OPA does come with a unit test framework. And I think, at least for larger projects, it’s fairly standard that you work with OPA or Rego as you would with any other code. So everything is version controlled, you work with code reviews, you work with unit tests, you automate the unit tests so they’re always run as part of the pull request, before anything is merged. That’s basically the idea behind policy as code, I think.

[00:45:37.430] – Ned
Okay, so you’d have some unit tests for the different rules that go into your policy with the expected output based off of inputs, and then run that unit test and make sure that the policy does what you expect it to do before it rolls into the next phase of testing or even out to an environment.

[00:45:56.960] – Anders
Yeah, exactly. And you can even work with code coverage. So you can see that this rule over here is not covered by your tests, so probably it should be, right? Okay.
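The unit test workflow described here might look something like this sketch (package and input shapes are assumptions; tests are just rules whose names start with `test_`):

```rego
package authz_test

import data.authz

# Given an admin user, the request should be allowed.
test_admin_is_allowed {
    authz.allow with input as {
        "user": {"roles": ["admin"]},
        "request": {"path": ["private"], "method": "GET"},
    }
}

# Given an anonymous user, the request should be denied.
test_anonymous_is_denied {
    not authz.allow with input as {
        "user": {"roles": []},
        "request": {"path": ["private"], "method": "GET"},
    }
}
```

Running `opa test -v .` executes the tests, and `opa test --coverage .` reports which rules no test ever exercised, which is the coverage feature mentioned above.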

[00:46:08.540] – Ned
And I’m curious in terms of where folks are checking these policies into. Do you typically find that the policies get stored along with the application they’re governing, or the service they’re governing, or are the policies kept in their own repository?

[00:46:28.130] – Anders
Yeah, that’s a good question. I’ve definitely seen both. I tend to like the concept of having a policy repo, but that’s more of a personal preference. Another aspect could be that you want to keep anything related to this application in a single repo. But then again, one of the core ideas behind OPA is that you decouple your policies from your application logic. So there’s not necessarily that kind of hard coupling between policy and your app. So at least to me, it makes sense to keep your policies separate from that. And also, once you start to have a lot of services, you’re going to want to have common rules, or common libraries of Rego, that you can import and use. I think there’s still going to be an element where you want some code to be kept separate from that of your services.

[00:47:36.540] – Ned
That makes sense. And depending on who’s writing the policies... if it’s your security team or your ops team who’s actually writing the policies, maybe they want to maintain their own repository of policies and let the application folks do their app dev stuff, and then meet in the middle somewhere in the pipeline, I guess.

[00:47:58.000] – Anders
I think you might not need to choose one of the two, but kind of find some options in between, where you have some policy kind of centralized, and some responsibilities still kind of distributed out in the teams.

[00:48:12.950] – Ned
Yeah, I’ve had some interesting similar debates when it comes to infrastructure as code. Do you store the IaC with the application that it’s supporting, or do you put it in its own repo? And again, it’s like, well, it depends on how you’re structured.

[00:48:28.550] – Ned
Can’t we just have a solid answer to anything? I guess the answer is no. Well, this has been fascinating, and I think I really want to dig into OPA some more, and maybe even try it out with my personal favorite, which is Terraform.

[00:48:43.250] – Anders
But yeah, of course you should.

[00:48:46.750] – Ethan
It’s worth mentioning that the things we’ve talked about have been security and security-focused kind of stuff, but there are just so many other things that you can make OPA do if you want. And you mentioned Terraform. This is actually one that, as I was digging into the topic, just grabbed my attention. So I’m going to read, Anders, from the documentation here, the Terraform use case for OPA: Terraform lets you describe the infrastructure you want and automatically creates, deletes, and modifies your existing infrastructure to match.

[00:49:15.810] – Ethan
Okay, we know what Terraform does. OPA makes it possible to write policies that test the changes Terraform is about to make before it makes them. We don’t have time to dive into that now, but I just want to throw that out there for the audience. If you’re only thinking in terms of security, no, there’s more to OPA here, and more interesting use cases than you’re considering if all you’re thinking about is authentication and authorization kind of stuff. Super cool tool, really powerful and impressive.
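As one hedged sketch of the Terraform use case: a policy can inspect the JSON output of `terraform show -json plan.out` before `apply` runs. The package name and the security group check below are illustrative, not from the episode, and the exact JSON shape depends on your Terraform version:

```rego
package terraform.analysis

import input as tfplan

default allow = false

# Flag any AWS security group change that opens SSH (port 22) to the world.
violation[msg] {
    rc := tfplan.resource_changes[_]
    rc.type == "aws_security_group"
    rule := rc.change.after.ingress[_]
    rule.from_port <= 22
    rule.to_port >= 22
    rule.cidr_blocks[_] == "0.0.0.0/0"
    msg := sprintf("security group %v opens SSH to the world", [rc.address])
}

# The plan is acceptable only when no violations were collected.
allow {
    count(violation) == 0
}
```

A CI pipeline can then run `opa eval` against the plan JSON and gate the `terraform apply` step on the `allow` result.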

[00:49:43.370] – Ned
So if folks want to know more, find out more, dig deeper into OPA. Where would you point them? Where’s the best places to look?

[00:49:51.970] – Anders
Yeah, sure. I think the OPA website with the OPA docs is a great place to start. If you want something more hands-on for learning Rego, there’s also the Styra Academy, which is a free resource provided by Styra. It’s online, more video-based tutorial content with quiz-style tests and so on. So that’s another good resource. But I think the OPA website. We have a Slack as well. I think there are 6,000 users or so. It’s quite a vibrant community. So yeah, if anyone has questions, I’d be happy to talk there.

[00:50:30.290] – Ned
Yeah. And if people want to find you, do you have a Twitter handle that you’re active on or are you active on LinkedIn?

[00:50:35.830] – Anders
Yeah, I’m on both. It’s just my first name followed by my last name, so that’s simple enough. Anders Eknert. All right.

[00:50:45.530] – Ned
Well, we will include that in the show notes as well. Anders, thank you so much for appearing as a guest today on Day Two Cloud. And hey, listeners out there, virtual high-fives to you for tuning in. If you have suggestions for future shows, we would love to hear them. You can hit either of us up on Twitter at Day Two Cloud Show. That’s the handle that we track. Or if you’re not a Twitter person, you’re just not down with that...

[00:51:08.460] – Anders
That’s cool.

[00:51:09.120] – Ned
I’ve got a fancy website. It’s nedinthecloud.com. Go to the contact form and put the info in there. Did you know that you don’t have to scream into the technology void alone? The Packet Pushers podcast network has a free Slack group open to everyone. You can visit packetpushers.net/slack and join. It’s a marketing-free zone for engineers to chat, compare notes, and tell war stories. You could talk about your latest OPA policies if you want. You can find us in that Slack. Until next time, just remember: Cloud is what happens while IT is making other plans.

[00:51:51.770] – Ethan
Welcome to the Tech Bytes portion of today’s Day Two Cloud episode. VMware is our sponsor, and we’re discussing vRealize Network Insight Universal. New keyword: Universal. We’ve talked with VMware about vRealize Network Insight before, and today’s focus is on the SaaS version of vRealize Network Insight, how it can help you with your cloud migration project, and some new features that have come up along the way. You definitely have a cloud migration project. We know you do. You’ve been working on it for quite a while. Yeah, that’s the one. Our guests today are Martijn Smit and Sehjung Hah. And Sehjung, let’s open up with you. We have a new keyword here in the product name, Universal. So what does Universal mean in vRealize Network Insight Universal?

[00:52:37.950] – Sehjung
Yeah. Thanks, Ethan. Thanks, Ned. So basically, Universal means we’re trying to keep it a lot simpler for the network practitioners out there that are trying to maintain their infrastructure, building out their networks, building out their infrastructure. So we’re trying to make licensing just one thing. Of course, there are all the technical things they have to deal with, but we’re trying to make it simpler so that they just have to worry about one thing, which is the solution that they want to implement. And for customers that maybe have on-premises network monitoring, or they want to move to a SaaS-based version of network monitoring, Universal will handle all that.

[00:53:18.200] – Ethan
So customers see the Universal part, and this is a licensing change that we’re talking about. We are simplifying licensing. Is that what you’re getting at?

[00:53:25.880] – Sehjung
Yes, that’s correct. So basically just keeping it a lot simpler so they don’t have to worry about it, because it’s a lot more flexible. In terms of purchasing or consumption, they just do the Universal licensing, and we can monitor things like desktops, we can monitor the cloud infrastructure, native public cloud. Before, they would have to have different CCU units, for example, or different VDI units, and different vCPU licensing for public cloud. Now you just get the one license and it handles everything. So that’s one thing they don’t have to worry about. And then also, we know customers maybe have on-premises deployments, where they deployed, let’s say, the solution on premises, or they want to do it as SaaS so they don’t have to worry about upgrading, for example. A lot of the network monitoring that they have, they deploy on premises, but then every now and then, if you want new features or functionality, you have to upgrade it. So this way, as a SaaS solution, you don’t have to do that anymore. You can just run the solution, and as a SaaS service, all the features and functionality, the upgrades, improvements, and patches all just automatically happen on their solution.

[00:54:45.170] – Sehjung
So that’s kind of the universal simplification we’re talking about.

[00:54:50.260] – Ned
Okay, so it’s not just licensing. I mean, the licensing is obviously very important. I don’t have to buy all these different and disparate licenses to make sure I’m properly trued up on all the different places I’m deployed. I get the Universal license, I know I’m good. But then you added that extra layer: it’s not just about the licensing of the endpoints, it’s also the solution itself. I could run it on-prem, I could run it as SaaS. And you mentioned a few reasons why you might use SaaS. Are there some other benefits to using the SaaS version over a traditional on-prem installation?

[00:55:22.860] – Sehjung
Yeah, there are lots of benefits. We’ve done a lot of things in terms of security. So with SaaS, those are things you don’t have to worry about, like hardening the appliance yourself. You don’t have to worry about the sizing of the appliance, because we have different brick sizes. You don’t have to worry about powering it. So there’s a whole bunch of things with SaaS that you’ll see are improved, especially the upgrades and the patches, and then also just getting that feature velocity. We’re doing feature releases every three months, every quarter. So rather than taking time and scheduling your weekends to get those latest features, you don’t have to worry about it. On the SaaS side, it’s all taken care of.

[00:56:09.480] – Ethan
Now, it may or may not matter a lot, Sehjung. But if I’m using the SaaS version of this product and I’m in the middle of a cloud migration, is there some advantage where, if I’m on-prem, it’s a little harder, and if I’m using the SaaS flavor, it’s a little easier to begin dealing with my workloads and monitoring them as I move them up to the cloud?

[00:56:28.000] – Sehjung
Yeah, absolutely. So one of the things that we added into the SaaS solution for vRealize Network Insight Universal was this concept of federation. You’re going to be able to see everything in one dashboard, versus looking at the different consoles you might have in your different regions to see what’s happening. So the information is not siloed in terms of troubleshooting, and you can see everything end to end with our SaaS solution.

[00:56:55.910] – Ethan
Martijn, I think that’s the time for you to chime in here, because I know cloud migrations are something you’re pretty familiar with. So help us visualize this. If I’m using vRealize Network Insight Universal... that’s a mouthful, guys. vRealize Network Insight Universal. I don’t want to stress over that big product name. But if I’m using that product for my cloud migration, and Sehjung just introduced this idea of federation, help me understand what that monitoring infrastructure is going to look like.

[00:57:24.290] – Martijn
Yeah. vRealize Network Insight Universal, it is a mouthful. I typically call it vRNI-U to make it a little bit easier on myself. But basically, what Sehjung just described is when you are in the middle of a migration, or starting a migration. Right?

[00:57:39.820] – Martijn
So you have on-prem infrastructure, you’re monitoring that, you’re creating a migration plan. As you said in the beginning, you’re mapping out the application landscape. You’re looking at what talks to what, which is what vRNI will tell you. But then you slowly shift those workloads, shift those applications, from your on-prem environment to the cloud, whichever cloud that is, and then vRNI will actually move with it. So the Universal licensing makes it possible to start fully on-prem and then gradually move towards SaaS. And then the federation feature that we unlock with vRNI Universal is basically a way to bridge the gap during the migration, when you are still half on-prem and half in the cloud, for example, and have different vRNI instances to monitor those environments, so that license pool will move with it. And then the federation feature itself is a dashboard that you can get into with the vRNI Cloud instance, where you can see how my vRNI instances are behaving. That’s one, so you can monitor vRNI itself. If you’ve got multiple of those instances, you want to make sure that they’re healthy.

[00:59:02.090] – Martijn
But also, how is the infrastructure that those vRNI instances are monitoring doing? So, like the example that Sehjung mentioned, we have a customer that went from a global footprint. They had data centers in America, EMEA, and APJ, all with different vRNI instances, because you want to deploy those as close to the infrastructure as possible. And then they merged all of those data centers into their cloud infrastructure. So they moved their workloads from those on-prem data centers to the cloud. They used vRNI to do the migration, and we can talk about that a little bit. But they also used vRNI Universal to make sure that they could seamlessly flow all of those instances and then migrate them to the SaaS solutions that we have, so that they didn’t have to worry about which licenses they had where. They just have one big pool. That’s the entire premise of Universal.

[01:00:05.820] – Ned
Okay. And I think a lot of organizations are never going to be fully in the cloud, right? They’re going to be in this sort of intermediate step, where some stuff is going to stay on-prem and some of it is going to migrate up to the cloud, and being able to monitor the status of the applications I have in both locations, in one federated dashboard, is going to be huge. Martijn, you mentioned something about how it can make migrations simpler. Can you dig into that a little bit more? Because I’m curious what aspects of vRNI-U would help make the actual migration of things simpler.

[01:00:45.050] – Martijn
Basically, what vRNI does as a product is uncover all of the things that are running within the network. So it has application discovery, for example. You can look at the virtual workloads or the cloud workloads in order to map out what an application consists of. And that’s a typical application model: you have an application name with a tiering model, like app, web, database, and then the workloads within those tiers. Right. That discovery is huge within vRNI, because typically you’re looking at your CMDB or you’re looking at your vSphere environment, and if someone knows exactly what is running within their infrastructure, I’d like to meet that person, because I’ve never been able to put my finger on a company that actually knows what’s running. Right. So application discovery is huge in that sense of uncovering what’s actually running, but then also mapping out dependencies within that application landscape. So it will be able to tell you, where you might not be able to, that this application is talking to these other applications, but also these end users, those IoT devices, those printers, those cloud workloads. So every single connection is basically logged within vRNI.

[01:02:05.810] – Martijn
And you’re able to uncover that pretty easily by using that application discovery and creating those groups within vRNI to show those connections. That’s one. But then it also goes into how much data is being sent between these applications. So if you’re targeting a piece of your infrastructure, like 20 to 50 workloads that you want to pick up in one migration window, you will be able to group those into a migration wave, as we call them, and then you will see the actual requirements for that migration wave. You will be able to tell how much bandwidth you need between the destination cloud and your on-prem environment when you pick this migration wave up and actually migrate it to the cloud. So you will be able to see what kind of Internet traffic is going through these workloads, which also kind of predicts your egress traffic, which comes back to cost management a little bit, even. But you will also see the amount of traffic that you need between the end users and those applications.

[01:03:16.480] – Martijn
So if it’s an internal application, you also want to scale out your data center interconnect properly before you hit the migrate button, essentially.

[01:03:27.650] – Ned
Right. So this really helps you plan out not just the migration of the applications themselves, but the network infrastructure that’s needed to support them at both ends, and for the end users. Because I’m just thinking about an internal app that everybody’s accessing, maybe by going in through the same remote office connection to a central point. Now that application has moved up to the cloud. Am I going to send my users through that central point and then up to the cloud, or is there a way to just send them directly to that cloud instance, so all that traffic isn’t flowing through my data center now that the application is not even there?

[01:04:02.870] – Martijn
Exactly. So vRNI can uncover the requirements that you need when you migrate an app. Just looking at the specific application, it can tell you how much network capacity it needs. And then based off of that information, you can indeed create your network architecture in accordance with what the requirements are, instead of kind of fat-fingering it and just seeing in production how much network traffic comes up.

[01:04:35.360] – Ethan
Well, Martijn, if folks are interested in learning more about vRealize Network Insight Universal, what’s the best way to go about that?

[01:04:43.260] – Martijn
So I think we’ll put a bunch of links in the show notes, but basically my favorite way is to just play around with it. And there are two ways to do that. One of them is using the free 30-day trial that we have. You can just sign up for vRNI Cloud, get an instance yourself, deploy a collector, and then you’re off to the races. You can start monitoring the environment. Or you can try our hands-on labs, which is basically a simulated environment where you have a complete demo system at your disposal, and you can just play around with the UI itself without having to set anything up, except logging into the HOL website.

[01:05:23.870] – Ethan
Excellent. And we will have the links to all of those things in the show notes. You can find those at daytwocloud.io. So our thanks to VMware for sponsoring today’s Tech Bytes segment of today’s Day Two Cloud episode. And if you ring up VMware to take vRealize Network Insight Universal out for a test drive, make sure to tell them that you heard about it on Day Two Cloud, part of the Packet Pushers podcast network. And until then, just remember: Cloud is what happens while IT is making other plans.
