There are several guiding concepts that make it easier for organizations to build a Zero Trust strategy. The first that typically comes to mind comes from CISA and NIST. These core elements, ranging from the five pillars through to building a ZT architecture, offer a vendor-neutral path towards removing implicit trust. Organizations like CSA also do a great job of expanding upon this knowledge with more contributions from technology and service providers. This week, we take our first step towards understanding what goes on behind these policies, standards, and recommendations, and for that, we have a well-equipped guest to walk us through it.
Zack Butcher is one of the founding engineers at Tetrate, a vendor that provides a consistent way to connect and protect thousands of individual microservices and deliver Zero Trust security operations across any environment. Tetrate's roots trace back to a team that worked at Google, whose connection to Zero Trust through BeyondCorp many of you are likely familiar with. Zack is also the co-author of NIST Special Publication 800-207A. If that looks familiar, it’s because it’s an expansion of the core NIST resource mentioned earlier, SP 800-207.
NIST SP 800-207A builds upon that core architecture piece and homes in on access controls for cloud-native applications in multi-cloud environments. That is a bit of a mouthful, so here is Zack on what you need to know.
When we talk about Zero Trust at runtime, there's a lot of FUD, a frustrating amount of FUD, in the marketplace, and a lot of vendors claiming certain things are Zero Trust and not.
And you know, in that landscape, I wanted to really kind of push for people to have a very clear definition of Zero Trust at runtime, and it's a minimum definition. Let me be clear. You can do a whole lot more than what we talk about in the SP, but I try and give a very, very simple minimum definition. And that is five policy checks at runtime, and we call that identity based segmentation.
Butcher also co-authored NIST SP 800-204A, which focuses on building secure microservices-based applications using service-mesh architecture. So this week, Neal and Butcher ran down the rabbit hole of expanding upon these core Zero Trust resources, implications of a more secure environment at runtime, and identity-based segmentation.
Identity-Based Segmentation: Zack emphasized the significance of identity-based segmentation as a fundamental aspect of Zero Trust. Implementing five key policy checks at runtime can help organizations effectively bound an attack in space and time.
Five Policy Checks: The five essential policy checks for identity-based segmentation are encryption in transit, authenticated service principals, service-to-service access policies, end-user authentication, and resource authorization.
Zero Trust at Runtime: By following the minimum definition of Zero Trust at runtime, organizations can enhance their security posture. This includes implementing encryption, authenticating services, and authorizing access based on specified policies.
Microservices and API Security: Zack highlighted the growing importance of APIs and the need for a new approach to API security. As organizations shift towards service-oriented architectures and rely more on APIs, ensuring secure communication and implementing proper controls become crucial.
Complementing Microsegmentation: While microsegmentation plays a significant role in network-oriented controls, it should be seen as complementary to identity-based segmentation. Organizations can relax lower-level network controls in exchange for tighter identity-based controls, achieving greater agility without compromising security.
After a brief hiatus so Neal could travel and I could readjust to the midnight… 2 a.m…. 4 a.m… feedings with my newborn, we are back in action until the holidays hit. We have a few new episodes in the can already, with several interviews on the schedule, which should get us to mid-December when we wrap season two. As an aside, I am working on two new podcast pilots, one of which is designed for those trying to break into cybersecurity. The other may be of interest, too, but is more focused on the broader aspects of personal growth. I’ll share more once the first episodes go live.
More from Neal: Lastly, while on hiatus, Neal had a chat on the What's The Problem with Mike Krass series and dug into the three core tenets of Zero Trust. You can check that out here.
Identity-Based Segmentation: The Foundation of Zero Trust
Those five checks, the identity-based segmentation, the first time I ever actually implemented those was at Google. We didn't call it that at the time, but that's what we did. And so I was actually there in Cloud. So GCP was the first place where this hit.
Digging in further, Zack recommends these policy checks at runtime:
Encryption in Transit: Ensuring secure communication by encrypting data while it's in transit.
Authenticated Service Principals: Authenticating the services that are communicating, to establish trust and prevent unauthorized access.
Access Policies: Defining which services may call which others (for example, the front end can call the back end, but not the database).
End-User Authentication: Verifying the identity of end-users before granting access to resources.
Resource Authorization: Authorizing access to specific resources based on user roles and policies.
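To make the five checks concrete, here is a minimal Python sketch of how a single request might be evaluated against all of them in order. It is illustrative only and is not taken from SP 800-207A: the `Request` fields, the `SERVICE_POLICY` and `USER_GRANTS` tables, and the service and user names are all hypothetical, and a real deployment would typically delegate these checks to a service mesh or proxy rather than application code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical service-to-service allowlist of (caller, callee) pairs.
# The front end may call the back end, the back end may call the database,
# and the front end may NOT call the database directly.
SERVICE_POLICY = {("frontend", "backend"), ("backend", "database")}

# Hypothetical per-user resource grants.
USER_GRANTS = {"alice": {"orders/42"}}


@dataclass
class Request:
    encrypted: bool                 # arrived over an encrypted channel (e.g. mTLS)
    caller_service: Optional[str]   # authenticated service principal, None if unauthenticated
    callee_service: str             # the service being called
    end_user: Optional[str]         # authenticated end user in session, None if absent
    resource: str                   # the resource the end user is trying to reach


def check_request(req: Request) -> Tuple[bool, str]:
    """Apply the five runtime checks in order and report the first failure."""
    if not req.encrypted:                                     # 1. encryption in transit
        return False, "encryption in transit required"
    if req.caller_service is None:                            # 2. authenticated service principal
        return False, "service principal not authenticated"
    if (req.caller_service, req.callee_service) not in SERVICE_POLICY:
        return False, "service-to-service access denied"      # 3. service access policy
    if req.end_user is None:                                  # 4. end-user authentication
        return False, "end user not authenticated"
    if req.resource not in USER_GRANTS.get(req.end_user, set()):
        return False, "end user not authorized for resource"  # 5. resource authorization
    return True, "allowed"
```

The point of the sketch is that every hop runs all five checks, so a request from the front end straight to the database fails at the third check even when the channel is encrypted and the user is fully authenticated.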
Zero Trust at Runtime: Enhancing Security Posture
Implementing Zero Trust principles at runtime is crucial for organizations aiming to enhance their security posture. This involves implementing encryption, authenticating services, and authorizing access based on specified policies. By adopting these practices, organizations can minimize the risk of unauthorized access and potential data breaches.
Microservices and API Security: Addressing New Challenges
Zack highlighted the growing importance of microservices and APIs in modern application architectures. As organizations increasingly rely on microservices and APIs to build scalable and modular systems, ensuring secure communication and implementing proper access controls become paramount. Service mesh provides a powerful toolset for managing and securing microservices, enabling organizations to effectively address the unique challenges of API security.
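In the episode, Zack gives request validation at an API gateway as one example of an application-tier policy: reject anything that does not conform to the service's declared request schema. The Python sketch below shows the idea using only the standard library. The `ORDER_SCHEMA` fields and the `validate_payload` helper are hypothetical stand-ins, not the API of any real gateway, and a production setup would more likely use a schema language such as OpenAPI or JSON Schema enforced at the gateway itself.

```python
import json

# Hypothetical request schema: field name -> required Python type.
ORDER_SCHEMA = {"order_id": int, "sku": str, "quantity": int}


def validate_payload(raw_body: str, schema: dict) -> list:
    """Return a list of validation errors; an empty list means the request is well formed."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return ["body is not valid JSON"]
    if not isinstance(payload, dict):
        return ["body must be a JSON object"]

    errors = []
    for field, expected in schema.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly
        # (this sketch's schema has no boolean fields).
        elif isinstance(payload[field], bool) or not isinstance(payload[field], expected):
            errors.append(f"wrong type for field: {field}")
    # Reject fields the schema does not declare.
    for field in payload:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors
```

A gateway applying this kind of check would forward the request only when the returned list is empty, which filters out malformed payloads before they ever reach the application.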
Complementing Microsegmentation with Identity-Based Segmentation
While microsegmentation plays a significant role in network-oriented controls, Zack stressed the importance of complementing it with identity-based segmentation. By adopting identity-based controls, organizations can achieve greater agility without compromising security. This approach allows them to relax lower-level network controls while maintaining tighter identity-based controls, resulting in a more flexible and secure environment.
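The trade-off Zack describes in the episode (one static firewall rule between two identity-aware proxies, plus an identity-based policy deciding which applications may cross) can be sketched roughly as follows. Everything here is a hypothetical illustration: the proxy addresses, the identity strings, and the `BRIDGE_POLICY` table are invented for the example, and the sketch assumes the proxies have already authenticated each caller's identity (for instance from an mTLS certificate), so it covers only the authorization step.

```python
import ipaddress

# Hypothetical addresses of the two identity-aware proxies.
ONPREM_PROXY = ipaddress.ip_address("10.0.0.10")
CLOUD_PROXY = ipaddress.ip_address("172.16.0.10")

# Identity-based policy: authenticated caller identity -> targets it may reach.
# Identities are human-readable names, unlike the CIDR ranges in firewall rules.
BRIDGE_POLICY = {
    "onprem/billing/frontend": {"cloud/data/orders-db"},
}


def firewall_allows(src: str, dst: str) -> bool:
    """Static network rule: only on-prem proxy to cloud proxy traffic may cross."""
    return (ipaddress.ip_address(src), ipaddress.ip_address(dst)) == (ONPREM_PROXY, CLOUD_PROXY)


def bridge_allows(caller_identity: str, target_identity: str) -> bool:
    """Identity rule applied at the proxy; assumes the identity was already authenticated."""
    return target_identity in BRIDGE_POLICY.get(caller_identity, set())


def onboard(caller_identity: str, target_identity: str) -> None:
    """Onboarding a new application edits the identity policy only, not the firewall."""
    BRIDGE_POLICY.setdefault(caller_identity, set()).add(target_identity)
```

With this arrangement the network control is relaxed to a single proxy-to-proxy rule, while per-application decisions move to the identity policy, which is the part that changes as new applications come online.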
Zack Butcher's insights shed light on the practical implementation of service mesh and Zero Trust principles. By focusing on identity-based segmentation and implementing the five essential policy checks, organizations can enhance their security posture and minimize the impact of potential attacks. Additionally, understanding the evolving landscape of microservices and API security is crucial for organizations seeking to maintain a robust security framework.
To learn more about Zack Butcher's expertise and his involvement in developing NIST cybersecurity standards, listen to the full episode of the AZT podcast.
This transcript was automatically created and is undoubtedly filled with typos. As usual, we blame the machines for any errors.
Elliot Volkman: Hello, everyone, and welcome back after a short hiatus for AZT. I am your sleep-deprived producer, Elliot Volkman, along with our hopefully well-rested, maybe well-vacationed host, Neal Dennis, along with Zack Butcher, the founding engineer over at Tetrate, but he also contributed to two NIST elements that we're going to be discussing a bit today. And then also just some of the other items. Considering he works with a technology vendor that wraps around Zero Trust, I'm sure you're more than equipped to be able to kind of have some conversations around what the market's looking at.
So we'll jump between the two of those. But that said, Neal, how's it going? How's that? How's travel?
Neal Dennis: It's been a busy couple of weeks, man. Travel was great, vacation was great, unlike you sitting there with, you know, a tiny hoodlum that you have to manage that's screaming in your face, feed me, feed me. But yeah, I wish I could say I was well rested. Managing a conference right now, shameless plug, Texas Cyber Summit here in Austin.
I haven't slept in literally two days. So this will be a fun conversation, Zack. I am leaving you guys all to get out, brother. So we're good to go.
Zack Butcher: It'll be a good one. I'm looking forward to it.
Elliot Volkman: So typically this is a good point for us to let Zack introduce himself, just to give a little bit more than what I can see on LinkedIn. Zack, obviously you have done a little bit more than be the founding engineer for a startup, but you have a pretty interesting background.
Obviously you worked over at Google, but some other spots. So maybe tell us a little bit about yourself and how you found yourself in your shoes.
Zack Butcher: Yeah. Yeah. I actually, funnily enough, my career started all the way back at a company that most people had never heard of before. But now I'll say the name in just a second, and probably everybody who listens to this podcast will recognize it immediately. My first job out of college was actually working for Colonial Pipeline Company.
Now, I didn't do any security stuff there, but of course, everybody immediately perks up because that was the company that had the ransomware attack that triggered the executive order mandating the Zero Trust stance. You know, for me, that was kind of a funny full circle thing to have happen, right? I actually was already working with NIST on microservice security standards at the time, then the Colonial Pipeline breach and ransomware event happened.
And so it was kind of funny for me, though, to look back and see you know, obviously I chatted with some of my old buddies there and they were, they were not having a good time. But, you know, for me, that was, that was cool. After that, I, I moved and worked in Google on Google Cloud. I, I kind of jokingly say I built all the enterprise stuff in GCP.
So if you go to Google Cloud, you make a project. That was my baby for a long time, projects. And I was there on the team when we shipped the organizational hierarchy, orgs, folders, the IAM, the identity and access management system with hierarchical permissioning there, a lot of that stuff, the service management service. We then, as part of that team, rolled out the service mesh architecture to 100 percent of workloads at Google. And that was from 2014 into 2017 or 2016 or so. We finished that up and around that same time, obviously, Kubernetes was hitting its stride, and we were seeing a lot of networking-oriented problems, a lot of security problems, and eventually some of the Zero Trust things we'll talk about in a minute, out in the space.
And obviously, we had solved a lot of those inside of Google in various different ways. But with that service mesh architecture in mind, that's when we created Istio, and I was one of the earliest engineers to join the team there at Google, where we, you know, kind of brought the service mesh architecture into the fold.
So jump forward and that's kind of what I do a lot with Tetrate today. So I was one of the founding engineers of Tetrate. You know, our CEO and co-founder was the original product manager for Istio over at Google. And I wear a bunch of different hats. I work on Envoy community things.
I work on service mesh stuff. But, you know, most relevant here, I help a lot of large organizations adopt service mesh and some of the security posture therein. And I work with the federal government on two sets of SPs: on the SP 800-204 series, that's the series that offers guidance on microservice security.
And, just very recently, in fact about two weeks ago it was finalized, it's been in draft review for a few months, SP 800-207A, which is the most recent installment in the Zero Trust series. A lot of folks may recognize SP 800-207. That's the big Zero Trust granddaddy one that was mandated by the executive order and all that.
207A is the first, or the next, installment in that series. Much more narrow focus. And we can definitely dig into that today.
Elliot Volkman: Yeah, and I honestly, I think that's a really good starting point for us to kind of dig into and then, yeah, Neal, I'll let you kind of go down whichever rabbit hole you choose, but yeah, maybe you can give us a little bit of context of what is contained in there, how it advances what has previously been published.
Zack Butcher: Yeah. So obviously, you know, 207 and the existing stuff that's been published by CISA around it and the cybersecurity framework, there's a huge amount of material. Zero Trust is a huge topic area, right? 207A, like I mentioned, very focused. And in particular, we focus only on runtime. And I would argue that's maybe the easiest part, right?
I know you've had past speakers on the show talk to, you know, the cultural change, the people-process change. That's the hardest part. However, when we talk about Zero Trust at runtime, there's a lot of FUD, a frustrating amount of FUD, in the marketplace, and a lot of vendors claiming certain things are Zero Trust and not.
And you know, in that landscape, I wanted to really kind of push for people to have a very clear definition of Zero Trust at runtime, and it's a minimum definition. Let me be clear. You can do a whole lot more than what we talk about in the SP, but I try and give a very, very simple minimum definition. And that is five policy checks at runtime, and we call that identity-based segmentation. So there's kind of three ideas that the SP introduces. Identity-based segmentation is the first and the most important one. And those five policy checks at runtime, hopefully y'all will nod your heads along when I say them, because I think, you know, for this crew, they should be pretty self-evident.
One, we need encryption in transit, and we need that for message authenticity and for eavesdropping protection, right? So we want to make sure nobody can eavesdrop and nobody can tamper with the messages that we're sending. Second, we need an authenticated service principal. So we need to know what are the software systems that are communicating, and we should use that authenticated principal to authorize the access.
We should have an access policy that says the front end can call the back end, the back end can call the database, the front end cannot call the database. Then the additional two policies that we want on every hop, in addition to those three, are: the end user needs to be in session and authenticated.
And we need to perform authorization on the resource that that end user is accessing in this context, in this session. You know, hopefully nothing, and especially the user side of it, should be very straightforward. You know, hopefully everybody, and certainly if we've been in government space, we've been doing things like FIPS-compliant encryption anyway.
You know, hopefully, maybe the only novel pieces, and even then it's not new new, are that service authentication and authorization, right? Fundamentally, the reason we propose those five as a minimum runtime check is because, you know, the heart of Zero Trust is the idea that the attacker's in the network, right?
So if you have a perimeter-based control, it can already be bypassed. I happened to give a talk about 207A in D.C., downtown in the Ronald Reagan Building, in the Federal Triangle, earlier this year, and that morning, in fact, we found out the U.S. federal government released that China had compromised multiple networks in the Pacific, right?
So now it's a beautiful case study, and a motivated attacker can get inside your network, right? I promise. And so then, the question from a security perspective becomes, how do we mitigate the risk? Right. How do we bound an attack in space and in time to minimize what they can do? So that's the whole ballgame that we're playing.
So my argument, and we make this argument in the SP, is that those five runtime controls help you effectively bound an attack in space, with authorization policies to limit pivot, and in time, with ephemeral credentials that need to either be re-stolen or will expire out.
Right? So you either need a persistent attack, and you're limited in what you can pivot there, or you need to continually perpetrate the theft of credentials. And so all of it's about shrinking the gap, right? You know, we want the attacker to have to get through Swiss cheese, you know, the holes in the Swiss cheese. We need as many layers as possible to make it as hard as possible to get through those gaps.
That's the goal. Identity-based segmentation, those five checks, we argue, are the absolute minimum that you should be doing. And if you're doing that, then I would argue you can call it a Zero Trust system. One other little piece I'll give you is the mental framing: the attacker's in the network. The other way that I try and frame that for people to think about is, suppose that I can just pick a system, any software in your infrastructure, and expose it to the internet.
What is the impact of that? And when you can tell me with a straight face that the impact of that is, the risk is minimized, because I have a variety of other controls, so it's not catastrophic that this is internet facing, then I would argue you're in the right ballpark for Zero Trust. And so I would argue those five policy checks, authenticating and authorizing the applications communicating, authenticating and authorizing the user, having that all encrypted, should give you a pretty strong guarantee that if I'm going to go expose this application to the public internet, it hopefully is pretty safe, right?
That's the mental model we want to operate in.
Neal Dennis: Yeah, that's. I'm going to start poking a little bit.
Zack Butcher: Please.
Neal Dennis: No, so I think that's awesome. I mean, that's, that's a cool review of the deal. So it also saves me from having to go back and try to read it now. So
Zack Butcher: Fortunately, it's a little bit easier to read than most, too. I will, I'll kind of pat myself on the back here. I thought, you know, but it's only about 12 pages of content. So one thing, when I say it's focused, one of the goals here is that, like, 207 is a couple hundred pages. 207A, we're trying to keep these things short and sweet and focused so it's easy to read.
So please, and I say that as a, hey, please, if you're a practitioner in this space, go check it out. This is one of the easiest ones to read. So sorry, a little plug there.
Neal Dennis: I also want to say this is really, you know, we've had some authors, we've had some book writers and some other stuff. We've had some people who have contributed policy, but I don't think we've ever really had, like, a legitimate person on that actually is the policy, right? The person who authored, wrote,
created, guided this. So especially at this level of a standard. So first off, one, I think that's pretty epic. You know, with all the stuff you're sharing relative to how that works, thank you for all that. Because I think this is really cool to have someone with this cool, unique perspective, and I know that's partially why Elliot made sure we had this chat.
That being said, I'm very curious about so we talk about the standard, we talk about some of the fun stuff, and you talk about services, right? And I'm, I'm going to kind of bridge the gap here between this and a little bit of your company background, but I'm very curious about the microservices and, and quote, micro segmentation and things like that, right?
So I know your company's set in house. That's a focus, right? I get that. So I would love to start with your official definition of microservices, because while it is technically a clean definition, let's be fair, it's something that fuzzes around. So I would love to hear that first, and then, you know, depending on where you go with this, I'm going to bring us back over to the standard that we're talking about as well, potentially.
Zack Butcher: Yeah, yeah, yeah, for sure. So services itself. I actually hate the term microservice. I think that's kind of a misnomer. I also, just to be clear, hate the term Zero Trust. And in fact, I listened to some of your earlier podcasts and you mentioned Zero Implicit Trust. I actually tried very hard to change the name to Zero Implicit Trust.
That's the right thing. So bravo on that one. Hate that name. Hate the name microservice as well. That's not the important part. Micro puts the emphasis on the wrong thing. The important part is not the size of the service. The important part is the mode of communication. In a monolithic or a non-service-oriented architecture, we tend to communicate over procedure calls,
right, local ones. And in a service-oriented architecture, we now have remote procedure calls, RPC. It changes everything. Right, and for me, it's not microservice, service, whatever. The important part is that this is a thing that exposes a contract over the network that it's enforcing and it's implementing.
Right? So that for me is the more important piece rather than, like, what is a microservice or not? Right? What is a service? It's an application that we're running that exposes some interface over the network to be consumed by other parties. Whether that's a first party in the same org, a second party, like a partner API where we're going to trust each other a little bit, or something like a SaaS.
That's a third party, an untrusted consumer, right? Regardless, it doesn't really matter. The mechanics are the same. And in a lot of respects, actually, the thing that we need to protect and the controls we want to put in place are going to be very similar. I would argue that actually when you're in that world of everything as a third party, that's another way that you can use that definition of Zero Trust, right?
Neal Dennis: Cool. Thinking about this a little bit, so first off, awesome definition. Thank you very much for, you know, unintentionally agreeing with my brain pan on where this should go. I love that. So I'm kind of curious. So when we think about, you know, your five tenets here and the goals here for how all this really meshes together at this nice, cool strategic level that you've explained.
One of the things I like to keep coming back to is, we talk about these services, these programs, and remote and stuff, and everybody always thinks API, right? So that's obviously very front of mind, piece and process. So one of the things that I've seen the last year or two has been, you know, a focus on either revamping how API works or at least becoming more aware of what your APIs are doing, right?
So from your perspective, you know, what's your kind of hot take on this, based off the tenets that you discussed, around where API is going to potentially go from a security perspective, or just a utilization perspective, and what people should think about that? I'll caveat this by saying, I'm a firm believer that there needs to be a net new change in how we approach API.
Not just the security, but just the overall construct of what that is. So I'll throw that out there.
Zack Butcher: Yeah, yeah, I actually, I would probably agree with that, right? So like API is going to be the lingua franca that we talk, right? So we're going to move, and this is kind of part of the whole purpose. Let me actually, I'll come back to that in just a second. Let me touch on microsegmentation, which you asked about earlier, because this plays into that whole idea that we're moving up the stack.
And you know, microsegmentation is fundamentally a network-oriented thing. When we get into API surface and API security, now we're on different nouns, right? We're moving up, I would argue. And so I think that will be the key noun in the future that matters critically. Let me talk on micro-seg first and I'll come back to that.
So microsegmentation is a really important capability, right? It can provide quite a lot of controls, quite a lot of assurance, but again, it's fundamentally network oriented. And so you're placing implicit trust: because of the privileged location that you reside at, I'm going to allow you to do a thing.
Right? So obviously that's in tension with the definition of Zero Trust that I gave and that definition of five things, right? Identity-based segmentation didn't talk about any of that. That's actually the second big point that we make in 207A. So we introduced identity-based segmentation. I told you there are three things that we talk about.
The first is identity-based segmentation. The second is what we call multi-tier policies, or maybe, you know, more practically, what we call existing in the real world with real policies that are already present. Right? And so we make the case that, you know, yes, you should do those five policy checks.
And if you do those, those should be enough, but you probably can't get rid of your network-oriented controls today. And there's a bunch of different reasons for that, right? The auditors and the regulators don't like it. You have a piece of paper written in 1994 that's your corporate IT security policy that says that you need a perimeter, and changing that paper is more expensive than paying Cisco for firewalls.
Right? Like there's a bunch of reasons that we have this cruft, that we can't just get rid of network-oriented controls, nor do I think that would be a great idea, right? Even if we can have supplementary controls that take their place, there's still a value in defense in depth. And so how do we trade off, and how do I see microsegmentation versus this stuff?
I see them as complementary. But the key point that we make when we talk about multi-tier policy is that you should feel justified in relaxing lower-level network controls in exchange for tighter identity-based controls, if that gives you more agility as an organization. We actually have a picture in the example that I use, and it appears in the SP as well. Let's think about going from on prem to cloud, and we typically go through a firewall. We probably go through two firewalls. We go through an outbound firewall from the on-prem side, and we go through an inbound firewall on the cloud side. Right. And so now, if I want to consume, let's say, a SaaS database, or maybe part of my organization has shifted into cloud, and I'm still on prem, and I want to consume functionality there, the way that most organizations do that is with pairwise firewall rules. A new app on prem wants to consume it?
That's a new firewall rule to allow the connectivity from its micro-segment to wherever the thing is deployed on cloud. Right? Historically, that's a big cause of slowdown in organizations. It's very common that with a lot of financial institutions that I work with, I'll ask this question because I know the answer already.
And I say, how long does it take you to change the firewall? Right? And not once has anybody said anything different than 6 weeks, right? How long does it take? It takes 6 weeks. It goes to the spreadsheet. You go, right? And it's a huge hamper on the ability to use and consume services in cloud to get agility.
Like, why are we doing the whole microservices thing? To go faster. To get more agility, to get features in the customer's hands faster. And if our security policy is standing in the way of that, we need to fix it. And what we propose in this SP, and we, you know, we have examples there, is a pretty simple model as one mitigation for this.
Put identity-aware proxies on both sides. Author a static set of firewall policy that says the two identity-aware proxies on either side can communicate, and then use an identity-based policy to govern which applications can go over that tunnel. If you squint really, really hard, it's actually kind of like a VPN.
So what does a VPN do? It links. We have 2 networks. We say there's these privileged gateways that are allowed to talk to each other and they control who can go over the gateway to, to communicate, over the bridge to communicate on the networks on either side. We're doing a very similar conceptual thing, but at the identity layer, right?
So it's not, you're on an IP address, so we trust you. It's, you present an authenticatable runtime identity. I can apply an authorization policy on it and you're allowed, therefore I let you over the bridge. So that's one example where we, we relax a network control. We still have a firewall rule, but we don't have pairwise rules for every application.
And in exchange, we've augmented it. We've added an identity-aware policy governing who can go over that identity-aware bridge. And as a result, we see organizations move a lot faster and are able to update those policies more readily. One thing I'll just touch on real quick. Why can they update the policy faster? Because humans can't read IP addresses. Like, I get a CIDR range. What does it mean? What's the app on the CIDR? Who knows? That's why I go look in the spreadsheet. Right? So instead, when we're doing things at the identity level, it's much more evident. I can see that this is the front end application running in a particular namespace, and I can read that as a human.
And so maybe I still need to go verify in my spreadsheet, because names drift over time and things, right? But I have a much higher assurance that I'm not going to screw it up, versus trying to map those CIDRs, right? And so that's one of the reasons, fundamentally, we can go faster. So that's kind of the second key idea: we're moving up to identity based. To take you back to API surface and API security.
So yeah, API service. So API in my mind is going to be the new thing. Cause what do we, it's all about. I, how did I define service? I didn't, I was very careful not to say API when I defined service but most of the time when we're, as we're moving up the stack, I think we are going to be in an API centric.
Security world, right? So in part of the reason for that is just how does, how does an external actor, a threat actor, a friend, like, how does somebody interact with the system, your A. P. I. So that is fundamentally. So if you're not reasoning about your protections at that layer. That the user is interacting with it, you're not going to have the right policies in place.
You're going to have gaps and you're going to miss it, right? Even just basic stuff. So let me give one example of an API, because when we talk about multi-tier policy in the SP, we only talk about network and identity. There are way more tiers of policy that we may want to apply.
Application-tier policy is one of those. And the example I actually give is something that an API gateway traditionally does, which is request validation, right? So there are huge classes of errors from just malformed requests, right? And so just simple, straightforward policy, like: I have a Spring Boot app.
It has a well-defined request payload, and I want my Spring Cloud Gateway to verify that the incoming HTTP request conforms to the schema. There's a lot of security that you can get out of something simple like that. There are large classes of attacks that you can avoid with simple validation like that, right?
And, you know, even just basic things like that, we need to start to think about and do across the board. Back to the original question, API is going to be the key when we start to talk about securing things, because that's the unit that identities are going to act on in the system.
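The request validation Zack describes can be illustrated with a small sketch. This is not Spring Cloud Gateway; it is a plain Python stand-in, and the `ORDER_SCHEMA` fields and the `validate_payload` helper are hypothetical names made up for illustration. The idea is only that a gateway-style check rejects payloads that do not match the declared shape before they reach the application.

```python
from typing import Any

# Hypothetical schema for an illustrative "create order" payload.
# The field names and types are assumptions made for this sketch.
ORDER_SCHEMA: dict[str, type] = {
    "customer_id": str,
    "item_sku": str,
    "quantity": int,
}


def validate_payload(payload: Any, schema: dict[str, type]) -> list[str]:
    """Return a list of validation errors; an empty list means the payload conforms."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    errors = []
    for field, expected in schema.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        elif not isinstance(payload[field], expected) or (
            expected is int and isinstance(payload[field], bool)
        ):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    # Reject unexpected fields so callers cannot pass extra data through.
    for field in payload:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors
```

A caller would reject the request whenever the returned list is non-empty. A real deployment would more likely validate against the schema generated from the application itself rather than a hand-written dictionary.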
Neal Dennis: No, that makes sense. Like I said, you are very much like me. I go, I have a big thing I need to go through to get there. And this is good because this is where the knowledge transfer happens. So thank you. Cause this is following along with how my brain path likes to consume things. So I appreciate this.
And once again, I love this. Dude, you write the stuff, you know the stuff. You're not just making some stuff for a company, you're not just going into a company to be like, hey, look at me, I'm Zero Trust. And, you know, we've only had a core group of people that are truly living, breathing the constructs here as their facts of life.
So I love this. So I want to get that out of the way again. I'll probably say it another four or five more times because I'm serious, with the nature of what you're talking about. So that being said, you know, we think about this, and I think it's really cool from your perspective on the API side of the house. And once again, why I bring it up: you know, the stuff at RSA this year and the stuff that you see, there's API security here, there's API security there.
So if we think about this wrapper, right, if we think about the growth with API as the goal: you know, API is as much a protocol for communication as it is a security layer, with respect to how it's approached, right? Do you see either side of that being re-approached, overhauled, revamped with respect to this growth phase?
And I mean, if so, what are your thoughts on that relative to where you're going with the constructs here? I'm harping on it a little bit just because it's been a very big focal point for a lot of people lately.
So if we think about API, right, we think about where we're going with the constructs and, like you mentioned, it's probably the next big wave of stuff, right? So let's say we go back and rewrite the standard again, or we approach the standard with API as a focal point. Where do you see things maybe highlighting changes or requirements for API to go one way or the other, or does it, right? I mean, like I said, one side it's a comms structure, one side it's a security structure to get things done. So do you see any of that needing to change, producing change, or growing in one way or the other?
Zack Butcher: Yeah, yeah. There's going to be a bunch of change that happens here, and a lot of it's actually going to be around tooling, in my opinion, right? So one of the key things is going to be tooling to help with the translation between those two views of an API definition, right? So exactly, I like that view, right? One is a security view: what is the thing? What is the functionality that it's exposing, and how do I access it? And what does it look like on the wire, right? Usually, only one of those is up to
Neal Dennis: Okay.
Zack Butcher: Right? Well, actually, most of the time, none of them are up to date unless we can generate it from our code.
Right? And then only one of the two is up to date. And usually that's the API definition, the communication structure, because I hopefully compiled it into my code and my code can describe it. Again, you know, I used Spring Boot as an example earlier, but Spring Boot produces an OpenAPI spec, for example, right?
So first off, I think we're already seeing tooling that operates on those structures, right? I know that there are plenty of API security and endpoint security vendors that will do things like accept an OpenAPI spec and use that for certain styles of analyses. For example, there's one that does exactly that style of request validation that I mentioned earlier, right? That's a very common one where there's a crossover, because a lot of that's about the structure, the communication, I need it to be well-formed, but there's a large security implication there, too. So I see a lot of tooling that we need either to exist or that is being created today to help bridge the gap between those two. Just to give you a quick idea as an example, you know, I work with service mesh stuff.
Service mesh does not natively understand an OpenAPI spec. It doesn't understand protobuf either, or whatever the interface definition language, the IDL, that you pick for your API is. It doesn't understand it, right? Some of the things that we do at a product level are exactly like: take an OpenAPI spec and program the service mesh according to it, right? And add some points where you can hang policy with simple annotations, for example, right? So that's one example of what I think will become more common, which is tapping into the communication definition, the OpenAPI spec that describes what it needs to look like on the wire, and then using tooling to derive policy from that, right? That can be simple things like the request validation. But I think we also see other, more complicated things embedded in. For example, there is an entire authorization section in the OpenAPI spec. I'm going to keep harping on the OpenAPI spec because it's kind of a lingua franca here, but this is true of many different IDLs that you could pick to define your API, right?
The OpenAPI spec has an entire authentication and authorization stanza that you can put in there, right? Now, is that the best place to put that data? Maybe, maybe not, right? You know, you could do things like say, hey, I want to do an authorization, and if you're going to call a GET method, you need a read scope, and mappings like that. I tend to like that kind of thing, because I think that it gives developers and the security team a common language, right? So the other big problem that I see all the time is what I call the blank page problem, right? Like, I'm an app dev. I was told to write a security policy. And so I go and open up Visual Studio Code and it's a blank page.
And now I need to go find the one online that I can copy and paste, right? And that's one of the biggest ways, by the way, that we get viral bad patterns perpetuated inside an organization, right? I need this config. So what do I do? I go to the Git repo where I know that one project three years ago had the config, and I copy the three-year-old config with the three-year-old defaults and I move it over.
Right? This is one of the biggest ways that we get bad patterns in the code base, perpetuated, that are almost impossible to stamp out in large orgs, right? And, you know, anything that we can do to avoid the blank page, I think, is huge. As a security practitioner, as a platform guy, anything we can do to avoid that blank page problem and start people out in the right spot, or closer to the right spot, is going to pay dividends. So I look at being able to leverage things like: here's the API, here's what my program exposes, and my program can tell you it. Take that and, you know, have Spring Boot spit out the OpenAPI, but add a few annotations there, and I don't have the same blank page problem. I don't have to say what my API does from scratch.
I say, this is what the API does. Let me go to the one place and change it. The mindset for a developer to do that is worlds different. And so that becomes a much more tractable problem, to actually get them to participate. So that's why I think we need a lot of tooling and things that play with this API world, right?
Because we're only ever going to be on the side there. It's primarily the developers building it and iterating on it. And so we need to figure out, from the security side, how do we tap into their workflows and processes to be able to get the stuff out that we need in the system and create a baseline of security, right?
That is as high as we can get. Service mesh is one way of doing that. And, you know, I mentioned some ways that we can do things like pull the API spec and enforce some behavior. There's any number of other systems where you can do the same thing.
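As a rough illustration of deriving policy from the API definition, the sketch below walks an OpenAPI-shaped dictionary and produces a required-scope table per route. The `derive_policy` function, the default "GET needs read" mapping, and the scope names are assumptions for this example, not anything defined by the SPs or by a particular product. It also flattens alternative security requirements into a single set, which is a simplification of how OpenAPI treats them.

```python
# HTTP verbs that OpenAPI allows as operation keys under a path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}

# Hypothetical default: read-style verbs need "read", everything else needs "write".
DEFAULT_SCOPE_BY_METHOD = {"get": "read", "head": "read", "options": "read"}


def derive_policy(spec: dict) -> dict[tuple[str, str], set[str]]:
    """Map each (METHOD, path) in an OpenAPI-shaped dict to its required scopes."""
    policy: dict[tuple[str, str], set[str]] = {}
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            scopes: set[str] = set()
            # Operation-level "security" is a list of {scheme_name: [scopes]} objects.
            # Simplification: alternatives are merged into one set here.
            for requirement in operation.get("security", []):
                for scheme_scopes in requirement.values():
                    scopes.update(scheme_scopes)
            if not scopes:
                scopes = {DEFAULT_SCOPE_BY_METHOD.get(method, "write")}
            policy[(method.upper(), path)] = scopes
    return policy


# Illustrative spec fragment; the path and scope names are made up.
EXAMPLE_SPEC = {
    "paths": {
        "/orders": {
            "parameters": [],
            "get": {},
            "post": {"security": [{"oauth2": ["orders:write"]}]},
        }
    }
}
```

The point of the sketch is the starting position: the developer annotates a spec the program already produces, and the policy table falls out of it instead of being written on a blank page.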
Neal Dennis: Cool. No, that's awesome. So I got, well, maybe one or two things. So I really kind of wanted to dive down the service mesh piece a little bit more. You kind of gave us a really good definition to kick off, but I'd really like to talk a little bit more about all that stuff because, from a structural perspective, I'm personally aware of it a little bit, but it is a relatively new construct for my brain.
Zack Butcher: Exactly. Yeah. Happy to talk about it. Definitely. And in fact, that's how my relationship with NIST started. So, you know, how did I happen to be the guy that gets to co-author some of these papers? We actually started with NIST studying access control. So I mentioned, I worked on some of the access control in Google Cloud.
Got six months where we shipped IAM to everybody. Learned a lot about identity and access management there. And when you start to look at the service mesh, just for everybody who's not familiar with the architecture, architecturally, it's pretty straightforward. What we do is we take a traditional web proxy, a reverse proxy. Envoy is the CNCF project that we use.
But think NGINX, we could have used it the same way, for example. We take that reverse proxy and we put it as a sidecar next to every instance of every application. And when we say sidecar, what we mean by that is, you know, just like a motorcycle sidecar, it's deployed in the same domain. So if we're in a VM, that proxy is on the same VM that hosts the application.
If we're in Kubernetes, it's in the same pod, and it's going to intercept all the network traffic into and out of the application, thereby being what we call a policy enforcement point, a PEP, right? So if you're familiar with stylized identity and access management systems, in fact, the term PEP goes all the way back to the '70s and the initial, like, you know, highly idealized access control architectures.
Right. They introduced the PEP and the PIP and the PAP and the RAP, all these different three-letter P things: the policy enforcement point, the resource access point. That's where we started. But the service mesh, we saw, because it's intercepting all the traffic into and out of the application, as a perfect universal policy enforcement point.
So we have those proxies beside every application intercepting traffic in and out. The second piece of the service mesh is the control plane that programs them. So this is where we push configuration, and the control plane is going to go and distribute that configuration down to all of those sidecars that are deployed across your infrastructure. So that's really what the service mesh is. So now you can imagine, when you're sitting there intercepting all the network traffic, you can do a whole lot of stuff. You can do encryption in transit. You can do per-request policy. You can do routing and traffic decisions. You can do observability as well. So you can get a really nice, comprehensive picture of what's talking. You can give identities to those things. You can start to apply authorization policy on those things, right? If we think back to identity-based segmentation, I said there are five controls: encryption, service authN, service authZ, end-user authN, end-user authZ. The service mesh is very well suited to do the first three of those. And it's suited to be a good integration point for the other two. So when I say it's a policy enforcement point, we can do things like authenticate the user. That's a policy. Authorize the user. That's a policy. Do a 1 percent traffic canary, or only send user agents with iPhone to that instance. That's a policy.
We can do all of those things with the service mesh. And again, it gives you centralized control. So I don't need to go change those settings per service, although I can, but I can change them globally in one place and have them affect the entire infrastructure. Right? So that's kind of the superpower. And that's what's different.
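To make the five identity-based segmentation checks concrete, here is a minimal sketch of a policy enforcement point evaluating them in order. It is a toy model, not how Envoy or any mesh implements this; the `Request` fields, the allow-list contents, and the `enforce` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    encrypted: bool                   # arrived over an encrypted connection
    service_identity: Optional[str]   # authenticated calling workload, if any
    user_identity: Optional[str]      # authenticated end user, if any
    target_service: str
    action: str


# Hypothetical policy tables for the sketch.
SERVICE_ALLOW = {("frontend", "orders")}       # (caller service, target service)
USER_ALLOW = {("alice", "orders", "read")}     # (user, target service, action)


def enforce(req: Request) -> str:
    """Apply the five checks in order and report the first one that fails."""
    if not req.encrypted:
        return "deny: not encrypted in transit"
    if req.service_identity is None:
        return "deny: service not authenticated"
    if (req.service_identity, req.target_service) not in SERVICE_ALLOW:
        return "deny: service not authorized"
    if req.user_identity is None:
        return "deny: end user not authenticated"
    if (req.user_identity, req.target_service, req.action) not in USER_ALLOW:
        return "deny: end user not authorized"
    return "allow"
```

In the mesh picture described above, the first three checks would live in the sidecar itself, while the last two would typically be delegated to an existing end-user identity system that the sidecar integrates with.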
We needed security and observability and traffic management before. The service mesh gives us that in a way that is universal, no matter what the runtime of the program is, and with centralized control. Now, I mentioned I did the access control research. So obviously, as a policy enforcement point, we could do access control there. But the thing that really got the folks at NIST excited about the service mesh as an architecture, again, goes back to some 1970s research. And if we talk about the kernel, so where does the operating system kernel come from? It comes from a set of ideas about the security kernel. That's where the kernel name comes from.
And the idea is, of course, what we do with the kernel today. If you're a Unix process and you're running on the Linux operating system, you have a certain set of security guarantees provided for you by the system. And specifically, it's the kernel that implements those guarantees. It gives you things like user spaces, and namespaces, and cgroups, and all these other abstractions that we use. The folks at NIST were excited because they saw the service mesh as potentially being the kernel, or the security kernel, for a distributed system of services. The idea that we can have this policy enforcement point that's not the kernel, but it's this proxy on the network intercepting all the traffic in our distributed system, and we can use one system to manage and perform enforcement on that. And critically, not just one system, but one audited code base, one concentrated piece of code to do this too, right? So it's not that we're having to reimplement these policies in every language and in every application.
We implement them once, we reuse them everywhere, and so we can do security vetting on them, right? Envoy itself, for example, is the data plane that we use in the Istio service mesh. Google, Lyft, I believe even Microsoft and Amazon now additionally cover it with bug bounties. But certainly I know for a fact that Google and Lyft both cover it with bug bounties, and we've had a lot of CVEs reported from the community through the bug bounty program.
Right. So this is an example of things that we can do because the security is concentrated on the Envoy code base. The community can fund making sure that that is secure, right? So that's how we can realize that idea, this kind of distributed kernel, or this kernel for a modern distributed system.
Neal Dennis: Cool. So I've got two, but, you know, two roads diverge, so we probably won't come back to the other one, but that's fine. So when we think about this, I'm going to jump down a letter, actually, to C, because you're hitting on some DevOps, DevSecOps-type constructs with this approach.
I'm very fortunate with the company I work for nine to five that I get to listen to a lot of our DevOps guys, just by proxy of updates and all that fun stuff. So when we think about this structure, we think about this deal, to me it sounds like the service mesh is where our DevOps, DevSecOps guys really, really hope to be, want to be, would love to be.
Is that kind of the idea? And I'm hoping, yeah, since you went A, B, C in your lettering, some of those standards there. But, yeah, I mean, is that kind of the vibe check here? You know, we want this service mesh for that construct that we're driving with the new DevSecOps mentality?
Zack Butcher: Yeah. And it doesn't have to be in that mentality, but the key thing and a lot of that DevSecOps mentality and, and the service mesh architecture, like it's important, you know, it's, it's, it's a truism, but we build software that matches our organization. Right? So the service mesh came from Google. So what is the organizational structure that it is replicating?
And the answer is, it's replicating, in Google, the SRE model. And in the larger community, that's turned into the DevOps movement, DevSecOps now, right? If we look at what that meant: inside of Google, the SRE organization has a kind of crazy mandate. And that mandate is...
You must grow compared to headcount growth of software engineers. So if software engineers were going to hire like this, the SRE organization needs to be flat. So the only way that you can do that is automation and force multipliers, right? Consistently, you know, being able to enact change on a large part.
So that's really where the idea of the service mesh comes from. And so the service mesh is really a key enabler. There's no free lunch. You can't get rid of complexity, right? But what we can do is move complexity around and, importantly, we can concentrate complexity on the people that are best suited to solve it. And that's really what the service mesh lets us do. So the idea is that we have a central infrastructure, and I can change one policy to enforce encryption in transit. I can change one policy to swap our PKI, for example, and move over to, you know, there's some more complexity operationally to make that happen, but, you know, I can do cross-cutting things. And critically, the service mesh lets us empower a small group to do that. So I can make the PKI team responsible for the PKI integration with the mesh. Therefore, if it breaks, the expert's already there. And everybody else benefits because they just already have a PKI, because the PKI integration was already done.
And so when I need a certificate at runtime, it's out of the root of trust that the organization already uses. That's one example. But the point is that, you know, the same thing goes for traffic routing, the same thing for operational metrics and for logging. Suppose I want to change the log format across every service.
If you're going to do that today in your organization, that would take months, right? You have to go to every team, get them to change it, get them to build and redeploy it all. In a service mesh, we can have the sidecar proxy that's doing logging on behalf of your application, because it's looking at the request at L7.
So it can do something like an access log, like an Apache access log. We can change the format of that logging globally with one config, right? And now the entire fleet is producing logs in a different format, right? So, you know, there's no free lunch. Somebody still needs to do integration, but rather than making every team in the organization learn what a certificate is, pull in the right library in their language to integrate with it, and then do the integration to have encryption in transit, we can instead have the service mesh do it one time. When we install the service mesh, the PKI team integrates it with the PKI so that it has a signing certificate, and now it automatically handles issuing and rotating certificates for you. And any application running in the mesh gets encryption in transit out of the PKI that the team integrated, with no effort involved.
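The "one config changes the whole fleet" idea can be sketched as a control plane that pushes a setting to every registered sidecar. This is a deliberately simplified model; the `ControlPlane` and `Sidecar` classes and the format strings are hypothetical and do not reflect any real mesh's API.

```python
class Sidecar:
    """Toy stand-in for a proxy that writes access logs on behalf of one service."""

    def __init__(self, service: str) -> None:
        self.service = service
        self.log_format = "{method} {path} {status}"  # assumed default format

    def apply_config(self, log_format: str) -> None:
        self.log_format = log_format

    def access_log(self, method: str, path: str, status: int) -> str:
        # Fields not referenced by the format string are simply ignored.
        return self.log_format.format(
            service=self.service, method=method, path=path, status=status
        )


class ControlPlane:
    """Toy control plane: holds the fleet and distributes configuration to it."""

    def __init__(self) -> None:
        self.sidecars: list[Sidecar] = []

    def register(self, sidecar: Sidecar) -> None:
        self.sidecars.append(sidecar)

    def set_log_format(self, log_format: str) -> None:
        # One change at the control plane reaches every sidecar in the fleet.
        for sidecar in self.sidecars:
            sidecar.apply_config(log_format)
```

No application team rebuilds or redeploys anything in this model; only the sidecars' configuration changes.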
So that's one example. But, you know, there are many in that vein, where the service mesh lets a small group tackle the pain on behalf of the whole organization. In that way, it's a force multiplier, right? So now we come back to the DevSecOps movement, and that's kind of where the service mesh came from.
You know, the app devs are still going to want to interact with the service mesh, right? Who knows what the timeout needs to be for each method of your API? What's the right retry policy by default for a client that wants to call your API? I don't know. You don't know, like, you're a security guy.
You don't know, right? But the app team knows, hopefully, or at least they can make a guess, right? And so let's let them do that. Let's let us do the security policy settings, and let's keep the two separate so that they don't have to worry about security and I don't have to worry about the traffic settings.
Right. And the service mesh really gives you the system to be able to do that.
Neal Dennis: Cool. I think we've definitely beat that one up a bit. That's cool. I like this. Like I said, it's a term I've seen, it's a term I've looked at a little bit, but once again, you're the first person that we've had that we can really kind of dive into the idea of what it really is with.
Zack Butcher: Yeah. And let me just touch on this real quick. So we talk about the service mesh a lot in the 204 series, and in 207A we talk about the service mesh. So I want to be very clear that, you know, all the policy that we talk about in the SPs can be implemented without a service mesh, right? Those five policies, the identity-based segmentation, you can implement those in a bunch of different ways.
Right. And the service mesh is not required there. What we do think is that the service mesh makes it a lot easier to implement those things compared to most other paradigms that you could use to implement it. So the lift to get there, we argue, is smallest with the service mesh compared to other approaches you could take, like, you know, doing it in process with libraries or in frameworks, or assembling a set of different, disparate technologies, like WireGuard for encryption.
And, you know, all of them have different complexity. The service mesh gives you a nice set of trade-offs.
Neal Dennis: That makes sense. Yeah, you hit the nail on the head. I was going to ask you for a few more examples on that. So, you know, when you think about those security models and that growth, what's one paramount thing to consider out of this current structure, the current standard and stuff?
And now, I know there was another topic, the other standard as well, that we wanted to go down, but I pigeonholed us into this because of the service mesh. I got really curious. So I'm not sorry, but let's peel all this back. So we talk about those five tenets, we talk about the, you know, phase one, two, three.
So I have all this. Let's say, one key tenet out of all this to get started. Obviously you want to go through the whole thing, but one key tenet here to get started: where should people kind of poke and prod first, so then they can dive down the right rabbit hole to make sure they do eventually get through this cycle?
Zack Butcher: Yeah. So when it comes to Zero Trust overall, I think the NIST Cybersecurity Framework lays out the right thing. And by the way, the Cybersecurity Framework is currently undergoing a revision. The improvements are great. Check out the images. First step is identify: figure out what you are even protecting.
What's there that exists in the infrastructure, and then go from there. And the Cybersecurity Framework lays out the full five-step kind of process there, right? We identify, we protect, we detect, we respond and we remediate, right? That's the cybersecurity activity that we need to be able to go through.
So start at the beginning. Start with identifying what's there. So when it comes to, you know, something like a service mesh, for example, that's a great tool. I mentioned it gives visibility. It gives you things like L7 traffic flows. That's a really effective way to start to inventory or identify what exists in your infrastructure.
And not just at the level of, like, there's a host with an IP address, but at the level of: there's an application that's speaking, and it calls these other applications on these methods with these, you know, HTTP verbs, for example, right? So that's the place to start. Identify what you have, know what you even need to put a boundary around. Then, you know, start to build out the protections, right, is the next step. Things like encryption in transit, the service mesh can do. Things like a strong, authenticatable runtime identity, the service mesh does that when it does encryption in transit. So great, now you have an identity.
Things are encrypted. Start to author authorization policies. Start to gradually reduce the area that an attacker can attack if they compromise the workload. Use authorization policies, right? And then, you know, overlay that with your existing end-user identity and authorization systems.
And again, as you're layering these things in, look in the organization where there are other technologies and other policies that hurt agility, and see if you can't relax those while you're adding these other controls in.
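The "identify first" step, using L7 traffic flows as an inventory, could look something like the sketch below. The flow records, service names, and helper functions are invented for illustration; a real mesh would supply this data through its own telemetry. The second function shows how an observed inventory might seed a first-draft authorization allow-list, which a person would still need to review.

```python
from collections import defaultdict

# Hypothetical L7 flow records: (caller service, callee service, HTTP verb, path).
FLOWS = [
    ("frontend", "orders", "GET", "/orders"),
    ("frontend", "orders", "POST", "/orders"),
    ("orders", "payments", "POST", "/charges"),
    ("frontend", "orders", "GET", "/orders"),  # repeated call, counted once
]


def build_inventory(flows):
    """Group observed calls by callee so each service lists who calls it and how."""
    inventory = defaultdict(set)
    for caller, callee, verb, path in flows:
        inventory[callee].add((caller, verb, path))
    return dict(inventory)


def candidate_allow_list(inventory):
    """Turn the observed inventory into a sorted, first-draft allow-list."""
    return sorted(
        (caller, callee, verb, path)
        for callee, calls in inventory.items()
        for caller, verb, path in calls
    )
```

Anything not on the reviewed list becomes a candidate for denial, which is one way to gradually shrink what a compromised workload can reach.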
Neal Dennis: Cool. So I know we're close to time, and I think, given the mentality we have here, I'm going to ask for kind of a little pitch on your company, just out of curiosity. We already kind of went there a little bit. You know, normally we don't outright avoid it, but I feel like, given who you are, you're more than worthy of being able to pitch a little bit about how this all folds up with where your company is.
And if Elliot doesn't think so, he'll just edit it out later.
Zack Butcher: Exactly.
Neal Dennis: know, give us, you know, a few,
Elliot Volkman: totally fair game. Lay it on
Neal Dennis: yeah, so
Zack Butcher: Yeah. Tetrate does a lot of what I just talked about, right? So fundamentally, the stuff that I'm writing about in the SPs and all that doesn't come from nowhere. It comes from practical experience that we're doing on the ground with folks like financial institutions, the DOD and others.
Right? And so fundamentally, what do we do? We're all about taking service mesh into enterprise. And the net effect of that is going to be, you know, helping accelerate your modernization effort, helping keep a secure posture across a bunch of heterogeneous infrastructure, right? So you can imagine all the controls that I talk about and all those things.
Those are things that we've done with customers in practice, right? So things like: how do we bridge cloud and on-prem infrastructure and mitigate those IP rule changes and things like that? You know, how do we actually navigate what our security posture is across these? How do we get consistent visibility there?
And then how do we know what policy we want to put in, and then enact policy on them? We do all of that using the service mesh, and that's really where we play. An open source service mesh is great for one cluster. I joked, I gave the first multi-cluster demo back in 2018,
and I joked then that if you only have one cluster, it's a toy, right? If you have two, you're starting to do serious business because you're available, right? But most organizations have many more. And so the service mesh in open source is really well suited for, like, a single Kubernetes cluster. What we do is we weave all of your infrastructure into a coherent whole, all of those clusters, all of the instances, and give you a central place to control it, manage it across clouds, do service discovery, all kinds of cool stuff that you can anticipate. You know, hey, we have a sidecar, we intercept all the traffic in and out.
You can do a lot of things around traffic routing, observability and security. We have a system that helps you manage those capabilities across your entire infra.
Neal Dennis: cool. I'm going to go ahead and punt the ball back over to Elliot and let him at least get one more question in from his brain pan there and then,
Zack Butcher: Yeah, please, please.
Elliot Volkman: Yeah, I mean, that's a really good spot to wrap it up, but I don't know. Maybe I'll edit this and we'll just kind of wiggle it around a little bit anyways. Obviously, you spent some time over at Google. BeyondCorp was a thing. They're probably, like, the first proper implementation. I'm curious, you know, was anything lingering around since then? You know, I can't remember when they phased that out, but I'm just curious what, from those efforts and initiatives, kind of stuck around.
Zack Butcher: Yeah. So a lot of it, actually, we see today. So the identity-based segmentation, those five checks: the first time I actually ever implemented those was in Google. We didn't call it that at the time, but that's what we did, right? And so I was actually there in Cloud.
So GCP was the first part where this hit. Google had had, and people may be familiar with it, LOAS, and it's now SPIFFE in open source. SPIFFE is what the service mesh uses for runtime identity. LOAS was a system for service authentication. So when we talk about encryption, service authN, service authZ, end-user authN, end-user authZ,
you know, Google had a strong and robust system for service authN and service authZ. They were called rate ACLs. So you would create an access control list and incorporate the rate level together, because the two, you typically want to go hand in hand in a distributed system.
So not just that you have access, but how much access you have. And so we had had that for a long time. And then we started to roll out the service mesh, and the service mesh actually originally came from the decomposition of the gateway. So think about what the gateway does, right? Almost universally: end-user authN, end-user authZ, do some rate limiting, and then load balance for me, please. I think most people know, you know, notoriously: shared-fate outages, noisy neighbors, all kinds of problems with shared API gateways. So we said, hey, let's decompose that into the sidecar architecture, right? In that transition, now, suddenly, internal communication has to have an end-user credential, and that end user needs to be authorized.
Right? And that's in addition to the service credential that we already have. And so I actually got to watch kind of firsthand as we rolled out the exactly the policy that we're talking about. Right? Where we went from having kind of a Simple and straightforward service authorization system and then overlaid that with end user authentication and authorization and like all of the challenge that went in through that.
So all of that stuff is still there, right? So that now beyond corp in that bit of it was kind of the edges. So think of everything that I've talked about today. 207 a all this stuff is kind of in the data center. Right. Or or in your data centers in your infrastructure. The hard part, I would argue the harder part of zero trust is like that edge out.
Right. So how do you do device ID? How do you do user ID in 2078? We sweep that under the rug by the way. So I say you need a service identity. You need an end user identity. You should have a device identity in there. Use that authenticated and authorize it. That's a really hard thing to skip over. And by the way, that's where you should fit in your risk based authentication and authorization systems and other things like that.
That is where a lot of BeyondCorp played, right? That was in the device ID, user ID, that side of it. Quite frankly, I don't know how much of that is still in play at Google or not. I know that that's an area that they actively iterated on. The general rule of thumb is that Google publishes a white paper when it's one generation behind what is current.
So, you know, I haven't been there in quite a few years, so I can't really speculate, but I will say that the basics, the identity-based segmentation policy, that's still happening today internally, 100 percent.
Elliot Volkman: I appreciate the context. It's just, I think it's helpful for people to know, because whenever you start digging into Zero Trust, that's one of those first topics that people tend to run into. It's like, oh, this is a huge implementation, but, you know, what the hell happened afterwards?
Obviously, you know, they face it now, but it's interesting to
Zack Butcher: Continues to evolve.
Elliot Volkman: Evolved.
Zack Butcher: Exactly. Yeah. And it continues to this day, right? That's one of the challenges and fun parts of being on a security team there. I worked on IAM and some of those things, but, you know, they have nation-state actors that they need to model against, right?
You know, and not all from outside the country. The Snowden leaks are one that I use regularly in slide decks. On the slide that I show when I introduce 207A and say the attacker can be inside the network, I don't actually use that Guam example. The example that I use is actually the picture from the Snowden leaks of the NSA drawing of Google's infrastructure. The first time I ever saw that picture was at my engineering orientation at Google, where they have an engineer who's very angry come out and explain why the network looks the way it looks. And the first picture is that, and then it goes from there, right? So a motivated attacker can be in the network.
Elliot Volkman: Very interesting. I mean, so we are at the top of the hour. Zack, thank you so much for running us through your contributions on the policy side and obviously giving some insight into what you are able to build on the heels of some of those core concepts. So thank you so much for sharing some of that expertise and insight with us.
Thank you for coming back safely so you can do most of the talking as per usual. I don't know how we would do this without you. So it is good to be back up and running. But yeah, so thank you, Zack.
Zack Butcher: Thank you for having me.
Elliot Volkman: wrap you back around.
Neal Dennis: Zack, thanks, man. Once again, good conversation. Appreciate the knowledge. Thank you.