Don’t Force Containers and Disrupt Workflows
https://thenewstack.io/dont-force-containers-and-disrupt-workflows/ (May 25, 2023)


How do you allow people to use their technologies in their workflows? The first thing you do is not force people to use containers, says Rob Barnes, a senior developer advocate at HashiCorp, in this episode of The New Stack Makers.

Barnes came by The New Stack booth at KubeCon Europe in Amsterdam to discuss how HashiCorp builds intent into Consul so users may use containers or virtual machines in their workflows.

Consul from HashiCorp is one of the early implementations of service mesh technology, writes Janakiram MSV in The New Stack. “It comes with a full-featured control plane with service discovery, configuration, and segmentation functionality. The best thing about Consul is the support for various environments including traditional applications, VMs, containers, and orchestration engines such as Nomad and Kubernetes.”

Consul is, at heart, a networking service that provides identity, for example, in Kubernetes. A service mesh knows about all services across the stack. In Kubernetes, Helm charts get configured to register the services to Consul automatically. That’s a form of intent. Trust is critical to that intent in Kubernetes.
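That Helm-based registration can be sketched with the consul-k8s chart’s documented connect-injection values; a minimal sketch, not a production configuration:

# values.yaml for HashiCorp's consul-k8s Helm chart (a sketch)
global:
  name: consul
connectInject:
  enabled: true    # inject Connect sidecars and register pods with Consul
  default: true    # opt pods in automatically unless annotated otherwise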

“We can then assign identity — so in a kind of unofficial way, Consul has almost become an identity provider for services,” Barnes said.

In Consul, identity enables more granular routing to services, Barnes said. Consul can dictate which services may talk to each other; that is how intent gets established. A rules-based system, for instance, may specify which services can communicate and which can’t.
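On Kubernetes, those rules are expressed as Consul “intentions.” A minimal sketch using the consul-k8s ServiceIntentions custom resource, with hypothetical service names:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: checkout-to-payments
spec:
  destination:
    name: payments            # hypothetical destination service
  sources:
    - name: checkout          # hypothetical client service
      action: allow           # traffic not explicitly allowed can be denied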

“I think that’s an opportunity that HashiCorp has taken advantage of,” Barnes said. “We can do a lot more here to make people’s lives easier and more secure.”

So what’s the evolution of service mesh?

“There’s a lot of misconceptions with service mesh,” Barnes said. “As I say, I think people feel that if you’re using a service mesh, that means you’re using containers, right? Whereas, like, I can speak for Consul specifically, that’s not the case. Right? I think the idea is that as more service meshes make themselves a bit more flexible and meet people where they are, the adoption of the service mesh, and all the good stuff that comes with it, is only going to grow.”

“So I think what’s next for service mesh isn’t necessarily the service mesh itself. I think it’s people understanding how it fits into the bigger picture. And I think it’s an educational piece and where there are gaps, maybe we as vendors need to make some advances.”

Linkerd Service Mesh Update Addresses More Demanding User Base
https://thenewstack.io/linkerd-service-mesh-update-addresses-more-demanding-user-base/ (April 11, 2023)


Five years ago, when the hype around the service mesh was at its greatest, Buoyant CEO William Morgan fielded a lot of questions about the company’s flagship Linkerd open source service mesh software. Many in the open source community were very curious about what it could do and what it could be used for.

These days, Morgan still gets questions, but now they are a lot more pointed, about how Linkerd would work in a specific situation. Users are less worried about how it works and more concerned with just getting the job done. So they are more direct about what they want, and what they want to pay for.

“In the very early days of the service mesh, a lot of open source enthusiasts who were excited about the technology wanted to get to the details, and wanted to do all the exciting stuff,” Morgan explained. “Now the audience coming in just wants it to work. They don’t want to get into the details, because they’ve got like a business to run.”

In anticipation of this year’s KubeCon + CloudNativeCon EU, Buoyant has released an update to Linkerd. Version 2.13 includes new features such as dynamic request routing, circuit breaking, automated health monitoring, vulnerability alerts, proxy upgrade assistance, and FIPS-140 “compatibility.”

And on April 18, the day before KubeCon EU 2023 kicks off in earnest in Amsterdam, the first-ever Linkerd Day co-located conference will be held.

What Is a Service Mesh?

Categorically, service mesh software is a tool for adding reliability, security, and observability features to Kubernetes environments. Kubernetes is a platform for building platforms, so it is not meant for managing the other parts of a distributed system, such as networking, Morgan explained.

In the networking realm, service mesh software handles the additional networking needs beyond the basic TCP connectivity Kubernetes offers: retries, mitigating failing requests, sending traffic to other clusters, encryption and access management. The idea with the service mesh is to add a “sidecar” proxy to each instance of the application, so developers don’t have to deal with all of these aspects, with which they may not be familiar.

There are multiple service mesh packages — Istio, Consul, Traefik Mesh and so on — but what defines Linkerd specifically is its ease of use, Morgan said.

“When people come to us because they recognize the value of a service mesh, they want to add it to their stack,” Morgan said. “But they want a simple version, they don’t want a complicated thing. They don’t want to have to have a team of four service mesh engineers on call.”

Buoyant likes to tout Linkerd as the Cloud Native Computing Foundation‘s “only graduated service mesh” (the CNCF also provides a home for Istio, though that service mesh is still at the incubating level). Graduated status simply means that Linkerd is not some “fly-by-night open source thing that’s just been around for six months. It’s a recognition of the maturity of the project.”

New Features of Linkerd 2.13

For Kubernetes users, the newly-added dynamic request routing provides fine-grained control over the routing of individual HTTP and gRPC requests.

To date, Linkerd has offered a fair amount of traffic shaping, such as the ability to send a certain percentage of traffic to a different node. Now the level of granularity is much finer: requests can be routed based on HTTP headers, gRPC methods, query parameters, a specific URL or almost any other aspect of the request.

One immediate use case that comes to mind is sticky sessions, where all a user’s transactions take place on a single node in order to get the full benefit of caching. User-based A/B testing, canary deploys, and dynamic staging environments are some of the other possible uses. And they can be set up either by the users themselves, or even by third-party software vendors who want to offer specialized services around testing, for instance.
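As a sketch of what that looks like, a Gateway API-style HTTPRoute of the kind Linkerd adopted can steer matching requests to a different backend. The apiVersion string varies by Linkerd release, and the names, header and ports here are illustrative:

apiVersion: policy.linkerd.io/v1beta3   # version string varies by release
kind: HTTPRoute
metadata:
  name: checkout-ab-test
  namespace: demo
spec:
  parentRefs:
    - name: checkout              # the Service whose traffic is being routed
      kind: Service
      group: core
      port: 8080
  rules:
    - matches:
        - headers:
            - name: x-beta-tester # illustrative header match
              value: "true"
      backendRefs:
        - name: checkout-v2
          port: 8080
    - backendRefs:                # everyone else stays on v1
        - name: checkout-v1
          port: 8080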

Linkerd’s dynamic request routing came about thanks to the Kubernetes Gateway API. Leveraging the Gateway API “reduces the amount of new configuration machinery introduced onto the cluster,” Buoyant states in its press materials. Although the Gateway API standard concerns network ingress rather than the service mesh’s “east-west” traffic, many of the same types can also be used to shape east-west traffic, relieving administrators of the burden of learning yet another configuration syntax, Morgan said admiringly of the standard.

(Morgan also pointed to a promising new initiative within the Kubernetes community, called GAMMA, which would further synthesize service mesh requirements into the Gateway API.)

Another new feature in Linkerd is circuit breaking, whereby Kubernetes users can mark services as delicate so that meshed clients automatically reduce traffic to those services should they start throwing a lot of errors.
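Per Buoyant’s 2.13 documentation, circuit breaking is switched on with annotations on the Service being protected. A minimal sketch, with annotation names as documented for that release and an illustrative threshold; verify against your version:

apiVersion: v1
kind: Service
metadata:
  name: payments                  # illustrative service
  annotations:
    balancer.linkerd.io/failure-accrual: consecutive
    balancer.linkerd.io/failure-accrual-consecutive-max-failures: "7"
spec:
  selector:
    app: payments
  ports:
    - port: 8080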

Security, Gratis

A version of the 2.13 release comes in “a FIPS-compatible form,” the company asserts.

Managed by the U.S. National Institute of Standards and Technology (NIST), the Federal Information Processing Standard for cryptographic modules (FIPS 140, currently at revision 3) is a set of standards for deploying encryption modules, with requirements around interfaces, operating environments, security management and lifecycle assurance. It is a government requirement for any software that touches encrypted traffic. Many other industries, such as finance, also follow the government’s lead in using FIPS-compliant products.

That said, Linkerd is not certified for use by the U.S. government. “Compatible” means Buoyant feels the software could pass muster with a NIST-accredited lab, though the company has no immediate plans to certify it.

And, finally, Buoyant itself is offering all Linkerd users basic health monitoring, vulnerability reporting and upgrade assistance through its Buoyant Cloud SaaS automation platform. This offer extends to all users, even of the open source version, not just paid subscribers.

“We realized a lot of Linkerd users out there are actually in a vulnerable position,” Morgan explained. “They aren’t subscribed to the security mailing lists. They’re not necessarily monitoring the health of their deployments. They’re avoiding upgrades because that sounds like a pain. So we’re trying to provide them with tools. Even if it’s pure, open source, they can at least keep their clusters secure, and healthy and up to date.”

Of course, those with the paid edition get a more in-depth set of features.

To upgrade to Linkerd 2.13 or install it fresh, start here, or search it out on the Azure Marketplace.

How to Create Zero Trust Architecture for Service Mesh
https://thenewstack.io/how-to-create-zero-trust-architecture-for-service-mesh/ (March 27, 2023)


It’s not surprising that zero trust security has captured the imagination of tech organizations. The premise is beguilingly simple: just because you come into a system doesn’t mean you should have access to everything. Or, indeed, anything.

Each request — whether from a person or a microservice — is examined in a broader security context, taking into account attributes such as: Who is this user? How do we validate that? What time of day is it? What permissions does this user have?

Simple as the principle is, progress in implementing zero trust is proving elusive. Gartner predicts that it will be 2026 before even one in 10 large organizations “will have a mature and measurable zero trust program in place.” Less than 1% have such an implementation in place today.

The fact is, implementing a zero trust architecture requires a way to understand what applications are doing, and to manage their security properties, as Christian Posta, Solo.io’s global field chief technology officer, told The New Stack. And that can’t simply be bypassed by developers in a hurry or undermined by a pre-existing or new vulnerability.

This is where a service mesh like Istio comes into play, at least for modern applications built on microservices. The mesh operates as a visible infrastructure layer that can be added to applications, providing observability, traffic management and security. In the case of Istio, this is achieved using the Envoy sidecar, which acts as a proxy for each service, without having to be embedded in core application code.

Indeed, Posta said, the number one use case for service meshes is security. Perhaps that’s not surprising, given that the service mesh approach enables secure communication between the services that make up an application, as well as authentication and authorization, and traffic management.

But he pointed out, a service mesh “doesn’t just automatically and magically solve zero trust.” Rather, he said, “What a service mesh does is implement a lot of the properties that you would expect in a zero trust stance.”

And simply taking an open source project like Istio and dropping it into a real-world enterprise environment is not straightforward, he said: “Enterprises are not clean cut with nice, square edges for everything to fit in just perfectly. They’re war zones.”

New Mesh, Rough Edges

So Solo.io, along with other Istio contributors, has done a lot of work on “softening up” some of those rough edges, Posta said. “The products that we build here at Solo are meant to further simplify the effort of deploying and installing Istio, and managing the service mesh in general.”

But even this doesn’t totally solve the complexity challenge when constructing a zero trust architecture — or of developers simply doing exactly what they want.

As Posta told The New Stack, developers may forget to deploy the sidecar within their applications. Or they may misconfigure it. Or the sidecar may not be as transparent as the developers intend.

Sometimes companies will pursue policies or procedures that are initially incompatible with Istio. For instance, he said, one company Solo.io worked with relied on the HTTP headers between applications being case-sensitive. But the HTTP spec defines header names as case-insensitive, so Istio — still within the spec — changed them to lowercase. The result? The applications broke.

“So here’s an example of, as a developer, I wanted to deploy a sidecar because I want to be in the mesh,” Posta said. “But now I run into these unintended consequences because of some weird crazy things the enterprise is doing.”

At the same time, he pointed out, the Envoy proxy, which is used as the data plane in Istio, is extremely flexible. It can open connections, establish mutual TLS, and collect telemetry. It also understands database protocols, such as MongoDB and MySQL, and can be extended and customized with WebAssembly.

“And then its core is basically doing things around HTTP and gRPC: request routing, load balancing, circuit breaking,” said Posta.

Looked at another way, this flexibility starts to look like complexity. But, if security is the key driver for implementing a service mesh, it probably makes sense to keep things as simple as possible, by removing the sidecar where appropriate.

This is the aim with Istio Ambient Mesh, introduced last year, which removes the sidecar, without trading off the security benefits of the service mesh. Instead, individual services are connected via Ztunnels — or zero trust tunnels — which handle layer four connectivity, while more detailed policy enforcement is handled at layer seven. Solo.io has announced support for Ambient in its own Gloo Mesh implementation.
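For the curious, trying the ambient data plane is a short exercise with stock Istio tooling. A sketch using the documented ambient profile and namespace label; the namespace name is illustrative:

istioctl install --set profile=ambient
kubectl label namespace demo istio.io/dataplane-mode=ambient   # route the namespace's pods through ztunnel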

Simply running a secure overlay, which Posta described as “the backbone of Ambient,” means engineers aren’t distracted by issues like case sensitivity or the nuances of how the proxy handles certain protocols. “We’re just solving the use case that you want, using just the amount of technology that you need,” he said. “And then the rest of that stuff is opt-in later if you need it.”

Casting the Mesh Wide

While the core Istio project is predominantly focused on Kubernetes, with support for expanding out to virtual machines (VMs), Posta said Solo.io’s technology supports complex environments spanning multiple clusters, on-premises and public cloud infrastructure, and demilitarized zones (DMZs).

“We can integrate VMs and physical machines,” he said. “There’s a lot of complexity that comes into it. We try to shield people away from that. We integrate with things like [Amazon Web Services] Lambdas directly, that open source Istio doesn’t do. And, by going down that path, that also alleviates people from having to use AWS API Gateway and [application load balancers].”

Many organizations, particularly in the financial world, still use code written decades ago which can present a security and integration challenge.

But the service mesh is focused on enforcing policies over the network, Posta pointed out. “So if there are [Simple Object Access Protocol] applications, they can communicate over the network,” he said, which means Solo.io can apply “some level of policy.”

Ambient can potentially support more workloads and run in more places, because it is not a full-blown proxy. Deploying a sidecar on a VM can be “pretty invasive”, he said.

“We can maybe more easily include things like workloads that run on HashiCorp Nomad, or AWS ECS Fargate,” Posta said. Logically, he added, those places can extend all the way to mainframes.

This really does open up how service meshes can be used to enable zero trust architectures, even in that “war zone” of enterprises with legacy applications and infrastructure running alongside modern, Kubernetes-based, cloud native applications.

None of which means the sidecar will be completely obsolete. “At Solo, we believe that Ambient will eventually become the default data plane in Istio,” said Posta. “And the sidecar will be used as an optimization when needed.”

When that is needed will depend on the overall architecture and the security context in which the zero trust architecture is being implemented.

This means, said Posta, “The behavior of the mesh is going to be dictated by multiple different groups in the organization.”

This includes the platform engineering team, which typically owns the life cycle of the mesh, but also the security team, obviously, as well as site reliability engineering teams, API gateway teams, and of course the application owners.

Does this make life complex? Probably. Life is always more complicated when people are involved. This is another reason to make implementing the service mesh on which zero trust is built as uncomplicated as possible.

Ambient Mesh: Sidestepping the Sidecar
https://thenewstack.io/ambient-mesh-sidestepping-the-sidecar/ (March 1, 2023)


Last fall, Solo.io and Google introduced Istio ambient mesh, a new mode for Istio that leverages a mesh data plane instead of deploying sidecar proxies, without sacrificing key strengths such as zero trust security.

IDC research vice president Brad Casemore told The New Stack that the offering reflects a broader trend in the market. Cilium Service Mesh, he noted, also offers the option of running without sidecars.

“It’s a trend, I think, that’s related to the cost of sidecars, both in the complexity of injecting them into the mesh and managing them, and also the cost of overhead, because they are extra containers running within the service mesh,” he said.

Avoiding sidecars, Casemore said, also reduces the barrier to entry for organizations that lack key expertise.

“As we see more containers and cloud native application architectures proliferating in enterprises, most folks don’t have the very high-skilled, high-price-tag Kubernetes folks in-house, and they’re looking for simple ways to adopt this technology – and when they look at things like sidecars, it often is very intimidating to them,” he said.

Key Benefits

Torsten Volk, managing research director at Enterprise Management Associates, said that ambient mesh can free developers to focus on what they do best.

“The concept of the ambient mesh enables DevOps engineers to provide all the individual components required to establish connectivity, security, compliance and reliability directly through a unified API that is attached to each node of each Kubernetes cluster,” he said.

“The core layer of ambient mesh basically delivers zero trust on demand in a very simple manner and in the form of an open source project,” Volk said. “I would expect Kubernetes platform vendors and product vendors from other CNCF [Cloud Native Computing Foundation] categories to incorporate ambient mesh into their solutions, as it could be an easy way toward offering zero trust security for their products.”

According to Solo.io vice president and global field CTO Christian Posta, the two key benefits of ambient mesh are ease of application onboarding and of managing upgrades.

“Service mesh is infrastructure-level technology, but since a component of it is tied very, very intimately with the application, it creates that friction that if I want to upgrade a piece of the infrastructure, I also have to coordinate with the application, restart all the applications, change out their proxies,” he said.

Without a sidecar, that friction goes away – and it also makes it far less disruptive to respond to vulnerabilities. “If we have a CVE and we need to patch something, we’ve got to do it quickly, and we’ve got to do it without disruption of the running apps. It’s much easier to do it if the components of the service mesh are running outside of that,” Posta said.

For Solo.io, Posta said, those were the core drivers for the development of the new offering.

“What we are really interested in is, how do we simplify onboarding, which could potentially open up new cases, and keep the mesh running in Day 2 with minimal interference to the running applications?”

Upgrades Without Disruption

Imagine, Posta said, that a vulnerability is found in Envoy.

“Istio uses Envoy Proxy as a sidecar, so if you start to see these CVEs in Envoy, then we’ve got to go upgrade Envoy – and if Envoy is deployed with the applications, you think, ‘All right, I’ve got to go upgrade the applications because I have to change out the proxy,'” he said.

Doing so requires careful planning.

“You have to say, ‘Well, let’s restart this application first, and then we have to wait for that to come up, and then we’re going to restart this application,’ and so on,” Posta said. “There’s coordination that has to happen so you don’t get any unexpected outages, because the proxy’s tied to the app.”

With ambient mesh, Posta said, the process is far easier.

“Since nothing is in the app, whatever steps you have to take to upgrade the service mesh, you don’t have to restart any of the applications,” Posta said. “You don’t have to think about, ‘Well, I’ve got to restart A first, because that would otherwise impact B or C.’ It doesn’t matter to the applications. That orchestration and planning and rolling upgrades, and all this stuff that has to happen that could potentially cause an outage, doesn’t happen anymore.”

You can restart and upgrade the ambient mesh waypoint proxies, Posta said, without the applications even being aware of the change.

“Obviously, we’re not going to take every single waypoint proxy down at once, but we’ll do a rolling upgrade of the waypoint proxy – and again, we’re not coordinating anything with the app developers or with the application,” he said.
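Since waypoint proxies run as ordinary Kubernetes Deployments separate from application pods, the upgrade Posta describes can be as simple as a standard rolling restart. A sketch, with a hypothetical waypoint deployment name:

kubectl rollout restart deployment/checkout-waypoint -n demo   # application pods are untouched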

A Place for Sidecars

Still, sidecars will likely have their place, at least for the foreseeable future. For some organizations, Volk said, compliance concerns can encourage their continuing use.

“The fact that sidecars attach the service networking layer to the application instead of to the underlying infrastructure cluster can enable a higher degree of workload separation and therefore be relevant from a compliance perspective,” he said.

More generally, Posta said a sidecar can be useful when you need to focus on settings and capabilities that only apply on the client side.

“The waypoint proxy that we use in ambient, which is assigned per workload type, represents the target workload … but now you don’t have the capabilities representing the client directly,” he said.

As an example, Posta said, Istio’s retry policy allows you to configure the client to retry a request if that client calls a service and the request fails.

“That decision is made on the client side, because the proxy is deployed with the client,” he said.

In ambient mode, on the other hand, the request goes from the client to the waypoint proxy, which implements retries on the client’s behalf.

“So we’re pushing all of the policies – whether, in the sidecar mode, it was on the client side or the service side – we’re pushing all of that to the server side,” Posta said. “But if there are client-side-specific things that we want to tune, we might need to allocate dedicated resources specifically for certain clients.”
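For illustration, the policy itself is declared the same way in either mode; what changes is which proxy enforces it. A standard Istio VirtualService retry sketch, with an illustrative host and values:

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings-retries
spec:
  hosts:
    - ratings                    # illustrative service
  http:
    - route:
        - destination:
            host: ratings
      retries:
        attempts: 3              # sidecar mode: enforced next to the client
        perTryTimeout: 2s        # ambient mode: enforced by the waypoint
        retryOn: 5xx,connect-failure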

Casemore said the wide range of established service mesh deployments, particularly those that require FIPS (Federal Information Processing Standards) compliance, will ensure that sidecars don’t disappear any time soon. “Those environments aren’t going away, but more and more, I think you will see the greater degree of growth in those non-sidecar environments,” he said.

Service Mesh Demand for Kubernetes Shifts to Security
https://thenewstack.io/service-mesh-demand-for-kubernetes-shifts-to-security/ (Oct. 27, 2022)


DETROIT — Service mesh has long been considered an essential staple in creating, deploying, and managing Kubernetes environments. However, as the community becomes more aware of the threats and challenges associated with managing highly distributed containerized environments, security has emerged as the main benefit of what service mesh offers DevOps teams.

According to a survey Buoyant conducted of KubeCon + CloudNativeCon Europe attendees, security remains the key driver behind service mesh adoption.

The idea, Linkerd creator William Morgan, CEO of Buoyant, told The New Stack, is that Buoyant Cloud will now be able to automate Linkerd upgrades, installations, rollbacks, data-plane version syncs and more. Morgan’s creative philosophy for Linkerd also carries over from the early days of Twitter, when his fellow Twitter engineers sought a way to simplify scaling the platform to accommodate hundreds of millions of users, which led to Linkerd’s creation. Morgan spoke with The New Stack at the KubeCon + CloudNativeCon North America event here.

“The number-one driver of Linkerd these days is the security feature. That was surprising for us because when we came into service mesh as platform engineers, we thought Linkerd in the early days was about observability and traffic control, which are still useful, of course,” Morgan said. Users adopting the mesh for security alone, he added, sometimes downplay it: “Sometimes they’re almost apologetic, but I tell them they don’t have to be apologetic because it makes sense.”

That security feature is Mutual Transport Layer Security (mTLS), which is TLS with the added guarantee that the client, too, is authenticated. TLS is a connection-level protocol designed to provide security for a TCP connection, Morgan explained. Since TLS works at the connection level, it can be combined with any application-level TCP protocol without that protocol needing to do anything different, Morgan wrote in a blog post. For example, HTTPS is HTTP combined with TLS (the “S” in HTTPS refers to SSL, the predecessor of TLS), and nothing about HTTP needs to change to accommodate TLS, Morgan wrote.

Morgan listed three guarantees that TLS provides for a connection (a quick way to observe them in Linkerd follows the list):

  • Authenticity: the parties on either side can prove that they are who they say they are.
  • Confidentiality: no one else can see what data is being exchanged.
  • Integrity: the data received is the same data that was sent.
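With Linkerd, one way to observe those guarantees in a live cluster, assuming the viz extension is installed, is to list the meshed connections and the workload identities on each side:

linkerd viz edges deploy -n default   # shows which edges are mTLS'd and under which identities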

“MTLS is a very well-understood protocol. It’s got its warts and it has its detractors, and there are no surprises there,” Morgan told The New Stack. “But that’s the standard that we have available and so let’s make that work for us.”

In this era of DevOps teams hoping to implement zero trust security in their operations, the sidecar component of the service mesh will remain critical, Morgan said. The sidecar is seen as a key component of the service mesh: it connects — or more exactly — interconnects microservices across distributed containers and virtual machines, and it serves a key function in connecting microservices in Kubernetes environments, acting as a gateway in many ways. However, Solo.io and Google recently introduced Istio Ambient Mesh, which company representatives claim is the industry’s first service mesh to deliver both sidecar and sidecar-less architectures while maintaining the key security features the service mesh offers.

However, without mentioning Istio Ambient Mesh, Morgan noted that in a “zero trust world,” the service mesh sidecar becomes “really more interesting,” amid the “sidecar-less buzz.”

“The whole point of zero trust is, every part of your infrastructure is doing its own validation, authentication and authorization,” Morgan said. “But without the sidecar so that the proxy is the enforcement point, security boundaries get fuzzier as you move out of the sidecar model.”

AmeriSave Moved Its Microservices to the Cloud with Traefik’s Dynamic Reverse Proxy
https://thenewstack.io/amerisave-moved-its-microservices-to-the-cloud-with-traefiks-dynamic-reverse-proxy/ (Sept. 8, 2022)


When AmeriSave Mortgage Corporation decided to make the shift to microservices, the financial services firm was taking the first step in modernizing a legacy technology stack that had been built over the previous decade. The entire project — migrating from on-prem to cloud native — would take longer.

Back in 2002, when company founder and CEO Patrick Markert started AmeriSave, only general guidelines for determining rates were available online. “At that time, finance was very old-school, with lots of paper and face-to-face visits,” said Shakeel Osmani, the company’s principal lead software engineer.

But Markert had a technology background, and AmeriSave became a pioneer in making customized rates available online. “That DNA of technology being the driver of our business has remained with us,” said Osmani.

Since then, AmeriSave has automated the creation and processing of loan applications, giving it lower overall operating costs. With six major loan centers in 49 states and over 5,000 employees, the company’s continued rapid growth demanded an efficient, flexible technology stack.

Steps to the Cloud

With many containerized environments on-prem, company management initially didn’t want to migrate to a cloud native architecture. “The financial industry was one of the verticals hesitant to adopt the cloud because the term ‘public’ associated with it prompted security concerns,” said Maciej Miechowicz, AmeriSave’s senior vice president of enterprise architecture.

Most of the engineers on his team came from companies that had already adopted microservices, so that’s where they started. First, they ported legacy applications into microservices deployed on-prem in Docker Swarm environments, while continuing to use the legacy reverse proxy solution NGINX for routing.

“We then started seeing some of the limitations of the more distributed Docker platform, mostly the way that networking operated, and also some of the bottlenecks in that environment due to increased internal network traffic,” said Miechowicz.

The team wanted to move to an enterprise-grade cloud environment for more flexibility and reliability, so the next step was migrating microservices to Microsoft’s Azure cloud platform. Azure Red Hat OpenShift, already available in the Azure environment, offered high performance and predictable cost.

The many interdependencies among AmeriSave’s hundreds of microservices required the ability to switch traffic easily and quickly between Docker Swarm and OpenShift environments, so the team wanted to use the same URL for both on-prem and in the cloud. Without that ability, extensive downtime would be required to update configurations of each microservice when its dependency microservice was being migrated. With over 100 services, that migration task would cause severe business interruptions.

First, the team tried out Azure Traffic Manager, an Azure-native, DNS-based traffic load balancer. But because it’s not automated, managing all those configurations through Azure natively would require a huge overhead of 300 to 500 lines of code for each service, said Miechowicz.

One of the lead engineers had used Traefik, a dynamic reverse proxy, at his prior company and liked it, so the team began discussions with Traefik Labs about its enterprise-grade Traefik Enterprise for cloud native networking.

Cloud and Microservices Adoption Simplified

Traefik was created to deliver a reverse proxy for microservices that can automatically reconfigure itself on the fly, without the need to go offline.

The open source Traefik Proxy handles all of the microservices applications networking in a company’s infrastructure, said Traefik Labs founder and CEO Emile Vauge. This includes all incoming traffic management: routing, load balancing, and security.

Traefik Enterprise is built on top of that. “Its additional features include high availability and scalability, and advanced security, as well as advanced options for routing traffic to applications,” he said. “It also integrates API gateway features, and connects to legacy environments.”

Vauge began work on Traefik as an open source side project while he was developing a Mesosphere-based microservices platform. “I wanted to automate 2,000 microservices on it,” he said. “But there wasn’t much in microservices available at that time, especially for edge routing.”

He founded Traefik Labs in 2016 and the software is now one of the top 10 downloaded packages on GitHub: it’s been downloaded more than 3 billion times.

“The whole cloud native movement is driven by open source, and we think everything should be open source-based,” he said. “We build everything with simplicity in mind: we want to simplify cloud and microservices adoption for all enterprises. We want to automate all the complexity of the networking stack.”

Multilayered Routing Eliminates Downtime

Working together, Traefik’s team and Miechowicz’s team brainstormed the idea of dynamic path-based routing of the same URL, between on-prem Docker Swarm and cloud-based OpenShift. This means a service doesn’t need to be updated while its dependency microservice is being migrated.

Any migration-related problem can be quickly fixed in Traefik Enterprise by redirecting routing from OpenShift back to on-prem Docker Swarm, correcting the issue, and redirecting back to OpenShift. Also, there’s no need to update configurations of any other services.

This is made possible by the way that Traefik Enterprise’s multilayered routing works. “Layer 1 of Traefik Enterprise dynamically collects path-based and host-based routing configured in Layer 2,” said Miechowicz. “In our case, we had two Layer 2 sources: on-prem Docker Swarm and cloud-based OpenShift. Layer 1 then directs the traffic to the source that matches the host/path criteria and has a higher priority defined. Rollback from OpenShift to Docker Swarm simply consists of lowering the priority on the OpenShift route. We did a proof-of-concept and it worked perfectly and fast.”
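The same host/path-plus-priority idea can be sketched in open source Traefik’s dynamic configuration (Traefik Enterprise layers more on top; the hostnames and service names here are hypothetical):

# Traefik v2 dynamic configuration, file provider (a sketch)
http:
  routers:
    loans-openshift:
      rule: "Host(`apps.example.com`) && PathPrefix(`/loans`)"
      priority: 20             # higher priority wins: traffic flows to OpenShift
      service: loans-openshift
    loans-swarm:
      rule: "Host(`apps.example.com`) && PathPrefix(`/loans`)"
      priority: 10             # rollback: drop the OpenShift route below this
      service: loans-swarm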

This contrasts with how NGINX works. “You may configure it to route to a hundred services, but if one service does not come up, NGINX will fail to start and cause routing outage of all the services,” said Osmani. But Traefik Enterprise will detect a service that’s failing and stop routing to it, while other services continue to work normally. Then, once the affected service comes back up, Traefik Enterprise automatically establishes routing again.

NGINX also doesn’t have Traefik’s other capabilities, like routing on the same URL, and it’s only suited for a smaller number of services, Osmani said. Both Azure Traffic Manager and Traefik must be maintained and managed, but that’s a lot easier to do with Traefik.

No More Service Interruptions

Osmani said adopting Traefik Enterprise was one of the best decisions the team has made in the past year because it’s removed many pain points.

“When we were on-prem, we were responsible for managing everything — we’ve often gotten up at midnight to fix something that someone broke,” he said. “But with Traefik you can only take down the service you’re affecting at that moment.”

From the business standpoint, the main thing that’s better is the migration, said Osmani. “Because we are a living, breathing system, customers are directly affected. In the online mortgage lending business, if a service is down people will just move on to the next mortgage lender’s site. Now we don’t experience service interruptions. There’s no other way we could have easily accomplished this.”

“For developers in our organization, the result works like magic,” said Miechowicz. “We just add a few labels and Traefik Enterprise routes to our services. As our developers move services to the cloud, none of them have seen a solution as streamlined and automated like this before.”
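Those “few labels” in a Docker Swarm stack file might look like this sketch, following open source Traefik v2 label conventions; the image, router name and path are hypothetical:

services:
  rate-quote:
    image: example/rate-quote:latest              # hypothetical service
    deploy:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.rate-quote.rule=PathPrefix(`/api/rates`)"
        - "traefik.http.services.rate-quote.loadbalancer.server.port=8080"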

Can You Now Safely Remove the Service Mesh Sidecar?
https://thenewstack.io/can-you-now-safely-remove-the-service-mesh-sidecar/ (Sept. 8, 2022)


Service mesh has certainly emerged as a cornerstone layer for managing containers, virtual machines and Kubernetes environments. Without it, observability and monitoring, logging, routing and, of course, security would be even more difficult than it is now. The sidecar is a key component of the service mesh. It connects — or more exactly — interconnects microservices to different distributed containers and virtual machines. It also serves a key function with service mesh for connecting microservices in Kubernetes environments, serving as a gateway in many ways.

But what if we were able to eliminate the sidecar in the service mesh? And more importantly, why would you want to?

Solo.io and Google have introduced Istio Ambient Mesh, which company representatives claim is the industry’s first service mesh to deliver both sidecar and sidecar-less architectures.

For Istio, the sidecar component of the service mesh has worked very well for a number of years, Idit Levine, founder and CEO of Solo.io — the leading provider of tools for Istio — told The New Stack. However, Solo.io has been at work finding ways to improve the structure of both the service mesh and the sidecar. Google, as it turned out, has been doing the same thing. “We both basically have been trying to figure out how to make Istio more easily consumed and maybe even being a little bit more secure,” while offering the same functionality but while solving some of the difficulties users were having, Levine said.

During the past six months or so, Solo.io and Google have been working together to build “the next version of Istio that we believe will be the future,” Levine said.

Eliminating the sidecar has long represented one way to improve connectivity, lower latency and save costs, in addition to reducing complexity for operations teams using service mesh for Kubernetes. However, “there are a limited number of ways you can do it and still be secure,” Louis Ryan, principal engineer, Google Cloud, told The New Stack. “What we’ve done is find a way to do that while still maintaining all the best security properties of a service in a really important and meaningful way. We don’t think that there’s anything inherently wrong with sidecars, but they do represent operational challenges.”

However, this is not the first attempt to offer a sidecar-less service mesh, Torsten Volk, an analyst for Enterprise Management Associates (EMA), told The New Stack in an email response. A new architecture where sidecars are optional, depending on use case specific requirements, also just became available with Cilium, which is used for container networking and security, Volk said. Cilium uses a Kubernetes Custom Resource to directly program proxies through the Linux kernel. Linkerd “went down a different route” by using the Kubernetes control plane for service discovery only but relying on independent sidecar-based proxies to function, Volk said.

Istio Ambient Mesh is fully compatible with sidecar-based Istio deployments, and both sidecar and sidecar-less deployments are managed by the Istio control plane. Solo.io and Google say that with the Ambient Mesh enhancements, Istio becomes the “first service mesh” to deliver both modes with a consistent control plane. Also, with Istio Ambient Mesh, there is no loss of platform or policy management capabilities on the overall service mesh, no loss of application-specific security and application offload capabilities, and no need for application or infrastructure teams to immediately learn new programming languages, Solo.io and Google say.

Solo.io and Google were able to accomplish this because “a sidecar creates a security boundary around a service that ensures policies are enforced on traffic into or out of the service,” Google’s Ryan wrote in an email response. The challenge is to maintain that same boundary while putting the complexity of those policies into the network, in order to make the sidecar remote. By introducing “strong zero trust networking at L4, we can maintain this boundary around a workload by enforcing that all traffic goes through a waypoint proxy (the remote sidecar) before hitting the service,” Ryan said.

Volk agreed the sidecar-less version of Istio can maintain security — but he said caveats remain, such as potential “noisy neighbor” problems. These can occur when CPU-intensive sets of application workloads meet on the same node or cluster, requiring the Kubernetes scheduler to make scaling decisions “without knowing which application policy to apply,” Volk said. Inexperienced users could introduce configuration errors that lead to compliance issues related to a lack of traffic separation. “Also, in case of failure, multiple applications could be impacted,” Volk said. “Sidecars would also still be needed for applications relying on the runtime injection of custom code, for example, to pull off event-triggered configuration changes or enforce context-specific security requirements.”

Solo.io and Google say the key enhancements of Istio Ambient Mesh include:

  •  The enablement of a sidecar-less architecture that moves the proxy functionality from the pod level to the node level in order to improve overall application performance, with 10 to 20 times less compute and memory overhead, the companies say.
  •  Improved transparency for applications, simplifying operations and making it easier to upgrade systems and deploy new applications into the mesh.
  •  The availability of a new optional security element, the PEP (“policy enforcement point”), which the companies say delivers Layer 7 security inspection.

Sidecar Allowed

As mentioned above, users also have the option to use the Istio service mesh in sidecar mode since there is a common control plane that can be used with either sidecar or Ambient mode. “This brings consistency to the overall environment,” Brian Gracely, head of product marketing strategy for Solo.io, said in an email response. “We allow sidecar or Ambient mode to be deployed on a per-cluster or per-namespace basis, so users can best match the architecture with their application needs.”

A regulated company might prefer sidecar mode for a set of applications that absolutely cannot have any shared resources (such as proxies) anywhere in the data path for security and/or compliance reasons, Gracely explained. “But they might also have a set of applications (e.g. marketing applications) that can use Ambient mode without any issues,” Gracely said. “Other service meshes offer either sidecars or sidecar-less, and they often make you bring-your-own control plane depending on which data plane you choose (if they support more than one mode).”

eBPF or Not, Sidecars are the Future of the Service Mesh
https://thenewstack.io/ebpf-or-not-sidecars-are-the-future-of-the-service-mesh/ (Aug. 12, 2022)


William Morgan
William is the co-founder and CEO of Buoyant, the creator of the open source service mesh project Linkerd. Prior to Buoyant, he was an infrastructure engineer at Twitter, where he helped move Twitter from a failing monolithic Ruby on Rails app to a highly distributed, fault-tolerant microservice architecture. He was a software engineer at Powerset, Microsoft and Adap.tv, a research scientist at MITRE, and holds an MS in computer science from Stanford University.

eBPF is a hot topic in the Kubernetes world, and the idea of using it to build a “sidecar-free service mesh” has generated recent buzz. Proponents of this idea claim that eBPF lets them reduce service mesh complexity by removing sidecars. What’s left unsaid is that this model simply replaces sidecar proxies with multitenant per-host proxies — a significant step backward for both security and operability that increases, not decreases, complexity.

The sidecar model represents a tremendous advancement for the industry. Sidecars allow the dynamic injection of functionality into the application at runtime, while — critically — retaining all the isolation guarantees achieved by containers. Moving from sidecars back to multitenant, shared proxies loses this critical isolation and results in significant regressions in security and operability.

In fact, the service mesh market has seen this firsthand: the first service mesh, Linkerd 1.0, offered a “sidecar-free” service mesh circa 2017 using the same per-host proxy model, and the resulting challenges in operations, management and security led directly to Linkerd 2.0 being based on sidecars.

eBPF and sidecars are not an either-or choice, and the assertion that eBPF needs to replace sidecars is a marketing construct, not an actual requirement. eBPF has a future in the service mesh, but it will be as eBPF and sidecars, not eBPF or sidecars.

eBPF in a Nutshell

To understand why, we first need to understand eBPF. eBPF is a powerful Linux kernel feature that allows applications to dynamically load and execute code directly within the kernel. This can provide a substantial performance boost: rather than continually moving data between kernel and application space for processing, we can do the processing within the kernel itself. This boost in performance means that eBPF opens up an entire class of applications that were previously infeasible, especially in areas like network observability.

But eBPF is not a magic bullet. eBPF programs are very limited, and for good reason: running code in the kernel is dangerous. To prevent bad actors, the kernel must impose significant constraints on eBPF code, not the least of which is the “verifier.” Before an eBPF program is allowed to execute, the verifier performs a series of rigorous static analysis checks on the program.

Automatic verification of arbitrary code is hard, and the consequences of errors are asymmetric: rejecting a perfectly safe program may be an annoyance to developers, but allowing an unsafe program to run would be a major kernel security vulnerability. Because of this, eBPF programs are highly constrained. They can’t block, or have unbounded loops, or even exceed a predefined size. The verifier must evaluate all possible execution paths, which means the overall complexity of an eBPF program is limited.

Thus, eBPF is suitable for only certain types of work. For example, functions that require limited state, e.g., “count the number of network packets that match an IP address and port,” are relatively straightforward to implement in eBPF. Programs that require accumulating state in non-trivial ways, e.g., “parse this HTTP/2 stream and do a regular expression match against a user-supplied configuration”, or even “negotiate this TLS handshake,” are either outright impossible to implement or require Rube Goldberg levels of contortions to make use of eBPF.
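To give a flavor of that limited-state style, here is a bpftrace one-liner (bpftrace compiles its scripts to eBPF); a small bounded counter like this is exactly the kind of aggregation the verifier accepts without complaint:

# count connect() syscalls per process name; simple, bounded state
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_connect { @conns[comm] = count(); }'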

eBPF and the Service Mesh

Let’s turn now to service meshes. Can we replace our sidecars with eBPF?

As we might expect, given the limitations of eBPF, the answer is no — what the service mesh does is well beyond what pure eBPF is capable of. Service meshes handle all the complexities of modern cloud native networking. Linkerd, for example, initiates and terminates mutual TLS; retries requests in the event of transient failures; transparently upgrades connections from HTTP/1.x to HTTP/2; enforces authorization policies based on cryptographic workload identity; and much more.


Like most service meshes, Linkerd does this by inserting a proxy into each application pod — the proverbial sidecar. In Linkerd’s case, this is the ultralight Linkerd2-proxy “micro proxy,” written in Rust and designed to consume the least amount of system resources possible. This proxy intercepts and augments all TCP communication to and from the pod and is ultimately responsible for implementing the service mesh’s full feature set.

Some of the functionality in this proxy can be accomplished with eBPF. For example, occasionally, the sidecar’s job is simply to proxy a TCP connection to a destination, without L7 analysis or logic. This could be offloaded to the kernel using eBPF. But the majority of what the sidecar does requires significant state and is impossible or at best infeasible to implement in eBPF.

Thus, even with eBPF, the service mesh still needs user-space proxies.

The Case for the Sidecar

If we’re designing a service mesh, where we place the proxies is up to us. From an architectural level, we could place them at the sidecar level, at the host level, at the cluster level, or even elsewhere. But from the operational and security perspective, there’s really only one answer: compared to any of the alternatives, sidecars provide substantial and concrete benefits to security, maintainability and operability.

A sidecar proxy handles all the traffic to a single application instance. In effect, it acts as part of the application. This results in some significant advantages:

  • Sidecar proxy resource consumption scales with the traffic load to the application, so Kubernetes resource limits and requests are directly applicable (see the sketch after this list).
  • The “blast radius” of sidecar failure is limited to the pod, so Kubernetes’s pod lifecycle controls are directly applicable.
  • Upgrading sidecar proxies is handled the same way as upgrading application code, e.g., via rolling deployments.
  • The security boundary of a sidecar proxy is clearly delineated and tightly scoped: the sidecar proxy contains only the secret material pertaining to that pod and acts as the enforcement point for the pod. This granular enforcement is central to zero trust approaches to network security.
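For instance, with Linkerd the sidecar’s resources can be pinned per workload through proxy annotations on the pod template; a sketch with annotation names per Linkerd’s docs and illustrative values:

spec:
  template:
    metadata:
      annotations:
        config.linkerd.io/proxy-cpu-request: 100m      # scales with this pod's traffic
        config.linkerd.io/proxy-cpu-limit: 500m
        config.linkerd.io/proxy-memory-request: 64Mi
        config.linkerd.io/proxy-memory-limit: 256Mi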

By contrast, per-host proxies (and other forms of multitenancy, e.g., cluster-wide proxies) handle traffic to whichever arbitrary set of pods Kubernetes schedules on the host. This means all of the operational and security advantages of sidecars are lost:

  • Per-host proxy resource consumption is unpredictable. It is a function of Kubernetes’s dynamic scheduling decisions, meaning that resource limits and requests are no longer useful—you cannot tell ahead of time how much of the system the proxy requires.
  • Per-host proxies must ensure fairness and QoS, or the application risks starvation. This is a non-trivial requirement and no popular proxy is designed to handle this form of “contended multitenancy”.
  • The blast radius for per-host proxies is large and continuously changing. A failure in a per-host proxy will affect whichever arbitrary sets of pods from arbitrary applications were scheduled on the host. Similarly, upgrading a per-host proxy will impact arbitrary applications to arbitrary degrees depending on which pods were scheduled on the machine.
  • The security story is… messy. A per-host proxy must contain the key material for all pods scheduled on that host and must perform enforcement on behalf of all applications scheduled on this host. This turns the proxy into a new attack vector vulnerable to the confused deputy problem, and any CVE or flaw in the proxy now has a dramatically larger security impact.

In short, sidecar proxies build on top of the isolation guarantees gained through containers, allowing Kubernetes and the kernel to enforce security and fairness. Per-host proxies step outside of those guarantees entirely, introducing significant complexity to operations, security and maintenance.

So Where Do We Go from Here?

eBPF is a big advancement for networking, and can optimize some work from the service mesh by moving it to the kernel. But eBPF will always require userspace proxies. Given that, the right approach is to combine eBPF and sidecars, not to avoid sidecars.

Proposing a sidecar-free service mesh with eBPF is putting the marketing cart before the engineering horse. Of course, “incrementally improving sidecars with eBPF” doesn’t have quite the same buzz factor as “goodbye sidecars,” but from the user perspective, it’s the right decision.

The sidecar model is a tremendous advancement for the industry. It is not without challenges, but it is the best approach we have, by far, to handle the full scope of cloud native networking while keeping the isolation guarantees achieved by adopting containers in the first place. eBPF can augment this model, but it cannot replace it.

My Istiod Pod Can’t Communicate with the Kubernetes API Server!
https://thenewstack.io/my-istiod-pod-cant-communicate-with-the-kubernetes-api-server/ (Aug. 10, 2022)


Lin Sun
Lin is the Director of Open Source at Solo.io and a CNCF ambassador. She has worked on Istio service mesh since 2017 and serves on the Istio Technical Oversight Committee and Steering Committee. Previously, she was a Senior Technical Staff Member and Master Inventor at IBM for 15+ years. She is the author of the book Istio Explained and has more than 200 patents to her name.

A few days ago, I published a blog post on whether a network cache-based identity could be mistaken, in which I introduced an error scenario that caused a Kubernetes pod’s identity to be mistaken, thus granting unauthorized access. In this blog, I would like to demonstrate, using the exact same scenario, how to leverage defense in depth with Cilium and the Istio service mesh to prevent just such unauthorized service access.

In this experiment, you’ll set up a Kubernetes kind cluster, deploy v1 and v2 of the client applications (“sleep”) and v1 and v2 of the server applications (“helloworld”), along with the v1 network policy that allows ONLY the v1 client to call the v1 server, and the v2 network policy that allows ONLY the v2 client to call the v2 server.

You’ll also set up Istio Authorization policies to allow ONLY the v1 client to call the v1 server and ONLY the v2 client to call the v2 server. You’ll first observe the network policies enforced as expected. Then you’ll trigger an error scenario, scale the client pods up and down, and observe that the v2 client can bypass the L4 network policy but fails the Istio RBAC check. Let us get started!

Setting up the Environment

Refer to the setup instructions in this blog to set up your kind cluster and Cilium CNI. Download the latest stable istioctl, install the minimal profile and scale up the Istiod deployment to three replicas:

istioctl install --set profile=minimal -y

kubectl scale deploy istiod -n istio-system --replicas=3

Deploy the applications and network policies

Label the default namespace for sidecar injection:

kubectl label namespace default istio-injection=enabled

Clone the repo, then deploy the sleep and helloworld deployments, along with the v1 and v2 CiliumNetworkPolicy resources.

kubectl apply -f ./yamls

Apply the simple Istio PeerAuthentication resource below to allow only strict mTLS traffic in the default namespace:
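A minimal sketch of such a resource, assuming Istio’s security.istio.io/v1beta1 API:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: default
spec:
  mtls:
    mode: STRICT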


Apply the simple Istio AuthorizationPolicy resource below to allow nothing in the default namespace, based on zero trust best practice: always start by trusting nothing, then explicitly allow access as needed.
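A sketch of such an allow-nothing policy (an AuthorizationPolicy with an empty spec matches every workload in its namespace and, having no rules, allows no requests):

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing
  namespace: default
spec: {}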

Apply the Istio Authorization policy below to allow sleep-v1 to call helloworld-v1 on GET methods:
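A sketch of that policy; the workload labels and the service-account principal are assumptions about the demo repo:

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-sleep-v1-to-helloworld-v1
  namespace: default
spec:
  selector:
    matchLabels:
      app: helloworld
      version: v1
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/default/sa/sleep-v1"]
    to:
    - operation:
        methods: ["GET"]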

Apply the Istio Authorization policy below to allow sleep-v2 to call helloworld-v2 on GET methods:
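The v2 policy mirrors the v1 policy, under the same assumptions:

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-sleep-v2-to-helloworld-v2
  namespace: default
spec:
  selector:
    matchLabels:
      app: helloworld
      version: v2
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/default/sa/sleep-v2"]
    to:
    - operation:
        methods: ["GET"]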

Assuming all of your sleep and helloworld pods are up and running, you can call helloworld-v1 from the sleep-v1 pod and helloworld-v2 from the sleep-v2 pod:
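For example (a sketch: the deployment names, container name, service names and port are assumptions modeled on Istio’s helloworld sample and may differ from the blog’s repo):

kubectl exec deploy/sleep-v1 -c sleep -- curl -s helloworld-v1:5000/hello

kubectl exec deploy/sleep-v2 -c sleep -- curl -s helloworld-v2:5000/hello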

You’ll get output like the following, where only sleep-v1 can call helloworld-v1 and only sleep-v2 can call helloworld-v2, and nothing else. When sleep-v2 calls helloworld-v1, a “connection failed” error is displayed because the v1 network policy is properly enforced:

Can sleep-v2 call helloworld-v1 successfully when it should not be allowed?

With the above applications and network and authz policies deployed, network policies will be effective in most cases, so sleep-v2 will not be able to call helloworld-v1 successfully. Let us trigger a similar error scenario, where the node on which the helloworld-v1 pod runs can’t communicate with the Kubernetes API server. In my environment, I have one Cilium pod and one Istiod pod running on that kind-worker node:

Trigger the error as before:

Run the Test!

If you are not familiar with the test, refer to the Review the test script section. Simply issue run-test.sh to run the test. You may observe that a few sleep-v2 pods take 30 seconds or so to reach the running status. This is because when the istio-proxy container tries to start, its pilot-agent sends a Certificate Signing Request (CSR) to Istiod (which serves as the CA in my test) using the service account token that is provisioned by Kubernetes and mounted to the pod.

If the CSR happens to be sent to an Istiod pod that can’t communicate with the Kubernetes API server, that pod won’t be able to validate the service account token and thus won’t process the request. This is where retries come to the rescue: the pilot-agent in the istio-proxy container is intelligent enough to send the CSR to a different Istiod pod.

Shortly, you’ll see that all pods have reached the running status.

You’ll also observe that the sleep-v2 pod with the mistaken Cilium identity still can NOT call helloworld-v1 successfully:

Note that the “RBAC: access denied” error comes from Istio, with the Authorization policies enforced by helloworld-v1’s istio-proxy container. If you recall, the error was different earlier, when the Cilium network policy was properly enforced to NOT allow sleep-v2 to call helloworld-v1:

Take a look at the short video to watch me run the above steps in my test environment with Cilium and Istio:

Wrapping up

Even when one of the Istiod pods can’t communicate with the Kubernetes API server, your application’s identity (based on cryptographic primitives) continues to be properly generated from its Kubernetes service account token via CSR requests, and Istio Authorization policies continue to be enforced. This reinforces my earlier recommendation to use a defense-in-depth approach along with a zero trust model in your security architecture, so that you are well prepared for various error scenarios.

The post My Istiod Pod Can’t Communicate with the Kubernetes API Server! appeared first on The New Stack.

]]>
How to Handle Authorization in a Service Mesh https://thenewstack.io/how-to-handle-authorization-in-a-service-mesh/ Fri, 08 Jul 2022 19:29:42 +0000 https://thenewstack.io/?p=22677589


Microservices architecture results in an ecosystem where small pieces of an application collaborate to fulfill a business case.

Judith Kahrer
Judith is a product marketing engineer with a keen interest in security and identity. She started her working life as a developer and moved on to being a security engineer and consultant before joining the Curity team.

The big benefit of using microservices is that developers can change, update and deploy the parts of an application independently. Teams can work in parallel and, ideally, do not have to wait for other groups to release. This flexibility increases productivity and decreases the time to market for new features.

However, even though services are highly independent in a microservice architecture, they must still communicate to work together. Yet, service communication becomes a challenge with the growing number of parties. Routing requests between multiple services and versions while implementing security requirements like authorization, authentication or encryption gets complex.

The Infrastructure Layer

A service mesh addresses the challenges of service communication in a large-scale application. It adds an infrastructure layer that handles service discovery, load balancing and secure communication for the microservices. Commonly, a service mesh complements each microservice with an extra component — a proxy often referred to as a sidecar or data plane.

The proxy intercepts all traffic from and to its accompanying service. It typically uses mutual TLS (mTLS), an encrypted connection with client authentication, to communicate with other proxies in the service mesh. This way, all traffic between the services is encrypted and authenticated without updating the application. Only services that are part of the service mesh can participate in the communication, which is a security improvement. In addition, the service mesh management features allow you to configure the proxy and enforce policies such as allowing or denying particular connections, further improving security.

The Application Layer

To implement a Zero Trust architecture, you must consider several layers of security. The application should not blindly trust a request even when receiving it over the encrypted wire. It must validate that requests are legitimate and ensure that data access is secure at the application level. A well-established protocol to achieve authentication and authorization on the application level is OAuth 2.0.

A component like an ingress controller, an API gateway or a reverse proxy is ideal for enforcing authentication and performing basic authorization by validating the OAuth access token before the request reaches the service network. However, it is best practice to not only authenticate and authorize each request at the perimeter but also inside the network and between the services. As the authorization decision is part of the business logic, token validation is also part of the application.

When validating tokens, it’s recommended to follow best practices such as the following (see the sketch after this list):

  • Only accepting tokens that are valid in terms of the issuer and expiration.
  • Ensuring tokens are used as intended concerning the audience, scope and claims.
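A minimal sketch of these checks in Python, assuming the PyJWT library; the issuer, audience, JWKS location and scope are hypothetical placeholders:

import jwt  # PyJWT
from jwt import PyJWKClient

ISSUER = "https://idp.example.com"        # hypothetical issuer
AUDIENCE = "orders-service"               # hypothetical audience
jwks = PyJWKClient(ISSUER + "/jwks")      # JWKS endpoint location is an assumption

def validate(token: str, required_scope: str) -> dict:
    key = jwks.get_signing_key_from_jwt(token)
    # jwt.decode rejects tokens with the wrong issuer or audience, or an expired "exp" claim.
    claims = jwt.decode(token, key.key, algorithms=["RS256"],
                        issuer=ISSUER, audience=AUDIENCE)
    # Ensure the token is used as intended: check the scope (and other claims) explicitly.
    if required_scope not in claims.get("scope", "").split():
        raise PermissionError("token lacks required scope")
    return claims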

Tokens in and Between Services

The introduction of authorization on the application level in a service mesh comes with a challenge. As outlined above, services collaborate to fulfill a business case. But when all services in the service mesh require authorization, service requests must contain authorization data. In other words, services need to be able to share tokens with other services.

Forwarding Tokens

The intuitive way is to simply forward the incoming token to downstream services. This approach works fine as long as all the services require the same privileges and the token stays within one security domain. While reusing a token is a simple solution, it can be problematic when a token’s privileges, that is, the scopes and claims, grow to fulfill all the requirements of subsequent requests. A token with many privileges is eventually a target for exploits and breaks the principle of least privilege.

Privacy is another concern when forwarding tokens, particularly when forwarding them across security boundaries such as to external services. Tokens often carry user or business data. When reusing a token for several services, each service has requirements that affect the data in the token. For example, one service may require the user’s bank account number and another the user’s home address, which may be irrelevant to the first service. Sending irrelevant data, some of which may be sensitive under data privacy laws, introduces an unnecessary security risk.

Tailored Tokens

Follow the principle of least privilege and ensure that tokens contain just enough data for a service to fulfill its task. If the chain of subsequent service requests required to complete the business case is known beforehand, then embedded tokens are an appropriate approach for leveraging tailored tokens. In this approach, the parent token carries an embedded token for each subsequent request to other services. The API gateway or service can extract the embedded token before sending a request to the next service. As a result, each service receives a tailored token with the scopes and claims it needs to fulfill its part in the business case.

The embedded token approach demands a specialized token service and infrastructure. First of all, the token service must be able to embed tokens. It needs the dependency tree of a request to know which tokens to embed for an incoming token request, and it will most likely base that decision on the client ID and scope. When the dependency tree is complicated, or when there are several levels of embedded tokens, evaluating which tokens to include becomes cumbersome. And because the parent token contains embedded tokens, it is bigger than a single token.

Consequently, requests become bigger, which requires a network infrastructure that can handle large HTTP headers. The infrastructure must not only transport potentially large messages but also cache the tokens for further processing. This can be especially challenging when there are high peaks in the load.

At times, the final service chain may not be known in advance. This is common in a world of loosely coupled services, where the system frequently changes as part of agile paradigms or where processes are conditional. Consequently, embedding the required tokens for subsequent service requests may not be possible. In this case, the embedded token approach will not work, and a more dynamic approach is needed.

Tokens on Demand

The most flexible approach presented in this article is token exchange. It can be used to exchange an existing token for a new one. The new token can be narrowed in scope and claims to be a subset of the original token. Or it could be exchanged for a completely new one to fit a different security domain. The token exchange protocol for OAuth 2.0 is standardized in RFC 8693.

Whenever a service aims to call a downstream service for further processing and whenever the original token does not fulfill the requirements for subsequent service requests, then the originally called service can exchange its token for a new tailored token. It simply sends a request to the token service, asking for a new token. The token service then checks its ruleset, and if the request is valid, it issues and returns a new token that the calling service uses in its request to the downstream service.
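A sketch of such an exchange request, following RFC 8693 and assuming Python’s requests library; the endpoint, client credentials, scope and audience are placeholders:

import requests

def exchange_token(original_token: str) -> str:
    resp = requests.post(
        "https://idp.example.com/oauth/token",  # hypothetical token endpoint
        auth=("client-id", "client-secret"),    # the calling service's own credentials
        data={
            "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
            "subject_token": original_token,
            "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
            "scope": "inventory:read",           # narrowed scope for the downstream call
            "audience": "inventory-service",
        },
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

The returned token is then used as the bearer token for the downstream request.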

There are two crucial aspects of the token exchange process. First, it helps implement the principle of least privilege by issuing tailored tokens with narrow scopes and claims designed for a particular service or set of services. Second, it enables flexibility by issuing tokens on demand when they are actually needed. The approach fits very well with the microservices paradigm, where the services are loosely coupled and combined in various ways. There are basically no limitations for the new token. With token exchange, you can safely cross security boundaries and implement various use cases, including impersonation.

However, since the protocol requires additional requests to get the new token, the approach implies challenges in systems where latency is crucial. Also, the token service will still have to maintain the ruleset for the conditions on when to issue which new token. This ruleset is based on business rules and must consider security concerns to avoid privilege escalation.

Conclusion

Security is a multidisciplinary field and needs to be implemented at various layers. Service mesh improves the security of an application built upon microservices by adding an infrastructure layer where connected services can communicate securely. OAuth 2.0 adds security to the application layer and allows the secure implementation of business rules across the different microservices. There are three approaches to sharing tokens between services in the service mesh: forward, embed and on-demand. The approaches are not mutually exclusive but can be combined to achieve a secure implementation that suits the business and its prerequisites.

The post How to Handle Authorization in a Service Mesh appeared first on The New Stack.

]]>
Applying Zero Trust Security to Kubernetes Via Service Mesh https://thenewstack.io/applying-zero-trust-security-to-kubernetes-via-service-mesh/ Wed, 22 Jun 2022 19:36:21 +0000 https://thenewstack.io/?p=22674890


Earlier this year, the White House issued an Executive Order on Improving the Nation’s Cyber Security, which laid the groundwork for creating a zero trust architecture for federal agencies.

Ashher Syed
Ashher is a product marketing leader at HashiCorp and is based in Austin, Texas. When he's not running after his four kids, he's exploring the possibilities of what cloud-based technologies can bring to modernize organizations.

The National Security Agency (NSA) and the Cybersecurity and Infrastructure Security Agency (CISA) also jointly published a Kubernetes hardening guide that highlights best practices for key zero trust security principles in Kubernetes.

These documents emphasize the need for organizations and developers to rethink the way they secure their applications and infrastructure. Given how security needs are changing with the adoption of cloud infrastructure, it is imperative to understand how to apply these principles across your run times, clouds, platforms and specifically Kubernetes clusters.

One key approach the NSA guide recommends is to use a service mesh in Kubernetes environments to authorize, authenticate and encrypt service-to-service communications. Let’s take a closer look at how an open source service mesh can help strengthen your security posture and promote zero trust networking.

Zero Trust Security

Before we jump into how a zero trust approach applies to Kubernetes, we need some context on zero trust principles and why they’re becoming more important.

Securing infrastructure, data and access becomes increasingly complex and difficult across multiple clouds and on-premises data centers. As organizations move to multicloud and hybrid infrastructures, the measures they took to secure their private data centers start to become obsolete. IP-based identity and perimeter-based access are no longer relevant in a world of ephemeral IP addresses and a constantly changing — and often remote — workforce that needs constant access to shared resources.

This shift requires a different approach to security, one that trusts nothing — not even your own services and users — but instead authenticates and authorizes everything before allowing access. Critically, the move to zero trust is not binary; it’s an ongoing approach that requires a fundamental shift to your architecture. Fortunately, these three best-practice zero trust principles can help point the way:

  1. Deliver identity-based service-to-service access and communication: Access should be based on service identity. Service identity, not IP addresses, should be used for authorization, and services should mutually authenticate their identities when establishing connections.
  2. Include secret and certificate management and hardened Kubernetes encryption: Secrets should be encrypted, time bound and able to work with a global service identity that enables data encryption in transit. Access credentials should be time-bound, requiring the user or application to refresh their credentials at defined intervals.
  3. Enable observability with audits and logging: All access attempts should be audited and logged.

Applying Zero Trust Principles to Kubernetes

We have established that network perimeter-based security isn’t enough in dynamic environments and that we need less reliance on perimeter controls. So what steps can organizations take to implement zero trust principles in Kubernetes environments?

1. Deliver Identity-Based Service-to-Service Access and Communication

Every Kubernetes cluster provides a flat network where every container or service can talk to another without any restrictions. Kubernetes considers the container network or applications running on it as trusted, not requiring authentication. For example, if a database service and logging service are running on the same Kubernetes cluster, by default they can access each other at a network level.

Users can create policies in Kubernetes to apply default rules to deny both ingress and egress traffic in the cluster. However, even with those restrictions, you still need service-to-service authentication and authorization to ensure that only required resources are made available to services.
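For reference, a minimal default-deny policy looks like this (a standard Kubernetes NetworkPolicy; the namespace is illustrative):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: default
spec:
  podSelector: {}        # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress             # with no rules listed, all ingress and egress is denied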

You can solve this problem by introducing a service mesh, which allows you to assign service identities to each service running on the Kubernetes cluster. Based on service identity, the mesh can authenticate service identities using mTLS, and service-access requests can be authorized or blocked using intentions, which allow operators to define service-to-service communication permissions by service name. With a service mesh in place, the logging service can access the database service only if both services can mutually authenticate their credentials.

2. Include Secret and Certificate Management and Hardened Kubernetes Encryption

Using credentials for the Kubernetes control plane, either for identity or for managing secrets, expands the attack surface, is hard to maintain and does not follow the principles listed above. Kubernetes secrets have several weaknesses in a zero trust security architecture:

  • By default, secrets are base-64 encoded, not encrypted (see the sketch after this list).
  • Since secrets do not expire (they’re not time-bound), they can expose you to risk.
  • Kubernetes can manage resources, such as secrets, only within a cluster boundary. If you have sets of clusters, resources used in multiple clusters must be managed separately.
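The first weakness is easy to demonstrate with standard kubectl (the secret name and value are made up):

kubectl create secret generic db-creds --from-literal=password=s3cr3t
kubectl get secret db-creds -o jsonpath='{.data.password}' | base64 -d
# prints "s3cr3t": base-64 is an encoding, not encryption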

A service mesh with a secrets broker can address this challenge. Consul on Kubernetes, for example, has an integration with HashiCorp‘s Vault — a centralized secrets management solution that closes the gaps in Kubernetes secrets.

The goal is to ensure that secrets are encrypted at rest with centralized access control and auditing. The workflow should support both single Kubernetes clusters and federated multicluster deployments. In addition, certificate autorotation can help operators reduce the time-to-live (TTL) values of their TLS certificates, strengthening their security posture.

3. Enable Observability with Audits and Logging

To improve Kubernetes security, it’s important to understand what’s happening inside the cluster to see what service-access requests were made. Audit logs let the audit team inspect event data to see which credentials have been used, what actions have taken place and the timestamps associated with these transactions. This provides greater insight and accountability for security teams.

A service mesh deploys sidecar proxies capable of emitting metrics with each of the services running in the cluster and keeps the records for all the service-to-service communication and access requests. Ideally, the service mesh would integrate with open source monitoring tools such as Prometheus and Grafana to make it easier to analyze service-networking patterns and bolster security.

Conclusion

As service deployment environments get more complex with multiple Kubernetes clusters, multiple runtimes, multicloud and on-premises deployments, and all of their various interconnections, following zero trust security principles becomes a necessity.

A service mesh like HashiCorp Consul can be an essential part of this process, providing a management layer that enforces zero trust principles by providing identity-based service networking that mutually authenticates and authorizes any access. Consul makes it easy to enforce granular security policies in Kubernetes and across multiple environments without having to code all the security parameters into the application itself, which results in a fast, easy-to-manage onramp to a more zero trust security posture.

The post Applying Zero Trust Security to Kubernetes Via Service Mesh appeared first on The New Stack.

]]>
Service Mesh Gets Boring and That’s a Good Thing https://thenewstack.io/service-mesh-gets-boring-and-thats-a-good-thing/ Fri, 03 Jun 2022 11:00:47 +0000 https://thenewstack.io/?p=22673247


A recently released Cloud Native Computing Foundation (CNCF) survey points to how service mesh is seen as essential by the majority of organizations working with Kubernetes and microservices. At the same time, a survey of readers of The New Stack indicates that service mesh is even becoming boring, or at the very least, a subject that readers are less interested in reading about than they were before.

In the CNCF survey, released during KubeCon+CloudNativeCon EU in May, the authors reported that “it’s no surprise that service mesh has become a key component of cloud native infrastructures.” Of the 253 survey respondents, 70% run a service mesh in production or development, and 19% in evaluation mode. “Some said they are running a service mesh for clients who had high levels of adoption. Those not implementing a service mesh formed a distinct minority — fewer than 10%,” the report authors said.

Less Interest in Service Mesh

But while service mesh has become a mainstay for the majority of the organizations the survey respondents represented, The New Stack readers are less interested in the subject. Only 9% of readers chose service mesh when asked which topics they would like The New Stack to cover in 2022, compared to 30% for DevOps and 29% for multicloud/hybrid architectures. Service mesh plummeted to 39th place in 2021 in the ranking of “Comparison for Topics Asked About Last Year,” compared to seventh place in 2020.

Both of these data points portend that service mesh is now mainstream after being introduced just a few years ago. The New Stack reader survey shows that as the subject matures, readers are less interested in reading about it. All told, the CNCF numbers reflect how service mesh has matured, William Morgan, CEO of Buoyant, the company behind Linkerd, told The New Stack during KubeCon+CloudNativeCon EU. “The service mesh market has been so hype-driven for so many years, and now that hype has kind of shifted.

“The service mesh market has also been great for us because people who are now in the service mesh market are really interested in solving problems,” Morgan said. “People are not psyched about the technology for its own sake — they are trying to use it to actually solve business problems. It is a different type of relationship and even a very different type of conversation we’re having now about service mesh.”

Wasm Influence

The CNCF survey indicated that users are also interested in using service mesh with new technologies. Using service mesh with environments such as WebAssembly (Wasm) was important for 25% of the respondents, which shows promise for the technology, since interest in Wasm is relatively recent.

But while Wasm has been shown to help lower computing power and latency for data streaming, for example, and offers other impressive computing benchmarks, the jury is still out as to what extent it will see wide-scale adoption. Conversely, service mesh is now “tied to the whole cloud native pattern of developing software,” Morgan said.

The post Service Mesh Gets Boring and That’s a Good Thing appeared first on The New Stack.

]]>
Vendor Checklist for Real-Time Data Meshes https://thenewstack.io/vendor-checklist-for-real-time-data-meshes/ Thu, 02 Jun 2022 13:20:17 +0000 https://thenewstack.io/?p=22672779


In Part 1 of this series, we highlighted the challenges of real-time data sharing. In Part 2, we defined the real-time data mesh and discussed the key tenets for incorporating them into modern IT stacks. This post focuses on what’s needed to effectively evaluate real-time data-sharing solutions.

Tim Wagner
Tim is the inventor of AWS Lambda and a former general manager of AWS Lambda and Amazon API Gateway services. He has also served as vice president of engineering at Coinbase, where he managed design, security and product management teams. Tim co-founded Vendia to help organizations of all sizes share data more effectively across clouds and companies, and he serves as its CEO.

Given its importance to organizations, broad scope and platform-centric nature of real-time data sharing in a modern IT environment, it’s especially important to evaluate and select vendors capable of delivering on the broad set of needs organizations require. This section can be used as a vendor checklist to ensure that solutions — whether developed in-house, outsourced to a consulting company or purchased from a vendor — provide the range of capabilities real-world challenges demand:

Multicloud and Software-as-a-Service (SaaS) integrations as fundamental features — The single most fundamental feature of a real-time data mesh solution is that it can seamlessly span the most important divides in the modern IT landscape: multiple clouds and different SaaS applications, such as Salesforce and Microsoft Dynamics. Without this feature, much of the undifferentiated heavy lifting of data sharing will remain in the hands of the IT organization, radically blunting the effectiveness of real-time data sharing. Best-of-breed vendors will support data sharing across the major public clouds and application platforms with a connector strategy that makes ingress from and egress to other services and platforms fast to develop and easy to operate and maintain.

SaaS delivery model with a “zero infrastructure” footprint — IT organizations already struggle under the weight of managing too much hardware and software infrastructure, including their large compliance and security surface area. Data mesh solutions, because they can carry data with varying levels of exposure risk, have to be designed to handle the worst-case scenario, making securing their end-to-end infrastructure even more complex. Acquiring these capabilities through a SaaS solution that doesn’t expand a company’s infrastructure footprint is critical to avoiding direct and indirect cost and staffing bloat. Cloud-based SaaS platforms for data sharing also confer the economic benefits of shared development and deployment costs, further improving ROI versus in-house and outsourced manual development approaches.

Cloud native design with automatic scaling and fault tolerance — Of all the elements in an IT stack, the data-sharing layer is one that most benefits from an innovation-forward design. Solving for fault tolerance across multiple regions, clouds and departments/partners is challenging enough, and when combined with the need for dynamic scaling that is simultaneously highly available and cost-effective, most homegrown or consulting-based solutions peter out quickly. Cloud native designs incorporating the latest advances in serverless and container-based technologies offer the best options for creating a solution that offers maximum uptime without sacrificing price/performance ratios.

Cost-effective — Unlike blockchains and ERP systems that typically need to be “scaled to peak” and then kept there 24/7, real-time data mesh providers will employ scale-by-request, providing tightly enveloped costs that vary with actual usage rather than scaling with peak infrastructure capacity requirements.

Green tech with high utilization — As carbon footprint reduction becomes more critical to investors and public company reporting and transparency requirements start to expand, choosing “green tech” becomes ever more important. Fundamental to lowering carbon emissions is reaching high levels of data and compute utilization; otherwise, the majority of infrastructure capacity is spent worthlessly, creating a negative environmental impact — the worst possible outcome. Data mesh solutions based on modern, serverless technologies offer highly efficient, “100% utilization” solutions based on massively multitenanted cloud-based strategies. These benefits pass through to the companies that deploy them, resulting in significant carbon savings.

Compliant and secure by design — Data-sharing solutions are, because of their nature, the target of many regulatory, security and information-handling policies and controls. Building out not just the solution but the necessary monitoring, reporting and management capabilities needed to ensure constant compliance across assurance programs, such as SOC2, PCI, GDPR, CCPA, HIPAA, FedRAMP and more, is time-consuming and costly and typifies the sort of undifferentiated heavy lifting that can be transferred to a platform vendor whose sole job is to deliver on these outcomes 24/7.

Capable of permanent, durable storage — A viable real-time data mesh needs to be capable of more than just transiting data from point to point; it needs to be capable of storing an unlimited amount of information for an unlimited amount of time. Kafka and other streaming data solutions are fantastic building blocks for connecting systems together, but they lack permanent storage, cross-company (and cross-cloud) support and other facilities needed to span the necessary canyons.

Going Further

To go deeper, especially on the topic of “data mesh integrates the two planes of operational and analytical data and applications,” see the book “Data Mesh: Delivering Data-Driven Value at Scale” by Zhamak Dehghani, director of emerging technologies in North America at ThoughtWorks.

A Key Role in IT Stacks

Real-time data meshes play a key role in IT stacks … whether they appear implicitly and accidentally or explicitly and with thoughtful consideration. Platforms such as Vendia offer a leap forward from historical approaches to EAI/EiPaaS, ERP and first-generation blockchains by automatically creating a single source of truth from a standards-based data model and then managing it through a zero-footprint SaaS deployment.

Companies can leverage these benefits quickly by using it for application-to-application sharing challenges or by selectively migrating from existing file-based sharing approaches. Teams focused on innovation or broader operational data-sharing solutions can incorporate fine-grained data sharing for industry-spanning, best-of-breed practices when building supply chains, financial settlement systems or other use cases that benefit from cross-company “single source of truth” outcomes. Selecting a vendor to deliver on these outcomes requires understanding the breadth of challenges such a platform needs to support and benefits from the checklist provided above.

The post Vendor Checklist for Real-Time Data Meshes appeared first on The New Stack.

]]>
The Real-Time Data Mesh and Its Place in Modern IT Stacks https://thenewstack.io/the-real-time-data-mesh-and-its-place-in-modern-it-stacks/ Thu, 26 May 2022 15:07:17 +0000 https://thenewstack.io/?p=22672381


In Part 1 of this series, we highlighted the challenges of real-time data sharing, discussed operational vs. analytical data, and considered legacy solutions and their limitations. This post defines the real-time data mesh and discusses the key tenets for incorporating them into modern IT stacks. 

Tim Wagner
Tim is the inventor of AWS Lambda and a former general manager of AWS Lambda and Amazon API Gateway services. He has also served as vice president of engineering at Coinbase, where he managed design, security and product management teams. Tim co-founded Vendia to help organizations of all sizes share data more effectively across clouds and companies, and he serves as its CEO.

Facilitating real-time data sharing is a challenging proposition, particularly when multicloud and SaaS applications are included as typical requirements. At the same time, these difficult implementation challenges are surprisingly undifferentiated: They don’t differ significantly across industries, sectors or company sizes, making data sharing ideal for a platform-based solution.

Best-of-breed solutions share with legacy blockchain approaches a key architectural insight: The job of maintaining a single source of truth across multiple parties belongs with the platform, not with each of the parties. This produces several advantages:

  • The platform understands the data. Unlike Mulesoft and other “dumb pipe” solutions that are ignorant of the data being carried, having the platform understand the data means it can also ensure that data is consistent, correct and up to date everywhere it flows. This key difference shifts many of the conventional operational and security challenges of handling data from individual applications and IT teams back onto the platform provider.
  • The platform can offer a uniform way to model data and access controls. Almost as important as sharing data is ensuring that the wrong data doesn’t get shared. When every party and system require a DIY approach to security, access controls, auditing, compliance and governance, those problems take on a life of their own, becoming heavy lifts that can dwarf the original problem of simply moving data between parties. Letting a platform accomplish this not only shifts the burden of implementation (and spreads it among many customers, making it cost effective to produce and maintain), it ensures that the approach is uniform across all parties and deployments.

Unlike early blockchains, which were essentially commercialized prototypes, modern data mesh offerings are based on solid public cloud engineering. They share the same multitenanted, highly scalable designs as highly adopted public cloud services and exploit modern interfaces, including GraphQL APIs and container-based code sharing. These advances in engineering and architectural patterns have allowed “second-generation” approaches to solve the issues that plagued early (and usually failed) attempts to deploy blockchain technologies in enterprise settings:

  • The platform is highly scalable and low latency. Blockchains are plagued by poor performance, with public chains like Ethereum struggling to maintain 14 transactions per second worldwide, shared among all customers! Transaction settle time can approach 15 minutes or longer, and the cost per transaction can be as high as $50 or more. Even private chains, such as Hyperledger Fabric, suffer from “one-box deployment” models — unlike a cloud-based service, they are forever fundamentally limited by the processing and memory capacity of a single server and at most a few cores within that server. That doesn’t bode well for any enterprise workload that needs to scale.
  • The platform is delivered as a SaaS service. First-generation blockchains were a throwback to early approaches in more ways than one: Not only do their “single-box deployment” models make them slow with limited throughput, this limitation also means they have to be manually deployed, maintained, monitored, secured, scaled, made fault-tolerant, etc. That’s a huge burden to an already strapped IT team and only adds to the burden of infrastructure management and staffing load. By contrast, next-generation data-sharing solutions are commonly provided as SaaS services, with zero infrastructure footprint and “ilities” such as scaling, fault tolerance, monitoring, infrastructure maintenance, etc., owned by the service provider, rather than left as an exercise to the IT customer.

Why Is Data Sharing so Difficult?

Despite being a ubiquitous need, real-time data sharing isn’t always a well-modeled element in existing IT stacks. Gartner echoes this thought, “IT and business-oriented roles … adopt EiPaaS as a key component of their integration strategy … [but] despite its mainstream use, choices of providers are fragmented and difficult to navigate.” It’s an intriguing question: Why should that be?

The answer lies in the structural shifts our industry is undergoing. “Classic” IT had a relatively simple problem to solve:

  • Data, such as sales information, was produced in-house.
  • Workloads to process that data, such as calculating accounts receivable and producing and sending invoices, were run in-house over the collected data.
  • The data was optionally collected and shipped to an analytics subsystem for offline analysis to produce business intelligence reports for management.

In other words, both production and consumption of data, along with any transmission or “sharing”, was handled in-house — often within the confines of a single mainframe. Whether built in-house, through outsourced delivery partners, or provided via ERP systems, these “data monoliths” were, despite their other challenges, relatively easy to manage from a sharing perspective.

Flash forward to today:

  • SaaS vendors increasingly pull data away from central IT and into their own (usually public cloud-based) storage and compute solutions.
  • Business partnerships, such as supply chains, are increasingly global, meaning that more and more of the “data of interest” to a company lives outside its four walls. Amazon, for example, estimates that up to 80% of critical business data no longer resides internally.
  • Adoption of public clouds in general and multicloud architectures, in particular, require a wholesale shift of data outside of conventional on-premises data centers, often into fully managed services or specialized cloud databases, such as NoSQL, time-series or graph-optimized offerings on AWS, Azure or GCP.
  • Customer demands for “always-on” internet-era experiences mean that applications that used to be “9-to-5, Monday-to-Friday” SLAs are now 24x7x365 with 99.99% uptime requirements, a threshold that forces IT teams to design and deploy leading-edge approaches to scalability, fault tolerance and multicloud resiliency for virtually everything with public or partner surface area. That’s an incredibly tall order for teams already struggling to meet business needs and potentially with limited in-house knowledge of advanced distributed systems design techniques.
  • “Shadow IT” has forever fragmented the notion of a single team operating a single mainframe into a distributed patchwork of applications, teams and approaches that is challenging to manage even for the most well-run of the Fortune 100. Approaches that embed security, scalability, fault tolerance and other “ilities” and governance models directly into the product or service offering thus confer a huge advantage over approaches that make those challenges DIY, because DIY in a shadow IT org usually implies unbridled heterogeneity and an increasingly chaotic portfolio over time.

With all these structural changes, it’s easy to see why ERP systems developed in the ‘90s, and even EAI approaches that worked fine in the 2000s, are no longer able to satisfy the needs of companies and their IT demands: The challenge of disparate data isn’t something they had to worry about, and as a result, they’re ill-equipped to deliver on modern IT experiences in data sharing.

Incorporating Real-Time Data Mesh Solutions into a Modern IT Stack

Because of the challenges cited above, even high-functioning IT teams don’t necessarily have a strong “recipe” for incorporating real-time data sharing into their approach in a uniform, best-practice fashion. This section briefly surveys three deployment approaches with increasing levels of capability and complexity to provide an overview of how these platforms can be incorporated into modern, service-based IT portfolios.

Application-Based Solutions

The simplest deployment approaches are those where the data model and connectivity is tied directly to an existing SaaS-based domain, such as sharing marketing or sales information between CRM systems for co-selling purposes. Because the domain is well known in these cases, there is little to no data modeling challenge, and because the systems of record (Salesforce, Microsoft Dynamics) are well known, connectivity is equally easy, usually limited to authorizing the platform against the systems in question.

Setup and configuration typically can be done inside of a week and involves:

  • Field name alignment among parties (usually two or more departments or business partners).
  • Configuring access controls to ensure that only the authorized data is shared with the right parties.

Figure 1 illustrates a typical application-based deployment using CRM contact sharing as an example.

Figure 1: CRM data sharing — a sample deployment architecture

Application-based solutions are simplified by a shared domain model, such as CRM data, but are able to connect different SaaS vendors across different organizations, even among multiple companies. They represent substantial leverage over point-to-point API-based data-sharing solutions that require building and operating a full, security-hardened and compliant integration between every pair of parties involved.

Because both the applications being connected and the underlying platform are all SaaS-based, there is no infrastructure to deploy or complex data modeling to perform, and deployments can move from prototyping to testing to production in the space of weeks rather than months or years. For teams already familiar with ETL “data exhaust” from these applications, the design pattern is identical, making deployment even more efficient because similar patterns of authorization and enablement can be followed.

Going Further

This pattern can be easily repeated for other SaaS applications and takes advantage of the industrywide trend toward SaaS: Eventually, every major SaaS application will have real-time data connectors that simplify the sharing of data with similar applications across departmental, cloud or organization lines.

The design is also open-ended: “Hybrid” deployments can take advantage of the simplicity of connection to a SaaS system, such as a CRM provider like Salesforce, while also connecting (internally or through a partner’s implementation) to in-house applications (see Figure 2). This flexibility supports custom development of mission-critical applications without giving up the advantages of simple data connectivity to existing systems.

(For more on fully modeled solutions and their deployments, see below.)

Figure 2: A “hybrid” deployment showing connections through a partner’s implementation to in-house applications

File-Based Solutions

The next step toward custom development is file-based sharing. This pattern shares with application-based sharing the advantage of not requiring the construction of a data model: The data model is essentially just a file system shared among the various parties. File-based approaches are more flexible than pure application-based solutions, however, because they can leverage legacy formats. Many existing cross-company data-sharing solutions are based on files, and a file-based sharing approach is a simple way to maintain compatibility while simultaneously progressing toward a modern data-sharing solution for real-time data needs. Figure 3 illustrates migrating from an sFTP-based “file depot” solution to a real-time data-sharing pattern based on files while preserving existing file formats and application-processing logic.

Figure 3: Migration from an sFTP-based “file depot” solution to a real-time data-sharing pattern based on files

As with the application-based approach described above, access controls are critical: Each party needs to define, for the files it authors, which other parties should receive the data. In addition, files can be large, and best-of-breed platforms will actually distinguish between sharing the data and copying the data. This additional dimension of control allows the members of a data-sharing arrangement, whether they’re two regional deployments in an application, multiple organizations within a single company or multiple companies with a shared workload (such as a supply chain) to decide how many copies of a file are warranted. Copying controls allow parties to balance the cost of making copies with the operational isolation that “having your own copy” naturally affords.

Real-time data mesh offerings also provide versioning, lineage (who changed what and when), built-in auditing, logging and reporting capabilities. These are essential for governing file-sharing systems over time and at scale; otherwise, the sheer weight of building appropriate compliance and security reporting can overwhelm already taxed teams. The more parties involved and the more “arm’s length” they are from each other, the more critical fine-grained access controls (and commensurate reporting, versioning and auditing capabilities) become. Legacy blockchains and “walled garden” ERP and EAI solutions typically fail at this level of complexity, because they don’t easily provide simple file-sharing capabilities, coupled to production-grade security and versioning controls.

The best file-sharing platforms also provide backward compatibility with existing public cloud blob storage APIs. This compatibility enables existing investments in popular cloud service APIs, such as AWS’s S3, to be preserved intact while still offering seamless data sharing across organizations and with other clouds. Having cloud-based portability for files built in means that file-sharing solutions can also be used in-house to create multiregion, multi-account and multicloud strategies with just a few lines of configuration code, rather than the months or years of planning and development usually mandated for a complex “cross-cloud” data-sharing platform.

Going Further

File-sharing solutions are easily extended to incrementally incorporate additional fine-grained data modeling. This optional process can proceed in graduated steps:

  1. Attaching simple key/value metadata to files (no real data model, but it allows for incorporating fine-grained “scalar” data).
  2. Selectively adding a data schema in parallel with the file data.
  3. Migrating file-based formats to scalar formats, often using the foundation laid in Step 2.

Even for teams that want to adopt fully modeled solutions (see below), file-based approaches can be an easy on-ramp, as they often permit existing application workloads and file formats to remain unchanged in the initial stages of adopting a real-time data mesh framework.

Fully Modeled Data Solutions

The “holy grail” of real-time data sharing is a fine-grained data model capable of automatically powering secure, scalable public APIs. While this approach requires having in hand a data model (also known as a data schema) acceptable to all the parties involved, from there the platform can take over: Modern platform approaches such as Vendia’s can generate APIs automatically, using nothing more than the data model itself. This includes not just sharing current data, but also versioning (“time travel” access to older versions of the data) and lineage/auditing (access to information about “who did what and when,” which is needed to create compliant end-to-end solutions that third parties can successfully audit). Figure 4 illustrates a fully modeled, fine-grained data-sharing architecture among multiple parties.

Figure 4: A fully modeled, fine-grained data sharing architecture among multiple parties.

As discussed above, sharing data is only half the battle: Just as it’s important to get data swiftly from one party to another, it’s important to ensure that only the right data is shared. Access controls, governance mechanisms and fully auditable tracing of these settings are key requirements, not just for enterprises but for any company operating in an environment where accidental sharing of personal data makes headlines. Fine-grained data models also provide a natural framework on which to “hang” metadata such as access controls, indexing requirements, and other operational and security annotations, allowing the platform to compile them automatically into a complete, SaaS-delivered solution.
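As an illustration, a shared data model might be expressed in a standards-based format such as JSON Schema (a hypothetical example, not necessarily Vendia’s actual modeling syntax):

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "SharedInvoice",
  "type": "object",
  "properties": {
    "invoiceId": { "type": "string" },
    "amount": { "type": "number" },
    "status": { "type": "string", "enum": ["ISSUED", "PAID", "DISPUTED"] }
  },
  "required": ["invoiceId", "amount"]
}

From a model like this, the platform can derive APIs, versioning and access-control hooks without hand-written integration code.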

Real-time data mesh solutions don’t make challenges like authorization or authentication harder, but they do emphasize the inherent heterogeneity and security challenges associated with connecting clients that may vary dramatically from party to party. For example, one party might ingest data from a public cloud service and require a cloud native identity and access-control solution, while another party may have elected to distribute shared data to a mobile app running on millions of handheld devices. A successful platform needs to embrace, rather than bypass, these differences by supporting a variety of authentication and authorization mechanisms that can be customized on a per-party basis. As important as a shared-data and governance model is, allowing and supporting required differences among parties is equally critical.

Going Further — Schema Evolution

Business relationships are constantly changing. Business needs, and the data that powers them, are constantly evolving. So to be successful, a real-time data mesh needs to model the “sharing topology” as a first-class element and make it both simple and safe to evolve the data model over time to match the needs of the business.

Successful real-time data meshes incorporate both of these features: The parties sharing data, whether they represent multiple companies, different organizations within a single company, different cloud vendors, multiple SaaS applications, regional deployments or any combination thereof, need to be easy to capture and represent using configuration, rather than complex code or tooling. The data model itself needs to be represented in a standards-based format, not a proprietary representation that could lead to a “walled garden” problem down the road, and it must be possible to augment or alter the model in controlled ways over time. By generating APIs and other infrastructure automatically from the data model, the platform can also guarantee backward compatibility for clients, ensuring that as the data model evolves, applications and other parties aren’t left broken and unable to continue sharing data effectively.

Once a deployment strategy has been elected, how can an IT organization perform an effective vendor selection process? The next article provides a methodology for vendor consideration that incorporates the requirements exposed by these design strategies to assist in locating a best-of-breed platform.

Vendia and Real-Time Data Meshes

Looking to learn more about real-time data meshes or their integration with analytical data solutions? The Vendia blog has a number of articles, including how these features surface in modern applications and get exposed through data-aware APIs.

Next Up

In Part 3 of this series, we provide a vendor checklist that focuses on what’s needed to effectively evaluate real-time data-sharing solutions.

The post The Real-Time Data Mesh and Its Place in Modern IT Stacks appeared first on The New Stack.

]]>
3 Consul Service Mesh Myths Busted https://thenewstack.io/3-consul-service-mesh-myths-busted/ Mon, 23 May 2022 18:06:58 +0000 https://thenewstack.io/?p=22671982


Van Phan
Van is a technical product marketing manager for Consul at HashiCorp. He has been in the infrastructure space for most of his career and loves learning about new technologies and getting his hands dirty. When not staring at this computer screen, he's sharing pictures of food to his wife's dismay. He lives in San Jose, California, with his wife and two young boys.

Most infrastructure engineers have a good idea what Terraform does, and those who care about security likely know about HashiCorp Vault, but what about HashiCorp Consul?

Some engineers see it as a service discovery solution. Others recognize it as a service mesh. And some might know Consul only from its earliest use case — as a key-value (KV) store.

Yes, Consul does all those things. But since bursting on the scene as a popular open source networking tool back in 2014, it has grown into a much more comprehensive networking platform.

So let’s take a look at three Consul capabilities you may have misconceptions about or not be taking full advantage of.

Consul Bolsters Zero Trust Networking

Ashher Syed
Ashher is a product marketing leader at HashiCorp and is based in Austin, Texas. When he's not running after his four kids, he's exploring the possibilities of what cloud-based technologies can bring to modernize organizations.

Security professionals want to apply zero trust principles across their whole infrastructure, especially throughout their network. Using traditional firewall-based security to protect the network perimeter or control access between internal networks, VPCs, and VNETs is an increasingly insufficient approach.

Traditional security approaches are based on IP addresses as the unit of access control between machines and services across different networks. But authorizing (allowing or denying) communication between individual services based on IP addresses requires a significant amount of effort at scale, especially in a modern world where machines and IP addresses are ephemeral.

Consul can relieve these pain points by offloading traffic between network services to Consul’s service mesh. The key is the service identity that is applied to every service registered with Consul. Here, the service identity replaces the IP address as the unit of control against which all future policy enforcement is done.

Consul service mesh service-to-service communication flow.

Administrators can allow or deny communication between services based on the service name rather than IP addresses. This makes it much easier to manage communication between services, particularly across microservice environments where the service names are constant while the IP addresses are dynamic. For example, administrators can impose a secure-by-default “deny all” policy, which is the first step toward zero trust networking. After starting with secure defaults, you can build up service-to-service authorization policies in line with the requirements of your applications and threat models.
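On Kubernetes, such intentions can be expressed declaratively. Here is a minimal sketch using Consul's ServiceIntentions custom resource; the web and api service names are hypothetical:

```yaml
# Deny all service-to-service traffic by default (a zero trust baseline).
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: deny-all
spec:
  destination:
    name: "*"        # applies to every destination service
  sources:
    - name: "*"      # from every source service
      action: deny
---
# Then explicitly allow only the paths the application needs.
# The "web" and "api" service names are illustrative.
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: web-to-api
spec:
  destination:
    name: api
  sources:
    - name: web
      action: allow
```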

Once authorized, communication between services is authenticated and encrypted with mTLS. TLS certificates are automatically generated by Consul and tied directly to each service’s identity. The full authentication process and certificate exchange is handled by Envoy, Consul’s sidecar proxy, and will also ensure traffic is encrypted over the network. Put it all together, and Consul relieves developers from having to add separate authentication and encryption code to their services, which ultimately increases productivity. If it’s built into the developer workflow and platform well, the developers will barely even notice it’s there.

Consul Thrives on Kubernetes

Another misconception is that Consul is mainly targeted at virtual machine environments. In reality, it supports multiple runtimes, spanning virtual machines and other non-containerized workloads, Kubernetes, Amazon ECS and HashiCorp Nomad, and it works across multiple clouds. It's the perfect service mesh for heterogeneous environments.

Consul has powerful features for Kubernetes. Features like Admin Partitions and Transparent Proxy were driven by Kubernetes requirements. And more recently, several new features were developed specifically to optimize the user experience for Consul on Kubernetes. For instance, Consul 1.11 introduced the Consul K8s CLI tool, which was intended to simplify the installation and life cycle management experience of Consul on Kubernetes without the requirement of Helm charts or kubectl. Helm is still fully supported, but now users have the option to use a different approach.

Consul 1.11 also provided a tighter integration between Consul on Kubernetes and Vault. This made it easier to use Vault to automatically generate, store and manage TLS certificates on both Consul’s control plane and data plane. This was part of a broader initiative for closer integration between Consul on Kubernetes and Vault to store all secrets.

This initiative continued with the 1.12 release, which allows Consul to automatically rotate TLS certificates on both the control plane and data plane without any downtime. Administrators no longer have to manually rotate certificates and risk outages due to human error. It also improves security, since certificates can be rotated more often without burdening the administrator.

Consul storing Kubernetes secrets per cluster vs centralized in Vault.

Lastly, the 1.12 release gave users the option to store secrets like ACL tokens and license keys on Vault rather than as Kubernetes secrets. This also improves security since Vault encrypts all secrets by default and provides many additional auditing and reporting features for centralized secrets management and governance.
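In Helm values terms, opting into the Vault secrets backend looks roughly like the following sketch; the field names follow HashiCorp's documentation from that period, and the Vault role names are illustrative:

```yaml
# values.yaml (sketch): use Vault, rather than Kubernetes secrets,
# as the secrets backend for Consul on Kubernetes.
# Field names per HashiCorp's docs of the era; role names illustrative.
global:
  secretsBackend:
    vault:
      enabled: true
      consulServerRole: consul-server   # Vault role for Consul servers
      consulClientRole: consul-client   # Vault role for Consul clients
```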

Consul Is Easier Than You Think

Have you ever been to a casino or watched a movie where people are cheering at a craps table? If you don’t know the rules, craps can seem confusing and quite intimidating because there’s so much going on. However, craps is actually composed of many simple wagers, and players can choose to bet only on the parts that interest them.

Similarly, if you’re new to Consul, it can be easy to get overwhelmed by its many options and capabilities. But you don’t need to use every Consul capability. Every customer has a different use case, from service discovery to securing network services to automating network devices, and they don’t need to take advantage of everything right away.

The four pillars of networking Consul focuses on.

So where do you start? Well, whatever your use case, the first thing to do after deploying Consul is to register your services. Once your services are registered, they have an identity that can be used for service discovery or service mesh capabilities. Consul has matured over the years to become simple to deploy, manage and use, and most of Consul's capabilities are transparent to the application developer. In fact, it is almost effortless with Consul on Kubernetes to automatically register your services. Just set the connectInject.enabled and connectInject.default parameters to true in your Consul Helm values file. This configures Consul to automatically inject an Envoy sidecar proxy into every service in your Kubernetes cluster. In addition, Transparent Proxy is enabled by default and ensures traffic between services is automatically redirected through the Envoy proxy. This frees developers from having to manually edit their application's Kubernetes manifest to account for any upstream services.
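As a minimal sketch, the relevant portion of the values file might look like this (assuming an otherwise default installation):

```yaml
# values.yaml (minimal sketch): automatic sidecar injection for every
# pod in the cluster; Transparent Proxy is on by default in this mode.
connectInject:
  enabled: true   # run the mutating webhook that injects Envoy sidecars
  default: true   # opt all pods in without per-pod annotations
```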

Want something even easier? If you don’t want to bother installing and managing Consul servers, HCP Consul is a fully managed service that takes care of all that for you. The HCP Consul servers are installed in a dedicated, single-tenant cloud infrastructure environment called HashiCorp Virtual Network (HVN). Just install the Consul clients in your environment and join them to the Consul server cluster. We make it easy to connect the HCP Consul servers from our HVN to clients in your own organization’s VPC. All connections within Consul’s service mesh are secured and encrypted by default without any additional management overhead. All the best practices to update server software and ensure server resiliency are managed for you by HashiCorp expert engineers. Want to get started? Check out this HCP Consul tutorial.

Put it all together, and you can see that HashiCorp Consul does a lot more — and does it more easily — than you may have thought. Consul helps you implement zero trust security where all the application traffic is authorized, authenticated, and encrypted. It provides service identity-based security across all platforms, runtimes, and clouds. Consul on Kubernetes has native integrations with Vault and provides a specialized CLI for simplifying Kubernetes workflows. Overall, the project’s goal continues to be giving users the ability to run a number of modern networking capabilities for any application across any infrastructure.

The post 3 Consul Service Mesh Myths Busted appeared first on The New Stack.

]]>
Envoy Gateway Offers to Standardize Kubernetes Ingress https://thenewstack.io/envoy-gateway-offers-to-standardize-kubernetes-ingress/ Tue, 17 May 2022 02:22:48 +0000 https://thenewstack.io/?p=22671464

The Envoy Proxy project is expanding, with the aim of establishing a standardized, simplified set of APIs for working with

The post Envoy Gateway Offers to Standardize Kubernetes Ingress appeared first on The New Stack.

]]>

The Envoy Proxy project is expanding, with the aim of establishing a standardized, simplified set of APIs for working with Kubernetes itself.

This week, at KubeCon+CloudNativeCon EU, the open source project revealed that it has been working on an extension, Envoy Gateway, that would equip the Envoy reverse proxy to be a network gateway, allowing it not only to direct internal microservices traffic, but also to manage external traffic coming into the network. Kubernetes is the initial target.

The idea behind the Envoy Gateway is to provide “a simplified deployment model and API layer aimed at lighter use cases,” explained Envoy creator Matt Klein in a blog post.

Originally created for Lyft, Envoy was released as open source in 2016 and has been primarily used to build service meshes (often with Istio, another CNCF project) that help cloud native apps communicate with one another by way of sidecars.

Interestingly, Lyft itself first used the software as an API gateway/edge proxy, meaning it could easily serve as a reverse proxy for internally routing outside traffic as well, offering the same level of observability and zero trust security.

The Envoy Proxy project was adopted by the Cloud Native Computing Foundation in 2017 and has reached the Graduated project maturity level. Envoy Gateway will be hosted by the CNCF as well, as a spin-off project.

Kubernetes Gateway API

According to Klein, the API Gateway can be thought of “as a wrapper around the Envoy Proxy core,” one that would not make any significant changes to the core itself. It can manage L4/L7 traffic in a variety of use cases.

For admins, the software aims to provide an easier way to set up Kubernetes service mesh. Envoy itself is not so easy to administer, Klein himself admits. A bit of ease-of-use might help attract more users, namely those with smaller networks who now tend to deploy HAProxy or Nginx for Kubernetes ingress duties instead.

Perhaps more importantly, the project would like to see developers and third-party tool vendors settle on using the Envoy Gateway to access Kubernetes, by providing a reference implementation to run Envoy as an ingress controller for a K8s cluster. This API will be “the Kubernetes Gateway API with some Envoy-specific extensions,” Klein explained.

“This API was chosen because deployment on Kubernetes as an ingress controller is the initial focus for the project and because the API has broad industry buy-in,” he said.
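As a rough illustration of the Gateway API resources in question, a Gateway plus an HTTPRoute might look like the following; the gateway class name, API version and service names here are assumptions, not taken from the project:

```yaml
# A Gateway asking an Envoy-based implementation to listen on port 80.
apiVersion: gateway.networking.k8s.io/v1beta1   # version evolves over time
kind: Gateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: envoy-gateway   # illustrative class name
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
# An HTTPRoute binding traffic from that Gateway to a backend Service.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: example-route
spec:
  parentRefs:
    - name: example-gateway
  rules:
    - backendRefs:
        - name: example-service   # illustrative backend Service
          port: 8080
```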

Use of the Kubernetes Gateway for configuration would decouple “routing from management in its API,” further explained Ambassador Labs CEO Richard Li, in a blog post. Ambassador is a sponsor of the project, along with Fidelity, Tetrate, and VMware. The API would hide the low-level configurations that developers and admins would otherwise have to tangle with.

The project will “focus on making the experience incredibly simple and easy for individual application devs or teams to take and deploy Envoy in their infrastructure,” tweeted Zack Butcher, who is an Istio contributor and steering committee member.

Butcher also noted that converging on the Kubernetes Gateway API would reduce the amount of duplicate work now being undertaken to build competing control planes.

“Control planes are boring! They’re also expensive to build and hard to get right,” Butcher noted.

If the Kubernetes user base were to all agree to deploy these APIs, it would allow their vendors “to easily provide alternate SaaS implementations that may be preferable if a user outgrows the reference implementation, wants additional support and features, etc,” Klein argued.

Third-party software developers can then move up the stack, and work on building advanced features atop the standardized, vendor-neutral, Kubernetes Gateway API.

This is not the first CNCF project to tackle API gateways. But the best bits of Contour and Ambassador’s Emissary will be merged into this new effort, CNCF said.

The post Envoy Gateway Offers to Standardize Kubernetes Ingress appeared first on The New Stack.

]]>
Buoyant Wants to Make Linkerd Easier to Use and Manage https://thenewstack.io/buoyant-wants-to-make-linkerd-easier-to-use-and-manage/ Fri, 06 May 2022 14:07:20 +0000 https://thenewstack.io/?p=22670194

Service mesh by definition is supposed to help reduce the complexity associated with Kubernetes. Linkerd, often championed by smaller organizations

The post Buoyant Wants to Make Linkerd Easier to Use and Manage appeared first on The New Stack.

]]>

Service mesh by definition is supposed to help reduce the complexity associated with Kubernetes. Linkerd, often championed by smaller organizations as the service mesh that is simpler to deploy and manage than other open source alternatives, could become even easier to use with what Buoyant is touting as “fully managed” Linkerd on Buoyant Cloud.

Automating Linkerd

The idea, Linkerd creator William Morgan, CEO of Buoyant, told The New Stack, is that Buoyant Cloud will now be able to automate Linkerd upgrades, installations, rollbacks, data-plane version syncs and more. Morgan’s philosophy for Linkerd also carries over from the early days of Twitter, when Morgan’s fellow Twitter engineers sought a way to simplify scaling the platform to accommodate hundreds of millions of users, the effort that led to Linkerd’s creation.

“The service mesh is widely recognized as a critical component of any successful Kubernetes platform, but the space remains notorious for its complexity. In this crowded landscape, Linkerd stands apart for its laser-like focus on simplicity and minimizing operational burden,” Morgan said. “Buoyant Cloud’s new full management features take this focus to the next level, allowing adopters of the CNCF’s flagship service mesh to treat it as a true utility. Buoyant Cloud’s fully managed experience for Linkerd eliminates maintenance tasks like upgrades, monitoring and alerting, allowing organizations to achieve far-reaching goals like enabling zero-trust traffic security for K8s clusters with near-zero maintenance cost and without needing to be a service mesh expert.”

Prior to this release, Linkerd adopters were able to take advantage of the operational simplicity of Linkerd, but they still needed to perform maintenance tasks such as upgrades, which “in the fast-paced ecosystem of Kubernetes, was a non-trivial burden,” Morgan said.

“While Buoyant Cloud has offered proactive monitoring of Linkerd clusters since its inception, allowing adopters to sleep soundly at night, the new release provides automation of tasks such as upgrades, trust anchor rotations and more, allowing SRE teams to ‘hand over’ ownership of Linkerd to Buoyant Cloud and remove service mesh management and maintenance from their plate entirely,” Morgan said.

Ultimate Control

Buoyant Cloud also simplifies monitoring of Linkerd deployments, from the control plane to the data plane, but things can still go wrong, Morgan said.

“Linkerd sits at the intersection of many sources of failure, including the network, the cluster, and, of course, your application itself. But we’re going to do this in a way that is predictable and that gives you ultimate control. For example, keeping your data plane in sync with your control plane requires — thanks to Kubernetes’s immutable pods — restarting your workloads,” Morgan wrote in a blog post. “Buoyant Cloud will do that on only the workloads you allow, and in the exact way you specify, without surprises. Upgrading your control plane will happen only when you are ready, and if anything goes tragically wrong, the existing control plane will not be removed. And so on.”

For those organizations that may just be getting their feet wet with Kubernetes, or may only want to start with a sandbox project for now, Buoyant Cloud’s Linkerd could be appropriate thanks to its low barrier to entry.

“Teams that want the power of Linkerd but don’t want to invest in the maintenance and operational burden of managing it themselves can easily use Buoyant Cloud’s new ‘fully managed’ feature set, on any cluster and any cloud,” Morgan said.

The post Buoyant Wants to Make Linkerd Easier to Use and Manage appeared first on The New Stack.

]]>
Implementing a Secure Service Mesh https://thenewstack.io/implementing-a-secure-service-mesh/ Thu, 28 Apr 2022 09:00:41 +0000 https://thenewstack.io/?p=22490606

I’m currently working toward integrating all of our workloads under what is known as a service mesh. A service mesh

The post Implementing a Secure Service Mesh appeared first on The New Stack.

]]>

Jonathan Kelley
Jonathan has been a technologist for 14 years, with a focus on DevOps for half of that. He’s currently a site reliability engineer at LogDNA, where he contributes his expertise about Linux, Kubernetes, networking and cloud infrastructure.

I’m currently working toward integrating all of our workloads under what is known as a service mesh.

A service mesh is a network layer that sits between every pod in all clusters. We can use the mesh and its associated tools to enroll a series of pods into discretely defined and secure network data planes.

For the purpose of this blog post, I’ll be talking about Kuma. Kuma is an open source solution built on top of Envoy that acts as a control plane for microservices and service mesh. It works with Kubernetes and virtual machines (VMs), and it can support multiple meshes in one cluster.

There are other open source and managed service mesh options out there, like Istio, Linkerd and Kong Mesh.

Why Use a Service Mesh

One of our main goals with using a service mesh was to get Mutual Transport Layer Security (mTLS) between internal pod services for security. However, using a service mesh provides many other benefits because it allows workloads to talk between multiple Kubernetes clusters or run 100% bare-metal apps connected to Kubernetes. It offers tracing, logging around connections between pods, and it can output connection endpoint health metrics to Prometheus.

This diagram shows what a workload might look like before implementing a service mesh. In the example on the left, teams are spending time building pipes instead of building products or services, common functionality is duplicated across services, there are inconsistent security and observability practices, and there are black-box implementations with no visibility.

On the right, after implementing a service mesh, the same team can focus on building products and services. They’re able to build efficient distributed architectures that are ready to scale, observability is consistent across multiple platforms, and it’s easier to enforce security and compliance best practices.

How the Kuma Service Mesh Architecture Works

The magic of taking an application pod’s socket communications from plain text over to mTLS lies within the Kuma control plane, the sidecar and Kuma Container Network Interface (CNI). When a developer merges some change, adding new services to an application, Kuma transparently detects and injects the required bits to proxy traffic automatically across its own network data plane.

Kuma service mesh has three major components:

  • Kuma CNI: A CNI plugin that identifies user application pods with sidecars, based on annotations, to set up traffic redirection. It sets this up in the pod lifecycle’s network setup phase when every pod is scheduled in Kubernetes through a process called mutating webhooks.
  • Kuma-sidecar: This runs on every instance exposing a service. The services delegate all the connectivity and observability concerns to an out-of-process runtime that will be on the execution path of every request. It will proxy all the outgoing connections and accept all the incoming ones. And, of course, it will execute traffic policies at runtime, like routing or logging. By using this approach, developers don’t have to worry about encrypted connectivity and can focus entirely on their services and applications. It’s called sidecar proxy because it’s another container running alongside a service process on the same pod. There is going to be one instance of a sidecar proxy for each running instance of services, and because all the incoming and outgoing requests, and their data, always go through the sidecar proxy, this is also called the Kuma data plane (DP), since it sits on the network data path.
  • Kuma control plane (kuma-cp): This is a distributed executable written in Go that can run on Kubernetes, issue data plane certificates and coordinate data plane (DP) state within a Kubernetes API. You can use Kuma Custom Resource Definitions (CRDs) to configure Kuma settings and policy, and the sidecars pick up changes automatically from the control plane (see the mTLS sketch after this list).
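To make that configuration model concrete, here is a minimal sketch of a Mesh resource that enables mesh-wide mTLS using Kuma's builtin certificate authority; it follows the shape of Kuma's documented examples, and the backend name is illustrative:

```yaml
# Enable mTLS for every data plane proxy in the "default" mesh.
apiVersion: kuma.io/v1alpha1
kind: Mesh
metadata:
  name: default
spec:
  mtls:
    enabledBackend: ca-1     # which CA backend to use
    backends:
      - name: ca-1           # illustrative name
        type: builtin        # kuma-cp issues and rotates the certificates
```

Once a resource like this is applied, kuma-cp issues the certificates and the sidecars enforce mTLS without application changes.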

Conclusion

The service mesh topology of today is much like what the enterprise service bus (ESB) architecture of the 1990s and 2000s was going for. Instead of directing broker traffic along a route based on business policies as ESB architectures did, with the mesh, you now freely connect your application and the mesh manages the routes and policies from the top.

In my opinion, the biggest reason ESB architecture wasn’t more prevalent in the industry was the monolithic codebase it required and the dependency management problems you often encountered. You would have dozens of projects sharing dependencies for managing objects on the ESB, and it became a software management headache.

Service mesh technologies ease the pain by staying decoupled from your code. They allow developers to shift the complexity of security, reliability and observability away from their application stacks and keep it solely part of the infrastructure equation.

The post Implementing a Secure Service Mesh appeared first on The New Stack.

]]>
Istio Applies to Join CNCF: Why Now? https://thenewstack.io/istio-applies-to-join-cncf-why-now/ Tue, 26 Apr 2022 20:17:08 +0000 https://thenewstack.io/?p=22668882

The Istio Steering Committee’s decision to offer the service mesh project as an incubating project with the raises the question: why

The post Istio Applies to Join CNCF: Why Now? appeared first on The New Stack.

]]>

The Istio Steering Committee’s decision to offer the service mesh project as an incubating project with the Cloud Native Computing Foundation (CNCF) raises the question: why has it taken so long?

The move follows concerns by IBM — one of the original creators with Google and ride-sharing provider Lyft — and other community members over the project’s governance, specifically Google’s advocacy of the creation of the Open Usage Commons (OUC) for the project in 2020. However, the context has changed today, an Istio steering committee member noted on GitHub.

Right Timing

The Istio steering committee implied this week that the timing is right. The move is intended to help “deepen” Istio’s integration with Kubernetes through the Gateway API and gRPC with proxyless mesh, “not to mention Envoy, which has grown up beside Istio,” according to an Istio statement released on GitHub by Istio steering committee member Craig Box, who leads the Cloud Native advocacy team at Google Cloud. “We think it’s time to unite the premier Cloud Native stack under a single umbrella,” the statement reads.

However, Istio’s application to join CNCF followed criticism in 2020 over Google’s creation of the Open Usage Commons (OUC) license for Istio and Google’s ownership of the associated trademarks. IBM deemed the OUC licensing scheme “disappointing because it doesn’t live up to the community’s expectation for open governance,” Jason McGee, then general manager and CTO of IBM Cloud Platform, wrote in a blog post in 2020.

“An open governance process is the underpinning of many successful projects. Without this vendor-neutral approach to project governance, there will be friction within the community of Kubernetes-related projects. At the project’s inception, there was an agreement that the project would be contributed to the CNCF when it was mature,” McGee wrote. “IBM continues to believe that the best way to manage key open source projects such as Istio is with true open governance, under the auspices of a reputable organization with a level playing field for all contributors, transparency for users, and vendor-neutral management of the license and trademarks. Google should reconsider their original commitment and bring Istio to the CNCF.”

Relinquishment of the trademarks by Google was required in order for the Istio project to achieve its long-term objectives, Todd Moore, vice president, open technology, IBM, told The New Stack in an emailed response.

“Long ago, IBM realized the power of communities that are openly governed and projects that are secured in neutral homes are the ones to gain momentum and spawn markets. While the Istio project governance made great strides, the project was not destined to reach the broad adoption that would be secured by a long-term neutral home,”  Moore said. “Single-vendor control over the trademark and licensing is a deterrent to broad adoption as end users and industry players are aware of the pitfalls.”

Meanwhile, the parties at Google who were reluctant to surrender trademarks “are no longer there,”  Moore noted. “This freed sensible heads to prevail. At the start, it was a toss up on who would register the trademark and IBM took Google at good faith that our agreement to take the project to the CNCF would be honored,”  Moore said. “This turned out to not be the case, but that has been put right.”

A Google spokesperson countered in an emailed response: “We’ve been waiting for the right time of Istio’s lifecycle to donate, and now is simply the right time in terms of its maturation. Google approached the OUC and asked them to donate the trademark to the Linux Foundation. The OUC agreed to do so, so as part of the contribution, the trademark will be transferred.”

Initial Reluctance

Yesterday, Istio’s steering committee said the OUC license will remain in effect. However, the trademarks will move to The Linux Foundation but continue to be managed under OUC’s trademark guidelines. 

According to industry sources, certain Google parties were reluctant to surrender the ownership of Istio’s trademarks. This is because, as Torsten Volk, an analyst for Enterprise Management Associates (EMA), told The New Stack, Google “has invested a lot of staff hours into Istio and regards service mesh as a critical entry point into the enterprise market.”

“Controlling the ‘strings’ that hold together distributed applications would be a great position for any vendor to be in, but Google was certainly aware of what happened to Docker when they overplayed their hand, paving the way for Kubernetes,” Volk said. “Point being, Google needed to take this step in order for VMware, Cisco, IBM, Red Hat and friends to stay committed to Istio, instead of eventually starting to shop around.”

While Istio is retaining the OUC license, the act of moving the associated trademarks to The Linux Foundation, and especially, the decision to apply to become a CNCF project, seems to have appeased IBM — at least somewhat.

IBM Reaction

IBM wrote in a post yesterday: “IBM fully believes in open governance and the power of community. Therefore, we enthusiastically applaud today’s submission of Istio to the Cloud Native Computing Foundation (CNCF).”

However, IBM was not more specific. The about-face, according to Volk, can be accounted for by “lots of friction around this topic in the past and Google still hanging on to the OUC license model instead of simply adopting a traditional open source license without trademark protection.” 

“This is a tricky topic for all parties involved, as Istio integration requires each vendor to make significant investments and nobody wants to explain to their board why their company was contributing to Google’s shareholder value,” Volk said.

More Support and Governance

Meanwhile, Google has made over half of all contributions to Istio and two-thirds of the commits, according to CNCF DevStats, Chen Goldberg, vice president of engineering for Google, noted in a blog post. Google also became Envoy’s largest contributor after adopting Envoy for Istio.

“Istio is the last major component of organizations’ Kubernetes ecosystem to sit outside of the CNCF, and its APIs are well-aligned to Kubernetes. On the heels of our recent donation of Knative to the CNCF, acceptance of Istio will complete our cloud-native stack under the auspices of the foundation, and bring Istio closer to the Kubernetes project,” Goldberg wrote. “Joining the CNCF also makes it easier for contributors and customers to demonstrate support and governance in line with the standards of other critical cloud-native projects, and we are excited to help support the growth and adoption of the project as a result.” 

Istio’s joining CNCF is only good news for Solo.io, the leading provider of tools for Istio. The CNCF’s support will, of course, only make Istio more robust, which should translate into performance benefits for users of Solo.io’s Gloo Mesh and other Istio-based products. 

“We bet on Istio five years ago … But we did believe that Istio is the best service mesh even when it wasn’t in the CNCF. But before people were a little bit confused about why Istio was not in the CNCF and were even a little bit worried,” Idit Levine, founder and CEO of Solo.io, told The New Stack. “Now I think that Istio joining the CNCF will make Istio exactly like Kubernetes, as the de facto service mesh.”

Service mesh is defined in the book “Istio in Action,” by Christian E. Posta, vice president, global field CTO for Solo.io, and Rinor Maloku, field engineer for Solo.io, as a relatively recent term “used to describe a decentralized application-networking infrastructure that allows applications to be secure, resilient, observable and controllable,” Posta and Maloku write. “Service mesh, in this way, describes an architecture consisting of a data plane that uses application-layer proxies to manage networking traffic on behalf of an application and a control plane to manage proxies.” This architecture “lets us build important application-networking capabilities outside of the application without relying on a particular programming language or framework,” Posta and Maloku write.

“Istio is an open source implementation of a service mesh. It was created initially by folks at Lyft, Google, and IBM, but now it has a vibrant, open, diverse community that includes individuals from Lyft, Red Hat, VMware, Solo.io, Aspen Mesh, Salesforce and many others,” Posta and Maloku write. “Istio allows us to build reliable, secure, cloud-native systems and solve difficult problems like security, policy management and observability in most cases with no application code changes.”

The post Istio Applies to Join CNCF: Why Now? appeared first on The New Stack.

]]>
The Rush to Fix the Kubernetes Failover Problem https://thenewstack.io/the-rush-to-fix-the-kubernetes-failover-problem/ Thu, 31 Mar 2022 15:32:26 +0000 https://thenewstack.io/?p=22112933

Services and clusters will certainly fail on Kubernetes, and all too often, the unfortunate SRE or operations person will get

The post The Rush to Fix the Kubernetes Failover Problem appeared first on The New Stack.

]]>

Services and clusters will certainly fail on Kubernetes, and all too often, the unfortunate SRE or operations person will get that call in the middle of the night to manually fix it. While Kubernetes does indeed offer a failover mechanism, it is not automated in such a way that, in the event of a cluster or a service failure, the services are instantly transferred to a replica cluster configuration where they resume functionality.

A new automated failover functionality for Linkerd gives Linkerd the ability to automatically redirect all traffic from a failing or inaccessible service to one or more replicas of that service — including replicas on other clusters, Buoyant’s Alejandro Pedraza, a senior software developer, wrote in a blog post. “As you’d expect, any redirected traffic maintains all of Linkerd’s guarantees of security, reliability, and transparency to the application, even across clusters boundaries separated by the open internet,” Pedraza said.

Other leading service mesh providers offer similar fixes for Kubernetes’ failover shortcomings; Istio and HashiCorp provide their own (more about that below).

Sigh of Relief

For Linkerd users, this failover functionality should prompt a sigh of relief among operations teams working in Kubernetes environments. This is because it prevents operations teams “from having to scramble to fix Kubernetes clusters in the middle of the night, simply by automatically rerouting application traffic without any need for code changes or reconfiguration,” Torsten Volk, an analyst at Enterprise Management Associates (EMA), told The New Stack.

With Linkerd’s new automated failover feature, cluster operators can configure failover at the service level in a way that’s fully automated and also transparent to the application, Linkerd co-creator William Morgan, who is also CEO of Buoyant, told The New Stack. This means that if a component fails, all traffic to that component will be automatically routed to a replica, “without the application being aware,” Morgan said.

“If that replica is in a different cluster in a different region or even a different cloud, Linkerd’s mutual TLS implementation means that the traffic remains fully secured even if it is now traversing the open Internet,” Morgan said. “This is something Linkerd users have been asking for a long time and we’re happy to deliver it to them today.”

In Istio’s case, Istio has supported the automation of failovers for Kubernetes “for a while,” Christian Posta, vice president, global field CTO, for Solo.io, told The New Stack, adding “we automate away all of the config” with Solo.io Gloo Mesh.

“It largely stems from locality and priority-aware load balancing that Envoy has,” Posta said.

The locality failover sequence with Istio.

HashiCorp has also implemented the automation of a failover functionality for some time, which is described in its documentation.

The push to automate the failover functionality of Kubernetes supports the original concept of policy-driven application placement, Volk said. In this way, “DevOps teams no longer have to exactly define a specific application environment based on application requirements, but instead, developers can declare app requirements within the application code that the service mesh then matches,” Volk said.

Simple Concept

The main issue is that Kubernetes does not provide automated failover functionality in the event of a failure. When services and clusters fail on Kubernetes, “DevOps teams must typically make changes to the application code to change traffic routing in a manner that is specific to the underlying cloud infrastructure,” Volk said. “This means you would need to write different code for routing workloads to or between clusters on AWS, Azure, Google Cloud or other specific platforms.”

Indeed, the concept of failover is simple, Morgan said. “If a component breaks, send all traffic destined to that component to a replica that’s somewhere else, usually in another cluster. One of the biggest challenges for DevOps teams who want to use failover to improve the resilience of their applications is simply the fact that Kubernetes itself doesn’t provide any automation around this,” Morgan said. “So you can deploy replicas of application components across regions and zones, but failing over between them is left up to you. Worse, if you want to be able to failover individual services, the application somehow needs to understand how to send traffic to different replicas in the event of failure. That conflates application concerns with platform concerns and leads to maintenance problems.”

The new failover feature in Linkerd is built on top of existing Kubernetes and Linkerd features, like health probes and SMI TrafficSplits, and introduces a minimum of new machinery and configuration surface area, Morgan said. “This is the same design principle that has made Linkerd the simplest service mesh to operate, by a wide margin,” Morgan said. “It’s part of our commitment to our users: Kubernetes is complicated enough; your service mesh doesn’t have to be.”
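As an illustration, Buoyant's failover extension drives failover through ordinary SMI TrafficSplit resources. The following sketch follows the general shape of the project's examples; the service names are hypothetical, and the exact labels and annotations may differ by version:

```yaml
# Sketch: the failover operator watches TrafficSplits carrying its label
# and shifts weight to a replica when the primary's pods become unready.
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: sample-svc
  labels:
    failover.linkerd.io/controlled-by: linkerd-failover  # may vary by version
  annotations:
    failover.linkerd.io/primary-service: sample-svc      # preferred backend
spec:
  service: sample-svc
  backends:
    - service: sample-svc          # local primary
      weight: 1
    - service: sample-svc-remote   # replica, e.g. mirrored from another cluster
      weight: 0                    # raised automatically on failure
```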

The post The Rush to Fix the Kubernetes Failover Problem appeared first on The New Stack.

]]>
Solo.io Intros Gloo Mesh Enterprise 2.0 https://thenewstack.io/solo-io-intros-gloo-mesh-enterprise-2-0/ Fri, 11 Mar 2022 15:26:01 +0000 https://thenewstack.io/?p=21824559

Solo.io this week introduced Gloo Mesh Enterprise 2.0, the latest version of its Istio service mesh and control plane that

The post Solo.io Intros Gloo Mesh Enterprise 2.0 appeared first on The New Stack.

]]>

Solo.io this week introduced Gloo Mesh Enterprise 2.0, the latest version of its Istio service mesh and control plane that simplifies service-to-service connectivity in distributed applications.

Key enhancements in the new release include multitenant workspaces, a new unified API for east-west and north-south traffic management, a new UI for observability, and improved VM support.

In addition, the application networking company announced the beta release of GraphQL for Gloo Edge, which embeds a GraphQL server natively into Gloo Edge — Solo.io’s API gateway and ingress controller that enables federated GraphQL queries of APIs.

Solo.io made these announcements at its second annual SoloCon digital user conference this week.

“The enhancements are a natural evolution of our APIs that continue to simplify the adoption, use, and management of Istio over single-cluster and multicluster environments,” said Idit Levine, founder and CEO of Solo.io, in a statement, noting that Solo.io’s customers are using Istio in some of the largest, most complex and most diverse implementation environments imaginable — many running on multiple platforms with thousands of APIs and hundreds of millions of requests per day.

Customer-Driven Features

Levine added that the new capabilities in Gloo Mesh Enterprise 2.0 are largely customer-driven based on feedback from hundreds of customers that Solo.io keeps in touch with via Slack.

“We have all our customers basically on Slack, each of them has their own channel, and we have a very good relationship with them and are very responsive to them,” Levine told The New Stack. “However, there are hundreds of them. This is a different model than regular companies. This is like, really, really crazy scale.”

“One of the key challenges that we faced was when we were assessing service mesh providers and had relied upon support that simply never materialized,” said Tom Howard, cloud networking lead at Invitae, in a statement. “We found what we were looking for with Solo.io and Gloo Mesh. The responsiveness of the support, the focus on building these partnerships with customers and clients — we see it as an interactive partnership to move forward to enhance the platform.”

Introduced last year, Gloo Mesh Enterprise is an Istio-based Kubernetes-native solution for multicluster and multimesh service mesh management.

New features in 2.0 such as multitenant workspaces enable users to set fine-grained access control and editing permissions based on roles for shared infrastructure, enabling teams to collaborate in large environments. Users can manage traffic, establish workspace dependencies, define cluster namespaces, and control destinations directly in the UI. And the policies can be re-used and adapted using labels.

Gloo Mesh Enterprise 2.0 also features a new Gloo Mesh API for Istio management that enables developers to configure rules and policies for both north-south traffic and east-west traffic from a single, unified API. The new API also simplifies the process of expanding from a single cluster to dozens or hundreds of clusters. And the new Gloo Mesh UI for observability provides service topology graphs that highlight network traffic, latency and speeds while automatically saving the new state when you move clusters or nodes.

Moreover, the newly improved VM support reduces the time and the amount of code required to add VMs to Istio service mesh environments, the company said.

“As users of the Gloo Enterprise offering, we are looking forward to having the complexity of an Istio service mesh abstracted in one control plane combined with our current Gloo Edge gateway,” said Gert-Jan Groeninckx, a platform engineer at Waylay, in a statement. “Having Gloo Mesh added to the Enterprise offering will make adoption of Istio and its advantages on multicluster routing, observability, and managing the whole system very interesting.”

GraphQL for Gloo Edge

Meanwhile, Solo.io also announced beta availability of GraphQL for Gloo Edge. Solo.io has embedded a GraphQL server natively into Gloo Edge, which enables federated GraphQL queries of customers’ APIs using their service mesh and API gateways with no additional GraphQL infrastructure required. GraphQL is an open source data query language for APIs that is used to simplify client-server interactions.

“Basically, we teach Envoy to become a GraphQL server,” Levine said.

“Ory is an open source company building out a planet-scale network infrastructure focused on customer zero trust,” said Thomas Curran, co-founder of Ory, in a statement. “We’re very interested in GraphQL due to our whole permissioning infrastructure, and we are excited Solo.io is adding GraphQL to Gloo to enable developers’ APIs as GraphQL queries.”

GraphQL provides declarative data fetching, with a client specifying what data it needs from an API and underlying services through a single endpoint.

GraphQL for Gloo Edge delivers:

  • A GraphQL server, resolvers, a GraphQL schema generator, and schema stitching for GraphQL are all driven by declarative configuration.
  • The ability to store GraphQL schemas as code, observe usage, and publish information, as well as manage schema and server changes, saving time and toil.
  • Integrated monitoring capability that provides access to key metrics for GraphQL requests, query execution, and upstream resolvers to monitor performance and debug issues. Users can export metrics with Prometheus and Grafana.

“Solo.io’s GraphQL support for Gloo Edge replaces cumbersome resolver code with a simple configuration,” said Matt Jones, senior director of architecture at the Fitch Group.

New eBPF Acceleration

Meanwhile, Solo.io recently introduced BumbleBee, a new eBPF (extended Berkeley Packet Filter) open source project that simplifies the development, packaging, and sharing of eBPF tools. The company is now incorporating eBPF into Gloo Mesh Enterprise to complement Istio and further enhance performance.

“We’re leveraging eBPF to get better latency and throughput of the service mesh,” Levine said. “Basically what we’re doing is we’re using eBPF to shortcut the HTTP stack.”

The eBPF tooling includes a Container Network Interface (CNI) extension and the ability to extend the CNI with network policy written in eBPF without requiring users to replace the existing CNI, the company said.

New Hire

And in other related news from the SoloCon event, Solo.io announced the hiring of Brian Gracely as its new vice president of product strategy. Gracely comes to Solo.io from Red Hat, where he was senior director of product strategy overseeing the company’s open hybrid cloud offerings, including OpenShift. He brings more than 20 years of industry experience in areas including open source software, containers, Kubernetes, cloud computing, data center virtualization, serverless and more, the company said.

The post Solo.io Intros Gloo Mesh Enterprise 2.0 appeared first on The New Stack.

]]>
Is Linkerd Winning the Service Mesh Race? https://thenewstack.io/is-linkerd-winning-the-service-mesh-race/ Tue, 08 Mar 2022 17:19:44 +0000 https://thenewstack.io/?p=21712337

Linkerd appears to be taking the lead market share-wise in the service mesh race as organizations increase their adoption of

The post Is Linkerd Winning the Service Mesh Race? appeared first on The New Stack.

]]>

Linkerd appears to be taking the lead market share-wise in the service mesh race as organizations increase their adoption of Kubernetes and realize they can’t do it without a proper control plane mesh. According to a recent CNCF survey, for example, Linkerd has surged ahead of Istio’s adoption in the three major geographic zones of North America, Europe and Asia.

Independent of the CNCF survey, analyst firm Enterprise Management Associates (EMA) noted a “strong momentum of the Linkerd platform,” Torsten Volk, an analyst at EMA, said. “Its growing user base is raving about its simplicity and its targeted addition of more enterprise-grade features.”

Simplicity Is Key

Simplicity of adoption and use accounts for Linkerd’s very strong growth in adoption, Linkerd creator William Morgan, CEO of Buoyant, said. “Istio is notorious for its complexity, and I think that burned a lot of community goodwill. By contrast, Linkerd has been focused on simplicity since day one — especially operational simplicity, which is very hard to achieve. We tried very hard not to oversell the project,” Morgan said. “This simplicity means that not only do we get adopters because simpler is better, but also as you suggest because a low cost of adoption means that Linkerd is usable in more situations. Today, we see production usage of Linkerd on three-node clusters, on massive global deployments with tens of thousands of pods, and on everything in between.”

However, a deeper analysis shows how the service mesh sector may not necessarily be a one-horse race led by Linkerd despite its rapid adoption. “I am not sure there is enough evidence to determine Linkerd is taking a perceived lead in the service mesh race,” Erik Frieberg, chief marketing officer for Solo.io, said.

Smaller Targets?

Linkerd is geared for smaller as well as larger deployments. In that respect, just by volumes alone, Linkerd adoption is surging. Within this context, the other leading open source alternatives, such as Istio, are often perceived as being geared for different types of applications, particularly for large-scale deployments for Kubernetes management.

“Linkerd is focused on making service mesh accessible to a wider audience of organizations without service mesh experience. The product appeals to product teams that need a service mesh to quickly solve daily challenges while Istio targets large enterprises with experienced cloud native infrastructure teams that want to create one unified solution for most of the enterprise, from the top down,” Volk said. “As a result, there could be a larger number of smaller enterprises trying Linkerd at the team level, while Istio might have much larger overall penetration of each individual enterprise.”

Based on his experience implementing service mesh, Steve Gray, founder of internet publishing company ZeroFlucs and a Linkerd ambassador, said Linkerd is relatively simple to implement and manage. His observation dates from his time as head of trading solutions at the Australian division of sports betting and gaming group Entain, where he supported Linkerd’s deployment while Entain was also evaluating Istio.

“I’d agree that Istio is only suited for larger deployments due to the weight it brings to the table — it’s just not worth doing at small scale. However it’s not correct to layer the opposite supposition over the top of Linkerd and say it can’t play in those spaces,” Gray said. “Linkerd excels in both worlds: it’s both simple to get started, but also painless to scale. I’ve used it on everything from pet projects right through to backbone systems that power multibillion-dollar companies’ real-time platforms, with thousands of pods, processing petabytes a month in a high-chaos environment.”

A service mesh geared for large-scale deployments and multi-cluster management does not necessarily have to be complex to implement and manage, either. “Being complex, or having a long time-to-value is not the hallmark of a requirement for being suited for large-scale deployments. Ultimately, the two meshes target exactly the same workloads — you can use Istio or Linkerd at either end of the scale spectrum — and the question becomes whether or not the immense weight of complexity can be carried by the minuscule feature footprint differential for real-world use cases,” Gray said. “For higher-performance, latency-sensitive applications — the type of applications that Linkerd is best at in benchmarks — those milliseconds matter the most to larger deployments.”

But Istio Adoption Will Still Grow

Despite Linkerd’s rapid gains in adoption and its perceived good fit for smaller organizations and deployments, Istio continues to make adoption headway, especially among large organizations. Solo.io is “working with some of the largest deployments of Istio around the world who see features, extensibility, VM support and multicluster as foundational to their architectures,” Frieberg said. “We continue our explosive growth, after seeing 500% growth in the last year,” Frieberg said. “We see the Istio service mesh market as robust and continuing to significantly expand.”

Linkerd’s position in the marketplace versus open source Istio can also be described as that of the nimble contender capitalizing on Istio’s well-known challenge of having to be “everything to everyone” and therefore struggling to offer a similar level of accessibility for organizations that are new to service mesh, Volk said. “Istio was conceived to provide absolute flexibility for advanced DevOps teams to connect, secure, observe, and test microservices on Kubernetes and VMs and has only recently started to lower its barrier of entry,” Volk said. “At the same time, Linkerd has added advanced features such as multicluster support, policies, and adhering to required container startup sequences. These new capabilities appeal to Istio’s original audience, making the service mesh race a lot more interesting today.”

The service mesh market is also growing overall, and the CNCF’s adoption numbers include sandbox projects and the like. So, in the CNCF study, a lot of firms could have been reporting adopting Linkerd for the first time after they had already adopted Istio, for example. In that sense, Linkerd can be perceived as an “up and comer” while Istio is growing as well. Istio’s relatively slower growth could be accounted for (in theory) by how it has been in use for a while and is particularly geared for large-scale deployments among fewer organizations — hence larger yet fewer deployments.

However, Volk did not necessarily agree. Linkerd is also “on its way to becoming a full-blown enterprise solution, while at the same time Istio is working to appeal to a larger audience with less service-mesh experience,” Volk said.

“The actual race starts now, but we can definitely see that Linkerd is currently the contender that shows some significant momentum,” Volk said. “In the recent past, we have seen more and more how products with the most bottom-up momentum beat out larger enterprise solutions.”

The post Is Linkerd Winning the Service Mesh Race? appeared first on The New Stack.

]]>
HashiCorp’s API Gateway for Consul Helps Ease North-South Traffic Pain https://thenewstack.io/hashicorps-api-gateway-for-consul-helps-ease-north-south-traffic-pain/ Mon, 07 Mar 2022 15:32:37 +0000 https://thenewstack.io/?p=21691680

HashiCorp’s general availability release of its Consul API Gateway is intended to both simplify and tighten control of traffic and

The post HashiCorp’s API Gateway for Consul Helps Ease North-South Traffic Pain appeared first on The New Stack.

]]>

HashiCorp’s general availability release of its Consul API Gateway is intended to both simplify and tighten control of traffic and services management with HashiCorp Consul service mesh.

The main feature that DevOps teams may welcome helps to solve the problem of being able to manage both north-south and east-west traffic through a single API.

Indeed, previously, the HashiCorp Consul service mesh was effective at managing intra-organizational microservices connections and communications between applications. However, a separate control plane had to be in place to control access from external requests from ingress traffic originating from the Internet.

As Peter McCarron, senior product marketing manager for Consul, told The New Stack, the ingress gateway was better suited for intra-datacenter connections vs. external client traffic. “The question was: how do I control the access point in a way that is scalable and more dynamic than using some of the traditional solutions?” McCarron said. “Now, you have a centralized point to control the traffic in a more intelligent way with the service mesh.”

Dedicated Ingress

In this way, the Consul API Gateway serves as a dedicated ingress for intelligently routing traffic to applications running on the HashiCorp Consul service mesh, Brad Casemore, an analyst for IDC, told The New Stack. The gateway is designed to provide a consistent means of handling inbound requests to the service mesh from an external client, “thus eliminating the need to install another dedicated API Gateway/ingress,” Casemore said. By deploying an all-in-one solution and service, DevOps teams aren’t “forced to obtain, deploy, and manage an API Gateway/ingress controller separately from a service mesh,” Casemore said.

Service meshes in general are great at automating and securing communication between services in an east-west fashion, while API gateways are better at securing and regulating north-south traffic between internal services and external clients, Casemore said. Consul API Gateway can thus be thought of as an extension of Consul service mesh. “While the two are configured independently, they use the same servers to communicate policies, validate and receive certificates, and retrieve service catalog data,” Casemore said.

API Gateway for Consul helps “to alleviate potential service mesh angst, whereby some customers feel that service infrastructure is overly complex,” Casemore said.

Operations teams especially are always looking for ways to both tighten access control and simplify the complexity of managing applications and traffic through a single console, which, in fact, is one of the main purposes of a service mesh in general. “Organizations are putting a great deal of effort behind maturing their service mesh capabilities but as they do, they are looking to manage access to their service mesh in a consistent and intelligent way,” McCarron said. “They are also forced to deal with added complexity around managing traffic patterns, controlling connections at the point of entry and identifying external client traffic.”

Additional Features

Other features the Consul API Gateway provides that Casemore described include:

  • Traffic routing capabilities that enable users to configure how external clients’ requests are handled and detect client metadata and use it to enforce connection paths/routes.
  • Deployment into the same environment as the mesh, with the gateway registered as a service with the Consul servers. “Configurations are done directly on the gateway, but Consul service mesh is aware of the policies and connection requirements stipulated by the gateway,” Casemore said.

The Consul Helm chart also supports the installation and configuration of the Consul API Gateway. To get started, users need to add a values block along the following lines (a minimal sketch based on HashiCorp’s documentation at the time; the image tag is illustrative):
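```yaml
# values.yaml (sketch): enable the Consul API Gateway controller via the
# consul-k8s Helm chart; field names per HashiCorp's docs of the era,
# and the image tag is illustrative.
apiGateway:
  enabled: true
  image: hashicorp/consul-api-gateway:0.1.0
```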

“With Consul API Gateway, organizations are able to adopt a more consistent and controlled approach for exposing service mesh-based services to external clients, reducing the risk of a potential breach; increasing operational efficiency and reducing complexity by creating a single control plane for managing all traffic,” Casemore said. “It also reduces the risk of unplanned outages by ensuring that requests are evenly distributed across available service instances.”

The post HashiCorp’s API Gateway for Consul Helps Ease North-South Traffic Pain appeared first on The New Stack.

]]>
SoloCon: Explore Service Mesh, API Gateways, GraphQL, eBPF https://thenewstack.io/solocon-explore-service-mesh-api-gateways-graphql-ebpf/ Mon, 28 Feb 2022 15:51:03 +0000 https://thenewstack.io/?p=21537323

At Solo.io, we have been working on API gateway and service mesh technologies for years. Today, we have the pleasure

The post SoloCon: Explore Service Mesh, API Gateways, GraphQL, eBPF appeared first on The New Stack.

]]>

Erik Frieberg
Erik is the chief marketing officer at Solo.io. He has a long history in technology marketing and was previously CMO at Puppet and senior vice president of marketing at VMware.

At Solo.io, we have been working on API gateway and service mesh technologies for years. Today, we have the pleasure of working with incredibly innovative companies around the globe. Some companies are just beginning their journey with microservices in general and Kubernetes in particular. Others are some of the largest Istio service mesh implementations in the world with hundreds of clusters and hundreds of millions of transactions each day.

Interacting with all these companies has given us a unique perspective on the challenges organizations face and the capabilities and guidance they need to manage application networking in distributed environments. At Solo, we take this input and use it to not only guide our product roadmap but also to develop and deliver education and training to the community.

This year at SoloCon, our annual user conference, our goal is to bring together industry experts, innovative companies and award-winning education. This three-day event is divided into four key areas:

  • Community and open source: If you’re interested in the future of service mesh, how Istio is evolving, the role of GraphQL and WebAssembly, the impact of eBPF and more, we have you covered.
  • Service mesh and application networking: Dive deep into performance, reliability, security, observability, multitenancy, declarative configuration and GitOps, just to name a few topics.
  • Edge and API gateway: Learn the ins and outs of configuring, managing, and securing an Envoy-based or Istio-based API gateway. Advanced topics include integrating into a GitOps workflow and canary deployments with Flagger.
  • Hands-on, engineer-focused workshops: We have four new workshops covering Envoy Proxy as a GraphQL Engine, fundamentals for eBPF, multitenant Istio service mesh and advanced Istio Day 2 operations.

Insights from Industry Leaders

At SoloCon, we are excited to bring together incredible thought leaders, including Solomon Hykes, founder of Docker; Louis Ryan, co-creator of Istio; and our own Idit Levine, to share their unique insights on the past, present and future of containers, service mesh, application networking and distributed applications.

Companies Using These Technologies in Production

This year’s SoloCon will feature our own customers sharing their stories about the challenges they face and the solutions we have delivered. Learn how:

  • T-Mobile is delivering a common edge gateway that works across all platforms and removes resiliency, security and observability management from its development teams’ responsibilities.
  • Chick-fil-A is innovating faster and improving its architecture and operations across distributed applications to support operations in 2,800 restaurants.
  • Constant Contact is modernizing its microservice architecture, rethinking how it designs microservices, exposes APIs and integrates new and old services.
  • USAA is developing and running modern applications to delight customers and deliver outstanding services.

You will also find great stories from Snyk, CARFAX, Ory, Invitae, Schneider Electric, Waylay and others that are driving innovation in their organizations.

We Invite You to Join Us at SoloCon

The new developments around the technologies in our space are exciting and innovative, and we look forward to sharing the latest updates with you at the event. That is the goal of SoloCon: a place where we, our customers and our community share our experiences and learnings. We’re excited to bring this virtual conference back, with all its stories, updates and opportunities to learn. We hope to see you there!

The post SoloCon: Explore Service Mesh, API Gateways, GraphQL, eBPF appeared first on The New Stack.

]]>
Be Like Switzerland: 6 New Service Mesh Standards https://thenewstack.io/be-like-switzerland-6-new-service-mesh-standards/ Thu, 11 Nov 2021 18:00:52 +0000 https://thenewstack.io/?p=19533148

For the past few years, engineers working in the rapidly emerging realm of service mesh have rolled their eyes at

The post Be Like Switzerland: 6 New Service Mesh Standards appeared first on The New Stack.

]]>

Matthew Yacobucci
Matthew “Booch” Yacobucci is a principal software engineer at F5, where he focuses on Kubernetes technologies, including NGINX Ingress Controller and NGINX Service Mesh. He’s experienced in the distributed computing and networking industries with skills in Linux, Go, container orchestration systems, concurrency patterns and lunch.

For the past few years, engineers working in the rapidly emerging realm of service mesh have rolled their eyes at media references to the “service mesh wars.” While that sounds like hyperbole, a battle does exist. For better or worse, service meshes aren’t diplomatic. Just getting two different service meshes to talk to each other remains a nightmare. While the Service Mesh Interface (SMI) has pushed the industry toward the open standards required for true interoperability, much work remains to drive those standards to the point where service meshes can become a universal communication and management layer.

A particular case in point: Workload data remains poorly standardized, which makes it difficult to build standardized management tools for service meshes. Different meshes also hold conflicting views on observability and telemetry. Those disparate views mean that getting the same observability plane set up on different meshes requires some serious tuning.

So, how do we end this battle and help service meshes reach interoperability? The key is neutrality. With a core group of neutral standards, service meshes can stop fighting and instead act like Switzerland, a country known for neutrality. Becoming the promised land of fondue and chocolates isn’t impossible. Going beyond SMI, this wish list can standardize service meshes, not by defining the letter of each standard, but by describing how those standards should behave and deliver in practice.


1. Fast Installation Standard

A significant barrier to service mesh adoption is fear of installation. Understandably so. Service meshes can be devilishly hard to install and deploy. To address this, as a design principle, we can institute a “fast installation” standard and even benchmark installation time on top of a standard Kubernetes cluster.

Installation time is also a good indication of how well a service mesh can handle complexity. It demonstrates the ability to deliver a good user experience despite complex activities under the hood. Better yet, it demonstrates intelligent prioritization by limiting the mesh to core capabilities. That said, the general goal should be an “opt-in” experience with less complexity rather than an “opt-out” nightmare.

2. Fast Removal Standard

Fast, hitless removal, wherever possible, is the flip side of fast installation. Any hard-to-remove service mesh will reduce the likelihood of adoption and make it harder for application teams to trust the mesh for critical tasks. Rollback to running any application or service without a mesh, particularly internally, will be table stakes as teams design for Kubernetes environments that are fluid and afford them complete control.

Granted, fast-removal mileage may vary. For example, if an application team has created a mesh environment with numerous customized CRDs and advanced functionality, ripping out the mesh may take longer. Still, setting a standard and benchmark for removal is a tractable problem that should be easy to agree upon.

3. Core Observability Standard

If you can’t observe it, you can’t manage or understand it. Kubernetes and service meshes present some novel observability challenges because much of their focus has been on the networking layer rather than the application layer, where user-facing transactions occur.

OpenTracing was a fantastic start, and the community around it built a powerful vision to give all service meshes observability with a common API. Zipkin and Jaeger have their strengths as well, with Zipkin being an all-in-one tracing solution. Other projects, such as OpenCensus, attacked tracing and observability in different ways. However, having too many competing projects led to a lack of accepted standards for tracing context.

Fortunately, in an effort to unify observability into a single standard, the incompatible OpenTracing and OpenCensus projects began merging in the spring of 2019 to form OpenTelemetry. This was a huge step forward, combining tracing and supported language libraries into a broader vision of cloud native telemetry. OpenTelemetry also embraced W3C Trace Context as the standardized trace-propagation mechanism. Still, there is work to do. While the most popular coding languages are supported, dozens remain unsupported. In addition, not all observability backends support OpenTelemetry equally well. It’s a work in progress, but a very promising one.
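To make that unification concrete, here is a minimal sketch of an OpenTelemetry Collector pipeline that accepts OTLP traces once and fans them out to both Jaeger and Zipkin backends. The endpoints are placeholders, and exact field names vary somewhat across collector versions:

```yaml
receivers:
  otlp:
    protocols:
      grpc:          # accept OTLP over gRPC
      http:          # and over HTTP
processors:
  batch:             # batch spans before export
exporters:
  jaeger:
    endpoint: jaeger-collector:14250             # placeholder backend
    tls:
      insecure: true
  zipkin:
    endpoint: http://zipkin:9411/api/v2/spans    # placeholder backend
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger, zipkin]                # one ingest path, many backends
```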

4. Workload Management Standard

Knowing the requirements of a workload should affect the way a service mesh treats that workload. For instance, a workload that is a financial transaction should require encryption and mTLS for all spans of activity that contain account or personally identifiable data. There is no easy way to label workload types and set different standards for how those workloads should be treated in Kubernetes and service meshes. Currently, the closest thing is setting up rules and policies — such as retries, timeouts and terminations — for each service. While that might work in an environment with a handful of microservices, it quickly becomes more complicated as you add more microservices. Service meshes need a standardized method to label workload types while assigning requirements and rules to them. That way, workload management will be easier, automatic and precise at scale.
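No such standard exists today, so the following is a purely hypothetical sketch of what a workload-classification resource might look like: a class declares requirements and default policies once, and individual workloads opt in by label. Every name here, including the API group, is invented for illustration:

```yaml
# Hypothetical: no service mesh implements this resource today.
apiVersion: policy.mesh.example/v1alpha1
kind: WorkloadClass
metadata:
  name: financial-transaction
spec:
  requirements:
    mtls: strict          # require mTLS on every span of activity
    encryption: required  # encrypt payloads carrying account or PII data
  defaults:
    retries: 3
    timeout: 2s
# A pod template would then opt in with a single label:
#   metadata:
#     labels:
#       mesh.example/workload-class: financial-transaction
```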

5. Data Management Standard

Managing data in an ephemeral and ever-changing environment is challenging. In the early days, Kubernetes users avoided running critical databases in their clusters and linked service meshes to outside data stores to ensure that their data was safe and sound. Data requires special care due to regulations like GDPR, FIPS and CCPA, which cover how you handle data, how you make it accessible to customers and the data’s physical location. Data handling in Kubernetes and service meshes remains complex and largely an afterthought.

This is similar to the non-Kubernetes world, where most developers dump data for prototypes and early stage application projects into a simple SQL store, S3 bucket or MongoDB, delaying the data management decision-making process. In Kubernetes and service mesh, kicking that can down the road injects far more complexity as microservices break down data transactions into more discrete tasks and usages. This, in turn, requires more detailed management. Similar to workloads, data management standards would simplify life for operators and development teams alike by allowing them to define the data requirements for each service using a standard language or standardized structure.
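In the same hypothetical spirit, a data management standard could let each service declare its data requirements in a standardized structure that the mesh then enforces. Again, every name below is invented purely for illustration:

```yaml
# Hypothetical: a sketch of a per-service data-requirements declaration.
apiVersion: data.mesh.example/v1alpha1
kind: DataRequirements
metadata:
  name: customer-profile-service
spec:
  classifications: [pii]          # the service handles personally identifiable data
  regulations: [GDPR, CCPA]       # regimes the mesh must account for
  residency: eu-west              # physical-location constraint on storage
  retention: 30d                  # how long transaction data may be kept
  encryption:
    atRest: required
    inTransit: mtls
```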

6. Protocol Support Standard

Today, there is a wide variance in protocol support among service meshes, which complicates swapping one service mesh for another. Additionally, full agreement on which protocols should be supported and what that support should look like has yet to emerge. Arriving at such an agreement is crucial because protocols are often critical in application design. For example, some applications use gRPC for specific communications needs, but not all service meshes support full observability of gRPC processes. Thus, consistent protocol support with consistent support functionality is essential for mesh interoperability and portability.
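SMI’s traffic specs hint at what protocol-level agreement can look like: routes declare their protocol explicitly, so a conforming mesh knows exactly what support is expected. A minimal example, using SMI’s v1alpha4 API version (the resource names and port are placeholders):

```yaml
apiVersion: specs.smi-spec.io/v1alpha4
kind: HTTPRouteGroup          # HTTP (and gRPC-over-HTTP/2) traffic declared per route
metadata:
  name: payments-routes
spec:
  matches:
    - name: create-payment
      pathRegex: /payments
      methods: ["POST"]
---
apiVersion: specs.smi-spec.io/v1alpha4
kind: TCPRoute                # raw TCP traffic declared separately
metadata:
  name: ledger-tcp
spec:
  matches:
    ports:
      - 5432
```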

Switzerland Is Not That Far Away

In this technology skirmish, creating a neutral, Switzerland-like set of standards is both achievable and necessary. While it requires a lot of coordination, we know it’s doable, even at the broadest possible scale. Web browsers and the W3C have proven as much, creating evolving standards that deliver near-perfect interoperability across all browsers and the web applications running on top of them.

At F5, NGINX has laid foundations for these standards by adopting “simplicity” as one of our core tenets for NGINX Service Mesh. Simplicity trends towards efficiency, and we’re proud to support a fast and easy mesh. Installation and removal can take mere minutes. We’re also always looking to provide value with expanding protocol support, both down the stack with L4 needs like UDP, and up the stack with feature parity for gRPC and other L7+ protocols as they develop. Finally, workload and data classifications are an innovative goal we hope to realize for the future.

With these standards, neutrality can be found, and the fondue and chocolates are within reach. Integrating this base level of standardization, so that architects, developers, security and platform ops teams can mix and match meshes as needed, will be the best way to drive adoption and make service meshes just like browsers: ubiquitous, understood and trusted.

The post Be Like Switzerland: 6 New Service Mesh Standards appeared first on The New Stack.

]]>
Supercharge Service Mesh with eBPF and Cilium https://thenewstack.io/supercharge-service-mesh-with-ebpf-and-cilium/ Mon, 08 Nov 2021 14:00:48 +0000 https://thenewstack.io/?p=19476660

This post has been removed at the request of the sponsor, Solo.io. The New Stack post, “Supercharge Service Mesh with eBPF

The post Supercharge Service Mesh with eBPF and Cilium appeared first on The New Stack.

]]>

This post has been removed at the request of the sponsor, Solo.io.

The post Supercharge Service Mesh with eBPF and Cilium appeared first on The New Stack.

]]>