Managing Kubernetes Complexity in Multicloud Environments https://thenewstack.io/managing-kubernetes-complexity-in-multicloud-environments/ Thu, 15 Jun 2023 14:48:40 +0000

Kubernetes has become the ubiquitous choice of container orchestration platform for building and deploying cloud native applications. As enterprises adopt Kubernetes, one of the key decisions they have to make is whether to adopt a multicloud strategy. It’s essential to understand the factors driving the need for a solution that spans public cloud providers such as Amazon Web Services (AWS), Azure, GCP, Oracle and Alibaba, and to validate whether those factors are relevant now or will be in the future. Some factors that influence a multicloud strategy are:

  • Specialized cloud technology needs/requirements for particular applications
  • Multiple business units adopting separate clouds
  • GDPR and other locality considerations
  • Disaster recovery
  • Mergers and acquisitions of other businesses that have adopted different clouds
  • Dependency on a cloud-managed service

Specialized Cloud Technology Needs/Requirements for a Particular Application

Some applications require specialized cloud services only available on specific cloud platforms. For example, Google Cloud Bigtable is a NoSQL database available only on Google Cloud. Similarly, Azure offers specialized machine learning and AI services, such as Azure Cognitive Services.

In such scenarios, enterprises need to deploy their applications across multiple clouds to access the specialized services required for their applications. This approach can also help organizations optimize costs by choosing the most cost-effective cloud service for each application.

Multiple Business Units Adopting Separate Clouds

In large organizations, different business units may have unique requirements for their cloud services, leading to the adoption of separate cloud services. For example, one business unit may prefer Google Cloud for its machine learning capabilities, while another may prefer AWS for its breadth of services. As a result, the cloud environment becomes fragmented, and deploying applications across multiple clouds becomes complex.

GDPR and Other Locality Considerations

Regional regulations can also drive the need for a multicloud approach. Enterprises may need to store and process data in specific regions to comply with data residency regulations. Alibaba Cloud, for instance, is China’s leading cloud provider and often the preferred choice in that region.

Deploying applications across multiple clouds in different regions can help enterprises meet their data residency and compliance requirements.

Disaster Recovery

Implementing disaster recovery in the right manner is essential for enterprises, as downtime can lead to significant revenue loss and reputational damage. A multicloud approach can help enterprises ensure business continuity by deploying applications across multiple clouds. In such scenarios, primary applications can run in one cloud while secondary applications can run in another for disaster recovery.

This approach can also help enterprises optimize their costs by choosing the most cost-effective cloud service for disaster recovery.

Mergers and Acquisitions

When organizations merge, they may have different cloud environments that must be integrated. Similarly, when organizations acquire other companies, they may need to integrate the acquired company’s cloud environment with their existing cloud environment, hence the need for a multicloud approach.

Dependency on a Particular Cloud Service

Enterprises may need to deploy applications in a particular cloud because they depend on a managed service that only one cloud provider offers. For example, an organization may require managed Oracle for its databases or SAP HANA for its ERP systems. In such cases, the applications must be deployed in the same cloud to stay close to the database. Platform and site reliability engineering (SRE) teams must then acquire the skills to manage Kubernetes infrastructure on a new public cloud. Platform teams must thoroughly understand all their application teams’ requirements to see whether any applications fall into this category.

How to Manage Multicloud Kubernetes Operations with a Platform Approach

Enterprises may want to invest in a true Kubernetes operations platform if multicloud deployment is a critical requirement now or will be in the future. A true Kubernetes operations platform helps enterprises develop standardized automation across clouds while leveraging public cloud Kubernetes distributions such as AWS EKS, Azure AKS and Google GKE. On the other hand, deploying and managing Kubernetes infrastructure on multiple clouds without a Kubernetes operations platform requires a lot of manual effort and can lead to substantial operational costs, operational inconsistencies and project delays.

  • A Kubernetes operations platform can standardize the process for deploying and managing Kubernetes clusters across multiple clouds. Enterprises can use a unified interface to automate the deployment and management of Kubernetes clusters across multiple clouds. This automation helps improve consistency and reduce the risk of human error. It also reduces the need for specialized skills.
  • Enterprises also need to maintain a unified security posture across clouds. In a multicloud environment, each cloud provider has its own security policies, which makes it hard for enterprises to implement standard security policies across the clouds. A Kubernetes operations platform can provide consistent security policies across clouds, enforcing governance and compliance uniformly.
  • Consistent policy management and network security policies across clouds are critical for adopting multicloud Kubernetes deployments. A Kubernetes operations platform should provide standardized workflows for applying network security and Open Policy Agent (OPA) policies to Kubernetes clusters spanning clouds. Policies, including network policies and ingress and egress rules, can be defined in a centralized location and deployed to all Kubernetes clusters, ensuring consistency and reducing operational complexity (see the sketch after this list).
  • A true Kubernetes operations platform should provide a unified bimodal multitenancy (cluster and namespace) across clouds. This platform should allow multiple teams and applications to share the same Kubernetes clusters without affecting each other, providing better resource utilization and cost efficiency. Similarly, for teams, applications or environments that require dedicated clusters, the Kubernetes platform should offer cluster-as-a-service, where individual teams can create their own clusters in a self-serve manner while adhering to the security, governance and compliance requirements set by the platform and SRE teams.
  • Kubernetes access control, role-based access control (RBAC) and single sign-on (SSO) across all clouds are essential for a Kubernetes operations platform. However, access management becomes increasingly complex when deploying Kubernetes across multiple clouds. A unified access management solution can simplify the process and reduce the security risk.
  • Finally, a single pane of administration offering visibility for the entire infrastructure spanning multiple clouds is essential for a Kubernetes operations platform. A single management plane can provide centralized visibility into Kubernetes clusters across multiple clouds, allowing enterprises to monitor, manage and troubleshoot their Kubernetes clusters more efficiently.
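As an illustration of the kind of policy that benefits from central definition, below is a minimal Kubernetes NetworkPolicy that a platform team might store once and roll out to every cluster, whether it runs on EKS, AKS or GKE. This is a generic sketch; the policy and namespace names are placeholders, not part of any specific platform.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress   # example name
  namespace: payments          # placeholder namespace
spec:
  podSelector: {}              # selects every pod in the namespace
  policyTypes:
    - Ingress                  # no ingress rules listed, so all inbound traffic is denied

Applied uniformly through the operations platform, the same manifest enforces a deny-by-default ingress baseline on every cluster, regardless of the underlying cloud.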

Conclusion

A multicloud strategy may be an important consideration for enterprises that are adopting a Kubernetes operations platform for managing their Kubernetes infrastructure. Enterprises should carefully look at all factors that influence a multicloud deployment and decide whether multicloud is required for their organization. A true multicloud Kubernetes operations platform should provide standardized automation, consistent security policies, unified Kubernetes bimodal multitenancy, access management and a single administration pane, offering visibility for the entire infrastructure spanning multiple clouds.

Kubernetes Operators: The Real Reason Your Boss Is Smiling https://thenewstack.io/kubernetes-operators-the-real-reason-your-boss-is-smiling/ Wed, 14 Jun 2023 13:30:45 +0000

It’s no industry secret that the cloud native segment around Kubernetes has shifted toward hosted Kubernetes providers that build, run and partially manage the Kubernetes infrastructure for organizations. Compared with building and maintaining your own Kubernetes infrastructure, hosted Kubernetes providers let you offload a measurable amount of technical complexity so staff can focus on operations and innovation.

Along with the rise of hosted Kubernetes providers, more enterprises are favoring larger Kubernetes distributions from the likes of OpenShift, Rancher, Azure Kubernetes Service (AKS), Amazon Elastic Kubernetes Service (EKS) and others rather than building their own homegrown distribution from the upstream codebase.

These trends are not limited to the Kubernetes platform itself but showcase a general movement toward letting the providers of strong core platform layers do what they do best so companies can focus on the business value that comes from building on top of Kubernetes. The chant heard in 2017 was to “make Kubernetes boring,” and I think we are getting there as an ecosystem.

But that was six years ago. What does “boring” look like in 2023 and how do new trends like the rise of Kubernetes operators fit into this picture? There are three ways I think of this when evaluating modern Kubernetes deployments:

I want my organization to build value on top of Kubernetes.

Similar to the mantra of 2017, the “value” we mean here is everything that is built on top of Kubernetes and the infrastructure layers, which has seen substantial progress and evolution from the community over the past six years.

I want Kubernetes to be simple.

Every organization is unique, and roles within your organization may differ depending on not only size, but also Kubernetes maturity. Because of this, skill sets vary, and not everyone has the time or ambition to become an expert. Those who aren’t experts want Kubernetes to be easy so daily tasks aren’t intrusive.

I want Kubernetes to be scalable.

Deployment models for Kubernetes are expanding, and enterprises are taking advantage of using Kubernetes across on-premises, multicloud and hybrid cloud environments. Kubernetes needs to be flexible across these environments while also enabling cluster growth with streamlined scalability as the practice matures.

Building Value on Top of Kubernetes

Once the Kubernetes infrastructure layers are solid for your organization, it’s time to build the “value” on top, whether that is an application end users interact with or a platform layer that adds advanced data services such as observability. Developers need to start somewhere, and this usually consists of finding the right Kubernetes resources for the workload, such as creating deployments, services, jobs, statefulsets, daemonsets, persistent volumes, pod security policies, role-based access control (RBAC) rules, secrets, service accounts and much more.
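To ground that list, here is a minimal Deployment manifest of the sort a developer would assemble by hand; the names and image are hypothetical, and a real workload would typically also need a Service, RBAC rules, secrets and several of the other resources listed above.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api             # hypothetical service name
  labels:
    app: orders-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      containers:
        - name: orders-api
          image: registry.example.com/orders-api:1.4.2   # placeholder image
          ports:
            - containerPort: 8080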

Managing and tracking all these resources can get quite complicated, and it’s likely that your team doesn’t need to control all these objects directly, but it must pay attention to the resources that affect how applications run. There are cases where this hands-on development practice is necessary: If the application you are building is unique to your organization, the Kubernetes API resources at least keep you from having to start from scratch.

However, on the flip side, we see DevOps teams, developers and application owners turning to trusted, prebuilt Kubernetes operators to run, configure and manage common applications so they can focus on the value above these layers.

Operators: Bringing Together Value, Simplicity and Scalability

If you’re not familiar with what a Kubernetes operator is, then I suggest reading the documentation.

Switchboard operator

However, whenever I hear the term “operator,” my mind immediately jumps to a switchboard operator with a massive telephone network in front of them moving wires in and out at a rapid pace while transferring calls.

You may remember them from the pilot of the hit show “Mad Men” or recall the popular saying, “Operator, please hold.”

Much like the way a switchboard operator in the 20th century assisted in the routing and transfer of phone calls, a Kubernetes operator facilitates the deployment, management and ongoing operations of a Kubernetes application. Except instead of having a person move wires behind a telephone switchboard, think of it as a robot that listens to inputs and commands and outputs the Kubernetes resources into the appropriate namespaces.

It’s Like a Robot, but without the Attitude

Unlike the switchboard operator, the core tenet of a Kubernetes operator is automation. Automation is a necessity as the community forges ahead with Kubernetes, allowing end users to focus on what matters to them while relying on operators to automate deployments, operations and management of common components in their stack.

There is a community propensity to use trusted operators for applications and not reinvent the wheel when running a particular service on Kubernetes. Take the database landscape’s use of operators as an example.

As seen at KubeCon EU in Amsterdam, the operator pattern has a strong use case for databases because, in general, they are a common denominator across many application stacks. Applications may use Postgres or Redis in slightly different ways, but they are common services that need to be installed, configured and managed. Deploying databases on Kubernetes via a trusted, production-ready operator is a major win for time to value in DevOps development cycles.
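To make the operator pattern concrete, the sketch below shows what a custom resource for a hypothetical Postgres operator might look like. The API group, kind and fields are invented for illustration; every real operator defines its own schema through its CRDs. The point is that the developer declares desired state, and the operator reconciles the underlying StatefulSets, Services and backup jobs.

apiVersion: databases.example.com/v1     # hypothetical API group
kind: PostgresCluster                    # hypothetical kind
metadata:
  name: orders-db
spec:
  version: "15"
  replicas: 3                  # the operator manages the underlying StatefulSet
  storage:
    size: 100Gi
  backups:
    schedule: "0 2 * * *"      # the operator turns this into scheduled backup jobs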

It doesn’t stop at databases, though; operators can be used for almost anything, from monitoring and alerting software to storage integrations to fully customized applications delivered to internal customers.

It’s great to see the focus move northbound as the Kubernetes ecosystem matures. As end users and organizations are gravitating to hosted Kubernetes and application automation through operators, I’m excited to see the innovations that come next focus on what can be built on top of Kubernetes.

How Do We Use Operators?

Operator frameworks are extremely popular among Dell’s customers, and we are actively working to introduce deeper operator support for our Kubernetes storage capabilities, such as our container storage modules, as well as our container storage interface drivers, which are available on OperatorHub.io. Operators are also a key part of our future portfolio offerings and will be integrated into our upcoming user interface for Kubernetes data storage.

The benefits of using operators are straightforward: less time spent on manual processes, more time spent on coding and innovation. If you haven’t started with operators in your business yet, I highly suggest exploring the world of Kubernetes operators and seeing how to take advantage of automation to make your life a little easier.

Simple, scalable and adding value on top of Kubernetes.

The First Kubernetes Bill of Materials Standard Arrives https://thenewstack.io/the-first-kubernetes-bill-of-materials-standard-arrives/ Tue, 13 Jun 2023 17:48:51 +0000

If you’re not using a Software Bill of Materials (SBOM) yet, you will be soon. They’re seen as essential groundwork for building code security defenses. While there are many SBOM standards, such as Software Package Data Exchange (SPDX), CycloneDX and GitHub’s dependency submission format, there hasn’t been one just for the popular container orchestration program Kubernetes until now: Kubernetes Security Operations Center’s (KSOC) Kubernetes Bill of Materials (KBOM) standard.

At this early stage, KBOM is a rough first draft. It provides an initial specification in JavaScript Object Notation (JSON). It’s been shown to work with Kubernetes 1.19 and newer, with hyperscale cloud service providers and with do-it-yourself Kubernetes.

With the KBOM’s shell interface, cloud security teams can gain a comprehensive understanding of third-party tooling within their environment. This development is aimed at enabling quicker responses to the surge of new Kubernetes tooling vulnerabilities.

Is It Necessary?

Is there really a need for this, though, since there are already many SBOM standards? Since Kubernetes is used by over 96% of organizations to orchestrate container deployments, clearly there’s a deployment security gap here. After all, Kubernetes security adoption remains low, at 34% in 2022. A major barrier to securing Kubernetes is getting an accurate grasp of the environment’s scope.

As KSOC CTO Jimmy Mesta explained: “Kubernetes is orchestrating the applications of many of the biggest business brands we know and love. Adoption is no longer an excuse, and yet from a security perspective, we continually leave Kubernetes itself out of the conversation when it comes to standards and compliance guidelines, focusing only on the activity before application deployment.” Therefore, “We are releasing this KBOM standard as a first step to getting Kubernetes into the conversation when it comes to compliance guidelines.”

To meet these needs, KBOM offers a concise overview of a Kubernetes cluster’s elements. These include:

  • Workload count.
  • Cost and type of hosting service.
  • Vulnerabilities for both internal and hosted images.
  • Third-party customization, for example, the deployed custom resources, authentication, and service mesh.
  • Version details for the managed platform, the Kubelet, and more.
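To make the idea concrete, here is a rough sketch of the kind of information a KBOM might capture, written as YAML for readability. It is purely illustrative and is not KSOC’s actual specification, which is defined in JSON.

cluster:
  provider: eks                # type of hosting service
  version: "1.27"              # managed platform / Kubelet version details
workloads:
  count: 142                   # workload count
components:                    # third-party tooling discovered in the cluster
  - name: ingress-nginx
    version: 1.8.0
    vulnerabilities: 2         # vulnerabilities for internal and hosted images
customResources:               # third-party customization
  - cert-manager.io/Certificate
  - networking.istio.io/VirtualService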

Sound interesting? It should. To contribute, you can download the CLI tool today or learn more about the standard. You can also work on this Apache 2-licensed open source program via its GitHub page.

The Rise of the Cloud Native Cloud https://thenewstack.io/the-rise-of-the-cloud-native-cloud/ Mon, 12 Jun 2023 16:38:31 +0000

Kubefirst delivers instant GitOps platforms made from popular free and open source cloud native tools. We’ve supported the Amazon Web Services (AWS) cloud for years and love how well our platform runs on Elastic Kubernetes Service (EKS). We recently announced our expanded support for the new Civo cloud, a cloud native cloud that runs all of its infrastructure on Kubernetes. There are some pretty staggering differences between the two clouds, yet some things remain virtually identical, and it got me thinking about the wild journey of how we got here as an industry.

Remember the Original Public Clouds?

Remember when public clouds were new? Think back to computing just 10 years ago. In 2013, AWS was working to extend its stronghold on the new cloud computing space with the self-service infrastructure public cloud model of Elastic Compute Cloud (EC2). Google Cloud Platform and Microsoft Azure were just a couple of years removed from announcing their own offerings, further solidifying the architectural shift away from self-managed data centers.

Despite the higher compute cost of public cloud infrastructure compared to its on-premises equivalents, the overall time and money saved by leveraging repeatable on-demand cloud infrastructure prompted companies to begin tearing down their rack space and moving their infrastructure to the public clouds. The self-service model gave more power to the developer, fewer handoffs in the DevOps space and more autonomy to engineering teams. The public cloud era was here to stay.

The IaC Revolution

Although the days of sluggish infrastructure IT tickets were now a thing of the past, the potential of the cloud still remained untapped for many organizations. True to Tesler’s Law, the shift toward the public cloud hadn’t exactly removed system complexity — the complexity had just found a new home.

To tackle that complexity, we needed new automated ways to manage our infrastructure and the era of Infrastructure as Code (IaC) did its best to rise to this challenge. New technologies like CloudFormation, Ansible, Chef, Puppet and Terraform all did their best to step up to the need, but the infrastructure story from company to company was generally still a rather complex and bespoke operation.

The Container Revolution

Around the same time, another movement was sweeping through the application space: containerization. Largely Docker-based at the time, containerization was a new way to create a consistent application runtime environment, isolating the application from the infrastructure it runs on.

With containerization, we were suddenly able to run an app the same way on different operating systems or distributions, whether running on your laptop, on on-premises infrastructure or in the public cloud. This solved a lot of problems that companies suddenly had as their infrastructure requirements began to dramatically shift in new directions.

Organizations with classic monolithic applications began exploring how container-based microservices could be leveraged to optimize their software development and ease their scaling woes. As the containerized world evolved and teams started building containerized microfrontends making calls to containerized microbackends, the sprawl of micro products became a lot to manage. This was particularly felt in the management of applications, infrastructure, secrets and observability at scale.

The Orchestration Battle

With the motion to put applications into containers and the resulting explosion of microservices and containerized micro products came a new challenge: managing all of them.

HashiCorp Nomad, Docker Swarm and Google’s Kubernetes (now part of CNCF) swiftly found their way to the conference keynote stages.

Each had its distinct advantages, but Kubernetes rose to the top with its declarative design, operating system and cloud portability, in addition to an unprecedentedly strong user community. The YAML-based system made it easy to organize your desired state into simple files that represent everything an application needs to work. It could be run in any cloud, on your on-premises infrastructure or even on your laptop, and it boasts a bustling community of cloud native engineers who share a uniform vision for modern solutions.

To Kubernetes Goes the Spoils

Cloud native engineers were quick to identify that software running inside Kubernetes was much easier to manage than software running outside of it. Opinions were beginning to form that if your product didn’t have a chart for Helm (the Kubernetes package manager), it probably wasn’t very desirable to the cloud native engineers responsible for platform technology choices. After all, if you need to install complex third-party software, your choices are a Helm install command that takes seconds to run or pages upon pages of installation guides and error-prone instructions.
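In practice, the contrast looks something like this; the chart and release names below are only examples, and any published chart follows the same pattern.

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install my-postgres bitnami/postgresql --namespace databases --create-namespace

Two or three commands stand in for what would otherwise be a long, error-prone installation guide.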

Opportunistic software vendors were quick to pick up on this trend and feverishly began rearchitecting their systems to be installed by Helm and operated on Kubernetes. Delivering complex, multicomponent software packages with intricate microarchitectures that are still easily installable in any cloud environment has been the dream of software delivery teams forever, and it has finally become a reality with Kubernetes at the helm.

How Complex Does Your Cloud Need to Be?

We first built kubefirst to provision instant cloud native (Kubernetes) platforms in the world’s largest public cloud, AWS, and it runs very well there. The maturity of the AWS cloud is largely unparalleled. If you need to accommodate large swaths of Fortune 500 complexities, Federal Information Processing Standards (FIPS)-compliant endpoints from all angles, extreme scales with enormous data volume or some other nightmare of this type, choosing one of the “big 3” (AWS, Google Cloud or Microsoft Azure) is a pretty easy instinct to follow.

If you’re working in this type of environment, kubefirst is lightning-fast and can turn 12 months of platform building into a single 30-minute command (kubefirst aws create).

We still love the big clouds. However, when we asked our community what clouds we should expand into, we weren’t too surprised to find a clamoring of interest for a simpler cloud option focused on managed Kubernetes. The newer cloud providers like Civo, Vultr, DigitalOcean and others of this ilk boast blazing-fast cluster provisioning times with significantly reduced complexity. With fewer resources to manage than the cloud pioneers offer, they get you into that new cluster much faster.

Let’s break this down in terms of Terraform cloud resources, the code objects in your infrastructure as code. To create a new kubefirst instant platform from scratch in AWS, our kubefirst CLI needs to provision 95 AWS cloud resources. This includes everything — the VPC, subnets, key management service keys, state store buckets, backends, identity and access management (IAM) roles, policy bindings, security groups and the EKS cluster itself. Many of these resources are abstracted behind Terraform modules within the kubefirst platform, so the complexity of the cloud has been heavily reduced from the platform engineer’s perspective, but there’s still quite a bit of “cloud going on.” It’s also a very sophisticated and enterprise-ready setup if that’s what your organization requires.

But there’s a cost for this sophistication. To provision these 95 resources and get you into your cluster, you’ll have to wait about 25 fully automated minutes, and a lot of that is waiting on cluster provision time. It takes about 15 minutes to provision the master control plane and another 10 to provision and attach the node groups to it. If you need to destroy all these resources, it will take another 20 minutes of (automated) waiting.

But to have the same kubefirst platform in Civo, you only need to manage three Terraform resources instead of 95, and instead of the 45 minutes it takes to provision and destroy, you could do the same in about four minutes. When infrastructure is part of what you’re changing and testing, this is an enormously consequential detail for a platform team.

The Rise of Platform Engineering and the Cloud Native Cloud

Platform engineering is an emerging practice that allows organizations to modernize software delivery by establishing a platform team to build a self-service developer platform as their product. The practice requires that platform teams iterate regularly on the provisioning of infrastructure, cloud native application suites, application CI/CD, and Day 2 observability and monitoring. With entire software development ecosystems being provisioned over and over becoming the new normal, spending 45 minutes between iterations instead of four can be a costly detail for your platform team’s productivity.

If you fear that you will eventually need the complexities of “the big 3” clouds, that doesn’t mean that you need to borrow that cloud complexity today. Kubefirst is able to abstract the cloud from the platform so you can build your platform on kubefirst civo today and move it to kubefirst aws tomorrow with all of the same cloud native platform tools working in all the same ways.

The Kubefirst Platform on the Cloud Native Clouds

Kubefirst provisions instant, fully automated, open source cloud native platforms on AWS, Civo, Vultr (beta), DigitalOcean (beta) and on the localhost with k3d Kubernetes. Civo Cloud is offering a one-month $250 free credit so you can try our instant platform on its cloud for free.

To create a new Civo account, add a domain, configure the nameserver records at your domain registrar, then run kubefirst civo create (full instructions when using Civo with GitHub, and with GitLab).

Within a few minutes you’ll have:

  • A gitops repo added to your GitHub/GitLab that powers your new platform so you can add your favorite tools and extend the platform as you need.
  • A Civo cloud and Kubernetes cluster provisioned with and configured by Terraform IaC.
  • A GitOps registry of cloud native platform application configurations, preconfigured to work well with each other.
  • HashiCorp Vault secrets management with all the platform secrets preconfigured and bound to their respective tools.
  • A user management platform with single sign-on (SSO) for your admins and engineers and an OpenID Connect (OIDC) provider preconfigured to work with all of your platform tools.
  • An example React microservice with source code that demonstrates GitOps pipelines and delivery to your new Kubernetes development, staging and production environments.
  • An Argo Workflows library of templates that conduct GitOps CI and integrate the Kubernetes native CI with GitHub Actions/GitLab pipelines.
  • Atlantis to integrate any Terraform changes with your pull or merge request workflow so that infrastructure changes are automated and auditable to your team.
  • Self-hosted GitLab/GitHub runners to keep your workloads cost-free and unlimited in use.

And with kubefirst you can throw away your production cluster with the next iteration available just a couple of minutes later.

The rise of the cloud native cloud is here.

Deploy a Kubernetes Development Environment with Kind https://thenewstack.io/deploy-a-kubernetes-development-environment-with-kind/ Sat, 10 Jun 2023 14:00:45 +0000

Let me set the stage: You’re just starting your journey into Kubernetes and you’re thrilled at the idea of developing your first application or service. Your first step is to deploy a Kubernetes cluster so you can start building but almost immediately realize how challenging a task that is.

All you wanted to do was take those first steps into the world of container development but actually getting Kubernetes up and running in a decent amount of time has proven to be a bit of a challenge.

Would that there were something a bit kinder.

There is and it’s called kind.

From the official kind website: kind is a tool for running local Kubernetes clusters using Docker container “nodes.” kind was primarily designed for testing Kubernetes itself but may be used for local development or continuous integration.

Kind is one of the easiest ways of starting out with Kubernetes development, especially if you’re just beginning your work with containers. In just a few minutes you can get kind installed and running, ready for work.

Let me show you how it’s done.

What You’ll Need

You can install kind on Linux, macOS, and Windows. I’ll demonstrate how to install kind on all three platforms. Before you install kind on your operating system of choice, you will need to have both Docker and Go installed. I’ll demonstrate it on Ubuntu Server 22.04. If you use a different Linux distribution, you’ll need to alter the installation steps accordingly.

Installing Docker

The first thing to do is install Docker. Here’s how on each OS.

Linux

Log into your Ubuntu instance and access a terminal window. Add the official Docker GPG key with the command:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg


Add the Docker repository:

echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null


Install the necessary dependencies with the command:

sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release git -y


Update apt:

sudo apt-get update


Install the latest version of the Docker CE runtime engine:

sudo apt-get install docker-ce docker-ce-cli containerd.io -y


Add your user to the docker group with the command:

sudo usermod -aG docker $USER


Log out and log back in for the changes to take effect.

MacOS/Windows

The easiest method of installing Docker on macOS and Windows is by way of Docker Desktop. You can download the installers for macOS Intel, macOS Apple Silicon, or Windows, double-click the files, and walk through the installation wizards.

Installing Go

Next, install Go. Here’s how.

Ubuntu Linux

To install Go on Ubuntu, open a terminal window and issue the command:

sudo apt-get install golang-go -y

MacOS/Windows

To install Go on macOS or Windows, simply download and run the installer file which can be found for macOS Intel, macOS Apple Silicon, and Windows.

Installing kind

Now, we can install kind. Here’s how for each platform.

Linux

Download the binary file with the command:

curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.14.0/kind-linux-amd64


Give the file the necessary permissions with:

chmod +x kind


Move it to /usr/bin with:

sudo mv kind /usr/bin/

MacOS

Open the terminal application. For macOS Intel, download kind with:

[ $(uname -m) = x86_64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.14.0/kind-darwin-amd64


For Apple Silicon, issue the command:

[ $(uname -m) = arm64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.14.0/kind-darwin-arm64


Give the file executable permissions with:

chmod +x kind


Move kind so that it can be run globally with the command:

mv ./kind /usr/local/bin/kind

Windows

Open the terminal window app. Download kind with:

curl.exe -Lo kind-windows-amd64.exe https://kind.sigs.k8s.io/dl/v0.14.0/kind-windows-amd64


Move the executable file to the directory of your choice with the command:

Move-Item .\kind-windows-amd64.exe c:\DIRECTORY\kind.exe


Where DIRECTORY is the name of the directory to house kind.

Create a Dev Environment

It’s now time to deploy your first Kubernetes cluster with kind. Let’s create one called tns-test with the command:

kind create cluster --name=tns-test


You should see the following output in the terminal window:

✓ Ensuring node image (kindest/node:v1.24.0) 🖼

✓ Preparing nodes 📦

✓ Writing configuration 📜

✓ Starting control-plane 🕹️

✓ Installing CNI 🔌

✓ Installing StorageClass 💾

Once the output completes, you’re ready to go. One thing to keep in mind, however, is that the command only deploys a single node cluster. Say you have to start developing on a multinode cluster. How do you pull that off? First, you would need to delete the single node cluster with the command:

kind delete cluster --name=tns-test


Next, you must create a YAML file that contains the information for the nodes. Do this with the command:

nano kindnodes.yml


In that file, paste the following contents:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker


Save and close the file. You can then deploy with the command:

kind create cluster --name=tns-multi-test --config=kindnodes.yml


To verify your cluster is running, issue the command:

kind get clusters


You should see tns-multi-test in the output.

If you want to interact with the cluster using kubectl, you first must install it. On Ubuntu, that’s as simple as issuing the command:

sudo snap install kubectl --classic


Once kubectl is installed, you can check the cluster info with a command like this:

kubectl cluster-info --context kind-tns-multi-test


You should see something like this in the output:

Kubernetes control plane is running at https://127.0.0.1:45465
CoreDNS is running at https://127.0.0.1:45465/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy


To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

You can now start developing on a multinode Kubernetes cluster, with full use of the kubectl command.
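If you want to confirm that both nodes from the config file came up, a quick check looks something like this (kind names nodes after the cluster, so the exact names and versions may differ):

kubectl get nodes --context kind-tns-multi-test

NAME                           STATUS   ROLES           AGE   VERSION
tns-multi-test-control-plane   Ready    control-plane   2m    v1.24.0
tns-multi-test-worker          Ready    &lt;none&gt;          2m    v1.24.0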

Enhance Kubernetes Scheduling for GPU-Heavy Apps with Node Templates https://thenewstack.io/enhance-kubernetes-scheduling-for-gpu-heavy-apps-with-node-templates/ Wed, 07 Jun 2023 17:00:45 +0000

Kubernetes scheduling ensures that pods are matched to the right nodes so that the Kubelet can run them.

The whole mechanism promotes availability and performance, often with great results. However, the default behavior is an anti-pattern from a cost perspective. Pods running on half-empty nodes equal higher cloud bills. This problem becomes even more acute with GPU-intensive workloads.

Perfect for parallel processing of multiple data sets, GPU instances have become a preferred option for training AI models, neural networks, and deep learning operations. They perform these tasks faster, but also tend to be costly and lead to massive bills when combined with inefficient scheduling.

This issue challenged one of CAST AI’s users — a company developing an AI-driven security intelligence product. Their team overcame it with our platform’s node templates, an autoscaling feature that boosted the provisioning and performance of workloads requiring GPU-enabled instances.

Learn how node templates can enhance Kubernetes scheduling for GPU-intensive workloads.

The Challenge of K8s Scheduling for GPU Workloads

Kube-scheduler is Kubernetes’ default scheduler running as part of the control plane. It selects nodes for newly created and yet unscheduled pods. By default, the scheduler tries to spread these pods evenly.

Containers within pods can have different requirements, so the scheduler filters out any nodes that don’t meet the pod’s specific needs.

It identifies and scores all feasible nodes for your pod, then picks the one with the highest score and notifies the API server about this decision. Several factors impact this process, for example, resource requirements, hardware and software constraints, affinity specs, etc.
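For GPU workloads, those requirements are usually expressed as an extended resource request plus a node selector or toleration, which is what steers the scheduler onto the expensive GPU instances in the first place. The manifest below is an illustrative sketch; the image and instance type are placeholders.

apiVersion: v1
kind: Pod
metadata:
  name: model-training
spec:
  containers:
    - name: trainer
      image: registry.example.com/trainer:latest    # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1    # extended resource advertised by the GPU device plugin
  nodeSelector:
    node.kubernetes.io/instance-type: p4d.24xlarge  # pins the pod to a costly GPU instance type
  tolerations:
    - key: nvidia.com/gpu      # tolerate the taint commonly placed on GPU node groups
      operator: Exists
      effect: NoSchedule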

Fig. 1 Kubernetes scheduling in overview

The scheduler automates the decision process and delivers results fast. However, it can be costly, as its generic approach may leave you paying for resources that are suboptimal for your environments.

Kubernetes doesn’t care about the cost. Sorting out expenses — determining, tracking and reducing them — is up to engineers, and this is particularly acute in GPU-intensive applications, as their rates are steep.

Costly Scheduling Decisions

To better understand their price tag, let’s look at Amazon EC2 P4d designed for machine learning and high-performance computing apps in the cloud.

Powered by NVIDIA A100 Tensor Core GPUs, it delivers high throughput, low-latency networking and support for 400 Gbps instance networking. P4d promises to lower the cost of training ML models by 60% and provide 2.5x better performance for deep learning than earlier P3 instance generations.

While that sounds impressive, it also comes at an hourly on-demand price several hundred times that of a popular instance type like C6a. That’s why it’s essential to keep precise control over the scheduler’s generic decisions.

Fig. 2 Price comparison of p4d and c6a

Unfortunately, when running Kubernetes on GKE, AKS or Amazon Web Services‘ Elastic Kubernetes Service (EKS), you have minimal ability to adjust scheduler settings without using components such as MutatingAdmissionControllers.

That’s still not a bulletproof solution, as when authoring and installing webhooks, you need to proceed with caution.

Node Templates to the Rescue

This was precisely the challenge one of CAST AI’s users faced. The company develops an AI-powered intelligence solution for the real-time detection of threats from social and news media. Its engine analyzes millions of documents simultaneously to catch emerging narratives, and it also enables the automation of unique Natural Language Processing (NLP) models for intelligence and defense.

The volumes of classified and public data that the product uses are ever-growing. That means its workloads often require GPU-enabled instances, which incur extra costs and work.

Much of that effort can be saved using node pools (Auto Scaling groups). But while helping streamline the provisioning process, node pools can also be highly cost-ineffective, leading you to pay for the capacity you don’t need.

CAST AI’s autoscaler and node templates improve on that by providing you with tools for better cost control and reduction. In addition, thanks to the fallback feature, node templates let you benefit from spot instance savings and guarantee capacity even when spots become temporarily unavailable.

Node Templates in Action

The workloads of the CAST AI client now run on predefined groups of instances. Instead of having to select specific instances manually, the team can broadly define their characteristics, for example “CPU-optimized,” “Memory-optimized” and “GPU VMs,” then the autoscaler does the rest.

This feature has given them far more flexibility, as they can use different instances more freely. As AWS adds new, highly performant instance families, CAST AI automatically enrolls you for them, so you don’t need to enable them additionally. This isn’t the case with node pools, which require you to keep track of new instance types and update your configs accordingly.

By creating a node template, our client could specify general requirements — instance types, the lifecycle of the new nodes to add, and provisioning configs. They additionally identified constraints such as the instance families they didn’t wish to use (p4d, p3d, p2) and the GPU manufacturer (in this case, NVIDIA).

For these particular requirements, CAST AI found five matching instances. The autoscaler now follows these constraints when adding new nodes.

Fig. 3 Node template example with GPU-enabled instances

Once the GPU jobs are done, the autoscaler decommissions GPU-enabled instances automatically.

Moreover, thanks to spot instance automation, our client can save up to 90% of hefty GPU VM costs without the negative consequences of spot interruptions.

As spot prices can vary dramatically for GPUs, it’s essential to pick the optimal ones at any given time. CAST AI’s spot instance automation takes care of this. It can also ensure the right balance between the most diverse and the cheapest types.

And on-demand fallback can be a blessing during mass spot interruptions or low spot availability. For example, an interrupted and improperly saved training process in a deep learning workflow can lead to severe data loss. If AWS happens to withdraw all the EC2 G3 or P4d spot instances your workloads have been using at once, an automated fallback can save you a lot of hassle.

How to Create a Node Template for Your Workload

Creating a node template is relatively quick, and you can do it in three different ways.

First, by using CAST AI’s UI. It’s easy if you have already connected and onboarded a cluster. Enter your product account and follow the screen instructions.

After naming the template, you select whether you wish to taint the new nodes and avoid assigning pods to them. You can also specify a custom label for the nodes you create using the template.

Fig. 4 Node template from CAST AI

You can then link the template to a relevant node configuration and specify whether you wish your template to use only spot or only on-demand nodes.

You also get a choice of processor architecture and the option to use GPU-enabled instances. If you select this preference, CAST AI will automatically run your workloads on relevant instances, including any new families added by your cloud provider.

Finally, you can also use restrictions such as:

  • Compute-optimized: helps pick instances for apps requiring high-performance CPUs.
  • Storage-optimized: selects instances for apps that benefit from high IOPS.
  • Additional constraints, such as instance family and minimum and maximum CPU and memory limits.

But the hard fact is that the fewer constraints you add, the better the matches and the higher the cost savings you will get. CAST AI’s engine will take care of that.

You can also create node templates with Terraform (you can find all the details on GitHub) or use the API (check the documentation).

Summary

Kubernetes scheduling can be challenging, especially when it comes to GPU-heavy applications. Although the scheduler automates the provisioning process and delivers fast results, it can often prove too generic and expensive for your application’s needs.

With node templates, you get better performance and flexibility for GPU-intensive workloads. The feature also ensures that once a GPU instance is no longer necessary, the autoscaler decommissions it and gets a cheaper option for your workload’s new requirements.

We found that this quality helps build AI apps faster and more reliably — and we hope it will support your efforts, too.

‘Running Service’ Blueprint for a Kubernetes Developer Portal https://thenewstack.io/running-service-blueprint-for-a-kubernetes-developer-portal/ Wed, 07 Jun 2023 15:30:41 +0000

Internal developer portals exist to provide developers with a product-like experience that’s free of cognitive load, allowing developers to stay in the flow and be productive. They are set up by platform engineering teams to help developers serve themselves within guardrails and internal quality standards.

With portals, developers can simply and easily set up an ephemeral environment, restart a Kubernetes cluster, redeploy an image tag or scaffold a microservice. Platform engineering will make those actions reusable in the platform, and the internal developer portal will act as the interface to the platform and then reflect the changes in the software catalog.

But internal developer portals are more than loosely coupled product-like user interfaces that make developer lives easier. The internal developer portal also has a valuable software catalog that includes everything application-related in your engineering, from CI/CD metadata through cloud resources, Kubernetes, services and more.

The value of the software catalog is much greater than the metadata it contains (which is pretty neat, too) and goes way beyond showing who owns a service or where its logs are. In addition to being a single source of truth, its value comes from the way it provides context, especially in case of runtime data. It can quickly answer questions such as, “What is the current running version of service x in environment y?” even in cases that contain feature flags, canary or blue/green deployments.

Context and runtime data are the focus of this article. We will provide a detailed example of an internal developer portal for Kubernetes objects. We will then show the power of the software catalog and the fact that it can support workflow automation — anything from time to live (TTL) termination through service locking, triggering automated actions when services degrade, etc. — as a result of its combination of metadata and runtime data.

Spotify’s Backstage C4 Model for Internal Developer Portals

Software catalogs need a data model, and before you begin, you need to define it. It’s nothing too complex, but you do need a schema identifying what needs to be inside your software catalog. Software catalogs need to be unopinionated and completely flexible, so the best option is to let you define the data model yourself.

In Port, the schema for a type of entity (let’s say a K8s cluster) is called a Blueprint. The actual entity (the actual cluster in this case) is called an entity. In Spotify’s Backstage, the blueprint is called a “kind.”

Backstage, a leading open source internal developer portal and the third most popular Cloud Native Computing Foundation (CNCF) project, recommends beginning with a certain data model consisting of six blueprints (or kinds).

  • Component
  • API
  • Resource
  • System
  • Domain
  • Group

As Spotify’s senior engineer Renato Kalman and staff engineer Johan Wallin explain here, in designing Backstage they had a software visualization challenge: They needed a “standardized software metadata model to create a common language for communicating software architecture.” What they came up with was the C4 model. You can see an example of a Backstage C4 model here.

But this data model misses one point: the “running service” blueprint.

What Is a Running Service?

Your code is not your app. The code that lives in your repo or in a container image isn’t the app. In real life, your app exists in an environment and serves something (APIs, other services, users) within an ecosystem of tools and dependencies. It behaves differently depending on where it is.

The running-service blueprint, or as we sometimes call it, the “service in environment” blueprint, reflects the fact that a single “service” is usually deployed to many different environments. Services can live in a variety of environments: staging, development, production. Services can also live in many different customer environments, especially in the case of single-tenant architectures.

This simple fact that the service lives in many different environments is reflected by the idea of the “running service” blueprint in Port. The “running service” entity lets us see the service “in the wild” — in the specific environment it actually lives in. Only this provides us with the correct and actionable context to understand what is going on.

Sticking to a static software catalog with a static data model that only includes metadata and not runtime data doesn’t provide the context we need. Insights exist only if you look at the real instance of the running microservice.

A Kubernetes Internal Developer Portal: The ‘Running Service’ Blueprint

Some argue that the growth of Kubernetes is one of the core drivers behind platform engineering. Kubernetes complexity, the expertise required of its practitioners and the recent movement of many developers to cloud native development all created increased load and friction between developers and DevOps.

Internal developer portals abstract Kubernetes away for developers. They let developers understand Kubernetes by showing them the relevant data, in context. They also support developer self-service actions. It’s important to ensure that these Kubernetes internal developer platforms include:

  • All Kubernetes objects in the software catalog, not just microservices
  • Multicluster support
  • CRD support

Let’s look at how to set up blueprints (the data model) for a Kubernetes internal developer portal and then at how and when we include the running service blueprint for Kubernetes.

This is the basic set of blueprints for Kubernetes:

Workload is the “running service” for Kubernetes. It is a generic name for stateful sets, deployments, daemon sets and any other workload running in the cluster.

  • A cluster represents a Kubernetes cluster in the infrastructure, providing the high-level connection between the different objects in the Kubernetes cluster.
  • A node is a server that hosts and provides the runtime for the different applications and microservices in the Kubernetes cluster.
  • A namespace is meant to group together many resources inside the same Kubernetes cluster, giving you the option to view how a complete environment hosted from the same Kubernetes cluster is connected.
  • The workload is meant to be the focal point that provides the most relevant context to a developer about how their app is doing. The workload entity provides the developer with an abstract view of their different workloads. They can see the current status of the workload, such as instance count and health. By going upstream in the dependency tree, the developer can see what other applications and microservices are running next to their own workload, letting the developer understand if there are any connectivity or functionality issues.
  • A pod is an instance of the workload, giving us visibility into the health of the pieces that make up the complete workload, as well as the ability to understand if there are any specific issues in the availability of the service provided by the workload.
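As a rough illustration of how such a blueprint comes together, a “workload” entity in the catalog might combine static metadata with runtime data along these lines. The field names are made up for this sketch and do not reflect Port’s or Backstage’s actual schema.

identifier: checkout-service-prod-us      # one running instance of the service
blueprint: workload
properties:
  service: checkout-service               # the logical service it belongs to
  cluster: prod-us-east-1
  namespace: checkout
  runningImage: registry.example.com/checkout:2.3.1   # the currently deployed version
  replicas: 6
  readyReplicas: 6
  healthStatus: Healthy                   # runtime data, refreshed from the cluster
relations:
  cluster: prod-us-east-1
  namespace: checkout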

You Should Begin Using a Running Service or Workload Blueprint

We’ve seen that the runtime blueprint, regardless of whether we call it “running service,” “workload” or even the literal “service in environment,” is useful. It reflects the reality in which a single service usually exists in several environments at the same time, such as development, staging, etc. It can also be deployed in many different customer environments. The running service provides the runtime data so we can understand the service in the context of its environment and deployment, as well as its real-time information, from uptime to status.

You can use Port for free at getport.io, or check a fully populated Port demo here.

How Adobe Uses OpenTelemetry Collector https://thenewstack.io/how-adobe-uses-opentelemetry-collector/ Mon, 05 Jun 2023 17:02:50 +0000

Adobe’s Chris Featherstone and Shubhanshu Surana praised the OpenTelemetry Collector as the Swiss army knife of observability in their talk at Open Source Summit North America.

They went on to explain how they use it to track the massive amounts of observability data the company collects: metrics (330 million unique series a day), span data (3.6 terabytes a day) and log data (over 1 petabyte a day).

Featherstone, senior manager for software development, explained that not all of this data flows through his team or the OTel collector, but “it’s a pretty good chunk.”

Distributed tracing led his team to OpenTelemetry. Adobe is largely made up of acquisitions, he explained, and with every new company brought in, people have their own opinions of the best cloud, this tool, that text editor, etc.

“With distributed tracing specifically, that becomes a huge challenge,” he said. “Imagine trying to stitch a trace across clouds, vendors, open source. So eventually, that’s what led us to the collector. But we were trying to build a distributed tracing platform based on Jaeger agents.” That was in 2019.

Adobe started rolling out the OTel Collector in April 2020 to replace the Jaeger agents. Originally, the collector was used only to ingest traces, but in September 2021 the team brought in metrics, and it is looking to bring in logs as well.

The team instruments applications using OpenTelemetry libraries, primarily auto-instrumentation and primarily Java. It does some application enrichment, bringing in Adobe-specific data and enriching its pipelines as data flows to the collector. It has some custom extensions and processors, and the team does configuration by GitOps where possible.

“The collector is very dynamic extending to multiple destinations with one set of data and this was huge for us. …Sometimes we send collector data to other collectors to further process. So it’s the Swiss Army knife of observability,” Featherstone said.

His team at Adobe is called developer productivity with the charter to help developers write better code faster.

For the Java services, in particular, it has a base container and “if you’re using a Java image, you should go use this … It has a number of quality-of-life features already rolled into it, including the OpenTelemetry Java instrumentation in the jar. [The configuration is ] pulled from our docs, and this is exactly how we configure it for Java.

“So we set the Jaeger endpoint to the local DaemonSet collector. We set the metrics exporter to Prometheus, we set the propagators, we set some extra resource attributes, we set the tracer, the exporter to Jaeger. And we set the trace sampler to parent-based always off,” he said, pointing out that this is all rolled into the Java image.

So with these configurations, any Java service that spins up in Kubernetes at Adobe is already participating in tracing. Everything set up this way passes through the collector.
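
Translated into the standard OpenTelemetry Java agent environment variables, that configuration might look roughly like the following sketch; the endpoint value is a placeholder, not Adobe's actual setting:

export OTEL_TRACES_EXPORTER=jaeger                                      # export traces to Jaeger
export OTEL_EXPORTER_JAEGER_ENDPOINT=http://${NODE_IP}:14250            # placeholder for the local DaemonSet collector
export OTEL_METRICS_EXPORTER=prometheus
export OTEL_PROPAGATORS=tracecontext,baggage
export OTEL_RESOURCE_ATTRIBUTES=service.name=my-service,team=my-team   # extra resource attributes (example values)
export OTEL_TRACES_SAMPLER=parentbased_always_off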

“So everyone’s participating in tracing just by spinning this up,” he said. “The metrics, we’ve tried to reduce the friction, people would still need to somehow go get those metrics out of that exporter. We’ve made that pretty easy, but it’s not automatic.” He said about 75% of what they run is Java, but they’re trying the same concept with Node.js and Python and other images.

Managing the Data

They do a lot of enrichment, as well as ensuring no secrets are sent as part of tracing or metrics data, said Surana, Adobe's cloud operations site reliability engineer for observability.

It uses multiple processors, including the redaction processor as well as a custom processor in the OpenTelemetry Collector, to eliminate certain fields they don't want sent to the backend, such as personally identifiable information or other sensitive data. The processors are also used to enrich the data, because adding fields such as service identifiers, Kubernetes clusters and region helps improve search.
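
As an illustration of the idea, a fragment of an OpenTelemetry Collector configuration using the stock attributes processor might look like this; the field names and service ID are invented, and the processor would still need to be wired into a pipeline:

cat > otel-attributes-fragment.yaml <<'EOF'
processors:
  attributes/scrub-and-enrich:
    actions:
      - key: user.email            # never forward this (hypothetical) sensitive field
        action: delete
      - key: adobe.service.id      # enrich every span with a unique service identifier
        value: "svc-example-123"
        action: upsert
EOF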

“Adobe is built out of active acquisitions, and we run multiple different products in different ecosystems. There is a high possibility of service names colliding under different products or of similar microservice names, so we wanted to ensure that doesn't happen,” he said.

It also uses an Adobe-specific service registry, where every service has a unique ID attached to the service name. That allows any engineer at Adobe to uniquely identify a service in the single tracing backend.

“It [also] allows the engineers to quickly search on things, even though they don’t know the service, or they don’t know who owns that service, they can go look into our service registry, find out the engineering contact for that particular product or team and get on a call to resolve their issue,” Surana said.

They also send data to multiple export destinations.

“This is probably the most common use case,” he said. “Before the introduction of the OpenTelemetry Collector, engineering teams at Adobe have been using different processes, different libraries in a different format. And they were sending it to vendor products, open source projects, and it was very hard for us to get the engineering teams to change their backend, or to just do any small change in the backend code or their application code because engineers have their own product features and product requests, which they are working on.

“With the introduction of OpenTelemetry Collector, as well as the OTLP [OpenTelemetry protocol] format, this made it super easy for us; we are able to send their data to multiple vendors, multiple toolings with just a few changes on our side.”

Last year, they were able to send the tracing data to three different backends at the same time to test out one engineering-specific use case.

They’re now sending data to another set of OTel collectors at the edge where they can do transformations including inverse sampling, rule-based sampling and throughput-based sampling.

He said they’re always looking into other ways to get richer insights while sending less data to the backend.

“This entire configuration is managed by git. We make use of the OpenTelemetry Operator Helm charts primarily for our infrastructure use case. … It takes away the responsibility from the engineers to be subject matter experts … and makes the configuration super easy,” he said.

Auto instrumentation with OpenTelemetry Operator allows engineers to just pass in a couple of annotations to instrument their service automatically for all the different signals without writing a single line of code.
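
The flow looks roughly like the sketch below; the deployment and namespace names are placeholders, and it assumes the platform team has already created an Instrumentation resource for the operator to use:

# Opt a hypothetical Java deployment into auto-instrumentation via a pod-template annotation.
kubectl patch deployment my-java-service -n my-namespace --type=merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-java":"true"}}}}}'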

“This is huge for us,” he said. “This takes developer productivity to the next level.”

They also built out a custom extension on top of the OpenTelemetry Collector using the custom authenticator interface. They had two key requirements for this authentication system: to be able to use a single system to securely send data to the different backends and to be able to secure it for both open source and vendor tools.

The OpenTelemetry Collector comes with a rich set of processors for working with data, including an attributes processor that allows you to add attributes on top of log data and metric data. It allows you to transform, enrich or modify the data in transit without the application engineers doing anything. Adobe also uses it to improve search capabilities in its backends.

The memory limiter processor helps ensure the collector never runs out of memory by checking the amount of memory used to keep things in state. They also use the span-to-metrics processor and the service graph processor to generate metrics out of traces and build dashboards on the fly.
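
As one small example, a memory limiter block in the collector configuration is only a few lines; the thresholds here are arbitrary, and the processor still has to be referenced in a pipeline:

cat > otel-memory-limiter-fragment.yaml <<'EOF'
processors:
  memory_limiter:
    check_interval: 1s      # how often memory usage is checked
    limit_mib: 1500         # hard memory limit for the collector process
    spike_limit_mib: 300    # headroom subtracted from the hard limit to form the soft limit
EOF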

So What’s Next?

Two things, according to Featherstone: improving data quality, namely getting rid of data no one is going to look at, and rate limiting spans at the edge.

The collector provides the ability at the edge to create rules and drop some data.

“For metrics, imagine that we had the ability to aggregate right in the collector itself. You know, maybe we don’t need quite 15-second granularity, let’s dumb that down to five minutes, and then send that off,” Featherstone said.

“Another one might be sending some metrics to be stored for long term and sending some on to be further processed in some operational data lake or something like that. We have the ability to just pivot right in the collector and do all kinds of things.”

The second thing is rate-limiting spans at the edge.

“We have one of our edges is taking like 60 billion hits per day, and we’re trying to do tracing on that. That becomes a lot of data when you’re talking about piping that all the way down to somewhere to be stored. So we’re trying to figure out where’re the right places to implement rate limiting in which collectors and at what levels … just to prevent unknown bursts of traffic, that kind of thing,” he said.

They’re also trying to pivot more to trace-first troubleshooting.

“We have so many east/west services that trying to do it through logs and trying to pull up the right log index for whatever team and do I even have access to it or whatever. It’s so slow and so hard to do, that we’re trying to really shift the way that people are troubleshooting within Adobe, to something like this, where we’ve made a lot of effort to make these traces, pretty complete,” he said.

They're also looking into how people go about troubleshooting and whether the tools they have provide the best way to do that.

They're looking forward to integrating the OpenTelemetry logging libraries with core application libraries and running OTel collectors as sidecars to send metrics, traces and logs. They're also exploring the new connector component and building a trace-sampling extension at the edge to improve data quality.

Wrapping up, he lauded the collector's plug-in-based architecture and the ability to send data to different destinations with a single binary. There is a rich set of extensions and processors that gives a lot of flexibility with your data, he said.

“OpenTelemetry in general feels a lot to me, like the early days of Kubernetes where everybody was just kind of buzzing about it, and it started like we’re on the hockey stick path right now,” he said. “The community is awesome. The project is awesome. If you haven’t messed with the collector yet, you should definitely go check it out.”

The post How Adobe Uses OpenTelemetry Collector appeared first on The New Stack.

]]>
7 Core Elements of an Internal Developer Platform https://thenewstack.io/7-core-elements-of-an-internal-developer-platform/ Mon, 05 Jun 2023 13:41:04 +0000 https://thenewstack.io/?p=22709978

What does it take to build an internal developer platform? What are the tools and platforms that can make it

The post 7 Core Elements of an Internal Developer Platform appeared first on The New Stack.

]]>

What does it take to build an internal developer platform? What are the tools and platforms that can make it work? This post will discuss the architecture and tools required to stand up a fully operational internal developer platform. To see the actual steps of setting up the platform, watch this video.

Why Do We Want an Internal Developer Platform?

Platform engineering’s overarching goal is to drive developer autonomy. If a developer needs a database, there should be a mechanism to get it, no matter if that person is a database administrator or a Node.js developer. If a developer needs to manage an application in Kubernetes, that person doesn’t need to spend years trying to understand how Kubernetes works. All these actions should be simple to accomplish.

A developer should be able to accomplish what they need by defining a simple manifest or using a web UI. We want to enable all developers to consume services that will help them get what they need. Experts (platform engineers) will create those services in the internal developer portal, and users will consume them in its graphical user interface or by writing manifests directly and pushing them to git.

The High-Level Design of an Internal Developer Platform — 7 Core Elements

An internal developer platform needs several parts to become fully operational. For each part we will recommend a tool, but they can be exchanged with similar tools. The core idea is to map out the functionalities needed to build the platform:

  • A control plane: The platform needs a control plane that will be in charge of managing all the resources, no matter if they are applications running in a Kubernetes cluster or elsewhere, or if the infrastructure or services are in Amazon Web Services (AWS), Azure, Google Cloud or anywhere else. Our recommended tool here is Crossplane.
  • A control plane interface: This will enable everyone to interact with the control plane and manage resources at the right level of abstraction. Our recommended tool here is Crossplane Compositions.
  • Git: The desired states will be stored in git, so we’ll have to add a GitOps tool into the mix. Its job will be to synchronize whatever we put in git with the control plane cluster. Our recommended tool here is Argo CD.
  • Database and schema management: Given that state is inevitable, we will need to have databases as well. Those databases will be managed by the control plane but to work well, we will also need a way to manage schemas inside those databases. Our recommended tool here is SchemaHero.
  • Secrets manager: For any confidential information that we cannot store in git, we’ll need a way to manage secrets in a secrets manager. Those secrets can be in any secrets manager. Our recommended tool to pull secrets from there is External Secrets Operator (ESO).
  • An internal developer portal/ graphical user interface: In case users don’t want to push manifests directly to git, we should provide them with a user interface that will enable them to see what’s running as well as to execute processes that will create new resources and store them in git. Our recommended tool here is Port.
  • CI/CD pipelines: Finally we will need pipelines to execute one-shot actions like the creation of new repositories based on templates, building images with new release changes to manifests and so on. Our recommended tool here is GitHub Actions.

The setup will require a few additional tools, but the list above is a must.
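
As a hedged starting point, the two pieces at the heart of the setup, the control plane and the GitOps engine, can be bootstrapped with a handful of commands; the namespaces are the upstream defaults and the manifest URL tracks the stable Argo CD release:

# Install Crossplane (the control plane) with Helm.
helm repo add crossplane-stable https://charts.crossplane.io/stable
helm repo update
helm install crossplane crossplane-stable/crossplane \
  --namespace crossplane-system --create-namespace

# Install Argo CD (the GitOps engine) from the upstream install manifest.
kubectl create namespace argocd
kubectl apply -n argocd \
  -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml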

The diagram below shows how each of the elements interacts with each other. You can use it as a reference as you read through this article.

Let’s examine the role of each layer in the setup:

Control Plane

Let's talk about control planes: We need a single API acting as an entry point. This is the main point of interaction for the internal developer platform. In turn, it will manage resources no matter where they are. We can use Crossplane with providers, which enables us to manage not only Kubernetes but also AWS, Google Cloud, Azure or other types of resources. We will use kubectl and custom resource definitions (CRDs) to create deployments and services, manage databases in hyperscaler clusters, and so on.
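
Installing a provider is itself just another Kubernetes resource applied with kubectl; here is a rough sketch, where the package reference is an example and will vary by provider and version:

cat <<'EOF' | kubectl apply -f -
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws
spec:
  # Example package reference; substitute the provider and version you actually need.
  package: xpkg.upbound.io/crossplane-contrib/provider-aws:v0.39.0
EOF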

However, this alone isn’t enough for a full-fledged internal developer platform. An application can easily consist of dozens of resources. Infrastructure can be much more complicated than that. Most importantly, all those low-level resources are not at the right levels of abstraction for people who are not Kubernetes or AWS or Google Cloud specialists. We need something that is more user-friendly.

A User-Friendly Interface for the Control Plane

The control plane interface can act as the platform API when you’re 100% GitOps. It shouldn’t be confused with the internal developer portal, which acts as the graphical user interface. We can use Crossplane Compositions for that.

What is the right level of abstraction for the users of the platform we’re building? The rule is that we should hide, or abstract, anything that people don’t really care about when they use the internal developer platform. For instance, they probably don’t care about subnets or database storage. The right level of abstraction depends on the actual use of the platform and will differ from one organization to another. It’s up to you to discover how to best serve your customers and everyone else in your organization.

Crossplane Compositions enables us to create abstractions that can simplify the management of different kinds of applications. Next, we probably do not want anyone to interact directly with the cluster or the control plane. Instead of people sending requests directly to the control plane, they should be storing their desired states in git.

Synchronize from Git with GitOps

Changing the state of resources by directly communicating with the control plane should not be allowed, since no one will know who changed what and when. Instead, we should push the desired state into git and, optionally, do reviews through pull requests. If we plug GitOps tools into the platform, the desired state will be synchronized with the control plane, which in turn will convert it into the actual state.

This is a safer approach as it doesn’t allow direct access to the control plane and also keeps track of the desired state. I recommend doing this with Argo CD, but Flux and other solutions are just as good.
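
A minimal Argo CD Application that keeps one directory in git synchronized with the control plane cluster might look like the sketch below; the repository URL, path and target namespace are placeholders:

cat <<'EOF' | kubectl apply -f -
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/platform-manifests.git   # placeholder repo
    targetRevision: main
    path: apps
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true        # remove resources deleted from git
      selfHeal: true     # undo manual drift in the cluster
EOF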

Schema Management

Databases need schemas. They differ from one application to another. To complete our internal developer platform, we need to figure out how to manage schemas, preferably as part of application definitions stored in git. There are many ways to manage schemas, but only a few enable us to specify them in a way that fits into the git model. The complication is that GitOps tools work only with Kubernetes resources, and that means that schemas should be defined as Kubernetes resources as well. This requires us to extend the Kubernetes API with CRDs that will enable us to define schemas as Kubernetes resources. I recommend using SchemaHero for that.
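
To give a feel for what "schemas as Kubernetes resources" means, here is a rough SchemaHero-style Table manifest; treat it purely as an illustration, since the API version and field names can differ between SchemaHero releases and should be checked against its documentation:

cat > table-users.yaml <<'EOF'
# Illustrative only; verify apiVersion and schema fields against the SchemaHero docs.
apiVersion: schemas.schemahero.io/v1alpha4
kind: Table
metadata:
  name: users
spec:
  database: app-db            # name of a SchemaHero Database object (placeholder)
  name: users                 # table name inside that database
  schema:
    postgres:
      primaryKey: [id]
      columns:
        - name: id
          type: uuid
        - name: email
          type: varchar(255)
EOF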

Secret Management

Some information shouldn’t be stored in git. Having confidential information such as passwords in git could easily result in a breach. Instead, we might want to store those in a secret manager like HashiCorp Vault or a solution provided by whichever hyperscaler you’re using. Still, those secrets need to reach the control plane so that processes inside it can authenticate with external APIs or access services, for example, databases. I recommend using External Secrets Operator (ESO) for that.
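
A sketch of how a secret flows from an external manager into the cluster with ESO; the store name and remote key are placeholders:

cat <<'EOF' | kubectl apply -f -
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend          # placeholder SecretStore/ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: db-credentials         # the Kubernetes Secret ESO will create
  data:
    - secretKey: password
      remoteRef:
        key: prod/db             # placeholder path in the external secrets manager
        property: password
EOF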

Internal Developer Portal — Graphical User Interface

The internal developer platform needs a user interface to sit on top of everything we've built so far. This is the internal developer portal. It provides both a catalog of services people can use and an interface for developers to perform the actions we want them to handle autonomously. Specifically, we need a way to initialize a process that will create new repositories for applications, add sample code, provide manifests for the databases and other dependencies, create CI/CD pipelines, and so on.

For this setup we began with the Kubernetes catalog template from Port.

We will then add two additional blueprints that will be related to the cluster blueprint, Backend App and Environment:

Environment

Backend App

CI/CD Pipelines

Finally, we need pipelines. They are the last piece of the puzzle.

Even though we are using GitOps to synchronize the actual state into the desired state, we need pipelines for one-shot actions that should be executed only once for each commit. These could be steps to build binaries, run tests, build and push container images and so on.
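
A stripped-down GitHub Actions workflow for those one-shot, per-commit actions might look like the following; the registry and image name are placeholders, and pushing to GitHub's registry may require extra workflow permissions:

mkdir -p .github/workflows
cat > .github/workflows/build.yaml <<'EOF'
name: build-and-push
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build container image
        run: docker build -t ghcr.io/example-org/my-app:${{ github.sha }} .
      - name: Log in and push
        run: |
          echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin
          docker push ghcr.io/example-org/my-app:${{ github.sha }}
EOF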

The Internal Developer Platform in Action

From the user (developer) perspective, a new application can be created with a simple click on a button in a Web UI or by defining a very simple manifest and pushing it to git. After that, the same interface can be used to observe all the relevant information about that application and corresponding dependencies.

Behind the scenes, however, the flow would be as follows.

  1. The user interacts with a Web UI (Port) or directly with git. The job of the internal developer portal in this case is to trigger an action that will create all the necessary resources.
  2. Creating all the relevant resources is a job done by the pipeline such as GitHub Actions. In turn, it creates a new repository with all the relevant files, such as source code, pipelines, application manifests, etc.
  3. As a result of pushing changes to the application repository (either as a result of the previous action or, later on, by making changes to the code), an application-specific pipeline is triggered (GitHub Actions) which, as a minimum, builds a container image, pushes it to the image registry and updates the manifests in the management repo, which is monitored by GitOps tools like Argo CD or Flux.
  4. GitOps tools detect changes to the management repo and synchronize them with the resources in the control plane cluster.
  5. The resources in the control plane cluster are picked up by corresponding controllers (Crossplane), which in turn create application resources (in other Kubernetes clusters or as hyperscaler services like AWS Lambda, Azure Container Apps or Google Cloud Run) as well as dependent resources like databases (self-managed or as services in a hyperscaler).

The post 7 Core Elements of an Internal Developer Platform appeared first on The New Stack.

]]>
The Need to Roll up Your Sleeves for WebAssembly https://thenewstack.io/the-need-to-roll-up-your-sleeves-for-webassembly/ Mon, 05 Jun 2023 13:00:41 +0000 https://thenewstack.io/?p=22706865

We already know how putting applications in WebAssembly modules can improve runtime performance and latency speeds and compatibility when deployed.

The post The Need to Roll up Your Sleeves for WebAssembly appeared first on The New Stack.

]]>

We already know how putting applications in WebAssembly modules can improve runtime performance and latency speeds and compatibility when deployed. We also know that WebAssembly has been used to improve application performance when running on the browser on the backend. But the day when developers can create applications in the language of their choice for distribution across any environment simultaneously, whether it’s on Kubernetes clusters, servers, edge devices, etc. remains a work in progress.

This status quo became that much more apparent from the talks and impromptu meetings I had during KubeCon + CloudNativeCon in April. Beyond the growing number of WebAssembly module and service providers and startups offering support for WebAssembly, it's hard to find any organization that is not at least experimenting with it as a sandbox project while it waits for customers to ask for or require it.

Many startups, established players, and tool and platform providers are actively contributing to the common pool of knowledge by contributing to or maintaining open source projects, taking part in efforts such as the Bytecode Alliance or sharing their knowledge and experiences at conferences, such as KubeCon + CloudNativeCon Europe's co-located event Cloud Native Wasm Day. This collective effort will very likely serve as a catalyst so that WebAssembly soon moves past its current status as just a very promising new technology and begins to be used for what it's intended for on a massive industry scale.

Indeed, WebAssembly is the logical next step in the evolution from running applications on specific hardware, to running them on virtual machines, to running them in containers on Kubernetes, Torsten Volk, an analyst at Enterprise Management Associates (EMA), said. “The payout in terms of increased developer productivity alone justifies the initial investments that come with achieving this ultimate level of abstraction between code and infrastructure. No more library hell: No more debugging app-specific infrastructure. No more refactoring of app code for edge deployments. In general, no more wasting developer time on stuff other than writing code,” Volk said. “This will get us to a state where we can truly compose new applications from existing components without having to worry about compatibility.”

 

Work to Be Done

But until we get that point of developer-productivity nirvana, work needs to be done. “Now we need all-popular Python libraries to work on WebAssembly and integrations with key components of modern distributed apps, such as NoSQL storage, asynchronous messaging, distributed tracing, caching, etc.,” Volk said. “Luckily there’s a growing number of startups completing the ‘grunt work’ for us to make 2024 the year when WebAssembly really takes off in production.”

Collaboration, alliances and harmony in the community, especially in the realm of open source, will be critical. “The one thing I’ve learned from the container wars is that we were fighting each other too early in the process. There was this mindset that the winner would take all, but the truth is the winner takes all the burden,” Kelsey Hightower, principal developer advocate, Google Cloud, said during the opening remarks at KubeCon + CloudNativeCon Europe’s Cloud Native Wasm Day. “You will be stuck trying to maintain the standards on behalf of everyone else. Remember collaboration is going to be super important — because the price for this has to be this invisible layer underneath that’s just doing all of this hard work.”

At the end of the day, those writing software probably just want to use their favorite language and framework in order to do it, Hightower said. “How compatible will you be with that? Or will we require them to rewrite all the software?” Hightower said. “My guess is anything that requires people to rewrite everything is doomed to fail, almost guaranteed and that there is no way that the world is going to stop innovating at the pace we’re on where the world will stop, and implement all the lower levels. So, it is a time to be excited, but understand what the goal is and make sure that this thing is usable and has tangible results along the way.”

During the sidelines of the conference, Peter Smails, senior vice president and general manager, enterprise container management, at SUSE, discussed how internal teams at SUSE shared an interest in Wasm without going into details about SUSE’s involvement. “WebAssembly has an incredibly exciting future and we see practical application of WebAssembly. I personally think of it as similar to being next-generation Java: it is a small, lightweight, fast development platform and, arguably, is an infrastructure that lets you write code and deploy it where you want and that’s pretty cool,” Smails told The New Stack.

In many ways, WebAssembly proponents face a chicken-and-egg challenge. After all, what developer would not want to be able to use the programming language of their choice to deploy applications for an environment or device without having to worry about configuration issues? What operations and security team would not appreciate a single path from finalized application code to secure deployment on any device or environment (including Kubernetes), without the hassles of reconfiguring the application for each endpoint? But we are not there yet, and many risks must be taken and investments made before wide-scale adoption really does happen the way it should in theory.

“We have a lot of people internally very excited about it, but practically speaking, we don’t have customers coming to talk about this asking for the requirements — that’s why it’s in the future,” Smails said. “We see it more as a potentially exciting space because we’re all about infrastructure.”

Get the Job Done

Meanwhile, there is huge momentum to create, test and standardize the Wasm infrastructure to pave the way for mass adoption. This is thanks largely to the work of the open source community on projects sponsored in-house or by the new tool-provider startups that continue to sprout up, as mentioned above. Among the more promising projects discussed during the KubeCon + CloudNativeCon co-located event Cloud Native Wasm Day is Winch; Saúl Cabrera, a staff developer at Shopify, described how he is leading its development in his talk “The Road to Winch.” Winch is a compiler in Wasmtime created to improve application startup performance beyond what Wasm already provides. Offering a baseline alternative to a fully optimizing compiler, the WebAssembly Intentionally-Non-Optimizing Compiler and Host (Winch) improves startup times of WebAssembly applications, Cabrera said. Benchmark results that demonstrate the touted performance gains will be available in the near future, Cabrera said.

The post The Need to Roll up Your Sleeves for WebAssembly appeared first on The New Stack.

]]>
My Further Adventures (and More Success) with Rancher https://thenewstack.io/my-further-adventures-and-more-success-with-rancher/ Sat, 03 Jun 2023 14:00:16 +0000 https://thenewstack.io/?p=22709849

When last we met, I offered my first-run experience with the SUSE Rancher Kubernetes distribution, which did not go well.

The post My Further Adventures (and More Success) with Rancher appeared first on The New Stack.

]]>

When last we met, I offered my first-run experience with the SUSE Rancher Kubernetes distribution, which did not go well. On top of that, my reporting of those first steps did not sit well with the Rancher community. I get that and respect all of the input I received. Because of that, I decided to give Rancher another go and report on my latest findings.

Suffice it to say, while it was still a mixed bag, I did have more success with the platform.

Let’s chat.

It’s All about Minimum Requirements

The first person to reach out to me knows a thing or two about a thing or two and made a specific point to say that in no way could I get the latest version of Rancher to spin up properly when using the minimum suggested system requirements. In fact, this person (who shall remain anonymous) informed me that 8GB of memory is the bare minimum to use when deploying Rancher.

With that bit of information in hand, I spun up a new virtual machine with 8GB of RAM and 4 cores. After installing Docker in my usual fashion, I deployed the Rancher container and hoped for the best.

To my surprise, it worked. I now had a running Rancher container and could take the next steps.

It’s All about the Right Versions

This all started at around 9:30 PM last night. I then copied the Docker command and ran it on a MicroK8s cluster I’d had up and running for some time. This cluster was running perfectly, so I had every confidence Rancher would be able to successfully provision and I’d be working with the tool without a problem.

But there’s this guy named Murphy who has a law…

I went about writing some fiction, checking in on Rancher every now and then, only to see that (by 11:00 PM) Rancher had still yet to provision the MicroK8s cluster. When I woke up this morning, one of the first things I checked was the status of the provisioning and, after nearly 12 hours, the provisioning still had yet to succeed.

Something was wrong.

One person who commented on the original article works at SUSE. In one of his comments, the engineer stated, “There’s a potential version issue between MicroK8s and the Rancher-supported versions.”

Apparently, the latest version of MicroK8s won’t work with Rancher. To get around that, I would need to install MicroK8s version 1.25, which is done with the command:

sudo snap install microk8s --channel=1.25/stable --classic


If you already have an unsupported version of MicroK8s installed, remove it with:

sudo snap remove microk8s


Installing a specific version of MicroK8s isn’t mentioned in the documentation.

Nonetheless, with a supported version of MicroK8s installed on a three-node cluster, it was then time to see if Rancher could provision this time. Here’s how that’s done.

Log into Rancher with the admin credentials you set when you first logged into the service. Once logged in click the Create button for clusters (Figure 1).

Figure 1: As you can see, my first cluster has still yet to provision. Fingers crossed for the next attempt.

This time, I added the new cluster with the supporting version of MicroK8s using the command displayed by Rancher (Figure 2) and crossed my fingers it would successfully provision.

Figure 2: The rancher-agent deployment command I ran on my master node of the MicroK8s cluster.

When the deployment command was completed, I clicked done and waited for the provisioning response. To my surprise, after creating the new cluster, I wound up seeing the following error on the local machine:

Failed to get GlobalRoleBinding for 'globaladmin-user-hbgvw': %!w(<nil>)


A quick Google search clued me in that this is a known bug that shouldn’t affect anything.

At this point, it’s a matter of waiting to see if the cluster successfully provisions. It’s been over 20 minutes so far and nothing (Figure 3).

Figure 3: The local cluster error appeared after I attempted to connect with the new cluster.

I decided to delete the failed cluster and hope that the new cluster would be able to provision. After 40 minutes the cluster had still yet to provision.

The Local Cluster

Fortunately, this time around I could at least play with the local cluster, which does actually make it very easy to deploy applications. I am guessing, however, that the local cluster is not to be used for production deployments.

Even so, it was at least a means for me to see just how powerful and easy-to-use Rancher actually is. To deploy an app, select your local cluster and then click Apps > Charts. Of course, if this were a production environment, you’d want to make sure to select a provisioned cluster (instead of Local). In the resulting window (Figure 4), you can select from quite a good number of apps that can be installed thanks to Helm.

Figure 4: Installing any number of apps with Rancher is decidedly simple.

Select the app you want to install and, on the resulting page, click Install (Figure 5).

Figure 5: Installing Apache Cassandra with Rancher.

You will then be greeted by an installation wizard that allows you to configure the deployment of the app. Which app you choose will determine the steps in the wizard.

With Cassandra, all I had to do was configure the namespace and then I was presented with the YAML file for further customization (if needed). Click Install again and Rancher will do its thing.

After the app was installed, I could check the dashboard and see that it was successfully up and running (Figure 6).

Figure 6: Apache Cassandra has been successfully deployed with Rancher.

After playing around with the Local Cluster, I was finally able to see the value in Rancher. The Helm integration is fantastic, making it incredibly easy to install from a large number of apps and services.

The Conclusion

Now that I’ve seen what Rancher can do, I must say I’m seriously impressed. Even though I was never able to test the MicroK8s cluster, I was able to see just how powerful this platform truly is.

I will say, however, that although Rancher does make managing Kubernetes considerably easier (when compared to the CLI), getting it up and running is not nearly as simple as managing your deployments. I'm absolutely certain the problem with provisioning my cluster is on me, but after following all of the advice I've been given and still watching Rancher fail to provision my MicroK8s cluster, I remain convinced the deployment of Rancher could be made easier. But the truth is, once you get past the deployment of the system, Rancher does, in fact, make Kubernetes simple.

The post My Further Adventures (and More Success) with Rancher appeared first on The New Stack.

]]>
How to Protect Containerized Workloads at Runtime https://thenewstack.io/how-to-protect-containerized-workloads-at-runtime/ Tue, 30 May 2023 11:00:22 +0000 https://thenewstack.io/?p=22709118

Security is (finally) getting its due in the enterprise. Witness trends such as DevSecOps and the “shift left” approach —

The post How to Protect Containerized Workloads at Runtime appeared first on The New Stack.

]]>

Security is (finally) getting its due in the enterprise. Witness trends such as DevSecOps and the “shift left” approach — meaning to move security as early as possible into development pipelines. But the work is never finished.

Shift left and similar strategies are generally good things. They begin to address a long-overdue problem of treating security as a checkbox or a final step before deployment. But in many cases that is still not quite enough for the realities of running modern software applications. The shift left approach might only cover the build and deploy phases, for example, but not apply enough security focus to another critical phase for today's workloads: runtime.

Runtime security “is about securing the environment in which an application is running and the application itself when the code is being executed,” said Yugal Joshi, partner at the technology research firm Everest Group.

The emerging class of tools and practices for runtime security aims to address three essential security challenges in the age of containerized workloads, Kubernetes and heavily automated CI/CD pipelines, according to Utpal Bhatt, CMO at Tigera, a security platform company.

First, the speed and automation intrinsic to modern software development pipelines create more threat vectors and opportunities for vulnerabilities to enter a codebase.

Second, the orchestration layer itself, like Kubernetes, also heavily automates the deployment of container images and introduces new risks.

Third, the dynamic nature of running container-based workloads, especially when those workloads are decomposed into hundreds or thousands of microservices that might be talking to one another, creates a very large and ever-changing attack surface.

“The threat vectors increase with these types of applications,” Bhatt told The New Stack. “It’s virtually impossible to eliminate these threats when focusing on just one part of your supply chain.”

Runtime Security: Prevention First

Runtime security might sound like a super-specific requirement or approach, but Bhatt and other experts note that, done right, holistic approaches to runtime security can bolster the security posture of the entire environment and organization.

The overarching need for strong runtime security is to shift from a defensive or detection-focused approach to a prevention-focused approach.

“Given the large attack surface of containerized workloads, it’s impossible to scale a detection-centric approach to security,” said Mikheil Kardenakhishvili, CEO and co-founder of Techseed, one of Tigera’s partners. “Instead, focusing on prevention will help to reduce attacks and subsequently the burden on security teams.”

Instead of a purely detection-based approach, one that often burns out security teams and puts them in the position of being seen as bottlenecks or inhibitors by the rest of the business, the best runtime security tools and practices, according to Bhatt, implement a prevention-first approach backed by traditional detection response.

“Runtime security done right means you’re blocking known attacks rather than waiting for them to happen,” Bhatt said.

Runtime security can provide common services as a platform offering that any application can use for secure execution, noted Joshi, the Everest Group analyst.

“Therefore, things like identity, monitoring, logging, permissions, and control will fall under this runtime security remit,” he said. “In general, it should also provide an incident-response mechanism through prioritization of vulnerability based on criticality and frequency. Runtime security should also ideally secure the environment, storage, network and related libraries that the application needs to use to run.”

A SaaS Solution for Runtime Security

Put in more colloquial terms: Runtime security means securing all of the things commonly found in modern software applications and environments.

The prevention-first, holistic approach is part of the DNA of Calico Open Source, an open source networking and network security project for containers, virtual machines, and native host-based workloads, as well as Calico Cloud and Calico Enterprise, the latter of which is Tigera’s commercial platform built on the open source project it created.

Calico Cloud, a Software as a service (SaaS) solution focused on cloud native apps running in containers with Kubernetes, offers security posture management, robust runtime security for identifying known threats, and threat-hunting capabilities for discovering Zero Day attacks and other previously unknown threats.

These four components of Calico — securing your posture in a Kubernetes-centric way, protecting your environment from known attackers, detecting Zero Day attacks, and incident response/risk mitigation — also speak to four fundamentals for any high-performing runtime security program, according to Bhatt.

Following are the four principles to follow for protecting your runtime.

4 Keys to Doing Runtime Security Right

1. Protect your applications from known threats. This is core to the prevention-first mindset, and focuses on ingesting reliable threat feeds that your tool(s) continuously check against — not just during build and deploy but during runtime as well.
Examples of popular, industry-standard feeds include network addresses of known malicious servers, process file hashes of known malware, and the OWASP Top 10 project.

2. Protect your workloads from vulnerabilities in the containers. In addition to checking against known, active attack methods, runtime security needs to proactively protect against vulnerabilities in the container itself and in everything the container needs to run, including the environment.

This isn't a "check once" type of test, but a virtuous feedback loop that should include enabling security policies that protect workloads from any vulnerabilities, including limiting communication or traffic between services that aren't known and trusted, or when a risk is detected (a minimal policy sketch follows this list).

3. Detect and protect against container and network anomalous behaviors. This is “the glamorous part” of runtime security, according to Bhatt, because it enables security teams to find and mitigate suspicious behavior in the environment even when it’s not associated with a known threat, such as with Zero Day attacks.

Runtime security tools should be able to detect anomalous behavior in container or network activity and alert security operations teams (via integration with security information and event management, or SIEM, tools) to investigate and mitigate as needed.

4. Assume breaches have occurred; be ready with incident response and risk mitigation. Lastly, even while shifting to a prevention-first, detection-second approach, Bhatt said runtime security done right requires a fundamental assumption that your runtime has already been compromised (and will occur again). This means your organization is ready to act quickly in the event of an incident and minimize the potential fallout in the process.
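
For the policy piece mentioned in the second principle, a plain Kubernetes NetworkPolicy gives the flavor of what limiting communication between services looks like; Calico builds on and extends this model, and the namespace and labels here are hypothetical:

cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-frontend-only
  namespace: shop                # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: api                   # the workload being protected
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend      # only the trusted frontend may connect
      ports:
        - protocol: TCP
          port: 8080
EOF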

Zero trust is also considered a best strategy for runtime security tools and policies, according to Bhatt.

The bottom line: The perimeter-centric, detect-and-defend mindset is no longer enough, even if some of its practices are still plenty valid. As Bhatt told The New Stack: “The world of containers and Kubernetes requires a different kind of security posture.”

Runtime security tools and practices exist to address the much larger and more dynamic threat surface created by containerized environments. Bhatt loosely compared today’s software environments to large houses with lots of doors and windows. Legacy security approaches might only focus on the front and back door. Runtime security attempts to protect the whole house.

Bhatt finished the metaphor: “Would you rather have 10 locks on one door, or one lock on every door?”

The post How to Protect Containerized Workloads at Runtime appeared first on The New Stack.

]]>
Can Rancher Deliver on Making Kubernetes Easy? https://thenewstack.io/can-rancher-deliver-on-making-kubernetes-easy/ Sat, 27 May 2023 14:00:18 +0000 https://thenewstack.io/?p=22708481

Over the past few years, Kubernetes has become increasingly difficult to deploy. When you couple that with the idea that

The post Can Rancher Deliver on Making Kubernetes Easy? appeared first on The New Stack.

]]>

Over the past few years, Kubernetes has become increasingly difficult to deploy. When you couple that with the idea that Kubernetes itself can be a challenge to learn, you have the makings of a system that could have everyone jumping ship for the likes of Docker Swarm.

I’m always on the lookout for easier methods of deploying Kubernetes for development purposes. I’ve talked extensively about Portainer (which I still believe is the best UI for container management) and have covered other Kubernetes tools, such as another favorite, MicroK8s.

Recently, I’ve started exploring Rancher, a tool that hasn’t (for whatever reason) been on my radar to this point. The time for ignoring the tool is over and my initial experience so far has been, shall I say, disappointing. One would expect a tool with a solid reputation for interacting with Kubernetes to be easy to deploy and use. After all, the official Rancher website makes it clear it is “Kubernetes made simple.” But does it follow through with that promise?

Not exactly.

Let me explain by way of walking you through the installation and the first steps of both Rancher on a server and the Rancher Desktop app.

One thing to keep in mind is that this piece is a work in progress and this is my initial experience with the tool. I will continue my exploration with Rancher as I learn more about the system. But this initial piece was undertaken after reading the official documentation and, as a result, made a few discoveries in the process. I will discuss those discoveries (and the results from them) in my next post.

I’m going to show you how I attempted to deploy Rancher on Ubuntu Server 22.04.

Installing Rancher on Ubuntu Server 22.04

Before you dive into this, there’s one very important thing you need to know. Installing Rancher this way does not automatically give you a Kubernetes cluster. In fact, you actually need a Kubernetes cluster already running. This is only a web-based GUI. And even then, it can be problematic.

The first step to installing Rancher on Ubuntu Server is to log into your Ubuntu server instance. That server must have a regular user configured with sudo privileges and a minimum of 2 CPU Core and 4 GB RAM.

Once you’ve logged in, you must first install a few dependencies with the command:

sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release -y


Next, add the necessary GPG key with:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg


Add the official Docker repository:

echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] \
https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null


Update apt with:

sudo apt-get update


Install the latest version of the Docker CE runtime engine:

sudo apt-get install docker-ce docker-ce-cli containerd.io -y


Add your user to the docker group with the command:

sudo usermod -aG docker $USER


Finally, log out and log back in for the changes to take effect.

Deploy Rancher

Now that Docker is installed, you can deploy Rancher with:

docker run -d --name=rancher-server --restart=unless-stopped -p 80:80 -p 443:443 --privileged rancher/rancher:v2.4.18


An older version of Rancher must be used because the latest version fails to start.

The deployment will take some time to complete. When it does open a web browser and point it to http://SERVER (where SERVER is the IP address of the hosting server). You’ll be greeted by the welcome screen, where you must set a password for the admin user (Figure 1).

Figure 1: Setting a password for the default Rancher admin user.

In the next window (Figure 2), you must set the Rancher Server URL. If you’ll be using an IP address, leave it as is. If you’ll use a domain, change the entry and click Save URL.

Figure 2: Setting the URL for the Rancher server.

You will then be prompted to add a cluster (Figure 3). If your cluster is in-house, select “From existing nodes”. If you’ll be using a cluster from a third party, select the service.

Figure 3: Selecting the cluster type for your deployment.

In the resulting window (Figure 4), fill out the necessary details and configure the cluster as needed. At the bottom of the window, click Next.

Figure 4: The Custom cluster configuration window.

You will then be given a command to run on your Kubernetes cluster (Figure 5).

Figure 5: The command must be run on a supported version of Docker (I used the latest version of Docker CE).

After the command completes on the Kubernetes server, click Done.

At this point, the cluster should register with Rancher. “Should” being the operative term. Unfortunately, even though my Kubernetes cluster was running properly, the registration never succeeded. Even though the new node was listed in the Nodes section, the registration hadn’t been completed after twenty minutes. This could be because my Kubernetes cluster is currently being pushed to its limits. Because of that, I rebooted every machine in the cluster and tried again.

No luck.

My guess is the problem with my setup is the Kubernetes cluster was deployed with MicroK8s and Rancher doesn’t play well with that system. Although you can deploy Rancher with MicroK8s, Helm, and a few other tools, that process is quite challenging.

I decided to bypass deploying Rancher on Ubuntu Server and went straight to Rancher Desktop. After all, Rancher Desktop is supposed to be similar to Docker Desktop, only with a Kubernetes backend.

Here’s the process of installing Rancher Desktop on Pop!_OS Linux:

  1. First, check to make sure you have kvm privileges with the command [ -r /dev/kvm ] && [ -w /dev/kvm ] || echo 'insufficient privileges'
  2. Generate a GPG key with gpg --generate-key
  3. Copy your GPG key and add it to the command pass init KEY (where KEY is your GPG key)
  4. Allow Traefik to listen on port 80 with sudo sysctl -w net.ipv4.ip_unprivileged_port_start=80
  5. Add the Rancher GPG key with the command curl -s https://download.opensuse.org/repositories/isv:/Rancher:/stable/deb/Release.key | gpg --dearmor | sudo dd status=none of=/usr/share/keyrings/isv-rancher-stable-archive-keyring.gpg
  6. Add the official Rancher repository with echo 'deb [signed-by=/usr/share/keyrings/isv-rancher-stable-archive-keyring.gpg] https://download.opensuse.org/repositories/isv:/Rancher:/stable/deb/ ./' | sudo dd status=none of=/etc/apt/sources.list.d/isv-rancher-stable.list
  7. Update apt with the command sudo apt update
  8. Install Rancher Desktop with sudo apt install rancher-desktop -y

Launch Rancher Desktop from your desktop menu and accept the default PATH configuration (Figure 6).

Figure 6: The only configuration option you need to set for Rancher Desktop.

Rancher Desktop will then download and start the necessary software to run. Once that completes, you’ll find yourself on the Welcome to Rancher Desktop window (Figure 7).

Figure 7: The main Rancher Desktop window.

Here's where things take a turn for the confusing. With Rancher Desktop, the only things you can actually do are manage port forwarding, pull and build images, scan images for vulnerabilities (which is a very handy feature), and troubleshoot. What you cannot do is deploy containers.

To do that, you have to revert to the command line using the nerdctl command which, oddly enough, isn’t installed along with Rancher Desktop on Linux. I did run a test by installing Rancher Desktop on macOS and found that nerdctl was successfully installed, leading me to believe this is a Linux issue. Another thing to keep in mind is that the macOS installation of Rancher Desktop is considerably easier. However, it suffers from the same usability issues as it does on Linux.

If you’d like to keep experimenting with Rancher Desktop, you’ll need to get up to speed with nerdctl which I demonstrated here.

You can also build an image with Rancher Desktop by clicking Images > Add Image and then clicking the Build tab. Give your image a name and click Build. You then must select a build directory. What it doesn't tell you is that the build directory must contain a proper Dockerfile. With the Dockerfile in place, the image will build.
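
For reference, even a two-line Dockerfile satisfies the build step; this example is entirely generic:

cat > Dockerfile <<'EOF'
# Minimal example image: a static page served by nginx
FROM nginx:alpine
COPY index.html /usr/share/nginx/html/index.html
EOF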

Maybe the GUI should key users in on that fact.

Once the image is built, you should be good to go to deploy a container based on that image. Right? Not within Rancher Desktop you can’t. Instead, you have to go back to the terminal window and deploy the container with the nerdctl command.
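
In practice that means commands like the following; the image and container names are placeholders, and you may need nerdctl's --namespace flag if the image was built into the containerd namespace Rancher Desktop uses for Kubernetes:

# Deploy and inspect a container from the image you just built.
nerdctl run -d --name my-app -p 8080:80 my-image:latest
nerdctl ps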

How is any of this Kubernetes made simple? It’s not. If you want Kubernetes made simple, you go with the MicroK8s/Portainer combo and call it a day.

From my perspective, if you’re going to claim that your product makes Kubernetes simple (which is a big promise, to begin with), you shouldn’t require users to jump through so many hoops to reach a point where they can successfully work with the container management platform. Simple is a word too many companies use these days but fail to deliver on.

The post Can Rancher Deliver on Making Kubernetes Easy? appeared first on The New Stack.

]]>
Don’t Force Containers and Disrupt Workflows https://thenewstack.io/dont-force-containers-and-disrupt-workflows/ Thu, 25 May 2023 22:10:20 +0000 https://thenewstack.io/?p=22709074

How do you allow people to use their technologies in their workflows? The first thing you do is not force

The post Don’t Force Containers and Disrupt Workflows appeared first on The New Stack.

]]>

How do you allow people to use their technologies in their workflows? The first thing you do is not force people to use containers, says Rob Barnes, a senior developer advocate at HashiCorp, in this episode of The New Stack Makers.

Barnes came by The New Stack booth at KubeCon Europe in Amsterdam to discuss how HashiCorp builds intent into Consul so users may use containers or virtual machines in their workflows.

Consul from HashiCorp is one of the early implementations of service mesh technology, writes Janakiram MSV in The New Stack. “It comes with a full-featured control plane with service discovery, configuration, and segmentation functionality. The best thing about Consul is the support for various environments including traditional applications, VMs, containers, and orchestration engines such as Nomad and Kubernetes.”

Consul is, at heart, a networking service that provides identity, for example, in Kubernetes. A service mesh knows about all services across the stack. In Kubernetes, Helm charts get configured to register the services to Consul automatically. That’s a form of intent. Trust is critical to that intent in Kubernetes.

“We can then assign identity — so in a kind of unofficial way, Consul has almost become an identity provider for services,” Barnes said.

In Consul, identity helps provide more granular routing to services, Barnes said. Consul can dictate what services can talk to each other. The intent gets established. A rules-based system, for instance, may dictate what services can talk to each other and which can’t.
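
Depending on the Consul version, that intent can be expressed through service-intentions config entries or the older intention CLI; here is a rough sketch with hypothetical service names:

# Allow "web" to reach "api", deny a legacy batch job, then verify the result.
consul intention create -allow web api
consul intention create -deny legacy-batch api
consul intention check web api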

“I think that’s an opportunity that HashiCorp has taken advantage of,” Barnes said. “We can do a lot more here to make people’s lives easier and more secure.”

So what’s the evolution of service mesh?

“There’s a lot of misconceptions with service mesh,” Barnes said. “As I say, I think people feel that if you’re using service meshes, that means you’re using containers, right? Whereas, like, I can speak for Consul specifically, that’s not the case. Right? I think the idea is that if more service meshes out, they make themselves a bit more flexible and meet people where they are. I think the adoption of the service mesh, and all the good stuff that comes with it, is only going to grow.”

“So I think what’s next for service mesh isn’t necessarily the service mesh itself. I think it’s people understanding how it fits into the bigger picture. And I think it’s an educational piece and where there are gaps, maybe we as vendors need to make some advances.”

The post Don’t Force Containers and Disrupt Workflows appeared first on The New Stack.

]]>
Overcoming the Kubernetes Skills Gap with ChatGPT Assistance https://thenewstack.io/overcoming-the-kubernetes-skills-gap-with-chatgpt-assistance/ Tue, 23 May 2023 18:00:45 +0000 https://thenewstack.io/?p=22708898

The challenge for Kubernetes (K8s) has been its complexity coupled with the lack of in-house skills to provision and troubleshoot

The post Overcoming the Kubernetes Skills Gap with ChatGPT Assistance appeared first on The New Stack.

]]>

The challenge for Kubernetes (K8s) has been its complexity coupled with the lack of in-house skills to provision and troubleshoot Kubernetes clusters in the cloud or on-premises. While K8s adoption is consistently growing, many developers and operations teams are still relatively new to the technology, having to learn new terms, workflows, tools and more.

K8s’ discrete parts require a broad level of expertise, even just in the setup process. Consider that spinning up a K8s cluster requires knowledge about and configuration of multiple components from pods to services, not to mention resources like etcd, API servers, kubelet and kube-proxy.

Then there’s scheduling, scaling and networking to contend with. One misstep can quickly translate into myriad issues with scalability, reliability, even security.

Further, the ecosystem itself is growing and evolving rapidly. Tools and add-ons can overwhelm beginners and are hard to keep up with. Formal education and training around Kubernetes is not necessarily part of every developer’s background, academically or professionally.

And we can’t forget that the technology, with many moving parts and complex interactions, can be difficult and time-consuming to troubleshoot when failures occur. Diagnosing the causes of the failures requires in-depth technical knowledge and expertise that often live inside a few experienced engineers’ heads.

Let’s dive deeper and explore new and innovative ways that can help overcome the glaring skills gap problem.

Upskill Your Team with ChatGPT Assistance

Kubernetes is challenging to learn and use effectively because there’s no one-size-fits-all approach to it. K8s is highly customizable and can be configured in many different ways, depending on the specific needs of your application or infrastructure. It’s often difficult to apply what you’ve learned from documentation (and there is plenty) and training to an existing environment because teams lack contextual understanding and visibility of their architecture.

What does the current architecture look like? What pods are tied to a specific namespace? What is the health of the nodes? Asking basic questions of our environment requires context switching between the AWS console, the kubectl command line, Terraform configuration files and monitoring tools.
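
For a sense of what answering even these basic questions by hand looks like, here is a minimal sketch using the official Kubernetes Python client. It assumes the `kubernetes` package is installed and a kubeconfig that points at the cluster; the namespace name is illustrative.

```python
# Sketch: answering "what pods are in this namespace?" and "are the nodes healthy?"
# with the official Kubernetes Python client. Assumes `pip install kubernetes`
# and a kubeconfig that can reach the cluster.
from kubernetes import client, config

config.load_kube_config()          # or config.load_incluster_config() inside a pod
core = client.CoreV1Api()

namespace = "default"              # illustrative namespace
for pod in core.list_namespaced_pod(namespace).items:
    print(f"pod {pod.metadata.name}: {pod.status.phase}")

for node in core.list_node().items:
    ready = next(
        (c.status for c in node.status.conditions if c.type == "Ready"), "Unknown"
    )
    print(f"node {node.metadata.name}: Ready={ready}")
```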

What if we could ask ChatGPT these questions?

Let’s look at an example of using PromptOps powered by ChatGPT to understand all the deployments within a cluster. PromptOps offers a free Kubernetes consulting tool that lets users ask questions and get immediate assistance in the form of Bash scripts, documentation references and other helpful resources.

By providing PromptOps fragmented data of our infrastructure from different sources, such as Confluence, Notion, Terraform config files and more, we expect PromptOps to be able to quickly aggregate all the information and help visualize the architecture. Instead of manually running kubectl commands to check the deployments, we prompted PromptOps over a chat to describe which deployments are present in our cluster.

Here is PromptOps’ response:

PromptOps > I suggest running this script to answer your question.

PromptOps gave a script to run along with a reference to a Notion page which contains information about the cluster.

Upon executing the script, PromptOps provided the output of the deployments within the cluster.

PromptOps also provided a visual graph of all the deployments in the cluster to make learning much easier.
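
The article does not reproduce the generated script, so the following is a hypothetical stand-in for the kind of check such a script boils down to: listing every deployment in the cluster with the Kubernetes Python client. It is not actual PromptOps output.

```python
# Hypothetical stand-in for the kind of script a tool like PromptOps might return
# for "which deployments are in my cluster?". Not the actual PromptOps output.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

for dep in apps.list_deployment_for_all_namespaces().items:
    ready = dep.status.ready_replicas or 0
    total = dep.spec.replicas or 0
    print(f"{dep.metadata.namespace}/{dep.metadata.name}: {ready}/{total} replicas ready")
```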

The response from PromptOps simplifies Kubernetes infrastructure management, empowering everyone on the DevOps team to get up to speed on current architecture and keep up with changes.

Save Time with Automated Incident Investigation

Let’s look at troubleshooting production issues and see how PromptOps can help identify the root cause of failures in Kubernetes. Kubernetes metrics and logs offer critical insight across all levels from the cluster down to the container. They provide information on resource utilization, pod and node status, cluster autoscaler metrics, network errors, container runtimes and more.

But the issue with all these metrics is that they’re not immediately useful. A log stream or metric chart doesn’t provide the necessary context or direction. There are tools to help, but they often make things more complicated, requiring teams to pull context from across the whole tech stack just to see what really happened or who did what when issues arise.

At CtrlStack, we’ve been incorporating PromptOps to analyze causal graphs, Kubernetes events, metrics and logs to reduce the challenges of manual troubleshooting. The goal is to help DevOps teams quickly understand the Kubernetes issues by using PromptOps to generate root cause analysis summaries that explain the cause. In this example, a causal graph of the root cause analysis dashboard within CtrlStack is input to PromptOps for troubleshooting and resolution as described below.

Here is an example of a causal graph of the root cause analysis dashboard.

With this, we prompted PromptOps to explain the graph. Below is PromptOps’ response:

PromptOps > Here is the explanation of the graph:

  1. The graph is a visualization of the root cause of an error in a Kubernetes pod. The error is caused by changes in the pod’s image, which are similar to changes in the Kubernetes pod_error metric. The error is also similar to the inflection points found in the kubernetes_uptime and kubernetes_pod_mem_usage metrics, and to the topology changes in node1.
  2. The graph is showing the root cause of a kubernetes_pod_error change. There are three insights. The first is that the image has changed. The second is that there are inflection points found in the metrics kubernetes_uptime and kubernetes_pod_mem_usage. The third is that there is a topology change for node1.

The response from PromptOps provided helpful correlation of information across uptime and memory usage metrics as well as related topology changes. These insights involved checking the Kubernetes change events, metrics, resource usage, and topology changes. Based on this automated incident investigation, developers and operators should have sufficient context to quickly determine next steps to fix the problem.
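
The inflection-point insight above can be illustrated with a toy sketch: flag the samples where a metric jumps sharply relative to the previous reading. This is not CtrlStack's algorithm, just the underlying idea, and the memory readings are made up.

```python
# Toy inflection-point check, not CtrlStack's actual algorithm: flag the samples
# where a metric jumps by more than a threshold relative to the previous value.
def inflection_points(samples, threshold=0.5):
    """samples: list of (timestamp, value); threshold: fractional change."""
    flagged = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v0 and abs(v1 - v0) / abs(v0) > threshold:
            flagged.append((t1, v0, v1))
    return flagged

pod_mem = [(0, 210), (60, 215), (120, 640), (180, 655)]  # illustrative MB readings
for ts, before, after in inflection_points(pod_mem):
    print(f"t={ts}s: memory jumped from {before}MB to {after}MB")
```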

Combining ChatGPT with Change Intelligence for Closing the Skills Gap

Based on the examples provided, it’s apparent that ChatGPT can provide valuable assistance to close the Kubernetes skills gap. ChatGPT offers fast insights and clear explanations for DevOps teams to troubleshoot and fix production issues. This empowers junior operators as well as developers who are new to Kubernetes to learn the technology and resolve common issues independently.

While ChatGPT’s responses can provide a quick understanding of the problem, it requires contextual information specific to different issues of Kubernetes deployments. That’s where change intelligence comes in. Change intelligence platforms provide the causal graphs that connect resource capacity, infrastructure changes, configuration changes, metrics history charts and an event timeline to optimize the path to root cause analysis.

A ChatGPT-based approach to learning Kubernetes has the potential to significantly improve DevOps productivity while eliminating cognitive overload. By combining ChatGPT with change intelligence, teams can uplevel their Kubernetes skills and gain better observability.

The post Overcoming the Kubernetes Skills Gap with ChatGPT Assistance appeared first on The New Stack.

]]>
Could WebAssembly Be the Key to Decreasing Kubernetes Use? https://thenewstack.io/could-webassembly-be-the-key-to-decreasing-kubernetes-use/ Mon, 22 May 2023 13:00:06 +0000 https://thenewstack.io/?p=22708613

WebAssembly, aka Wasm, is already changing how companies deploy Kubernetes, according to Taylor Thomas, a systems engineer and director of

The post Could WebAssembly Be the Key to Decreasing Kubernetes Use? appeared first on The New Stack.

]]>

WebAssembly, aka Wasm, is already changing how companies deploy Kubernetes, according to Taylor Thomas, a systems engineer and director of customer engineering at Cosmonic. Fortune 100 companies are spinning down Kubernetes clusters to use Wasm instead, he said.

There will always be a place for Kubernetes, he added — just perhaps not as an ad hoc development platform.

“We’ve seen so many companies in the Fortune 100 who we’ve talked to who are getting rid of Kubernetes teams and spinning down Kubernetes clusters,” Thomas told The New Stack. “It’s just so expensive. It’s so wasteful that the utilization numbers we get from most people are 25 to 35%.”

Kubernetes forces developers to care about infrastructure and they don’t necessarily want to, he added.

“Basically, developers have to care about their infrastructure much more than they need to,” he said. “A lot of these things around microservices, we did them in Kubernetes because that was a great way to do it before we had stuff like WebAssembly, but microservices and functions … all those things work better in a world where WebAssembly exists because you focus just on writing that code.”

WebAssembly, or Wasm, is a low-level bytecode that can be translated to assembly. A bytecode is computer object code that an interpreter converts into binary machine code so it can be read by a computer’s hardware processor.

Cosmonic Bets on Open Source

Cosmonic is counting on Wasm winning. In April, the WebAssembly platform-as-a-service company launched its open beta and released Cosmonic Connect, a set of third-party connectors designed to simplify Wasm integration. The first Cosmonic Connect integration to launch was Cosmonic Connect Kubernetes.

“You can now connect Kubernetes clusters with a single command,” he said. “We manage all the Wasm cloud-specific bits. We have a beautiful UI you can use to see and manage these things.”

Cosmonic is also involved in furthering WebAssembly standards, including the proposed component model. With the component model, language silos could be broken down by compiling to Wasm, Thomas said. The function then becomes like Lego blocks — developers could combine functions from different languages into WebAssembly and the functions would work together, he added.

“We’ve been focusing on a common set of contracts that we’ve been using at Wasm cloud for a long time, and we’re now centralizing on in the WebAssembly community called wasi-cloud,” he said. “These things are wasi key value, wasi messaging — [if] you want to use a key-value database in 80% of the use cases, you just need the same five functions — get set, put, all these common things — and so it’s defined by an interface.”

That will allow developers to “click” code from different languages together, he said.
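
The actual wasi-cloud contracts are defined in WIT, not Python, but a small Python sketch illustrates the "same few functions behind an interface" idea: any backend that satisfies the contract can be swapped in without the caller changing.

```python
# Illustration only: the real wasi-keyvalue contract is defined in WIT, not Python.
# The point is that callers depend on a handful of functions, not on a backend.
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """The shape of the contract callers program against."""

    def get(self, key: str) -> Optional[bytes]: ...
    def set(self, key: str, value: bytes) -> None: ...
    def delete(self, key: str) -> None: ...
    def exists(self, key: str) -> bool: ...


class InMemoryStore:
    """Any backend (Redis, DynamoDB, etc.) could sit behind the same four calls."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str): return self._data.get(key)
    def set(self, key: str, value: bytes): self._data[key] = value
    def delete(self, key: str): self._data.pop(key, None)
    def exists(self, key: str): return key in self._data
```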

“That language barrier is so incredibly powerful — that really fundamentally changes how we put together applications,” Thomas said. “Because of WebAssembly being able to compile from any language, that thing you’re using could be written in Rust or C, and the thing you’re writing could be in Go or Python, and then they plug together when they actually run.”

That doesn’t just break the language barrier — it can also break down vendor barriers because now everything can be moved around, he added. Components will also liberate developers from being locked into custom software development kits (SDKs) or libraries, he said.

“It’s a walled garden and we don’t want that to be the case. We want it to be you just write against the contracts and we provide the stuff you need for our platform but you just focus on the code part of it,” he said. “That’s very different than all these other approaches where you either had to confine yourself to a specific language or a specific type of way things were set up or any of those kinds of details.”

Cosmonic also is a maintainer on the CNCF project wasmCloud and works with the Wasm cloud application deployment manager [WADM] standard. He compared WADM to running a YAML file.

“WADM gives you the ability to connect to something to use a familiar pattern,” Thomas said. “A user is able to define their application, they can say, Okay, here’s the dependencies I’m using that I’m going to link and at runtime, here’s the configuration I’m passing to it. And here’s the code I’m running. And they can specify all those things where they want to run it, and then it’ll run it everywhere for them, and then automatically reconcile if something disappears, or something moves around.”

The post Could WebAssembly Be the Key to Decreasing Kubernetes Use? appeared first on The New Stack.

]]>
A Boring Kubernetes Release https://thenewstack.io/a-boring-kubernetes-release/ Fri, 19 May 2023 19:23:49 +0000 https://thenewstack.io/?p=22708175

Kubernetes release 1.27 is boring, says Xander Grzywinski, a senior product manager at Microsoft. It’s a stable release, Grzywinski said

The post A Boring Kubernetes Release appeared first on The New Stack.

]]>

Kubernetes release 1.27 is boring, says Xander Grzywinski, a senior product manager at Microsoft.

It’s a stable release, Grzywinski said on this episode of The New Stack Makers from KubeCon Europe in Amsterdam.


“It’s reached a level of stability at this point,” said Grzywinski. “The core feature set has become more fleshed out and fully realized.”

The release has 60 total features, Grzywinski said. The features in 1.27 are solid refinements of features that have been around for a while. It’s helping Kubernetes be as stable as it can be.

Examples?

It has a better developer experience, Grzywinski said. Storage primitives and APIs are more stable.

“Storage primitives have been around in Kubernetes for a while, and people have debated whether you should store persistent data on Kubernetes,” he said. “But I think a lot of those primitives and APIs have become more stable. So one of the new ones that have gotten some excitement is the read-write-once access method. So there’s a feature now where you can restrict access of a storage volume. Only one pod at a time can read and write from it. Things like that. That’s like just general refinement that makes the developer experience a little bit better.”
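
The access mode Grzywinski appears to be describing is ReadWriteOncePod. A minimal sketch of a PersistentVolumeClaim using it, built as a Python dictionary and dumped to YAML (the claim name and size are illustrative):

```python
# The access mode described above appears to be ReadWriteOncePod (one pod at a
# time may read and write the volume). A minimal PVC using it, dumped to YAML.
import yaml  # pip install pyyaml

pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "single-writer-data"},  # illustrative name
    "spec": {
        "accessModes": ["ReadWriteOncePod"],
        "resources": {"requests": {"storage": "1Gi"}},
    },
}

print(yaml.safe_dump(pvc, sort_keys=False))
```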

It’s not all boring.

The Vertical Pod Autoscaler (VPA) is pretty cool, Grzywinski said. It’s in alpha this time, but it will allow pods to scale to larger resources on demand. It will enable users to scale up to a configured level without restarting.

According to its GitHub page, when configured, VPA sets the requests based on usage, “allowing proper scheduling onto nodes so that appropriate resource amount is available for each pod. It will also maintain ratios between limits and requests that were specified in initial containers configuration.”
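
For reference, a VerticalPodAutoscaler object targeting a Deployment looks roughly like the sketch below, built as a Python dictionary and dumped to YAML. Field names follow the VPA project's documentation circa 2023; check them against the release you install, and the Deployment name here is illustrative.

```python
# Sketch of a VerticalPodAutoscaler targeting a Deployment. Field names follow
# the VPA project's docs circa 2023 -- verify against the version you install.
import yaml

vpa = {
    "apiVersion": "autoscaling.k8s.io/v1",
    "kind": "VerticalPodAutoscaler",
    "metadata": {"name": "web-vpa"},
    "spec": {
        "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "web"},
        "updatePolicy": {"updateMode": "Auto"},  # let VPA apply its recommendations
    },
}

print(yaml.safe_dump(vpa, sort_keys=False))
```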

Efforts will continue to communicate better without surprises.

For example, there’s a new deprecation process that came about based on feedback from the community. Grzywinski noted the Dockershim removal that caught a lot of people by surprise in release 1.24.

The New Stack’s Joab Jackson reported in March of 2022 that Dockershim would no longer be supported.

The lesson learned: over-communicate so there are fewer surprises. For example, Grzywinski said a blog is launching dedicated to deprecations and removals that will get pushed out earlier than the regular release blog.

The post A Boring Kubernetes Release appeared first on The New Stack.

]]>
Guardrails Can Be the Common Language to Bring Dev and Ops Together https://thenewstack.io/guardrails-can-be-the-common-language-to-bring-dev-and-ops-together/ Thu, 18 May 2023 14:04:50 +0000 https://thenewstack.io/?p=22708335

With the rise of Kubernetes adoption and overall expansion of the cloud native landscape, DevOps certainly isn’t dead, but it

The post Guardrails Can Be the Common Language to Bring Dev and Ops Together appeared first on The New Stack.

]]>

With the rise of Kubernetes adoption and overall expansion of the cloud native landscape, DevOps certainly isn’t dead, but it is definitely changing. The rise of roles like platform engineer is clearly trying to address this strange adolescence that DevOps is going through in the cloud native era.

When adopting a platform as complex as Kubernetes, even the most polished and smooth-running DevOps pipelines have stumbled across gray areas in cloud native workflows that hadn’t been considered prior to the adoption of K8s, forcing teams to grow and adjust. New-world DevOps teams are beginning to take shape, many of which are being led by the ever-shining platform engineering star.

With a focus on empowering all stakeholders in this growth process, platform engineers and DevOps teams are challenged to find ways to break down the ever-existing silos between Devs and Ops to achieve a reliable, fast and frictionless delivery process.

While Devs and Ops might call different corners of the Kubernetes configuration life cycle home, they do share a common goal: Deploy the desired state so their applications run smoothly.

Unfortunately, this means they don’t necessarily share a common language. I’m not talking about how they talk on Zoom calls or in Slack messages to one another — there are already plenty of ways to navigate those language barriers. I’m talking about how these folks — developers, DevOps engineers and platform engineers — keep applying new trends in development culture, like shift left, while struggling to find cloud native tools that work best for their emergent blended workflows.

Often, chosen tools seem ideal for their perceived area of focus, like an IDE versus a cluster management platform, but each simply creates another language trying to determine the same thing: whether the cluster and its applications are working as desired.

Language barriers in tooling create cracks where mistakes start to slip in. And when the environment is complex and the cost of mistakes is high, as with Kubernetes, those with limited experience and those who can’t see the complete picture start to operate out of fear. To avoid introducing costly errors, they stop developing altogether.

What these blended teams need to do to shift left properly is create a common language for answering questions about desired state and cluster health.

Don’t worry, I’m not about to say the common language should be YAML.

The Complicated Landscape of Kubernetes Tooling

Despite the blending of tasks that comes with shift left, like moving testing and validation to the precommit phase rather than the tail end of the CI/CD pipeline, each person involved in the Kubernetes configuration life cycle arrives to work with different ideas about the right tool for their job. They still operate out of a siloed mindset.

  • Configuration developers and backend engineers, for example, do most of their work in an integrated development environment (IDE), like VSCode, using multiple plugins to create a comfortable environment for authoring in YAML, validating changes and interacting with git for collaboration. They likely don’t think their tool choice has any implication on those who pitch in on other tasks in the configuration life cycle, like cluster management, but they can’t see the full picture.
  • DevOps engineers are either asked to fix problems other people created, or rest firmly on the critical path to deployment, spending their days fielding questions from everyone else around the configuration life cycle instead of adding their high-value contributions. They need tools designed for collaboration but don’t want to waste time digging through Slack messages or git commits to understand where they could smooth out operations problems or add key optimizations.
  • Platform engineers are responsible for picking a dozen tools, developing the integrative “glue” with webhooks or APIs to get them all to work together smoothly, and then convincing everyone to hop aboard this new “paved road” experience in an internal development platform. For their development and engineering peers, this platform abstracts away all the complexity by providing a simple self-service/ClickOps experience. But behind the curtain, platform engineers are constantly working to improve internal development platforms by bringing on even more tools and writing more code to minimize conflicts between multiple languages.

Many larger organizations create a common language through platform engineering. It’s a perfectly valid and proven strategy; there are engineers on staff to keep the internal platform online. It’s not an easy move for startups or smaller outfits because of how difficult (and expensive) it can be to hire a successful platform engineering team on top of the application and DevOps engineers required to build applications and experiences.

Let’s consider another way. Instead of an abstracted internal development platform, there’s a common language that empowers people on all corners of the Kubernetes configuration life cycle: guardrails. By defining (and enforcing) what developers and engineers are not allowed to deploy, they have a common ground for learning new techniques, encouraging each other’s continuous education and deploying better software.

How Monokle Unifies Kubernetes Languages and Life Cycles

Monokle is a set of tools — Desktop, CLI and Cloud — that creates a single common language for the entire life cycle. Instead of each team member arriving with new tools and a different language about how to best create and maintain high-quality Kubernetes configurations, they can create blended workflows that don’t require a dozen tools magically working in sync.

Developers and configuration engineers can use Monokle Desktop to manage their day-to-day work on K8s YAML configurations, which is typically hidden on their laptops and IDEs until they’re finally ready to push their branch and create a pull request into a transparent platform for ongoing collaboration. As they work, Monokle’s guardrails features, like forms that eliminate those papercut-like YAML syntax errors or full-on Open Policy Agent (OPA) validation, prevent errors while instructing them how to improve their configurations in the future.

DevOps engineers can use Monokle’s CLI tooling to add those same features, especially Monokle’s custom validator engine, directly into their CI/CD pipelines for in-depth quality checks at every stage in the configuration life cycle. By doing so, they remove errors from the critical path, deferring questions to the validator and achieving higher quality.

Platform engineers and other team leaders can leverage Monokle Cloud IDE for Policy Enforcement to define and implement the guardrails that both Desktop and CLI adhere to from a central location. By implementing clear guardrails, errors are prevented from reaching production in the first place. Platform leaders can define and create project-specific policies and rules to reflect business objectives, ensure performance, meet security and compliance criteria, and maximize Git workflows by integrating policy validation into every PR, ultimately achieving consistent, high-quality deployments in less time and with fewer resources.
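
To make the guardrail idea concrete, here is a generic sketch of the kind of check a CI step could enforce; it is not Monokle's validator engine. It fails a pipeline if a Deployment manifest uses an unpinned or ":latest" image tag or omits resource limits.

```python
# Generic guardrail illustration, not Monokle's validator engine: fail a CI step
# if a Deployment manifest uses an unpinned image tag or omits resource limits.
import sys
import yaml  # pip install pyyaml


def violations(manifest):
    """Return guardrail violations for a parsed Deployment manifest."""
    problems = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for c in pod_spec.get("containers", []):
        image = c.get("image", "")
        if image.endswith(":latest") or ":" not in image:
            problems.append(f"{c.get('name')}: pin an explicit image tag")
        if "limits" not in (c.get("resources") or {}):
            problems.append(f"{c.get('name')}: missing resource limits")
    return problems


if __name__ == "__main__":
    failed = False
    with open(sys.argv[1]) as f:
        for doc in yaml.safe_load_all(f):
            if doc and doc.get("kind") == "Deployment":
                for p in violations(doc):
                    print("GUARDRAIL:", p)
                    failed = True
    sys.exit(1 if failed else 0)
```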

No matter what version of Monokle these folks might use in their day-to-day work, they’re leveraging the same guardrail-enabling features to achieve common goals in the blended workflows created by shift-left culture:

  • Forms and templates simplify the way developers write their YAML configurations, skipping the frustration of hunting for simple syntax errors that stop deployments in their tracks.
  • Real-time validation helps the most knowledgeable Kubernetes developers on the team to establish best practices and must-follow policies for YAML syntax, OPA rules and the Kubernetes schema itself. With custom rules, DevOps and platform engineers can prevent vulnerable or misconfigured code from even being committed to their git repository in the first place, the purest outcome of the shift-left paradigm.
  • Resource comparisons between local and cluster resources, or the various git branches that define a cluster’s desired state, for anyone (not just DevOps engineers) who needs to quickly identify and understand the impact of proposed changes as they move from a development cluster to a production cluster. With a line-by-line diff, anyone can catch errors like a mistakenly changed secret or a resource limit change that would affect costs.
  • A Git-based foundation, where all changes are written into commits to be pushed to the git provider of choice, which ensures all roles and departments can review and collaborate on what others are doing. There are no silos, hidden scripts or questions about who made what changes and when.
  • A cluster mode dashboard with observability features, logs and terminal access, recent alerts and more. While many organizations restrict monitoring and troubleshooting work exclusively to DevOps engineers, Monokle makes this information available to anyone, another example of democratizing the educational value of having a common language.

The Path to Deploying Your First Guardrail with Monokle

If you’ve felt like the developers and engineers around you speak different languages, no matter which corner of the Kubernetes configuration life cycle you call home, guardrails might be the common language you’ve needed all along. Successfully deploying applications to Kubernetes is an all-hands-on-deck effort, so why shouldn’t your tools accommodate and encourage collaboration and quality control from the very first line of YAML?

Here are a few guardrails to get started with Monokle:

  • To establish best practices via validators and cluster management, download Monokle Desktop or the CLI tooling, both of which are free and open source.

With shift left blurring pre- versus post-deployment tasks, guardrails are the most transparent path toward creating collaborative workflows, folding education into every commit and deploying high-quality releases to ensure that everyone is speaking the same language.

The post Guardrails Can Be the Common Language to Bring Dev and Ops Together appeared first on The New Stack.

]]>
HashiCorp Vault Operator Manages Kubernetes Secrets https://thenewstack.io/hashicorp-vault-operator-manages-kubernetes-secrets/ Thu, 18 May 2023 13:00:18 +0000 https://thenewstack.io/?p=22708231

Securing secrets — keys and passwords and so on needed to gain access to your critical applications, data and infrastructure

The post HashiCorp Vault Operator Manages Kubernetes Secrets appeared first on The New Stack.

]]>

Securing secrets — keys and passwords and so on needed to gain access to your critical applications, data and infrastructure — has never been more important or urgent. Security breaches are costly, in terms of both money and reputation; hackers are getting more creative, and more brazen.

Exhibit A: The average cost of a data breach is $4.24 million, according to a study released in 2022 by IBM and the Ponemon Institute.

Exhibit B: A whopping 99% of issued credentials go unused for 60 days, reported a study released in May 2022 by Unit 42, the security research arm of Palo Alto Networks. Unused credentials, the researchers said, are often used by hackers to access critical applications, data or infrastructure.

Trouble is, the task of locking down and monitoring the usage of your secrets is, to put it mildly, complicated. Distributed, multicloud architecture means there’s no single castle to build a moat around. And Kubernetes is complex, requiring operators to integrate properly with many services and solutions.

Furthermore, workloads on Kubernetes constantly change. Dynamic secrets handle changing service identities while minimizing the blast radius when a credential is compromised.

Kubernetes Secrets can be configured to encrypt secrets at rest, but they’re stored unencrypted by default, in the API server’s underlying data store (etcd). So they’re not, well, all that secret, at least not without additional security in place. It’s relatively easy to read, modify or retrieve secrets from Kubernetes Secrets.
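
To make that concrete, here is a hedged sketch showing how little stands between an authorized API caller and the plaintext: Secret data is only base64-encoded. It assumes a working kubeconfig, and the secret name is illustrative.

```python
# Secret "protection" by default is just base64 encoding: anyone with read access
# to the Secret object can recover the plaintext. Assumes a working kubeconfig;
# the secret and namespace names are illustrative.
import base64
from kubernetes import client, config

config.load_kube_config()
secret = client.CoreV1Api().read_namespaced_secret("db-credentials", "default")
for key, value in (secret.data or {}).items():
    print(key, "=", base64.b64decode(value).decode())
```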

HashiCorp Vault has long been a trusted way to store, manage and control secrets across complex distributed systems. Vault Secrets Operator, an open source project released alongside Vault 1.13 and now available in beta, makes it easier to use Vault with Kubernetes Secrets, automating tasks that were previously manual.

Meeting a Community Need

The demand for a Vault operator supported by HashiCorp designed to work specifically with Kubernetes Secrets came directly from the community of Vault users, according to Rosemary Wang, a developer advocate at HashiCorp.

The message the company received from the Vault community, Wang told The New Stack, was a desire to use Kubernetes-native interfaces to retrieve and manage the life cycle of secrets, rather than continuing to use the secure alternatives people had grown accustomed to.

Two of the most common approaches:

Injecting a sidecar. The Vault agent injector, in which a sidecar injects secrets into an application in Kubernetes. The sidecar offloads the retrieval and life cycle management of secrets from Vault and puts them into a file within a pod that the application can retrieve.

However, Wang said, the problem comes when third-party services and tools that also use the Kubernetes Secrets interface are being loaded into the Kubernetes cluster. “The injector makes it very difficult for you to use the native Kubernetes Secrets interface if that’s what your third-party tool or service is leveraging.”

Secrets Store CSI Driver. Some users instead leveraged the Kubernetes’ Secrets Store Container Storage Interface (CSI) Driver. “The Secrets Store CSI driver had two options,” Wang said. “You could mount a volume with a file that included the secret. So, similar to [the] agent [injector], but you don’t need to run it as a sidecar.”

The other option for the Secrets Store CSI Driver, she said, is where “you could synchronize as a Kubernetes Secret. So rather than use the file volume mounting approach, you could use the sync-a-Secret approach where we create a Kubernetes Secret.”

Ultimately, she added, “You had a lot of different ways you could do this. But there were pitfalls, and some folks didn’t want to adopt the Secret Store CSI Driver; they had existing environments, and they weren’t planning to add the CSI Driver components into it.”

The main reason organizations didn’t want to adopt the Secret Store CSI Driver, Wang said, was that their security requirements prohibit volume mounting from the host.

The Vault Secrets Operator, Wang said, represents a third approach, one that bypasses file volume mounting and exclusively synchronizes to Kubernetes Secrets. “That often solves the gap between third-party services or other tools that are standardized on the Kubernetes Secrets interface.”

Under the Hood

The new technology implements a controller that makes use of Kubernetes custom resource definitions (CRDs) to natively synchronize Vault secrets to Kubernetes Secrets, without the need to run a sidecar container or use the Secrets Store CSI Driver.

The message from Vault community members, Wang said, was that they wanted a controller to dynamically retrieve information from Vault and pass it into their Kubernetes clusters, and then be able to read the information from that secret, she added, “because that’s a common pattern in the Kubernetes community.”

The operator sets custom resource definitions and then Vault manages the secrets used by the services, with Kubernetes orchestrating them according to their CRDs. The Vault Secrets Operator reconciles secrets from the current state to desired state, as defined by the custom resources, using Kubernetes declarative patterns.
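
The operator's actual implementation is more involved, but the reconciliation idea can be sketched in a few lines of Python: read a key-value secret from Vault with the hvac client and mirror it into a Kubernetes Secret. The Vault address, token, path and names below are all illustrative.

```python
# Toy sketch of the reconciliation idea, not the Vault Secrets Operator itself:
# read a KV v2 secret from Vault (via hvac) and mirror it into a Kubernetes Secret.
import hvac  # pip install hvac
from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_kube_config()
core = client.CoreV1Api()
vault = hvac.Client(url="http://127.0.0.1:8200", token="dev-only-token")  # illustrative

data = vault.secrets.kv.v2.read_secret_version(path="app/db")["data"]["data"]

body = client.V1Secret(
    metadata=client.V1ObjectMeta(name="app-db"),
    string_data=data,  # the API server base64-encodes string_data for us
)
try:
    core.create_namespaced_secret("default", body)
except ApiException as e:
    if e.status == 409:  # Secret already exists: bring it back to the desired state
        core.replace_namespaced_secret("app-db", "default", body)
    else:
        raise
```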

Being able to synchronize natively to Kubernetes Secrets is especially useful when secrets change, Wang said. “When a secret changes in Vault, there’s a lot you have to either write in your application, or the process that you’re running in Kubernetes, to reload the new secret,” she said.

The Vault Secrets Operator, Wang said, will “allow you to set a target deployment in Kubernetes, so if the secret changes in Vault, the operator will pick up the difference and then roll out the deployment for you. It will restart the application on your behalf. And you don’t have to worry about that, you don’t have to control that mechanism. And that is valid for static secrets as well as dynamic secrets.”

What’s Next for Vault Secrets Operator?

While in beta period, HashiCorp Vault Operator will remain focused on the Kubernetes authentication method, though its project roadmap calls for adding other authentication methods in the future.

Also in the future: Further testing to support different platforms and secrets engines, secret templating and improvements to secure Kubernetes Secrets.

Notably, Wang said, “there’s a lot of interest in doing everything through Kubernetes, basically configuring Vault and using Kubernetes as the interface to configure HashiCorp tools. There’s certainly some community demand for that.”

As the operator is an open source project, she encouraged users to check out the code and request features. But as the Vault Secrets Operator moves toward general availability, HashiCorp’s top priorities are “the stability of the operator, as well as the ability to monitor it.”

Making Kubernetes, the orchestrator of choice for containerized applications, more user-friendly — helping to create a “golden path” for developers who build and deploy at scale — is a hot conversation in the cloud native world. The new Vault Secrets Operator, she suggested, is a step toward that path.

At HashiCorp, Wang said, “there’s always a golden path conversation, mostly because a number of our tools are focused toward helping you scale your platform. A lot of our tools are focused on addressing the platform engineering teams that need to scale platforms for a developer audience internally or even to other customers.

“And so the result is that we’re always asking the question: Is this a pattern that a development team will feel comfortable adopting?”

Vault Secrets Operator fits that need, providing a native Kubernetes Secrets experience that matches the expectations and practices of development teams rolling out deployments on Kubernetes.

The post HashiCorp Vault Operator Manages Kubernetes Secrets appeared first on The New Stack.

]]>
Container or VM? How to Choose the Right Option in 2023 https://thenewstack.io/container-or-vm-how-to-choose-the-right-option-in-2023/ Wed, 17 May 2023 15:42:48 +0000 https://thenewstack.io/?p=22708289

A few years back, most technology articles would have you thinking that Linux containers and virtual machines were diametrically opposed

The post Container or VM? How to Choose the Right Option in 2023 appeared first on The New Stack.

]]>

A few years back, most technology articles would have you thinking that Linux containers and virtual machines were diametrically opposed components in the data center. That’s natural when a new technology is adopted: The hype cycle can push such innovations into every nook and cranny of the industry, looking for new wins over old software and hardware combinations.

You may remember when JavaScript was going to take over the server side, or when virtual reality was going to revolutionize education. In truth, these technologies eventually found comfortable areas of use, rather than supplanting every other idea for themselves. Things settle over time, and it can be tricky to discern where a given technology will end up most useful, and where it will be supplanted by better options farther down the line.

Now that Linux containers and virtual machines are no longer brand new, they’ve become well-understood tools for the common software developer to consider for various scenarios. We’d like to provide a guide, now, to just when and where each technology is appropriate in today’s hybrid cloud environments.

Big or Small?

Perhaps the easiest way to make your decision is according to application size and complexity. Containers are, among other things, an application packaging technology. Containers can be — and there are often very good and valid reasons for using them this way — deployed without Kubernetes directly into an operating system. This is part of our edge strategy with Red Hat Enterprise Linux and Ansible too: Containers are an easy, replicable way to deploy software while minimizing drift and moving parts.

There are other similar and competing technologies that have many of the same capabilities, such as unikernel, Wasm etc. Thus, while containers might be the right way to deploy an application today, there may be some movement around this model in the future as it is optimized and takes on new types of deployment models.

Some applications are, quite simply, too big and complex to fit into a container as is. We colloquially refer to these as monoliths. It should be noted that there is no technical limitation here: There’s no CPU/memory threshold that you cross and end up disqualified. Rather, this is based on the value of investment. For example, a single installer that deploys a database plus middleware plus $thing1 and $thing2, etc., onto a single server can be challenging to containerize as is. “Modernization” of the application may be required to decouple the components and/or adopt application frameworks and/or runtimes that are more friendly to containerization. One example of this would be moving a Java application from Spring Boot to Quarkus.

For the Developers

Developers, and administrators, regardless of whether they’ve adopted new-fangled cloud native architectures and/or DevSecOps methodologies, should embrace containers for many reasons. Speed, security, portability and simplicity are among the benefits of application containerization. And yet, this does not mean dumping virtual machines completely overboard.

The real question becomes, “Do I want to deploy my containerized application to Kubernetes or directly to a (virtualized) operating system?” There are many factors here to consider. One is the application’s requirements. Does the application need to run constantly as a single node, without interruption? Kubernetes does not migrate application components between nodes non-disruptively. They are terminated and restarted. If this isn’t behavior your application can tolerate, then Kubernetes is not a good fit.

It’s also important to consider the state of the application’s various components. If the application in question relies on third-party components, those may limit the use of containers. Many third-party vendors, especially in more stoic VM-centric industries, are slow to create Kubernetes-ready/compatible versions of their software. This means you can either deploy a VM or take the onus of supporting their software in Kubernetes yourself.

And even before you evaluate these options, it’s important to take a serious look at the skills available inside your organization. Does your team possess the skills and competency to handle Linux containers? Do you have, or are you willing to build or acquire, the necessary expertise for Kubernetes? This extends to API-driven consumption and configuration. Do your application and development teams need/want the ability to consume and configure the platform using APIs?

This is possible with all of “private cloud,” public cloud and Kubernetes, but is often more complex and harder on-prem, requiring a lot of glue from specialized automation teams. When it comes to the public clouds, your team needs specific expertise in each public cloud it’s using, adding another layer of complexity to manage. This is an area where Kubernetes can homogenize and further enable portability.

Infrastructure Efficiency

In many/most cases, a “web scale” application that has tens to thousands of instances is going to be much more efficient running on a Kubernetes cluster than in VMs. This is because the containerized components are bin packed into the available resources and there are fewer operating system instances to manage and maintain.
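
A toy first-fit bin-packing sketch illustrates the utilization argument. The CPU requests and node size are made up, and real schedulers weigh far more than CPU, but the effect is the same: shared nodes absorb many small workloads that would otherwise each need their own VM.

```python
# Toy first-fit bin packing to illustrate the utilization argument; the numbers
# are made up and real schedulers consider far more than CPU requests.
def pack(requests_millicpu, node_capacity=4000):
    """Return the CPU already allocated on each node after first-fit packing."""
    nodes = []
    for req in requests_millicpu:
        for i, used in enumerate(nodes):
            if used + req <= node_capacity:
                nodes[i] += req
                break
        else:
            nodes.append(req)  # no existing node fits; add a new one
    return nodes

workloads = [500, 250, 1500, 300, 700, 1200, 400, 250]  # mCPU requests, made up
capacity = 4000
packed = pack(workloads, capacity)
print(f"{len(packed)} nodes needed; utilization:",
      [f"{used / capacity:.0%}" for used in packed])
```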

Furthermore, Kubernetes facilitates the scaling up and down of applications more seamlessly and with less effort. While it’s possible to create new VMs to scale new instances of an application component or service, this is often far slower and harder than with Kubernetes. Kubernetes is focused on automating at the application layer, not at the virtualization layer, though that can be done as well with KubeVirt.

Infrastructure efficiency also implies cost impact. This is going to be different for each organization, but for some, reducing the number of VMs will affect what they’re paying to their operating system vendor for licenses, their hypervisor vendor and their hardware vendor. This may or may not be counteracted by the cost of Kubernetes and the talent needed to manage it, however.

And there are still other considerations when it comes to security. Kubernetes is a shared kernel model, where many containers, representing many applications, run on the same nodes. This isn’t to say they’re insecure — Red Hat OpenShift and containers deployed to Red Hat operating systems make use of SELinux and other security features and capabilities.

However, sometimes this isn’t good enough for security requirements and compliance needs. This leaves several options for further isolation: Deploy many Kubernetes clusters (which a lot of folks do), use specialized technologies like Kata containers or use full VMs.

No matter what the requirements are for your organization, nor whether you choose containers or virtual machines for your applications, there is one fundamental rule that is always at play in the enterprise software world: Change is hard. Sometimes, if something is working, there’s no reason to move it, update it or migrate it. If your applications are running reliably on virtual machines and there’s no corporate push to migrate it elsewhere, perhaps it is fine where it is for as long as it can reliably be supported.

Sometimes, the best place for change inside an organization isn’t deep in the stacks of legacy applications, it’s out in the green fields where new ideas are growing. But even those green fields have to connect to the old barn somehow.

The actual technology being used doesn’t necessarily place something in those green fields, however. In this way, it is important to find a method of supporting both containers and virtual machines inside your environments, as the only real mistake you can make is to ignore one of these technologies completely.

The post Container or VM? How to Choose the Right Option in 2023 appeared first on The New Stack.

]]>
GitOps as an Evolution of Kubernetes https://thenewstack.io/gitops-as-an-evolution-of-kubernetes/ Tue, 16 May 2023 15:31:11 +0000 https://thenewstack.io/?p=22708181

VANCOUVER, British Columbia — Many people talk about GitOps and Kubernetes, but when Brendan Burns, a Microsoft Corporate Vice President, a

The post GitOps as an Evolution of Kubernetes appeared first on The New Stack.

]]>

VANCOUVER, British Columbia — Many people talk about GitOps and Kubernetes, but when Brendan Burns, a Microsoft Corporate Vice President, a Distinguished Engineer at Microsoft Azure, and, oh yeah, co-founder of Kubernetes, talks, I listen. Burns spoke at The Linux Foundation’s GitOpsCon about how GitOps is an evolutionary step for Kubernetes.

How? Burns started by explaining how it’s deeply rooted in the development of continuous integration, deployment, and delivery. What really motivated him to help create Kubernetes was, “When we were starting out, we tried to put together reliable deployments. We worked on this using the DevOps tools of the time with a mixture of Puppet, Chef, Salt, and Ansible — and Bash obviously — it worked about 85% of the time. And then you’d massage it, and it eventually would work maybe 95% of the time.” However, the journey was often fraught with difficulties and uncertainties, which birthed the idea of Kubernetes.

Kubernetes’ inception was essentially a response to the arduous and unreliable nature of the deployment process. It was a fusion of the DevOps challenges and the innovative strides Docker made in the container revolution. Docker’s focus on hermetically sealing and packaging applications was a vital prerequisite to reimagining how deployments could be executed. Over the past decade, this approach has transformed into the standard modus operandi within the tech community.

Advent of GitOps

But the tech world has now moved a step further with the advent of GitOps. It’s no longer aimed at redefining the deployment process itself. It is no longer just about leading into the deployment that Kubernetes orchestrates but the entire journey — from sourcing configurations to deploying them into the world where Kubernetes can utilize them.

GitOps, with its declarative configuration, now plays a pivotal role in ensuring reliable delivery and contributes to the ongoing evolution of the community. “While it’s universally accepted now,” said Burns, “the idea was a subject of contention at the time.” Scripting was rampant. Notably, the CI/CD pipeline, even when described in YAML, was an imperative program execution. Burns thinks GitOps, with its inherent declarative nature, is a welcome reinforcement to the Kubernetes ecosystem.

Moreover, empowering people to do more was another central theme of the initial thought process, Burns said. The goal was to alleviate the burdens that plagued developers daily. This, in essence, is the journey of the community — from its inception rooted in deployment and continuous delivery to the present day, where GitOps reigns, offering a more reliable, declarative, and user-empowering approach to managing deployments.

It does this in several ways:

  1. Separation of Concerns: With Kubernetes and GitOps, teams can be compartmentalized, focusing on specific tasks and responsibilities. This clean delineation can help avoid confusion, improve efficiency, and make it clear where one team’s responsibilities end, and another begins.
  2. Multiple Personas: In modern software development, there are many personas involved, such as developers, platform engineers, and security teams. Each has a specific role and responsibilities, and all need to work together in the same environment.
  3. GitOps as a Solution: GitOps can help manage this complex environment. It allows each persona to manage a Git repository, rather than needing to directly interact with the cluster. This can reduce the risks associated with one group having too much control and can make it easier for teams to work together. It essentially allows for a clearer division of labor and less risk of overlap or conflict.
  4. Automated Updates: GitOps can also facilitate automatic updates. Tools such as Dependabot can monitor repositories and propose updates when necessary. This process reduces the time and effort required to stay up to date, increasing efficiency and reducing the risk of falling behind on important updates.
  5. Security and Compliance: GitOps also supports better security and compliance. Through a well-managed Git repository, it can ensure that every change is tracked and auditable, which is important for meeting compliance requirements.

The GitOps workflow and its intersection between platform engineering and the developer is particularly significant for programmers who prefer not to be bogged down by the intricacies of deploying their code into Kubernetes. Irrespective of their preferred programming language — be it Java, Python, Dotnet, Rust, or Go — they simply want to push their code, generate a container image, and have it deployed immediately. GitOps enables them to do this.

Scalability

The beauty of GitOps, Burns continued, lies in its scalability. Developers need not be overly concerned with the number of clusters in their organization or their specific locations. The shift from a push model of pipelines to a GitOps pull model allows a level of abstraction where the number of clusters becomes somewhat irrelevant. Developers only have to deal with a Git repository. If a new cluster emerges or an old one disappears, developers may not even notice.

The consistency of the workflows remains even when transitioning from early pre-production to staging to production in the application lifecycle. This decreases the cognitive load on developers, allowing them to concentrate more on their code rather than where it goes post-deployment.

Thus, in GitOps, the Git repository becomes the ultimate source of truth, and the platform engineering team can concentrate on initializing that Git repository, thus empowering developers to efficiently deploy their code.

Burns also reminded us that historically, the concept of “snowflakes” (one-off, unique servers that are impossible to reconstruct if they “melted”) was a cause of concern. True, containers and orchestration eliminated this problem at the individual container level. However, we now face the issue of “snowflake clusters” — clusters of machines that are uniform internally but differ from others.

GitOps, Burns said, offers a robust solution for this issue. The shift from a push to a pull model makes GitOps relatively indifferent to the scale or number of clusters. Each cluster is configured to point to the same Git repository. When you make the Git repository initialization part of creating clusters, it automatically creates clusters that are initialized with the correct software versions.

Thus, this process ensures consistency across the platform. For example, it also eliminates the chances of forgetting to include a cluster in a pipeline that deploys a new version of security software or having to inform a development team about changes in regions. This consistency and reliability are among the main advantages of GitOps.
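
A toy sketch of that pull model, not how Argo CD or Flux actually implement it: an in-cluster agent periodically pulls the configuration repository and applies whatever it finds, so every cluster pointed at the same repository converges on the same state. It assumes `git` and `kubectl` are on the PATH, and the repository paths are illustrative.

```python
# Toy sketch of the GitOps pull model -- not how Argo CD or Flux implement it.
# An in-cluster agent pulls the config repo on a timer and applies what it finds.
# Assumes `git` and `kubectl` are on the PATH and the repo is already cloned.
import subprocess
import time

REPO_DIR = "/opt/gitops/config-repo"        # illustrative local clone of the repo
MANIFEST_DIR = f"{REPO_DIR}/clusters/prod"  # illustrative path for this cluster

while True:
    subprocess.run(["git", "-C", REPO_DIR, "pull", "--ff-only"], check=True)
    # `kubectl apply` is idempotent: unchanged manifests are left alone, and
    # drifted resources are brought back to the state recorded in Git.
    subprocess.run(["kubectl", "apply", "-f", MANIFEST_DIR, "--recursive"], check=True)
    time.sleep(60)
```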

Interestingly, the application of GitOps is not restricted to Kubernetes but extends to public cloud resources through service operators. Users are leveraging the Kubernetes control plane to manage containerized resources and instances of a Postgres database or blob storage system. GitOps can manage resources within your cluster as well as those in the cloud, thus widening its scope and utility.

No Be-All, End-All

However, GitOps is not the be-all and end-all solution. There’s a place for both CI/CD pipelines and GitOps: “It’s not a fight, but rather it’s two very complementary technologies, one that is very good at easily making the state real and one that is very good at orchestrating stages of what you want the world to look like.”

Drawing parallels with robotics, which Burns worked on before he came to software and where there’s a constant handoff between control and planning, one can understand the relationship between traditional CI/CD pipeline systems and GitOps. GitOps is like a controller, quickly making a state reality, but it’s not ideal for software rollouts on a global scale that require slow, gradual deployments. This is where traditional CI/CD systems, or “planners,” come into play.

So, Burns concluded, CI/CD pipelines and GitOps each have their strengths — GitOps in bringing a specific state into reality with ease, and traditional CI systems in orchestrating stages of what the world should look like. Understanding the value of GitOps in the container context and its interplay with traditional CI systems can significantly enhance efficiency and productivity. And all, of course, will work well in a Kubernetes-orchestrated world.

The post GitOps as an Evolution of Kubernetes appeared first on The New Stack.

]]>
How Otomi Helped the City of Utrecht Move to Kubernetes https://thenewstack.io/how-otomi-helped-the-city-of-utrecht-move-to-kubernetes/ Mon, 15 May 2023 17:00:01 +0000 https://thenewstack.io/?p=22706920

With digital transformation sweeping across industries, we are seeing more and more organizations adopting cloud native technologies to modernize their

The post How Otomi Helped the City of Utrecht Move to Kubernetes appeared first on The New Stack.

]]>

With digital transformation sweeping across industries, we are seeing more and more organizations adopting cloud native technologies to modernize their IT infrastructure. Kubernetes has become the go-to solution for many when managing containers at scale.

While my experience building Red Kubes as CTO has highlighted the need for these technologies, it has also shed light on how integral the adoption process is for companies and organizations, such as The Municipality of Utrecht in the Netherlands.

Together, we addressed a common issue: complex and siloed applications. For context, Utrecht is one of the largest municipalities in the Netherlands, dealing with a myriad of applications and huge volumes of data.

Essentially, its IT infrastructure needed a more modern approach to improving its services for the residents. I’m sure you’ve personally experienced the struggle and frustration of trying to get something from your council, municipality, or city.

The Challenge:

At Red Kubes, we designed Otomi (our open source platform) to address these issues; we personalize each aspect of the platform to meet the needs of the user. Since the challenge lay in speeding up delivery, building connections between these silos was of utmost importance.

Before we stepped in, the process when updating (or even changing) was time-consuming, costly and complex.

Furthermore, there was an increasing need for collaboration and information exchange between municipalities, but the current architecture made it difficult to achieve.

I believe many organizations are facing similar issues in modernizing their infrastructure to support modern application architectures.

To address these challenges, Utrecht, along with 15 other major cities, initiated a review of their current information systems and architecture based on “Common Ground.”

The goal was to establish modern standards for data exchange between municipalities through microservices and an API-driven approach. The new standards could not be supported by the existing infrastructure so there was a need to transition to a modern architecture.

As applications and workloads were to be containerized for better cloud portability, Kubernetes was identified as the ideal solution for container orchestration.

Utrecht recognized that they would need to hire talent or contractors with the necessary skills and expertise to set up and manage a Kubernetes environment.

It’s a good thing the city was aware of the complexity of Kubernetes, and especially of what comes after installing a Kubernetes cluster.

The Solution:

Utrecht searched for a solution that would make Kubernetes easily manageable and ready for production without requiring extensive staff training or hiring new talent in such a tight market. The proposed solutions revealed that our open source project Otomi could deliver on the requirements.

In a nutshell, Otomi simplifies Kubernetes engineering and the management of all the additional components required to run Kubernetes in a secure, compliant and automated way, providing self-service to developers. It is designed to enable organizations to get the most out of their containerized applications in just a few days.

Utrecht successfully adopted Kubernetes technology by leveraging Otomi and creating a platform engineering team to build a production-ready platform on top of the Azure Kubernetes environment.

This allowed developers to concentrate on coding while the platform engineering team focused on security, compliance, scalability and stability (the important stuff in Kubernetes environments!).

By combining AKS (Azure Kubernetes Service) and Otomi, Utrecht was able to set up its production-ready Kubernetes environment within a few days instead of the many months it would have taken using traditional methods.

The Results: Technical, Operational and Security

With the implementation of Kubernetes, topped with Otomi, the outcomes for the city included a host of technical, operational and security benefits. From a technical standpoint, the deployment resulted in faster, automated testing, enhanced observability, monitoring and immediate access to root cause analysis (RCA).

Additionally, automatic scaling of the Kubernetes environment was achieved, a process that took three to six months before Kubernetes and Otomi. Now, development environments can be deployed within one minute, providing instant self-service for development teams, compared to months in the legacy architecture.

Utrecht explained to us that the benefits of Otomi were also significant from an operational perspective. Applications can now be deployed within one day, compared to the previous process which took months.

Furthermore, the entire journey from application concept to production now averages around four weeks, compared to the prior duration of at least six to nine months.

The platform also achieved stability with 24/7 uptime, automatic restart and recovery, and up to 40% productivity gain for developers through Otomi’s self-service capabilities.

We were also able to uplift the security posture, as the implementation resulted in numerous improvements, including Open Web Application Security Project (OWASP), microsegmentation, live scanning, traceability, cluster and network policy enforcement, and more.

While I’m naturally biased, the solution worked extremely well. Utrecht’s Senior Manager of Digital Services, Lazo Bozarov, shared that the platform has allowed the municipality to accelerate its containerization and cloud journey, in which it has modernized its architecture toward microservices and an API-centric infrastructure. Goal achieved.

By integrating Otomi with Kubernetes, containerization is simplified, reducing the need for extensive environment management. This results in organizations accelerating their container platform’s time-to-value and the applications on it. For organizations like Utrecht, implementing Otomi on top of Kubernetes will lead to substantial cost savings, time reduction and risk mitigation.

As someone who has co-engineered this product from the ground up, it’s rewarding to see these real-life adoptions actually making a difference. It’s also exciting to see how Kubernetes can revolutionize IT infrastructure modernization. There’s a bright future ahead for the world of Kubernetes, especially in organizations such as these.

The post How Otomi Helped the City of Utrecht Move to Kubernetes appeared first on The New Stack.

Runtime Security: Relevancy Is What Counts https://thenewstack.io/runtime-security-relevancy-is-what-counts/ Thu, 11 May 2023 12:00:15 +0000 https://thenewstack.io/?p=22706808

Security experts, as well as many if not most developers and software engineers, know that an organization deploying software is almost inevitably working with insecure code. Code and applications are often rife with vulnerabilities throughout the CI/CD process. Other stakeholders, such as the CTO, might have at least some inkling of the status quo, but they may not know the severity or the magnitude of those vulnerabilities, or how even software at runtime can remain riddled with them.

Security best practices have emerged, including those for cloud native deployments, though they remain a work in progress. SBOMs, signatures and other practices continue to improve, but detecting and remediating vulnerabilities in code and applications that are already deployed remains unfinished business. Intuitively, it may seem rational to rely on the Common Vulnerability Scoring System (CVSS) to prioritize vulnerabilities by severity. This helps somewhat to reduce the whack-a-mole approach to detecting and removing vulnerabilities throughout CI/CD and during deployment. However, this too can fall short.

This is where a different approach appears more applicable, more relevant and, of course, time-saving. Combined with automation, it centers on relevancy, defined in part as the severe vulnerabilities that will actually remain in the container and code during runtime. The trick, of course, is to automate the pinpointing of these vulnerabilities and of which ones are the most relevant, based on the overlap of their severity, fixability and, especially, relevancy.

Prioritization

Ideally, vulnerabilities would be prioritized by the likelihood and severity of future revenue impact — which is similar to how traditional project management is prioritized, Torsten Volk, an analyst for Enterprise Management Associates (EMA), said. A container might include a Python library infected with ransomware, but Volk said this could be irrelevant if:

  • The application code running in this container does not actually use the infected library.
  • Strict container networking policies block malware from accessing the ports.
  • The container runs on a Kubernetes cluster without access to the types of data sources targeted by the malware program.
  • The malware needs access to a highly privileged account to be able to spread, while the container runs a bare-bones account that lacks the required privileges.
  • The container does have access to a vulnerable data source, but the data source only contains cafeteria menus and the score sheets of the corporate software team.

“Even these few examples demonstrate that a successful attack heavily depends on the context of its target,” Volk said. “However, identifying the relevant context factors and prioritizing vulnerabilities accordingly is where the magic lies.”
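
To make that idea concrete, here is a minimal sketch of context-based prioritization, assuming a scanner that reports a CVSS score per finding and runtime data that says whether the affected library is loaded, whether network policy leaves a path open and what privileges the container holds. The weights, field names and the `Vulnerability` structure are illustrative assumptions, not any vendor's actual model:

```python
from dataclasses import dataclass

@dataclass
class Vulnerability:
    cve_id: str
    cvss: float              # base severity, 0.0-10.0
    package_in_use: bool     # is the affected library actually loaded at runtime?
    network_reachable: bool  # do network policies leave the relevant port open?
    privileged: bool         # does the container run with elevated privileges?

def contextual_score(v: Vulnerability) -> float:
    """Down-weight CVSS severity when the runtime context blocks exploitation."""
    score = v.cvss
    if not v.package_in_use:
        score *= 0.2   # unused library: very unlikely to be exploitable
    if not v.network_reachable:
        score *= 0.5   # no reachable attack path over the network
    if not v.privileged:
        score *= 0.8   # limited blast radius without elevated privileges
    return round(score, 2)

vulns = [
    Vulnerability("CVE-0000-0001", 9.8, package_in_use=False, network_reachable=True, privileged=False),
    Vulnerability("CVE-0000-0002", 6.5, package_in_use=True,  network_reachable=True, privileged=True),
]

for v in sorted(vulns, key=contextual_score, reverse=True):
    print(v.cve_id, contextual_score(v))
```

Even this toy ranking puts the medium-severity but fully exposed finding ahead of the critical one that sits in an unused library, which is exactly the kind of context-dependence Volk describes.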

It is also important to take into account that Kubernetes is still a relatively young and fast-evolving technology, Oshrat Nir, head of product marketing at ARMO, told The New Stack. “While it has started to plateau, the talent gap still exists. Pairing that with the current macroeconomic climate means that DevOps or platform teams are and will continue to be short-staffed, yet they have more jobs to be done than ever before,” Nir said. “As a result, prioritization has become more important than ever.”

This goes double for security, Nir said: “A major security breach can taint an organization’s reputation for a long time, making the hit to the bottom line something that takes longer to repair than the breach itself.”

Relevancy

Kubernetes security tool provider ARMO says it has released in beta a new eBPF-based capability: vulnerability relevancy and prioritization. Relevancy and prioritization allow ARMO Platform and Kubescape users to deprioritize vulnerabilities that belong to unused software packages and components. By first deprioritizing vulnerabilities that are less critical, users can focus on addressing the ones that pose a greater threat to their running cluster.

This release is also important given that, on average, it takes weeks or even longer to apply security fixes. “As a result, it would behoove DevSecOps practitioners to fix the vulnerabilities that expose them the most first. The problem is that most scanners return a (long) list of vulnerabilities to the users with little to no indication of what to fix first,” Nir said. “This will often leave teams paralyzed, and planning the work of patching, testing and deploying the patch can take weeks.”

The typical way of sorting through vulnerabilities is by their criticality, Nir explained. “The thing is that many software packages in containers aren’t even used at runtime, which means they pose less risk than their criticality would lead us to believe. This goes double now that hackers, knowing about this best practice, actually try to infiltrate with the more innocuous, less critical vulnerabilities,” Nir said.

ARMO’s relevancy feature pinpoints the vulnerabilities that should be prioritized to be fixed, Nir said. “While it includes fixability, criticality and the ability to access the vulnerability remotely (i.e. code injection or remote code execution) it also factors in whether the security packages are actually in use,” Nir said. “In this way, security teams can filter 60%-80% of vulnerabilities out of their immediate to-do list and focus on the things that need to be solved first.”
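
A rough sketch of that filtering step follows. The findings and field names are hypothetical, and this is not ARMO's actual API or Kubescape's output format; it simply illustrates partitioning a scanner's results by runtime use and fixability:

```python
# Hypothetical scanner findings; in practice these would come from an
# image scanner combined with runtime (eBPF-based) package-usage data.
findings = [
    {"cve": "CVE-0000-0101", "severity": "Critical", "fix_available": True,  "in_use": False},
    {"cve": "CVE-0000-0102", "severity": "High",     "fix_available": True,  "in_use": True},
    {"cve": "CVE-0000-0103", "severity": "Critical", "fix_available": False, "in_use": True},
]

# Fix first what is actually loaded at runtime and already has a patch.
todo_now = [f for f in findings if f["in_use"] and f["fix_available"]]
# Watch list: in use, but no fix is available upstream yet.
watch = [f for f in findings if f["in_use"] and not f["fix_available"]]
# Deprioritized: the vulnerable package never gets loaded by the workload.
deferred = [f for f in findings if not f["in_use"]]

print("fix now:  ", [f["cve"] for f in todo_now])
print("watch:    ", [f["cve"] for f in watch])
print("deferred: ", [f["cve"] for f in deferred])
```

In this framing, the deferred bucket is where the bulk of the noise, the 60%-80% ARMO cites, drops out of the immediate to-do list.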

The post Runtime Security: Relevancy Is What Counts appeared first on The New Stack.

Nutanix Adds 3 New Parts to Its Multicloud Data Platform https://thenewstack.io/nutanix-adds-3-new-parts-to-its-multicloud-dm-platform/ Wed, 10 May 2023 17:28:44 +0000 https://thenewstack.io/?p=22707757

Nutanix, which two years ago with Red Hat launched Nutanix Cloud Platform, its open multicloud data management platform, revealed several substantial additions to that product at its .NEXT 2023 Conference in Chicago.

The Nutanix Cloud Platform (NCP) enables enterprises to build, scale and manage cloud native applications on-premises and in hybrid cloud environments. It serves as the core platform for all Nutanix users, providing a unified environment for virtualized workloads, containers and bare-metal applications across private, public and hybrid clouds. NCP is built on Nutanix’s hyperconverged infrastructure (HCI), which combines storage, compute and virtualization into a single appliance.

The new additions announced on May 9 are:

  • Nutanix Central
  • Data Services for Kubernetes
  • Project Beacon, a group of data-centric platforms as a service (PaaS) level services

Nutanix Central

Nutanix Central is a cloud-delivered software package that serves as a single console for visibility, monitoring and management across all IT environments: public cloud, on-premises, hosted or edge infrastructure. This aims to extend the universal cloud operating model of the Nutanix Cloud Platform to break down silos and simplify the management of apps and data anywhere, Nutanix SVP of Product and Solutions Marketing Lee Caswell told a group of reporters.

“Central is a service management model to manage federated endpoints,” Caswell said. “So from an observability and manageability standpoint, this now allows us to consolidate different clusters, if you will, across the hybrid multicloud environment. Nutanix Central then becomes our mechanism for how we go and help customers with a single pane of glass across all of these endpoints.”

From the Nutanix Central dashboard, customers will access domain and cluster-level metrics, including capacity utilization and alert summary statistics, to get a quick overview of the state of each domain, Caswell said. This functionality will also provide seamless navigation to individual domains, based on individual user role-based access control (RBAC), across all domains registered, he said.

Nutanix Central also will support multidomain use cases, including federated identity and access management (IAM), global projects and categories, and global fleet management, Caswell said. This all enables IT teams to deliver self-service infrastructure at scale while remaining in control of governance and security, he said.

Data Services for Kubernetes

Nutanix Data Services for Kubernetes (NDK) is designed to give users control over cloud native apps and data at scale, Thomas Cornely, SVP of Product Management at Nutanix, said in a media advisory.

Initially conceived as part of Nutanix Cloud Infrastructure, NDK brings management of Nutanix’s enterprise-class storage, snapshots and disaster recovery to Kubernetes. This helps accelerate containerized application development for stateful workloads by introducing storage provisioning, snapshots and disaster recovery operations to Kubernetes pods and application namespaces, Cornely said.

NDK will give Kubernetes developers self-service capabilities to manage storage and data services, while also enabling IT with visibility and governance over consumption, Cornely said.
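
NDK's own resource types aren't detailed here, so the following is a generic illustration only, not NDK's API: in upstream Kubernetes, snapshot operations of this kind typically surface as CSI VolumeSnapshot custom resources, which a platform team could create programmatically with the official Kubernetes Python client. The namespace, PVC and snapshot class names below are placeholders:

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside a pod

# Generic CSI VolumeSnapshot (not NDK-specific); all names are placeholders.
snapshot = {
    "apiVersion": "snapshot.storage.k8s.io/v1",
    "kind": "VolumeSnapshot",
    "metadata": {"name": "orders-db-snap-001", "namespace": "orders"},
    "spec": {
        "volumeSnapshotClassName": "csi-snapclass",
        "source": {"persistentVolumeClaimName": "orders-db"},
    },
}

# VolumeSnapshot is a CRD, so it is created through the custom objects API.
client.CustomObjectsApi().create_namespaced_custom_object(
    group="snapshot.storage.k8s.io",
    version="v1",
    namespace="orders",
    plural="volumesnapshots",
    body=snapshot,
)
```

A data services layer of the kind described here builds scheduling, replication and restore workflows on top of primitives like these, exposed per application namespace rather than per storage volume.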

Project Beacon

Project Beacon is a multiyear Nutanix initiative designed to deliver a portfolio of data-centric platform as a service (PaaS) level services available natively anywhere — including on Nutanix or on a native public cloud. With a goal of decoupling the application and its data from the underlying infrastructure, Project Beacon aims to enable developers to build applications once and run them anywhere. Sort of the original aim of Java 30 years ago — long before the cloud and the edge.

“What Project Beacon does is it says, ‘We’re gonna move beyond the infrastructure we control today,'” Caswell said. “You can take portable licenses and move them from an AI-enabled edge to the multidata site to a data center and into our cloud partners.

“The issue there though, is that that’s about running our full stack. Now, what we’re saying is we’re going to go and expand into hyper-scalar infrastructure that’s running its own compute and storage constructs. And that’s very interesting because what Project Beacon then says is, ‘We’re going to show this NDB (network database) as a service — the basis of all modern and performance applications,’” Caswell said.

Caswell said Nutanix did databases first with Project Beacon to show that “now we can take our database-as-a-service offering and allow that to go and run directly on AWS, without requiring the customer-controlled advertising. That’s a very interesting way to start thinking about our intent over time.”

All three of the platform additions will become available for users later this year, Nutanix said.

Nutanix’s .NEXT 2023 Conference continues through May 10.

The post Nutanix Adds 3 New Parts to Its Multicloud Data Platform appeared first on The New Stack.

Intelligent Data Protection Policies for Kubernetes Productivity https://thenewstack.io/intelligent-data-protection-policies-for-kubernetes-productivity/ Wed, 10 May 2023 15:32:29 +0000 https://thenewstack.io/?p=22707646

The story of distributed computing over the past 50 years has been an exercise in complexity. From the on-premises days to the rise of virtualization to cloud computing and now the rise of cloud native computing, the power and business value of technology is matched only by its increasing complexity.

The current technology generation is the open source Kubernetes container orchestration platform. Kubernetes, and cloud native computing broadly, represent a paradigm shift in the way organizations build, implement and manage the massively scalable infrastructure that supports dynamic software deployment at scale.

Data Protection Evolves

Data protection technologies and processes have evolved apace. On-premises data protection centers on backing up storage volumes and databases. Virtualization raises the bar on data protection, as virtual machines are more dynamic and complicated than physical servers, and therefore more difficult to back up and restore.

Cloud computing provides new data protection options while complicating the challenge further. Cloud providers offer comprehensive abstractions that deliver both infrastructure and database as a service, requiring increasingly intelligent data protection policies that address various levels of abstraction.

Today, Kubernetes represents an entirely new set of challenges for data protection, necessitating intelligent policies that simplify the work of practitioners while automatically and proactively tackling complex data protection scenarios.

Intelligent Policies for Kubernetes Data Protection

At the heart of Kubernetes is a control plane that enables the management of all aspects of the cloud native infrastructure as a matter of policy. Kubernetes supports different formats for such policies, including YAML, Helm charts and others, enabling both hands-on control as well as the construction of visual dashboards that provide visibility into any aspect of the deployment, including data protection.

Such policies, however, are static. They don’t provide for the inherently dynamic control that Kubernetes deployments require. The static nature of such policies is particularly limiting for data protection. Given the dynamic context for cloud native deployments, Kasten has implemented new, intelligent data protection policies that deal with changing circumstances to improve on these standard Kubernetes policy constructs. These policies are a level of abstraction above individual, static configuration-based policies, and as such, allow for necessary change while maintaining compliance with the policies in question.

Kasten’s intelligent policies implement a separation of concerns, empowering development teams to manage data protection capabilities without visibility into the underlying credentials or storage infrastructure.

Static vs. Intelligent Policies

In summary, Kasten enables two tiers of policy:

  • Static policies: The base policy tier might specify certain access control policies or the choice of where to store particular data. An example of a static policy might be to run a particular backup at a particular time.
  • Intelligent policies: An intelligent policy, by contrast, considers resource constraints, compliance restrictions and other factors to adjust backup times as needed.

Kasten’s intelligent policies focus on the automation of processes that involve specific configuration choices. For example, intelligent policies can stagger the start times of backups across multiple applications and manage various rate limitations that different cloud providers enforce. Practitioners can use the intelligent policy tier to specify organizational policies, service-level agreements and compliance policies without requiring visibility into the base tier.
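
As a conceptual sketch only (this is not Kasten's implementation; the backup window, application names and interval are made-up values), staggering backup start times across applications within a nightly window might look like this:

```python
from datetime import datetime, timedelta

def stagger_backups(apps, window_start, window_minutes, max_concurrent_starts=1):
    """Spread backup start times evenly so applications don't all hit the
    storage backend (and its provider rate limits) at the same moment."""
    slot = timedelta(minutes=window_minutes / max(len(apps), 1))
    schedule = {}
    for i, app in enumerate(sorted(apps)):
        schedule[app] = window_start + (i // max_concurrent_starts) * slot
    return schedule

apps = ["billing", "checkout", "inventory", "search"]       # placeholder apps
window_start = datetime(2023, 5, 10, 1, 0)                   # 01:00 nightly window

for app, start in stagger_backups(apps, window_start, window_minutes=120).items():
    print(f"{app:10s} backup starts at {start:%H:%M}")
```

An intelligent policy layer computes and recomputes schedules like this automatically as applications are added or provider rate limits change, instead of practitioners hand-editing static start times.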

The infrastructure automatically and proactively complies with such policies, and practitioners can work with data protection policies when they need to, without having access to sensitive information or security configurations. In this way, intelligent policies can boost an organization’s workforce productivity, helping it realize the operational benefits of cloud native technology by automating several inherently dynamic data management tasks.

The Intellyx Take

Intelligent data protection policies have necessarily evolved from the on-premises era to virtualization to the cloud and now to cloud native. With increasing complexity comes the need for simplicity, abstracting the technology details in order to give practitioners the ability to do their jobs without getting bogged down by the details of the infrastructure.

The move from base policies via increasing automation to intelligent policies supports an organization’s data protection priorities. For Kubernetes environments, the benefits of intelligent policies are twofold: improving the productivity and efficiency of the team while enforcing policy consistency and therefore providing protection for increasingly dynamic, complicated data environments.

Cloud native infrastructure is both dynamic by nature and abstracts application behavior from the data that supports it. In this environment of constant, unpredictable change, data protection is more important than ever before, and the policies that control it must be even more intelligent.

The post Intelligent Data Protection Policies for Kubernetes Productivity appeared first on The New Stack.

Mirantis Updates k0s Lightweight Kubernetes Distro https://thenewstack.io/mirantis-updates-k0s-lightweight-kubernetes-distro/ Wed, 10 May 2023 15:00:41 +0000 https://thenewstack.io/?p=22707750

Mirantis, the Docker and Kubernetes developer company, has released the latest version of its lightweight, open source Kubernetes distribution, k0s. The new version boasts compatibility with the brand-new Kubernetes 1.27 release with various other improvements and bug fixes.

Back to Basics

K0s, for those who don’t know it, is one of several stripped-down, back-to-basics Kubernetes distros. Others include Minikube, k3s and MicroK8s. While they all have their differences, the name of the game is to give developers the power to create Kubernetes clusters on low-end hardware. For example, k0s can run on as little as a single CPU and 1GB of RAM for a single node.

The updated Mirantis k0s distribution significantly simplifies the installation and management of Kubernetes clusters. One of the key enhancements is support for containerd plug-ins, such as WebAssembly (WASM) and gVisor container sandboxes. This simplifies running these containers and enables users to extend their clusters with additional container runtimes effortlessly.

Furthermore, to eliminate custom forks of project components and to ensure greater compatibility with upstream Kubernetes functionality, Mirantis now provides its own system images, which in turn reduces complexity and improves security.

For one thing, many upstream Kubernetes system images contain Common Vulnerabilities and Exposures (CVEs). For instance, Miska Kaipiainen, Mirantis VP of Engineering, Strategy & Open Source Software, states that “If you scan a kube-proxy image at registry.k8s.io/kube-proxy:v1.25.8, you’ll see 12 vulnerabilities reported (or some other number, depending on the scanner you use).” Sure, many of these CVEs, such as old curl binaries and libs in the container, aren’t used at runtime. But you never know when that “harmless” CVE might turn out to be exploitable. So Mirantis takes full control of k0s images, builds them with pure upstream functionality and doesn’t rely on any custom forks of project components.

The result? “As of this writing, system images shipping with k0s 1.27 come with zero (0) – yes, zero – known vulnerabilities. We have daily scanning in place, which lets us keep track of vulnerabilities as they pop up and mitigate them super-quickly.”

CNCF Certified

A Cloud Native Computing Foundation (CNCF)-certified Kubernetes distribution, k0s is versatile enough to run on any Linux-based operating system, making it suitable for large-scale data center deployments, lightweight edge clusters, laptops and even Raspberry Pi. K0s is distributed as a single binary and can be installed on any node from the internet with a single command.

For ease of management, platform deployment and scaling can be administered locally via the k0s command line interface (CLI) and remotely via the k0sctl utility using configuration files. The built-in k0s Autopilot enables you to manage updates automatically. Additionally, operators can access k0s clusters via kubectl, Lens Desktop and other standard Kubernetes CLIs and dashboards.

So, if you want a safe, lightweight Kubernetes for your work, play, or study, I’d give K0s a try. It’s a nice little distro.

The post Mirantis Updates k0s Lightweight Kubernetes Distro appeared first on The New Stack.
